Proffiliadur: Welsh Language Text Profiling Toolkit

May 11, 2026 · Nicolás Gutiérrez-Rolón, Jonathan Davies, Tomos Williams, Dawn Knight, Fernando Alva-Manchego
Abstract
Text complexity analysis is crucial for developing educational tools, healthcare communications, and accessibility applications for Welsh-language resources. We present Proffiliadur, an open-source toolkit for automatic text complexity assessment of Welsh texts. It computes 141 linguistic indices across surface, lexical, morphological, and syntactic categories, incorporating Welsh-specific tokenization and handling phenomena such as initial consonant mutation. We demonstrate the toolkit’s utility through complexity classification experiments, where feature-based models achieve F1=0.94, comparable to fine-tuned transformer models at F1=0.97. Proffiliadur is the first comprehensive text profiling toolkit for Welsh and provides a foundation for downstream applications in education, healthcare, and communication accessibility.
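To make the idea of surface and lexical indices concrete, the following is a minimal sketch of how a few such measures could be computed for Welsh text. It is not Proffiliadur's API: the function name `surface_indices`, the regex tokenizer, and the choice of indices are assumptions made for illustration only. The tokenizer simply keeps apostrophe contractions such as "Mae'r" together and treats circumflexed vowels (for example "ŷ") as letters. It does not handle mutation or any morphological analysis.

```python
import re
import unicodedata

# Hypothetical tokenizer (an assumption, not the toolkit's own):
# runs of Unicode letters, optionally joined by internal apostrophes.
TOKEN_RE = re.compile(r"[^\W\d_]+(?:['’][^\W\d_]+)*")


def surface_indices(text: str) -> dict:
    """Compute a handful of illustrative surface and lexical indices."""
    # Normalize to NFC so accented letters like "ŷ" are single code points.
    text = unicodedata.normalize("NFC", text)
    tokens = [t.lower() for t in TOKEN_RE.findall(text)]
    # Naive sentence split on terminal punctuation; empty pieces are dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]

    n_tokens = len(tokens)
    if n_tokens == 0:
        return {
            "n_tokens": 0,
            "n_sentences": 0,
            "mean_word_length": 0.0,
            "type_token_ratio": 0.0,
            "mean_sentence_length": 0.0,
        }

    return {
        "n_tokens": n_tokens,
        "n_sentences": len(sentences),
        "mean_word_length": sum(len(t) for t in tokens) / n_tokens,
        "type_token_ratio": len(set(tokens)) / n_tokens,
        "mean_sentence_length": n_tokens / len(sentences),
    }


if __name__ == "__main__":
    sample = "Mae'r gath yn y tŷ. Mae'r ci yn yr ardd."
    for name, value in surface_indices(sample).items():
        print(f"{name}: {value}")
```

Indices like these could then serve as input features for a complexity classifier, which is the kind of feature-based model the abstract compares against fine-tuned transformers.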
Type
Publication
LREC 2026
Authors
Fernando Alva-Manchego
Lecturer in Natural Language Processing
My research interests include text simplification, readability assessment, multilingual NLP, Welsh language technology, and NLP for education and social care.