Proffiliadur: Welsh Language Text Profiling Toolkit
May 11, 2026
Nicolás Gutiérrez-Rolón
Jonathan Davies
Tomos Williams
Dawn Knight
Fernando Alva-Manchego
Abstract
Text complexity analysis is crucial for developing educational tools, healthcare communications, and accessibility applications for Welsh-language resources. We present Proffiliadur, an open-source toolkit for automatic text complexity assessment of Welsh texts. It computes 141 linguistic indices across surface, lexical, morphological, and syntactic categories, and incorporates Welsh-specific tokenization and handling of phenomena such as initial consonant mutation. We demonstrate the toolkit's utility through complexity classification experiments, in which feature-based models achieve F1=0.94, close to the F1=0.97 of fine-tuned transformer models. Proffiliadur is the first comprehensive text profiling toolkit for Welsh and provides a foundation for downstream applications in education, healthcare, and communication accessibility.
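To illustrate why initial consonant mutation matters for lexical indices, here is a minimal sketch of one possible approach, not Proffiliadur's actual implementation: before looking a word up in a lexicon (e.g. for frequency or lemma-based counts), generate candidate radical (unmutated) forms by reversing the soft, nasal and aspirate mutations and h-prothesis. The names `MUTATION_RULES` and `radical_candidates` are hypothetical. The rule table is simplified, and it deliberately over-generates (e.g. "fara" could come from "bara" or "mara"), on the assumption that a lexicon filters the candidates afterwards.

```python
# Hypothetical sketch, not Proffiliadur's code: reverse Welsh initial mutations
# to produce candidate radical forms for dictionary lookup.

# Maps a (possibly mutated) word-initial prefix to the radical prefixes it may
# come from. Only the longest matching prefix is used, so "ngh" beats "ng"/"n"
# and "dd" beats "d". Radical digraphs (ll, rh, ff) map to [] so that, for
# example, "llong" is not misread as a soft-mutated "l" word.
MUTATION_RULES: dict[str, list[str]] = {
    # nasal mutation: p->mh, t->nh, c->ngh, b->m, d->n, g->ng
    "ngh": ["c"], "mh": ["p"], "nh": ["t"], "ng": ["g"], "m": ["b"], "n": ["d"],
    # aspirate mutation: p->ph, t->th, c->ch
    "ph": ["p"], "th": ["t"], "ch": ["c"],
    # soft mutation: p->b, t->d, c->g, b->f, d->dd, m->f, ll->l, rh->r
    "dd": ["d"], "b": ["p"], "d": ["t"], "g": ["c"], "f": ["b", "m"],
    "l": ["ll"], "r": ["rh"],
    # radical digraphs that block the shorter soft-mutation rules above
    "ll": [], "rh": [], "ff": [],
}

VOWELS = set("aeiouwyâêîôûŵŷ")


def radical_candidates(word: str) -> list[str]:
    """Return the word itself plus possible unmutated forms (may over-generate)."""
    w = word.lower()
    candidates = [w]  # the word may already be in its radical form

    # Apply the rules for the longest prefix the word starts with.
    matches = [p for p in MUTATION_RULES if w.startswith(p)]
    if matches:
        prefix = max(matches, key=len)
        rest = w[len(prefix):]
        candidates.extend(radical + rest for radical in MUTATION_RULES[prefix])

    # Soft mutation deletes an initial g, leaving a vowel-initial word.
    if w[:1] in VOWELS:
        candidates.append("g" + w)

    # h-prothesis adds h before a vowel-initial word (e.g. "ei hafal").
    if len(w) > 1 and w[0] == "h" and w[1] in VOWELS:
        candidates.append(w[1:])

    # Remove duplicates while preserving order.
    return list(dict.fromkeys(candidates))


if __name__ == "__main__":
    for token in ["fara", "Nghath", "ddafad", "llong", "ardd", "hafal"]:
        print(token, radical_candidates(token))
```

In this sketch, `radical_candidates("nghath")` returns `["nghath", "cath"]` and `radical_candidates("llong")` returns only `["llong"]`. A real toolkit would also need context (the preceding trigger word) to choose among candidates, which this sketch ignores.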
Publication: LREC 2026
