NLP Tools for Welsh Language Assessment and Learning
January 1, 2023
Welsh Government-funded project developing computational tools for Welsh text complexity analysis, CEFR proficiency assessment, and morphological analysis to support Welsh-language education.
Funder: Welsh Government
Period: 2025 – 2026
Role: Principal Investigator
Research theme: Welsh Language Technology & Multilingual NLP
Welsh is spoken by approximately 900,000 people and has distinctive linguistic features, including initial consonant mutation, in which the first sound of a word changes with its grammatical context. These features pose significant challenges for standard NLP pipelines. This project develops the foundational NLP infrastructure for Welsh language assessment and learning, in partnership with Welsh-language educational institutions and the Welsh Government.
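To illustrate why mutation complicates a standard pipeline, here is a minimal sketch of dictionary lookup that undoes soft mutation before searching a lexicon. It is not taken from the project's tools: the names `SOFT_MUTATION_REVERSE`, `radical_candidates`, and `lookup` are hypothetical, the lexicon is a toy set, and only soft mutation is covered (nasal and aspirate mutation are left out).

```python
# Illustrative only: a toy reversal of Welsh soft mutation.
# All names here are hypothetical and not part of any project toolkit.

# Mutated initial -> the radical (unmutated) initials it may come from.
SOFT_MUTATION_REVERSE = {
    "b": ["p"],       # pen -> ben
    "d": ["t"],       # tad -> dad
    "g": ["c"],       # cath -> gath
    "f": ["b", "m"],  # bara -> fara, mam -> fam
    "dd": ["d"],      # dŵr -> ddŵr
    "l": ["ll"],      # llyfr -> lyfr
    "r": ["rh"],      # rhosyn -> rosyn
}


def radical_candidates(token):
    """Return the token itself plus every form it could have had before soft mutation."""
    word = token.lower()
    candidates = [word]
    # Try the digraph "dd" before single letters so it is not read as "d".
    for mutated in sorted(SOFT_MUTATION_REVERSE, key=len, reverse=True):
        if word.startswith(mutated):
            rest = word[len(mutated):]
            candidates += [radical + rest for radical in SOFT_MUTATION_REVERSE[mutated]]
            break
    # Soft mutation deletes an initial g (gardd -> ardd), so also try restoring it.
    candidates.append("g" + word)
    return candidates


def lookup(token, lexicon):
    """Return the first candidate found in the lexicon, or None if there is no match."""
    for candidate in radical_candidates(token):
        if candidate in lexicon:
            return candidate
    return None


lexicon = {"cath", "bara", "gardd", "llyfr", "dŵr"}
print(lookup("gath", lexicon))  # surface form after "ei" (his): ei gath
print(lookup("ardd", lexicon))  # surface form after the article: yr ardd
```

A plain exact-match lookup would miss both surface forms, because neither "gath" nor "ardd" appears in the lexicon. The sketch also shows the ambiguity involved: an initial "f" can come from either "b" or "m", so reversing a mutation yields several candidates and cannot be done by string rules alone.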
Key outputs include:
- Proffiliadur: an open-source toolkit computing 141 linguistic complexity indices for Welsh texts, supporting CEFR-level classification and accessibility analysis
- CEFR-Cymraeg: the first CEFR-annotated proficiency dataset for Welsh (A1–B2), enabling automated language proficiency assessment for Welsh learners
