NLP Tools for Welsh Language Assessment and Learning

January 1, 2023

Welsh Government-funded project developing computational tools for Welsh text complexity analysis, CEFR proficiency assessment, and morphological analysis to support Welsh-language education.

Funder: Welsh Government
Period: 2025 – 2026
Role: Principal Investigator
Research theme: Welsh Language Technology & Multilingual NLP


Welsh is spoken by approximately 900,000 people. Its distinctive linguistic features, notably initial consonant mutation (the first consonant of a word changes with its grammatical context), pose significant challenges for standard NLP pipelines. This project develops foundational NLP infrastructure for Welsh language assessment and learning, in partnership with Welsh-language educational institutions and the Welsh Government.

Key outputs include:

  • Proffiliadur: an open-source toolkit computing 141 linguistic complexity indices for Welsh texts, supporting CEFR-level classification and accessibility analysis
  • CEFR-Cymraeg: the first CEFR-annotated proficiency dataset for Welsh (A1–B2), enabling automated language proficiency assessment for Welsh learners

Selected Publications

Authors
Fernando Alva-Manchego
Lecturer in Natural Language Processing
My research interests include text simplification, readability assessment, multilingual NLP, Welsh language technology, and NLP for education and social care.