NLP Tools for Welsh Language Assessment and Learning

Sun, 01 Jan 2023 00:00:00 +0000

Welsh Government-funded project developing computational tools for Welsh text complexity analysis, CEFR proficiency assessment, and morphological analysis to support Welsh-language education.

Funder: Welsh Government
Period: 2025 – 2026
Role: Principal Investigator
Research theme: Low-Resource Languages

Welsh is spoken by approximately 900,000 people and has distinctive linguistic features, notably initial consonant mutation, that pose significant challenges for standard NLP pipelines. For example, "cath" (cat) surfaces as "gath", "nghath" or "chath" depending on the preceding word, so a mutated form no longer matches its dictionary entry. This project develops the foundational NLP infrastructure for Welsh language assessment and learning, in partnership with Welsh-language educational institutions and the Welsh Government.
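To make the mutation problem concrete, the following is a minimal sketch of how a tool might recover candidate dictionary (radical) forms from a mutated word. It is an illustration only, not the project's morphological analyser: the names REVERSE_MUTATIONS, TOY_LEXICON, radical_candidates and lookup are hypothetical, the lexicon is a toy, and h-prothesis and context (which mutation a preceding word actually triggers) are ignored.

```python
# Mutated initial -> possible radical initials (standard Welsh mutation table).
REVERSE_MUTATIONS = {
    # soft mutation
    "b": ["p"], "d": ["t"], "g": ["c"], "f": ["b", "m"],
    "dd": ["d"], "l": ["ll"], "r": ["rh"],
    # nasal mutation
    "mh": ["p"], "nh": ["t"], "ngh": ["c"],
    "m": ["b"], "n": ["d"], "ng": ["g"],
    # aspirate mutation
    "ph": ["p"], "th": ["t"], "ch": ["c"],
}

# Toy lexicon of radical forms (assumption: a real system would use a full dictionary).
TOY_LEXICON = {"cath", "pen", "tad", "gardd", "mam", "bara", "dafad"}


def radical_candidates(word):
    """Return every form the word could have had before mutation."""
    word = word.lower()
    candidates = [word]  # the word may simply be unmutated
    for mutated, radicals in REVERSE_MUTATIONS.items():
        if word.startswith(mutated):
            rest = word[len(mutated):]
            candidates.extend(radical + rest for radical in radicals)
    # Soft mutation deletes an initial "g" (gardd -> ardd), so restoring it
    # is always a possibility.
    candidates.append("g" + word)
    return candidates


def lookup(word):
    """Keep only the candidates attested in the lexicon."""
    return [c for c in radical_candidates(word) if c in TOY_LEXICON]


for form in ["gath", "nghath", "chath", "ardd", "fam"]:
    print(form, "->", lookup(form))
```

The sketch over-generates on purpose and lets the lexicon filter the candidates. A surface "f" may come from either "b" or "m", so disambiguation in practice needs the lexicon and often the surrounding context.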

Key outputs include:

Proffiliadur: an open-source toolkit computing 141 linguistic complexity indices for Welsh texts, supporting CEFR-level classification and accessibility analysis
CEFR-Cymraeg: the first CEFR-annotated proficiency dataset for Welsh (A1–B2), enabling automated language proficiency assessment for Welsh learners
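As a rough illustration of what a linguistic complexity index is, the sketch below computes a few generic surface measures (mean sentence length, type-token ratio, mean word length) for a short Welsh text. These are common textbook measures chosen for illustration; they are not claimed to be among Proffiliadur's 141 indices, and the function name basic_indices, the regular expressions and the output keys are all hypothetical rather than the toolkit's API.

```python
import re

# A word is a run of letters, optionally joined by apostrophes (e.g. "Mae'r").
WORD_RE = re.compile(r"[^\W\d_]+(?:['’][^\W\d_]+)*")
# Naive sentence splitter on terminal punctuation (assumption: no abbreviations).
SENT_SPLIT_RE = re.compile(r"[.!?]+")


def basic_indices(text):
    """Compute a handful of surface complexity measures for a text."""
    sentences = [s for s in SENT_SPLIT_RE.split(text) if s.strip()]
    tokens = [t.lower() for t in WORD_RE.findall(text)]
    if not sentences or not tokens:
        raise ValueError("text contains no words")
    return {
        "n_tokens": len(tokens),
        "n_sentences": len(sentences),
        "mean_sentence_length": len(tokens) / len(sentences),
        "type_token_ratio": len(set(tokens)) / len(tokens),
        # Counted in characters; Welsh digraphs such as "ll" or "dd" are
        # single letters, so a Welsh-aware count would be lower.
        "mean_word_length": sum(len(t) for t in tokens) / len(tokens),
    }


sample = "Mae'r gath yn cysgu. Mae'r ci yn rhedeg."
for name, value in basic_indices(sample).items():
    print(name, value)
```

Measures like these could serve as input features for a CEFR-level classifier, although which features and which classifier the project uses is not stated here.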

Low-Resource Languages | Fernando Alva-Manchego


Selected Publications