<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Welsh NLP | Fernando Alva-Manchego</title><link>https://feralvam.github.io/tags/welsh-nlp/</link><atom:link href="https://feralvam.github.io/tags/welsh-nlp/index.xml" rel="self" type="application/rss+xml"/><description>Welsh NLP</description><generator>HugoBlox Kit (https://hugoblox.com)</generator><language>en-us</language><lastBuildDate>Mon, 11 May 2026 00:00:00 +0000</lastBuildDate><image><url>https://feralvam.github.io/media/icon_hu_ab250a83af8ff43c.png</url><title>Welsh NLP</title><link>https://feralvam.github.io/tags/welsh-nlp/</link></image><item><title>CEFR-Cymraeg: A Dataset and Baseline Models for Language Proficiency Assessment in Welsh</title><link>https://feralvam.github.io/publication/waqar-etal-2026-cefr-cymraeg/</link><pubDate>Mon, 11 May 2026 00:00:00 +0000</pubDate><guid>https://feralvam.github.io/publication/waqar-etal-2026-cefr-cymraeg/</guid><description/></item><item><title>Proffiliadur: Welsh Language Text Profiling Toolkit</title><link>https://feralvam.github.io/publication/gutierrezrolon-etal-2026-proffiliadur/</link><pubDate>Mon, 11 May 2026 00:00:00 +0000</pubDate><guid>https://feralvam.github.io/publication/gutierrezrolon-etal-2026-proffiliadur/</guid><description/></item><item><title>Translation is Not Enough (TINE): Plain Language Adaptation of Multilingual Science</title><link>https://feralvam.github.io/projects/tine/</link><pubDate>Fri, 01 May 2026 00:00:00 +0000</pubDate><guid>https://feralvam.github.io/projects/tine/</guid><description>&lt;p&gt;CHIST-ERA project developing multilingual NLP methods to make scientific documents genuinely accessible, combining translation, simplification, and terminology clarification across multiple languages.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Funder:&lt;/strong&gt; CHIST-ERA / UKRI&lt;br&gt;
&lt;strong&gt;Period:&lt;/strong&gt; 2026 – 2029&lt;br&gt;
&lt;strong&gt;Role:&lt;/strong&gt; Principal Investigator (Cardiff University)&lt;br&gt;
&lt;strong&gt;Partners:&lt;/strong&gt; Cardiff University, Manchester Metropolitan University (UK); Universitat Pompeu Fabra (Spain); Institute of Computer Science, Polish Academy of Sciences (Poland); University of Zurich (Switzerland)
&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;Scientific knowledge is publicly available, but is it accessible? Two barriers stand in the way. First, most research is published in English, excluding communities who speak other languages even when the research is about their own lives. Second, translation alone is not enough: even a translated text remains full of jargon and technical language that non-expert readers cannot understand.&lt;/p&gt;
&lt;p&gt;TINE addresses both barriers through a three-step pipeline applied to scientific documents in any language:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Understand&lt;/strong&gt; — extract document structure, text, headings, tables, and figures from complex PDFs&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Translate&lt;/strong&gt; — produce accurate whole-document translation preserving context and terminology&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Adapt&lt;/strong&gt; — simplify style, explain jargon, and fit the language to the reader&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The result is an accurate, accessible document in the target language that real people can read and act on.&lt;/p&gt;
&lt;h3 id="cardiffs-use-case-welsh-social-care-research"&gt;Cardiff&amp;rsquo;s use case: Welsh social care research&lt;/h3&gt;
&lt;p&gt;Welsh-speaking service users routinely receive research consent forms, information sheets, and questionnaires in English, full of technical language. They cannot meaningfully engage with research that directly concerns them, and cannot give fully informed consent. Cardiff&amp;rsquo;s work focuses on producing plain Welsh versions of these materials — not just translated, but genuinely understandable.&lt;/p&gt;
&lt;p&gt;This work is carried out in collaboration with the Centre for Social Care and Artificial Intelligence Learning. Plain Welsh research materials produced by TINE make evidence accessible to Welsh-speaking social workers and service users.&lt;/p&gt;
&lt;h3 id="what-tine-will-deliver"&gt;What TINE will deliver&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Open-source tools for document structure extraction, whole-document translation, and plain language adaptation&lt;/li&gt;
&lt;li&gt;Multilingual corpora of scientific documents and annotated plain language examples for training&lt;/li&gt;
&lt;li&gt;Language resources for Welsh, Polish, Catalan, and Chinese&lt;/li&gt;
&lt;li&gt;Open benchmarks for evaluation&lt;/li&gt;
&lt;li&gt;All outputs open access and freely reusable&lt;/li&gt;
&lt;/ul&gt;</description></item><item><title>NLP Tools for Welsh Language Assessment and Learning</title><link>https://feralvam.github.io/projects/welsh-gov-nlp/</link><pubDate>Sun, 01 Jan 2023 00:00:00 +0000</pubDate><guid>https://feralvam.github.io/projects/welsh-gov-nlp/</guid><description>&lt;p&gt;Welsh Government-funded project developing computational tools for Welsh text complexity analysis, CEFR proficiency assessment, and morphological analysis to support Welsh-language education.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Funder:&lt;/strong&gt; Welsh Government&lt;br&gt;
&lt;strong&gt;Period:&lt;/strong&gt; 2025 – 2026&lt;br&gt;
&lt;strong&gt;Role:&lt;/strong&gt; Principal Investigator
&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;Welsh is spoken by approximately 900,000 people and has unique linguistic features, including initial consonant mutation, that pose significant challenges for standard NLP pipelines. This project develops the foundational NLP infrastructure for Welsh language assessment and learning, in partnership with Welsh-language educational institutions and the Welsh Government.&lt;/p&gt;
&lt;p&gt;Key outputs include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Proffiliadur&lt;/strong&gt;: an open-source toolkit computing 141 linguistic complexity indices for Welsh texts, supporting CEFR-level classification and accessibility analysis&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;CEFR-Cymraeg&lt;/strong&gt;: the first CEFR-annotated proficiency dataset for Welsh (A1–B2), enabling automated language proficiency assessment for Welsh learners&lt;/li&gt;
&lt;/ul&gt;
</description></item></channel></rss>