Projects | Fernando Alva-Manchego

Translation is Not Enough (TINE): Plain Language Adaptation of Multilingual Science

Fri, 01 May 2026 00:00:00 +0000

CHIST-ERA project developing multilingual NLP methods to make scientific documents genuinely accessible, combining translation, simplification, and terminology clarification across multiple languages.

Funder: CHIST-ERA / UKRI
Period: 2026 – 2029
Role: Principal Investigator (Cardiff University)
Partners: Cardiff University, Manchester Metropolitan University (UK); Universitat Pompeu Fabra (Spain); Institute of Computer Science, Polish Academy of Sciences (Poland); University of Zurich (Switzerland)

Scientific knowledge is publicly available, but is it accessible? Two barriers stand in the way. First, most research is published in English, excluding communities who speak other languages even when the research is about their own lives. Second, translation alone is not enough: even a translated text remains full of jargon and technical language that non-expert readers cannot understand.

TINE addresses both barriers through a three-step pipeline applied to scientific documents in any language:

Understand — extract document structure, text, headings, tables, and figures from complex PDFs
Translate — produce accurate whole-document translation preserving context and terminology
Adapt — simplify style, explain jargon, and fit the language to the reader

The result is an accurate, accessible document in the target language that real people can read and act on.
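The three steps above can be sketched as a chain of functions. This is only an illustrative outline, not TINE's implementation: the `Document` class, the function names, the blank-line sectioning, the caller-supplied `translate_fn`, and the glossary-based adaptation are all hypothetical placeholders chosen to show how the stages hand data to one another.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Document:
    """Hypothetical container for a document moving through the pipeline."""
    language: str
    sections: List[str] = field(default_factory=list)


def understand(raw_text: str, language: str) -> Document:
    """Step 1 (placeholder): split raw text into sections on blank lines."""
    sections = [block.strip() for block in raw_text.split("\n\n") if block.strip()]
    return Document(language=language, sections=sections)


def translate(
    doc: Document,
    target_language: str,
    translate_fn: Callable[[List[str], str, str], List[str]],
) -> Document:
    """Step 2 (placeholder): hand all sections to the translator at once,
    so it can use whole-document context rather than isolated sentences."""
    translated = translate_fn(doc.sections, doc.language, target_language)
    return Document(language=target_language, sections=list(translated))


def adapt(doc: Document, glossary: Dict[str, str]) -> Document:
    """Step 3 (placeholder): add a plain-language gloss after each known term."""
    adapted = []
    for section in doc.sections:
        for term, gloss in glossary.items():
            section = section.replace(term, f"{term} ({gloss})")
        adapted.append(section)
    return Document(language=doc.language, sections=adapted)


def run_pipeline(
    raw_text: str,
    source_language: str,
    target_language: str,
    translate_fn: Callable[[List[str], str, str], List[str]],
    glossary: Dict[str, str],
) -> Document:
    """Chain the three steps: understand, translate, adapt."""
    doc = understand(raw_text, source_language)
    doc = translate(doc, target_language, translate_fn)
    return adapt(doc, glossary)
```

In practice each placeholder would be replaced by a real component (a PDF structure extractor, a document-level translation model, a simplification model); the sketch only fixes the order of the stages and the shape of the data passed between them.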

Cardiff’s use case: Welsh social care research

Welsh-speaking service users routinely receive research consent forms, information sheets, and questionnaires in English, full of technical language. As a result, they cannot meaningfully engage with research that directly concerns them, and cannot give fully informed consent. Cardiff’s work focuses on producing plain Welsh versions of these materials that are not just translated but genuinely understandable.

This work is carried out in collaboration with the Centre for Social Care and Artificial Intelligence Learning. Plain Welsh research materials produced by TINE make evidence accessible to Welsh-speaking social workers and service users.

What TINE will deliver

Open-source tools for document structure extraction, whole-document translation, and plain language adaptation
Multilingual corpora of scientific documents and annotated plain language examples for training
Language resources for Welsh, Polish, Catalan, and Chinese
Open benchmarks for evaluation
All outputs open access and freely reusable

SWEET: The Social Work Evidence Engagement Tool

Sun, 01 Feb 2026 00:00:00 +0000

NIHR-funded project developing an AI tool to help social workers find and apply relevant research evidence, legislation, and practice guidance in their day-to-day work.

Funder: National Institute for Health and Care Research (NIHR)
Period: 2026 – 2028
Role: Work Package Lead (system development, evaluation, and human-in-the-loop testing)
Partners: Cardiff University (Computer Science and Social Sciences), Neath Port Talbot Children’s Social Care

Social workers face significant practical barriers to using research evidence: time pressure, fragmented information sources, and the challenge of translating general findings into specific practice decisions. SWEET addresses this by developing a specialised AI tool for reliable retrieval and summarisation of relevant research, legislation, and practice guidance in response to social worker queries.

The project uses design ethnography to understand evidence needs in real practice settings, working closely with Neath Port Talbot Children’s Social Care. At Cardiff Computer Science, the focus is on NLP system development, evaluation, and the design of human-in-the-loop testing protocols to ensure the tool supports rather than replaces professional judgement.

NLP Tools for Welsh Language Assessment and Learning

Sun, 01 Jan 2023 00:00:00 +0000

Welsh Government-funded project developing computational tools for Welsh text complexity analysis, CEFR proficiency assessment, and morphological analysis to support Welsh-language education.

Funder: Welsh Government
Period: 2025 – 2026
Role: Principal Investigator

Welsh is spoken by approximately 900,000 people and has unique linguistic features, including initial consonant mutation, that pose significant challenges for standard NLP pipelines. This project develops the foundational NLP infrastructure for Welsh language assessment and learning, in partnership with Welsh-language educational institutions and the Welsh Government.
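To illustrate why initial consonant mutation complicates standard pipelines, consider dictionary lookup: "cath" (cat) surfaces as "gath" after soft mutation, so a naive lookup of the surface form fails. The sketch below generates candidate base forms by reversing soft mutation only. It is a simplified illustration under stated assumptions, not the project's method: the function names and the toy lexicon are hypothetical, and nasal and aspirate mutations are not handled.

```python
from typing import Dict, List, Set

# Welsh digraphs count as single letters, so "dd" must not be read as "d" + "d".
WELSH_DIGRAPHS = ("ch", "dd", "ff", "ng", "ll", "ph", "rh", "th")

# Soft mutation reversed: mutated initial letter -> possible original letters.
SOFT_MUTATION_REVERSE: Dict[str, List[str]] = {
    "b": ["p"],
    "d": ["t"],
    "g": ["c"],
    "f": ["b", "m"],
    "dd": ["d"],
    "l": ["ll"],
    "r": ["rh"],
}


def initial_letter(word: str) -> str:
    """Return the first Welsh letter of a word, treating digraphs as one letter."""
    for digraph in WELSH_DIGRAPHS:
        if word.startswith(digraph):
            return digraph
    return word[:1]


def radical_candidates(word: str) -> List[str]:
    """Return the word itself plus every form it could be a soft mutation of."""
    lower = word.lower()
    first = initial_letter(lower)
    rest = lower[len(first):]
    candidates = [lower]
    for radical in SOFT_MUTATION_REVERSE.get(first, []):
        candidates.append(radical + rest)
    # Soft mutation deletes an initial "g", so the base form may start with "g".
    candidates.append("g" + lower)
    return candidates


def lookup(word: str, lexicon: Set[str]) -> List[str]:
    """Keep only the candidates that are attested in the lexicon."""
    return [c for c in radical_candidates(word) if c in lexicon]
```

With a toy lexicon containing "cath", "pen", "gardd", and "draig", `lookup("gath", lexicon)` recovers "cath" and `lookup("ardd", lexicon)` recovers "gardd". Over-generating candidates and filtering against a lexicon is one simple design; a real analyser would also need the grammatical context that triggers each mutation.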

Key outputs include:

Proffiliadur: an open-source toolkit computing 141 linguistic complexity indices for Welsh texts, supporting CEFR-level classification and accessibility analysis
CEFR-Cymraeg: the first CEFR-annotated proficiency dataset for Welsh (A1–B2), enabling automated language proficiency assessment for Welsh learners
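As a rough illustration of what a linguistic complexity index is, the sketch below computes three generic surface measures for a text: mean sentence length, mean word length, and type-token ratio. This is not Proffiliadur's API, and these three measures are not claimed to be among its 141 indices; the function name and the simple regex-based tokenisation are assumptions made for the example.

```python
import re
from typing import Dict


def simple_complexity_indices(text: str) -> Dict[str, float]:
    """Compute three illustrative surface-level complexity measures."""
    # Naive sentence split on terminal punctuation; empty fragments are dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Tokens are runs of Unicode letters, optionally joined by an internal
    # apostrophe or hyphen (so "mae'r" stays one token, and ŵ/ŷ are letters).
    tokens = re.findall(r"[^\W\d_]+(?:['’-][^\W\d_]+)*", text.lower())
    if not sentences or not tokens:
        return {
            "mean_sentence_length": 0.0,
            "mean_word_length": 0.0,
            "type_token_ratio": 0.0,
        }
    return {
        "mean_sentence_length": len(tokens) / len(sentences),
        "mean_word_length": sum(len(t) for t in tokens) / len(tokens),
        "type_token_ratio": len(set(tokens)) / len(tokens),
    }
```

For the two-sentence text "Mae'r gath yn cysgu. Mae'r ci yn rhedeg." this yields a mean sentence length of 4.0 tokens and a type-token ratio of 0.75, since "mae'r" and "yn" each occur twice among eight tokens. Feature vectors of this kind could then feed a classifier for CEFR-level prediction.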
