ComplexityMT: Benchmarking the Interaction Between Text Complexity and Machine Translation
June 3, 2026
Joseph Marvin Imperial
Junhong Liang
Belal Shoer
Abdullah Barayan
Rodrigo Wilkens
Omar Mussa
Dawn Knight
Eugénio Ribeiro
Ekaterina Kochmar
Sowmya Vajjala
Fernando Alva-Manchego
Harish Tayyar Madabushi
Abstract
When a text is translated, does the translation retain the complexity of the original? We introduce ComplexityMT, a benchmark that uses CEFR proficiency levels to assess how text complexity interacts with machine translation across six languages: Arabic, Dutch, English, French, Hindi, and Russian. We systematically evaluate multiple translation models and find that higher source complexity increases translation difficulty and that MT systems shift target text complexity relative to source texts. Our benchmark provides a novel lens for evaluating MT quality through the dimension of text complexity, with implications for accessibility and language learning applications.
Type
Publication
arXiv preprint
