Knowledge Distillation for Quality Estimation

July 26, 2021

Amit Gajbhiye

Marina Fomicheva

Fernando Alva-Manchego

Frédéric Blain

Abiola Obamuyide

Nikolaos Aletras

Lucia Specia

Abstract

Quality Estimation (QE) is the task of automatically predicting Machine Translation quality in the absence of reference translations, making it applicable in real-time settings, such as translating online social media conversations. Recent success in QE stems from the use of multilingual pre-trained representations, where very large models lead to impressive results. However, the inference time, disk and memory requirements of such models do not allow for wide usage in the real world. Models trained on distilled pre-trained representations remain prohibitively large for many usage scenarios. We instead propose to directly transfer knowledge from a strong QE teacher model to a much smaller model with a different, shallower architecture. We show that this approach, in combination with data augmentation, leads to light-weight QE models that perform competitively with distilled pre-trained representations with 8x fewer parameters.
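The teacher-to-student transfer described in the abstract can be illustrated with a toy sketch. It assumes the common regression form of knowledge distillation, in which the student is trained to match the teacher's predicted scores on unlabeled (for example, augmented) inputs; this is not necessarily the authors' exact objective. Everything here is hypothetical: `teacher_predict` is a fixed function standing in for a large QE model, the two-dimensional feature tuples stand in for sentence-pair representations, and the linear student stands in for the smaller, shallower architecture.

```python
import random


def teacher_predict(features):
    # Hypothetical stand-in for a large QE teacher model:
    # maps a feature tuple to a scalar quality score.
    return 0.7 * features[0] - 0.3 * features[1] + 0.1


def distill(unlabeled, epochs=200, lr=0.1):
    """Fit a small linear student to the teacher's scores with MSE.

    No gold labels are used: the teacher's predictions on the
    unlabeled inputs are the only training targets.
    """
    targets = [teacher_predict(x) for x in unlabeled]
    w = [0.0, 0.0]
    b = 0.0
    n = len(unlabeled)
    history = []  # mean squared error before each update
    for _ in range(epochs):
        grad_w = [0.0, 0.0]
        grad_b = 0.0
        loss = 0.0
        for x, t in zip(unlabeled, targets):
            err = w[0] * x[0] + w[1] * x[1] + b - t
            loss += err * err
            grad_w[0] += 2 * err * x[0]
            grad_w[1] += 2 * err * x[1]
            grad_b += 2 * err
        history.append(loss / n)
        # Plain full-batch gradient descent step.
        w = [w[0] - lr * grad_w[0] / n, w[1] - lr * grad_w[1] / n]
        b -= lr * grad_b / n
    return w, b, history


# Unlabeled inputs play the role of augmented data scored by the teacher.
rng = random.Random(0)
unlabeled = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(200)]
w, b, history = distill(unlabeled)
print("student weights:", w, "bias:", b)
print("distillation loss: first", history[0], "last", history[-1])
```

In this sketch the student can represent the teacher exactly, so the loss shrinks toward zero. With a real teacher and a much smaller student, a gap would normally remain, and scoring additional unlabeled inputs with the teacher is one way to give the student more training signal.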

Type

Conference paper

Publication

Findings of ACL-IJCNLP 2021

Last updated on July 26, 2021

Authors

Fernando Alva-Manchego

Researcher in Natural Language Processing

My research interests include text simplification, readability assessment, multilingual NLP, Welsh language technology, and NLP for education and social care.
