Neural Readability Pairwise Ranking for Sentences in Italian Administrative Language

November 20, 2022

Martina Miliani

Serena Auriemma

Fernando Alva-Manchego

Alessandro Lenci



Abstract

Automatic Readability Assessment aims to assign a complexity level to a given text, which could help improve access to information in specific domains, such as the administrative one. In this paper, we investigate the behavior of a Neural Pairwise Ranking Model (NPRM) for sentence-level readability assessment of Italian administrative texts. To deal with data scarcity, we experiment with cross-lingual, cross-domain, and in-domain approaches, and test our models on Admin-It, a new parallel corpus in the Italian administrative language, containing sentences simplified using three different rewriting strategies. We show that NPRMs are effective in zero-shot scenarios (~0.78 ranking accuracy), especially on ranking pairs containing simplifications produced by rewriting the sentence as a whole, and that the best results are obtained by adding in-domain data (achieving perfect performance for such sentence pairs). Finally, we investigate where NPRMs fail, showing that the characteristics of the data used for fine-tuning, rather than its size, have a bigger effect on a model’s performance.
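To make the "ranking accuracy" figure concrete, here is a minimal sketch of how such a metric can be computed over (complex, simplified) sentence pairs. It is an illustration under stated assumptions, not the paper's model or evaluation code: `toy_complexity` is a hypothetical stand-in scorer based on token count, `pairwise_ranking_accuracy` is a name chosen here, and the example sentences are invented rather than taken from Admin-It.

```python
from typing import Callable, Sequence, Tuple

SentencePair = Tuple[str, str]  # (original sentence, simplified sentence)


def toy_complexity(sentence: str) -> float:
    """Hypothetical stand-in scorer: whitespace token count.

    A neural ranker would replace this; it is only here so the
    example runs without any model or external library.
    """
    return float(len(sentence.split()))


def pairwise_ranking_accuracy(
    pairs: Sequence[SentencePair],
    score: Callable[[str], float] = toy_complexity,
) -> float:
    """Fraction of pairs where the original is scored as more complex
    than its simplification (ties count as errors)."""
    if not pairs:
        raise ValueError("pairs must not be empty")
    correct = sum(1 for original, simplified in pairs
                  if score(original) > score(simplified))
    return correct / len(pairs)


# Invented example pairs (assumption: not drawn from Admin-It).
EXAMPLE_PAIRS = [
    ("Il richiedente deve presentare la documentazione entro il termine previsto.",
     "Presenta i documenti entro la scadenza."),
    # A lexical simplification that is longer than the original,
    # which a length-based scorer ranks the wrong way round.
    ("Si evince quanto segue.",
     "Da questo si capisce quanto segue."),
]

if __name__ == "__main__":
    print(pairwise_ranking_accuracy(EXAMPLE_PAIRS))
```

The second pair shows why a surface feature such as length is a weak proxy: a simplification that swaps a rare word for a common paraphrase can be longer than the original, so a scorer has to capture more than sentence length to rank such pairs correctly.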

Type

Conference paper

Publication

AACL-IJCNLP 2022

Last updated on November 20, 2022