Automatic Sentence Simplification with Multiple Rewriting Transformations


Sentence Simplification aims to rewrite a sentence so that it is easier to read and understand, while preserving as much of its original meaning as possible. To do so, human editors perform several text transformations, such as replacing complex terms with simpler synonyms, reordering words or phrases, removing non-essential information, and splitting long sentences. However, executing these rewriting operations automatically, while keeping sentences grammatical, preserving their main idea, and producing simpler output, is a challenging problem that is still far from solved. Since simplifications produced by humans encompass a variety of text transformations, we would expect automatic simplifications to be produced in a similar fashion. Yet current data-driven models for the task are trained on datasets that do not necessarily contain instances exhibiting this variety of operations. As a result, these models tend to copy most of the original content, making only small changes focused on lexical paraphrasing. Furthermore, it is unclear whether this implicit learning of multi-operation simplification actually yields automatic outputs with such characteristics, since current automatic evaluation resources (i.e., metrics and test sets) focus on single-operation simplifications. In this thesis, we address these limitations of Sentence Simplification research in four ways.
First, we develop novel annotation algorithms that identify the simplification operations performed by automatic models at the word, phrase, and sentence levels. We propose using these algorithms in an operation-based error analysis method that measures how correctly specific operations are executed, based on reference simplifications. This functionality is incorporated into EASSE, our new software package for the standard automatic evaluation of simplification systems.
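To make operation-based error analysis more concrete, the following Python sketch labels word-level operations by aligning source and output tokens with the standard library's `difflib.SequenceMatcher`. It then scores a system's deletions against a single reference. This is only an illustrative stand-in. It is not the annotation algorithms developed in this thesis, and it is not the EASSE API. The alignment method, the operation names, the helper functions (`word_operations`, `deleted_positions`, `deletion_scores`) and the precision/recall definition are all hypothetical. The sketch also covers only word-level deletions on whitespace-tokenised text.

```python
from __future__ import annotations

from difflib import SequenceMatcher

# Hypothetical mapping from difflib opcode tags to operation names;
# this label set is an assumption, not the one used in the thesis or EASSE.
OPERATION_NAMES = {"equal": "keep", "delete": "delete", "insert": "add", "replace": "replace"}


def word_operations(source: str, output: str) -> list[tuple[str, list[str], list[str]]]:
    """Align whitespace tokens of source and output and label each aligned span."""
    src, out = source.split(), output.split()
    matcher = SequenceMatcher(a=src, b=out, autojunk=False)
    return [
        (OPERATION_NAMES[tag], src[i1:i2], out[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
    ]


def deleted_positions(source: str, output: str) -> set[int]:
    """Indices of source tokens that the alignment marks as deleted."""
    src, out = source.split(), output.split()
    matcher = SequenceMatcher(a=src, b=out, autojunk=False)
    deleted: set[int] = set()
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "delete":
            deleted.update(range(i1, i2))
    return deleted


def deletion_scores(source: str, system: str, reference: str) -> tuple[float, float]:
    """Precision and recall of the system's deletions against one reference.

    Returns 0.0 when a score is undefined (no deletions), a simplifying choice.
    """
    sys_deleted = deleted_positions(source, system)
    ref_deleted = deleted_positions(source, reference)
    correct = len(sys_deleted & ref_deleted)
    precision = correct / len(sys_deleted) if sys_deleted else 0.0
    recall = correct / len(ref_deleted) if ref_deleted else 0.0
    return precision, recall


source = "the cat sat quietly on the mat"
system = "the cat sat"
reference = "the cat sat on the mat"
print(word_operations(source, system))
print(deletion_scores(source, system, reference))  # (0.25, 1.0)
```

In this toy example, the system correctly removes "quietly", as the reference does. However, it also drops "on the mat", which the reference keeps. As a result, deletion precision is low while recall is perfect.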
We use EASSE to benchmark several simplification systems, and show that our operation-based error analysis helps to better understand the scores computed by automatic metrics.
Second, we introduce ASSET, a new multi-reference dataset for tuning and evaluating Sentence Simplification models. The reference simplifications in ASSET were produced by human editors applying multiple rewriting transformations. We show that simplifications in ASSET offer more variability than other commonly used evaluation datasets. In addition, we perform a human evaluation study demonstrating that multi-operation simplifications are judged simpler than single-operation ones. We also motivate the need to develop new metrics suitable for assessing multi-operation simplifications. We show that simplicity judgements do not correlate strongly with commonly used multi-reference metrics when these metrics are computed using multi-operation references.
Third, we carry out the first meta-evaluation of automatic evaluation metrics in Sentence Simplification. We collect a new, more reliable dataset of human simplicity judgements for evaluating the behaviour of metrics. We use this data (and other existing datasets) to analyse how the correlation between automatic metrics and simplicity judgements varies across three dimensions: the perceived simplicity level, the system type, and the set of references used to compute the metrics. We show that these three dimensions affect the correlations and, in particular, highlight the limitations of commonly used simplification-specific metrics. Based on our findings, we formulate a set of recommendations for the automatic evaluation of multi-operation simplification, indicating which metrics to compute and how to interpret their scores.
Finally, we implement MulTSS, a multi-operation Sentence Simplification model based on a multi-task learning architecture.
We leverage training data from related text-rewriting tasks (lexical paraphrasing, extractive compression, and split-and-rephrase) to enhance the multi-operation capabilities of a standard simplification model. We show that our multi-task approach can generate better simplifications than strong single-task and pipeline baselines.
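The meta-evaluation studies how correlations between metrics and human simplicity judgements change across dimensions, one of which is the system type. The following Python sketch illustrates this kind of grouped analysis on toy, made-up scores. Pearson's r is an assumed choice of correlation coefficient. The record fields (`metric`, `human`, `system_type`), the group labels (`neural`, `rule-based`) and the helper functions are hypothetical. This is not the thesis's analysis code, and the numbers are not results from the thesis.

```python
from __future__ import annotations

from collections import defaultdict
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equally long score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


def correlation_by_group(records: list[dict], group_key: str) -> dict[str, float]:
    """Correlate metric scores with human judgements separately within each group."""
    groups: dict = defaultdict(lambda: ([], []))
    for record in records:
        metric_scores, human_scores = groups[record[group_key]]
        metric_scores.append(record["metric"])
        human_scores.append(record["human"])
    return {group: pearson(m, h) for group, (m, h) in groups.items()}


# Toy, made-up scores for illustration only; these are not data from the thesis.
records = [
    {"system_type": "neural", "metric": 1.0, "human": 2.0},
    {"system_type": "neural", "metric": 2.0, "human": 4.0},
    {"system_type": "neural", "metric": 3.0, "human": 6.0},
    {"system_type": "rule-based", "metric": 1.0, "human": 3.0},
    {"system_type": "rule-based", "metric": 2.0, "human": 2.0},
    {"system_type": "rule-based", "metric": 3.0, "human": 1.0},
]

pooled = pearson([r["metric"] for r in records], [r["human"] for r in records])
print(pooled)  # 0.25
print(correlation_by_group(records, "system_type"))  # {'neural': 1.0, 'rule-based': -1.0}
```

With these toy scores, the pooled coefficient (0.25) hides two opposite within-group trends (1.0 and -1.0). This illustrates why a single correlation computed over all outputs can be misleading.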

PhD Thesis