Text Simplification consists of rewriting sentences to make them easier to read and understand, while preserving as much of their original meaning as possible. Human editors simplify by performing several text transformations, such as replacing complex terms with simpler synonyms, reordering words or phrases, removing non-essential information, and splitting long sentences. Current models for Automatic Text Simplification are data-driven: given a large dataset of parallel original-simplified sentences, models are trained to implicitly learn how to perform a variety of editing operations that aim to make a text easier to read and understand. But how do we know whether this implicit learning of multi-operation simplifications actually produces outputs with these characteristics? And how can we verify that an automatic output is really ‘simpler’ than its original version? In this talk, I will shed some light on these questions by: (1) introducing ASSET, a new dataset for tuning and testing simplification models with multi-operation reference simplifications; and (2) presenting the first meta-evaluation of automatic metrics for Automatic Sentence Simplification focused on simplicity, which shows how much the correlation between metrics and human judgements is affected by factors such as the perceived simplicity of the outputs, the system type, and the set of references used for computation.
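To make the meta-evaluation setup concrete, a correlation study of this kind pairs each system output with an automatic metric score and a human simplicity judgement, then measures how well the two rank outputs alike. The sketch below is a minimal, hypothetical illustration using invented toy scores (not ASSET data) and Pearson's r; real meta-evaluations typically also report rank correlations and significance.

```python
# Hedged sketch of a metric meta-evaluation: correlate automatic metric
# scores with human simplicity judgements. All numbers are invented toy
# data for illustration; they are NOT drawn from ASSET or any real study.

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Toy example: one automatic metric score and one averaged human
# simplicity rating (0-100 scale, hypothetical) per system output.
metric_scores = [0.42, 0.55, 0.31, 0.78, 0.60]
human_ratings = [55, 70, 40, 90, 72]

r = pearson(metric_scores, human_ratings)
print(f"Pearson r between metric and human judgements: {r:.3f}")
```

In a full meta-evaluation, this correlation would be recomputed while varying the factors mentioned above (perceived simplicity of the outputs, system type, reference set) to see how stable the metric's agreement with humans really is.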