Text Simplification consists of rewriting sentences to make them easier to read and understand, while preserving as much of their original meaning as possible. Human editors simplify by performing several text transformations, such as replacing complex terms with simpler synonyms, reordering words or phrases, removing non-essential information, and splitting long sentences. Despite this multi-operation nature, the evaluation of automatic simplification systems relies on metrics that moderately correlate with human judgements of the simplicity achieved by specific operations (e.g., simplicity gain based on lexical replacements). In this talk, I will present the results of the first meta-evaluation of automatic metrics for Sentence Simplification, focused on simplicity judgements. I will first introduce a newly collected dataset for evaluating the correlation between metrics and human judgements. Then, I will present an analysis of how the correlation between metrics’ scores and human assessments varies across three dimensions: the perceived simplicity level, the type of system, and the set of references used for computation. The results show that all three aspects affect the correlations and, in particular, highlight the limitations of commonly used operation-specific metrics. Finally, based on these findings, I will outline a set of recommendations for the automatic evaluation of multi-operation simplification, suggesting which metrics to compute and how to interpret their scores.
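
As a rough illustration of the kind of meta-evaluation described above (not taken from the talk itself), the sketch below computes Pearson and Spearman correlations between automatic metric scores and human simplicity ratings using SciPy. The metric values and ratings are made-up placeholders, and the idea of repeating the computation per condition (simplicity level, system type, reference set) is only a schematic reading of the abstract.

```python
# Minimal sketch: correlating automatic metric scores with human simplicity
# judgements, the core operation in a metric meta-evaluation.
# All numbers below are invented for illustration only.
from scipy.stats import pearsonr, spearmanr

# Hypothetical per-output metric scores (e.g., SARI- or BLEU-style values,
# computed elsewhere against one or more references).
metric_scores = [34.2, 41.7, 28.9, 55.0, 47.3, 39.8]

# Hypothetical human simplicity ratings for the same system outputs
# (e.g., averaged 1-5 Likert judgements).
human_ratings = [2.8, 3.4, 2.1, 4.6, 3.9, 3.2]

pearson_r, pearson_p = pearsonr(metric_scores, human_ratings)
spearman_rho, spearman_p = spearmanr(metric_scores, human_ratings)

print(f"Pearson r = {pearson_r:.3f} (p = {pearson_p:.3f})")
print(f"Spearman rho = {spearman_rho:.3f} (p = {spearman_p:.3f})")

# In a full meta-evaluation, the same correlations would be recomputed
# separately per condition (perceived simplicity level, system type,
# reference set) to study how they vary across those dimensions.
```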