Most current models for Automatic Text Simplification are data-driven: given a large dataset of parallel original-simplified sentence pairs, models are trained to implicitly learn how to perform a variety of editing operations that aim to make a text easier to read and understand. However, how do we verify that an automatic output is actually ‘simpler’ than its original version? As is the case for many Natural Language Processing tasks, this should be done using both automatic and manual assessments. In this talk, I will first present the results of a meta-evaluation of automatic metrics for Automatic Sentence Simplification, and will show how much the correlation between metrics and human judgements is affected by factors such as the perceived simplicity of the outputs, the system type, and the set of references used for computation. After that, I will present some preliminary results from a study of joint Translation and Simplification, and show how difficult it can be for lay users to manually assess simplicity. I will conclude with some recommendations and ideas for future work on the evaluation of automatic simplifications.
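As a minimal sketch of the kind of meta-evaluation described above, the snippet below correlates automatic metric scores with human simplicity judgements using Pearson and Spearman coefficients; the scores, ratings, and list sizes are invented purely for illustration and do not reproduce the data or metrics from the talk.

```python
# Hypothetical sketch: correlating an automatic simplification metric with
# human simplicity judgements. All numbers below are invented for illustration.
from scipy.stats import pearsonr, spearmanr

# One entry per system output: an automatic metric score (e.g. a SARI-like
# score) and the mean human simplicity rating collected for the same output.
metric_scores = [34.2, 38.7, 29.5, 41.0, 36.1, 30.8]
human_ratings = [3.1, 3.8, 2.6, 4.2, 3.5, 2.9]

# Pearson measures linear association; Spearman measures rank agreement.
pearson_r, pearson_p = pearsonr(metric_scores, human_ratings)
spearman_r, spearman_p = spearmanr(metric_scores, human_ratings)

print(f"Pearson r = {pearson_r:.3f} (p = {pearson_p:.3f})")
print(f"Spearman rho = {spearman_r:.3f} (p = {spearman_p:.3f})")
```

In a meta-evaluation, such correlations would typically be recomputed per condition (e.g. per system type or per reference set) to see how the agreement between metrics and human judgements shifts across those factors.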