Coh-Metrix-Esp: A Complexity Analysis Tool for Documents Written in Spanish

May 1, 2016

Andre Quispesaravia

Walter Perez

Marco Sobrevilla Cabezudo

Fernando Alva-Manchego



Abstract

Text Complexity Analysis is a useful task in Education. For example, it can help teachers select appropriate texts for their students according to their educational level. This task requires the analysis of several text features that are mostly assessed manually (e.g. syntactic complexity, word variety, etc.). In this paper, we present a tool for Complexity Analysis called Coh-Metrix-Esp. It is the Spanish version of Coh-Metrix and can calculate 45 readability indices. We analyse how these indices behave in a corpus of “simple” and “complex” documents, and also use them as features in a binary complexity classifier for texts in Spanish. After experimenting with several machine learning algorithms, we obtained an F-measure of 0.9 on a corpus of tales for children and adults, and an F-measure of 0.82 on a corpus of texts written for students of Spanish as a foreign language.
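To make the idea of "readability indices as classifier features" more concrete, the sketch below computes two simple surface features of the kind the abstract mentions (sentence length and word variety, here measured as a type-token ratio) and the F-measure used to report results. This is an illustration under stated assumptions, not Coh-Metrix-Esp's code or its actual set of 45 indices: the function names `surface_features` and `f_measure` are hypothetical, tokenization is a naive regular expression, and the F-measure shown is the standard F1 (harmonic mean of precision and recall).

```python
import re


def surface_features(text: str) -> dict:
    """Two illustrative features (hypothetical; not the tool's actual indices)."""
    # Naive sentence split on terminal punctuation (assumption: no abbreviations).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Naive word tokenization that keeps accented Spanish letters and ñ.
    words = re.findall(r"[a-záéíóúüñ]+", text.lower())
    if not words or not sentences:
        return {"mean_sentence_length": 0.0, "type_token_ratio": 0.0}
    return {
        # Average number of words per sentence (a rough syntactic proxy).
        "mean_sentence_length": len(words) / len(sentences),
        # Distinct words over total words (a rough word-variety proxy).
        "type_token_ratio": len(set(words)) / len(words),
    }


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Standard F1: harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


features = surface_features("El gato come. El perro duerme.")
print(features)
print(f_measure(tp=9, fp=1, fn=1))
```

For the two-sentence example, there are 6 words across 2 sentences (mean length 3.0) and 5 distinct words (type-token ratio of about 0.83). In a real classifier, a vector of such indices per document would be passed to a learning algorithm; which algorithms and indices the paper used is not specified here.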

Type

Conference paper

Publication

LREC 2016

Last updated on October 1, 2020

Authors

Fernando Alva-Manchego

Researcher in Natural Language Processing

My research interests include text simplification, readability assessment, multilingual NLP, Welsh language technology, and NLP for education and social care.
