Semantic Role Labeling for Brazilian Portuguese: A Benchmark

Abstract

One of the main research challenges in Semantic Role Labeling (SRL) is the development of systems for languages other than English. For Brazilian Portuguese, a corpus with appropriate manually annotated data, PropBank.Br, has recently become available. SRL systems built on this corpus have already been proposed, but there is no standard way of comparing their results. We present a benchmark for comparing SRL systems for Brazilian Portuguese, based on the CoNLL Shared Tasks on SRL for English. The benchmark provides training and test data sets, evaluation metrics and a baseline system. These resources have been used to implement a supervised SRL system that outperforms the baseline by 17 points in F1 measure. Most importantly, the benchmark proved useful for evaluating the performance of SRL systems for Brazilian Portuguese.
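
As an illustration of the F1 measure reported above, the following is a minimal sketch of exact-match precision, recall and F1 over labeled arguments, in the spirit of CoNLL-style SRL evaluation. It is not the benchmark's scorer: the tuple representation of an argument (predicate index, role label, start token, end token), the function name and the example data are all assumptions made for this illustration.

```python
from typing import Iterable, Tuple

# Hypothetical representation of one labeled argument:
# (predicate index, role label, start token, end token)
Argument = Tuple[int, str, int, int]


def precision_recall_f1(
    gold: Iterable[Argument], predicted: Iterable[Argument]
) -> Tuple[float, float, float]:
    """Exact-match scores: a predicted argument counts as correct only if
    its predicate, role label and span boundaries all match a gold argument."""
    gold_set = set(gold)
    pred_set = set(predicted)
    correct = len(gold_set & pred_set)

    precision = correct / len(pred_set) if pred_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up example: two gold arguments, three predictions, one exact match.
gold_args = [(1, "A0", 0, 0), (1, "A1", 2, 4)]
pred_args = [(1, "A0", 0, 0), (1, "A1", 2, 3), (1, "AM-TMP", 5, 5)]
p, r, f1 = precision_recall_f1(gold_args, pred_args)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

In this sketch a prediction with the right role but a wrong span boundary earns no credit, which is why the second predicted argument does not count as correct.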

Publication
IBERAMIA 2012
Fernando Alva-Manchego
Lecturer

My research interests include text simplification, readability assessment, evaluation of natural language generation, and writing assistance.