Benchmarking Data-driven Automatic Text Simplification for German


Säuberli, Andreas; Ebling, Sarah; Volk, Martin (2020). Benchmarking Data-driven Automatic Text Simplification for German. In: Gala, Nuria; Wilkens, Rodrigo. Proceedings of the 1st Workshop on Tools and Resources to Empower People with REAding DIfficulties (READI). Marseille: European Language Resources Association, 41-48.

Abstract

Automatic text simplification is an active research area, and initial systems exist for English, Spanish, Portuguese, and Italian. For German, no data-driven approach exists to date, owing to a lack of training data. In this paper, we present a parallel corpus of news items in German with corresponding simplifications on two complexity levels. The simplifications have been produced according to a well-documented set of guidelines. We then report on experiments in automatically simplifying the German news items using state-of-the-art neural machine translation techniques. We demonstrate that despite our small parallel corpus, our neural models were able to learn essential features of simplified language, such as lexical substitutions, deletion of less relevant words and phrases, and sentence shortening.

Statistics

Downloads

147 downloads since deposited on 14 Jul 2020
24 downloads in the past 12 months

Additional indexing

Item Type: Book Section, original work
Communities & Collections: 06 Faculty of Arts > Institute of Computational Linguistics; 08 Research Priority Programs > Digital Society Initiative
Dewey Decimal Classification: 000 Computer science, knowledge & systems; 410 Linguistics
Language: English
Date: 2020
Deposited On: 14 Jul 2020 15:02
Last Modified: 26 Sep 2023 14:16
Publisher: European Language Resources Association
OA Status: Green
Free access at: Official URL. An embargo period may apply.
Official URL: https://www.aclweb.org/anthology/2020.readi-1.7
  • Content: Published Version
  • Licence: Creative Commons: Attribution 4.0 International (CC BY 4.0)