A New Dataset and Efficient Baselines for Document-level Text Simplification in German


Rios, Annette; Spring, Nicolas; Kew, Tannon; Kostrzewa, Marek; Säuberli, Andreas; Müller, Mathias; Ebling, Sarah (2021). A New Dataset and Efficient Baselines for Document-level Text Simplification in German. In: Third Workshop on New Frontiers in Summarization, Online and in the Dominican Republic, 10 November 2021. ACL Anthology, 152-161.

Abstract

The task of document-level text simplification is very similar to summarization, with the additional difficulty of reducing complexity.
We introduce a newly collected dataset of German texts from the Swiss news magazine 20 Minuten ('20 Minutes'), consisting of full articles paired with simplified summaries.
Furthermore, we present experiments on automatic text simplification (ATS) with the pretrained multilingual mBART and a more memory-friendly modified version of it, using both our new dataset and existing simplification corpora.
Our modifications of mBART let us train at a lower memory cost without much loss in performance; in fact, the smaller mBART even improves over the standard model in a setting with multiple simplification levels.

Statistics

Downloads

131 downloads since deposited on 11 Nov 2021
34 downloads in the last 12 months

Additional indexing

Item Type: Conference or Workshop Item (Speech), refereed, original work
Communities & Collections: 06 Faculty of Arts > Institute of Computational Linguistics
06 Faculty of Arts > Zurich Center for Linguistics
Dewey Decimal Classification: 000 Computer science, knowledge & systems
410 Linguistics
Language: English
Event End Date: 10 November 2021
Deposited On: 11 Nov 2021 09:01
Last Modified: 17 Feb 2022 06:38
Publisher: ACL Anthology
OA Status: Green
Free access at: Official URL. An embargo period may apply.
Official URL: https://aclanthology.org/2021.newsum-1.16
Project Information:
  • Content: Published Version
  • Language: English
  • Licence: Creative Commons: Attribution 4.0 International (CC BY 4.0)