ZORA (Zurich Open Repository and Archive)

20 Minuten: A Multi-task News Summarisation Dataset for German

Kew, Tannon; Kostrzewa, Marek; Ebling, Sarah (2023). 20 Minuten: A Multi-task News Summarisation Dataset for German. In: SwissText 2023: 8th Swiss Text Analytics Conference, Neuchâtel, 12 June 2023 - 14 June 2023. Association for Computational Linguistics, 1-13.

Abstract

Automatic text summarisation (ATS) is a central task in natural language processing that aims to condense a long document into a shorter, concise summary that conveys its key points. Extractive approaches to ATS, which identify and copy the most important sentences or phrases from the original text, have long been a popular choice, but the resulting summaries tend to be incohesive and disjointed. More recently, abstractive approaches to ATS have gained popularity thanks to advancements in neural text generation. Yet much of the research on ATS has been limited to English, owing to its dominance as a high-resource language.
This work introduces a new dataset for German-language news summarisation. Aside from summarisation, the dataset also supports additional NLP tasks such as image caption generation and reading time prediction. Furthermore, it is multi-purpose, since article summaries cover a range of styles, including headlines, lead paragraphs and bullet-point summaries. To showcase the versatility of our dataset for different NLP tasks, we conduct experiments using mT5 [2] and compare performance on six different tasks under single- and multi-task fine-tuning conditions, providing baselines for future work. Our findings show that dedicated (single-task) models consistently perform better according to automatic metrics.

Additional indexing

Item Type: Conference or Workshop Item (Paper), not_refereed, original work
Communities & Collections: 06 Faculty of Arts > Institute of Computational Linguistics
06 Faculty of Arts > Zurich Center for Linguistics
Dewey Decimal Classification: 000 Computer science, knowledge & systems
410 Linguistics
Language: English
Event End Date: 14 June 2023
Deposited On: 29 Jun 2023 10:15
Last Modified: 07 Mar 2024 14:11
Publisher: Association for Computational Linguistics
Series Name: Proceedings of the Swiss Text Analytics Conference
Number: 8th edit.
OA Status: Green
Free access at: Official URL. An embargo period may apply.
Official URL: https://aclanthology.org/2023.swisstext-1.1
PDF: '20 Minuten: A Multi-task News Summarisation Dataset for German'
  • Content: Published Version
  • Language: English
  • Licence: Creative Commons: Attribution 4.0 International (CC BY 4.0)

Statistics

Downloads

221 downloads since deposited on 29 Jun 2023
142 downloads in the past 12 months