Data-driven Summarization of Scientific Articles


Nikolov, Nikola I; Pfeiffer, Michael; Hahnloser, Richard H R (2018). Data-driven Summarization of Scientific Articles. In: 7th International Workshop on Mining Scientific Publications, LREC 2018, Miyazaki, 7 May 2018.

Abstract

Data-driven approaches to sequence-to-sequence modelling have been successfully applied to short text summarization of news articles. Such models are typically trained on input-summary pairs consisting of only a single or a few sentences, partially due to limited availability of multi-sentence training data. Here, we propose to use scientific articles as a new milestone for text summarization: large-scale training data come almost for free with two types of high-quality summaries at different levels - the title and the abstract. We generate two novel multi-sentence summarization datasets from scientific articles and test the suitability of a wide range of existing extractive and abstractive neural network-based summarization approaches. Our analysis demonstrates that scientific papers are suitable for data-driven text summarization. Our results could serve as valuable benchmarks for scaling sequence-to-sequence models to very long sequences.

Additional indexing

Item Type: Conference or Workshop Item (Speech), not refereed, original work
Communities & Collections: 07 Faculty of Science > Institute of Neuroinformatics
Dewey Decimal Classification: 570 Life sciences; biology
Language: English
Event End Date: 7 May 2018
Deposited On: 08 Mar 2019 14:41
Last Modified: 25 Sep 2019 00:27
Publisher: arXiv
OA Status: Green
Related URLs: https://arxiv.org/abs/1804.08875

Download

Content: Published Version
Filetype: PDF
Size: 459kB