ZORA (Zurich Open Repository and Archive)

Introducing the CLEF 2020 HIPE Shared Task: Named Entity Recognition and Linking on Historical Newspapers

Ehrmann, Maud; Romanello, Matteo; Bircher, Stefan; Clematide, Simon (2020). Introducing the CLEF 2020 HIPE Shared Task: Named Entity Recognition and Linking on Historical Newspapers. In: Jose, Joemon M; Yilmaz, Emine; Magalhães, João; Castells, Pablo; Ferro, Nicola; Silva, Mário J; Martins, Flávio. Advances in Information Retrieval: 42nd European Conference on IR Research, ECIR 2020, Lisbon, Portugal, April 14–17, 2020, Proceedings, Part II. Cham: Springer, 524–532.

Abstract

Since its introduction some twenty years ago, named entity (NE) processing has become an essential component of virtually any text mining application and has undergone major changes. Recently, two main trends have characterised its development: the adoption of deep learning architectures and the consideration of textual material originating from historical and cultural heritage collections. While the former opens up new opportunities, the latter introduces new challenges with heterogeneous, historical and noisy inputs. Although NE processing tools are increasingly being used in the context of historical documents, their performance is below that achieved on contemporary data and is hardly comparable. In this context, this paper introduces the CLEF 2020 Evaluation Lab HIPE (Identifying Historical People, Places and other Entities) on named entity recognition and linking on diachronic historical newspaper material in French, German and English. Our objective is threefold: strengthening the robustness of existing approaches on non-standard inputs, enabling performance comparison of NE processing on historical texts, and, in the long run, fostering efficient semantic indexing of historical documents in order to support scholarship on digital cultural heritage collections.

Additional indexing

Item Type: Book Section, refereed, original work
Communities & Collections: 06 Faculty of Arts > Institute of Computational Linguistics
Dewey Decimal Classification: 000 Computer science, knowledge & systems; 410 Linguistics
Scopus Subject Areas: Physical Sciences > Theoretical Computer Science; Physical Sciences > General Computer Science
Uncontrolled Keywords: Named entity processing, Text understanding, Information extraction, Historical newspapers, Digital Humanities
Language: English
Date: 2020
Deposited On: 15 Feb 2021 06:46
Last Modified: 12 Dec 2024 04:35
Publisher: Springer
Series Name: Lecture Notes in Computer Science
Number: 12036
ISSN: 0302-9743
ISBN: 978-3-030-45441-8
OA Status: Closed
Publisher DOI: https://doi.org/10.1007/978-3-030-45442-5_68
Project Information:
  • Funder: SNSF
  • Grant ID: CRSII5_173719
  • Project Title: Media Monitoring of the Past
