
Entity recognition in parallel multi-lingual biomedical corpora: The CLEF-ER laboratory overview


Rebholz-Schuhmann, Dietrich; Clematide, Simon; Rinaldi, Fabio; Kafkas, Senay; van Mulligen, Erik M; Bui, Chinh; Hellrich, Johannes; Lewin, Ian; Milward, David; Poprat, Michael; Jimeno-Yepes, Antonio; Hahn, Udo; Kors, Jan (2013). Entity recognition in parallel multi-lingual biomedical corpora: The CLEF-ER laboratory overview. In: Forner, Pamela; Mueller, Henning; Rosso, Paolo; Paredes, Roberto. Information Access Evaluation. Multilinguality, Multimodality, and Visualization. Valencia: Springer, 353-367.

Abstract

The identification and normalisation of biomedical entities from the scientific literature has a long tradition, and a number of challenges have contributed to the development of reliable solutions. Increasingly, patient records are processed to align their content with other biomedical data resources, but this approach requires analysing documents in different languages across Europe [1,2].
The CLEF-ER challenge has been organized by the Mantra project partners to improve entity recognition (ER) in multilingual documents. Several corpora in different languages, i.e. Medline titles, EMEA documents and patent claims, have been prepared to enable ER in parallel documents. The participants were asked to annotate entity mentions with concept unique identifiers (CUIs) in the documents of their preferred non-English language.
The evaluation determines the number of correctly identified entity mentions against a silver standard (Task A) and the performance measures for the identification of CUIs in the non-English corpora (Task B). The participants could make use of the prepared terminological resources for entity normalisation and of the English silver standard corpora (SSCs) as input for concept candidates in the non-English documents.
The participants used different approaches, including translation techniques and word or phrase alignments, in addition to lexical lookup and other text-mining techniques. The performance for Tasks A and B was lower on the patent corpus than on the Medline titles and the EMEA documents. In the patent documents, chemical entities were identified with higher performance, whereas the other two document types cover a higher portion of medical terms. The number of novel terms provided by all corpora is currently under investigation.
Altogether, the CLEF-ER challenge demonstrates the performance of annotation solutions in different languages against an SSC.


Statistics

Citations

5 citations in Web of Science®
10 citations in Scopus®

Additional indexing

Item Type: Book Section, refereed, original work
Communities & Collections: 03 Faculty of Economics > Department of Informatics
06 Faculty of Arts > Institute of Computational Linguistics
Dewey Decimal Classification: 000 Computer science, knowledge & systems
Language: English
Date: 2013
Deposited On: 23 Oct 2013 08:34
Last Modified: 07 Dec 2017 23:05
Publisher: Springer
Series Name: Lecture Notes in Computer Science
ISSN: 0302-9743
ISBN: 978-3-642-40802-1
Publisher DOI: https://doi.org/10.1007/978-3-642-40802-1_32
Related URLs: http://www.springer.com/computer/ai/book/978-3-642-40801-4

Download

Full text not available from this repository.