OGER++: hybrid multi-type entity recognition


Furrer, Lenz; Jancso, Anna; Colic, Nicola; Rinaldi, Fabio (2019). OGER++: hybrid multi-type entity recognition. Journal of Cheminformatics, 11(1):7.

Abstract

Background: We present a text-mining tool for recognizing biomedical entities in scientific literature. OGER++ is a hybrid system for named entity recognition and concept recognition (linking), which combines a dictionary-based annotator with a corpus-based disambiguation component. The annotator uses an efficient look-up strategy combined with a normalization method for matching spelling variants. The disambiguation classifier is implemented as a feed-forward neural network which acts as a postfilter to the previous step.
Results: We evaluated the system in terms of processing speed and annotation quality. In the speed benchmarks, the OGER++ web service processes 9.7 abstracts or 0.9 full-text documents per second. On the CRAFT corpus, we achieved 71.4% and 56.7% F1 for named entity recognition and concept recognition, respectively.
Conclusions: Combining knowledge-based and data-driven components allows creating a system with competitive performance in biomedical text mining.
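
The dictionary look-up with normalization described in the Background can be illustrated with a small sketch. It is not OGER++'s actual implementation: the tokenization rule, the function names (`tokenize`, `build_index`, `annotate`), the greedy longest-match strategy, and the toy terminology with its placeholder identifiers are all assumptions made for illustration. The neural postfilter mentioned in the abstract is left out.

```python
import re

# Assumed normalization: split into runs of letters or digits and lowercase them,
# so that variants such as "IL-2", "IL2" and "il 2" produce the same token sequence.
TOKEN = re.compile(r"[A-Za-z]+|[0-9]+")


def tokenize(text):
    """Return (normalized_token, start, end) triples for the given text."""
    return [(m.group().lower(), m.start(), m.end()) for m in TOKEN.finditer(text)]


def build_index(terminology):
    """Map normalized token tuples to concept identifiers."""
    index = {}
    for term, concept_id in terminology.items():
        key = tuple(tok for tok, _, _ in tokenize(term))
        index[key] = concept_id
    return index


def annotate(text, index):
    """Greedy longest-match look-up; returns (surface, start, end, concept_id)."""
    tokens = tokenize(text)
    max_len = max((len(key) for key in index), default=0)
    annotations = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tok for tok, _, _ in tokens[i:i + n])
            if key in index:
                start, end = tokens[i][1], tokens[i + n - 1][2]
                annotations.append((text[start:end], start, end, index[key]))
                i += n
                break
        else:
            i += 1  # no dictionary entry starts at this token
    return annotations


# Hypothetical toy terminology; the identifiers are placeholders, not real ontology IDs.
TERMINOLOGY = {"IL-2": "CONCEPT:0001", "T cell": "CONCEPT:0002"}
INDEX = build_index(TERMINOLOGY)
```

Calling `annotate("IL2 and T-cell", INDEX)` finds both entries even though neither surface form is spelled as in the dictionary. In a hybrid set-up like the one the abstract outlines, a classifier would then decide which of these candidate annotations to keep.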

Statistics

Citations

2 citations in Web of Science®
4 citations in Scopus®

Downloads

25 downloads since deposited on 25 Jan 2019
13 downloads in the past 12 months

Additional indexing

Item Type: Journal Article, refereed, original work
Communities & Collections: 06 Faculty of Arts > Institute of Computational Linguistics; 08 Research Priority Programs > Digital Society Initiative
Dewey Decimal Classification: 000 Computer science, knowledge & systems; 410 Linguistics
Scopus Subject Areas: Physical Sciences > Computer Science Applications; Physical Sciences > Physical and Theoretical Chemistry; Physical Sciences > Computer Graphics and Computer-Aided Design; Social Sciences & Humanities > Library and Information Sciences
Uncontrolled Keywords: Named entity recognition, Concept recognition, Natural language processing, Machine learning
Language: English
Date: 21 January 2019
Deposited On: 25 Jan 2019 10:52
Last Modified: 03 Sep 2020 09:32
Publisher: BioMed Central
ISSN: 1758-2946
OA Status: Gold
Free access at: Publisher DOI. An embargo period may apply.
Publisher DOI: https://doi.org/10.1186/s13321-018-0326-3
Project Information:
  • Funder: SNSF
  • Grant ID: CR30I1_162758
  • Project Title: MelanoBase

Download

Gold Open Access

Content: Published Version
Language: English
Filetype: PDF
Size: 1MB
Licence: Creative Commons: Attribution 4.0 International (CC BY 4.0)