OGER++: hybrid multi-type entity recognition


Furrer, Lenz; Jancso, Anna; Colic, Nicola; Rinaldi, Fabio (2019). OGER++: hybrid multi-type entity recognition. Journal of Cheminformatics, 11(1):7.

Abstract

Background: We present a text-mining tool for recognizing biomedical entities in scientific literature. OGER++ is a hybrid system for named entity recognition and concept recognition (linking), which combines a dictionary-based annotator with a corpus-based disambiguation component. The annotator uses an efficient look-up strategy combined with a normalization method for matching spelling variants. The disambiguation classifier is implemented as a feed-forward neural network which acts as a postfilter to the previous step.
Results: We evaluated the system in terms of processing speed and annotation quality. In the speed benchmarks, the OGER++ web service processes 9.7 abstracts or 0.9 full-text documents per second. On the CRAFT corpus, we achieved 71.4% and 56.7% F1 for named entity recognition and concept recognition, respectively.
Conclusions: Combining knowledge-based and data-driven components makes it possible to build a system with competitive performance in biomedical text mining.
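
The Background paragraph describes a dictionary-based annotator that pairs an efficient look-up strategy with a normalization method for spelling variants. The sketch below illustrates that general idea only. It assumes a simple normalization scheme (Unicode NFKC, case-folding, splitting at letter/digit boundaries, spelling out a few Greek letters) and greedy longest-match look-up over token sequences. It is not OGER++'s actual implementation. The names `normalize`, `TermIndex` and the `CONCEPT:*` identifiers are hypothetical, and the feed-forward disambiguation postfilter is omitted.

```python
import re
import unicodedata
from collections import defaultdict

# Letter runs and digit runs become separate tokens, so "IL-2",
# "IL 2" and "IL2" all tokenize to ["il", "2"] after normalization.
TOKEN_RE = re.compile(r"[^\W\d_]+|\d+")

# Assumed variant table: a tiny illustrative subset of Greek letters.
GREEK = {"α": "alpha", "β": "beta", "γ": "gamma", "κ": "kappa"}


def normalize(token):
    """Map a surface token to a canonical key (hypothetical scheme)."""
    token = unicodedata.normalize("NFKC", token).casefold()
    return GREEK.get(token, token)


def tokenize(text):
    """Yield (start, end, normalized_token) triples with character offsets."""
    for m in TOKEN_RE.finditer(text):
        yield m.start(), m.end(), normalize(m.group())


class TermIndex:
    """Dictionary keyed by normalized token sequences, mapping to concept IDs."""

    def __init__(self):
        self.index = defaultdict(set)
        self.max_len = 0  # longest term, in tokens

    def add(self, term, concept_id):
        key = tuple(tok for _, _, tok in tokenize(term))
        self.index[key].add(concept_id)
        self.max_len = max(self.max_len, len(key))

    def annotate(self, text):
        """Greedy longest-match look-up; returns (start, end, surface, ids)."""
        toks = list(tokenize(text))
        results = []
        i = 0
        while i < len(toks):
            # Try the longest candidate span first, then shorter ones.
            for n in range(min(self.max_len, len(toks) - i), 0, -1):
                key = tuple(tok for _, _, tok in toks[i:i + n])
                if key in self.index:
                    start, end = toks[i][0], toks[i + n - 1][1]
                    results.append(
                        (start, end, text[start:end], sorted(self.index[key]))
                    )
                    i += n
                    break
            else:
                i += 1  # no entry starts at this token
        return results


if __name__ == "__main__":
    idx = TermIndex()
    idx.add("interleukin 2", "CONCEPT:IL2")
    idx.add("IL-2", "CONCEPT:IL2")
    idx.add("TNF-alpha", "CONCEPT:TNF")
    print(idx.annotate("Levels of IL2 and TNF-α rose."))
```

In this example, "IL2" matches the dictionary entry "IL-2" and "TNF-α" matches "TNF-alpha", because both sides reduce to the same normalized token sequence. Normalizing dictionary entries and text with the same function keeps the look-up a single hash probe per candidate span.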

Statistics

Downloads

30 downloads since deposit on 25 Jan 2019
30 downloads in the past 12 months

Additional indexing

Item Type: Journal Article, refereed, original work
Communities & Collections: 06 Faculty of Arts > Institute of Computational Linguistics
Dewey Decimal Classification: 000 Computer science, knowledge & systems; 410 Linguistics
Uncontrolled Keywords: Named entity recognition, Concept recognition, Natural language processing, Machine learning
Language: English
Date: 21 January 2019
Deposited On: 25 Jan 2019 10:52
Last Modified: 17 Sep 2019 19:58
Publisher: BioMed Central
ISSN: 1758-2946
OA Status: Gold
Free access at: Publisher DOI. An embargo period may apply.
Publisher DOI: https://doi.org/10.1186/s13321-018-0326-3
Project Information:
  • Funder: SNSF
  • Grant ID: CR30I1_162758
  • Project Title: MelanoBase

Download

PDF: 'OGER++: hybrid multi-type entity recognition'
Content: Published Version
Language: English
Filetype: PDF
Size: 1 MB
Licence: Creative Commons: Attribution 4.0 International (CC BY 4.0)