UZH-Logo

Maintenance Infos

An environment for relation mining over richly annotated corpora: the case of GENIA


Rinaldi, F; Schneider, G; Kaljurand, K; Hess, M; Romacker, M (2006). An environment for relation mining over richly annotated corpora: the case of GENIA. BMC Bioinformatics, 7(Sup. 3):S3.

Abstract

BACKGROUND: The biomedical domain is witnessing a rapid growth of the amount of published scientific results, which makes it increasingly difficult to filter the core information. There is a real need for support tools that 'digest' the published results and extract the most important information. RESULTS: We describe and evaluate an environment supporting the extraction of domain-specific relations, such as protein-protein interactions, from a richly-annotated corpus. We use full, deep-linguistic parsing and manually created, versatile patterns, expressing a large set of syntactic alternations, plus semantic ontology information. CONCLUSION: The experiments show that our approach described is capable of delivering high-precision results, while maintaining sufficient levels of recall. The high level of abstraction of the rules used by the system, which are considerably more powerful and versatile than finite-state approaches, allows speedy interactive development and validation.

BACKGROUND: The biomedical domain is witnessing a rapid growth of the amount of published scientific results, which makes it increasingly difficult to filter the core information. There is a real need for support tools that 'digest' the published results and extract the most important information. RESULTS: We describe and evaluate an environment supporting the extraction of domain-specific relations, such as protein-protein interactions, from a richly-annotated corpus. We use full, deep-linguistic parsing and manually created, versatile patterns, expressing a large set of syntactic alternations, plus semantic ontology information. CONCLUSION: The experiments show that our approach described is capable of delivering high-precision results, while maintaining sufficient levels of recall. The high level of abstraction of the rules used by the system, which are considerably more powerful and versatile than finite-state approaches, allows speedy interactive development and validation.

Citations

Altmetrics

Downloads

66 downloads since deposited on 11 Feb 2008
6 downloads since 12 months
Detailed statistics

Additional indexing

Item Type:Journal Article, refereed, original work
Communities & Collections:06 Faculty of Arts > Institute of Computational Linguistics
Dewey Decimal Classification:000 Computer science, knowledge & systems
410 Linguistics
Language:English
Date:November 2006
Deposited On:11 Feb 2008 12:11
Last Modified:05 Apr 2016 12:12
Publisher:BioMed Central
ISSN:1471-2105
Publisher DOI:10.1186/1471-2105-7-S3-S3
Related URLs:http://www.biomedcentral.com/1471-2105/7/S3/S3
PubMed ID:17134476
Permanent URL: http://doi.org/10.5167/uzh-17

Download

[img]
Preview
Content: Published Version
Filetype: PDF
Size: 715kB
View at publisher

TrendTerms

TrendTerms displays relevant terms of the abstract of this publication and related documents on a map. The terms and their relations were extracted from ZORA using word statistics. Their timelines are taken from ZORA as well. The bubble size of a term is proportional to the number of documents where the term occurs. Red, orange, yellow and green colors are used for terms that occur in the current document; red indicates high interlinkedness of a term with other terms, orange, yellow and green decreasing interlinkedness. Blue is used for terms that have a relation with the terms in this document, but occur in other documents.
You can navigate and zoom the map. Mouse-hovering a term displays its timeline, clicking it yields the associated documents.

Author Collaborations