Header

UZH-Logo

Maintenance Infos

A hybrid approach to finding phenotype candidates in genetic texts - Zurich Open Repository and Archive


Collier, Nigel; Tran, Mai-Vu; Le, Hoang-Quynh; Oellrich, Anika; Kawazoe, Ai; Hall-May, Martin; Rebholz-Schuhmann, Dietrich (2012). A hybrid approach to finding phenotype candidates in genetic texts. In: 24th International Conference on Computational Linguistics (COLING 2012), Mumbai, India, 10 December 2012 - 14 December 2012.

Abstract

Named entity recognition (NER) has been extensively studied for the names of genes and gene products but there are few proposed solutions for phenotypes. Phenotype terms are expected to play a key role in inferring gene function in complex heritable diseases but are intrinsically difficult to analyse due to their complex semantics and scale. In contrast to previous approaches we evaluate state-of-the-art techniques involving the fusion of machine learning on a rich feature set with evidence from extant domain knowledge-sources. The techniques are validated on two gold standard collections including a novel annotated collection of 112 abstracts derived from a systematic search of the Online Mendelian Inheritance of Man database for auto-immune diseases. Encouragingly the hybrid model outperforms a HMM, a CRF and a pure knowledge-based method to achieve an F1 of 77.07. Disagreement analysis points to further improvements on this emerging NE task. The annotated corpus and guidelines are available on request.

Abstract

Named entity recognition (NER) has been extensively studied for the names of genes and gene products but there are few proposed solutions for phenotypes. Phenotype terms are expected to play a key role in inferring gene function in complex heritable diseases but are intrinsically difficult to analyse due to their complex semantics and scale. In contrast to previous approaches we evaluate state-of-the-art techniques involving the fusion of machine learning on a rich feature set with evidence from extant domain knowledge-sources. The techniques are validated on two gold standard collections including a novel annotated collection of 112 abstracts derived from a systematic search of the Online Mendelian Inheritance of Man database for auto-immune diseases. Encouragingly the hybrid model outperforms a HMM, a CRF and a pure knowledge-based method to achieve an F1 of 77.07. Disagreement analysis points to further improvements on this emerging NE task. The annotated corpus and guidelines are available on request.

Citations

Downloads

307 downloads since deposited on 11 Jan 2013
58 downloads since 12 months
Detailed statistics

Additional indexing

Item Type:Conference or Workshop Item (Paper), refereed, original work
Communities & Collections:03 Faculty of Economics > Department of Informatics
Dewey Decimal Classification:000 Computer science, knowledge & systems
Uncontrolled Keywords:conditional random fields, biomedicine, machine learning, genetic disorders, text mining
Language:English
Event End Date:14 December 2012
Deposited On:11 Jan 2013 08:28
Last Modified:06 Aug 2017 03:12
Related URLs:http://www.coling2012-iitb.org/

Download

Preview Icon on Download
Preview
Filetype: PDF
Size: 231kB

TrendTerms

TrendTerms displays relevant terms of the abstract of this publication and related documents on a map. The terms and their relations were extracted from ZORA using word statistics. Their timelines are taken from ZORA as well. The bubble size of a term is proportional to the number of documents where the term occurs. Red, orange, yellow and green colors are used for terms that occur in the current document; red indicates high interlinkedness of a term with other terms, orange, yellow and green decreasing interlinkedness. Blue is used for terms that have a relation with the terms in this document, but occur in other documents.
You can navigate and zoom the map. Mouse-hovering a term displays its timeline, clicking it yields the associated documents.

Author Collaborations