Using syntax features and document discourse for relation extraction on PharmGKB and CTD


Schneider, Gerold; Clematide, Simon; Grigonyte, Gintare; Rinaldi, Fabio (2012). Using syntax features and document discourse for relation extraction on PharmGKB and CTD. In: SMBM 2012, Zurich, Switzerland, 3 September 2012 - 4 September 2012, 52-57.

Abstract

We present an approach to extracting relations between pharmacogenomics entities such as drugs, genes and diseases that is based on syntax and on discourse. Discourse in particular has not been widely studied as a means of improving text mining. We learn syntactic features semi-automatically from lean document-level annotation. We show how a simple Maximum Entropy based machine learning approach helps to estimate the relevance of candidate relations using dependency-based features found in the syntactic path connecting the involved entities. Maximum Entropy based relevance estimation of candidate pairs, conditioned on syntactic features, improves relation ranking by a relative 68% as measured by AUCiP/R and by 60% as measured by TAP-k (k=10). We also show that automatically recognizing document-level discourse characteristics to expand and filter acronyms improves term recognition and interaction detection by a relative 12%, measured by AUCiP/R and by TAP-k (k=10). Our pilot study uses PharmGKB and CTD as resources.
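To make the relevance-estimation idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a binary Maximum Entropy model (equivalent to logistic regression) trained by plain gradient ascent. The feature strings, entity names (`drugA`, `geneB`, and so on), training pairs, and hyperparameters are all hypothetical placeholders chosen for illustration; the paper's actual feature set and training data are not reproduced here.

```python
import math
from collections import defaultdict


def relevance(weights, feats):
    """Return P(relevant | features) under a binary MaxEnt model."""
    z = sum(weights.get(f, 0.0) for f in feats)
    return 1.0 / (1.0 + math.exp(-z))


def train_maxent(examples, epochs=200, lr=0.5, l2=0.01):
    """Fit feature weights by batch gradient ascent on the log-likelihood.

    examples: list of (feature_list, label), label 1 = relevant, 0 = not.
    epochs, lr and l2 are illustrative values, not tuned settings.
    """
    weights = defaultdict(float)
    for _ in range(epochs):
        grad = defaultdict(float)
        for feats, label in examples:
            p = relevance(weights, feats)
            for f in feats:
                # Gradient of the log-likelihood for a binary indicator feature.
                grad[f] += label - p
        for f, g in grad.items():
            # Averaged gradient step with a small L2 penalty.
            weights[f] += lr * (g / len(examples) - l2 * weights[f])
    return weights


# Hypothetical toy data: each candidate pair is described by features read
# off the dependency path between the two entity mentions.
TRAIN = [
    (["path:nsubj>inhibit<dobj", "head:inhibit"], 1),
    (["path:nsubj>metabolize<dobj", "head:metabolize"], 1),
    (["path:conj", "head:and"], 0),
    (["path:nmod>study<nmod", "head:study"], 0),
]

weights = train_maxent(TRAIN)

# Hypothetical candidate pairs found in a new document.
candidates = {
    ("drugA", "geneB"): ["path:nsubj>metabolize<dobj", "head:metabolize"],
    ("drugC", "drugD"): ["path:conj", "head:and"],
}

# Rank candidates by estimated relevance, highest first.
ranked = sorted(
    candidates,
    key=lambda pair: relevance(weights, candidates[pair]),
    reverse=True,
)

for pair in ranked:
    print(pair, round(relevance(weights, candidates[pair]), 3))
```

The resulting scores are used only to order candidate pairs, which matches the ranking-based evaluation mentioned in the abstract. A real system would draw its features from a dependency parser and its labels from document-level annotation.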



Additional indexing

Item Type: Conference or Workshop Item (Paper), refereed, original work
Communities & Collections: 06 Faculty of Arts > Institute of Computational Linguistics
Dewey Decimal Classification: 000 Computer science, knowledge & systems; 410 Linguistics
Event End Date: 4 September 2012
Deposited On: 13 Mar 2013 08:12
Last Modified: 07 Dec 2017 20:30
Related URLs: http://www.smbm.eu/

Download

Content: Published Version
Filetype: PDF
Size: 309kB