Using Large Biomedical Databases as Gold Annotations for Automatic Relation Extraction


Ellendorff, Tilia; Rinaldi, Fabio; Clematide, Simon (2014). Using Large Biomedical Databases as Gold Annotations for Automatic Relation Extraction. In: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), Reykjavik, Iceland, 2014, 3736-3741.

Abstract

We show how to use large biomedical databases to obtain a gold standard for training a machine learning system over a corpus of biomedical text. As an example, we use the Comparative Toxicogenomics Database (CTD) and describe, by means of a short case study, how the obtained data can be applied. We explain how we exploit the structure of the database to compile training material and a test set. Using a Naive Bayes document classification approach based on words, stem bigrams and MeSH descriptors, we achieve a macro-average F-score of 61% on a subset of 8 action terms. This outperforms a baseline system based on a lookup of stemmed keywords by more than 20%. Furthermore, we present directions of future work, taking the described system as a vantage point. Future work will aim towards a weakly supervised system capable of discovering complete biomedical interactions and events.
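As a reading aid for the evaluation measure named in the abstract, the following is a minimal sketch of a macro-averaged F-score: F1 is computed per label and the per-label values are averaged without weighting. It is not the authors' evaluation code. The function name `macro_f_score`, the multi-label set representation, and the two example labels are assumptions made purely for illustration.

```python
def macro_f_score(gold, predicted, labels):
    """Macro-averaged F1 over `labels`.

    `gold` and `predicted` are parallel lists with one set of labels per
    document (an assumed representation; the paper's data format may differ).
    """
    per_label_f1 = []
    for label in labels:
        pairs = list(zip(gold, predicted))
        tp = sum(1 for g, p in pairs if label in g and label in p)
        fp = sum(1 for g, p in pairs if label not in g and label in p)
        fn = sum(1 for g, p in pairs if label in g and label not in p)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        per_label_f1.append(f1)
    # Unweighted mean: every label counts equally, however rare it is.
    return sum(per_label_f1) / len(per_label_f1)


# Hypothetical toy data: four documents, two illustrative action terms.
labels = ["binding", "expression"]
gold = [{"binding"}, {"expression"}, {"binding", "expression"}, {"expression"}]
predicted = [{"binding"}, {"binding"}, {"expression"}, {"expression"}]
score = macro_f_score(gold, predicted, labels)
print(round(score, 2))
```

In the toy data, "binding" scores an F1 of 0.5 and "expression" scores 0.8, so the macro average is 0.65. Because each label contributes equally, a rare action term that is classified poorly lowers the score as much as a frequent one would.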

Additional indexing

Item Type: Conference or Workshop Item (Paper), not_refereed, original work
Communities & Collections: 06 Faculty of Arts > Institute of Computational Linguistics
Dewey Decimal Classification: 000 Computer science, knowledge & systems; 410 Linguistics
Scopus Subject Areas: Social Sciences & Humanities > Linguistics and Language; Social Sciences & Humanities > Library and Information Sciences; Social Sciences & Humanities > Education; Social Sciences & Humanities > Language and Linguistics
Language: English
Event End Date: 2014
Deposited On: 16 Jan 2015 08:13
Last Modified: 30 Jul 2020 16:14
Publisher: European Language Resources Association (ELRA)
OA Status: Green
Free access at: Official URL. An embargo period may apply.
Official URL: http://www.lrec-conf.org/proceedings/lrec2014/pdf/1156_Paper.pdf
Related URLs: http://www.lrec-conf.org/proceedings/lrec2014/index.html

Download

Content: Published Version
Language: English
Filetype: PDF
Size: 302kB