Zurich Open Repository and Archive

Permanent URL to this publication: http://dx.doi.org/10.5167/uzh-24588

Schneider, Gerold; Kaljurand, Kaarel; Rinaldi, Fabio (2009). Detecting protein-protein interactions in biomedical texts using a parser and linguistic resources. In: Gelbukh, Alexander. Computational Linguistics and Intelligent Text Processing. Berlin: Springer, 406-417.



We describe the task of automatically detecting interactions between proteins in biomedical literature. We use a syntactic parser, a corpus annotated for proteins, and manual decisions as training material.
After automatically parsing the GENIA corpus, which is manually annotated for proteins, we extract all syntactic paths between proteins. These paths are then manually classified as meaningful or irrelevant: meaningful paths express an interaction between the syntactically connected proteins, while irrelevant paths do not convey any interaction. The resource created by these manual decisions is used in two ways. First, words that appear frequently inside meaningful paths are learnt using simple machine learning. Second, the resource is applied to the task of automatically detecting interactions between proteins in biomedical literature, with the IntAct corpus serving as the application corpus.
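The path extraction and the learning of frequent in-path words can be illustrated with a minimal Python sketch. It assumes a dependency parse given as (head, dependent, relation) triples, with protein tokens lemmatised to a placeholder `PROT`, and treats a path as the shortest undirected route between two proteins. Simple frequency counting with a threshold (`min_count`) stands in for the "simple machine learning" mentioned above. The function names, the path representation and the threshold are hypothetical; this is not the authors' implementation.

```python
from collections import Counter, deque

# Hypothetical sketch, not the authors' implementation.
# A path alternates lemmas and relations: [lemma, rel, lemma, ..., rel, lemma].


def dependency_path(edges, lemmas, start, end):
    """Shortest path between two tokens, or None if they are not connected.

    edges: (head, dependent, relation) triples; edge direction is ignored here.
    """
    graph = {}
    for head, dep, rel in edges:
        graph.setdefault(head, []).append((dep, rel))
        graph.setdefault(dep, []).append((head, rel))
    prev = {start: None}  # token -> (predecessor, relation) in the BFS tree
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == end:
            break
        for nxt, rel in graph.get(node, []):
            if nxt not in prev:
                prev[nxt] = (node, rel)
                queue.append(nxt)
    if end not in prev:
        return None
    path, node = [lemmas[end]], end
    while prev[node] is not None:  # walk back from end to start
        node, rel = prev[node]
        path = [lemmas[node], rel] + path
    return path


def transparent_words(labelled_paths, min_count=2):
    """Interior words occurring at least min_count times in meaningful paths."""
    counts = Counter()
    for path, meaningful in labelled_paths:
        if meaningful:
            counts.update(path[2:-1:2])  # lemmas strictly between the two proteins
    return {w for w, c in counts.items() if c >= min_count}


# "PROT activates PROT": token 1 is the head verb.
edges = [(1, 0, "subj"), (1, 2, "obj")]
print(dependency_path(edges, ["PROT", "activate", "PROT"], 0, 2))
# ['PROT', 'subj', 'activate', 'obj', 'PROT']
```

Breadth-first search is used because it returns a shortest path in an unweighted graph; a real parser output would also carry edge direction, which this sketch discards.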
After detecting proteins in the IntAct texts, we automatically parse the texts and classify the syntactic paths between the proteins using the meaningful paths from the resource created on GENIA. To address sparse-data problems, we shorten the paths based on the words that frequently appear inside meaningful paths, the so-called transparent words.
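Path shortening and classification could be sketched as follows. The sketch assumes a path is a list alternating words and dependency relations with both endpoint proteins replaced by `PROT`, and that shortening means dropping a transparent interior word together with the relation that follows it, so the preceding relation attaches directly to the next word. The transparent-word set, the meaningful-path set and the function names are hypothetical stand-ins for the resources learnt on GENIA.

```python
# Hypothetical sketch: a path is [word, rel, word, ..., rel, word],
# with both endpoint proteins replaced by "PROT".


def shorten(path, transparent):
    """Drop each transparent interior word together with the relation that follows it."""
    out = [path[0]]
    skip_rel = False
    last = len(path) - 1
    for k in range(1, last, 2):
        rel, word = path[k], path[k + 1]
        if not skip_rel:
            out.append(rel)
        skip_rel = False
        if k + 1 < last and word in transparent:
            # Collapse the chain so the preceding relation attaches to the next word.
            skip_rel = True
            continue
        out.append(word)
    return out


def is_interaction(path, meaningful_paths, transparent):
    """Accept a path if it, or its shortened form, matches a known meaningful path."""
    if tuple(path) in meaningful_paths:
        return True
    return tuple(shorten(path, transparent)) in meaningful_paths


# Hypothetical resources standing in for those created on GENIA.
TRANSPARENT = {"domain", "fragment"}
MEANINGFUL = {("PROT", "subj", "bind", "prep_to", "PROT")}

# "PROT binds to the domain of PROT" reduces to the known pattern "PROT binds to PROT".
path = ["PROT", "subj", "bind", "prep_to", "domain", "of", "PROT"]
print(shorten(path, TRANSPARENT))  # ['PROT', 'subj', 'bind', 'prep_to', 'PROT']
print(is_interaction(path, MEANINGFUL, TRANSPARENT))  # True
```

The unshortened path is tried first, so shortening only widens coverage for paths that would otherwise be unseen, which is how it helps with sparse data.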
We conduct an evaluation showing that we achieve acceptable recall and good precision, and we discuss the importance of transparent words for the task.
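Precision and recall for this kind of task are commonly computed over sets of detected protein pairs, as in the minimal sketch below. It assumes interactions are unordered pairs and that a detection counts as correct only when it exactly matches a gold pair; the paper's own matching criteria are not stated here, and the data is a toy example, not the paper's results.

```python
def precision_recall(predicted, gold):
    """Precision and recall over sets of protein pairs."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


def pair(a, b):
    # Interactions are treated as unordered, so (A, B) and (B, A) are the same pair.
    return frozenset((a, b))


# Toy data for illustration only; these are not the paper's results.
predicted = {pair("A", "B"), pair("A", "C")}
gold = {pair("B", "A"), pair("B", "C"), pair("C", "D")}
print(precision_recall(predicted, gold))  # (0.5, 0.3333333333333333)
```

A system that proposes few but reliable pairs, as in this toy case, shows the pattern of higher precision than recall.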


8 citations in Web of Science®
6 citations in Scopus®


3 downloads since deposited on 04 Feb 2010
0 downloads since 12 months

Additional indexing

Item Type: Book Section, refereed, original work
Communities & Collections: 06 Faculty of Arts > Institute of Computational Linguistics; 06 Faculty of Arts > English Department
Dewey Decimal Classification: 000 Computer science, knowledge & systems; 820 English & Old English literatures; 410 Linguistics
Uncontrolled Keywords: IR, Information Retrieval, NLP, text mining, parsing, biomedicine, named-entity recognition
Date: March 2009
Deposited On: 04 Feb 2010 14:15
Last Modified: 08 May 2016 13:14
Series Name: Lecture Notes in Computer Science
Funders: Swiss National Science Fund, Grant 100014-118396/1
Additional Information: Best Paper Award (2nd place)
Official URL: http://www.springerlink.com/content/ux76487mn0605811/
