Abstract
The number of new discoveries (as published in the scientific
literature) in the area of Molecular Biology is currently growing at an
exponential rate. This growth makes it very difficult to filter the most
relevant results, and extracting the core information for inclusion in one
of the knowledge resources maintained by the research community becomes
very expensive. There is therefore growing interest in text processing
approaches that can deliver selected information from scientific
publications and thereby limit the amount of human intervention normally
needed to gather those results.
This paper presents and evaluates an approach aimed at automating
the process of extracting semantic relations (e.g. interactions between
genes and proteins) from scientific literature in the domain of Molecular
Biology. The approach is based on a complete syntactic analysis of the
corpus, performed with a novel dependency-based parser.