Question answering in terminology-rich technical domains


Rinaldi, Fabio; Hess, M; Dowdall, J; Aliod Moll, D; Schwitter, R (2004). Question answering in terminology-rich technical domains. In: Maybury, M T. New directions in question answering. Menlo Park: AAAI Press, 133-140.

Abstract

The current tendency in Question Answering is towards the processing of large volumes of open-domain text. This tendency is spurred by the creation of the Question Answering track in TREC and by the recent increase in systems that use the Web to extract answers to questions. This undoubtedly has the advantage that narrow, application-specific concerns can be overlooked in favor of more general approaches. However, the unconstrained nature of the domain and questions does not necessarily lead to systems that are better at specific tasks, such as those required in a deployed application.

It has already been observed in other competitions (notably the Information Extraction competitions organized under the name of Message Understanding Conferences) that the nature of the competitive process tends to select systems that adapt well to the evaluation itself, rather than systems that deal optimally with the underlying problem. [To use a comparison from evolutionary theory, overly severe selection in a given local environment leads to a convergence of the population on a very limited gene pool, which is then incapable of coping with even a minor change in the environment.]

In restricted domains, systems cannot take advantage of the so-called "Zipf's law of Questions" [Prager], which states that there is an inverse relation between the frequency of certain types of questions and their complexity. In other words, the questions most frequently asked are those that can be solved with simpler techniques. By targeting a smaller set of frequent question types, a system can achieve good results with limited effort.

By contrast, the non-redundant nature of most technical documentation, and its use of domain-specific sublanguage and terminology, make it unsuitable for (some of) the approaches seen in the TREC QA competition. In the proposed contribution we will discuss the specific nature of technical documentation, with examples from real domains (e.g. the Maintenance Manual of a large commercial aircraft), and illustrate solutions that have been adopted in a deployed system.

An example of the difference between technical documents and open-domain texts is the focus on specific types of entities. While Named Entities play a major role in open-domain systems, they are almost irrelevant in technical documentation, where a far greater role is played by domain terminology.

Technical domains present the additional problem of "domain navigation". When a system assumes that users are familiar with domain concepts, inexpert users face a barrier separating questions from answers. Unfamiliarity with domain terminology might lead to questions that contain imperfect formulations of domain terms. A question answering system for junior doctors or trainee technicians therefore needs to use whatever scarce domain knowledge is contained in a query to extract relevant answers. Detecting terminological variants and exploiting the relations between terms (such as synonymy, meronymy, and antonymy) is vital to this task.
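As a rough illustration of what detecting terminological variants can involve, the following minimal Python sketch maps imperfect formulations in a question onto canonical domain terms. It is not the method of the deployed system described in the chapter: the term base, the terms in it, the function names, and the crude plural stripping are all assumptions made for this example.

```python
import re

# Hypothetical mini term base: canonical term -> known variants.
# The entries are invented for illustration, not taken from a real manual.
TERM_VARIANTS = {
    "cargo door": ["cargo compartment door", "door of the cargo compartment"],
    "external power receptacle": ["external power socket"],
}


def normalize(text):
    """Lowercase, tokenize on alphanumerics, and crudely strip a final 's'."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return tuple(t[:-1] if len(t) > 3 and t.endswith("s") else t for t in tokens)


def build_index(term_variants):
    """Map every normalized surface form to its canonical term."""
    index = {}
    for canonical, variants in term_variants.items():
        for form in [canonical, *variants]:
            index[normalize(form)] = canonical
    return index


def find_terms(question, index):
    """Return canonical terms found in the question, preferring longest matches."""
    tokens = normalize(question)
    max_len = max(len(key) for key in index)
    found = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tokens[i:i + n]
            if key in index:
                found.append(index[key])
                i += n
                break
        else:
            i += 1  # no term starts at this token
    return found


INDEX = build_index(TERM_VARIANTS)
```

Under these assumptions, `find_terms("How do I open the cargo compartment doors?", INDEX)` resolves the user's wording to the canonical term `cargo door`. Relations such as meronymy or antonymy would need a richer term base than this flat synonym table.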

Another idiosyncrasy of technical domains is the tendency towards definitional questions ("what is the ANT connection?"), which prove tricky to answer precisely in a generic document collection (and for this reason were deliberately left out of TREC 2002). In technical domains this type of question can be expected to play a major role, and systems must therefore be capable of coping with it.

In this book chapter we aim to explain the above concepts and illustrate them with examples taken from texts in technical domains. We will also illustrate why techniques that are typically used in data-intensive open-domain question answering systems would not work effectively in technical domains, which have less data redundancy. In sum, we will show that question answering in technical domains presents a better opportunity to explore content-based approaches to question answering, while at the same time offering the possibility of producing commercially viable systems in the short term.

Additional indexing

Item Type:Book Section, refereed, original work
Communities & Collections:06 Faculty of Arts > Institute of Computational Linguistics
Dewey Decimal Classification:410 Linguistics; 000 Computer science, knowledge & systems
Language:English
Date:2004
Deposited On:06 Aug 2009 11:31
Last Modified:19 Aug 2017 22:29
Publisher:AAAI Press
ISBN:978-0-262-63304-8
