Domain adaptation for translation models in statistical machine translation


Sennrich, Rico. Domain adaptation for translation models in statistical machine translation. 2013, University of Zurich, Faculty of Arts.

Abstract

We investigate methods to adapt translation models in statistical machine translation (SMT) to a specific target domain. We discuss two major problems: unknown words caused by data sparseness in the (in-domain) training data, and ambiguities arising from out-of-domain parallel texts with different domain-specific translations. We propose novel solutions to both problems.
The main contributions of this thesis are as follows:
* We present a novel translation model architecture that supports domain adaptation at decoding time from a vector of component models. The combination is implemented through instance weighting, and all statistics necessary for the computation of translation probabilities are stored in the models.
* We present an architecture to combine multiple MT systems, using techniques and ideas from domain adaptation. The hypotheses of external MT systems are treated as out-of-domain knowledge and combined with in-domain data through instance weighting.
* We introduce a sentence alignment algorithm that is able to robustly align even noisy parallel texts. We found that higher-quality sentence alignment of the in-domain parallel text has a significant effect on translation quality in our target domain.
* We propose new translation model features that express how flexible, or general, translation units are, in order to prevent translations that only occur in the context of multiword expressions from being overgeneralised.
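
The instance weighting mentioned in the first contribution can be illustrated with a minimal sketch. It assumes, as one possible reading, that each component model stores raw phrase-pair counts and that the combined translation probability is a ratio of weighted counts. The function name, the toy counts, and the weights are hypothetical and are not taken from the thesis.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: one count table per component model,
# mapping (source phrase, target phrase) to a co-occurrence count.
PairCounts = Dict[Tuple[str, str], float]


def weighted_translation_prob(
    source: str,
    target: str,
    pair_counts: List[PairCounts],
    weights: List[float],
) -> float:
    """Estimate p(target | source) from weighted counts (assumed form):

        p(t | s) = sum_i w_i * c_i(s, t) / sum_i w_i * c_i(s)
    """
    if len(pair_counts) != len(weights):
        raise ValueError("one weight per component model is required")
    numerator = 0.0
    denominator = 0.0
    for counts, w in zip(pair_counts, weights):
        numerator += w * counts.get((source, target), 0.0)
        # Marginal count of the source phrase in this component model.
        denominator += w * sum(c for (s, _), c in counts.items() if s == source)
    # An unseen source phrase gets probability 0 in this sketch.
    return numerator / denominator if denominator > 0 else 0.0


# Toy counts, invented for illustration only.
in_domain = {("Bank", "bench"): 8.0, ("Bank", "bank"): 2.0}
out_of_domain = {("Bank", "bank"): 90.0, ("Bank", "bench"): 10.0}

# Weighting the in-domain model ten times higher shifts probability
# mass towards its preferred translation.
p = weighted_translation_prob(
    "Bank", "bench", [in_domain, out_of_domain], [10.0, 1.0]
)
```

Under this assumption, only counts need to be stored in the component models, so a different weight vector yields different probabilities without retraining, which is consistent with adaptation at decoding time as described above.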

Additional indexing

Item Type: Dissertation
Referees: Volk M, Schwenk H
Communities & Collections: 06 Faculty of Arts > Institute of Computational Linguistics
Dewey Decimal Classification: 000 Computer science, knowledge & systems; 410 Linguistics
Language: English
Date: 2013
Deposited On: 14 Jan 2014 15:50
Last Modified: 05 Apr 2016 17:23
Number of Pages: 148

Download

Content: Published Version
Language: English
Filetype: PDF
Size: 976kB