Combining statistical machine translation and translation memories with domain adaptation


Läubli, Samuel; Fishel, Mark; Volk, Martin; Weibel, Manuela (2013). Combining statistical machine translation and translation memories with domain adaptation. In: NODALIDA 2013, Nordic Conference of Computational Linguistics, Oslo, Norway, 22 May 2013 - 24 May 2013, 331-341.

Abstract

Since the emergence of translation memory software, translation companies and freelance translators have been accumulating translated text for various languages and domains. This data has the potential of being used for training domain-specific machine translation systems for corporate or even personal use. But while the resulting systems usually perform well in translating domain-specific language, their out-of-domain vocabulary coverage is often insufficient due to the limited size of the translation memories. In this paper, we demonstrate that small in-domain translation memories can be successfully complemented with freely available general-domain parallel corpora such that (a) the number of out-of-vocabulary (OOV) words is reduced while (b) the in-domain terminology is preserved. In our experiments, German–French and German–Italian statistical machine translation systems geared to marketing texts of the automobile industry have been significantly improved using Europarl and OpenSubtitles data, both in terms of automatic evaluation metrics and human judgement.

Additional indexing

Item Type: Conference or Workshop Item (Paper), not refereed, original work
Communities & Collections: 06 Faculty of Arts > Institute of Computational Linguistics
Dewey Decimal Classification: 000 Computer science, knowledge & systems; 410 Linguistics
Language: English
Event End Date: 24 May 2013
Deposited On: 13 Jun 2013 06:26
Last Modified: 15 Dec 2017 08:06
Publisher: Linköpings universitet Electronic Press
Series Name: Linköping Electronic Conference Proceedings
ISSN: 1650-3686
ISBN: 978-91-7519-589-6
Funders: Swiss Federal Commission for Technology and Innovation CTI
Free access at: Official URL. An embargo period may apply.
Official URL: http://www.ep.liu.se/ecp/085/030/ecp1385030.pdf
Related URLs: http://www.ep.liu.se/ecp_article/index.en.aspx?issue=085;article=030
