ZORA (Zurich Open Repository and Archive)

Neural text normalization with adapted decoding and POS features

Ruzsics, Tatyana; Lusetti, Massimo; Göhring, Anne; Samardžić, Tanja; Stark, Elisabeth (2019). Neural text normalization with adapted decoding and POS features. Natural Language Engineering, 25(5):585-605.

Abstract

Text normalization is the task of mapping noncanonical language, typical of speech transcription and computer-mediated communication, to a standardized written form. This task is especially important for languages such as Swiss German, which has strong regional variation and no written standard. In this paper, we propose a novel solution for normalizing Swiss German WhatsApp messages using the encoder–decoder neural machine translation (NMT) framework. We enhance a plain character-level NMT model by integrating a word-level language model and linguistic features in the form of part-of-speech (POS) tags. Each component addresses a specific issue: the language model is intended to improve the fluency of the predicted sequences, whereas the POS tags aim to resolve cases of word-level ambiguity. Our systematic comparison shows that the proposed solution improves over a plain NMT system and also over a comparable character-level statistical machine translation system, which was considered the state of the art for this task until recently. We perform a thorough analysis of the compared systems' output, showing that the two components indeed produce the intended, complementary improvements.

Additional indexing

Item Type: Journal Article, refereed, original work
Communities & Collections: 06 Faculty of Arts > Institute of Romance Studies
Dewey Decimal Classification: 800 Literature, rhetoric & criticism; 470 Latin & Italic languages; 410 Linguistics; 440 French & related languages; 460 Spanish & Portuguese languages; 450 Italian, Romanian & related languages
Scopus Subject Areas: Physical Sciences > Software; Social Sciences & Humanities > Language and Linguistics; Social Sciences & Humanities > Linguistics and Language; Physical Sciences > Artificial Intelligence
Language: English
Date: September 2019
Deposited On: 27 Nov 2019 08:58
Last Modified: 22 Dec 2024 02:36
Publisher: Cambridge University Press
ISSN: 1351-3249
OA Status: Green
Publisher DOI: https://doi.org/10.1017/S1351324919000391
Related URLs: https://www.cambridge.org/core/journals/natural-language-engineering (Publisher)
Content: Submitted Version (PDF)

Statistics

Citations

3 citations in Web of Science®
5 citations in Scopus®

Downloads

63 downloads since deposited on 27 Nov 2019
12 downloads in the past 12 months