ZORA (Zurich Open Repository and Archive)

Comparing Rule-based and SMT-based Spelling Normalisation for English Historical Texts

Schneider, Gerold; Pettersson, Eva; Percillier, Michael (2017). Comparing Rule-based and SMT-based Spelling Normalisation for English Historical Texts. In: Proceedings of the NoDaLiDa 2017 Workshop on Processing Historical Language, Gothenburg, 22 May 2017. Linköping University Electronic Press, Linköpings universitet, 40-46.

Abstract

To use existing natural language processing tools for analysing historical text, an important preprocessing step is spelling normalisation: converting the original spelling to present-day spelling before applying tools such as taggers and parsers. In this paper, we compare a probabilistic, language-independent approach to spelling normalisation, based on statistical machine translation (SMT) techniques, with a rule-based system that combines dictionary lookup with rules and non-probabilistic weights. The rule-based system reaches the best accuracy, up to 94% precision at 74% recall, while the SMT system yields improvements for each tested period.

Additional indexing

Item Type: Conference or Workshop Item (Paper), original work
Communities & Collections: 06 Faculty of Arts > English Department; 06 Faculty of Arts > Institute of Computational Linguistics; 06 Faculty of Arts > Zurich Center for Linguistics
Dewey Decimal Classification: 820 English & Old English literatures
Language: English
Event End Date: 22 May 2017
Deposited On: 30 May 2017 13:45
Last Modified: 03 Dec 2020 15:19
Publisher: Linköping University Electronic Press, Linköpings universitet
Number: 133
OA Status: Green
Free access at: Official URL. An embargo period may apply.
Official URL: http://www.ep.liu.se/ecp/article.asp?issue=133&article=008&volume=#
Related URLs: https://spraakbanken.gu.se/swe/processing-historical-language (Organisation)
Content: Published Version
