Reducing OCR errors in Gothic-script documents


Furrer, Lenz; Volk, Martin (2011). Reducing OCR errors in Gothic-script documents. In: 8th International Conference on Recent Advances in Natural Language Processing (RANLP 2011), Hisar, 16 September 2011, 97-103.

Abstract

In order to improve OCR quality in texts originally typeset in Gothic script, we have built an automated correction system which is highly specialized for the given text. Our approach includes external dictionary resources as well as information derived from the text itself. The focus lies on testing and improving different methods for classifying words as correct or erroneous. Also, different techniques are applied to find and rate correction candidates. In addition, we are working on a web application that enables users to read and edit the digitized text online.

Additional indexing

Item Type: Conference or Workshop Item (Paper), refereed, original work
Communities & Collections: 06 Faculty of Arts > Institute of Computational Linguistics
Dewey Decimal Classification: 000 Computer science, knowledge & systems; 410 Linguistics
Language: English
Event End Date: 16 September 2011
Deposited On: 27 Sep 2011 13:06
Last Modified: 12 Aug 2017 11:22
ISBN: 978-954-452-019-9