Unsupervised Text Segmentation for Automated Error Reduction


Furrer, Lenz (2014). Unsupervised Text Segmentation for Automated Error Reduction. In: KONVENS 2014, Hildesheim, 8 October 2014 - 10 October 2014, 178-185.

Abstract

Challenging the assumption that traditional whitespace/punctuation-based tokenisation is the best solution for any NLP application, I propose an alternative approach to segmenting text into processable units. The proposed approach is nearly knowledge-free, in that it does not rely on language-dependent, man-made resources. The text segmentation approach is applied to the task of automated error reduction in texts with high noise. The results are compared to conventional tokenisation.

Statistics

Downloads

103 downloads since deposited on 03 Dec 2014
46 downloads in the past 12 months

Additional indexing

Item Type: Conference or Workshop Item (Paper), refereed, original work
Communities & Collections: 06 Faculty of Arts > Institute of Computational Linguistics
Dewey Decimal Classification: 000 Computer science, knowledge & systems; 410 Linguistics
Uncontrolled Keywords: Unsupervised Segmentation OCR Error Correction
Language: English
Event End Date: 10 October 2014
Deposited On: 03 Dec 2014 17:18
Last Modified: 15 Aug 2017 21:04
Publisher: Universität Hildesheim
ISBN: 978-3-934105-46-1
Free access at: Official URL. An embargo period may apply.
Official URL: http://nbn-resolving.de/urn:nbn:de:gbv:hil2-opus-2893

Download

Download PDF 'Unsupervised Text Segmentation for Automated Error Reduction'.
Content: Accepted Version
Language: English
Filetype: PDF
Size: 961kB
Licence: Creative Commons: Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0)