Unsupervised Text Segmentation for Automated Error Reduction


Furrer, Lenz (2014). Unsupervised Text Segmentation for Automated Error Reduction. In: KONVENS 2014, Hildesheim, 8 October 2014 - 10 October 2014. Universität Hildesheim, 178-185.

Abstract

Challenging the assumption that traditional whitespace/punctuation-based tokenisation is the best solution for any NLP application, I propose an alternative approach to segmenting text into processable units. The proposed approach is nearly knowledge-free, in that it does not rely on language-dependent, man-made resources. The text segmentation approach is applied to the task of automated error reduction in texts with high noise. The results are compared to conventional tokenisation.

Statistics

Downloads

104 downloads since deposited on 03 Dec 2014
5 downloads in the past 12 months

Additional indexing

Item Type: Conference or Workshop Item (Paper), refereed, original work
Communities & Collections: 06 Faculty of Arts > Institute of Computational Linguistics
Dewey Decimal Classification: 000 Computer science, knowledge & systems; 410 Linguistics
Uncontrolled Keywords: Unsupervised Segmentation, OCR Error Correction
Language: English
Event End Date: 10 October 2014
Deposited On: 03 Dec 2014 17:18
Last Modified: 30 Jul 2020 15:19
Publisher: Universität Hildesheim
ISBN: 978-3-934105-46-1
OA Status: Green
Free access at: Official URL. An embargo period may apply.
Official URL: http://nbn-resolving.de/urn:nbn:de:gbv:hil2-opus-2893
  • Content: Accepted Version
  • Language: English
  • Licence: Creative Commons: Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0)