ZORA (Zurich Open Repository and Archive)

Word and sentence segmentation in German: Overcoming idiosyncrasies in the use of punctuation in private communication

Sugisaki, Kyoko (2017). Word and sentence segmentation in German: Overcoming idiosyncrasies in the use of punctuation in private communication. In: 27th International Conference, GSCL 2017, Berlin, September 2017, s.n.

Abstract

In this paper, we present a segmentation system for German texts. We apply conditional random fields (CRF), a statistical sequence model, to a type of text used in private communication. We show that segmenting individual punctuation marks, taking freestanding lines into account, and using unsupervised word representations (i.e., Brown clustering, Word2Vec and fastText) yields a label accuracy of 96% on a corpus of postcards written in private communication.
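The abstract names three ingredients for the CRF: each punctuation mark as its own unit, awareness of freestanding lines, and unsupervised word representations. The sketch below shows one way such token-level features could be built. It is not the author's implementation: the tokenizer regex, the "freestanding line" heuristic (a line not ending in `.`, `!` or `?`, such as a salutation like "Liebe Anna,"), the feature names and the `clusters` lookup (standing in for Brown cluster IDs) are all hypothetical.

```python
import re
from typing import Dict, List, Optional

# Hypothetical tokenizer: words, or a single punctuation character.
# "!!!" -> "!", "!", "!" and "z.B." -> "z", ".", "B", "."
TOKEN_RE = re.compile(r"\w+|[^\w\s]")
PUNCT_RE = re.compile(r"[^\w\s]")
SENT_FINAL = {".", "!", "?"}


def featurize(text: str, clusters: Optional[Dict[str, str]] = None) -> List[Dict[str, object]]:
    """Return one feature dict per token, in document order (hypothetical feature set)."""
    clusters = clusters or {}  # assumed map: lowercased word -> cluster bit string
    feats: List[Dict[str, object]] = []
    for line in text.splitlines():
        toks = TOKEN_RE.findall(line)
        if not toks:
            continue  # skip empty lines
        # Assumed heuristic: a line is "freestanding" if it does not end
        # in sentence-final punctuation (e.g. a greeting on its own line).
        freestanding = toks[-1] not in SENT_FINAL
        for i, tok in enumerate(toks):
            feats.append({
                "lower": tok.lower(),
                "is_punct": bool(PUNCT_RE.fullmatch(tok)),
                "is_title": tok.istitle(),
                "line_start": i == 0,
                "line_end": i == len(toks) - 1,
                "freestanding_line": freestanding,
                "prev": toks[i - 1].lower() if i > 0 else "<BOL>",
                "next": toks[i + 1].lower() if i + 1 < len(toks) else "<EOL>",
                # Stand-in for an unsupervised word representation feature.
                "cluster": clusters.get(tok.lower(), "<UNK>"),
            })
    return feats


if __name__ == "__main__":
    for f in featurize("Liebe Anna\nWir sind gut angekommen!!!"):
        print(f["lower"], f["is_punct"], f["freestanding_line"])
```

Splitting "!!!" into three tokens lets the model decide per mark where a boundary falls, at the cost of also splitting abbreviations such as "z.B.", which the model must then learn not to treat as sentence ends. A list of such per-token dicts is the sequence format accepted by CRF toolkits such as sklearn-crfsuite; the label set and toolkit used in the paper are not stated in this record.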

Additional indexing

Item Type:Conference or Workshop Item (Paper), refereed, original work
Communities & Collections:06 Faculty of Arts > Institute of German Studies
06 Faculty of Arts > Institute of Computational Linguistics
Dewey Decimal Classification:000 Computer science, knowledge & systems
410 Linguistics
Scopus Subject Areas:Physical Sciences > Theoretical Computer Science
Physical Sciences > General Computer Science
Language:English
Event End Date:September 2017
Deposited On:20 Feb 2018 16:25
Last Modified:26 Jan 2022 16:14
Publisher:s.n.
Funders:SNF
OA Status:Hybrid
Publisher DOI:https://doi.org/10.1007/978-3-319-73706-5_6
Project Information:
  • Funder: SNSF
  • Grant ID:
  • Project Title: SNF
Content:Published Version

Downloads

88 downloads since deposited on 20 Feb 2018
4 downloads in the past 12 months