Word and sentence segmentation in German: Overcoming idiosyncrasies in the use of punctuation in private communication


Sugisaki, Kyoko (2017). Word and sentence segmentation in German: Overcoming idiosyncrasies in the use of punctuation in private communication. In: 27th International Conference, GSCL 2017, Berlin, September 2017, s.n.

Abstract

In this paper, we present a segmentation system for German texts. We apply conditional random fields (CRF), a statistical sequential model, to a type of text used in private communication. We show that segmenting individual punctuation marks, taking freestanding lines into account, and using unsupervised word representations (i.e., Brown clustering, Word2Vec and fastText) achieves a label accuracy of 96% on a corpus of postcards used in private communication.

Downloads

87 downloads since deposit on 20 Feb 2018
17 downloads in the past 12 months

Additional indexing

Item Type: Conference or Workshop Item (Paper), refereed, original work
Communities & Collections: 06 Faculty of Arts > Institute of German Studies; 06 Faculty of Arts > Institute of Computational Linguistics
Dewey Decimal Classification: 000 Computer science, knowledge & systems; 410 Linguistics
Scopus Subject Areas: Physical Sciences > Theoretical Computer Science; Physical Sciences > General Computer Science
Language: English
Event End Date: September 2017
Deposited On: 20 Feb 2018 16:25
Last Modified: 26 Jan 2022 16:14
Publisher: s.n.
Funders: SNF
OA Status: Hybrid
Publisher DOI: https://doi.org/10.1007/978-3-319-73706-5_6
Project Information:
  • Funder: SNSF
  • Grant ID:
  • Project Title: SNF

Download

Content: Published Version
Filetype: PDF
Size: 124kB