Word and sentence segmentation in German: Overcoming idiosyncrasies in the use of punctuation in private communication


Sugisaki, Kyoko (2017). Word and sentence segmentation in German: Overcoming idiosyncrasies in the use of punctuation in private communication. In: 27th International Conference, GSCL 2017, Berlin, September 2017.

Abstract

In this paper, we present a segmentation system for German texts. We apply conditional random fields (CRF), a statistical sequential model, to a type of text used in private communication. We show that segmenting individual punctuation marks, taking freestanding lines into account, and using unsupervised word representations (i.e., Brown clustering, Word2Vec and fastText) yields a label accuracy of 96% on a corpus of postcards used in private communication.
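
As a rough illustration of the kind of input such a sequence model might work with, the following sketch splits a text into units, treating every punctuation character as its own unit and keeping line breaks visible, and then builds a feature dictionary per unit. It is not the paper's implementation: the function names, the `<NL>` marker, the feature set, and the `clusters` mapping (standing in for a word representation such as Brown cluster IDs) are all assumptions made for this example.

```python
import re

# Hypothetical unit splitter (an assumption, not the paper's tokenizer):
# each punctuation character becomes its own unit, and line breaks are kept
# as explicit "<NL>" units so that freestanding lines remain visible.
UNIT_RE = re.compile(r"\n|\w+|[^\w\s]")


def split_units(text):
    """Split text into word, punctuation, and line-break units."""
    return ["<NL>" if u == "\n" else u for u in UNIT_RE.findall(text)]


def unit_features(units, i, clusters=None):
    """Build a feature dict for unit i.

    `clusters` is an assumed mapping from lowercased word to a cluster ID,
    standing in for an unsupervised word representation.
    """
    u = units[i]
    feats = {
        "lower": u.lower(),
        "is_punct": bool(re.fullmatch(r"[^\w\s]", u)),
        "is_upper_initial": u[:1].isupper(),
        "prev_is_newline": i > 0 and units[i - 1] == "<NL>",
        "next_is_newline": i + 1 < len(units) and units[i + 1] == "<NL>",
    }
    if clusters is not None:
        feats["cluster"] = clusters.get(u.lower(), "UNK")
    return feats


units = split_units("Liebe Oma!!\nViele Grüsse")
features = [unit_features(units, i, {"viele": "0110"}) for i in range(len(units))]
```

Feature dictionaries of this shape, one per unit, are a common input format for CRF toolkits; the labels to be predicted (for example token and sentence boundaries) would come from an annotated corpus.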

Statistics

Downloads

11 downloads since deposited on 20 Feb 2018
11 downloads in the past 12 months

Additional indexing

Item Type: Conference or Workshop Item (Paper), refereed, original work
Communities & Collections: 06 Faculty of Arts > Institute of German Studies
06 Faculty of Arts > Institute of Computational Linguistics
Dewey Decimal Classification: 000 Computer science, knowledge & systems
410 Linguistics
Language: English
Event End Date: September 2017
Deposited On: 20 Feb 2018 16:25
Last Modified: 31 Jul 2018 05:16
Publisher: s.n.
Funders: SNF
OA Status: Green
Project Information:
  • Funder: SNSF
  • Grant ID:
  • Project Title: SNF

Download

Content: Published Version
Filetype: PDF
Size: 124kB