Publication:

Word and sentence segmentation in German: Overcoming idiosyncrasies in the use of punctuation in private communication

Date
2017
Conference or Workshop Item
Published version
cris.lastimport.scopus: 2025-05-21T03:32:31Z
cris.lastimport.wos: 2025-08-17T03:19:48Z
dc.contributor.institution: University of Zurich
dc.date.accessioned: 2018-02-20T16:25:51Z
dc.date.available: 2018-02-20T16:25:51Z
dc.date.issued: 2017-09
dc.description.abstract:

In this paper, we present a segmentation system for German texts. We apply conditional random fields (CRF), a statistical sequential model, to a type of text used in private communication. We show that segmenting individual punctuation marks, taking freestanding lines into account, and using unsupervised word representations (i.e., Brown clustering, Word2Vec, and fastText) achieves a label accuracy of 96% on a corpus of postcards used in private communication.

dc.identifier.doi: 10.1007/978-3-319-73706-5_6
dc.identifier.scopus: 2-s2.0-85041103576
dc.identifier.uri: https://www.zora.uzh.ch/handle/20.500.14742/140265
dc.identifier.wos: 000491467200006
dc.language.iso: eng
dc.subject.ddc: 000 Computer science, knowledge & systems
dc.subject.ddc: 410 Linguistics
dc.title:

Word and sentence segmentation in German: Overcoming idiosyncrasies in the use of punctuation in private communication

dc.type: conference_item
dcterms.accessRights: info:eu-repo/semantics/openAccess
dcterms.bibliographicCitation.originalpublishername: s.n.
dspace.entity.type: Publication (en)
oairecerif.event.endDate: 2017-09
oairecerif.event.place: Berlin
oairecerif.event.startDate: 2017-09
uzh.contributor.affiliation: University of Zurich
uzh.contributor.author: Sugisaki, Kyoko
uzh.contributor.correspondence: Yes
uzh.document.availability: published_version
uzh.eprint.datestamp: 2018-02-20 16:25:51
uzh.eprint.lastmod: 2022-01-26 16:14:38
uzh.eprint.statusChange: 2018-02-20 16:25:51
uzh.event.presentationType: paper
uzh.event.title: 27th International Conference, GSCL 2017
uzh.event.type: conference
uzh.funder.name: SNSF
uzh.funder.projectTitle: SNF
uzh.harvester.eth: Yes
uzh.harvester.nb: No
uzh.identifier.doi: 10.5167/uzh-149515
uzh.oastatus.unpaywall: hybrid
uzh.oastatus.zora: Hybrid
uzh.publication.citation: Sugisaki, Kyoko (2017). Word and sentence segmentation in German: Overcoming idiosyncrasies in the use of punctuation in private communication. In: 27th International Conference, GSCL 2017, Berlin, September 2017, s.n.
uzh.publication.freeAccessAt: UNSPECIFIED
uzh.publication.originalwork: original
uzh.publication.publishedStatus: final
uzh.scopus.impact: 0
uzh.scopus.subjects: Theoretical Computer Science
uzh.scopus.subjects: General Computer Science
uzh.workflow.doaj: uzh.workflow.doaj.false
uzh.workflow.eprintid: 149515
uzh.workflow.fulltextStatus: public
uzh.workflow.revisions: 31
uzh.workflow.rightsCheck: keininfo
uzh.workflow.status: archive
uzh.wos.impact: 0
Files

Original bundle

Name: sugisaki2017b.pdf
Size: 121.88 KB
Format: Adobe Portable Document Format