Publication: Word and sentence segmentation in German: Overcoming idiosyncrasies in the use of punctuation in private communication
Date
2017
Conference or Workshop Item
Published version
| cris.lastimport.scopus | 2025-05-21T03:32:31Z | |
| cris.lastimport.wos | 2025-08-17T03:19:48Z | |
| dc.contributor.institution | University of Zurich | |
| dc.date.accessioned | 2018-02-20T16:25:51Z | |
| dc.date.available | 2018-02-20T16:25:51Z | |
| dc.date.issued | 2017-09 | |
| dc.description.abstract | In this paper, we present a segmentation system for German texts. We apply conditional random fields (CRF), a statistical sequential model, to a type of text used in private communication. We show that by segmenting individual punctuation marks, taking into account freestanding lines, and using unsupervised word representations (i.e., Brown clustering, Word2Vec and fastText), the system achieves a label accuracy of 96% on a corpus of postcards used in private communication. | |
| dc.identifier.doi | 10.1007/978-3-319-73706-5_6 | |
| dc.identifier.scopus | 2-s2.0-85041103576 | |
| dc.identifier.uri | https://www.zora.uzh.ch/handle/20.500.14742/140265 | |
| dc.identifier.wos | 000491467200006 | |
| dc.language.iso | eng | |
| dc.subject.ddc | 000 Computer science, knowledge & systems | |
| dc.subject.ddc | 410 Linguistics | |
| dc.title | Word and sentence segmentation in German: Overcoming idiosyncrasies in the use of punctuation in private communication | |
| dc.type | conference_item | |
| dcterms.accessRights | info:eu-repo/semantics/openAccess | |
| dcterms.bibliographicCitation.originalpublishername | s.n. | |
| dspace.entity.type | Publication | en |
| oairecerif.event.endDate | 2017-09 | |
| oairecerif.event.place | Berlin | |
| oairecerif.event.startDate | 2017-09 | |
| uzh.contributor.affiliation | University of Zurich | |
| uzh.contributor.author | Sugisaki, Kyoko | |
| uzh.contributor.correspondence | Yes | |
| uzh.document.availability | published_version | |
| uzh.eprint.datestamp | 2018-02-20 16:25:51 | |
| uzh.eprint.lastmod | 2022-01-26 16:14:38 | |
| uzh.eprint.statusChange | 2018-02-20 16:25:51 | |
| uzh.event.presentationType | paper | |
| uzh.event.title | 27th International Conference, GSCL 2017 | |
| uzh.event.type | conference | |
| uzh.funder.name | SNSF | |
| uzh.funder.projectTitle | SNF | |
| uzh.harvester.eth | Yes | |
| uzh.harvester.nb | No | |
| uzh.identifier.doi | 10.5167/uzh-149515 | |
| uzh.oastatus.unpaywall | hybrid | |
| uzh.oastatus.zora | Hybrid | |
| uzh.publication.citation | Sugisaki, Kyoko (2017). Word and sentence segmentation in German: Overcoming idiosyncrasies in the use of punctuation in private communication. In: 27th International Conference, GSCL 2017, Berlin, September 2017, s.n. | |
| uzh.publication.freeAccessAt | UNSPECIFIED | |
| uzh.publication.originalwork | original | |
| uzh.publication.publishedStatus | final | |
| uzh.scopus.impact | 0 | |
| uzh.scopus.subjects | Theoretical Computer Science | |
| uzh.scopus.subjects | General Computer Science | |
| uzh.workflow.doaj | uzh.workflow.doaj.false | |
| uzh.workflow.eprintid | 149515 | |
| uzh.workflow.fulltextStatus | public | |
| uzh.workflow.revisions | 31 | |
| uzh.workflow.rightsCheck | keininfo | |
| uzh.workflow.status | archive | |
| uzh.wos.impact | 0 | |