Publication:

From historic books to annotated XML: Building a large multilingual diachronic corpus

Date

2011
Conference or Workshop Item
Published version

Citations

Jitca, M., Sennrich, R., & Volk, M. (2011). From historic books to annotated XML: Building a large multilingual diachronic corpus (No. 96). 96, 75–80. http://www.corpora.uni-hamburg.de/gscl2011/downloads/AZM96.pdf

Abstract

This paper introduces our approach towards annotating a large heritage corpus, which spans over 100 years of alpine literature. The corpus consists of over 16,000 articles from the yearbooks of the Swiss Alpine Club, 60% of which represent German texts, 38% French, 1% Italian and the remaining 1% Swiss German and Romansh. The present work describes the inherent difficulties in processing a multilingual corpus by referring to the most challenging annotation phases such as article identification, correction of optical character recognit

Additional indexing

Event Title

Conference of the German Society for Computational Linguistics and Language Technology (GSCL) 2011

Event Location

Hamburg

Event Country

Germany

Event Start Date

2011-09-28

Event End Date

2011-09-30

Page Range

75

Page end

80

Item Type

Conference or Workshop Item

Language

English

Date available

2011-11-21

Series Name

Arbeiten zur Mehrsprachigkeit, Folge B. Working Papers in Multilingualism, Series B

Number

96

ISSN or e-ISSN

0176-599X

OA Status

Green
