Publication: From historic books to annotated XML: Building a large multilingual diachronic corpus
Date
Citations
Jitca, M., Sennrich, R., & Volk, M. (2011). From historic books to annotated XML: Building a large multilingual diachronic corpus (No. 96). 96, 75–80. http://www.corpora.uni-hamburg.de/gscl2011/downloads/AZM96.pdf
Abstract
This paper introduces our approach to annotating a large heritage corpus that spans more than 100 years of Alpine literature. The corpus consists of over 16,000 articles from the yearbooks of the Swiss Alpine Club: 60% are German texts, 38% French, 1% Italian, and the remaining 1% Swiss German and Romansh. The present work describes the inherent difficulties in processing a multilingual corpus by referring to the most challenging annotation phases, such as article identification, correction of optical character recognit
Additional indexing
Creators (Authors)
Event Title
Event Location
Event Country
Event Start Date
Event End Date
Page Range
Page end
Item Type
In collections
Dewey Decimal Classification
Language
Date available
Series Name
Number
ISSN or e-ISSN
OA Status