ZORA (Zurich Open Repository and Archive)

Part-Of-Speech in Historical Corpora: Tagger Evaluation and Ensemble Systems on ARCHER

Schneider, Gerold; Hundt, Marianne; Oppliger, Rahel (2016). Part-Of-Speech in Historical Corpora: Tagger Evaluation and Ensemble Systems on ARCHER. In: KONVENS 2016, Bochum, 19 September 2016 - 21 September 2016, RUB.

Abstract

Tagger accuracy deteriorates when a tagger is applied to texts that differ from its training corpus, e.g. in register or time period. On historical data, accuracy can drop to 90% or below. We are tagging and parsing ARCHER, a historical corpus sampled from British and American texts from 1600 to 1999. We improve tagging accuracy by (1) using a version of the corpus that has been automatically mapped to Present-Day English (PDE) spelling with VARD, (2) combining several part-of-speech taggers in an ensemble system, which improves tagging by about 1% over CLAWS and 2% over Tree-Tagger, and (3) using a small amount of human intervention, which allows us to reach 98% accuracy from 1700 on.
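The abstract does not say how the ensemble combines its taggers, so the following is only a minimal sketch of one common approach, per-token majority voting, and is not necessarily the method used in the paper. The function names `ensemble_vote` and `accuracy`, the tie-breaking rule, and the toy tag sequences are all assumptions made for illustration; they are not real output from CLAWS or Tree-Tagger.

```python
from collections import Counter


def ensemble_vote(tag_sequences):
    """Combine per-token tags from several taggers by majority vote.

    tag_sequences: list of equal-length tag lists, one list per tagger.
    Ties are broken in favour of the tagger listed first (an assumption
    of this sketch, e.g. put the most trusted tagger first).
    """
    if not tag_sequences:
        return []
    length = len(tag_sequences[0])
    if any(len(seq) != length for seq in tag_sequences):
        raise ValueError("all taggers must tag the same token sequence")

    combined = []
    for tags in zip(*tag_sequences):
        counts = Counter(tags)
        # Highest vote count wins; on a tie, prefer the tag proposed
        # by the earliest tagger in the list.
        best = max(counts, key=lambda t: (counts[t], -tags.index(t)))
        combined.append(best)
    return combined


def accuracy(predicted, gold):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Toy example: three hypothetical taggers tagging the same three tokens.
tagger_a = ["PRON", "VERB", "PRON"]
tagger_b = ["PRON", "NOUN", "PRON"]
tagger_c = ["PRON", "VERB", "DET"]
gold = ["PRON", "VERB", "PRON"]

voted = ensemble_vote([tagger_a, tagger_b, tagger_c])
print(voted, accuracy(voted, gold))
```

In this toy case each single-tagger error is outvoted by the other two taggers, which is the intuition behind combining taggers whose errors do not fully overlap.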

Additional indexing

Item Type: Conference or Workshop Item (Paper), not refereed, original work
Communities & Collections: 06 Faculty of Arts > English Department; 06 Faculty of Arts > Institute of Computational Linguistics; 06 Faculty of Arts > Zurich Center for Linguistics
Dewey Decimal Classification: 820 English & Old English literatures
Language: English
Event End Date: 21 September 2016
Deposited On: 16 Feb 2017 08:15
Last Modified: 19 Feb 2024 08:00
Publisher: RUB
OA Status: Green
