UZH-Logo

Maintenance Infos

Learn-filter-apply-forget. Mixed approaches to named entity recognition


Volk, M; Clematide, S (2001). Learn-filter-apply-forget. Mixed approaches to named entity recognition. In: 6th International Workshop on Applications of Natural Language for Informations Systems, Madrid, Spain, 2001 - 2001.

Abstract

We have explored and implemented different approaches to named entity recognition in German, a difficult task in this language since both regular nouns and proper names are capitalized. Our goal is to identify and recognise per-
son names, geographical names and company names in a computer magazine corpus. Our geographical name classifier works with precompiled lists but our company name classifier learns the names from the corpus. For the recognition of
person names we work with a precompiled list of first names and the program learns the last names. For this classifier we suggest setting an activation value for the last name and subsequently depriming the value until “forgetting” the name. Our evaluation results show that our mixed approaches are as good as the recall and precision values reported for English. It is shown that a carefully tuned cascade of
name classifiers can even distinguish between different interpretations of a name token within the same document.

We have explored and implemented different approaches to named entity recognition in German, a difficult task in this language since both regular nouns and proper names are capitalized. Our goal is to identify and recognise per-
son names, geographical names and company names in a computer magazine corpus. Our geographical name classifier works with precompiled lists but our company name classifier learns the names from the corpus. For the recognition of
person names we work with a precompiled list of first names and the program learns the last names. For this classifier we suggest setting an activation value for the last name and subsequently depriming the value until “forgetting” the name. Our evaluation results show that our mixed approaches are as good as the recall and precision values reported for English. It is shown that a carefully tuned cascade of
name classifiers can even distinguish between different interpretations of a name token within the same document.

Downloads

161 downloads since deposited on 18 Aug 2009
81 downloads since 12 months
Detailed statistics

Additional indexing

Item Type:Conference or Workshop Item (Paper), refereed, original work
Communities & Collections:06 Faculty of Arts > Institute of Computational Linguistics
Dewey Decimal Classification:000 Computer science, knowledge & systems
410 Linguistics
Event End Date:2001
Deposited On:18 Aug 2009 14:39
Last Modified:05 Apr 2016 13:19
Permanent URL: https://doi.org/10.5167/uzh-20271

Download

[img]
Preview
Filetype: PDF
Size: 1MB

TrendTerms

TrendTerms displays relevant terms of the abstract of this publication and related documents on a map. The terms and their relations were extracted from ZORA using word statistics. Their timelines are taken from ZORA as well. The bubble size of a term is proportional to the number of documents where the term occurs. Red, orange, yellow and green colors are used for terms that occur in the current document; red indicates high interlinkedness of a term with other terms, orange, yellow and green decreasing interlinkedness. Blue is used for terms that have a relation with the terms in this document, but occur in other documents.
You can navigate and zoom the map. Mouse-hovering a term displays its timeline, clicking it yields the associated documents.

Author Collaborations