XCoref: Cross-document Coreference Resolution in the Wild


Zhukova, Anastasia; Hamborg, Felix; Donnay, Karsten; Gipp, Bela (2022). XCoref: Cross-document Coreference Resolution in the Wild. In: Smits, Malte. Information for a Better World: Shaping the Global Future : 17th International Conference, iConference 2022, Virtual Event, February 28 – March 4, 2022, Proceedings, Part I. Cham: Springer, 272-291.

Abstract

Datasets and methods for cross-document coreference resolution (CDCR) focus on events or entities with strict coreference relations. They lack, however, the annotation and resolution of coreferential mentions with more abstract or loose relations, which may occur when news articles report on controversial and polarized events. Bridging and loose coreference relations trigger associations that may expose news readers to bias by word choice and labeling. For example, coreferential mentions of “direct talks between U.S. President Donald Trump and Kim,” such as “an extraordinary meeting following months of heated rhetoric” or “great chance to solve a world problem,” form a more positive perception of this event. A step towards raising awareness of bias by word choice and labeling is the reliable resolution of coreferences with high lexical diversity. We propose XCoref, an unsupervised CDCR method that resolves not only previously prevalent entities, such as persons (e.g., “Donald Trump”), but also abstractly defined concepts, such as groups of persons (e.g., “caravan of immigrants”) and events and actions (e.g., “marching to the U.S. border”). In an extensive evaluation, we compare XCoref to a state-of-the-art CDCR method and to TCA, a previous method that resolves such complex coreference relations, and find that XCoref outperforms both methods. That XCoref outperforms an established CDCR model shows that new CDCR models need to be evaluated on semantically complex mentions with looser coreference relations to demonstrate their applicability to resolving mentions in the “wild” of political news articles.
Statistics

Citations

2 citations in Web of Science®
2 citations in Scopus®

Downloads

16 downloads since deposited on 07 Feb 2023
12 downloads in the past 12 months

Additional indexing

Item Type: Book Section, refereed, original work
Communities & Collections: 06 Faculty of Arts > Institute of Political Science
08 Research Priority Programs > Digital Society Initiative
Dewey Decimal Classification: 320 Political science
Scopus Subject Areas: Physical Sciences > Theoretical Computer Science
Physical Sciences > General Computer Science
Uncontrolled Keywords: Cross-document coreference resolution, news analysis, media bias
Language: English
Date: 23 February 2022
Deposited On: 07 Feb 2023 09:52
Last Modified: 29 Mar 2024 02:40
Publisher: Springer
Series Name: Lecture Notes in Computer Science
Number: 17
ISSN: 0302-9743
ISBN: 9783030969561
OA Status: Green
Free access at: Official URL. An embargo period may apply.
Publisher DOI: https://doi.org/10.1007/978-3-030-96957-8_25
Official URL: https://arxiv.org/abs/2109.05252
  • Content: Submitted Version
  • Language: English
  • Licence: Creative Commons: Attribution 4.0 International (CC BY 4.0)