ZORA (Zurich Open Repository and Archive)

Reproducible extraction of cross-lingual topics (rectr)

Chan, Chung-Hong; Zeng, Jing; Wessler, Hartmut; Jungblut, Marc; Welbers, Kaspar; Bajjalieh, Joseph W; Althaus, Scott L; van Atteveldt, Wouter (2020). Reproducible extraction of cross-lingual topics (rectr). Communication Methods and Measures, 14(4):285-305.

Abstract

With global media content databases and online content now available, analyzing topical structures in different languages simultaneously has become an urgent computational task. Some previous studies have analyzed topics in a multilingual corpus by translating all items into a single language using a machine translation service, such as Google Translate. We argue that this method is not reproducible in the long run and propose a new method: Reproducible Extraction of Cross-lingual Topics Using R (rectr). Our method uses open-source aligned word embeddings to capture the cross-lingual meanings of words and has a mechanism to normalize residual influence from language differences. We present a benchmark on a corpus of English, German, and French news that compares the topics extracted by our method with those extracted by methods used in the literature. We show that our method is not only reproducible but can also generate high-quality cross-lingual topics. We demonstrate how our method can be applied to tracking news topics across time and languages.

Additional indexing

Item Type:Journal Article, refereed, original work
Communities & Collections:06 Faculty of Arts > Department of Communication and Media Research
Dewey Decimal Classification:070 News media, journalism & publishing
Scopus Subject Areas:Social Sciences & Humanities > Communication
Language:English
Date:1 October 2020
Deposited On:03 Dec 2020 17:33
Last Modified:22 Apr 2025 01:40
Publisher:Taylor & Francis
ISSN:1931-2458
Additional Information:This is an Accepted Manuscript of an article published by Taylor & Francis in Communication Methods and Measures on September 7th, 2020, available online: https://www.tandfonline.com/doi/full/10.1080/19312458.2020.1812555
OA Status:Green
Publisher DOI:https://doi.org/10.1080/19312458.2020.1812555
Full Text:Accepted Version (PDF), English

Statistics

Citations

19 citations in Web of Science®
29 citations in Scopus®

Downloads

146 downloads since deposited on 03 Dec 2020
33 downloads in the past 12 months