Reproducible extraction of cross-lingual topics (rectr)


Chan, Chung-Hong; Zeng, Jing; Wessler, Hartmut; Jungblut, Marc; Welbers, Kaspar; Bajjalieh, Joseph W; Althaus, Scott L; van Atteveldt, Wouter (2020). Reproducible extraction of cross-lingual topics (rectr). Communication Methods and Measures, 14(4):285-305.

Abstract

With global media content databases and online content being available, analyzing topical structures in different languages simultaneously has become an urgent computational task. Some previous studies have analyzed topics in a multilingual corpus by translating all items into a single language using a machine translation service, such as Google Translate. We argue that this method is not reproducible in the long run and propose a new method: Reproducible Extraction of Cross-lingual Topics Using R (rectr). Our method utilizes open-source aligned word embeddings to understand the cross-lingual meanings of words and has a mechanism to normalize residual influence from language differences. We present a benchmark on a corpus of English, German, and French news that compares the topics extracted by our method with those extracted by methods used in the literature. We show that our method is not only reproducible but can also generate high-quality cross-lingual topics. We demonstrate how our method can be applied in tracking news topics across time and languages.

Statistics

Citations

3 citations in Web of Science®
4 citations in Scopus®

Downloads

1 download since deposit on 03 Dec 2020
1 download in the past 12 months

Additional indexing

Item Type: Journal Article, refereed, original work
Communities & Collections: 06 Faculty of Arts > Department of Communication and Media Research
Dewey Decimal Classification: 700 Arts
Scopus Subject Areas: Social Sciences & Humanities > Communication
Language: English
Date: 1 October 2020
Deposited On: 03 Dec 2020 17:33
Last Modified: 28 Apr 2021 07:21
Publisher: Taylor & Francis
ISSN: 1931-2458
Additional Information: This is an Accepted Manuscript of an article published by Taylor & Francis in Communication Methods and Measures on September 7th, 2020, available online: https://www.tandfonline.com/doi/full/10.1080/19312458.2020.1812555
OA Status: Closed
Publisher DOI: https://doi.org/10.1080/19312458.2020.1812555

Download

Closed Access: Download allowed only for UZH members

Content: Accepted Version
Language: English
Filetype: PDF - Registered users only until 7 March 2022
Size: 1MB
Embargo till: 2022-03-07