Automated identification of bias inducing words in news articles using linguistic and context-oriented features


Spinde, Timo; Rudnitckaia, Lada; Mitrović, Jelena; Hamborg, Felix; Granitzer, Michael; Gipp, Bela; Donnay, Karsten (2021). Automated identification of bias inducing words in news articles using linguistic and context-oriented features. Information Processing & Management, 58(3):102505.

Abstract

Media has a substantial impact on public perception of events, and, accordingly, the way media presents events can potentially alter the beliefs and views of the public. One of the ways in which bias in news articles can be introduced is by altering word choice. Such a form of bias is very challenging to identify automatically due to the high context-dependence and the lack of a large-scale gold-standard data set. In this paper, we present a prototypical yet robust and diverse data set for media bias research. It consists of 1,700 statements representing various media bias instances and contains labels for media bias identification on the word and sentence level. In contrast to existing research, our data incorporate background information on the participants’ demographics, political ideology, and their opinion about media in general. Based on our data, we also present a way to detect bias-inducing words in news articles automatically. Our approach is feature-oriented, which provides a strong descriptive and explanatory power compared to deep learning techniques. We identify and engineer various linguistic, lexical, and syntactic features that can potentially be media bias indicators. Our resource collection is the most complete within the media bias research area to the best of our knowledge. We evaluate all of our features in various combinations and retrieve their possible importance both for future research and for the task in general. We also evaluate various possible Machine Learning approaches with all of our features. XGBoost, a decision tree implementation, yields the best results. Our approach achieves an F1-score of 0.43, a precision of 0.29, a recall of 0.77, and a ROC AUC of 0.79, which outperforms current media bias detection methods based on features. We propose future improvements, discuss the perspectives of the feature-based approach and a combination of neural networks and deep learning with our current system.
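
As a reading aid for the metrics quoted in the abstract, the following is a minimal sketch of how the F1-score relates to precision and recall (their harmonic mean). The function name and variable names are illustrative assumptions, not taken from the paper. Recomputing from the rounded precision and recall gives a value slightly below the reported 0.43; a gap of this size is plausibly due to rounding of the inputs or to averaging across evaluation folds, which this sketch does not model.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values as quoted in the abstract (not the unrounded experiment outputs).
reported_precision = 0.29
reported_recall = 0.77

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 from rounded precision and recall: {f1:.3f}")  # prints 0.421
```

The low precision combined with high recall indicates that the classifier flags most bias-inducing words but also marks many unbiased words as biased.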

Statistics

Citations

15 citations in Web of Science®
24 citations in Scopus®

Downloads

34 downloads since deposited on 07 Oct 2021
18 downloads in the past 12 months

Additional indexing

Item Type: Journal Article, refereed, original work
Communities & Collections: 06 Faculty of Arts > Institute of Political Science
    08 Research Priority Programs > Digital Society Initiative
Dewey Decimal Classification: 320 Political science
Scopus Subject Areas: Physical Sciences > Information Systems
    Physical Sciences > Media Technology
    Physical Sciences > Computer Science Applications
    Social Sciences & Humanities > Management Science and Operations Research
    Social Sciences & Humanities > Library and Information Sciences
Uncontrolled Keywords: library and information sciences, management science and operations research, computer science applications, media technology, information systems, media bias, feature engineering, text analysis, context analysis, news analysis, bias data set
Language: English
Date: May 2021
Deposited On: 07 Oct 2021 15:48
Last Modified: 27 Jan 2024 02:40
Publisher: Elsevier
ISSN: 0306-4573
OA Status: Hybrid
Free access at: Publisher DOI. An embargo period may apply.
Publisher DOI: https://doi.org/10.1016/j.ipm.2021.102505
  • Content: Published Version
  • Language: English
  • Licence: Creative Commons: Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)