Replicable semi-supervised approaches to state-of-the-art stance detection of tweets


Reveilhac, Maud; Schneider, Gerold (2023). Replicable semi-supervised approaches to state-of-the-art stance detection of tweets. Information Processing & Management, 60(2):103199.

Abstract

Stance is defined as the expression of a speaker’s standpoint towards a given target or entity. To date, the most reliable method for measuring stance is opinion surveys. However, people’s increased reliance on social media makes these online platforms an essential source of complementary information about public opinion. Our study contributes to the discussion surrounding replicable methods through which to conduct reliable stance detection by establishing a rule-based model, which we replicated for several targets independently. To test our model, we relied on a widely used dataset of annotated tweets - the SemEval Task 6A dataset, which contains 5 targets with 4,163 manually labelled tweets. We relied on “off-the-shelf” sentiment lexica to expand the scope of our custom dictionaries, while also integrating linguistic markers and using word-pairs dependency information to conduct stance classification. While positive and negative evaluative words are the clearest markers of expression of stance, we demonstrate the added value of linguistic markers to identify the direction of the stance more precisely. Our model achieves an average classification accuracy of 75% (ranging from 67% to 89% across targets). This study is concluded by discussing practical implications and outlooks for future research, while highlighting that each target poses specific challenges to stance detection.

Statistics

Citations

4 citations in Web of Science®
6 citations in Scopus®

Downloads

7 downloads since deposited on 10 Feb 2023
7 downloads in the past 12 months

Additional indexing

Item Type: Journal Article, refereed, original work
Communities & Collections: 06 Faculty of Arts > English Department
  06 Faculty of Arts > Institute of Computational Linguistics
  06 Faculty of Arts > Department of Communication and Media Research
  08 Research Priority Programs > Digital Religion(s)
  06 Faculty of Arts > Linguistic Research Infrastructure (LiRI)
  06 Faculty of Arts > Zurich Center for Linguistics
Dewey Decimal Classification: 070 News media, journalism & publishing
Scopus Subject Areas: Physical Sciences > Information Systems
  Physical Sciences > Media Technology
  Physical Sciences > Computer Science Applications
  Social Sciences & Humanities > Management Science and Operations Research
  Social Sciences & Humanities > Library and Information Sciences
Uncontrolled Keywords: Stance detection, Social media, Rule-based model, Linguistic features, Custom dictionaries
Language: English
Date: March 2023
Deposited On: 10 Feb 2023 10:38
Last Modified: 29 May 2024 01:48
Publisher: Elsevier
ISSN: 0306-4573
OA Status: Hybrid
Free access at: Publisher DOI. An embargo period may apply.
Publisher DOI: https://doi.org/10.1016/j.ipm.2022.103199
  • Content: Published Version
  • Language: English
  • Licence: Creative Commons: Attribution 4.0 International (CC BY 4.0)