Potential biases in big data: omitted voices on social media


Hargittai, Eszter (2018). Potential biases in big data: omitted voices on social media. Social Science Computer Review:1-15.

Abstract

While big data offer exciting opportunities to address questions about social behavior, studies must not abandon traditionally important considerations of social science research such as data representativeness and sampling biases. Many big data studies rely on traces of people’s behavior on social media platforms such as opinions expressed through Twitter posts. How representative are such data? Whose voices are most likely to show up on such sites? Analyzing survey data about a national sample of American adults’ social network site usage, this article examines what user characteristics are associated with the adoption of such sites. Findings suggest that several sociodemographic factors relate to who adopts such sites. Those of higher socioeconomic status are more likely to be on several platforms suggesting that big data derived from social media tend to oversample the views of more privileged people. Additionally, Internet skills are related to using such sites, again showing that opinions visible on these sites do not represent all types of people equally. The article cautions against relying on content from such sites as the sole basis of data to avoid disproportionately ignoring the perspectives of the less privileged. Whether business interests or policy considerations, it is important that decisions that concern the whole population are not based on the results of analyses that favor the opinions of those who are already better off.

Additional indexing

Item Type: Journal Article, refereed, original work
Communities & Collections: 06 Faculty of Arts > Department of Communication and Media Research
Dewey Decimal Classification: 700 Arts
Uncontrolled Keywords: Big data, data bias, sampling, sampling bias, survey, social media, Facebook, Twitter
Language: English
Date: 30 July 2018
Deposited On: 22 Feb 2019 16:01
Last Modified: 30 Apr 2019 07:26
Publisher: Sage Publications Ltd.
ISSN: 0894-4393
OA Status: Closed
Publisher DOI: https://doi.org/10.1177/0894439318788322