Abstract
Social media in general, and the Twitter micro-blogging service in particular, have recently received a lot of attention from social science scholars and political observers as promising new data sources. The validity of such data, however, remains heavily disputed. One of the most serious problems that potential users of Twitter data face is selection bias. We propose a partial solution to this issue for Switzerland by compiling an original data set of all Twitter accounts affiliated with a Swiss political party. Specifically, we first use chain-referral sampling to expand an initial set of high-profile politicians and party accounts to a larger set of potentially relevant accounts. We then use text mining to filter for accounts with self-declared party affiliations. This method allows us to systematically compare our sample of Twitter users against the broader population. We find significant bias with respect to ideology, gender, location, and language. Quantifying the selection bias in our sample enables future users of the data to mitigate some of its negative consequences. To illustrate this, we provide two proof-of-concept case studies. First, we use network analysis to exploit the relations embedded in Twitter data, which yields insights about central actors and about what drives relationships in the communication networks. Second, we turn to political communication, using topic models to show who talks about what on Twitter and to identify salient issues in public opinion.
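The two-step collection procedure described above (chain-referral expansion followed by a text filter on self-declared affiliation) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names, the `follows` and `bios` inputs, the fixed number of referral waves, and the party patterns are all assumptions made for the example.

```python
import re
from collections import deque

# Illustrative patterns only; the keyword list actually used is an assumption here.
PARTY_PATTERNS = {
    "SP": re.compile(r"\bSP\b"),
    "SVP": re.compile(r"\bSVP\b"),
    "FDP": re.compile(r"\bFDP\b"),
}


def chain_referral(seeds, follows, max_waves=2):
    """Expand seed accounts along referral links for a fixed number of waves.

    `follows` is a hypothetical mapping from an account to the accounts it
    refers to (for example, the accounts it follows).
    """
    seen = set(seeds)
    frontier = deque((seed, 0) for seed in seeds)
    while frontier:
        account, wave = frontier.popleft()
        if wave == max_waves:
            continue  # do not expand beyond the last wave
        for neighbour in follows.get(account, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append((neighbour, wave + 1))
    return seen


def declared_party(bio):
    """Return the first party whose pattern matches the profile text, else None."""
    for party, pattern in PARTY_PATTERNS.items():
        if pattern.search(bio):
            return party
    return None


def affiliated_accounts(seeds, follows, bios, max_waves=2):
    """Keep only sampled accounts whose profile text declares a party."""
    result = {}
    for account in chain_referral(seeds, follows, max_waves):
        party = declared_party(bios.get(account, ""))
        if party is not None:
            result[account] = party
    return result
```

Capping the number of waves keeps the candidate set bounded; the text filter then discards accounts that were reached by referral but do not declare an affiliation in their profile.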