Chris Bail
Duke University


This is part of a series of tutorials I’ve written on collecting digital trace data from sites such as Twitter, Facebook, or other internet sources. These earlier tutorials demonstrated some of the potential of digital trace data, but also highlighted many limitations of these new wellsprings of data. For example, digital trace data are often incomplete, inaccessible, non-representative, unstructured (and thus difficult to work with), and sensitive in nature. Because of these limitations, there is growing consensus that hybrid approaches are needed that combine digital trace data with more conventional methods such as surveys.

Note: this tutorial is a work in progress. It will be updated soon to include more annotated code.

Building Apps for Social Science Research

The term “app” has come to refer to an impressive array of software, from mobile tools we use to find our way around to desktop-based tools for editing photos. In this article, I make the case that apps can also be extremely useful for social scientists. More specifically, I argue that apps can provide a vehicle for social scientists to collect digital trace data alongside survey data. There are numerous advantages to such a hybrid strategy. For example, surveys can be used to collect demographic information about those who produce social media texts in order to evaluate issues of coverage and representativeness. Surveys can also be used to collect information about potential confounding factors. What is more, authentication dialogues within apps provide a natural opportunity to obtain informed consent from social media users, though this was not done in a comprehensive manner in past studies such as those that became central to the Cambridge Analytica scandal.

To demonstrate the potential of apps for social science research, it will be useful to provide a more detailed example. Some years ago, I was interested in how social media posts go viral. I was specifically interested in public posts on Facebook fan pages by advocacy organizations and other civil society groups. Using the procedures described in my previous tutorials, I could easily collect the text of all messages, along with counts of the number of times they were shared or commented upon, via Facebook’s Graph API. Yet I was not able to answer important questions about who had viewed such posts and precisely how they interacted with them. I was also not able to measure critical features of the non-profit groups that I was studying, such as their financial resources, number of staff, and use of offline tactics to call attention to their cause.

To solve this problem, I created a web-based app called “Find Your People.” Find Your People allowed non-profit organizations to get high-quality analysis of their social media outreach via comparisons to their peers who had also installed the app. In return, organizations agreed to share with me non-public aggregate data about their audiences, known as Facebook “insights data.” These include metrics such as the number of people age 18-24 who viewed a post on a given day, but not the names of those people or other information that could be used to easily identify them. In addition, Find Your People asked non-profit organizations to complete a brief web-based survey that allowed me to collect additional information about the organization. I recruited non-profit groups working in the fields of Autism Spectrum Disorders and Human Organ Donation, respectively, to install the apps. Response rates to these requests were relatively high, and I identified minimal evidence of response bias. I used this tool to develop a new theory of how social media posts go viral. Readers who are interested in this theory, or those who would like to see more detail about how the apps were employed, can view this paper, this paper, this paper, or this paper.
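The hybrid design described above, in which aggregate audience metrics are linked to survey responses and each organization is compared to its peers, can be sketched in a few lines of Python. This is only an illustration and not the actual Find Your People code: the field names (`org_id`, `views_18_24`, `staff`, `budget_usd`), the example values, and the percentile-style comparison are all assumptions made for the example.

```python
from statistics import mean

# Hypothetical aggregate "insights" rows: one per organization per post.
# Field names are illustrative, not Facebook's actual schema.
insights = [
    {"org_id": "org_a", "post_id": 1, "views_18_24": 120},
    {"org_id": "org_a", "post_id": 2, "views_18_24": 80},
    {"org_id": "org_b", "post_id": 3, "views_18_24": 300},
    {"org_id": "org_c", "post_id": 4, "views_18_24": 40},
]

# Hypothetical survey responses, one per organization.
survey = {
    "org_a": {"staff": 5, "budget_usd": 250_000},
    "org_b": {"staff": 40, "budget_usd": 3_000_000},
    "org_c": {"staff": 2, "budget_usd": 60_000},
}


def mean_views_by_org(rows):
    """Average the per-post view counts for each organization."""
    by_org = {}
    for row in rows:
        by_org.setdefault(row["org_id"], []).append(row["views_18_24"])
    return {org: mean(values) for org, values in by_org.items()}


def peer_percentile(org_id, averages):
    """Percentage of peer organizations with a lower average than org_id."""
    peers = [value for org, value in averages.items() if org != org_id]
    if not peers:
        return None
    below = sum(1 for value in peers if value < averages[org_id])
    return 100.0 * below / len(peers)


def build_report(rows, survey_responses):
    """Link each organization's trace-data summary to its survey answers."""
    averages = mean_views_by_org(rows)
    report = {}
    for org, avg in averages.items():
        record = {
            "mean_views_18_24": avg,
            "peer_percentile": peer_percentile(org, averages),
        }
        record.update(survey_responses.get(org, {}))
        report[org] = record
    return report


report = build_report(insights, survey)
for org, record in sorted(report.items()):
    print(org, record)
```

Keeping the survey answers keyed by the same organization identifier as the trace data is what makes the two sources easy to combine; only aggregate counts appear in the sketch, in keeping with the description of insights data above.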

How to Build Apps

When I created the Find Your People app, app-building required somewhat involved knowledge of programming in multiple languages, web design, and cloud computing. Yet the R package Shiny has become a game changer. Shiny is an interactive app-building tool that you can use directly from RStudio. In addition to an easy-to-use, integrated app-building tool, RStudio also provides a variety of tools to host and deploy apps on the web with the click of a button. Finally, there is a vibrant community of Shiny app developers, many of whom share the code they used to create their apps on sites such as this one.

There are a number of excellent tutorials online about how to use Shiny, including this excellent video series. Many Shiny apps are simple tools for interactive data visualization, yet Shiny enables development of apps for just about anything. Indeed, API calls can be embedded within Shiny apps to produce analyses of a user’s Twitter data; consider, for instance, this nice example. Shiny also allows you to create text boxes, multiple choice buttons, and many of the other standard elements of online surveys. Together, these tools could be used to create the functionality that I developed in the stone age with much less time and energy.

Building Bots for Social Science Research

Another recent trend in studies that employ digital trace data is the creation of bots, or automated social media accounts. In a very creative study, political science PhD student Kevin Munger built an app designed to examine racial harassment on Twitter. He created two automated accounts, one of which had a profile picture of a white person and the other of an African-American person. The bots were then designed to a) search Twitter for tweets by white men that contain racist language; and then b) reply to these tweets with condemnations of the racist language. Notwithstanding some limitations of the research design, this study suggests that people are more likely to stop using racist language if they are chastised by the bot with the white person pictured in its Twitter profile than by the bot with the African-American person pictured in its Twitter profile.
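The two-step logic of such a bot (find tweets that contain flagged language, then reply from one of two accounts) can be sketched as follows. This is a simplified illustration under stated assumptions and not Munger's actual code: the placeholder term list, the bot names, the reply wording, and the random assignment of users to bots are all hypothetical. The search and posting steps are replaced by in-memory data, so no Twitter API calls are made, and the step of identifying who wrote each tweet is omitted.

```python
import random
import re

# Placeholder terms standing in for the lexicon a real study would use.
FLAGGED_TERMS = {"slur1", "slur2"}

# Two hypothetical bot accounts that differ only in their profile picture.
BOTS = ["bot_white_avatar", "bot_black_avatar"]

# Hypothetical reply wording, not the message used in the study.
REPLY_TEXT = "please remember that this kind of language hurts real people."


def contains_flagged_term(text, terms=FLAGGED_TERMS):
    """Return True if any flagged term appears as a whole word in text."""
    words = set(re.findall(r"[a-z0-9']+", text.lower()))
    return bool(words & terms)


def find_candidates(tweets):
    """Step a): keep only tweets that contain flagged language."""
    return [tweet for tweet in tweets if contains_flagged_term(tweet["text"])]


def assign_bots(candidates, rng):
    """Randomly assign each distinct user to exactly one bot."""
    assignments = {}
    for tweet in candidates:
        user = tweet["user"]
        if user not in assignments:
            assignments[user] = rng.choice(BOTS)
    return assignments


def draft_replies(candidates, assignments):
    """Step b): draft one reply per user, to that user's first flagged tweet."""
    replies = []
    replied_to = set()
    for tweet in candidates:
        user = tweet["user"]
        if user in replied_to:
            continue
        replied_to.add(user)
        replies.append({
            "bot": assignments[user],
            "in_reply_to": tweet["id"],
            "text": f"@{user} {REPLY_TEXT}",
        })
    return replies


# In-memory stand-in for the results of a Twitter search.
tweets = [
    {"id": 1, "user": "user_a", "text": "What a nice day"},
    {"id": 2, "user": "user_b", "text": "you are a slur1"},
    {"id": 3, "user": "user_c", "text": "SLUR2! again"},
    {"id": 4, "user": "user_b", "text": "slur1 slur1"},
]

rng = random.Random(42)  # fixed seed so the assignment is reproducible
candidates = find_candidates(tweets)
assignments = assign_bots(candidates, rng)
replies = draft_replies(candidates, assignments)
for reply in replies:
    print(reply)
```

Assigning users to bots at random, rather than tweet by tweet, is what would allow later behavior to be compared across the two profile pictures; in a real deployment the drafted replies would be passed to whatever posting mechanism the platform permits.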