(obligatory network visualization)


Instructor: Chris Bail
Email:
Weekly Online Discussion Time: 4-5pm Tuesday
Office Hours: 5:00-6:00pm Wednesday (please email in advance to schedule)
Github: https://github.com/cbail
Slack Channel: https://join.slack.com/t/duke-dwt8988/shared_invite/zt-kifv4g5z-esbBuZaxT2l8gR973yxUQw

Course Description

The past decade has witnessed an explosion of data produced by websites such as Twitter, Facebook, Google, and Wikipedia, the mass digitization of administrative and historical records, and the rapid expansion of mobile technology into nearly every corner of our lives. A new wave of techniques for collecting, classifying, and analyzing these data holds enormous potential to address many of the most urgent questions in social science: How do diseases spread? What causes financial meltdowns? How did America become so politically polarized? This course surveys the nascent interdisciplinary field of computational social science, which combines insights from computer and information science, sociology and social network analysis, economics, political science, and public health in order to answer such questions.

Course Prerequisites

This course requires a basic working knowledge of the R programming language. If you do not yet have such training, I invite you to check out my computational social science “boot camp” videos.

Course Goals

Students will learn to ask social science questions, and learn how to answer them by collecting data from digital sources such as social media sites. Students will also acquire advanced skills in automated text analysis, application programming interfaces, and the R programming language.

Course Format

This course will be held entirely online due to the COVID-19 pandemic. In order to provide maximum flexibility to students as we all navigate this unprecedented challenge, the course combines pre-recorded, asynchronous lectures with weekly small group discussions. Nearly all of these lectures focus on a skill in computational social science (e.g. collecting data from Twitter), and the group discussions are designed to facilitate conversations about the required readings each week, which showcase how these techniques can be applied to answer social science questions. Because of the length of the pre-recorded videos, and to prevent Zoom fatigue, we will limit our group conversations to one hour each week. If you need one-on-one help with questions about coding or techniques, you can either a) ask each other (or me) on our course’s Slack workspace (learn more about Slack below), or b) visit my virtual office hours (by appointment).

Course Assignments

The only assignment for this class is a final project that examines some dimension of computational social science and includes at least 6,500 words of written material. My goal is to make this class as useful to you as possible while you navigate whatever stage of the graduate school process you are in. For example, I welcome students using this class as an opportunity to add a substantial new computational component to a paper already in progress, draft a research proposal for a dissertation (or grant), or start an entirely new project. I offer this flexibility in order to help you focus on producing high-quality publications as soon as possible in graduate school given the considerable competition for academic jobs at the moment. If you do not plan to pursue an academic job, I am also open to other ways of adapting your project for this class into a “portfolio” project for future employers. Final Projects are Due by 5pm EST on April 30th.

Grading

100% of your grade will be assessed based upon your final project. Your final project will be graded according to the overall quality of the research and writing presented therein. The best final projects will a) ask a research question relevant to computational social science; b) explain why this topic is important (to social science and/or the world); c) develop at least one hypothesis to answer this question (unless you convincingly argue for the need for purely descriptive research); d) collect data that allows you to test this hypothesis; and e) describe whether or not your hypothesis was confirmed, and what implications this should have for people who want to do future research on your topic. Whether you find support for your hypothesis will not affect your grade. Instead, you will be evaluated based upon a) the quality of the research question you ask and the hypotheses you develop; and b) the quality of the data collection and analysis. If your analysis does not support your hypothesis, and your hypothesis is a well-founded one, then I consider this to be an important finding. There will be no extra credit assignments for this class. Students who submit final projects after the due date will receive an incomplete or failing grade.

General Course Policies

The Duke Compact recognizes our shared responsibility for our collective health and well-being. Please be reminded that by signing your name to this pledge, you have acknowledged that you understand the conditions for being on campus (if you are on campus this semester). These include complying with university, state, and local requirements and acting to protect yourself and those around you. For complete language and updated policies, please visit this link

Academic Integrity/the DCS

All students, whether residing on campus or learning remotely, must adhere to the Duke Community Standard (DCS): Duke University is a community dedicated to scholarship, leadership, and service and to the principles of honesty, fairness, and accountability. Citizens of this community commit to reflect upon these principles in all academic and non-academic endeavors, and to protect and promote a culture of integrity. Plagiarism, cheating or other violations will be dealt with according to University policy. All student assignments will be processed by plagiarism detection software.

Mental Health and Wellness

We are living through unprecedented times that are creating tremendous challenges for everyone. If your mental health concerns and/or stressful events negatively affect your daily emotional state, academic performance, or ability to participate in your daily activities, many resources are available to you, including the ones listed below. Duke encourages all students to access these resources, particularly as we navigate the transition and emotions associated with this time. Duke Student Government has worked with DukeReach and student advocates to create the Fall 2020 “Two-Click Support” Form, and DukeReach has expanded its drop-in hours as well.

DukeReach. Provides comprehensive outreach services to identify and support students in managing all aspects of wellbeing.

Counseling and Psychological Services (CAPS). CAPS services include individual, group, and couples counseling services, health coaching, psychiatric services, and workshops and discussions. They can be reached at (919) 660-1000.

Blue Devils Care. A convenient and cost-effective way for Duke students to receive 24/7 mental health support through TalkNow.

Managing daily stress and self-care are also important to well-being. Duke offers several resources for students both to seek assistance on coursework and to improve overall wellness, some of which are listed below and described in more detail at this link.

• The Academic Resource Center: (919) 684-5917 or arc.duke.edu
• DuWell: (919) 681-8421 or https://studentaffairs.duke.edu/duwell
• WellTrack: https://app.welltrack.com/

Accessibility

In addition to accessibility issues experienced during the typical academic year, I recognize that remote learning may present additional challenges. Students may be experiencing unreliable wi-fi, lack of access to quiet study spaces, varied time-zones, or additional responsibilities while studying at home. If you are experiencing these or other difficulties, please contact me to discuss possible accommodations.

Technology Accommodations. Students with demonstrated high financial need who may have limited access to computers and stable internet may request assistance in the form of loaner laptops and Wi-Fi hotspots. For new Spring 2021 technology assistance requests, please go here. Please note that supplies are limited. For updates, please visit this link.

Academic Accommodations. The Student Disability Access Office (SDAO) will continue to be available to ensure that students are able to engage with their courses and related assignments. Students should be in touch with the Student Disability Access Office to request or update accommodations under these circumstances. Zoom has the ability to provide live closed captioning. If you are not seeing this feature but would like to use it, please reach out to Duke OIT for assistance.

Accommodations for Remote Students. If you are unable to attend one of our group meetings, please contact me and we can discuss how to accommodate your needs during this very challenging time.

Syllabus

I reserve the right to make changes to the syllabus, including project due dates and test dates. These changes will be announced as early as possible and no later than one week before materials are due.

Help Me Make This Course Better

Creating high quality teaching materials is hard work! If you ever discover any errors or inconsistencies in the teaching materials on this site, please email me.

Resources

Below I have listed several resources which I hope might be helpful to you for this course and beyond (particularly if you want to pursue the study of text as data after this class).

RStudio Tutorials

In this class, we will use the R software, which is free and open-source. There are a variety of different ways to use R, but the most common way to do so is with the software RStudio, a free Graphical User Interface which you can either run on your laptop, or via a web server. R and RStudio are both supported by a vibrant community of individuals who have created a treasure-trove of learning resources online. Here is a link to some very helpful beginner tutorials, and this link also includes some intermediate and advanced tutorials if you really want to challenge yourself.

Stack Overflow

The field of computational social science is growing so rapidly that none of the resources I give you will remain at the cutting edge for long. You will almost certainly encounter issues unique to the data we collect as part of our group research project and/or incompatibilities between software packages and/or your computer. Stack Overflow is a website where computer programmers help each other solve such problems. Individuals ask questions, and others earn “reputation points” for solving their problems; these reputation points are awarded by the person who asks the question as well as other site users who vote upon the elegance/efficiency of each solution. For you, this reputation system means you can quickly identify the highest-quality solutions to your problems.

Twitter/Blogs

Many of the most important advances in computational social science appear first on Twitter or blogs. I therefore encourage you to open a Twitter account, if you don’t already have one, and follow the authors we read, or consider checking out the people I follow. Having a Twitter account will also come in handy for some of the exercises we do in class to collect data from Twitter. Of the many blogs that you might read, I recommend R Bloggers, which provides a concise overview of new functions in R and solutions to common problems faced by computational social scientists and those in other fields.

Course Schedule

Introduction

January 24-30


Required reading:
Salganik, Matthew, Bit by Bit, Introduction & Observing Behavior
Lazer et al. Computational social science: Obstacles and opportunities, Science.

Suggested reading:
Lazer et al. Computational Social Science, Science.
Lazer et al. Life in the network: the coming age of computational social science, Science.
Watts, Duncan. Should social science be more solution-oriented?, Nature
Blumenstock et al. Predicting Poverty and Wealth from Mobile Phone Data, Science.
David Donoho. 50 Years of Data Science

Ethics

January 31-February 6

Required reading:
Salganik, Matthew, Bit by Bit, Ethics
Adam Kramer, Jamie Guillory, & Jeffrey Hancock. Emotional Contagion, PNAS.

Suggested reading:
Robinson Meyer. Everything We Know About Facebook’s Secret Mood Manipulation Experiment, the Atlantic
Alex Hanna and Meredith Whittaker. Timnit Gebru’s Exit From Google Exposes a Crisis in AI, Wired
Sendhil Mullainathan. Biased Algorithms Are Easier to Fix Than Biased People. New York Times

Application Programming Interfaces

February 7- February 13

Annotated code that describes procedures in video in more detail.

Required Reading

Munger, Kevin, and Joseph Phillips. 2020. Right-Wing YouTube: A Supply and Demand Perspective. The International Journal of Press/Politics, 34.

Freelon, Deen. 2018. Computational Research in the Post-API Age. Political Communication 35 (4): 665–68.

Suggested Reading

Askin, Noah, and Michael Mauskapf. 2017. What Makes Popular Culture Popular? Product Features and Optimal Differentiation in Music. American Sociological Review 82 (5): 910–44.

Pablo Barbera & Zachary C. Steinert-Threlkeld. How to Use Social Media Data for Political Science Research

Screen Scraping

February 14- February 20

Annotated code that describes procedures in video in more detail.

Required Reading

King, Gary, Jennifer Pan, and Margaret E. Roberts. 2013. How Censorship in China Allows Government Criticism but Silences Collective Expression. American Political Science Review 107 (02): 326–43.

Chris Bail et al. Using Internet Search Data to examine the relationship between anti-Muslim and pro-ISIS sentiment in U.S. counties. Science Advances

An Introduction to Text Analysis

February 21- March 7


Annotated code part 1, Annotated code part 2, and Annotated code part 3 describe procedures in video in more detail.

Required reading:

Justin Grimmer & Brandon Stewart. Text as Data: The Promises and Pitfalls of Automated Content Analysis, Political Analysis.
James Evans & Pedro Aceves. Machine Translation: Mining Text for Social Theory. Annual Review of Sociology.

Suggested reading:

DiMaggio, Paul. 2015. “Adapting Computational Text Analysis to Social Science (and Vice Versa).” Big Data & Society 2 (2)
Bo Pang, Lillian Lee, & Shivakumar Vaithyanathan. Thumbs up: Sentiment Classification using Machine Learning Techniques.
Kathleen Carley. Extracting Culture Through Textual Analysis. Poetics, 22:291-312.

Word2Vec

February 28- March 7

No video lecture this week; see annotated code instead.

Required reading:

Kozlowski et al. 2019. The Geometry of Culture: Analyzing the Meanings of Class through Word Embeddings American Sociological Review

Suggested reading:

Garg et al. 2018. Word embeddings quantify 100 years of gender and ethnic stereotypes

Spring Break

March 7- March 13

Topic Models

March 14- March 21

No group meeting this week because of “spring break”

Annotated code that describes procedures in video in more detail.

Required reading:
Blei, David M. 2012. Probabilistic Topic Models. Communications of the ACM
Roberts, Margaret, Brandon M. Stewart, Dustin Tingley, Christopher Lucas, Jetson Leder-Luis, Shana Kushner Gadarian, Bethany Albertson, and David Rand. 2014. Structural Topic Models for Open-Ended Survey Responses. American Journal of Political Science 58 (4): 1064–82.

Suggested reading:
Kozlowski, Austin, Matt Taddy, and James Evans. 2019. The Geometry of Culture: Analyzing the Meanings of Class through Word Embeddings. American Sociological Review.
Davidson, Thomas, Dana Warmsley, Michael Macy, and Ingmar Weber. 2017. “Automated Hate Speech Detection and the Problem of Offensive Language.” In Proceedings of the 11th International Conference on Web and Social Media (ICWSM), 512–515.

Text Networks

March 21- March 28

Annotated code that describes procedures in video in more detail.

Required reading:
Rule, Alix et al. 2015. Lexical shifts, substantive changes, and continuity in State of the Union discourse, 1790–2014.
Bail, Christopher A. 2016. Combining natural language processing and network analysis to examine how advocacy organizations stimulate conversation on social media. PNAS.

Suggested reading:
Karell, Daniel, and Michael Freedman. 2019. Rhetorics of Radicalism. American Sociological Review 84 (4): 726–53.
Smith, Steven et al. Automatic detection of influential actors in disinformation networks PNAS.
Stoltz, Dustin S, and Marshall A Taylor. 2019. “Textual Spanning: Finding Discursive Holes in Text Networks.” Socius.

Surveys in the Digital Age

March 28- April 3

Required Reading

Wang, Wei et al. Forecasting Elections with non-representative polls, International Journal of Forecasting

Suggested Reading

Chris Bail et al. Assessing the Russian Internet Research Agency’s Impact on the Political Attitudes and Behaviors of U.S. Twitter Users. PNAS

Online Experiments

April 4- April 10

Required Reading

Salganik, Matthew, Peter Sheridan Dodds, and Duncan Watts. 2006. Experimental Study of Inequality and Unpredictability in an Artificial Cultural Market. Science 311: 854–56.

Bail, Christopher et al. 2018. Exposure to Opposing Views on Social Media can Increase Political Polarization

Suggested Reading

Alexandra Siegel & Vivienne Badaan. #No2Sectarianism: Experimental Approaches to Reducing Sectarian Hate Speech Online.

Wellness Week

April 11- April 17

Our weekly discussion is cancelled this week. A list of wellness strategies and programs is available at this link and will be updated throughout the spring. Although the goal of Wellness Week 2021 is to provide time and space to engage in activities that enhance your well-being, please remember that wellness isn’t achieved in one day. Balancing your personal, professional, and academic commitments is a skill that should be practiced regularly and over time.

Building Apps

April 18- April 24

Required Reading

Bail, Christopher A. 2015. Taming Big Data: Using App Technology to Study Organizational Behavior on Social Media. Sociological Methods and Research 46:2 189-21.

Presentations

April 25- May 1

During this last week I’ll ask each of you to give a 15-minute presentation about your project during our regularly scheduled group meeting.

Final Papers are Due by 5pm EST on April 30th.