BigSurv20 program



Embrace the trace? Exploring digital trace data 1

Moderator: Trent Buskirk

Friday 13th November, 10:00 - 11:30 (ET, GMT-5)
7:00 - 8:30 (PT, GMT-8)
16:00 - 17:30 (CET, GMT+1)

Better understanding when and how social media posts can augment public opinion surveys

Professor Michael F. Schober (New School for Social Research) - Presenting Author
Professor Frederick G. Conrad (University of Michigan)
Ms Jessica Holzberg (US Census Bureau)
Dr Robyn A. Ferg (University of Michigan)
Mr Jonathan Katz (US Census Bureau)
Ms Jennifer H. Childs (US Census Bureau)
Dr Paul C. Beatty (US Census Bureau)
Dr Johann A Gagnon-Bartsch (University of Michigan)

Despite some early promising findings that analyses of social media posts—e.g. the sentiment of tweets containing particular key words—can align with survey results, recent systematic comparisons have failed to replicate some of those findings over broader time spans and with different analytic tools. At the same time, other new analyses demonstrate the usefulness of augmenting survey data with online data streams; for example, adding views of political candidates’ web pages to models predicting election outcomes can improve model fit. The question raised by the mixed pattern of findings is under what circumstances and with exactly which analytic approaches analyses of social media posts can reliably augment survey data collection. Systematic examination across a range of topic areas and data sources, and the development of new theory about posting and the dynamics of information flow in social media, are needed to understand which approaches will be trustworthy over time.

To this end, we report here focused explorations in the domain of public attitudes towards the 2020 US decennial census. We see this domain as holding particular promise of alignment between social media posts and survey findings because the topics the survey asks about may be more closely connected with what people post about than in domains previously studied. Since awareness of and controversy about the census are high, there is a correspondingly large number of social media posts about the census, discussing, for example, public trust in government, concerns about privacy and confidentiality, willingness to participate in the census, and other perceptions of the census that have been identified as important areas for 2030 Census research. Because the Census Bureau tracks public attitudes towards the census using surveys such as the Census Barriers, Attitudes, and Motivators Study (CBAMS), the Gallup Daily Poll, and the 2020 Census Tracking Survey, there is a rich set of survey results to which analyses of social media posts can be compared.

We will report preliminary analyses examining the alignment of Twitter posts from January-September 2020 with key daily 2020 Census Tracking Survey estimates from the same period. We explore two propositions about when alignment is more and less likely: (1) for survey questions whose responses over time have a good signal-to-noise ratio (responses actually change over the time frame and sampling variability is low), and (2) using social media metrics well suited to question content. Given the range of topics that the tracking survey provides estimates on, our goal is to help researchers understand which kinds of topics (e.g., knowledge about the census vs. awareness of controversies vs. feelings towards the census) are more likely to yield alignment than others. In the longer term, these comparisons will allow post-2020-census analyses of how social media posts (in total and by subgroups)

Fully Automated and localized opinion polling: Real-time measurement of support for Brexit via Twitter

Mr Roberto Cerina (Nuffield College, University of Oxford) - Presenting Author
Mr Kayvane Shakerifar (Data Scientist at KPMG)
Mr Raymond Duch (Nuffield College, Oxford University)


In light of the increasing availability of real-time, high-frequency, politically relevant opinions via social media, we propose a fully automated method to poll opinion using the Twitter Streaming API. We apply this method to a large sample of Tweets captured during the lead-up to the Brexit Referendum in the spring of 2016. We show that a) Tweets and Twitter accounts contain plenty of usable voter characteristics, including signals about race, age, gender and social status, the fundamental variables at play in modern politics; and b) post-stratified and modelled sentiment coming from Twitter is a strong signal of vote choice. Our method innovates by extending the literature on Regularised Regression and Post-Stratification (RRP) with the use of variational inference via Stan; it further exploits advances in computer vision and topic modelling to produce an automatically derived feature set for each Twitter user expressing an opinion relative to the referendum. The power of the model to generate accurate features is tested against the popular M3 (Multimodal, Multilingual, and Multi-attribute) system. Our method is general enough that it would work on any individual-level sample with relevant text or visual information.
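
As a rough illustration of the post-stratification step only, the sketch below weights cell-level estimates by each cell's share of the target population. It does not reproduce the authors' RRP model or the Stan-based inference; the function name `poststratify`, the age cells, and all numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def poststratify(cell_estimates, population_counts):
    """Combine cell-level estimates, weighting each cell by its population count."""
    total = sum(population_counts[cell] for cell in cell_estimates)
    weighted = sum(
        cell_estimates[cell] * population_counts[cell] for cell in cell_estimates
    )
    return weighted / total


# Hypothetical modelled support per age cell (illustrative, not from the study)
leave_share_by_age = {"18-34": 0.30, "35-64": 0.50, "65+": 0.60}
# Hypothetical population counts for the same cells
population_by_age = {"18-34": 300, "35-64": 500, "65+": 200}

national_estimate = poststratify(leave_share_by_age, population_by_age)
print(round(national_estimate, 3))
```

With these made-up inputs the estimate is (0.30 × 300 + 0.50 × 500 + 0.60 × 200) / 1000 = 0.46, regardless of how over- or under-represented each age cell is among the Twitter users themselves.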

Gender portrayal on Instagram

Dr Simon Kühne (Universität Bielefeld) - Presenting Author
Mr Dorian Tsolak (Universität Bielefeld)

In recent years, social media has been identified as an important source of digital trace data, reflecting real-world behavior in an online environment. Many researchers have analyzed social media data, often text messages, to make inferences about people’s attitudes, opinions, and traits. Yet many of those are not saliently expressed but remain implicit. One example is gender role attitudes, which are hard to measure using textual data. Images posted on social media such as Instagram may be better suited to analyzing the phenomenon. Existing sociological research has shown that men and women differ in how they portray themselves when being photographed (Goffman 1979, Kang 1997, Götz & Becker 2019, Tortajada et al. 2013), often reflecting what is stereotypically considered masculine or feminine. Our study is concerned with the question of how images from social media containing self-portraits can be harnessed as a measure of gender roles, display, and stereotypes.

We rely on about 800,000 images collected from Instagram in 2018. Contrary to qualitative and manual techniques applied in existing research, we present a new approach to quantify gender portrayal using automated image processing. We use a body pose detection algorithm to identify the 2-dimensional skeletons of persons within images. We then cluster these skeletons based on the similarity of their body pose.
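
The abstract does not say which pose detector or clustering algorithm was used, so the following is only a minimal sketch of the general idea under stated assumptions: each skeleton is a list of 2-D joint coordinates, poses are made comparable by centering and rescaling, and a plain k-means (a stand-in chosen here, not the authors' method) groups similar poses. All function names and the toy three-joint skeletons are hypothetical.

```python
import math


def normalize_pose(keypoints):
    """Center a skeleton on its mean joint and scale it to unit size."""
    n = len(keypoints)
    cx = sum(x for x, _ in keypoints) / n
    cy = sum(y for _, y in keypoints) / n
    centered = [(x - cx, y - cy) for x, y in keypoints]
    scale = math.sqrt(sum(x * x + y * y for x, y in centered)) or 1.0
    # Flatten to one feature vector: [x0, y0, x1, y1, ...]
    return [value / scale for point in centered for value in point]


def squared_distance(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def init_centroids(vectors, k):
    """Deterministic farthest-point initialization."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(
            vectors, key=lambda v: min(squared_distance(v, c) for c in centroids)
        )
        centroids.append(list(farthest))
    return centroids


def kmeans(vectors, k, iterations=20):
    """Assign each vector to one of k clusters; returns a list of cluster labels."""
    centroids = init_centroids(vectors, k)
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [
            min(range(k), key=lambda j: squared_distance(v, centroids[j]))
            for v in vectors
        ]
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy skeletons with three joints each (hypothetical data)
poses = [
    [(0, 0), (0, 1), (0, 2)],      # upright
    [(0, 0), (1, 1), (2, 2)],      # leaning
    [(5, 5), (5, 7), (5, 9)],      # upright, shifted and larger
    [(10, 0), (13, 3), (16, 6)],   # leaning, shifted and larger
]
labels = kmeans([normalize_pose(p) for p in poses], k=2)
print(labels)
```

Because of the normalization step, two people striking the same pose at different positions or distances from the camera end up with nearly identical feature vectors, which is what lets a distance-based clustering group them.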

As a result, we obtain a number of clusters that reflect gender-typical poses and relate to the gender display types initially categorized by Goffman (1979) and Kang (1997). Examples of typical female body poses include S-shaped poses reflecting sexual appeal, the feminine touch (touching one's own body or hair) implying insecurity, or asymmetric body posture representing fragility. Typical male body poses include the upper body facing the camera square to show strength, or a view aimed into the distance signifying pensiveness.

Our study provides an automated approach that allows for a quantitative measurement of gender stereotypes in self-portraits by examining body poses. Moreover, our results contribute to a better understanding of online & social media gender stereotype reproduction mechanisms.

Identifying depression related behaviour in Facebook – an experimental study

Dr Zoltán Kmetty (Eötvös Loránd University, Faculty of Social Sciences) - Presenting Author
Dr Károly Bozsonyi (Károli Gáspár University of the Reformed Church)

In recent years, an increasing tendency towards suicide has been observed in some Western countries. These trends drew the attention of social and medical scientists, and several studies were published that tried to understand this phenomenon. One research stream that picked up on the increasing trends started to analyze the online footprints of suicide and depression.
Our study uses a novel joint data source combining Facebook and survey data. After informed consent was obtained, respondents were asked to log in to Facebook on the interviewers’ notebook and to download their Facebook profile archive. 150 respondents took part in our study. The data cover a wide range of Facebook activities: posts, comments, likes and reactions, pages, friends, profile, and ads data. Besides sharing their Facebook data, participants had to fill out an online questionnaire covering politics, media usage, self-representation, spare-time activities and music preferences. In addition, we asked the participants to fill out a modified version of the Patient Health Questionnaire (PHQ-9).

In this study we use this slightly modified PHQ-9 questionnaire module. Two indicators of depression are extracted by ML Factor Analysis based on the PHQ-9 questions: a cognitive depression scale (CDS) and a psychosomatic depression scale (SDS). From the Facebook data, we used the temporal dynamics of users' Facebook activity and the ads interest categories. Facebook categorizes every user for advertising purposes. This is an algorithmic machine learning classification of the users based on their own likes, activities, and used keywords, and also on their friends' preferences.

For the temporal analysis we selected Facebook usage in 2018 and 2019, and calculated for each year the ratio of days with Facebook usage. Then we calculated two indicators: the first was the difference between the active day ratio in 2019 and in 2018. The second indicator was the absolute value of the first. The calculations were done separately for posts, reactions on friends and reactions on pages.
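
The two indicators can be sketched as follows. This is a minimal illustration, assuming each activity type is available as a list of calendar dates; the function names and the sample dates are hypothetical and not taken from the study.

```python
from datetime import date


def active_day_ratio(activity_dates, year):
    """Share of days in `year` with at least one recorded activity."""
    active_days = {d for d in activity_dates if d.year == year}
    days_in_year = (date(year + 1, 1, 1) - date(year, 1, 1)).days
    return len(active_days) / days_in_year


def change_indicators(activity_dates):
    """Return (signed change, absolute change) in active day ratio, 2019 vs 2018."""
    ratio_2018 = active_day_ratio(activity_dates, 2018)
    ratio_2019 = active_day_ratio(activity_dates, 2019)
    change = ratio_2019 - ratio_2018
    return change, abs(change)


# Hypothetical posting dates: 10 active days in 2018 (one day has two posts),
# 5 active days in 2019
posts = [date(2018, 1, day) for day in range(1, 11)] + [date(2018, 1, 1)]
posts += [date(2019, 1, day) for day in range(1, 6)]

change, absolute_change = change_indicators(posts)
print(change, absolute_change)
```

Counting distinct dates rather than events means several posts on the same day count once. The same two functions could be applied separately to each activity type, such as posts and reactions.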

The analysis found a moderate but significant correlation (0.33) between cognitive depression level and the absolute change in Facebook posting activity. The detailed analysis showed that the effect was stronger for males than for females. The effect was much smaller in the case of reactions on friends, and we did not find a significant relationship when we studied the reactions on groups data. Psychosomatic depression did not correlate with any of the measures.
The second part of the study focused on the possible correlation between Facebook-generated ads interest categories and depression level. We selected those ads categories into which at least 10 percent of our sample were categorized (N=1561). We calculated the relationship between membership in each category and the two depression scales (based on ANOVA and eta statistics). Then we searched for patterns within the results. One interesting pattern emerged: those who scored low on the psychosomatic depression scale had a higher probability of being categorized by Facebook into finance-related categories, like banking or finance.
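
For readers unfamiliar with the eta statistic, the sketch below computes eta squared, the share of variance in a score that is explained by group membership (between-group sum of squares divided by total sum of squares). The abstract does not specify whether eta or eta squared was reported, so this is an assumption; the function name and the toy scores are hypothetical.

```python
def eta_squared(scores, groups):
    """Between-group sum of squares divided by total sum of squares."""
    grand_mean = sum(scores) / len(scores)
    ss_total = sum((s - grand_mean) ** 2 for s in scores)

    by_group = {}
    for score, group in zip(scores, groups):
        by_group.setdefault(group, []).append(score)

    ss_between = sum(
        len(values) * (sum(values) / len(values) - grand_mean) ** 2
        for values in by_group.values()
    )
    return ss_between / ss_total if ss_total else 0.0


# Hypothetical depression scale scores and membership in one ads category
scale_scores = [1, 2, 3, 5, 6, 7]
in_category = [True, True, True, False, False, False]

effect = eta_squared(scale_scores, in_category)
print(round(effect, 3))
```

In this toy example the grand mean is 4, the total sum of squares is 28 and the between-group sum of squares is 24, so eta squared is 24/28. Repeating the calculation for each category and each depression scale would yield a table of effect sizes in which to search for patterns.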

Someone to call on: studying social ties in action

Ms Carolina Mattsson (Northeastern University) - Presenting Author
Professor Drew Margolin (Cornell University)
Dr Stefan Wojcik (Twitter Research)
Professor David Lazer (Northeastern University)


Digital traces are reshaping the empirical data available on social ties with implications for social network theory and analysis. Most consequential for our understanding of social processes – but often overlooked – is that mobile phones, social media platforms, and other such sources keep a record of behavior at a level of granularity that lets us place social ties in context (Eagle, Pentland, Lazer, 2009). This means we can study social ties as they exist at particular moments in time, and how they affect behavior in particular situations. To do so, we must re-imagine how we collect, analyze, and interpret social network data.

We propose a methodology based on collecting “hybrid” data about social ties in contexts where those ties come into play. The idea is to measure both behavioral and survey-solicited aspects of social ties at a particular point in time, and relate these to outcomes of interest within that particular situation. Our smartphone app collected mobile communication history and social network survey data about ties relevant to people affected by the Boston Marathon bombings (April 15th, 2013) alongside contextual features and relevant outcomes: crisis communication, providing help, sharing information, and emotional support over distance.

Context-specific hybrid data on social ties are rich, complex, and messy; this complicates both analysis and interpretation. For analysis, we bring in statistical learning techniques developed for behavioral prediction in data science (Hastie, Tibshirani & Friedman, 2013). These methods are built to handle complicated data, and they offer a useful analytical lens: to what extent is behavior predictable? For interpretation, we lean on recent developments in social network theory. There is growing recognition that social ties are neither simple nor static; one ought to consider carefully how to define social ties (Kitts & Quintane, 2019) and recognize that social behavior is highly contextual (Small, 2017). Particular role relations, recent interactions, or even the relative location between people might become especially salient in some contexts.

We find that we can learn something about crisis communication, help, and support after the Boston Marathon bombings from our hybrid data. On the other hand, the same hybrid social network data are entirely uninformative with respect to sharing information about the bombings. All three "facets" of ties about which we collect data (role relations, behavioral interaction, and context) independently affected behavior in the immediate aftermath of the Boston Marathon bombings. However, role relations and contextual factors lose relevance later in the day; speaking on the phone over the past month is the most informative for predicting speaking on the phone following this salient event.

Our methodology builds on the idea that digital trace data can help us understand social behavior in specific contexts. We can study the activation of social ties in different situations if we take an expanded view of how to measure those ties. Our study combines data on contextual social actions with data on relevant social ties from both behavioral and survey sources, and we find that different “facets” of social ties become relevant in different contexts.