BigSurv20 program




Responsive design using external information and modern prediction

Moderator: Michael Elliott (mrelliot@umich.edu)

Friday 13th November, 10:00 - 11:30 (ET, GMT-5)
7:00 - 8:30 (PT, GMT-8)
16:00 - 17:30 (CET, GMT+1)

Improving Bayesian prediction of daily response propensity in responsive design with data-driven priors

Professor Brady West (University of Michigan) - Presenting Author
Professor James Wagner (University of Michigan)
Ms Stephanie Coffey (University of Maryland)
Professor Michael Elliott (University of Michigan)

Download presentation

Responsive Survey Design (RSD) aims to increase the efficiency of survey data collection via live monitoring of paradata and the introduction of protocol changes when survey errors and increased costs seem imminent. Daily response propensities for all active sampled cases are among the most important parameters for live monitoring of data collection outcomes, making sound predictions of these propensities essential for the success of RSD. Because it relies on real-time updates of prior beliefs about key design parameters like response propensity, RSD stands to benefit from Bayesian approaches. However, empirical evidence of the merits of these approaches is lacking in the literature, and elicitation of informative prior distributions is required for their effectiveness. We evaluate the ability of two data-driven prior elicitation approaches to improve predictions of daily response propensity in a real data collection employing RSD: analyzing historical data from the most recent field period of an ongoing data collection, and combining historical data from multiple recent field periods to form precision-weighted prior distributions. In both approaches, we consider a high-dimensional model of daily response propensity utilizing a variety of paradata, commercial data, and other auxiliary information, and we discuss issues related to the formation of multivariate normal prior distributions for regression coefficients in this setting.
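
As a rough illustration of the precision-weighting idea (the abstract does not give the authors' exact construction), estimates of each coefficient from K recent field periods could be combined by inverse-variance weighting, with the resulting univariate normal priors stacked into the multivariate normal prior mentioned above:

```latex
% Assumed construction: inverse-variance (precision) weighting of K
% historical estimates \hat{\beta}_j^{(k)} with variances v_j^{(k)}.
\[
\mu_j = \frac{\sum_{k=1}^{K} \hat{\beta}_j^{(k)} / v_j^{(k)}}
             {\sum_{k=1}^{K} 1 / v_j^{(k)}},
\qquad
\sigma_j^2 = \left( \sum_{k=1}^{K} 1 / v_j^{(k)} \right)^{-1},
\qquad
\beta_j \sim N\!\left(\mu_j, \sigma_j^2\right).
\]
```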



Alternative modeling approaches to predicting costs in a responsive design framework

Professor James Wagner (University of Michigan) - Presenting Author
Professor Michael Elliott (University of Michigan)
Professor Brady West (University of Michigan)
Ms Stephanie Coffey (University of Maryland)

Download presentation

Responsive survey designs (RSD) rely upon predictions of costs and errors in order to make decisions about design changes. In the RSD setting, accurate predictions lead to more efficient designs. A great deal of effort has gone into understanding the error side of the equation, with a particular emphasis on the use of balance indicators such as the R-Indicator. On the other hand, there has been very little emphasis on measuring or predicting costs, and inaccurate predictions of costs can yield less efficient designs. We consider two modeling approaches to the prediction of costs in field studies: Bayesian Additive Regression Trees (BART) and multi-level models. We illustrate the use of predictions from these models as inputs to decisions about two different design features: stopping effort, or changing several features of the design, including incentive and mode.
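
The abstract does not specify the cost measures or predictors used, so the sketch below shows only the multilevel-model half of the comparison with invented names (effort_minutes, prior_attempts, interviewer_id and field_cases.csv are all hypothetical); the BART predictions would be generated analogously from the same inputs using a BART implementation.

```python
# Hypothetical sketch: a mixed-effects (multilevel) model of per-case field
# effort with a random intercept for interviewer. All column names and the
# input file are invented for illustration.
import pandas as pd
import statsmodels.formula.api as smf

cases = pd.read_csv("field_cases.csv")  # hypothetical paradata extract

# Fixed effects for case-level auxiliary variables; random intercept by interviewer.
model = smf.mixedlm(
    "effort_minutes ~ prior_attempts + urbanicity + incentive_amount",
    data=cases,
    groups=cases["interviewer_id"],
)
result = model.fit()

# Population-level (fixed-effects) cost predictions for active cases; these
# would feed decisions such as stopping effort on high-expected-cost cases.
cases["predicted_cost"] = result.predict(cases)
```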



Leveraging expert opinion and external data to improve prediction of daily response propensity in a Bayesian framework for responsive survey design

Ms Stephanie Coffey (University of Maryland) - Presenting Author
Professor Brady West (University of Michigan)
Professor James Wagner (University of Michigan)
Professor Michael Elliott (University of Michigan)

Responsive and adaptive survey designs rely on predictions of expected outcomes, including response propensity, in order to make data collection interventions or protocol changes. Unfortunately, predictions based only on partial data from the current round of data collection can be biased, leading to ineffective tailoring. Bayesian approaches can help avoid this bias by using prior beliefs generated from external data to supplement current round paradata. The elicitation of the prior beliefs, then, is an important characteristic of these approaches. While historical data for the same or a similar survey may be the most natural source for generating priors, other sources may be required when historical data are not available.
Here, we evaluate two potential sources for generating priors for the coefficients in a response propensity model: a literature review, and expert elicitation from experienced survey managers. Priors for the literature-review method were generated by aggregating coefficients and standard errors, reported in the survey methodological and statistical literature, for predictors similar to those in our response propensity model. Separately, we fielded a questionnaire to survey managers from two organizations, asking about expected attempt-level response rates for different subgroups of cases, and developed prior distributions for the attempt-level response propensity model coefficients based on the mean and standard error of their responses. We evaluated both methods using respondent data from a real survey (enabling calculation of expected response propensity when all call attempt data are available), comparing the predictions of response propensity generated under each type of prior to those from a standard method that uses only accumulating paradata, as well as a method that incorporates historical survey data into the prior.
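
The abstract does not detail how the managers' expected response rates are turned into coefficient priors; the snippet below is one plausible construction with invented numbers: elicited attempt-level rates are moved to the logit scale and their standard errors propagated with the delta method, giving normal priors for an intercept and a subgroup coefficient.

```python
# One possible (assumed) way to convert expert-elicited attempt-level response
# rates into normal priors on logit-scale coefficients. Numbers are invented.
import numpy as np

def logit(p):
    return np.log(p / (1 - p))

# Mean and standard error across the surveyed managers (hypothetical values).
p_ref, se_ref = 0.08, 0.02   # reference group: expected per-attempt response rate
p_sub, se_sub = 0.12, 0.03   # subgroup of interest

# Delta method: Var[logit(p)] ~= Var[p] / (p * (1 - p))**2
var_ref = se_ref**2 / (p_ref * (1 - p_ref)) ** 2
var_sub = se_sub**2 / (p_sub * (1 - p_sub)) ** 2

# Intercept prior: logit of the reference-group rate.
intercept_mean, intercept_sd = logit(p_ref), np.sqrt(var_ref)

# Subgroup-coefficient prior: difference of logits, variances added
# (treating the two elicited quantities as independent).
coef_mean, coef_sd = logit(p_sub) - logit(p_ref), np.sqrt(var_ref + var_sub)

print(f"intercept ~ Normal({intercept_mean:.3f}, {intercept_sd:.3f}^2)")
print(f"subgroup coefficient ~ Normal({coef_mean:.3f}, {coef_sd:.3f}^2)")
```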



Towards a fully integrated responsive survey design methodology

Ms Stephanie Coffey (University of Maryland)
Professor Michael Elliott (University of Michigan) - Presenting Author
Professor James Wagner (University of Michigan)
Professor Brady West (University of Michigan)

Much current work on responsive survey design focuses on developing and improving methods to forecast ongoing costs and response propensities in real time, using paradata or preliminary response data, possibly in combination with data from previous waves or other outside sources. Here we attempt to develop a practical methodology for full integration of responsive survey design methodologies, by 1) using Bayesian methods to combine prior information with incoming data in these computations, and 2) providing survey managers with a cost-efficient decision rule for applying different design protocols to a subsample of cases in a manner that will reduce non-response bias in key survey estimates. Rather than specify that a phase will end at a fixed point in time, we will update our prior assumptions about when phase capacity has been reached. We apply this proposed methodology to the 2019 National Survey of College Graduates, a survey of recent US college graduates that implements a web/mail/CATI data collection mode pattern. We will compare cost and estimated mean error under our proposed methodology, applied to a random subsample, with the corresponding results for the remaining sample, whose data collection is carried out under standard “ad hoc” methods.
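
The decision rule is not described in detail above, so the following is a deliberately simplified, single-parameter illustration of the prior-plus-incoming-data updating and thresholding such a rule could involve; the actual methodology uses regression-based propensity and cost models, and the prior values, paradata counts and cutoff here are invented.

```python
# Toy illustration: conjugate Beta-Binomial updating of a subgroup's
# per-attempt response propensity, followed by an invented protocol-switch rule.
from scipy import stats

# Prior belief, e.g. derived from a previous wave (hypothetical values).
alpha_prior, beta_prior = 4, 46          # prior mean 0.08

# Incoming current-wave paradata: attempts and completes so far (hypothetical).
attempts, completes = 220, 9

# Conjugate posterior for the per-attempt response propensity.
alpha_post = alpha_prior + completes
beta_post = beta_prior + (attempts - completes)
posterior = stats.beta(alpha_post, beta_post)

# Invented decision rule: if the posterior probability that the propensity
# exceeds a minimum workable level is low, switch to a cheaper protocol.
if posterior.sf(0.05) < 0.5:
    print("switch protocol (e.g., drop CATI follow-up for this subgroup)")
else:
    print("continue current protocol")
```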

Explaining and predicting web survey response with time-varying factors and incidental data

Mr Qixiang Fang (Utrecht University) - Presenting Author
Dr Joep Burger (Statistics Netherlands)
Dr Ralph Meijers (Statistics Netherlands)
Dr Kees van Berkel (Statistics Netherlands)

Web surveys tend to suffer from lower response rates, which can be influenced not only by factors that stay relatively fixed during a survey data collection period (like gender, occupation and marital status) but also by factors that can change over a short period of time (like day of the week, holidays and weather). In this paper, we investigate the effects of such time-varying factors on the daily response odds during the web mode of the 2016 and 2017 Dutch Health Surveys. Specifically, we look at the following time-varying factors: number of days since the start of the survey, day of the week, survey phase (e.g. invitation phase, reminder phase), weather (e.g. temperature, cloudiness, precipitation, wind speed, sunshine duration, air pressure and humidity) and indicators of daily societal trends (e.g. signs of disease outbreaks, public outdoor engagement and the degree of societal concern over data privacy) that we carefully construct from Google search history.
We apply a regularised, machine-learning version of discrete-time survival analysis to the data and find that, among other things, Monday is a strong positive predictor of daily survey response odds, while Saturday and Sunday are strong negative ones. More pleasant weather (e.g. higher temperature, less rain, longer sunshine duration and higher air pressure), signs of disease outbreaks and public outdoor engagement predict lower response odds. In addition, we show that using these time-varying predictors alone achieves satisfactory prediction accuracy for both daily response probabilities and cumulative response rates when a model estimated on training data is applied to unseen "future" data. Furthermore, many of these variables contribute positively to the model's prediction performance. Finally, we compare the effects of these time-varying factors with common time-fixed factors such as country of origin, marital status and household type, and show that both types of factors have comparable explanatory power (i.e. effect size) for survey response.
Our findings have several implications. First, we can make practical data collection recommendations to survey researchers, such as avoiding administering web surveys during the weekend or in good weather. Second, we showcase a modelling framework (discrete-time survival analysis) in which both time-fixed and time-varying factors can be incorporated at the same time; in the presence of many predictors, a regularised version of the technique can be adopted. Third, the finding that non-personal time-varying factors alone can achieve satisfactory prediction is very promising, especially given that, under the GDPR, access to personal data has become significantly more difficult. Last but not least, we demonstrate the usefulness of incidental data (like weather records and Google search history) in applied social science research.
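
As a minimal sketch of the modelling framework described (regularised discrete-time survival analysis), assuming hypothetical input columns, each sample case can be expanded into person-period (case-day) records and a penalised logistic hazard model fitted; the paper's actual covariates (weather, survey phase, Google-search indicators) would be merged onto the person-period records by date.

```python
# Minimal, assumed sketch of regularised discrete-time survival analysis.
# The input file and column names (case_id, start_weekday, days_observed,
# response_day) are hypothetical.
import pandas as pd
from sklearn.linear_model import LogisticRegression

cases = pd.read_csv("health_survey_cases.csv")

# Person-period expansion: one row per case per day until response or censoring
# (days_observed runs to the response day or the end of the web field period).
rows = []
for _, c in cases.iterrows():
    for day in range(1, int(c["days_observed"]) + 1):
        rows.append({
            "case_id": c["case_id"],
            "day": day,
            "weekday": (int(c["start_weekday"]) + day - 1) % 7,
            "responded": int(pd.notna(c["response_day"]) and day == c["response_day"]),
        })
person_period = pd.DataFrame(rows)

# Time-varying covariates (weather, reminder phase, search-trend indicators)
# would be joined onto person_period by calendar date here.

X = pd.get_dummies(person_period[["day", "weekday"]], columns=["weekday"])
y = person_period["responded"]

# Penalised (L2) logistic hazard model; C controls the regularisation strength.
model = LogisticRegression(penalty="l2", C=1.0, max_iter=1000).fit(X, y)

# Daily response probabilities; cumulative response rates follow from
# 1 - prod(1 - hazard) over the days within each case.
person_period["hazard"] = model.predict_proba(X)[:, 1]
```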