BigSurv20 program

Friday 6th November, Friday 13th November, Friday 20th November, Friday 27th November, Friday 4th December


Using alternative data sources and innovative methods to improve the National Survey of College Graduates

Moderator: John Voorheis

Friday 13th November, 11:45 - 13:15 (ET, GMT-5)
8:45 - 10:15 (PT, GMT-8)
17:45 - 19:15 (CET, GMT+1)

Reducing panel survey attrition with big administrative data

Dr Brad Foster (US Census Bureau) - Presenting Author

New sources of big and administrative data offer researchers the opportunity to address perennial and intractable problems in longitudinal sample surveys, such as respondent attrition. While some survey attrition may be attributable to respondents' declining interest over time, it is usually not possible to distinguish those who fail to respond to subsequent waves due to lack of interest from those who simply moved to a new home after their initial response and therefore never received the subsequent survey wave. As a result, the typical strategies for reducing attrition among migrating respondents, such as monetary incentives and/or multiple mail, email, or phone contact attempts by survey administrators, may miss their intended targets and increase the costs of longitudinal surveys. This research asks whether and how near-population-level administrative data might be used to address survey attrition less invasively and more cost-effectively by distinguishing between migrant and non-migrant respondents.

This paper highlights ongoing research aimed at understanding and reducing attrition in the National Survey of College Graduates (NSCG), a nationally representative, biennial longitudinal survey of college-educated individuals in the U.S., with administrative data from the Internal Revenue Service (IRS). Using anonymous identifiers developed by the U.S. Census Bureau, we link individuals sampled for the NSCG to their annual IRS tax records to develop a “residential history file” (a chronological list of known addresses) for the NSCG sample. NSCG respondents’ residential histories yield several novel insights into the relationships between attrition and migration. First, we confirm that those who migrate after a given NSCG wave are less likely to respond in the future. Second, we model migration and attrition as functions of self-reported demographic, socioeconomic, and geographic characteristics to understand who moves and who fails to respond to the NSCG. Broadly speaking, these results are consistent with the extensive literature on migration and residential mobility in the U.S., and they also demonstrate the impact of migration on survey attrition among college-educated individuals.
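A minimal sketch of how a residential history file could support the first finding, assuming linked records arrive as hypothetical `(person_id, tax_year, address)` tuples with addresses already standardized. Here `person_id` stands in for the Census Bureau's anonymous identifier, and all function names are illustrative rather than the authors' code. The sketch orders each person's addresses by tax year, flags anyone whose address changes between two NSCG waves, and compares next-wave response rates for movers and non-movers.

```python
from collections import defaultdict

def build_residential_history(records):
    """Group linked (person_id, tax_year, address) records into a
    chronologically ordered list of known addresses per person."""
    history = defaultdict(list)
    for person_id, tax_year, address in records:
        history[person_id].append((tax_year, address))
    return {pid: sorted(rows) for pid, rows in history.items()}

def moved_between(history, start_year, end_year):
    """True if the known address changes at least once within [start_year, end_year]."""
    addresses = [addr for year, addr in history if start_year <= year <= end_year]
    return any(a != b for a, b in zip(addresses, addresses[1:]))

def response_rate_by_mover_status(histories, wave_year, next_wave_year, responded_next):
    """Next-wave response rate for movers (True) and non-movers (False)."""
    counts = {True: [0, 0], False: [0, 0]}  # mover flag -> [responders, total]
    for pid, hist in histories.items():
        mover = moved_between(hist, wave_year, next_wave_year)
        counts[mover][1] += 1
        counts[mover][0] += int(responded_next.get(pid, False))
    return {mover: (r / n if n else None) for mover, (r, n) in counts.items()}

# Toy data (hypothetical): A and C move between the 2015 and 2017 waves, B does not.
records = [
    ("A", 2015, "1 Main St"), ("A", 2016, "1 Main St"), ("A", 2017, "9 Oak Ave"),
    ("B", 2015, "5 Elm St"), ("B", 2016, "5 Elm St"), ("B", 2017, "5 Elm St"),
    ("C", 2015, "7 Pine Rd"), ("C", 2016, "3 Lake Dr"), ("C", 2017, "3 Lake Dr"),
]
histories = build_residential_history(records)
rates = response_rate_by_mover_status(histories, 2015, 2017, {"A": False, "B": True, "C": True})
print(rates)  # {True: 0.5, False: 1.0}
```

In practice the comparison would use survey weights and a much larger linked file; the toy data only shows the mechanics.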

These insights, in turn, inform an exploration of methods for applying IRS administrative data in future NSCG waves. Specifically, we use model outputs describing the relationships between respondent characteristics, migration, and attrition to predict which NSCG respondents from a given wave will migrate and/or fail to respond in the subsequent wave. We then test the accuracy of these predictions against actual response data.
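The accuracy check described above could be sketched as follows. The sketch assumes the model produces a hypothetical predicted probability of attrition for each respondent, and that a 0.5 cutoff (an arbitrary illustrative choice) turns it into a yes/no prediction. It scores those predictions against observed next-wave nonresponse with a confusion matrix, accuracy, precision, and recall. The abstract does not say which metrics the authors use.

```python
def evaluate_attrition_predictions(pred_prob, attrited, threshold=0.5):
    """Score predicted attrition against observed next-wave nonresponse.

    pred_prob: person_id -> predicted probability of attrition (model output)
    attrited:  person_id -> True if the person did not respond in the next wave
    Only ids present in both dicts are scored.
    """
    ids = pred_prob.keys() & attrited.keys()
    flagged = {i: pred_prob[i] >= threshold for i in ids}
    tp = sum(1 for i in ids if flagged[i] and attrited[i])
    fp = sum(1 for i in ids if flagged[i] and not attrited[i])
    fn = sum(1 for i in ids if not flagged[i] and attrited[i])
    tn = len(ids) - tp - fp - fn
    return {
        "confusion": {"tp": tp, "fp": fp, "fn": fn, "tn": tn},
        "accuracy": (tp + tn) / len(ids) if ids else None,
        "precision": tp / (tp + fp) if tp + fp else None,
        "recall": tp / (tp + fn) if tp + fn else None,
    }

# Hypothetical model outputs and observed outcomes for four respondents.
scores = evaluate_attrition_predictions(
    {"A": 0.8, "B": 0.2, "C": 0.6, "D": 0.1},
    {"A": True, "B": False, "C": False, "D": True},
)
print(scores["confusion"])  # {'tp': 1, 'fp': 1, 'fn': 1, 'tn': 1}
```

Recall matters most if the goal is to target follow-up effort at likely attriters, while precision governs how much of that effort is wasted on people who would have responded anyway.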

Evaluating administrative records as a potential sample frame for the National Survey of College Graduates

Dr John Voorheis (US Census Bureau) - Presenting Author

The National Survey of College Graduates (NSCG) is an important source of information on the education and career paths of the college graduate population in the United States. The NSCG currently uses the American Community Survey (ACS) as a frame for sampling new respondents. There is interest, however, in whether alternative datasets might be better sources for constructing the sample frame. In this paper, we examine the suitability of one such dataset, the National Student Clearinghouse (NSC), by comparing the coverage and alignment of an extract of the NSC with the ACS (the current frame source). For the population under the age of 30, the NSC has excellent coverage compared to the ACS: almost all ACS respondents who are college graduates have an enrollment record in the NSC. There is, however, some misalignment in graduation records. In particular, a substantial number of ACS respondents who report having a college degree have no graduation record in the NSC, and, conversely, a substantial number of ACS respondents who do not report receiving a college degree nonetheless have an NSC graduation record. Initial research suggests that although the NSC may be an important source for frame supplementation, it may not be possible to replace the current frame with the NSC outright.
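A minimal sketch of the coverage and alignment comparison. It assumes a hypothetical list of linked ACS-NSC person records with an age and boolean fields for the ACS degree report, an NSC enrollment record, and an NSC graduation record. Coverage is the share of ACS-reported graduates under 30 who have any NSC enrollment record. The off-diagonal cells of the degree-by-graduation cross-tabulation are the two kinds of misalignment described above. Field names are illustrative, and a production comparison would also apply survey weights.

```python
from collections import Counter

def coverage_and_alignment(linked, max_age=30):
    """Compare an NSC extract with ACS responses for people under max_age.

    linked: list of dicts with keys 'age', 'acs_degree', 'nsc_enrollment',
    'nsc_graduation' (booleans except age); field names are hypothetical.
    Returns (coverage, crosstab): coverage is the share of ACS-reported
    graduates with any NSC enrollment record; crosstab counts
    (acs_degree, nsc_graduation) pairs.
    """
    pop = [r for r in linked if r["age"] < max_age]
    grads = [r for r in pop if r["acs_degree"]]
    coverage = sum(r["nsc_enrollment"] for r in grads) / len(grads) if grads else None
    crosstab = Counter((r["acs_degree"], r["nsc_graduation"]) for r in pop)
    return coverage, crosstab

linked = [
    {"age": 25, "acs_degree": True, "nsc_enrollment": True, "nsc_graduation": True},
    {"age": 27, "acs_degree": True, "nsc_enrollment": True, "nsc_graduation": False},
    {"age": 24, "acs_degree": True, "nsc_enrollment": False, "nsc_graduation": False},
    {"age": 26, "acs_degree": False, "nsc_enrollment": True, "nsc_graduation": True},
    {"age": 41, "acs_degree": True, "nsc_enrollment": False, "nsc_graduation": False},
]
coverage, crosstab = coverage_and_alignment(linked)
# Off-diagonal cells are the two kinds of misalignment:
print(crosstab[(True, False)], crosstab[(False, True)])  # 2 1
```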

Combating nonresponse: The progression of a survey data collection strategy using multiple methods and data sources

Dr Rachel Horwitz (US Census Bureau) - Presenting Author
Dr Beth Newman (US Census Bureau)
Dr Renee Reeves (US Census Bureau)
Dr Christine Bottini (US Census Bureau)

As survey response rates decline, researchers are seeking new ways to combat this trend. Often, researchers tweak one component of data collection at a time, such as letter content or contact strategy (sequential, choice, etc.), in order to isolate the effect of the change. However, many factors go into motivating response, and a holistic approach that uses all available data sources can simultaneously fight nonresponse, control costs, and redistribute resources. This research outlines a series of efforts using a variety of methods and data sources, including focus groups, web survey paradata, production experiments, data collection metrics, and simulation studies, to improve traditional data collection methods across multiple modes, increasing the survey's appeal while reducing burden and cost.

In this presentation, we will start by describing the baseline data collection strategy and then outline the changes made as a result of several experiments and simulations aimed at improving or maintaining response rates. The first set of experiments grew out of focus groups and updated the letter format, letter content, and envelopes. Based on the focus group feedback, the letter format was updated to include more white space, a clear call to action, and bullet points to break up chunks of text, while the content focused on the specific benefits of the survey and how the data can be used. The second set of experiments focused on how to use deadlines in our contact materials. We tested providing a deadline either early in data collection or toward the end, to determine whether a deadline is more helpful in influencing early or late respondents. In addition to the timing of the deadline, we also tested where the deadline was displayed (only inside the letter, or both on the envelope and inside the letter) to learn whether a deadline on the envelope prompts respondents to open the letter and take action or instead puts them off.
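One conventional way to compare response rates between two deadline conditions (for example, an early versus a late deadline) is a pooled two-proportion z-test. The sketch below is a generic illustration, not necessarily the authors' analysis. It ignores survey weights and any interaction between deadline timing and placement, and the arm counts in the usage line are hypothetical.

```python
import math

def two_proportion_z_test(resp_a, n_a, resp_b, n_b):
    """Pooled two-sided z-test for a difference in response rates between two arms.

    Returns (difference in rates, z statistic, two-sided p-value).
    """
    p_a, p_b = resp_a / n_a, resp_b / n_b
    pooled = (resp_a + resp_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:  # both arms all-respond or all-nonrespond: no detectable difference
        return p_a - p_b, 0.0, 1.0
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return p_a - p_b, z, p_value

# Hypothetical counts: early-deadline arm vs. late-deadline arm.
diff, z, p = two_proportion_z_test(600, 1000, 550, 1000)
```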

While conducting these experiments, we also incorporated both web and contact history paradata to improve the overall data collection strategy. Web paradata showed that an increasing percentage of respondents used smartphones to complete the survey each cycle, so we updated the instrument to use a responsive web design. We also used the paradata to identify problematic screens and made changes to make the instrument less burdensome. Finally, we used a contact history instrument to identify which call outcomes rarely resulted in a completed interview (e.g., ring-no-answer, fast busy signal) and simulated the effect of limiting the number of such calls to sample cases. This effort not only reduces nonproductive calls but also frees those resources to contact sample cases that may be more likely to respond.
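The call-limit simulation could be approximated by replaying contact history records under a cap, as in this minimal sketch. It assumes each case's history is an ordered list of hypothetical outcome codes and treats `ring_no_answer` and `fast_busy` as the nonproductive outcomes. A case is retired once it reaches `cap` such outcomes. The sketch counts calls saved and completes that would have been lost, but it does not model how the freed calls might be reallocated.

```python
# Hypothetical outcome codes treated as nonproductive (the abstract's examples).
NONPRODUCTIVE = {"ring_no_answer", "fast_busy"}

def simulate_call_cap(call_histories, cap):
    """Replay call histories, retiring a case after `cap` nonproductive outcomes.

    call_histories: case_id -> ordered list of outcome codes; a history is
    assumed to end at "complete" if the case was completed.
    Returns (calls_made, calls_saved, completes_kept, completes_lost).
    """
    calls_made = calls_saved = completes_kept = completes_lost = 0
    for outcomes in call_histories.values():
        nonproductive = 0
        for i, outcome in enumerate(outcomes):
            if nonproductive >= cap:
                # Case retired under the cap: the remaining calls are never placed.
                calls_saved += len(outcomes) - i
                if "complete" in outcomes[i:]:
                    completes_lost += 1
                break
            calls_made += 1
            if outcome == "complete":
                completes_kept += 1
                break
            if outcome in NONPRODUCTIVE:
                nonproductive += 1
    return calls_made, calls_saved, completes_kept, completes_lost

histories = {
    "c1": ["ring_no_answer", "ring_no_answer", "ring_no_answer", "complete"],
    "c2": ["fast_busy", "complete"],
    "c3": ["ring_no_answer", "fast_busy", "ring_no_answer"],
    "c4": ["callback", "complete"],
}
print(simulate_call_cap(histories, cap=2))  # (8, 3, 2, 1)
```

Sweeping `cap` over a range of values traces the trade-off between calls saved and completes lost, which is the kind of evidence a limit on nonproductive calls would rest on.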

This presentation outlines the steps taken in a holistic data collection redesign that used multiple data sources to combat falling response rates, covering everything from initial contact by letter and telephone through administration of the survey, and reports which changes and strategies were most successful.

Using the Longitudinal Employer-Household dataset to inform measurement error properties of National Survey of College Graduates estimates

Dr Michaela Dillon (US Census Bureau) - Presenting Author

Using administrative records can potentially improve both data accuracy and survey operations. In this study, we link administrative data on earnings from the Longitudinal Employer-Household Dynamics (LEHD) dataset to the National Survey of College Graduates (NSCG) to understand how well this administrative information aligns with respondent-reported data. Around 50 percent of linked individuals report earnings in the NSCG that are within ten percent of their LEHD earnings. Large disagreements between linked values appear to be prevalent among high-earning individuals and show some association with characteristics of low labor market attachment, such as part-time status and retirement age. We are also able to assess the alignment of firm characteristics reported in the NSCG and the LEHD. Agreement rates across NSCG firm characteristic topics range from about 25 percent (for firm size) to about 75 percent (for employment status). Given these results, LEHD data have significant potential to enhance employment status and industry information in the NSCG, but may be less useful for firm size and age.
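The two agreement measures reported above could be computed as in this sketch. It assumes hypothetical paired lists of linked NSCG and LEHD values. The ten percent band is measured relative to LEHD earnings (the abstract does not specify the base), pairs with nonpositive LEHD earnings are dropped, and firm characteristics are compared as exact matches on categorical codes.

```python
def share_within_tolerance(survey_earnings, admin_earnings, tol=0.10):
    """Share of linked pairs whose survey earnings are within tol of admin earnings.

    The band is measured relative to the administrative (LEHD) value; pairs with
    nonpositive admin earnings are dropped because the relative band is undefined.
    """
    pairs = [(s, a) for s, a in zip(survey_earnings, admin_earnings) if a > 0]
    if not pairs:
        return None
    return sum(abs(s - a) <= tol * a for s, a in pairs) / len(pairs)

def agreement_rate(survey_codes, admin_codes):
    """Share of linked cases whose categorical firm characteristic codes match."""
    pairs = list(zip(survey_codes, admin_codes))
    return sum(s == a for s, a in pairs) / len(pairs) if pairs else None

# Hypothetical linked values (annual earnings in dollars; firm size classes).
print(share_within_tolerance([50_000, 80_000, 120_000, 30_000],
                             [52_000, 70_000, 120_000, 40_000]))  # 0.5
print(agreement_rate(["1-9", "10-49", "500+"], ["1-9", "50-99", "500+"]))
```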