BigSurv20 program

I'm sensing there's even more APPs for that!

Moderator: Jan Karem Hoehne

Friday 13th November, 11:45 - 13:15 (ET, GMT-5)
8:45 - 10:15 (PT, GMT-8)
17:45 - 19:15 (CET, GMT+1)

Mind the gap: Addressing missingness both small and large in longitudinal GPS data

Ms Danielle McCool (Utrecht University, Statistics Netherlands) - Presenting Author
Mr Peter Lugtig (Utrecht University)
Mr Barry Schouten (Statistics Netherlands)

Data collected continuously from sensors such as wearables and mobile devices is a rich source of information for survey researchers, but it is subject to different patterns of missing data than traditional surveys. Respondents may miss anywhere from minutes to days of data, and this missingness is often distributed unequally across both time and persons. This paper develops methods for adequately handling both extremes of missingness and investigates the balance between them under differing circumstances.

Integration of sensor data into survey research is becoming more common as devices that facilitate the collection of continuous data gain widespread use. In transportation research, tracking GPS coordinates can give a very detailed picture of a person's travel behavior, while in health research, tracking a respondent's heart rate can offer more insight into his or her fitness level. When the focus is on longitudinal collection of continuous variables, researchers must rely heavily on device functioning and user compliance to obtain uninterrupted streams of data. Often one or both of these factors fail, leaving periods of missing data. This paper investigates methodology for addressing this missingness, using data collected with the Statistics Netherlands travel diary app in 2018. Respondents participated by downloading the application onto their own mobile device and allowing it to record and send their location coordinates over a period of seven days.

The data show varied patterns of missingness. Users frequently have lapses in coordinate acquisition lasting up to ten minutes, but their reliable locations and trajectories before and after such a gap allow us to reconstruct movement behavior with minimal error. At other times, we find larger gaps in the data, where users record nothing for a full 24 hours or more. Such gaps are too large for fine-grained reconstruction of location history and must instead be addressed by imputing day- or user-level statistics. This missingness is often related to travel behavior itself, both directly and indirectly, so the handling approach must take that dependence into account. In this paper, we address options for handling the middle ground between brief and extended missingness, and we provide methods for estimating error under a variety of contexts.
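The three-way split between brief, extended, and middle-ground gaps can be sketched as follows. This is a minimal illustration and not the authors' method. The thresholds are taken from the abstract: ten minutes or less counts as short, and 24 hours or more counts as long. The 60-second sampling interval, the `Fix` record, and the use of straight-line interpolation in latitude/longitude for short gaps are all hypothetical assumptions. Middle and long gaps are only labelled here, so that other methods can handle them.

```python
from dataclasses import dataclass

SHORT_GAP_S = 10 * 60        # gaps up to ten minutes: reconstruct from neighbouring fixes
LONG_GAP_S = 24 * 60 * 60    # gaps of a full day or more: day- or user-level imputation


@dataclass
class Fix:
    t: float    # seconds since the start of data collection
    lat: float
    lon: float


def classify_gap(seconds: float) -> str:
    """Label a gap as short, long, or the middle ground between them."""
    if seconds <= SHORT_GAP_S:
        return "short"
    if seconds >= LONG_GAP_S:
        return "long"
    return "middle"


def fill_short_gaps(fixes: list[Fix], step_s: float = 60.0):
    """Linearly interpolate positions inside short gaps and record every gap with its label.

    Assumes fixes are sorted by time and the device normally samples every step_s seconds
    (a hypothetical sampling rate). Middle and long gaps are labelled but left unfilled.
    """
    out = [fixes[0]]
    gaps = []
    for prev, nxt in zip(fixes, fixes[1:]):
        dt = nxt.t - prev.t
        if dt > step_s:
            kind = classify_gap(dt)
            gaps.append((prev.t, nxt.t, kind))
            if kind == "short":
                t = prev.t + step_s
                while t < nxt.t:
                    w = (t - prev.t) / dt  # fraction of the gap elapsed
                    out.append(Fix(t, prev.lat + w * (nxt.lat - prev.lat),
                                   prev.lon + w * (nxt.lon - prev.lon)))
                    t += step_s
        out.append(nxt)
    return out, gaps


# Hypothetical trace: a 5-minute gap followed by a gap of more than a day.
fixes = [Fix(0, 52.00, 5.00), Fix(300, 52.01, 5.01), Fix(100_300, 52.10, 5.10)]
filled, gaps = fill_short_gaps(fixes)
print(len(filled), [g[2] for g in gaps])
```

Straight-line interpolation is only defensible because short gaps are bracketed by reliable fixes. For middle-ground gaps, a model informed by typical trajectories or day-level behavior would be needed.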

Smartphone-based travel surveys: An example of controlled data fusion

Mr Mark Bradley (RSG) - Presenting Author
Mr Jeff Doyle (RSG)

Household travel surveys are used to get a snapshot of regional or national travel patterns, and to estimate predictive models of travel behavior. Traditionally, these surveys have been done using travel diary forms to aid in recall, with data collection via mail-back forms, telephone interviewers, or online software. Over the past few years, increasing use has been made of smartphone apps that track people’s trips using the phone’s location-based services. These apps also show respondents their trips on a map interface and ask related questions such as travel purpose, mode of travel, and number of co-travelers. Currently, the majority of regional household travel surveys in major metropolitan areas of the U.S. are done via smartphone over a seven-day period (with the small percentage of households who do not own smartphones still using diary-based methods). The data from a smartphone-based travel survey is a unique combination of passive data (point traces of locations and times) and active survey data (self-reported modes, purposes, etc.). Compared to diary-based methods, smartphone-based methods provide more accurate trip times and locations and have much less under-reporting and recall bias, capturing about 20% more trips per person-day. Smartphone-based methods also pose unique challenges, because in some cases the respondent’s reported trip characteristics do not match the trace data. For example, the person may report a walk trip which the trace data indicates covered 30 kilometers at a speed of 60 kph. Or, the person may say they went home when the location of the trip end is clearly not the home location. Over time, app user interfaces can be made clearer and easier to use to avoid such inconsistencies. In the meantime, imputation methods have been developed to impute trip purposes and modes in cases where the reported data is suspect.
The methods are a combination of rule-based algorithms, sampling algorithms, and discrete-choice probability models. They can be used not only to “correct” suspect data, but also to impute data for missing cases where respondents did not complete some of the trip-specific surveys. We term this “controlled” data fusion because there is a tight match at the respondent level between the passive data and the self-reported data. However, some of the methods developed may be extended to data fusion with passive “big data” from the wider population (or to a much longer data collection period that shifts over time from active surveying towards purely passive data collection from the same individuals). The paper gives background on the smartphone-based survey approach and experiences, describes the fusion/imputation methods developed and applied thus far, and discusses future directions for development.
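A rule-based check and a discrete-choice model could be combined along the following lines. This is a minimal sketch, not the authors' implementation. It uses the walk trip covering 30 km at 60 kph from the example above. The speed ceilings in `MAX_SPEED_KPH`, the multinomial-logit coefficients in `UTILITY`, and the function names are hypothetical. A trip keeps its reported mode unless the rule flags it or the mode is missing. In those cases a mode is sampled from the logit probabilities, restricted to modes whose ceiling the trace does not violate.

```python
import math
import random

# Hypothetical plausibility ceilings (km/h) for average trip speed by reported mode.
MAX_SPEED_KPH = {"walk": 8.0, "bike": 35.0, "car": 150.0, "transit": 120.0}

# Hypothetical multinomial-logit utilities: (constant, coefficient on log distance in km).
UTILITY = {"walk": (2.0, -2.5), "bike": (0.5, -1.0), "car": (0.0, 0.0), "transit": (-0.5, 0.2)}


def is_suspect(mode: str, distance_km: float, duration_h: float) -> bool:
    """Rule-based check: does the trace-implied average speed exceed the mode's ceiling?"""
    return distance_km / duration_h > MAX_SPEED_KPH[mode]


def mode_probabilities(distance_km: float) -> dict[str, float]:
    """Multinomial-logit choice probabilities given trip distance."""
    x = math.log(max(distance_km, 0.1))
    utils = {m: a + b * x for m, (a, b) in UTILITY.items()}
    top = max(utils.values())
    exps = {m: math.exp(u - top) for m, u in utils.items()}  # shift by the max for stability
    total = sum(exps.values())
    return {m: e / total for m, e in exps.items()}


def impute_mode(trip: dict, rng: random.Random) -> str:
    """Keep a plausible reported mode; otherwise sample one from the logit model,
    restricted to modes whose speed ceiling the trace does not violate."""
    d, h = trip["distance_km"], trip["duration_h"]
    reported = trip.get("mode")
    if reported is not None and not is_suspect(reported, d, h):
        return reported
    probs = mode_probabilities(d)
    candidates = [m for m in probs if not is_suspect(m, d, h)] or list(probs)
    return rng.choices(candidates, weights=[probs[m] for m in candidates], k=1)[0]


rng = random.Random(42)
walk_trip = {"mode": "walk", "distance_km": 30.0, "duration_h": 0.5}  # a 60 km/h "walk"
print(is_suspect(**walk_trip), impute_mode(walk_trip, rng))
```

Sampling from the probabilities, rather than always taking the most likely mode, preserves the spread of modes across imputed trips. This matters when the imputed data feed travel-demand models.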

Using linked survey, administrative and geospatial data to measure exposure to and the impacts of the 2019/20 Australian bushfires

Professor Nicholas Biddle (Australian National University) - Presenting Author
Dr Ben Edwards (Australian National University)

The bushfires (wildfires) that occurred over the 2019/20 Australian spring and summer were unprecedented in scale and wide in their geographic impact. They are estimated to have been the largest bushfires ever to have occurred anywhere in the world over a single season. At the time of writing, more than 11 million hectares (110,000 sq km or 27.2 million acres) have been burnt, and according to the Australian Academy of Science, Australia appears to have lost over a billion birds, mammals and reptiles, with additional loss of insects, amphibians and fish. The fires are global in impact and appear to have had wide-ranging political and attitudinal effects.

Between 20 January and 3 February 2020, the ANU Centre for Social Research and Methods and the Social Research Centre collected data from more than three thousand Australian adults as part of the ANUpoll series of surveys. The data covered their exposure to the bushfires as well as a range of other attitudes and beliefs. We estimate that the vast majority of Australians (78.6 per cent) were affected in one way or another: directly, through their family/friends, or through the physical effects of smoke. Furthermore, we estimate that around 14.4 per cent of adult Australians had their property damaged or threatened, or had to be evacuated. This is the first estimate of self-reported impacts on that scale from a nationally representative, probability-based survey.

In this paper, we combine information (at the individual and the area level) from the ANUpoll survey, the 2016 Census of Population and Housing, the National Visitor Survey, air quality data, and fire mapping data. Our findings show a strong but not complete correlation between subjective and objective measures of exposure, with the discrepancy explained by temporary mobility and explainable biases in reporting. We also show that subjective wellbeing among the Australian population has declined since the start of spring 2019, that people are less satisfied with the direction of the country, and that they have less confidence in the Federal Government. People are more likely, however, to see the environment and climate change as important issues and a potential threat to them, and the proportion of people who support new coal mines has declined significantly. By linking individuals through time, we are also able to show that some of these changes are attributable to exposure to the bushfires (as measured by self-report and geospatial data).

Multiple imputation techniques to handle visibility bias in count data with excess zeros

Ms Shalima Zalsha (Southern Methodist University) - Presenting Author

Measurement error and missing data are two common problems in wildlife population surveys. These data are collected from the environment and may be missing or measured with error when the observer’s view of the animal is obscured. Visual census methods, such as video transects for estimating red snapper abundance in the Gulf of Mexico, are highly affected by these problems, since abundance will be underestimated if missing and mismeasured counts are ignored. We refer to this problem as visibility bias: true counts of animals are observed when visibility is high, partially observed (mismeasured) when visibility is low, and unobservable (missing) when there is no visibility.
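The three visibility regimes can be simulated to show why ignoring them underestimates abundance. This is a minimal sketch under stated assumptions, not the study's simulation design. Under low visibility, each animal is assumed to be detected independently with a hypothetical probability `p_low` (binomial thinning). The true counts and visibility labels are made up.

```python
import random


def observe(true_count: int, visibility: str, rng: random.Random, p_low: float = 0.5):
    """Recorded count under the three visibility regimes.

    'high' -> true count; 'low' -> each animal detected independently with probability
    p_low (a hypothetical value); 'none' -> missing (None).
    """
    if visibility == "high":
        return true_count
    if visibility == "low":
        return sum(rng.random() < p_low for _ in range(true_count))
    if visibility == "none":
        return None
    raise ValueError(f"unknown visibility level: {visibility!r}")


rng = random.Random(7)
true_counts = [0, 3, 5, 0, 8, 2]
visibility = ["high", "low", "none", "high", "low", "high"]
observed = [observe(c, v, rng) for c, v in zip(true_counts, visibility)]
naive_total = sum(c for c in observed if c is not None)  # ignores missing and mismeasured counts
print(observed, naive_total, sum(true_counts))
```

Under this mechanism, a true zero always stays zero, while positive counts can only shrink or disappear. The naive total is therefore biased downward.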

Visibility bias is a form of measurement error and can be corrected using validation or auxiliary data. Auxiliary information can be obtained from various underwater instruments such as a device used to measure water clarity and an acoustic device used to detect fish when visibility is lost or degraded. Information from these instruments can be combined with the observed count data from the video transects to correct visibility bias. Furthermore, data from wildlife population surveys are often zero-inflated (sparse) since not all sampled regions are inhabited by the species. Therefore, visibility bias correction techniques must also preserve the structural zeros of the data.
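The zero-inflated Poisson (ZIP) model is a standard way to represent structural zeros. It is one of the off-the-shelf imputation models named in this abstract. With probability $\pi$, a site is a structural zero (unoccupied). Otherwise, its count follows a Poisson distribution with mean $\lambda$:

```latex
P(Y = 0) = \pi + (1 - \pi)\, e^{-\lambda}, \qquad
P(Y = k) = (1 - \pi)\, \frac{\lambda^{k} e^{-\lambda}}{k!}, \quad k = 1, 2, \dots
```

Under this model, zeros arise in two ways: structural zeros with probability $\pi$, and sampling zeros from occupied sites with probability $(1 - \pi)e^{-\lambda}$. The expected count is $E[Y] = (1 - \pi)\lambda$. A correction that inflates every observed zero would therefore overstate both abundance and occupancy.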

In this study, we examined several imputation techniques to correct visibility bias in count data with excess zeros, using auxiliary information collected with multiple instruments. The methods considered include off-the-shelf imputation methods such as predictive mean matching and normal, Poisson, and zero-inflated Poisson imputation. In addition, a modified hot deck imputation and Bayesian hierarchical models were developed specifically to handle visibility bias. The hot deck approach was modified to accommodate not only missing data but also measurement error. The Bayesian hierarchical model is a two-level model that specifies the count data distribution at the first level and the visibility model at the second level.
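Predictive mean matching (PMM), one of the off-the-shelf methods listed above, can be sketched as follows. This is a simplified version under stated assumptions and not the study's implementation. It handles only missing counts (no visibility), not mismeasured low-visibility counts, which is what the modified hot deck adds. It uses a single auxiliary covariate (a hypothetical acoustic reading) and a least-squares fit on a bootstrap sample of observed cases, which is one common way to reflect parameter uncertainty across imputations. The names `pmm_impute` and `multiply_impute` and the data are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with one auxiliary covariate."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0
    return my - b * mx, b


def pmm_impute(y, x, rng, k=3):
    """One completed copy of y: each None borrows an observed count from a random donor
    among the k observed cases whose predicted means are closest to its own."""
    obs = [i for i, v in enumerate(y) if v is not None]
    boot = [rng.choice(obs) for _ in obs]  # bootstrap the fit to reflect parameter uncertainty
    a, b = fit_line([x[i] for i in boot], [y[i] for i in boot])
    pred = [a + b * xi for xi in x]
    filled = list(y)
    for i, v in enumerate(y):
        if v is None:
            donors = sorted(obs, key=lambda j: abs(pred[j] - pred[i]))[:k]
            filled[i] = y[rng.choice(donors)]
    return filled


def multiply_impute(y, x, m=5, seed=0):
    """m completed datasets via predictive mean matching."""
    rng = random.Random(seed)
    return [pmm_impute(y, x, rng) for _ in range(m)]


# Hypothetical data: video counts (None = no visibility) and an acoustic auxiliary reading.
counts = [0, 0, 4, None, 7, 0, None, 2]
acoustic = [0.1, 0.0, 3.5, 6.0, 6.8, 0.2, 0.1, 1.9]
completed = multiply_impute(counts, acoustic)
print(completed)
```

Every imputed value is copied from an observed donor. PMM therefore always produces non-negative integers and can reproduce observed zeros, which helps it preserve discreteness and zeros.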

We performed a simulation study to examine the methods’ performance in estimating total abundance and habitat occupancy rate (the proportion of non-zero counts). The methods were also evaluated for their ability to preserve the discreteness and structural zeros of the data. The results suggest that the Bayesian hierarchical model is the best correction method when the visibility model is correctly specified or when the data are highly sparse, while predictive mean matching is robust and outperforms the other methods when the data are not highly sparse.
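The two estimands, total abundance and occupancy rate, can be computed and pooled over multiply imputed datasets as shown below. This is a minimal sketch. The pooled point estimate is the average over imputations, as in Rubin's rules. Only the between-imputation variance is reported, because a full Rubin's-rules variance would also need a within-imputation sampling variance, which this sketch does not model. The completed datasets are hypothetical, and at least two are required.

```python
from statistics import mean, variance


def abundance_and_occupancy(counts):
    """Total abundance (sum of counts) and occupancy rate (share of non-zero sites)."""
    return sum(counts), sum(c > 0 for c in counts) / len(counts)


def pool(completed):
    """Average each estimand over m >= 2 completed datasets (the Rubin's-rules point
    estimate) and report its between-imputation variance."""
    totals, occupancy = zip(*(abundance_and_occupancy(d) for d in completed))
    return {
        "abundance": (mean(totals), variance(totals)),
        "occupancy": (mean(occupancy), variance(occupancy)),
    }


completed = [[0, 0, 4, 6, 7, 0, 0, 2], [0, 0, 4, 7, 7, 0, 2, 2]]  # hypothetical imputations
result = pool(completed)
print(result)
```

Because occupancy counts only non-zero sites, an imputation method that turns structural zeros into positive counts biases this estimand upward even when the abundance total looks reasonable.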