BigSurv18 program







Leveraging Big Data for Improving Health Research… the Follow-up Visit

Chair Professor Naja Rod (Section of Epidemiology, University of Copenhagen)
Time: Saturday 27th October, 11:00 - 12:30
Room: 40.012

Smartphone Interactions and Mental Well-Being in Young Adults - A Longitudinal Study Based on High-Resolution Smartphone Data

Miss Agnete Skovlund Dissing (Copenhagen University) - Presenting Author
Professor Naja Hulvej Rod (Copenhagen University)
Professor Thomas Alexander Gerds (Copenhagen University)
Dr Rikke Lund (Copenhagen University)

Objectives: To investigate the effects of objectively measured smartphone interactions (network size of interactions, frequency and duration of interactions) on indicators of mental well-being in a population of young adults.

Methods: We used data from the Copenhagen Network Study. 816 college students (mean age 21.3, SD 2.7; 77% male) were followed with continuous recordings of smartphone interactions from calls, texts, and the social media site Facebook. Participants also self-reported on indicators of mental well-being (loneliness, depressive symptoms, and disturbed sleep) in a baseline survey and in a follow-up survey approximately four months later. Multiple linear regression models were used to adjust for confounders and previous levels of mental well-being.
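The adjustment described above can be sketched as a lagged-outcome regression: the follow-up well-being score is regressed on the smartphone exposure while controlling for the baseline score. The sketch below uses synthetic data and made-up variable names (`interactions`, `baseline_lonely`, `followup_lonely`) purely for illustration; it is not the study's actual model or data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 816  # cohort size reported in the abstract

# Hypothetical synthetic data standing in for the real measurements
interactions = rng.normal(size=n)        # e.g. standardized interaction count
baseline_lonely = rng.normal(size=n)     # baseline loneliness score
followup_lonely = (0.5 * baseline_lonely # outcome tracks its baseline level
                   - 0.3 * interactions  # assumed protective effect
                   + rng.normal(scale=0.5, size=n))

# Multiple linear regression: follow-up outcome on exposure,
# adjusting for the baseline level of the outcome (lagged model)
X = np.column_stack([np.ones(n), interactions, baseline_lonely])
beta, *_ = np.linalg.lstsq(X, followup_lonely, rcond=None)
print(beta)  # [intercept, exposure coefficient, baseline coefficient]
```

With this specification, the exposure coefficient estimates the association between smartphone interactions and the change in well-being relative to baseline, which is the sense in which the abstract's models "adjust for previous levels of mental well-being."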

Results: Higher number of smartphone interactions was associated with lower levels of loneliness, depressive symptoms and less disturbed sleep. These associations attenuated over time, but social network size of smartphone interactions appeared to be consistently related to lower levels of loneliness and disturbed sleep over time.

Conclusion: Smartphone interactions are related to increased mental well-being. A higher level of smartphone interactions might indicate a larger underlying social network, which in itself may be protective.


How to Operationalize Adaptive Sampling Along With Various Big Data Phenotypic-Neurologic-Ecological-Genotypic Elements Across a Multi-Site Trauma-Based Prospective Data Collection, the AURORA Cooperative Agreement

Mr Charlie Knott (RTI International) - Presenting Author
Mr Steve Gomori (RTI International)
Mr Mai Nguyen (RTI International)
Ms Sue Pedrazzani (RTI International)
Mrs Sridevi Sattaluri (RTI International)
Mr Thomas Walker (RTI International)


Combining self-report survey data with other sources, such as passive data from wearable devices and biophysical data, can enable a comprehensive investigation of research aims. Yet such a complex study presents challenges for ensuring complete and accurate collection of all data sources. This presentation will discuss key aspects of the information management system architecture, interfaces, and administrative data linkages in a year-long clinical study, which together form an effective centralized engine for data collection, processing, linkage, reporting, and analysis.

The AURORA Study is a national initiative to improve the understanding, prevention, and treatment of posttraumatic neuropsychiatric sequelae. Trauma survivors, ages 18-69, are enrolled through emergency departments in the immediate aftermath of trauma and followed for one year with complete physiologic, biologic, neurocognitive, symptom, and health outcome assessments. Data collection includes web-based surveys, neurocognitive assessments, ecological monitoring via a smartphone app and a Verily Watch, biospecimens, and, for a subsample, in-person deep phenotyping assessments. Approximately 260 separate data collection events are obtained for each participant over the study year.

This presentation will address the following gaps and big data challenges. First, how to document informed consent and collect (A) prospective phenotypic variables across 9 multi-mode surveys, (B) ecological monitoring via Verily Watch, (C) physiologic, (D) biologic, (E) neurocognitive, (F) symptom, and (G) health outcome assessments across multiple time points over a 1-year period in a diverse, heterogeneous population (5,000 participants across the United States), each of whom must have suffered a traumatic event to be eligible. Second, the Participant Contact Protocol requires sending emails and SMS messages within specific windows to invite and remind participants of their ~250 follow-up events, and tracking event completion to maximize cooperation and response rates. Third, the system maintains blackout periods, event histories, calling notes, and incentive payment history, as well as projected future payment dates. Finally, the system architecture must be flexible enough to adapt to protocol changes in the multi-site nationwide trauma-based enrollment protocol to meet scientific aims and objectives.
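The data-driven event model behind such a contact protocol can be sketched in miniature: each follow-up event carries a target offset from enrollment and a contact window, and the system surfaces only events whose window is open, that are not yet completed, and that do not fall in a blackout period. Everything below (event names, offsets, the `due_events` helper) is a hypothetical illustration, not the actual RTI Control System design.

```python
from datetime import date, timedelta

# Hypothetical event schedule: day offset from enrollment plus a
# symmetric contact window around the target date.
EVENT_SCHEDULE = [
    {"name": "flash_survey_wk2", "offset_days": 14, "window_days": 3},
    {"name": "web_survey_m3",    "offset_days": 90, "window_days": 7},
]

def due_events(enrolled_on, today, completed, blackout_dates):
    """Return names of events whose contact window is open today."""
    due = []
    for ev in EVENT_SCHEDULE:
        target = enrolled_on + timedelta(days=ev["offset_days"])
        window = timedelta(days=ev["window_days"])
        open_now = target - window <= today <= target + window
        if open_now and ev["name"] not in completed and today not in blackout_dates:
            due.append(ev["name"])
    return due

enrolled = date(2018, 1, 1)
print(due_events(enrolled, date(2018, 1, 15), set(), set()))
```

Driving outreach from a declarative schedule like this is what makes the model "data-driven": protocol changes become edits to the schedule table rather than to the contacting code.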

Solutions: RTI developed a sophisticated in-house Control System to support primary data collection projects. For the AURORA cooperative agreement, system enhancements were designed to implement a data-driven event model to meet these collective primary big data collection challenges. Interfaces were critical to communicate across various user groups (i.e., participants, follow-up coordinators, clinic PIs and research assistants, laboratory personnel, application developers, and scientific leadership).

AURORA is one of the first NIH-funded programs to incorporate and integrate via adaptive sampling new sources of Big Data such as Verily Watch, TestMyBrain neurocognitive assessments, and Mindstrong Health app flash surveys along with functional MRI, neurological assessments, and complex phenotypic constructs. In our presentation, we will provide details of the complex contacting protocol requirements, an overview of the data-driven event model, and the GUI (web pages) for viewing, tracking and supporting the study operations including big data elements.


Applying a Geospatial Big Data Approach to Survey Data: The Next Stage in Population Health Studies

Under review for the special issue

Dr Eileen Avery (Department of Sociology, University of Missouri) - Presenting Author
Mr Timothy Haithcoat (University of Missouri Informatics Institute)
Dr Richard Hammer (Department of Pathology and Laboratory Medicine, University of Missouri)
Dr Chi-Ren Shyu (Electrical Engineering and Computer Science, University of Missouri)


Population health studies increasingly require vast amounts of integrated data because the questions asked are more complex and interdisciplinary than ever before and are explicitly or implicitly tied to understanding context. Yet accessing a variety of relevant contextual variables remains a tedious process that wastes researchers' time and duplicates effort in compiling and integrating information.

Further, although extant work using multilevel modeling has expanded our understanding of the ways characteristics of built, social, and physical environments are associated with disparities in health, many of these studies are limited to specific geographic areas and have limited sample sizes and/or health outcomes. Larger, longitudinal, neighborhood-based surveys are increasingly rare in a competitive funding environment. For these reasons, future work may more commonly utilize restricted data from large scale national surveys. These surveys will benefit from data tools that can efficiently provide expanded measures of context across a variety of different layers.

We aim to illustrate the ways that large scale, traditional surveys can be combined with a big data tool to greatly expand the scope of population health research in efficient ways.

In particular, we use the geospatial health context cube (GHCC), an informatics big data approach that addresses the lack of an integrated data framework. The cube's core is functional social science data that has been pre-processed, cleaned, standardized, integrated, and represented in its spatial context. Environmental, infrastructure, cultural, and economic data, as well as geospatially derived measures (e.g., isolation, accessibility), are added to provide further context.

We use survey data from the Behavioral Risk Factor Surveillance System (BRFSS), a state-based cross-sectional telephone survey of the noninstitutionalized adult population conducted annually by the Centers for Disease Control and Prevention. The data contain measures of individuals' health behaviors, social environments, and health outcomes across a range of diseases and conditions. The goal is for the BRFSS to have 4,000 interviews per state per year, but in many cases the sample size is much larger. We use publicly available data at the state level and at the metropolitan level (where available) to examine contextual effects. We use smaller units of geography to the extent access is granted.
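The linkage step this implies, attaching contextual layers from an integrated source such as the GHCC to individual survey records by geography, can be sketched as a simple keyed join. The table contents, variable names, and FIPS codes below are made-up placeholders, not GHCC or BRFSS data.

```python
# Hypothetical state-level contextual layer, keyed by state FIPS code,
# standing in for measures the GHCC would supply (values are invented).
context = {
    "29": {"income_inequality": 0.47, "pm25": 9.1},   # Missouri
    "17": {"income_inequality": 0.48, "pm25": 11.3},  # Illinois
}

# BRFSS-style individual records with a geographic identifier.
respondents = [
    {"id": 1, "state_fips": "29", "self_rated_health": 3},
    {"id": 2, "state_fips": "17", "self_rated_health": 4},
]

# Each record inherits the contextual measures for its geography,
# yielding a single analysis-ready table of individual + context fields.
linked = [{**r, **context[r["state_fips"]]} for r in respondents]
print(linked[0])
```

The same join generalizes to smaller geographic units (metropolitan area, county, tract) as access is granted, by swapping in a finer-grained key.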

Using the GHCC and the BRFSS, we examine how relevant characteristics in the built (e.g. intersection density), environmental (e.g. pollution), social (e.g. segregation), and economic (e.g. income inequality) environments are associated with disparities in health behaviors (e.g. physical activity), perceived physical and mental health (e.g. self-rated health, depressive symptomology), chronic conditions (e.g. obesity), and wellbeing (e.g. happiness) and how they interact with individual characteristics to predict these outcomes.

The discussion will focus on substantive results as well as furthering our understanding of how a spatially integrated single Big Data table can be used to benefit researchers who want to expand upon traditional survey work by engaging in more detailed and powerful analysis, such as association mining, while utilizing spatial and visual analytical tools.


Classifying Health Insurance Type From Survey Responses Using Enrollment Data

Ms Joanne Pascale (US Census Bureau) - Presenting Author
Ms Kathleen Call (SHADAC)
Ms Angela Fertig (University of Minnesota)
Mr Don Oellerich (US Department of Health and Human Services)


Challenges in measuring health insurance in the U.S. through surveys have been well-documented since the 1980s, and the Affordable Care Act (ACA), implemented in 2014, added considerable complexity to the task of accurately categorizing health coverage from surveys. One of the primary reasons for the difficulty (even prior to the ACA) is that most surveys ask discrete questions about specific types of coverage (e.g., employer-sponsored insurance (ESI), Medicaid, Medicare), and they rely on a single household respondent to answer the series of questions on behalf of all household members. Research identified an association between the difficulty of the reporting task and measurement error, with respondents misreporting one type of coverage for another, reporting the same coverage twice, or failing to report coverage altogether. As a result, one of the federal surveys most commonly used for health insurance estimates – the Current Population Survey (CPS) – was recently redesigned in an attempt to make the series more user-friendly for respondents and less prone to measurement error. The series departs from the usual set of yes/no questions on discrete types of coverage and instead begins with a question on the general source of coverage (job, government, or other). Follow-up questions tailored to the source (e.g., type of government program, program name, premium, subsidization) are then asked to capture the detail needed to derive coverage type.

While research indicates the redesigned series did reduce measurement error, it produced a new challenge: figuring out how to combine answers across several survey items to classify coverage type. The objective of this research is to inform development of an algorithm for combining answers to questions about features of health insurance from this newly-redesigned module in order to maximize accurate categorization of coverage type. Data come from the CHIME study (Comparing Health Insurance Measurement Error), a reverse record check study in which households with individuals enrolled in a range of public and private health insurance plans (including the marketplace) were administered a telephone survey that included the CPS health module. After the survey and records data were linked, answers to survey questions about the characteristics of coverage (e.g., general source of coverage, program name, premium, subsidization) were examined in relation to the coverage type indicated by the record. A machine learning approach was used to develop three alternative algorithms to categorize coverage type – one skewing toward public coverage in ambiguous cases, one skewing toward marketplace coverage, and one in between. Three different accuracy metrics were calculated for each algorithm: sensitivity, predictive power, and prevalence. Results varied slightly across algorithms and showed sensitivity for private and public coverage was about 98 percent and 82 percent, respectively. Predictive power was about 97 percent for both private and public coverage. The survey estimate of private coverage was about 8 percentage points higher than the population prevalence, and the survey estimate of public coverage was about 3 percentage points lower than the population prevalence.
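The three accuracy metrics named above follow directly from a confusion table comparing the algorithm's survey-based classification against the enrollment record as gold standard. The counts below are invented for illustration and do not reproduce the CHIME results.

```python
# Toy confusion counts for one coverage type (say, private coverage),
# survey-derived classification vs. the linked enrollment record.
true_pos, false_neg = 980, 20    # record says private
false_pos, true_neg = 30, 970    # record says not private
n = true_pos + false_neg + false_pos + true_neg

# Sensitivity: of those the record shows as covered, the share the
# survey algorithm correctly classifies as covered.
sensitivity = true_pos / (true_pos + false_neg)

# Predictive power (positive predictive value): of those the survey
# algorithm classifies as covered, the share the record confirms.
predictive_power = true_pos / (true_pos + false_pos)

# Prevalence comparison: survey-estimated rate vs. the record (population) rate.
record_prevalence = (true_pos + false_neg) / n
survey_prevalence = (true_pos + false_pos) / n

print(round(sensitivity, 3), round(predictive_power, 3),
      round(survey_prevalence - record_prevalence, 3))
```

The gap between survey and record prevalence is the kind of over- or under-estimate the abstract reports (about +8 points for private and -3 points for public coverage).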