BigSurv18 program







Leveraging Big Data for Improving Health Research… the Initial Visit

Chair: Mr Charlie Knott (RTI International)
Time: Friday 26th October, 09:45 - 11:15
Room: 40.105

Mapping Behavioral Influencers in the Pharmaceutical Industry

Ms Elizabeth Rountree (Charles River Associates)
Mr Rob Sederman (Charles River Associates)
Mr Michael Roy (Charles River Associates)
Dr Kristen Backor (Charles River Associates) - Presenting Author
Ms Erika Sloan (Charles River Associates)
Mrs Greta Olesen (Charles River Associates)

Pharmaceutical companies need to identify behavioral influencers (Key Opinion Leaders, or KOLs) whose prescribing decisions influence others in their network. Traditionally, key prescribers have been identified by simply ranking physicians on insurance claims volume – the more claims, the more important the physician. However, relying on claims volume excludes KOLs who prescribe less themselves but influence other physicians to prescribe more. Our approach leveraged organizational network analytics, secondary data on thought leadership, and quantitative primary market research to develop a list of KOLs more effectively than traditional methods.

In the initial stage, proprietary network analytics were used to map the connectivity of behavioral influencers using thousands of data points, including patient-level claims, prescriptions, and affiliation data. We then developed a methodology for assigning influence scores for prescribing, based on decisions for shared patients (e.g., referrals, consults, initiations, switches) within the network. Additional influence scores for thought leadership (from physicians who may treat fewer patients but are involved in developing standard of care) were developed through secondary research focused on signifiers like publishing relevant articles (via PubMed) and participation in pharmaceutical advisory boards (via payment data). In the market research portion, a quantitative survey of practicing physicians was used to directly identify those perceived as KOLs on a national and local level. Each respondent gave names and information about perceived KOLs nationally and in the physician’s local area. The survey also collected information on KOL interactions not included in claims data (e.g., interactions at speaker programs/conferences, article reviews, patient-specific consultations, etc.).

Through triangulation of these methods (data analytics, traditional secondary research, and primary market research), we developed a comprehensive list of KOLs, including a rank for each in terms of importance to the prescribing market. This approach has been validated with follow-up analyses demonstrating a significant impact on prescribing for physicians identified as KOLs via this method:

• Physicians connected to those identified as KOLs who prescribed a product were 2.5x more likely to prescribe the product themselves compared to physicians connected to KOLs who did not prescribe the product (controlling for factors like calls and specialty)
• Physicians connected to those identified as KOLs who prescribed a product prescribed ~20% more of a given product themselves compared to physicians connected to KOLs who did not prescribe the product (controlling for factors like calls and specialty)

Accurate KOL identification is key for informing resourcing, targeting efforts to support product launches, and driving adoption through sales channel optimization and influencer engagement. However, many current approaches to identification have drawbacks; traditional methods are limited by available claims and lack information about interactions, while market research alone may miss subconscious influencing (reflected in prescribing data) and lacks sufficient sample for a comprehensive list (given the size of the physician universe). These findings suggest that integration of analytics and market research results in a more accurate identification of physician influence than either approach alone.
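As an illustration of how prescribing influence scores based on shared-patient decisions might be computed, a minimal Python sketch follows. The abstract does not disclose the proprietary scoring, so everything here is an assumption: the event weights in `EVENT_WEIGHTS`, the `(influencer, influenced, event_type)` record layout, and the physician ids are hypothetical and chosen only to show the idea of summing weighted network events per physician.

```python
from collections import defaultdict

# Hypothetical weights per shared-patient event type. The abstract does not
# give the real (proprietary) weighting, so these values are placeholders.
EVENT_WEIGHTS = {"referral": 1.0, "consult": 0.5, "initiation": 2.0, "switch": 1.5}


def influence_scores(events):
    """Sum weighted shared-patient events per influencing physician.

    `events` is an iterable of (influencer_id, influenced_id, event_type).
    """
    scores = defaultdict(float)
    for influencer, influenced, event_type in events:
        if influencer == influenced:
            continue  # a physician's own decisions are not network influence
        scores[influencer] += EVENT_WEIGHTS.get(event_type, 0.0)
    return dict(scores)


def rank_physicians(scores):
    """Return physician ids ordered from highest to lowest score."""
    return sorted(scores, key=lambda pid: (-scores[pid], pid))


# Made-up example records, not data from the study.
events = [
    ("dr_a", "dr_b", "initiation"),
    ("dr_a", "dr_c", "referral"),
    ("dr_b", "dr_c", "consult"),
    ("dr_c", "dr_c", "switch"),
]
scores = influence_scores(events)
ranking = rank_physicians(scores)
```

In a triangulated approach like the one described, a score of this kind would be only one input, alongside thought-leadership signals and survey nominations.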


Predicting Depression Occurrence Using Classification Algorithm in Data Mining

Mr Abdur Rahman (Department of Statistics, Shahjalal University of Science and Technology, Sylhet, Bangladesh) - Presenting Author
Dr Kanis Ferdushi (Assistant Professor, Department of Statistics, Shahjalal University of Science and Technology, Sylhet, Bangladesh)

Depression, or depressive disorder, is a common medical illness among elderly people that negatively affects how they feel, think and act. This study was designed to investigate depression and activities of daily living among elderly men and women in the Sylhet region of Bangladesh. We interviewed 229 elderly people aged 60 and over in face-to-face personal interviews using questionnaires between March and September 2015. Of these, 72.5 percent were male and 27.5 percent were female. The data were collected using cluster sampling, with the population divided into four clusters: urban area, rural area, tea garden area and ethnic area. This paper gives an overview of the application of data mining techniques to predicting the health condition of elderly people. In health care, data mining is one of the most vital and motivating areas of research; it aims to find meaningful information in huge data sets and provides an efficient analytical approach for detecting unknown and valuable information in healthcare data. In this study, a model was built to predict the occurrence of depression among elderly people using different classification algorithms in R. The classification algorithms used were logistic regression, linear discriminant analysis (LDA), quadratic discriminant analysis (QDA) and K-nearest neighbors (KNN). KNN predicted depression with better accuracy than the other algorithms.
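To make the K-nearest neighbors step concrete, here is a minimal sketch in Python (the study itself used R). It is not the authors' model: the two features, the six data points, the choice of k = 3 and the leave-one-out evaluation are all assumptions made for illustration, and the values are synthetic.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    dists = sorted(
        (math.dist(row, x), label) for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(X, y, k=3):
    """Leave-one-out accuracy: predict each point from all the others."""
    hits = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        hits += knn_predict(rest_X, rest_y, X[i], k) == y[i]
    return hits / len(X)


# Synthetic, pre-scaled features (hypothetical, e.g. a daily-living score and
# a health score); label 1 = depressed, 0 = not depressed.
X = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25), (0.8, 0.9), (0.9, 0.8), (0.85, 0.75)]
y = [0, 0, 0, 1, 1, 1]

accuracy = loo_accuracy(X, y, k=3)
new_case = knn_predict(X, y, (0.2, 0.2), k=3)
```

Because KNN relies on distances, features on different scales would normally be standardized first; the same accuracy measure could then be used to compare KNN against logistic regression, LDA and QDA.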


Assessing Community Health Using Imagery From Google Street View

Final candidate for the monograph

Dr Pablo Diego-Rosell (Senior Researcher, Gallup, Inc.) - Presenting Author
Dr Rajesh Srinivasan (Director of Research, Gallup, Inc.)
Dr Ben Dilday (Data Scientist, Gallup, Inc.)
Mr Stafford Nichols (Research Consultant, Gallup, Inc.)

American communities face significant health and well-being challenges. In addition to long-standing inequalities among minorities, mortality and morbidity rates have been increasing among white non-Hispanic Americans in midlife since the turn of the century. Community health data in the US are collected through expensive large-scale surveys including the American Community Survey (ACS) and the Behavioral Risk Factor Surveillance System (BRFSS). Additionally, Gallup conducts a cross-sectional Daily Poll (G1K) that covers both subjective well-being (SWB) and self-reported health variables. These data sources are relatively infrequent, with smaller regions only surveyed every three or five years in the ACS, and with much lower coverage at the community level in the BRFSS and G1K.

Over the last decade, computational methods have made great strides, and deep learning applications are now able to leverage text data from Twitter and imagery from Google Earth (GE) and Google Street View (GSV) to reliably estimate statistics relating to race, gender, education, occupation, unemployment, and other demographics down to the census tract level (see e.g. Gebru et al. 2017). We propose here a similar method to predict subjective well-being and health outcomes at the zip code level, based on GSV and satellite imagery, and survey data from BRFSS and G1K. Our method begins by compiling a training set of GSV images, randomly selected within specific census tracts. These images are then labelled by crowdsourced workers on Amazon Mechanical Turk according to several features that the literature has identified as predictors of SWB and health outcomes (alcoholism, drug addiction), including:

• Car types: These are used to derive income and demographics, following an approach similar to Gebru et al. (2017).
• Community markers, such as the presence of graffiti, litter, greenery, walkability, etc.
• Housing: Single family vs. multi-family, vacant, etc.

Additionally, data on population density, distance to downtown areas, zoning (residential, mixed use, commercial, etc.) and other stable features are captured from Census maps and other official sources. Using the labelled data, we create aggregate measures of each feature by census tract and train an algorithm to optimize a prediction of average SWB/health outcomes within the zip code from survey data. Our project aims to provide proof of concept at a small scale, to then expand nationally through automatic image labelling using deep learning.
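The aggregation step described above (turning image-level labels into tract-level measures) can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's pipeline: the record layout, the tract ids, and the binary `graffiti` and `greenery` flags are hypothetical, and every record is assumed to carry every flag.

```python
from collections import defaultdict


def aggregate_by_tract(labels):
    """Average each binary image label within its census tract.

    `labels` is a list of dicts with a "tract" key plus 0/1 feature flags.
    Returns {tract: {feature: share of images showing the feature}}.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for row in labels:
        tract = row["tract"]
        counts[tract] += 1
        for feature, value in row.items():
            if feature != "tract":
                sums[tract][feature] += value
    return {
        tract: {f: total / counts[tract] for f, total in feats.items()}
        for tract, feats in sums.items()
    }


# Made-up crowdsourced labels for four images in two tracts.
labels = [
    {"tract": "T1", "graffiti": 1, "greenery": 0},
    {"tract": "T1", "graffiti": 0, "greenery": 1},
    {"tract": "T2", "graffiti": 0, "greenery": 1},
    {"tract": "T2", "graffiti": 0, "greenery": 1},
]
tract_features = aggregate_by_tract(labels)
```

Tract-level shares like these would then serve as predictors, together with the stable Census features, in whatever model is trained against the survey-based SWB and health averages.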


Smartphone Interrupted Sleep: A New Public Health Challenge? High-Resolution Smartphone Data From Denmark

Professor Naja Rod (Section of Epidemiology, University of Copenhagen) - Presenting Author
Ms Agnete Dissing (Section of Epidemiology, University of Copenhagen)
Dr Alice Clark (Section of Epidemiology, University of Copenhagen)
Professor Thomas Gerds (Section of Biostatistics, University of Copenhagen)
Dr Rikke Lund (Section of Social Medicine, University of Copenhagen)

Introduction: The widespread and increasing round-the-clock use of smartphones provides an interesting analogy to experimental sleep studies, which have shown adverse health consequences of disrupted sleep. We aim to comprehensively describe smartphone activity during the entire sleep span among young adults and to characterize those with smartphone interrupted sleep in terms of sleep impairment and mental and physical health indicators.

Methods: We use unique objective high-resolution information on timing of smartphone activity (based on >250,000 phone actions) continuously monitored over a four-week period among 815 young adults, combined with comprehensive questionnaire information on self-reported bedtimes as well as indicators of mental and physical health.

Results: We find substantial smartphone activity during the self-reported sleep period. More than 12% had smartphone activity in the middle of the sleep period (3 to 5 hours after self-reported bedtime) and 41% had smartphone interrupted sleep on at least one weekday during a four-week period. Those with frequent smartphone interrupted sleep had on average 48 minutes shorter self-reported sleep duration, lower self-rated health and higher body mass index, and they reported more problems with tiredness. There were no differences in mental health symptoms according to level of smartphone interrupted sleep.

Conclusions: We document substantial smartphone activity during bed hours in young adults, which suggests that the increasing round-the-clock smartphone use may pose a public health challenge. The relation to overweight in particular warrants close attention.
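The middle-of-sleep window used in the Results (3 to 5 hours after self-reported bedtime) can be expressed as a simple timestamp check. The sketch below is an assumption about how such a flag could be computed, not the authors' code: the function name, the inclusive window bounds, and the example timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def has_mid_sleep_activity(bedtime, actions, start_h=3, end_h=5):
    """True if any phone action falls start_h to end_h hours after bedtime."""
    window_start = bedtime + timedelta(hours=start_h)
    window_end = bedtime + timedelta(hours=end_h)
    return any(window_start <= t <= window_end for t in actions)


# Made-up night: self-reported bedtime at 23:00 and two logged phone actions.
bedtime = datetime(2018, 10, 26, 23, 0)
actions = [
    datetime(2018, 10, 26, 23, 20),  # shortly after bedtime, outside window
    datetime(2018, 10, 27, 2, 30),   # 3.5 hours after bedtime, inside window
]
interrupted = has_mid_sleep_activity(bedtime, actions)
```

Applying a check like this per participant and night, and then counting flagged nights, would give the kind of per-person frequency needed to group participants by level of smartphone interrupted sleep.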