Short Course 1: Data Integration
Instructor: Dr. Trivellore Raghunathan (“Raghu”)
The data landscape has changed tremendously. Until a few years ago, sample surveys were the primary source of information, but data from many other sources have now become available. These include spatial observations, administrative sources, sensor data, business transactions and social media, to name just a few. These “found data” provide unique opportunities to blend information from multiple sources and draw inferences about the population of interest to address societal problems. This short course will cover important challenges such as harmonization and comparability of measurements across various sources, methods to combine information, modeling challenges and the framework needed to evaluate the validity and reliability of estimates derived from such combined sources. Several case studies will be used to illustrate the challenges, opportunities and benefits.
Short Course 2: Fundamentals of Data Science
Instructor: Dr. Juan Esteban Díaz Leiva
Data are everywhere and come in overwhelming quantities. Thus, being able to extract relevant information from them has become an essential ability. Machine learning allows us to do this by granting us “superpowers”, such as seeing in more than three dimensions or recognizing patterns when dealing with millions of variables. Here we will introduce this branch of artificial intelligence, briefly review its main areas, and finally focus on regression and clustering, which are two of the most widely used tools from supervised and unsupervised learning, respectively.
Short Course 3: Unlocking the Superpowers of Advanced Machine Learning Models for Social Scientists: From Lassos to Boosts to Nets!
Instructors: Dr. Trent D Buskirk & Dr. Adam Eck
Social scientists and survey researchers are confronted with an increasing number of new data sources such as apps and sensors that often result in complex data structures that are difficult to handle with traditional modeling methods. At the same time, advances in the field of machine learning (ML) have created an array of flexible methods and tools that can be used to tackle a variety of modeling problems. Against this background, this course discusses advanced ML frameworks, methods and models such as regularization methods, ensemble approaches to learning and deep learning models. The course aims to illustrate these concepts, methods and approaches from a social science perspective in an accessible way so that researchers can apply these methods in their own work to unlock insights. Code examples will be provided using both R and Python and will be available to attendees. The course assumes basic familiarity with fundamental machine learning methods like regression, logistic regression and tree-based models.
Training Session: Hands-on training to select a two-stage gridded population sample using free, user-friendly tools
Instructors: Dr. Dana R Thomson & Dr. Dale Rhoda
Household surveys in countries with an outdated census, or in complex urban settings with mobile or informal populations, can be implemented with an improved sample frame based on modelled gridded population estimates. This training will briefly introduce survey practitioners to the emerging field of gridded population sampling before guiding attendees through two hands-on activities. The activities are based on free, easy-to-use tools – GridSample and GeoSampler – so no special programming or GIS skills are required to attend this session. In the first activity, attendees will generate a sample frame from gridded population data and select primary sampling units with probability proportional to size (GridSample). In the second activity, attendees will randomly sample structures (GeoSampler). Further instruction will be provided about questions to include in the survey questionnaire that allow adjustments for households-per-structure in the sample weights, and about the production of digital/paper maps that enable easy navigation for field workers. The training is based on the recently published manual on “Designing and Implementing Gridded Population Surveys.”
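For readers who want a feel for what selecting primary sampling units with probability proportional to size involves, the following is a minimal Python sketch of systematic PPS selection. It is an illustration only and is not the GridSample implementation: the function name `pps_systematic_sample`, its parameters and the example population counts are all hypothetical.

```python
import random
from itertools import accumulate


def pps_systematic_sample(sizes, n, seed=None):
    """Select n unit indices with probability proportional to size.

    Hypothetical helper for illustration: `sizes` holds the estimated
    population of each candidate unit (e.g. a grid cell or cluster).
    Uses systematic selection along the cumulative size total.
    """
    total = sum(sizes)
    if n <= 0 or total <= 0:
        raise ValueError("n and the total size must both be positive")

    rng = random.Random(seed)
    interval = total / n               # sampling interval
    start = rng.random() * interval    # random start in [0, interval)
    cumulative = list(accumulate(sizes))

    selected = []
    idx = 0
    for k in range(n):
        point = start + k * interval
        # Advance to the unit whose cumulative range contains the point.
        # Units with size 0 are skipped automatically.
        while idx < len(cumulative) - 1 and cumulative[idx] <= point:
            idx += 1
        selected.append(idx)
    # A unit larger than the interval can be selected more than once.
    return selected


# Made-up population counts for five candidate units.
cell_populations = [120, 0, 300, 80, 500]
print(pps_systematic_sample(cell_populations, n=2, seed=1))
```

Larger units cover a longer stretch of the cumulative total, so they are more likely to contain a selection point; a unit with zero estimated population can never be selected.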
Trivellore Raghunathan (“Raghu”) is Professor of Biostatistics at the School of Public Health and Research Professor of Survey Methodology at the Institute for Social Research, University of Michigan. He is also Research Professor at the Joint Program in Survey Methodology, University of Maryland. His research interests are in the analysis of incomplete data, multiple imputation, Bayesian methods, design and analysis of sample surveys, combining information from multiple sources, small area estimation, confidentiality and disclosure limitation, longitudinal data analysis and statistical methods for epidemiology. He has developed SAS-based software for imputing missing values in complex data sets, which can be downloaded from www.iveware.org. He is a Fellow of the American Statistical Association and has received the Richard Remington Award from the American Heart Association and the Monroe Sirken Award for his contributions to survey methodology.
Juan Esteban Díaz Leiva is Director of the USFQ Data Science Institute, director of the Master Program in Data and Business Management and Professor of Operations Management at Universidad San Francisco de Quito. He was awarded a PhD in Business and Management by the University of Manchester. He also holds a master's degree in Food and Resource Economics from Bonn University and a Food Engineering degree from Universidad San Francisco de Quito. He is an expert in evolutionary computation, automatic algorithm design and configuration, multiobjective optimisation under uncertainty and artificial intelligence. He also has multiple publications in high-impact journals and is a consultant in areas such as artificial intelligence, business analytics and data science.
Trent D. Buskirk, Ph.D. is the Novak Family Distinguished Professor of Data Science and outgoing Chair of the Applied Statistics and Operations Research Department at Bowling Green State University. Dr. Buskirk is a Fellow of the American Statistical Association, and his research interests include big data quality, recruitment methods through social media, the use of big data and machine learning methods for health, social and survey science design and analysis, mobile and smartphone survey designs, methods for calibrating and weighting nonprobability samples, fairness in AI models, and interpretable ML methods. Trent served as the President of the Midwest Association for Public Opinion Research in 2016 and as the Conference Chair for AAPOR in 2018, and is currently part of the scientific committee for the BigSurv23 conference. Trent also serves as an Associate Editor for Methods for the Journal of Survey Statistics and Methodology. When Trent is not geeking out over data science, big data or survey methodology, you can find him playing a competitive game of Pickleball!
Adam Eck is an Associate Professor of Computer Science and Chair of the Data Science Integrative Concentration at Oberlin College where he leads the Social Intelligence Lab. Adam's research interests include interdisciplinary applications of artificial intelligence and machine learning to solve real-world problems, such as data science and machine learning for improving data collection and analysis in the computational social sciences (e.g., Survey Informatics) and public health, as well as decision making for intelligent agents and multiagent systems in complex, uncertain environments.
Dana Thomson is a pioneer in the field of gridded population household surveys. She also coordinates the IDEAMAPS Network, a global initiative that integrates "slum" mapping traditions to map deprived urban areas routinely and accurately at scale. Her other work includes improving the accuracy of gridded population datasets, measuring "slum" upgrading in ways that incentivize community participation, and co-developing data trainings for "slum"-based researchers and advocates. Dr. Thomson is a consultant and visiting researcher at the University of Twente (Netherlands).
Dale Rhoda is a statistical consultant and expert on design & analysis of household surveys for public health. In recent years, he led the statistical aspects of updating the World Health Organization guidelines on vaccination coverage surveys. He regularly coordinates design and analysis of large country-wide surveys in Africa and Asia. Dr. Rhoda is currently interested in data entry errors with touchscreen devices, how entry errors propagate through analysis workflows, using gridded population datasets as survey sampling frames, characterizing missed opportunities for vaccination, and designing survey samples with both design- and model-based estimation in mind.