BigSurv20 program

Friday 6th November, Friday 13th November, Friday 20th November, Friday 27th November, Friday 4th December


Poster session 2

Friday 20th November, 11:45 - 13:15 (ET, GMT-5)
8:45 - 10:15 (PT, GMT-8)
17:45 - 19:15 (CET, GMT+1)

Saving sex for marriage: Understanding the complexity between sexual abstinence and marital bliss based on social media and survey data

Mr Emmanuel Olamijuwon (University of the Witwatersrand) - Presenting Author
Professor Clifford Odimegwu (University of the Witwatersrand)

Almost a third of women who have been in a relationship have experienced physical or sexual violence. Although several studies from diverse contexts have examined the predictors of intimate partner violence, the pervasiveness of the problem highlights the need for more research to identify possible mechanisms through which it persists.

This study combines social media data with the 2018 Nigeria Demographic and Health Survey (NDHS). In our analysis, we use causal loop diagrams (CLDs) to illustrate, from young people’s perspectives, the pathways through which saving sex until marriage might contribute to marital bliss. The Facebook group comprised more than 176,461 young adults, mostly from African countries, and yielded a total of 3,482 posts and comments related to sexual abstinence and marital bliss between June 1, 2018 and May 31, 2019. We used survey data from 1,817 couples sampled in the NDHS to validate the CLD and the underlying conceptual thinking.

Young adults believe that sexual abstinence could affect the timing of the first marital birth and lead to sexual and marital satisfaction for women. Women who saved sex for marriage are also perceived to have high levels of self-control and discipline, which may reduce their likelihood of engaging in marital infidelity, boost their partner’s trust and confidence, and reduce the likelihood of partner control. Women who have their partner’s trust and confidence are also unlikely to experience any form of physical, emotional or psychological violence by their husband, all of which contribute to marital satisfaction.

Binomial logistic regression models were fitted to examine associations between couples’ premarital sexual experience and women’s experience of intimate partner violence. Adjusting for covariates, unions in which only the woman was a virgin did not differ significantly from unions in which neither partner was a virgin in women’s experience of sexual, emotional and physical violence. In contrast, women in unions in which only the partner had no premarital sexual experience were less likely to experience emotional (β = -0.62, CI: -1.09; -0.16) or physical (β = -0.93, CI: -1.50; -0.36) violence than women in unions in which both partners had some premarital sexual experience. Similarly, women in unions in which both partners had no premarital sexual experience were less likely to experience emotional (β = -0.57, CI: -0.99; -0.15) or physical (β = -1.05, CI: -1.54; -0.55) violence than women in unions in which both partners had some premarital sexual experience.
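For readers who want to see the shape of such a model, below is a minimal, self-contained sketch (NumPy only) that fits a binomial logistic regression by Newton-Raphson on simulated data. The variable names, coding, and effect sizes are hypothetical and only echo the sign pattern reported above; this is not the NDHS data or the authors' exact specification.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Fit a binomial logistic regression by Newton-Raphson (IRLS)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-X @ beta))
        W = p * (1 - p)                      # observation weights
        H = X.T @ (X * W[:, None])           # Hessian X' W X
        beta += np.linalg.solve(H, X.T @ (y - p))
    se = np.sqrt(np.diag(np.linalg.inv(H)))  # asymptotic standard errors
    return beta, se

rng = np.random.default_rng(0)
n = 5000
both_none = rng.integers(0, 2, n)            # both partners abstained (hypothetical coding)
partner_only = np.where(both_none == 0, rng.integers(0, 2, n), 0)
X = np.column_stack([np.ones(n), both_none, partner_only])
true_beta = np.array([-1.0, -0.6, -0.6])     # protective effects, same sign as the abstract
y = rng.binomial(1, 1 / (1 + np.exp(-X @ true_beta)))

beta, se = fit_logistic(X, y)
ci_low, ci_high = beta - 1.96 * se, beta + 1.96 * se  # 95% Wald intervals
```

The fitted coefficients are on the log-odds scale, matching the β values quoted in the abstract.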

Our study holds important implications for the design of effective interventions addressing cultural attitudes and women’s experience of intimate partner abuse. It also contributes to scholarship by showing that social media data such as Facebook offer new possibilities for understanding the complexity of social issues that affect women’s health and well-being. Furthermore, by adopting a systems thinking approach, we were able to make our assumptions explicit, identify new hypotheses and test them against survey data.

Improved regression-in-ratio estimators in estimating the population means in simple random sampling with empirical extreme maximum and minimum values in survey statistics

Dr Peter Ogunyinka (Olabisi Onabanjo University, Ago-Iwoye, Nigeria) - Presenting Author
Mr Emmanuel Ologunleko (University of Ibadan, Ibadan, Nigeria)
Professor Ademola Sodipo (University of Ibadan, Ibadan, Nigeria)

Significant improvements have been made to regression-in-ratio estimators in simple random sampling in survey statistics. However, such estimators are over-estimated or under-estimated in the presence of an extreme maximum or minimum value, respectively, in the survey data. This study proposes three regression-in-ratio estimators (¯y_1, ¯y_2 and ¯y_3) that correct the over-estimation or under-estimation of the estimates when there are extreme values in the survey data. The bias and Mean Square Error (MSE) expressions were established. Theoretical comparison confirmed the conditional efficiency of the proposed estimators over the reviewed estimators. A further empirical comparison, with twenty-six simulated populations comprising high and low extreme maximum values, was used to ascertain the asymptotic sensitivity of the proposed estimators to different magnitudes of extreme values. Two of the three proposed estimators (¯y_1 and ¯y_2) proved to be more biased than the corresponding reviewed estimators, while one (¯y_3) proved to be less biased than the corresponding reviewed estimator. The proposed estimators proved to be asymptotically efficient, with smaller variances and Mean Square Errors (MSEs) than the reviewed estimators. Finally, the ranking of the percentage relative efficiency showed that the three proposed estimators (¯y_1, ¯y_2 and ¯y_3) were 120%, 119% and 120% efficient, respectively, over the corresponding reviewed estimators. A sample survey method to test for significant extreme values before applying the extreme-value correction method is suggested for further study.

The ecosystem of technologies for data collection, survey and analysis in the social sciences

Dr Daniela Duca (SAGE Publishing) - Presenting Author
Mrs Katie Metzler (SAGE Publishing)

The growth in born-digital data, combined with increasingly accessible means of developing software, has resulted in a proliferation of tools to support the research lifecycle, especially for social data research. To understand the variety of tools and their key uses, we reviewed 418 software applications and packages used by social science researchers. This paper explores who leads the development of these tools, where the supporting communities and investors are, and what challenges users and creators face. Among the 418 tools, close to 50% are based in the United States, and just over 50% are free for researchers to use. The tools are developed by private companies (50%), big tech (5%), the public sector, or individuals as side projects (45%). Only 10% of the key people involved in designing and developing these tools were women. When it comes to supporting the development of research tools, a growing number of communities, organisations, and consortia offer guidance, training, and some form of financial support. Among these are the Software Sustainability Institute, the discipline-specific Digital Methods Initiative, NumFOCUS and Pelagios Commons, and the regional NeCTAR and CESSDA. Where tools have applicability or a primary focus in the business intelligence world, we find top venture capital firms involved, such as Sequoia and Index Ventures. With the exception of Prolific, only a few startups coming out of university incubators target academics specifically. Although large, our list of tools is biased towards English-language survey, social media data, text mining, annotation, and qualitative data analysis tools. Looking at surveying tools, we note that successful ones like Qualtrics, SurveyMonkey, and Typeform do the basic job of survey management, offer an easier-to-use interface for designing complex questionnaires, interoperate with games and experimental sites, and help recruit participants.
The next fascinating development, however, will address the effectiveness and efficiency of these surveys. Matt Salganik and his team developed a free surveying tool that enables researchers to engage respondents in survey development while also collecting answers. A group of computer scientists and social scientists from the University of Wisconsin-Madison developed NEXT, a surveying tool powered by an algorithm that adapts the survey sample and questions as more people answer, to get better results faster without having to rerun the survey. Academics are at the forefront of these projects, developing new methodologies and tools that will eventually be taken up by the private sector. To do that more effectively, they need a community of users, financial support, and consortia and other organisations able to host and scale up their tools. This will ensure that more researchers can use and build on existing tools, and will enable the development of sustainable models and the growth of the community of users.

Combining multiple data sources with synthetic populations. Applications to predictions and alleviation of privacy concerns.

Mr James Rineer (RTI International) - Presenting Author
Mr Georgiy Bobashev (RTI International)
Ms Emily Hadley (RTI International)
Ms Caroline Kery (RTI International)
Dr Alan Karr (AFK Analytics)

Many modern research questions require knowledge acquired from multiple datasets. This need arises when obtaining a single dataset is difficult or impossible, or when the data already exist in multiple sources. Examples include predicting the effect of new cancer screening on the US population, identifying obesity hotspots (i.e., geographic areas with unusually high prevalence of high-BMI individuals), and predicting the effect of interventions aimed at reducing opioid-related deaths (cite colorectal, breast and cervical cancer models, obesity, stunting, opioid). We present an approach for creating an individual-level analysis dataset that probabilistically links multiple datasets of different natures, such as administrative data, surveys, clinical data, and medical records. The key component of our approach is a synthetic population developed at RTI. Synthetic populations are statistically and spatially accurate representations of persons, their families, and their related social structure. Researchers can map variables from different datasets onto a synthetic population, resulting in a dataset that contains information from a variety of sources. This individual-level dataset is sufficient to produce reliable statistical inference with quantifiable uncertainty while still adhering to data privacy restrictions. However, the choice of method for mapping the variables can considerably affect the accuracy of the predictions. We describe three methods for linking datasets with synthetic data: resampling, modeling predictors independently, and modeling predictors sequentially. The resulting datasets can then be used for estimation (e.g., geographic hotspots) or for predicting future outcomes with microsimulations or agent-based models. We provide examples of such linkage from cancer, obesity, and substance use research.
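As an illustration of the first of these linking methods, the sketch below resamples a survey-measured outcome onto a much larger synthetic population within matching demographic cells. All variables and cell definitions are hypothetical; this is a toy stand-in, not RTI's actual pipeline.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical survey: a demographic cell id (e.g. age group x sex) and a
# measured outcome such as BMI, for 500 respondents
survey_cell = rng.integers(0, 4, 500)
survey_bmi = 22 + 3 * survey_cell + rng.normal(0, 2, 500)

# Hypothetical synthetic population: far more records, same cell definitions
synth_cell = rng.integers(0, 4, 10_000)

# Resampling linkage: each synthetic person draws a donor value from survey
# respondents in the same demographic cell, preserving within-cell distributions
synth_bmi = np.empty(synth_cell.size)
for c in range(4):
    donors = survey_bmi[survey_cell == c]
    mask = synth_cell == c
    synth_bmi[mask] = rng.choice(donors, size=mask.sum(), replace=True)
```

Because donors are drawn within cells, cell-level means and spreads of the survey outcome carry over to the synthetic population, which is what makes downstream hotspot estimation plausible.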

Predicting race and ethnicity from the sequence of characters in a name

Dr Gaurav Sood (Sunnyvale Labs) - Presenting Author
Mr Suriyan Laohaprapanon (Appeler, Inc.)

To answer questions about racial inequality, we often need a way to infer race and ethnicity from a name. Until now, the bulk of the focus has been on optimally exploiting the last-name list provided by the Census Bureau. But there is more information in first names, especially for African Americans. To estimate the relationship between full names and race, we exploit the Florida voter registration data and the Wikipedia data (Ambekar et al. 2009). In particular, we model the relationship between the sequence of characters in a name and race and ethnicity using Long Short-Term Memory (LSTM) networks. Our out-of-sample (OOS) precision and recall for the full-name model estimated on the Florida voter registration data are .83 and .84, respectively. This compares to OOS precision and recall of .79 and .81 for the last-name-only model. As expected, the gains are asymmetric. Recall is considerably better for Asians and non-Hispanic Blacks with the full name---.49 and .43 respectively, compared to .41 and .21. The precision with which we predict non-Hispanic Blacks is also considerably higher---9 points higher for the full-name model. To illustrate the use of this method, we apply our approach to campaign finance data to estimate the share of donations made by people of various racial groups. We find that relying on the census last-name data understates racial differences in contributions because of its higher error rate. For instance, based on the census last-name data, about 83.5% of the contributions in 2010 were made by Whites. But the commensurate number based on the Florida full-name model was nearly 3 percentage points higher, at 86.5%. For Blacks, we see a similar story. Based on the census last-name data, about 10.2% of the contributed money came from Blacks. But based on the Florida full-name model, the number is about 2.3 percentage points lower---a massive 22.2% relative change.
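As a concrete illustration of the input such a model consumes, the snippet below encodes names as fixed-length sequences of character ids, the standard preprocessing for a character-level LSTM classifier. The vocabulary and sequence length here are arbitrary illustrative choices, not the authors' settings.

```python
import numpy as np

CHARS = "abcdefghijklmnopqrstuvwxyz '-"
VOCAB = {ch: i + 1 for i, ch in enumerate(CHARS)}  # 0 is reserved for padding

def encode_names(names, max_len=25):
    """Map each name to a fixed-length sequence of character ids,
    the input format a character-level sequence model consumes."""
    X = np.zeros((len(names), max_len), dtype=np.int64)
    for row, name in enumerate(names):
        for col, ch in enumerate(name.lower()[:max_len]):
            X[row, col] = VOCAB.get(ch, 0)  # unseen characters map to padding
    return X

X = encode_names(["Emmanuel Olamijuwon", "Gaurav Sood"])
```

Each row of `X` would then feed an embedding layer followed by the LSTM, with the softmax output over race/ethnicity categories.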

Survey of digital assets for project-based studies

Dr Gloria Miller (maxmetrics) - Presenting Author

Projects are a key vehicle for economic and social action, and a source of innovation, research, and organizational change. They can be the size of the gross domestic product of a small nation or larger than the organizations participating in them. In 2019, the World Bank assessed 214 projects worth approximately $71 billion (IEG World Bank Project Performance Ratings, 2020) and argued that private sector investment and the leveraging of digital technologies are crucial to boosting economic growth. Projects offer situations and digital assets that can be analyzed through a multitude of theoretical lenses. This research is a survey of the digital assets available through a project: specifically, what is available, why it would be a relevant source for research, and who could be the sources of the data.

A project is a temporary organization with a set of actors working together over a limited, pre-determined period of time for a given outcome. Projects' existence can be explained and studied through philosophical underpinnings such as the Newtonian understanding of time, space and activity; through archetypes of project contexts such as project-based organizations, project-supported organizations, or project networks; or through investigation of project processes or actors (Geraldi & Söderlund, 2016; Jensen, Thuesen, & Geraldi, 2016; Lundin, 2016). Furthermore, projects can be used for studies of stakeholder engagement, project performance, and individual and group performance. The majority of project-based research relies on qualitative methods such as literature reviews, surveys, and case studies. Thus, there is a call for new research approaches that investigate the actual or lived experience (Drouin, Müller, & Sankaran, 2013; Geraldi & Söderlund, 2016).

The use of historical data from structured project repositories for cost estimation is well known in project management. However, little project research uses text mining, machine learning, topic analysis, or social network analysis against those data sources for other research purposes. While email data has been used as an alternative to survey data, even that research has not fully exploited the data for insights into other social intricacies. This research uses a literature review and interviews to compile a survey of the digital assets available for research through a project context, including suggestions as to why each data source would be relevant and the counterparties that may be able to provide the data.

Drouin, N., Müller, R., & Sankaran, S. (2013). Novel Approaches to Organizational Project Management Research: Translational and Transformational. Denmark: Copenhagen Business School Press.
Geraldi, J., & Söderlund, J. (2016). Project studies and engaged scholarship: Directions towards contextualized and reflexive research on projects. International Journal of Managing Projects in Business, 9(4), 767-797. doi:10.1108/IJMPB-02-2016-0016
IEG World Bank Project Performance Ratings. (2020). Retrieved from:
Jensen, A., Thuesen, C., & Geraldi, J. (2016). The projectification of everything: Projects as a human condition. Project Management Journal, 47(3), 21-34.
Lundin, R. A. (2016). Project society: paths and challenges. Project Management Journal, 47(4), 7-15.

Developing a tool suite for managing large scale cross-national web surveys within the framework of the European Open Science Cloud

Dr Gianmaria Bottoni (ESS HQ, City University of London) - Presenting Author
Professor Rory Fitzgerald (ESS HQ, City University of London)
Professor Nicolas Sauger (Sciences Po)
Dr Genevieve Michaud (Sciences Po)
Dr Quentin Agren (Sciences Po)

The European Social Survey European Research Infrastructure Consortium recently experimented with the world's first input-harmonised, probability-based cross-national web panel in three countries by recruiting panel members who had taken part in the face-to-face survey. The experiment took place in Estonia, Great Britain and Slovenia (the CRONOS web panel).
A key challenge identified during the CRONOS experiments was the absence of a sample management system that was well suited for use in a multi-country environment and which could also meet data protection requirements. In addition, handling multiple language versions in a harmonised way proved difficult.
This paper will describe work developing a sample management system for a cross-national web panel that meets the needs of different surveys in a complex multi-national environment and which also links seamlessly to a survey platform. The work is being conducted under the Social Science and Humanities Open Science Cloud (SSHOC) H2020 project.
Proposals for content will be outlined, with the key fields for sample management presented. In addition, functionality such as contact modes (SMS, postal and e-mail) and user accounts will be discussed, and user profile rights will be outlined. The paper will discuss how the system links with the commercial software Qualtrics via its API. The approach taken for managing survey administration across these two tools in multiple countries will be showcased, with examples from testing for CRONOS-2, being conducted in 12 countries later in 2020, used to highlight opportunities and limitations of the suite.
The tested software will in due course be made available on the SSH Open Marketplace and Workbench for installation by third parties, along with related documentation.

The impact of regional holidays on early job-seeker registrations

Dr Gerald Seidel (Federal Employment Agency) - Presenting Author

Early job-seeker registrations, which are enforced by the German Social Code and reflect the number of terminated job contracts, are an important indicator of the labor market. Therefore, I analyse the impact of regional holidays on this indicator. I approximate the number of early job-seeker registrations by the entries of employed persons to the official status ‘job-seeking’ at the Federal Employment Agency.
The results of my RegARIMA analysis indicate that most regional holidays significantly (and to a plausible degree) reduce the number of early job-seeker registrations. In contrast, only Reformation Day turns out to be insignificant for most German Länder (states). I check the latter result for robustness by exploiting the variation in regional holiday legislation due to the 500th Reformation anniversary. The overall missing effect of Reformation Day on early job-seeker registrations might (partly) be explained by the fact that it coincides with the last day of a month (October 31st).

Big analytics and segmentation: A new framework for single datasets and integrating survey and big data

Dr Richard Timpone (Ipsos) - Presenting Author
Mr Jonathan Kroening (Ipsos)

As Big Data altered the face of research, the same defining factors of Volume, Velocity and Variety reflect changes in opportunities of analytic data exploration as well. Improvements in algorithms and computing power provide the foundation for platforms that explore masses of models to identify new insights for research goals. We previously introduced this as the concept of Big Analytics (Timpone, Yang and Kroening 2018).

Extending the idea of Big Analytics, we developed our Segmentation Evaluation System (SES) to evaluate over 1000 different solutions from a single dataset to identify those best suited for research goals. This provides new opportunities in both social science research and business settings for deeper understanding of differences among groups of individuals.

The array of models is evaluated on success criteria chosen for the specific research problem. These include criteria focused on a single dataset, such as segment cohesion or the number of features needed to create a typing tool, but they also allow explicitly identifying solutions that both meet research standards for the segmented dataset and perform well in linking to other survey and Big Data databases.

This new conceptual framework contrasts with traditional approaches that run single solutions (whether with traditional methods like k-means or ML methods like Self-Organizing Maps). No one method is best suited across the board, and this approach provides fit-for-purpose methods that bridge the art and science of segmentation.

The SES platform includes diverse clustering methods (k-means, hierarchical clustering, ensemble clustering, Self-Organizing Maps, latent class, affinity propagation, non-negative matrix factorization among others) and multiple distance measures (Euclidean, angular distance, generalized distance measures and random forest dissimilarity).

Given the large number of segmentation solutions, the key becomes their evaluation. We demonstrate how different solutions vary as the research goals of a segmentation change. Criteria for success in the framework include factors such as segment differentiation and cohesion, which are central to segmentation, as well as criteria such as profiling variable ownership, segment reproducibility, and practical criteria like the overall fit of survey questions for creating a typing tool and how few items can be used to create a robust one.
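The flavor of such a sweep can be sketched in a few lines. The example below, a toy stand-in rather than the SES platform itself, runs many k-means solutions over simulated respondents and ranks them on a single cohesion criterion; a real sweep would add more methods, distance measures, and criteria.

```python
import numpy as np

def kmeans(X, k, seed, n_iter=50):
    """Plain Lloyd's k-means; returns labels and centroids."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        centers = np.array([X[labels == j].mean(0) if (labels == j).any()
                            else centers[j] for j in range(k)])
    return labels, centers

def cohesion(X, labels, centers):
    """Mean squared distance to the assigned centroid (lower = tighter segments)."""
    return ((X - centers[labels]) ** 2).sum(-1).mean()

rng = np.random.default_rng(1)
# Simulated respondents: three latent segments in two dimensions
X = np.vstack([rng.normal(m, 0.5, (100, 2)) for m in ([0, 0], [4, 0], [0, 4])])

# Sweep many candidate solutions and rank them on the success criterion
solutions = [(k, *kmeans(X, k, seed)) for k in (2, 3, 4) for seed in range(5)]
ranked = sorted(solutions, key=lambda s: cohesion(X, s[1], s[2]))
best_k, best_labels, _ = ranked[0]
```

Swapping in additional success criteria (reproducibility across seeds, typing-tool item counts) changes which solution ranks first, which is the point of evaluating many solutions rather than one.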

These latter criteria make this Big Analytics approach superior to traditional segmentations for integrating survey data with other sources, including Big Data. We have successfully leveraged SES for online behavioral segmentations that also differentiate on attitudinal survey items to ensure actionability. In the other direction, creating need- and attitude-based survey segments and using criteria on how well they type on hooks that link to other types of databases (from CRM to online media personas) ensures better linkage than building a segmentation and then trying post hoc to predict segments in other databases.

Beyond the theory, we show how this framework has been used in practical cases to identify more actionable solutions in general, as well as to link across databases, a clear exemplar of the vision of BigSurv and the explicit linkage of survey and Big Data.

From big data to (trusted) smart surveys

Professor Markus Zwick (Institute for Research and Development in Federal Statistics) - Presenting Author
Ms Shari Stehrenberg (Institute for Research and Development in Federal Statistics)

In order to reap the benefits of the data revolution, the European Statistical System (ESS) launched the ESSnet Big Data I project and, as a follow-up, the ESSnet Big Data II project based on the Scheveningen Memorandum. Both ESSnet projects have their foundation in the Big Data Action Plan and Roadmap (BDAR), which was adopted by the ESS in 2014.
The overall objective of both ESSnet Big Data projects is to further prepare the ESS for integrating big data sources into the production of official statistics. The ESS has meanwhile explored various non-traditional data sources and delivered first results. Some National Statistical Institutes, as well as Eurostat, have established sections on their websites to publish the results of these experimental statistics.
Both ESSnet Big Data projects are data-related, focusing on how new and non-traditional data sources can be integrated into the production of official statistics. With the Bucharest Memorandum, the ESS moved forward from Big Data to Trusted Smart Statistics (TSS). With the TSS concept, the research interest is wider: besides new non-traditional data, it is also relevant how the ESS can use digitalisation to further enhance the production of traditional data such as surveys.
With the ESSnet Smart Surveys 2020-2021, twelve NSIs started a project to research the opportunities of mobile applications to further digitalise surveys. The Federal Statistical Office of Germany coordinates the project.
By the term “smart surveys” we refer to surveys that use smart personal devices, equipped with sensors and mobile applications. The concept of smart surveys goes well beyond the mere use of web-based (online) data collection that essentially transforms the paper questionnaire into an electronic version. Smart surveys involve dynamic and continuous interaction with the respondent and with his/her personal device(s).
The term “trusted smart surveys” refers to an augmentation of smart surveys with technological solutions that collectively increase their trustworthiness and hence their acceptance by citizens. Constituent elements of a trusted smart survey are strong protection of personal data based on privacy-preserving computation, and full transparency and auditability of processing algorithms. (Trusted) smart surveys will increase the attractiveness of participating in a survey, not only because they reduce the time needed to fill out a questionnaire, but also because participants receive individualised incentives.
A second goal of the ESSnet Smart Surveys is to define the specifications for a European platform supporting the use of shared smart survey solutions, and furthermore to assess the use of applications for European social surveys such as the Time Use Survey (TUS) and the Household Budget Survey (HBS). Both surveys are considered quite burdensome to respondents and prone to low recall as well as underreporting errors.
The presentation will give an insight into the ESS concept of Trusted Smart Statistics, with a special focus on how this concept is used within the ESSnet Smart Surveys.

Predicting basic human values from digital traces on social media

Mr Mikhail Bogdanov (National Research University) - Presenting Author

A number of studies demonstrate that some human traits and attributes are predictable from digital traces on social media. Probably the most studied phenomenon is personality traits: meta-analyses have shown that personality traits are predictable from digital traces on social media. However, only a few studies attempt to predict human values from digital traces, even though values are more socially constructed than personality traits and might therefore be reflected in people’s social media profiles. Moreover, most studies use digital traces from globally popular social media platforms such as Facebook and Twitter; substantially fewer studies employ data from local social media websites.
In this study, we try to fill these gaps by predicting Schwartz’s basic human values using digital traces from the Russian social network platform Vkontakte (an analogue of Facebook). Our analysis is based on data from a nationally representative cohort panel study, “Trajectories in Education and Careers” (TrEC), which follows the cohort of eighth graders of 2011 who participated in the international study “Trends in International Mathematics and Science Study” (TIMSS). The average age of these respondents is now 23 years. We use survey data from the most recent wave, in which values were measured with Schwartz’s Basic Human Values approach, together with subscriptions to public pages and groups on Vkontakte. Vkontakte is the leading social media platform in Russia, with over 90% of youth registered on it.
We employed different machine learning algorithms (random forest, boosting, regularized regression, etc.) to predict basic human values from subscriptions to public pages on this platform and found that values can be predicted from digital traces with accuracy similar to that reported for personality traits.
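One of the named learners, regularized regression, can be sketched end to end on simulated data. Everything below is hypothetical: the subscription matrix, effect sizes, penalty, and sample sizes are illustrative choices, not the TrEC/Vkontakte data.

```python
import numpy as np

rng = np.random.default_rng(7)
n_users, n_groups = 1000, 200

# Hypothetical data: binary user x public-page subscription matrix (~5% density)
X = (rng.random((n_users, n_groups)) < 0.05).astype(float)
w_true = rng.normal(0, 1, n_groups)
y = X @ w_true + rng.normal(0, 0.5, n_users)   # simulated value score

# Ridge regression: closed-form solve of (X'X + lam*I) w = X'y
lam = 1.0
w = np.linalg.solve(X.T @ X + lam * np.eye(n_groups), X.T @ y)

# Out-of-sample check on a fresh simulated cohort
X_new = (rng.random((500, n_groups)) < 0.05).astype(float)
y_new = X_new @ w_true + rng.normal(0, 0.5, 500)
oos_corr = np.corrcoef(X_new @ w, y_new)[0, 1]
```

The ridge penalty is what keeps the fit stable when the subscription matrix is wide and sparse, which is the typical shape of page-subscription features.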

Using spatial big data to unpack neighbourhood effects on social wellbeing

Professor Chan-Hoong Leong (Singapore University of Social Sciences) - Presenting Author

This study examines how the social environment of neighbourhoods shapes social trust, immigrant perception, and emotional resilience. Social environment is defined based on the geo-locations of public residential apartments known to have a high concentration of ethnic minorities and immigrant groups, and low housing resale prices (i.e., lower-income neighbourhoods). The data measuring the spatial clustering of ethnic and migrant communities and of residents of lower socio-economic status are obtained from various online platforms managed by the Singapore housing authorities. Using Geographic Information Systems, the spatial data are first transformed into a continuous raster format, and then overlaid and integrated with a large national survey measuring various aspects of social and individual wellbeing, including social trust, emotional resilience, and support for multiculturalism. The combination of survey and spatial big data revealed a complex web of mutual interdependence between individual profiles, measured in the survey, and environmental features. Neighbourhoods with a higher concentration of minority ethnic groups reported lower social trust. On the other hand, neighbourhoods with higher immigrant density demonstrated strong mutual trust, emotional resilience, and support for multicultural policies. The presence of immigrants in the neighbourhood moderated the impact of minority ethnic groups. The findings are discussed in the context of Singapore's public housing policies, the limitations of traditional multilevel research models, and the modifiable areal unit problem.

Measuring attitudes and behaviors toward the 2020 Census across time

Dr Yazmin Trejo (U.S. Census Bureau) - Presenting Author
Mrs Jennifer Hunter Childs (U.S. Census Bureau)

As part of the effort for the 2020 Census in the United States, researchers designed the “2020 Census Attitudes Survey.” The goal of this survey is to track public opinion before and during census data collection, the bulk of which is scheduled to take place from March through July. The survey was conducted monthly from September to December 2019 and will be conducted weekly from January to June 2020 in English and Spanish. This paper reports on measured survey trends, including intention to participate in the census, census awareness, knowledge, and potential participation concerns, for the general population and across groups (e.g., age, education, sex, race, and ethnicity). The measurement of behaviors and attitudes associated with intention to participate will serve as a baseline to inform day-to-day decisions for the operations of the communications campaign. The survey uses a combination of a nationally representative telephone sample of the U.S. population and a nonprobability sample drawn from online panels. This survey offers a unique opportunity to compare public opinion with actual participation in a mandatory civic activity: filling out the census.

Subjective wellbeing and the intention to emigrate: A cross-national analysis of 157 countries, 2006-2017

Dr Tatiana Karabchuk (UAE University) - Presenting Author
Dr Marina Selini Katsaiti (UAE University)
Mrs Karin Johnson (University of California Riverside)

The core of the migration literature examines the processes by which people migrate and their experiences during and after migration. However, there is little work explaining what factors influence whether a person intends to emigrate to another country. This study addresses this gap by investigating to what extent individual subjective wellbeing and the broader social environment affect the likelihood that someone wishes to leave their home country. As a first step, this paper fits hierarchical linear models to Gallup Poll data covering 157 countries for the years 2006 to 2017. As a second step, the paper applies machine learning models to this large cross-national, multi-year data set. We hypothesize that greater subjective wellbeing will reduce the intention to migrate abroad, but that even when wellbeing is high, people living in a restrictive or ineffective social context will be more likely to wish to migrate than residents of a country with a more effective social system. Furthermore, we hypothesize that the results will show a gradient of intentionality based on the region in which a person lives. These findings have three implications: first, they describe patterns of migration and how they change over time in relation to individual- and country-level factors; second, they broaden our understanding of migration push factors beyond economic hardship or conflict; and, third, they suggest how existing programs in a home country might be modified to improve welfare, as well as how reception policies in destination countries might facilitate migrants' social, economic, and cultural contribution. An additional methodological contribution of this paper is its comparative test of traditional econometric models versus machine learning techniques.

Quality guidelines for the acquisition and usage of big data

Dr Alexander Kowarik (Statistics Austria) - Presenting Author
Dr Magdalena Six (Statistics Austria)

The increasing knowledge and experience within the European Statistical System (ESS) in the acquisition, processing and use of new data sources now provides a clearer picture of quality demands. These quality-based experiences are used by the ESSnet Big Data II to formulate guidelines for NSIs that already use and/or plan to use new data sources for the production of official statistics. Looking at the statistical production process, the use of new data sources mostly affects quality aspects of processes related to input and throughput. Taking this into account, the guidelines concentrate on the input and throughput phases of the statistical production process.
With new data sources, both access to and processing of input data make it necessary to consider new and very source- and data-specific sub-processes. The variety of sub-processes is much broader than with traditional data sources: what is relevant for one data class and one form of data access might be of no interest for others. We therefore decided to develop a modular structure for the quality guidelines, allowing producers to focus on the guidelines relevant to the intended form of data access and the intended data usage, taking into account the peculiarities of a specific data class.

The global diffusion of cybersecurity strategy

Dr Nadiya Kostyuk (University of Michigan) - Presenting Author

One of the most important developments of the last two decades has been the spread of national cybersecurity strategies. Yet researchers have paid little attention to the spread of this phenomenon. Contrary to existing scholarly work arguing that increases in cyberthreats motivate nations to adopt cybersecurity strategies, this research treats the spread of these strategies as an example of policy diffusion. I argue that a nation is motivated to adopt its first cybersecurity strategy when other nations with similar preferences, regarding the extent to which a government should control the part of the Internet within its own borders, have adopted cybersecurity strategies. Using a survival model, I test this hypothesis with a newly collected cross-sectional time-series data set of national cybersecurity strategies between 1999 and 2018. The analysis provides robust empirical support for my theoretical argument. As the world currently debates future Internet regulations, my findings have important policy implications.