BigSurv18 program



Posters 1 (actively presented from 11.30 to 13.00)

Chair: Dr Antje Kirchner (RTI)
Time: Friday 26th October, 08:30 - 17:30
Room: 30.S02 S. Expo

These posters are the result of the Barcelona Dades Obertes Data Challenge organized by the city of Barcelona. For more information on the institutions and the data challenge please see:

Poster Title | High School
Investigating Complaints in Gracia | Institut Vila de Gràcia
Social Cohesion and Type of Neighborhood | Institut Ferran Tallada
Free WI-FI Points in Barcelona | Institut Juan Manuel Zafra
Access to Housing in Barcelona | Institut Joan Brossa
A Study of Traffic Accidents in Barcelona | Institut J. Serrat i Bonastre
WI-FI Points | Institut Josep Comas i Solà

Count Regression Modelling on Number of Migrants in Households

Mr Tsedeke Lambore Gemecho (PhD Student) - Presenting Author
Professor Ayele Taye Goshu (Associate Professor of Statistics)

The main objective of this study is to identify determinants of the number of international migrants in a household and to compare regression models for count responses. A total of 2,288 observations were collected from sixteen randomly sampled districts in the Hadiya and Kembata-Tembaro zones, Southern Ethiopia. Poisson mixed models, as special cases of the generalized linear mixed model, are explored to determine the effects of the predictors: age of household head, farm land size, and household size. Two ethnicities, Hadiya and Kembata, are included in the final model as dummy variables. Stepwise variable selection identified four predictors: age of head, farm land size, family size, and the dummy variable ethnic2 (0 = other, 1 = Kembata). These predictors are significant at the 5% level for the count response, the number of migrants. The final Poisson mixed model consists of the four predictors with district-level random effects. The area-specific random effects are significant, with a variance of about 0.5105 and a standard deviation of 0.7145. The results show that the number of migrants increases with the head's age, family size, and farm land size. In conclusion, the number of international migrants per household in the area is significantly high. Age of household head, family size, and farm land size are determinants that increase the number of international migrants in households. Community-based intervention is needed to monitor and regulate international migration for the benefit of society.
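
The model described in this abstract can be written out roughly as follows. This is a sketch of a standard random-intercept Poisson model, assuming a log link and normally distributed district effects; the symbols (y, beta, u) and the short predictor names are notation introduced here, not taken from the study. Here i indexes households and j indexes districts, and the variance estimate is the one reported in the abstract.

```latex
y_{ij} \mid u_j \sim \mathrm{Poisson}(\mu_{ij}), \qquad
\log \mu_{ij} = \beta_0 + \beta_1\,\mathrm{age}_{ij} + \beta_2\,\mathrm{land}_{ij}
              + \beta_3\,\mathrm{size}_{ij} + \beta_4\,\mathrm{ethnic2}_{ij} + u_j,
\qquad
u_j \sim N(0, \sigma_u^2), \quad
\hat{\sigma}_u^2 \approx 0.5105, \quad
\hat{\sigma}_u = \sqrt{0.5105} \approx 0.7145
```

Under this form, a positive coefficient means the expected number of migrants is multiplied by exp(beta) for each one-unit increase in the predictor, which matches the abstract's statement that the count increases with age, family size, and farm land size.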


Testing Analytical Methods Related With the Unstructured Data Analysis From Perspective of 'Data Scientists and Methodologists'

Professor Piotr Tarka (Poznan University of Economics and Business, Department of Market Research) - Presenting Author

In market, opinion, survey, and social science research, a growing share of data arrives in unstructured formats derived from customer feedback, social media conversations, blogs, news, articles, voice, and photos. Such data continue to grow in volume, variety, and velocity, but also in overall value. For data scientists and classical methodologists, this trend offers new opportunities but also challenges. There is even evidence of a slow shift in the analytical approaches that some data experts use in practice to derive value, in particular from unstructured data. In this presentation, drawing on our empirical research, we try to diagnose which groups of experts work with textual, audio (spoken language), and picture data formats, and to what extent, assuming that the groups may differ in how intensively they apply these formats in practical data analysis. By investigating the experts' experience, we compare their opinions regarding unstructured data methods using confirmatory factor analysis (CFA) and multiple-group analysis. With this assumption in mind, we constructed a CFA model that allowed us to test the level of equivalence of the data experts' views. The data were collected in an international online survey (distributed through the LinkedIn social network) among data experts with three educational backgrounds: 1) economics/business, 2) sociology/psychology, and 3) mathematics/statistics/computer science.


A Joint Modelling Approach in SAS to Assess Association Between Adult and Child HIV Infections in Kenya

Mr Elvis Muchene (University of Nairobi) - Presenting Author

Recent studies have adopted a joint modelling approach as a more robust technique for studying outcomes of interest simultaneously, especially when the interest is in the association between two dependent variables. This has been necessitated by the fact that modelling such outcomes separately often leads to biased inferences because of possible correlations between them, especially in medical studies. This paper demonstrates the application of a linear mixed modelling approach, using the SAS analysis software, to evaluate the correlation between adult and child HIV infections for each county in Kenya while adjusting for several predictors of interest. Using HIV data extracted from the Kenya open data website for the year 2014, we visualize the HIV prevalence of each county on the Kenyan map. High infection incidence is observed in counties located in Nyanza province. We further fit a joint model for the two outcomes of interest using the linear mixed models approach to capture possible correlation between the two outcomes for each county. Results indicate that there is a correlation between infections in adults and children. Further, there is a significant effect of ART coverage, of the number of adults and children in need of ART, and of the number of people undergoing voluntary testing. Researchers and students with little experience in applying linear mixed models, whether in theory, in practical analysis in SAS, or on real datasets, will find this article useful. The findings should interest the health sector, practitioners, and other institutions working on HIV-related interventions.


Comparison of Artificial Neural Networks and Generalized Linear Models for NBA Outcomes

Dr Shan Wang (Northeastern Illinois University) - Presenting Author
Mr William Johnson (Northeastern Illinois University)

Artificial neural networks (ANNs) are statistical learning models, inspired by biological neural networks, that are used in machine learning. They are a nonlinear regression method that provides an alternative to logistic modeling or, more generally, generalized linear modeling. Compared with traditional regression methods such as the generalized linear model, the well-known advantages of ANNs include the ability to detect more complex relationships between dependent and independent variables, fewer requirements of statistical training, and the ability to analyze big data. This article presents an overview of the artificial neural networks method and the generalized linear modeling method and compares their predictive accuracy and computational burden using the National Basketball Association dataset from 2010 to 2017.


From Data Points to Data Dan: Combining Log Analysis, Survey Analysis and Interviews to Segment Google Analytics Customers

Ms Laura Eidem (N/A) - Presenting Author
Ms Yinni Guo (N/A)
Mr Sundar Sdorairaj (N/A)

Google Analytics has a wide user base, from hobbyist bloggers to employees of Fortune 100 corporations. In order to better understand our users, and to get more precision around the proportion of our customer base that each user type makes up, we embarked on a customer segmentation project. This long-term research project used both qualitative and quantitative methods to scope and define customer “use cases,” or particular tasks that directed the front-end interactions of a user’s session. Our quantitative approach consisted of collecting all front-end user interactions and performing Latent Dirichlet Allocation (LDA) to arrive at groupings of 25 use cases, as well as conducting a survey to investigate how users’ backgrounds affect their usage. In parallel, our qualitative approach included over 50 subject interviews to understand what use cases were important from the user’s perspective. We used this research, along with product subject matter experts, to help assign labels to each of our use case parameter groupings. Using the labeled LDA topics, we measured each user's engagement across the topics and performed k-means clustering on individual users to arrive at 12 user segments. The qualitative interpretation of these clusters through 40 interviews led to a set of personas, which will provide further inspiration for product development.
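
The final clustering step mentioned in this abstract can be illustrated with a small sketch. It is not the authors' pipeline: it skips the LDA stage entirely, uses made-up engagement shares for three hypothetical use cases instead of 25, and seeds the centroids with the first k points purely to keep the example deterministic.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each user goes to the nearest centroid (Euclidean distance)
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical engagement shares per user across three labeled use cases
users = [
    (0.9, 0.1, 0.0),
    (0.1, 0.8, 0.1),
    (0.8, 0.2, 0.0),
    (0.0, 0.9, 0.1),
    (0.85, 0.1, 0.05),
]
labels, centroids = kmeans(users, k=2)
```

Each resulting cluster is a candidate segment; in the study, segments like these were then interpreted through interviews before being turned into personas.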


SurveyMotion: What Can We Learn From Sensor Data About Respondents' Actions in Mobile Web Surveys?

Mr Jan Karem Höhne (University of Göttingen) - Presenting Author
Mr Stephan Schlosser (University of Göttingen)

Recently, the use of mobile devices, such as smartphones and tablets, in web survey responding has increased markedly. As shown by previous research, this is particularly observable for smartphones. The reasons for this trend seem to be twofold: first, the number of people who own a smartphone has grown and, second, high-speed mobile Internet access has increased. However, previous research has also shown that smartphone respondents are frequently distracted and/or multitasking. We propose "SurveyMotion (SM)," a JavaScript-based tool for mobile devices, in general, and for smartphones, in particular, that enables researchers to gather information about respondents' motions during web survey completion by using sensor data. Technically speaking, SM collects data about the total acceleration (TA) of smartphones. In addition, we collected several kinds of client-side paradata (e.g., response times and screen taps) and employed survey questions with different response formats (e.g., radio buttons and sliders). We conducted a lab experiment (N = 120) and varied the form of mobile web survey completion. Respondents were randomly assigned to one of four groups. Group 1 was seated in front of a desk with the smartphone lying on the desk during survey completion (control group). Group 2 stood at a fixed point and held the smartphone during survey completion. Group 3 walked along an aisle with the smartphone in their hands during survey completion. Group 4 climbed stairs with the smartphone in their hands during survey completion. The results reveal that SM registers higher TAs of smartphones for respondents with comparatively higher motion levels, which indicates a proper measurement of the TA. Although this study was not able to reveal a distinct connection between motion levels and response times, it did reveal a connection between TA and screen taps. Apparently, finger taps on the smartphone screen cause identifiable patterns in the TA data.
To conclude: the SM tool promotes the exploration of how respondents complete mobile web surveys and could be employed to understand how future mobile web surveys are completed.
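
As an illustration of the total acceleration measure, the sketch below assumes that TA is the magnitude of the three-axis acceleration vector and that readings arrive as (x, y, z) triples. The abstract does not spell out the formula, and SM itself is written in JavaScript, so the Python function names and the sample values here are hypothetical.

```python
import math

def total_acceleration(ax, ay, az):
    """Magnitude of the acceleration vector (assumed definition of TA)."""
    return math.sqrt(ax * ax + ay * ay + az * az)

def mean_total_acceleration(samples):
    """Average TA over a list of (ax, ay, az) readings, e.g. for one survey page."""
    if not samples:
        raise ValueError("need at least one sensor reading")
    return sum(total_acceleration(*s) for s in samples) / len(samples)

# Hypothetical readings in m/s^2 (illustrative values, not study data)
resting = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0)]
walking = [(1.0, 2.0, 2.0), (3.0, 0.0, 4.0)]

resting_ta = mean_total_acceleration(resting)
walking_ta = mean_total_acceleration(walking)
```

Averaging TA per page or per respondent is one simple way to compare motion levels across groups such as those in the experiment; other summaries, such as the maximum or the variance, would serve equally well.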


Measuring the Official Statistics Capability of the Public-Sector Organizations in Presence of Big Data Sources

Mr Wasim Syed (National College of Business Administration and Economics, Lahore, Pakistan) - Presenting Author

Good governance of any state is based on quality statistics. Reliable and timely statistics provide the basis for evidence-based planning and decision making, which helps to address the real needs of citizens. In the modern era, government organizations are rapidly adopting advanced technologies and systems. This adoption generates more and more digital data sources in place of manual records; these are defined as digital administrative data sources. They may prove a treasure trove for national statistical organizations if they fall under the definition of Big Data. These sources require modern tools for data processing, applied with statistical care. Without advanced computational skills, it is difficult to utilize them optimally. Here, we measure the statistical and Big Data processing capability of public-sector organizations. A national-level survey was conducted to capture the real picture of public-sector organizations in Pakistan. Data were collected from 171 Federal and Provincial level organizations by postal inquiry. Results are summarized and scores are developed to rank the organizations by their capability to produce official statistics and to process Big Data. Convex logistic principal component analysis was used as a dimensionality reduction tool for developing the scores and a relative capability index. The results can be used to identify the sectors that need capacity building, as well as to remove barriers and resolve issues in the adoption of modern data sources.


Spatial Influence in Basque Country's Hotels Price

Mr Ander Juarez Mugarza (Eustat) - Presenting Author
Mr Asier Badiola Zabala (Eustat)
Mr Jorge Aramendi Rique (Eustat)

When we talk about hotel prices there are two main features we must analyze: the season and the location. We think it would be interesting to study the evolution of the prices in the different places of the Basque Country. The following pages explain how an interactive chart was developed which shows intuitively the spatial influence on the hotel prices over time.

To do that, the prices of all hotels in the Basque Country were collected using web scraping techniques. The chart shows the evolution of hotel prices over time, with the option to filter hotels by category. It also lets users view either the price evolution of existing hotels only (with a black layer over areas without hotels) or a map that merges estimated prices for areas without hotels with the actual prices in areas that have them. The number and category of the hotels added to towns and cities without existing ones were determined by interpolating the number and type of real hotels in similar towns, and their prices were then estimated.

The result is a chart accessible to the general public that shows the influence of the different Basque regions on hotel prices and, indirectly, on tourism, wealth, and international influence.


Recognizing Patterns in the Price Time-Series of the Basque Country Hotels

Mr Asier Badiola Zabala (EUSTAT) - Presenting Author
Mr Ander Juarez Mugarza (EUSTAT)
Mr Jorge Aramendi Rique (EUSTAT)

Various web browsers have been developed in recent years to help find the best price on the market. This increase in the number of browsers has created more fluctuation and variability in the final price offered. In this study, we focus on the prices of hotels and hostels located in the Basque Country, using data collected from the Internet with web scraping techniques. The main aims are to recognize patterns in the price time-series, to analyse similarities with other time-series that may affect the price, and to forecast prices. The tools used for these purposes are clustering and modelling methods for time-series. As a result, we have classified the series according to the cluster they belong to, identified some influential factors in the price evolution, and produced forecasts using the modelling methods.


A Cross-Sectional vs. Longitudinal Case Study of Twitter and Presidential Approval

Ms Robyn Ferg (University of Michigan) - Presenting Author
Dr Johann Gagnon-Bartsch (University of Michigan)
Dr Fred Conrad (University of Michigan)

Relationships found between data from social media and public opinion polls have led to optimism about supplementing traditional surveys with new sources of data. Many of the analyses of social media data have tracked the frequency and sentiment of posts that contain a set of words over time. We focus on the relationship between Twitter data and presidential approval. Following previous analyses, we look at sentiment of tweets that contain the word Trump and President Trump's daily presidential approval through 2017. Under optimal smoothing and lag parameters, we find a correlation similar to what others have found in previous analyses. Using sentiment time series of words assumed to be unrelated to presidential approval, we construct an empirical null distribution of correlations under optimal smoothing and lag parameters. This null distribution suggests that the correlation we found between Trump tweets and presidential approval and correlations found in previous analyses are not as strong as they initially appear.
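
The empirical null idea in the paragraph above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the trailing moving average, the candidate windows and lags, and the random-walk stand-ins for daily series are all choices made here. The key point it demonstrates is that the placebo series get the same search over smoothing and lag parameters as the series of interest.

```python
import random
import statistics

def moving_average(series, window):
    """Trailing moving average; output is shorter by window - 1 points."""
    return [sum(series[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(series))]

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def best_correlation(sentiment, approval, windows=(1, 3, 7), lags=(0, 1, 2, 3)):
    """Largest absolute correlation over all smoothing windows and lags."""
    best = 0.0
    for w in windows:
        smoothed = moving_average(sentiment, w)
        target = approval[w - 1:]            # align with the smoothed series
        for lag in lags:
            x = smoothed[:len(smoothed) - lag]
            y = target[lag:]                 # sentiment leads approval by `lag` days
            best = max(best, abs(pearson(x, y)))
    return best

def empirical_p_value(observed, null_values):
    """Share of placebo correlations at least as large as the observed one."""
    return sum(v >= observed for v in null_values) / len(null_values)

# Synthetic data only: random walks stand in for real daily series.
rng = random.Random(0)

def random_walk(n):
    level, out = 0.0, []
    for _ in range(n):
        level += rng.gauss(0, 1)
        out.append(level)
    return out

approval = random_walk(120)
keyword_sentiment = random_walk(120)               # series of interest
placebos = [random_walk(120) for _ in range(200)]  # "unrelated word" series

observed = best_correlation(keyword_sentiment, approval)
null = [best_correlation(p, approval) for p in placebos]
p_value = empirical_p_value(observed, null)
```

Because every placebo series is optimized in the same way, a large observed correlation only counts as evidence if it is large relative to what unrelated series achieve after the same tuning.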

In addition to looking at Trump tweets cross-sectionally, we perform a longitudinal analysis with politically active Twitter users. We implement a random forest to classify politically active users as Democrat or Republican. By performing a change point analysis, we detect a clear change in sentiment immediately following the 2016 presidential election for these users, with Democrats more positive before the election and Republicans more positive after the election. The longitudinal analysis produces a stronger relationship with presidential approval than the cross-sectional analysis. However, in both cases, the inclusion of Twitter data fails to improve the prediction of presidential approval relative to our base model.


Developing an Effective Procurement Performance Data Approach for Predicting Expectations Gaps in Construction Contracts at District Local Governments In Uganda

Mr Charles Kalinzi (PhD Student-Makerere University) - Presenting Author
Professor Joseph Ntayi (Makerere University Business School)
Dr Moses Muhwezi (Makerere University Business School)
Dr Levi Kabagambe (Makerere University Business School)

In a bid to reduce road expenses incurred at the local government level, the Government of Uganda purchased and distributed road construction equipment, which district local governments (DLGs) now use to extend and continuously improve their local road networks. Road users, comparing performance standards before (when the work was outsourced) and now (when it is done internally), are increasingly dissatisfied, and evidence-based reports show growing concern that DLGs are failing to meet road users' performance expectations. This study investigates the existence and nature of performance expectations gaps by contextualizing public procurement performance expectations, utilising 'Big Data' on public works contracts at DLGs to yield information about patterns and practices that may be indicative of such gaps in construction contracts.

The design and methodology will involve using a “success story case” where variables are drawn, supported by the Cultural Historical Activity theory, Stakeholder theory and Path Dependence theory, in trying to explain the current phenomena. Extant literature is used to develop a conceptual framework of an integrated model that will be used to monitor procurement strategy implementations in the subsequent contracts. It is expected that the model will be applied to construction contracts in the future to identify its strengths and weaknesses in assessing performance expectations of stakeholders where such contracts are being implemented.

Studies addressing expectation gaps are predominantly in auditing (see C. Adams & Evans, 2004; Brennan, 2006; Humphrey, Moizer, & Turley, 1993). Such performance gaps in audit and finance do resemble the performance gaps in public procurement management, but available knowledge is limited in the public procurement context based on road construction data from DLGs. This study intends to borrow this concept and apply it to investigate procurement performance expectations gaps that could explain the performance lapses in public procurement.

Practical implications: Identifying these critical variables in future work and exploiting them as a means of improving procurement performance expectations would support government performance and accountability at DLGs, optimise the interaction between public procurement systems, and shape future strategic decisions by enhancing holistic stakeholder engagement and management to promote greater openness and transparency.

Originality/value: From a theoretical perspective, this paper will attempt to identify and combine theories that can be used to create a model to explain the procurement performance expectations gap in construction contracts. This study opens up avenues for enriching future studies on public procurement performance expectations gaps using performance data supported by a combination of CHAT, Stakeholder Theory, the theory of Path Dependence, and other theories from various disciplines. This might provide new insights into managing procurement performance expectations to the satisfaction of stakeholders, based on real-time road construction data.


Income Inequality Through People's Lenses: Evidence From the OECD Compare Your Income Web-Tool

Dr Carlotta Balestra (OECD) - Presenting Author
Mr Guillaume Cohen (OECD)

Over the past few decades, inequality has increased in most developed countries. Standard theory suggests that growing inequality should raise support for redistribution policies because politicians react to the preferences of the median voter. However, the data does not provide clean support for the suggested relationship between inequality and redistribution.

Various studies have investigated reasons for this seeming contradiction between theory and data, including, among others, the prospect of upward mobility, systematic differences in demand for redistribution by subgroups of the population or a lack of connecting a problem like inequality with public policy. Another way to reconcile these conflicting findings is that individuals misperceive the true state of inequality and that policies are based on perceived inequalities, which differ from the true state. Cross-country evidence already shows that important misperceptions of inequality persist and that indicators for perceived inequality are a better predictor for redistributive preferences than objective measures such as a standard Gini coefficient, thereby providing support once again for the median-voter model on the basis of subjective inequality. Against this background the following questions come to mind: Do perceived inequalities differ from the inequality measured with more objective indicators? What can explain differences in perceptions of inequalities among the population? How do such perceptions influence personal attitudes towards matters related to inequality?

To answer these questions, in May 2015 the OECD launched Compare Your Income (CYI), an innovative web-tool that allows users from OECD countries to compare perceptions and realities, by looking at where they fit in their country’s income distribution. The tool received a lot of media attention at the time; since the launch, CYI has gotten over 2 million visits. The paper presents results from the analysis that the OECD has conducted on CYI users’ replies and sheds light on what people have in mind when they think of income inequality.
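
Two of the quantities discussed in this abstract, the Gini coefficient as an objective inequality measure and a person's actual position in the income distribution, can be sketched as follows. The function names, the toy income list, and the "perceived minus actual" gap are illustrative assumptions; the abstract does not describe how CYI computes positions or misperceptions.

```python
def gini(incomes):
    """Gini coefficient of non-negative incomes (0 = perfect equality)."""
    values = sorted(incomes)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive income")
    # Rank-weighted formula: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(rank * x for rank, x in enumerate(values, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n

def income_percentile(income, incomes):
    """Share of the reference population earning no more than `income`."""
    return sum(x <= income for x in incomes) / len(incomes)

# Toy reference distribution and one hypothetical user (not OECD data)
reference = [1, 2, 3, 4]
actual_position = income_percentile(3, reference)
perceived_position = 0.5                      # what the user believes
misperception = perceived_position - actual_position
```

A negative gap, as in this toy case, means the user places themselves lower in the distribution than they actually are.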