BigSurv18 program


Wednesday 24th October Thursday 25th October Friday 26th October Saturday 27th October





New Digital Data Sources and Official Statistics

Chair: Professor Ralf Münnich (Trier University)
Time: Saturday 27th October, 14:00 - 15:30
Room: 40.063

New digital data sources, and in particular their combination with survey and administrative data, are becoming increasingly important for the production of official statistics. However, many challenges remain to be solved, including data privacy, sample selection, and integration into the statistical production process. This session presents four different approaches to using new digital data sources in combination with traditional data, showing the challenges and opportunities of their use. The data sources covered include satellite data, mobile phone data, and Twitter and related data.

The Use of Big Data to Improve Small Area Estimates of Multidimensional Poverty Indicators

Professor Monica Pratesi (University of Pisa) - Presenting Author
Dr Stefano Marchetti (University of Pisa)
Professor Caterina Giusti (University of Pisa)
Mr Vincenzo Mauro (University of Pisa)

The use of big data in socioeconomic studies has attracted growing interest in recent years. In this work we review and discuss the results obtained when estimating poverty rates with small area models specified at area level (provinces, regions and Local Labour Systems in Italy), using auxiliary variables derived from big data sources such as Google Trends, Twitter and GPS data.

We show that indexes based on big data sources have potential for predicting our target variable, provided that the auxiliary data are treated with attention to sources of error such as selectivity, error propagation and endogeneity. We also show that using these indexes as auxiliary variables in the small area working model can further reduce the estimated mean squared error relative to the same estimator without them.
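A short derivation helps explain why a predictive big-data index can lower the estimated MSE. It assumes the basic Fay-Herriot area-level model (the abstract does not say which area-level model was used), and the notation ($\hat\theta_i$, $\psi_i$, $\sigma_u^2$, $\gamma_i$, $g_{1i}$) is introduced here for illustration only.

```latex
\begin{aligned}
\hat\theta_i &= \theta_i + e_i, & e_i &\sim N(0,\psi_i),\ \psi_i \text{ known},\\
\theta_i &= \mathbf{x}_i^\top\boldsymbol\beta + u_i, & u_i &\sim N(0,\sigma_u^2),\\
\tilde\theta_i &= \gamma_i\hat\theta_i + (1-\gamma_i)\,\mathbf{x}_i^\top\hat{\boldsymbol\beta}, & \gamma_i &= \frac{\sigma_u^2}{\sigma_u^2+\psi_i},\\
g_{1i}(\sigma_u^2) &= \gamma_i\psi_i = \frac{\sigma_u^2\psi_i}{\sigma_u^2+\psi_i}, & \frac{\partial g_{1i}}{\partial\sigma_u^2} &= \frac{\psi_i^2}{(\sigma_u^2+\psi_i)^2} > 0.
\end{aligned}
```

Here $g_{1i}$ is the leading term of the Prasad-Rao MSE approximation. If a big-data index explains part of the between-area variation left unexplained by the other covariates, the residual variance $\sigma_u^2$ falls and $g_{1i}$ falls with it. The remaining MSE terms, which account for estimating $\boldsymbol\beta$ and $\sigma_u^2$, grow slightly with each added covariate, which is why a reduction is possible rather than guaranteed.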


City Data From LFS and Big Data

Mrs Sandra Hadam (Statistisches Bundesamt) - Presenting Author
Professor Timo Schmid (Freie Universität Berlin)


Reliable knowledge of labour force indicators for a country's population is essential for sound evidence-based policymaking. For instance, the geographic distribution of the employment rate, the unemployment rate and educational attainment is used to make decisions on the allocation of resources. The Labour Force Survey (LFS) is generally designed to provide reliable estimates of these indicators for large domains, such as the national or regional level, rather than for cities. However, policy proposals for urban areas require estimates at the level of functional urban areas. One way to derive estimates for spatially disaggregated levels, such as communes or cities, is to use small area methods. Precise small area estimates depend on the availability of predictive auxiliary variables. Therefore, in addition to LFS information, alternative sources of passively collected data, such as mobile phone data, will be used for small area estimation. The main idea of this application is to use anonymized and aggregated mobile phone data from Deutsche Telekom as auxiliary variables to estimate LFS indicators for functional urban areas. The methodology follows the approach of Schmid et al. (2017), who predicted socio-demographic indicators for Senegal by combining mobile phone data with survey data. They used the 2011 Demographic and Health Survey (DHS) to estimate literacy rates for women and men in regionally disaggregated areas. The ESSnet project ‘City data from LFS and big data’ will adopt this method. The motivation for using mobile phone data is that they are collected continuously and contain valuable information on the timing of mobile events and the intensity of aggregate mobile events.
Since aggregated mobile phone data are available for North Rhine-Westphalia, it may be possible to predict the employment and unemployment rates at spatially disaggregated levels for cities such as Cologne, Düsseldorf or Dortmund. For this purpose we use an area-level small area model, the Fay-Herriot model, with covariates derived from mobile phone data. The area-level model proposed by Fay and Herriot (1979) links the direct estimates to area-level covariates, which here are constructed from the auxiliary mobile phone data. Because the small area estimates aggregated to the regional level can differ substantially from the corresponding direct estimate, a benchmarking approach will be used to ensure internal consistency with the regional direct estimator.
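The two steps described above, an EBLUP under the Fay-Herriot model followed by benchmarking to a regional direct estimate, can be sketched as follows. This is a minimal illustration, not the authors' implementation. Assumptions: the area-effect variance is estimated with the Prasad-Rao moment estimator (REML or ML are common alternatives), benchmarking is a simple ratio adjustment to a weighted regional total, and the covariate `mobile_index`, the sampling variances and the population-share `weights` are all synthetic and hypothetical.

```python
import numpy as np

def fay_herriot_eblup(y, X, psi):
    """EBLUP under the basic Fay-Herriot model.

    y   : direct estimates per area, shape (m,)
    X   : area-level covariates incl. an intercept column, shape (m, p)
    psi : known sampling variances of the direct estimates, shape (m,)
    """
    m, p = X.shape
    # Prasad-Rao moment estimator of the area-effect variance sigma_u^2
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_ols = XtX_inv @ X.T @ y
    resid = y - X @ beta_ols
    leverage = np.einsum("ij,jk,ik->i", X, XtX_inv, X)  # diagonal of the hat matrix
    sigma2_u = max(0.0, (resid @ resid - np.sum(psi * (1.0 - leverage))) / (m - p))

    # GLS estimate of beta with V = diag(sigma_u^2 + psi)
    w = 1.0 / (sigma2_u + psi)
    beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))

    # Shrink each direct estimate towards its regression (synthetic) prediction
    gamma = sigma2_u / (sigma2_u + psi)
    return gamma * y + (1.0 - gamma) * (X @ beta), sigma2_u, gamma

def ratio_benchmark(theta, weights, regional_direct):
    """Rescale area estimates so their weighted sum equals the regional direct estimate."""
    return theta * regional_direct / np.sum(weights * theta)

# Synthetic example: 30 hypothetical cities in one region
rng = np.random.default_rng(0)
m = 30
mobile_index = rng.normal(size=m)                 # hypothetical mobile-phone covariate
X = np.column_stack([np.ones(m), mobile_index])
psi = rng.uniform(0.5, 2.0, size=m)               # hypothetical sampling variances
theta_true = 10 + 2 * mobile_index + rng.normal(scale=1.0, size=m)
y = theta_true + rng.normal(scale=np.sqrt(psi))   # direct estimates

eblup, sigma2_u, gamma = fay_herriot_eblup(y, X, psi)
weights = np.full(m, 1.0 / m)                     # hypothetical population shares
bench = ratio_benchmark(eblup, weights, regional_direct=np.sum(weights * y))
```

Ratio benchmarking multiplies every area by the same factor, so it keeps the ranking of areas; additive or model-based benchmarking schemes are possible alternatives.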


From Experimental to Official Statistics: The Case of Solar Energy

Dr Bart Buelens (Statistics Netherlands)
Dr Sofie De Broe (Statistics Netherlands)
Dr Ralph Meijers (Statistics Netherlands) - Presenting Author
Dr Olav ten Bosch (Statistics Netherlands)
Dr Marco Puts (Statistics Netherlands)

Official statistics, traditionally based on censuses, surveys and administrative data, will increasingly incorporate or rely on big data, open data and sensor data. This requires a new approach to producing official statistics, including the use of statistical methods typically attributed to data science, machine learning and artificial intelligence. To gauge the acceptance of statistical products produced in this way, Statistics Netherlands launched a website with experimental statistics. This website contains so-called beta products: preliminary research results that may or may not lead to new official statistics in the longer run. Feedback from the public is used as input for further improvements. Collecting price data by web scraping is an example that went through this process and has been adopted in the production of the Consumer Price Index. We illustrate our innovation strategy with work in progress: the case of solar energy. In the Netherlands, registration of domestic photovoltaic installations is not compulsory. Consequently, it is difficult to estimate solar energy production and consumption by households accurately. We identified new data sources that can be used to improve solar power estimates. These include high-frequency measurements of the transmission grid load, meteorological data on solar irradiance and temperature, aerial imagery, electricity meter readings from distributors, and data on tax-deductible energy-efficient home improvements. We apply a range of data analysis, modelling and prediction methods to combine these data sources and obtain a better understanding of the contribution of solar power to the national energy accounts. Beta products are under development, with the intention of turning them into official statistics soon.
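One simple way in which grid-load and irradiance data could be combined is sketched below; it is a hypothetical illustration, not Statistics Netherlands' method. The assumption is that unregistered rooftop PV lowers the load measured on the transmission grid in proportion to solar irradiance, so the irradiance slope in a regression of load on irradiance (with hour-of-day dummies absorbing the usual daily demand pattern) yields an effective PV scale `k`. The function name `estimate_hidden_pv`, all data, units and the value `k_true` are synthetic and hypothetical.

```python
import numpy as np

def estimate_hidden_pv(load, irradiance, hour):
    """OLS fit of load on hour-of-day dummies plus irradiance (hypothetical model).

    Returns the effective PV scale k, where unobserved solar output ~ k * irradiance.
    """
    dummies = (hour[:, None] == np.arange(24)[None, :]).astype(float)
    X = np.column_stack([dummies, irradiance])
    coef, *_ = np.linalg.lstsq(X, load, rcond=None)
    return -coef[-1]  # hidden PV reduces measured load, so the fitted slope is negative

# Synthetic example: 60 days of hourly data
rng = np.random.default_rng(1)
days, hours = 60, np.arange(24)
hour = np.tile(hours, days)
clear_sky = np.clip(np.sin(np.pi * (hours - 6) / 12), 0, None) * 800  # W/m^2, zero at night
cloud = rng.uniform(0.2, 1.0, size=days)                               # daily cloudiness factor
irradiance = (clear_sky[None, :] * cloud[:, None]).ravel()
baseline = 10_000 + 3_000 * np.sin(np.pi * hours / 24)                 # hypothetical demand profile
k_true = 2.5                                                           # hypothetical PV scale
load = np.tile(baseline, days) - k_true * irradiance + rng.normal(scale=50, size=days * 24)

k_hat = estimate_hidden_pv(load, irradiance, hour)
solar_estimate = k_hat * irradiance  # estimated hourly output of unregistered PV
```

The slope is identified only because cloudiness varies from day to day within each hour; a realistic model would also need temperature, weekday and seasonal effects, which this sketch leaves out.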


Satellite Data for Developing Social and Economic Indicators

Professor Ralf Münnich (Trier University)
Professor Markus Zwick (Statistisches Bundesamt) - Presenting Author

The Horizon 2020 project ‘MAKing Sustainable development and WELL-being frameworks work for policy analysis (MAKSWELL)’ proposes to extend and harmonise indicators that capture the main characteristics of the beyond-GDP approach, together with a new framework that includes them in the evaluation of public policies. In particular, satellite data are used to develop social indicators. Furthermore, satellite information will be used to provide small area estimates of the indicators of interest.

Among other topics, the ESSnet project ‘Smart Statistics’ investigated how business cycles could be described using satellite data. Satellite data as well as in situ data are well suited to exploring economic activity and changes in activity over time, for example the numbers of ships and containers in harbours, airport usage, or thermal images of industrial facilities.

Both MAKSWELL and Smart Statistics use data from the Sentinel-2 satellites of the ESA programme ‘Copernicus’. Together, the two Sentinel-2 satellites image the Earth's land surface every five days, so the data are well suited to detecting changes over short periods. In addition to satellite data, both projects use survey and administrative data.
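Because the revisit time is short, a basic change-detection step can compare two co-registered acquisitions of the same area pixel by pixel. The sketch below is a generic illustration, not the projects' processing chain: it flags pixels whose reflectance difference exceeds `k` robust standard deviations (the median absolute deviation scaled by 1.4826). The function name `change_mask`, the threshold `k=3.0` and the arrays, which stand in for a single Sentinel-2 band, are hypothetical.

```python
import numpy as np

def change_mask(before, after, k=3.0):
    """Flag pixels whose reflectance change is unusually large.

    before, after : co-registered reflectance arrays of the same shape
    k             : threshold in robust standard deviations (hypothetical default)
    """
    diff = after.astype(float) - before.astype(float)
    med = np.median(diff)
    mad = np.median(np.abs(diff - med))
    robust_sd = 1.4826 * mad  # MAD scaled to match the normal standard deviation
    return np.abs(diff - med) > k * robust_sd

# Synthetic example: two acquisitions of a 100 x 100 pixel scene
rng = np.random.default_rng(2)
before = rng.normal(0.2, 0.01, size=(100, 100))
after = before + rng.normal(0, 0.01, size=(100, 100))
after[40:50, 60:70] += 0.15  # hypothetical block of changed pixels

mask = change_mask(before, after)
changed_share = mask.mean()  # share of pixels flagged as changed
```

Using the median and MAD rather than the mean and standard deviation keeps the threshold from being inflated by the changed pixels themselves.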

The presentation will give an overview of both projects, which started at the beginning of 2018, together with their first results.