BigSurv18 program







Computers vs. Humans: Who's Better at Social Science?

Chair: Mr Masahiko Aida (Civis Analytics)
Time: Friday 26th October, 11:30 - 13:00
Room: 40.213

Data-Inspired Life - How Data Science and AI is Pushing the Boundaries of Human Behavior

Dr Zeeshan-ul-hassan Usmani (MiSK Foundation) - Presenting Author
Miss Sana Rasheed (Telenor)

Humans have been generating data since Adam and Eve. We have been recording this data for the past couple of centuries and storing it digitally for half a century. We now have the tools and computational power to record, process and analyze it economically, at a speed the human race has never seen before. The Internet of Things (IoT), open data initiatives, censuses and crowdsourcing platforms like Kaggle have opened many avenues that were not possible even a few years ago. With so much data being generated, shared, analyzed and fed back into the decision-making process, it is the data that is generating and influencing human behavior and social life. The heterogeneous, diverse, altruistic and pathological recursive approach to data generation and feedback is affecting how we live, learn, work and give. This paper explores the ambiguous relationship between open datasets, humans, artificial intelligence and cyber-connected systems (IoT and robots) and analyzes the weightage it receives in a given situation. We will also present modern tools and systems such as machine learning libraries (scikit-learn, StackNet), crowdsourced dataset platforms (Kaggle) and statistical programming tools (R, Python, etc.) to understand the undercurrents of the developments that would lead to super-intelligence.

This paper also presents a visual representation of modern-day human beings and of how we are consumed by the data we generate and observe. Finally, we discuss the ethical dimensions of living in a cyber-connected world and what it would take to work with machines.


Can Computers Compete With Human Experience?

Dr Gaye Banfield (Birkbeck College) - Presenting Author

Artificial Intelligence has moved out of the realms of science fiction and into our everyday lives, particularly in the consumption and processing of large volumes of data. Data science (the scientific methods, processes, algorithms and systems that extract knowledge or insights from data in various forms, either structured or unstructured) is an integral part of daily living. Artificial Intelligence is used in internet search that is personalised to our needs, in digital advertising that targets our wants, and in the recommendation systems used by Amazon and others to personalise our virtual shopping lists. It achieves this by processing extensive amounts of data, recognising patterns in that data and drawing conclusions. This paper examines whether Artificial Intelligence can compete with human experience: the ability to draw upon stored memories and to arrive at a conclusion, a gut feeling, an instinct to reach a better conclusion. True, a human cannot process data as fast as today’s computers, but evidence suggests that a human can look at data and know instinctively, based upon experience, that something is wrong. This paper examines the definition of human experience, looking at current studies on intuition and Artificial Intelligence in the context of “real-world” examples where human experience outplays the computer, for example, the flash crashes seen recently in the trading arena. Finally, this paper aims to demonstrate how human experience is a valuable asset in the development of Artificial Intelligence and in the progression of data science.


Detecting and Comparing Survey Research Topics in Conference and Journal Abstracts

Ms Alison Thaung (Booz Allen Hamilton) - Presenting Author
Mr Stas Kolenikov (Abt Associates, Inc)

Historically, large-scale analyses did not incorporate much text data because analyzing it was labor-intensive and time-consuming. Text data, however, continues to grow as people increasingly connect over digital platforms and communicate in writing. Text content and semantics provide valuable context that complements quantitative analyses and broadens our understanding. Text mining and natural language processing are data science approaches that facilitate large-scale analyses of text data by increasing efficiency, uncovering overlooked patterns, and improving consistency.

To reveal the power of text mining using data science methods, we develop topic models that assess the array of research topics in conference abstracts and journal abstracts in the field of survey methodology and statistics. Using ten years of abstracts from the Joint Statistical Meetings (JSM) conference and abstracts from relevant journals (e.g. Journal of Survey Statistics and Methodology, Public Opinion Quarterly, Journal of Official Statistics) from the same time period, we conduct a descriptive analysis of their relative topics and themes using text mining algorithms. The groups of presentations and publications relating to big topics such as nonresponse, weighting, small area imputation, sampling design, and the use of administrative data and paradata are very clearly identified in our text analysis.

We compare results from Latent Dirichlet Allocation (LDA) and Non-negative Matrix Factorization (NMF). In LDA, each document or abstract is treated as an unordered collection of words or n-grams (a bag of words). It assumes that a small set of frequently used words denotes a topic, and generates a set of probabilities that a document has a specific topic. NMF approximates the document-term matrix as the product of two smaller non-negative matrices, a document-topic matrix and a topic-term matrix, thus compressing the original set of data. The dimensionality reduction afforded by NMF directly, or by LDA via multidimensional scaling, allows us to visualize the text data to better understand and describe the differences, similarities, and interconnections of topics among conferences vis-à-vis journals.
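As a rough illustration of the NMF idea only (not the authors' code or data), the sketch below factors a tiny made-up document-term count matrix into a document-topic matrix and a topic-term matrix using the standard multiplicative update rules. The vocabulary, the counts, the choice of two topics, and all function names are assumptions introduced for this example; a real analysis would more likely use an established library and the actual abstracts.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def frobenius_error(v, w, h):
    """Frobenius norm of V - WH, i.e. how well the factors reconstruct V."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0]))) ** 0.5

def nmf(v, n_topics, n_iter=200, seed=0, eps=1e-9):
    """Factor V (documents x terms) into W (documents x topics) and H (topics x terms)."""
    rng = random.Random(seed)
    n_docs, n_terms = len(v), len(v[0])
    # Start from strictly positive random factors so the updates stay non-negative.
    w = [[rng.random() + 0.1 for _ in range(n_topics)] for _ in range(n_docs)]
    h = [[rng.random() + 0.1 for _ in range(n_terms)] for _ in range(n_topics)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_terms)]
             for k in range(n_topics)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(n_topics)]
             for i in range(n_docs)]
    return w, h

# Hypothetical vocabulary and term counts for four short "abstracts".
vocab = ["weighting", "nonresponse", "sampling", "paradata", "privacy", "confidentiality"]
counts = [
    [3, 2, 1, 0, 0, 0],
    [2, 3, 2, 0, 0, 0],
    [0, 0, 0, 3, 2, 2],
    [0, 0, 0, 2, 3, 1],
]

w, h = nmf(counts, n_topics=2)

# For each topic, list the three terms with the largest weights in H.
for k, row in enumerate(h):
    top = sorted(range(len(vocab)), key=lambda j: row[j], reverse=True)[:3]
    print(f"topic {k}:", [vocab[j] for j in top])
print("reconstruction error:", round(frobenius_error(counts, w, h), 3))
```

Each row of `w` can be read as a document's loading on the topics, and each row of `h` as a topic's weights over the vocabulary. Because the result depends on the random starting point, the topic order and exact weights may differ with another seed.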

We believe our work will help the profession identify the big topics and trends that modern survey statisticians and methodologists need to be familiar with. The breakdown of differences between the “conference topics” and “published topics” is an interesting one. On the one hand, journals are publishing hot new work on new directions of research, such as paradata or privacy and confidentiality considerations, which take some time to propagate into practice. On the other hand, the topics more prevalent in conference presentations, such as weighting and nonresponse, are more typical of the day-to-day applied aspect of survey statisticians’ work, and would inform curriculum developers aiming to provide their master's-level students with marketable and applicable statistical skills. Breaking the analysis down over time also allows for identifying changes in how the survey world operates and in what it considers the most important research topics and issues.