In this project, we analysed over one trillion online/offline observations to build the world’s richest data set on internet activity. With this data, we can examine human behaviour on a previously unimaginable scale. The data has already been used to study global sleep patterns, the diffusion of the internet and the internet’s impact on the economy, but that is just the beginning. The culmination of over three years’ work, our ground-breaking database provides a first glimpse of how global internet activity data could profoundly change the way research on human behaviour and social interactions is conducted, and the types of questions we can ask and answer. This has become possible because, for the first time in human history, half of the world’s population is connected to a single network on which any device can instantly, and at negligible cost, passively query another device’s online or offline status, rendering the internet a powerful and unprecedented social data-science platform.
Working paper: https://arxiv.org/abs/1701.05632
The goal of this project is to go beyond nighttime light luminosity as a measure of subnational economic development or social activity and to use daytime imagery instead. High-resolution daytime satellite images are becoming increasingly available for the entire world and, compared to night lights, they contain more information about the landscape that reflects economic activity. Night lights have difficulty distinguishing between poor and densely populated areas, and in these cases daytime images can fill the gap. However, although daytime images contain more information, they are highly unstructured, which makes it rather difficult to extract information that can be mapped onto an economic measure. We first employ a convolutional neural network (CNN) approach to extract physical features (e.g., roads, railways, buildings) from the daytime images. The predicted values are then aggregated from grid cells to regions and used to build a second model that predicts economic indicators (e.g., GDP) from the aggregated OpenStreetMap (OSM) feature predictions.
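The second stage can be sketched as follows. This is a minimal illustration, assuming the CNN has already produced per-grid-cell feature estimates. The field names (`region`, `roads_km`, `buildings`), the plain linear model and all numbers are hypothetical and do not represent the project's actual specification. Grid-cell predictions are summed per region, and an ordinary least-squares regression then maps the regional totals to an economic indicator such as GDP.

```python
from collections import defaultdict


def aggregate_by_region(cells, features):
    """Sum per-grid-cell feature predictions (hypothetical CNN outputs) up to region level."""
    totals = defaultdict(lambda: [0.0] * len(features))
    for cell in cells:
        row = totals[cell["region"]]
        for j, name in enumerate(features):
            row[j] += cell[name]
    return dict(totals)


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(a[i]) + [b[i]] for i in range(n)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(X, y):
    """Least-squares fit of y on X plus an intercept, via the normal equations X'X b = X'y."""
    rows = [[1.0] + list(x) for x in X]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical grid-cell predictions (road length in km, building count) and regional GDP.
cells = [
    {"region": "A", "roads_km": 3.0, "buildings": 10.0},
    {"region": "A", "roads_km": 1.0, "buildings": 30.0},
    {"region": "B", "roads_km": 5.0, "buildings": 20.0},
    {"region": "C", "roads_km": 2.0, "buildings": 5.0},
    {"region": "C", "roads_km": 0.0, "buildings": 5.0},
    {"region": "D", "roads_km": 8.0, "buildings": 60.0},
]
features = ["roads_km", "buildings"]
totals = aggregate_by_region(cells, features)
regions = sorted(totals)
gdp = {"A": 38.0, "B": 30.0, "C": 19.0, "D": 56.0}
coef = fit_ols([totals[r] for r in regions], [gdp[r] for r in regions])
print([round(c, 6) for c in coef])  # → [10.0, 2.0, 0.5]
```

Solving the normal equations directly is adequate for a handful of features. With many correlated features, a regularised model or a numerical library would be the safer choice.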
Social unrest affects societies all over the globe. Its causes are many and varied, but unrest is often triggered either by machinations within the political class or by basic economic factors. Unfortunately, in many authoritarian or autocratic regimes, social unrest leads to violence, destruction of property, and, sadly, physical harm and death. The aim of this project is to develop a method for predicting the likelihood of social unrest occurring in a given location using alternative and economic data. Using alternative data from the GDELT textual analysis project (http://www.gdeltproject.org/), together with economic series from the OECD and other sources, we use AI methods to predict the propensity for social unrest one to four weeks in advance. We anticipate that such a method will give the international community much-needed forewarning of likely flash-points around the globe, so that diplomatic, observer, and other actions can be taken to defuse the crisis, protect citizens, and document any resulting harms.
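Forecasting unrest one to four weeks ahead requires turning weekly series into supervised examples. Each example pairs features observed up to week t with whether unrest occurs in week t + h. The following is a minimal sketch under several assumptions: a single location, hypothetical weekly feature vectors (for instance a GDELT-derived event count and an economic indicator), and a binary unrest label. The window length, names and toy numbers are illustrative, and the project's actual AI models are not shown.

```python
def make_examples(features, unrest, window, horizon):
    """Build (X, y) pairs: features from weeks t-window+1..t, label from week t+horizon.

    features: one feature vector per week (hypothetical, e.g. [gdelt_event_count, cpi_change]).
    unrest:   one 0/1 label per week (1 = unrest observed in that week).
    """
    X, y = [], []
    # Start once a full window is available; stop so that t + horizon stays in range.
    for t in range(window - 1, len(features) - horizon):
        past = features[t - window + 1 : t + 1]
        X.append([value for week in past for value in week])  # flatten the window
        y.append(unrest[t + horizon])  # label lies strictly after the input window
    return X, y


# Toy weekly series for one location (invented numbers).
features = [[0, 1], [1, 0], [2, 1], [3, 0], [4, 1], [5, 0]]
unrest = [0, 0, 0, 1, 0, 1]

# One training set per forecast horizon, one to four weeks ahead.
datasets = {h: make_examples(features, unrest, window=2, horizon=h) for h in range(1, 5)}
for h, (X, y) in datasets.items():
    print(h, len(X), y)
```

Any classifier can then be trained on each horizon's (X, y) pair. Splitting training and test data by time, rather than at random, avoids leaking future information into the evaluation of a forecasting task.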
This project empirically studies the effect of the Internet on protests worldwide. We compile a novel panel dataset that combines geo-referenced data on Internet quality with weekly protest data for 18,907 subnational (ADM2) districts in 236 countries over the years 2006-2012. The Internet penetration data was constructed by matching over a trillion (1.5 × 10¹²) IP activity (online/offline) observations to a commercially available IP-geolocation library. Our identification strategy exploits random weekly variation in global Internet latency to identify the causal effect of the Internet on local protests. According to our estimates, latency-adjusted Internet increases the occurrence of local protests. We show that most of the variation in the effect of the Internet on local protests comes from national differences in political institutions and local differences in Internet penetration.
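With weekly observations for each district, a common way to estimate such an effect is a panel regression with district and week fixed effects. The sketch below makes three assumptions. First, it assumes a balanced panel. Second, it treats the latency-adjusted Internet measure as a given regressor `x`, without showing how that measure is constructed. Third, it uses invented toy data. It removes both sets of fixed effects by two-way demeaning and then computes the OLS slope. This is a generic within estimator, not the paper's exact specification.

```python
def two_way_fe_slope(x, y):
    """OLS slope of y on x after removing district (row) and week (column) fixed effects.

    x, y: balanced panels given as lists of rows, one row per district, one column per week.
    """
    n, t = len(x), len(x[0])

    def demean(z):
        # z_it - mean_i - mean_t + grand mean removes additive district and week effects.
        row_means = [sum(r) / t for r in z]
        col_means = [sum(z[i][j] for i in range(n)) / n for j in range(t)]
        grand = sum(row_means) / n
        return [[z[i][j] - row_means[i] - col_means[j] + grand for j in range(t)]
                for i in range(n)]

    xd, yd = demean(x), demean(y)
    num = sum(xd[i][j] * yd[i][j] for i in range(n) for j in range(t))
    den = sum(xd[i][j] ** 2 for i in range(n) for j in range(t))
    return num / den


# Toy panel: 3 districts x 3 weeks, with outcome = 1.5 * x + district effect + week effect.
x = [[0.2, 0.5, 0.1], [0.7, 0.3, 0.9], [0.4, 0.8, 0.6]]
district_effect = [1.0, 2.0, 3.0]
week_effect = [0.5, -0.5, 0.0]
y = [[1.5 * x[i][j] + district_effect[i] + week_effect[j] for j in range(3)] for i in range(3)]

beta = two_way_fe_slope(x, y)
print(round(beta, 6))  # → 1.5
```

Because the district and week effects enter additively, demeaning removes them exactly and the slope recovers the coefficient on `x`. In practice, standard errors would also need to account for clustering, which this sketch omits.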
The aim of this project is to develop methods that provide international socio-cultural similarity measures, at scale, from revealed human preferences. For decades, the World Values Survey (WVS) has been the gold standard for international cultural similarity and comparison; however, the WVS's strength is also its weakness: by conducting face-to-face values surveys with carefully constructed representative samples of sub-populations, the WVS delivers control over the data source but limits the potential for scaling up. Face-to-face surveys are also known to be highly problematic for eliciting information on personal values, especially in repressive or semi-autocratic political contexts. Furthermore, the WVS machinery is highly costly to implement, with survey waves necessarily separated by five-year periods. With more than 50% of humanity now online, alternative data sources such as Google search queries offer promising opportunities in this space: queries are self-revealed and often highly intimate, they can be geo-located and analysed in near real-time, and in almost all countries Google queries cover an increasingly large fraction of the population.
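One simple way to turn search queries into a similarity measure is to represent each country by its frequencies over a common set of queries and then compare countries pairwise using cosine similarity. This is a minimal sketch that assumes hypothetical country labels and invented query counts. The project's actual measures may differ.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two query-frequency vectors (1.0 = identical profile)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_matrix(profiles):
    """Pairwise similarities for every pair of countries in `profiles`."""
    names = sorted(profiles)
    return {(a, b): cosine_similarity(profiles[a], profiles[b]) for a in names for b in names}


# Hypothetical query counts for three queries in three countries.
profiles = {
    "country_a": [120, 80, 40],
    "country_b": [60, 40, 20],  # same relative mix as country_a, half the volume
    "country_c": [0, 10, 90],
}
sim = similarity_matrix(profiles)
print(round(sim[("country_a", "country_b")], 3), round(sim[("country_a", "country_c")], 3))
```

Cosine similarity ignores vector length, so raw counts and normalised query shares give the same result. This is useful when countries differ greatly in overall search volume.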
This project explores the relationship between extrinsic (monetary) rewards and intrinsically motivated behaviour in the context of the online discussion platform reddit. Data was obtained from the online forum over selected periods between January 2014 and August 2017. A Python script was written to scrape data from the website, including the following variables for each comment: the opening poster of the thread containing the comment, the date of that thread, the comments in that thread, the text of the comment, the upvote ratio, and the time of the comment. Our Python script for scraping reddit subreddits can be downloaded here.
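The following sketch does not reproduce the project's own scraper. It illustrates how the listed variables could be extracted from a thread that has already been downloaded as JSON. It assumes the response shape of reddit's public `.json` thread pages: a two-element list whose first listing holds the submission and whose second holds the comment tree. The field names `author`, `created_utc`, `upvote_ratio`, `body` and `replies` follow that format. The sketch leaves out network access, rate limiting and reddit's API terms, and the example thread is invented.

```python
from datetime import datetime, timezone


def iter_comments(listing):
    """Depth-first walk over a comment Listing, yielding each comment's ('t1') data dict."""
    for child in listing["data"]["children"]:
        if child["kind"] != "t1":  # skip 'more' stubs and anything that is not a comment
            continue
        data = child["data"]
        yield data
        replies = data.get("replies")
        if replies:  # an empty string when the comment has no replies
            yield from iter_comments(replies)


def to_iso(ts):
    """Convert a Unix timestamp in seconds to an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()


def extract_records(thread):
    """Flatten one thread into one record per comment."""
    submission = thread[0]["data"]["children"][0]["data"]
    comments = list(iter_comments(thread[1]))
    return [
        {
            "thread_op": submission["author"],
            "thread_date": to_iso(submission["created_utc"]),
            "thread_num_comments": len(comments),  # comments retrieved in this download
            "comment_text": c["body"],
            "upvote_ratio": submission["upvote_ratio"],  # a thread-level field in this format
            "comment_time": to_iso(c["created_utc"]),
        }
        for c in comments
    ]


# Invented thread; in practice this would come from json.load on a downloaded file.
thread = [
    {"kind": "Listing", "data": {"children": [
        {"kind": "t3", "data": {"author": "op_user", "created_utc": 1388534400,
                                "upvote_ratio": 0.9}}]}},
    {"kind": "Listing", "data": {"children": [
        {"kind": "t1", "data": {"body": "first", "created_utc": 1388538000, "replies": {
            "kind": "Listing", "data": {"children": [
                {"kind": "t1", "data": {"body": "reply", "created_utc": 1388541600,
                                        "replies": ""}}]}}}},
        {"kind": "more", "data": {}}]}},
]
records = extract_records(thread)
for r in records:
    print(r["comment_time"], r["comment_text"])
```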