It all sounds very good in principle, but what happens when you explore big data for development in practice? It is this “learning by doing” philosophy that inspired our big data for development challenge with the World Bank and the UN Global Pulse.
Last week’s Masters of Networks provided the perfect opportunity for us to get hands-on and start delving into datasets.
First challenge: access to data. As Chris Kreutz noted, the world of big data is still, by and large, a world of closed data.
Luckily, the World Bank’s excellent open financial data site (including the details of World Bank contract awards) came to the rescue. It was great to see the eyes of data geeks shining when they realized the wealth of data sets available (development organizations, please take note: follow the Bank’s example!).
Second challenge: What questions? Since we were blessed with having a wealth of social network analysis gurus and economists in the room (including, among others, Guy Melancon, Bruno Pinaud, Marie-Luce Viaud, Raffaele Miniaci, Paolo Gurisatti, and Margherita Russo), we decided to query the data from a social network angle:
- Do certain networks of companies tend to win the majority of World Bank contracts in any given sector?
- What are the linkages of winning companies with other suppliers within a project?
- What is the pattern of knowledge transfer (for example, do projects foster south-south collaboration)?
Where companies sit in relation to other suppliers may tell us whether some win contracts as a function of their position in the network (the more embedded a company is, the higher its exposure to new information).
Conversely, that exposure may have negative effects if a financial crisis spreads through the network: since contagion is fast, the more embedded a company is, the more vulnerable it is to catching the ‘flu’ (dependency on a single highly exposed company being even worse).
As an important note, these claims need to be refined – possibly using Burt’s Structural Holes theory, for instance – with respect to the relative position of an actor in the network. In other words, if you catch the flu and are nested within a community with no other connection to the outside world, you’ll recover when everyone does. Conversely, if you sit at the border of a “hole” (in Burt’s sense), you’ll recover faster – and hopefully help (one of) your communities to recover.
But the overall point remains – knowing where some projects or firms are located in relation to others may give us important clues about risk management or factors that may lead some projects to under-perform.
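To make the idea of sitting at the border of a “hole” a little more concrete, here is a minimal sketch of Burt's constraint measure for an unweighted, undirected network. It is an illustration only, not something we ran during the workshop: the two toy graphs and all node names are invented, and a real analysis would likely use tie weights (for example, the number of shared projects). A low constraint suggests an actor bridges otherwise unconnected contacts; a high constraint suggests it is nested in a closed group.

```python
def burt_constraint(adjacency, node):
    """Burt's constraint for `node` in an unweighted, undirected graph.

    `adjacency` maps each node to the set of its neighbours (no self-loops).
    """
    neighbours = adjacency[node]
    if not neighbours:
        return float("nan")  # the measure is undefined for isolated nodes

    def p(i, j):
        # Share of i's ties that goes to j (equal shares when unweighted).
        return 1.0 / len(adjacency[i]) if j in adjacency[i] else 0.0

    total = 0.0
    for j in neighbours:
        # Indirect investment in j through every other shared contact q.
        indirect = sum(p(node, q) * p(q, j) for q in neighbours if q != j)
        total += (p(node, j) + indirect) ** 2
    return total


# Toy graphs (hypothetical): a closed triangle and a star whose hub
# connects three contacts that have no ties to each other.
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
star = {"hub": {"x", "y", "z"}, "x": {"hub"}, "y": {"hub"}, "z": {"hub"}}

closed_member = burt_constraint(triangle, "a")  # nested in a closed group
broker = burt_constraint(star, "hub")           # spans structural holes
```

In this toy case the hub of the star gets a much lower constraint than a member of the triangle, which matches the intuition that it sits between contacts who are not otherwise connected.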
Third challenge: What methodology? In our analysis we used information from one particular dataset (major contract awards from 2007 to 2013).
We created a network of all suppliers (nodes in the charts below) who won contracts in the given time period across all sectors. The edges (links connecting nodes) show that two companies are working together under one or more projects.
The brighter the edge, the more projects two related suppliers cooperate on. The size of a node reflects the total funding each supplier received (the larger the node, the larger the amount), and the colour of a node indicates the country that the supplier comes from.
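The construction described above can be sketched in a few lines of plain Python. This is a rough illustration under assumptions, not the code used at the event: the record layout `(project_id, supplier, supplier_country, amount_usd)` and every value in `awards` are made up for the example and do not come from the actual contract awards dataset.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical records: (project_id, supplier, supplier_country, amount_usd).
# Field layout and values are illustrative only.
awards = [
    ("P001", "Alpha Ltd", "Country A", 1_200_000),
    ("P001", "Beta SA", "Country B", 300_000),
    ("P002", "Alpha Ltd", "Country A", 500_000),
    ("P002", "Beta SA", "Country B", 250_000),
    ("P002", "Gamma GmbH", "Country C", 90_000),
]


def build_supplier_network(records):
    """Return (nodes, edges) for a supplier co-participation network.

    nodes: supplier -> {"country": ..., "funding": total amount received}
    edges: (supplier_a, supplier_b) -> number of projects they share
    """
    suppliers_by_project = defaultdict(set)
    nodes = {}
    for project, supplier, country, amount in records:
        suppliers_by_project[project].add(supplier)
        node = nodes.setdefault(supplier, {"country": country, "funding": 0})
        node["funding"] += amount  # drives node size in the charts

    edges = defaultdict(int)
    for members in suppliers_by_project.values():
        # Every pair of suppliers on the same project gets one more shared project.
        for a, b in combinations(sorted(members), 2):
            edges[(a, b)] += 1  # drives edge brightness in the charts
    return nodes, dict(edges)


nodes, edges = build_supplier_network(awards)
```

Here `nodes[...]["funding"]` plays the role of node size, `nodes[...]["country"]` the node colour, and the edge count the edge brightness.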
So… what did we find?
Before we get there, a quick caveat: none of us has in-depth knowledge of the World Bank’s operations.
Yet, we think it is worthwhile sharing our findings (if we can even call them that) as an example of the challenges of querying and telling a narrative from open data sets from a social network analysis perspective.
1. First, we wanted to see what the networks of suppliers within different sectors look like.
Do companies tend to be tightly linked among each other (many companies working together on many projects) or is there less connection between individual suppliers?
Graph 1 below shows companies working in the health sector, and they form a very dense network compared to education (Graph 2), where connections are more spread and less tangled.
The overall look of a network may allow a comparison between sectors to see whether there are dominant companies who tend to win the majority of contracts or whether contracts are evenly spread.
In this particular case, the topology of the network may indicate that in the education sector there are fewer dominant companies, or that projects tend to spread contracts more evenly among different suppliers.
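A visual impression like “dense” versus “spread out” can be backed up with two simple numbers per sector: the density of the network and the share of funding captured by the largest suppliers. The sketch below shows one possible way to compute them; the function names are ours and the node, edge, and funding figures are invented placeholders, not values read from Graphs 1 and 2.

```python
def density(n_nodes, n_edges):
    """Fraction of possible supplier pairs that actually share a project."""
    if n_nodes < 2:
        return 0.0
    return 2 * n_edges / (n_nodes * (n_nodes - 1))


def top_share(funding, k=1):
    """Share of total funding won by the k best-funded suppliers."""
    total = sum(funding.values())
    if total == 0:
        return 0.0
    largest = sorted(funding.values(), reverse=True)[:k]
    return sum(largest) / total


# Placeholder figures for two sectors (hypothetical, for illustration only).
dense_sector = density(n_nodes=5, n_edges=8)
sparse_sector = density(n_nodes=5, n_edges=4)
concentration = top_share({"A": 700, "B": 200, "C": 100}, k=1)
```

Neither number settles the question on its own: a dense network can still have evenly spread funding, so the two are best read together.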
2. We followed up looking at whether there are companies who tend to win a majority of contracts in any given sector and whether we can learn anything interesting from the linkages between them and other suppliers within any given project.
Graph 3 (below) shows a subset of suppliers who won contracts in 2007 in the transport sector. We can see that two companies dominate the network (pink nodes), and that these are also densely connected to several sub-networks of companies (showing that they supply services across several projects). We also see that companies from certain countries are dominant (pink or orange, and blue in the left part).
One question jumps out as we’re looking at this network – if any type of shock affects the largest companies (pink nodes), how would that impact projects they are supplying services to?
How would that impact other companies that collaborate with them?
What factors lead to a single company winning most contracts relative to others working on the same project (blue node on the left, whose contract amounts are one to two orders of magnitude larger than those of the other suppliers in the group)? Contrast that with the sub-network in the upper part of the network, where there seems to be an even spread of contracts among the companies.
3. We then wondered about patterns of knowledge transfer. Do a majority of suppliers win contracts in geographical proximity to their country of origin or not? We found three different cases with a quick analysis (see below, Graph 4).
On the far left, we see a network of companies from Brazil providing services to a project in the same country.
In the middle Graph, most companies are located in Iran, with a few from Europe (France and Germany).
In the last case, suppliers from many different countries are involved in the same project.
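One simple way to tell these three cases apart across the whole dataset would be to compute, for each project, the share of suppliers that come from the project's own country. The sketch below assumes each record carries both a project country and a supplier country; the tuple layout, the project IDs, and the placeholder countries are all hypothetical.

```python
from collections import defaultdict

# Hypothetical records: (project_id, project_country, supplier, supplier_country).
records = [
    ("P10", "Country A", "S1", "Country A"),
    ("P10", "Country A", "S2", "Country A"),
    ("P20", "Country B", "S3", "Country B"),
    ("P20", "Country B", "S4", "Country B"),
    ("P20", "Country B", "S5", "Country B"),
    ("P20", "Country B", "S6", "Country C"),
    ("P30", "Country D", "S7", "Country E"),
    ("P30", "Country D", "S8", "Country F"),
]


def domestic_share(rows):
    """Per project, the fraction of distinct suppliers based in the project country."""
    counts = defaultdict(lambda: [0, 0])  # project -> [domestic, total]
    seen = set()
    for project, project_country, supplier, supplier_country in rows:
        if (project, supplier) in seen:
            continue  # count each supplier once per project
        seen.add((project, supplier))
        counts[project][1] += 1
        if supplier_country == project_country:
            counts[project][0] += 1
    return {project: dom / total for project, (dom, total) in counts.items()}


shares = domestic_share(records)
```

A share near 1 would correspond to the all-domestic case, a share near 0 to the internationally mixed case, with the intermediate case in between. Measuring geographic proximity beyond "same country" would need extra data such as regions or distances.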
Given more time, we would find it interesting to analyze the data further to answer questions like:
- To what extent do World Bank contracts facilitate collaboration of suppliers from a close geographic region?
- How does the historical relationship between different countries change throughout time as reflected in their cooperation on implementation of the World Bank contracts?
4. Lastly, we wondered whether there are networks of companies that tend to apply jointly for certain projects, what the patterns of collaboration are between companies that cooperate on many projects, and what drives this.
In the Graph on the left, we found a case of four companies collaborating on two projects (white edges connecting the dots) with an additional company (smallest node connected to the others via red edges) joining in subsequently.
When we look at the Graph on the right that represents a larger network of projects and suppliers, we see cases where these companies collaborate on many other projects. We’d need to do more analysis to understand the complexity of these relationships across many projects, and to understand better what drives companies to collaborate in any one sector.
We still have quite a lot of work to do…
So, one big takeaway from this exercise is that, if we really want to get down to the granularity that would help improve development operations, we still need to drill down to answer more questions, for example:
- What characterizes an environment in which certain projects tend to under-deliver?
- Are there patterns in terms of what partners are involved in implementation (NGO vs. Gov’t vs. private sector); geographic location; development sector; conflict or post conflict setting; size of contracts issued under a project?
- Is there a good match between a project’s location and development priorities for that geographic region?
- What is the pattern between contracts awarded to companies from any given country and that country’s relationship with the World Bank (e.g. amount of loans and programs it receives)?
- How can we account for the role of intermediary firms (e.g. consulting companies who prepare documentation that a lead supplier submits) who may play an instrumental role in bidding for projects, but are rarely captured in the official process?
- Are there links between companies who tend to win most contracts and their employees (e.g. do they tend to employ ex-development workers, or former government officials)?
And with the new questions came suggestions for additional datasets that could be useful in providing answers, such as:
- Partners for each project
- Audit reports
- Project and staff evaluations
- Data on all companies/bidders in the contracting process (not just those who win, in order to understand the variables that lead to companies winning or losing contracts)
- Cities, in addition to countries and regions, that a project is implemented in
- Roles (and specific knowledge) of companies involved.
Last but not least, from a network science and data mining perspective, a few questions emerged for future analysis:
- What model(s) would best answer our questions? (single entity networks such as the supplier example studied here, or multi-level/multiple entities networks)
- How can we combine network topology with data attributes/variables to cluster the network, and what would a cluster then represent?
- What meaning can we attach to traditional network measures (such as Centrality)?
- What other variables might we represent in our networks in order to better understand what drives the interactions within a specific supplier network?
- Considering the multitude of parameters for each project, would interactive visualization provide a better tool for understanding and analyzing patterns related to companies’ behavior?
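On the question of traditional network measures, it may help to see how little goes into the simplest one. The sketch below computes degree centrality (the fraction of other suppliers a company shares at least one project with) and strength (the total number of shared projects) from a weighted edge list. The edge dictionary is a made-up example, and suppliers with no co-participation at all would not appear in it.

```python
from collections import defaultdict

# Hypothetical edge list: (supplier_a, supplier_b) -> number of shared projects.
edges = {("A", "B"): 2, ("A", "C"): 1, ("A", "D"): 1, ("B", "C"): 1}


def degree_and_strength(edge_weights):
    """Return node -> {"degree_centrality": ..., "strength": ...}."""
    degree = defaultdict(int)
    strength = defaultdict(int)
    for (a, b), weight in edge_weights.items():
        for node in (a, b):
            degree[node] += 1         # one more distinct partner
            strength[node] += weight  # weighted by shared projects
    n = len(degree)
    return {
        node: {
            # Normalised by the maximum possible number of partners.
            "degree_centrality": degree[node] / (n - 1) if n > 1 else 0.0,
            "strength": strength[node],
        }
        for node in degree
    }


measures = degree_and_strength(edges)
```

What such a score means here is exactly the open question: a high value says a supplier works alongside many others, not that it wins more money or holds more influence, so it would need to be read next to funding and project attributes.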
Considering that this analysis looked at the level of the supplier network, we may want to transpose Ronald Burt’s theory of structural holes in organizations, which considers the topology of a company’s internal network and the actors connecting communities, to better understand each company’s behavior and the roles of its actors.
In the end, Masters of Networks proved very valuable for the World Bank, UNDP, and the UN Global Pulse initiative on using big data to improve the operational effectiveness of development organizations.
We are only beginning to understand the potential buried in the heaps of operational data of development organizations.
Still, there is a lot left to do, both in further iterations to refine the questions we’re asking and in more analysis to ensure that we add a layer of context to the data.
We will take that next dive in Vienna on February 23rd with the Open Knowledge Foundation, which will bring us one step closer to getting more out of big data for development.
P.S. Special thanks to all colleagues who participated in the work of this group, and who provided insight and advice that fed into this write-up.