We’ve all heard it many times before and I’ll repeat it: we live in an age of constant disruption. Being caught off guard has slowly become a part of the everyday parlance. This is painfully true in development. A conflict or a disaster can set back years of development.

A sudden drop in employment can have unpredictable and long-lasting impacts on health, education and productivity years after it takes place. So we are constantly on the lookout for methods that would give us the smallest hint of upcoming changes and signals that something is ‘cooking’, so we can better prepare.

A growing number of private sector companies and, increasingly, development organizations, are looking at tools to augment their current ability to monitor the external environment to detect potential anomalous patterns.

A whole new generation of companies is growing to meet the demand for this type of intelligence. But for all the new gadgets, no single one can replace human intelligence, the analyst’s experience, intuition and expertise that contextualize the investigation – though it can help make smart people, well, smarter.

UNDP and Recorded Future tested whether their methods for analyzing big data, the vast amount of public source information, can make our organization better at detecting early warning signs.

Recorded Future specializes in web intelligence, providing forecasting and analysis tools that improve analysts’ ability to detect irregular patterns. It scans over 100,000 online sources, extracting, measuring and visualizing data according to given parameters.

We picked Georgia as a case study because of UNDP’s strong presence in the country and because of a watershed moment in 2008 that we thought would be a good test bed for various hunches on managing risk (albeit with the benefit of hindsight). So what did we learn?

Helicopter view of patterns and trends

First, we wanted to see whether the tool would show a rising frequency of mentions of Georgia in online sources in the period leading up to the August 2008 conflict.

We analyzed the period from April (when President Putin instructed the Russian Government to boost ties with authorities in Abkhazia and South Ossetia) to August 2008. We expected to see a rising trend of Georgia being mentioned in online news as we neared August, and that is exactly what we found.

Graph - Georgia overview, April - August 2008: dots are stories; the larger the dot, the higher the number of sources

Graph - Overview of Georgia from April to August 2008: Momentum

Not surprisingly, the two charts above show a clear spike in frequency of mentions of Georgia around August 2008. More importantly, the system aggregates vast amounts of information, turning them into a pattern that can guide a more specific inquiry and analysis.
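To make the idea of turning raw stories into a pattern concrete, here is a minimal sketch of counting mentions per month and flagging a spike. It is not how Recorded Future computes its charts; the function names, the threshold `factor`, and the publication dates are all assumptions made up for illustration.

```python
from collections import Counter
from datetime import date


def monthly_mentions(story_dates):
    """Count stories per (year, month) from a list of publication dates."""
    return Counter((d.year, d.month) for d in story_dates)


def spike_months(counts, factor=2.0):
    """Return months whose count exceeds `factor` times the mean of the other months.

    The threshold is an arbitrary assumption, not a calibrated value.
    """
    spikes = []
    for month, n in counts.items():
        others = [v for m, v in counts.items() if m != month]
        if others and n > factor * (sum(others) / len(others)):
            spikes.append(month)
    return sorted(spikes)


# Invented publication dates, for illustration only (not real data)
sample_dates = (
    [date(2008, 4, 10)] * 3
    + [date(2008, 5, 12)] * 4
    + [date(2008, 6, 3)] * 4
    + [date(2008, 7, 20)] * 6
    + [date(2008, 8, 9)] * 20
)

counts = monthly_mentions(sample_dates)
spikes = spike_months(counts)
print(spikes)
```

With these invented dates, only August 2008 stands out. A real analysis would also have to correct for the uneven source coverage over time.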

Ability to see the forest and the trees

Second, we wanted to get a sense of who was writing about Georgia at that time, helping us understand what sources are factoring into the analysis (and which are not).

An individual could not possibly hope to go through thousands of sources on any given topic and make sense of them in time for any meaningful decision making. Media sources greatly influence the analysis, and we needed to understand the strengths and limitations of the data that the tool was presenting to us.

Tree Map of Sources Reporting about Georgia in summer 2008

With one click, we were able to see the proverbial forest and the trees. The system transformed the previous two charts into this colorful one showing us who was writing about Georgia between April and August 2008, revealing some shortcomings that we had to factor into our analysis:

  1. Only English-language sources are available
  2. Partly related, there is a lack of local, niche media
  3. Micro-blogging sources are not factored into the analysis (the data doesn’t represent the full spectrum of online sources)
  4. Insufficient data from 2008 (the system picks up significantly more stories as it nears the present day)

Challenge your assumptions – and make it a habit

After the initial exercise with media sources, we were ready to start testing some hunches. Our immediate tendency was to rely on what we knew from the past, construct a model around it, and test it. To be more precise, our first hunch was that statements about violence would be the key leading indicators of future violence. We thought that violence would be preceded by an exchange between leaders and different parties that would lead to the actual violence.

Interestingly, we didn’t find any supporting evidence for this. The lesson here is important: we may be caught by surprise if we analyze the future by relying on the past alone.

Identifying irregular patterns of potential instability is incredibly contextual and requires continuous, almost real time learning about the environment in which we operate and a feedback loop of that knowledge into our scanning process. The analyst is constantly experimenting and discovering new information that becomes a part of the model through which we understand the situation.

Consider the data, then build assumptions about what you’re seeing and not what you’d like to find

It would sound simple enough if it weren’t for our inherent decision-making biases, but if you can manage it, really fun things happen. For example, the system picked up many statements by the Georgian president on investments and development in the run-up to the October elections.

So we wondered whether the content of a message may provide insights about the speaker’s attitude to the status quo. There are two related points here.

First, if a statement contains references to military and security issues (as opposed to tourism development) and focuses on response and reaction to what others are saying, it is indicative of a speaker’s discontent with the status quo. We present several broad categories of statement content and what each suggests about the speaker’s attitude to the status quo:

Content of message          Attitude about status quo
Military and security       Extremely negative
Political and diplomatic    Negative
Trade and investment        Positive
Tourism and development     Extremely positive
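The mapping above can be written down as a small lookup, which makes it easy to summarize a batch of categorized statements. This is only a sketch: the categories and attitudes come from the table, but the numeric scores (-2 to 2) and the function `average_attitude` are our own assumptions, added so that attitudes can be averaged.

```python
# Category-to-attitude mapping, taken from the table above
ATTITUDE = {
    "military and security": "extremely negative",
    "political and diplomatic": "negative",
    "trade and investment": "positive",
    "tourism and development": "extremely positive",
}

# Numeric scale is an assumption made for illustration, not part of the original analysis
SCORE = {
    "extremely negative": -2,
    "negative": -1,
    "positive": 1,
    "extremely positive": 2,
}


def average_attitude(categories):
    """Average attitude score for a list of statement content categories."""
    scores = [SCORE[ATTITUDE[c.lower()]] for c in categories]
    return sum(scores) / len(scores)


# Hypothetical batches of already-categorized statements
early_period = ["Military and security", "Political and diplomatic"]
late_period = ["Trade and investment", "Tourism and development"]

print(average_attitude(early_period), average_attitude(late_period))
```

Assigning each statement to a category is the hard part and still needs an analyst, or at least a carefully checked classifier. The lookup only makes the comparison across periods explicit.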

In the chart below, we look at a sample of actual stories and statements across two extreme time periods in order to test the tone-of-statements hunch.

Graph - Content of statements, April 1, 2008 - September 7, 2012

We see a progression from statements about the military as well as reactions to what others are saying and doing in 2008 to stories about investments and even reconciliation leading up to the elections in October 2012.

Then we wondered if a statement about the future indicates low expectations of upcoming instability – in other words, whether a higher volume of forward-looking statements indicates a positive attitude about the status quo.

The chart below shows all forward-looking statements by President Mikheil Saakashvili from January 2008 to September 2012, in which he refers to various plans that would take place in the future (for example, a large infrastructure project will begin next year, or an international song contest will take place at the end of the year).

Graph - President Saakashvili – Forward Looking Statements

The graph offers two takeaways. First, there don’t appear to be any statements about the future until early 2009. Second, this changes as we move toward the present day, with more and more forward-looking statements.
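As a rough illustration of what counting forward-looking statements could look like, here is a minimal keyword-based sketch. It is an assumption on our part and not the method the tool uses: the marker phrases in `FUTURE_MARKERS` are a crude heuristic, and the sample statements are invented, loosely modeled on the examples mentioned above.

```python
import re
from collections import Counter

# Crude, hypothetical markers of future reference; a real system would need far more care
FUTURE_MARKERS = re.compile(
    r"\b(will|next year|plans? to|is going to|by the end of)\b",
    re.IGNORECASE,
)


def is_forward_looking(text):
    """Return True if the text contains one of the future markers."""
    return bool(FUTURE_MARKERS.search(text))


def forward_looking_per_year(statements):
    """Count forward-looking statements per year from (year, text) pairs."""
    return Counter(year for year, text in statements if is_forward_looking(text))


# Invented statements, for illustration only
sample_statements = [
    (2008, "We responded to what the other side said yesterday."),
    (2011, "A large infrastructure project will begin next year."),
    (2012, "An international song contest will take place at the end of the year."),
]

per_year = forward_looking_per_year(sample_statements)
print(sorted(per_year.items()))
```

A count like this is only as good as the underlying coverage: a year with few collected stories will show few forward-looking statements regardless of what was actually said.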

Though we wanted to quickly conclude that our hunch checked out fully, the specialists on Georgian affairs flagged a serious issue with this analysis. The very successful donor conference held in Georgia at the end of 2008 was sure to result in President Saakashvili’s focus on the future – where would the funds go and which projects would be funded. The system didn’t pick up these statements, and it appears that the reason is connected to one of the shortcomings we highlighted at the beginning – insufficient data going back to 2008 and 2009.

Next steps

So what did we learn? This tool focuses on just a segment of data, and while it isn’t an exact science, it can help make our smart analysts even smarter and more effective.

It lays out a helicopter view of trends in almost real-time (the forest), and provides easy access to individual sources of information (the trees).

It helps to aggregate incredibly large amounts of data into patterns, something no single analyst could hope to do alone, though these patterns would make little sense without the analyst’s intuition and context. It challenges our instinct to view the future by studying the past.

Based on our work, UNDP and Recorded Future will continue joint work on analyzing public source information for the purpose of risk management. This next phase will focus on conducting a regional political risk analysis and forecasting for South Eastern Europe and Central Asia.

The research that underpins this blog post resulted from cooperation with Munish Puri of @RecordedFuture – follow him on Twitter at @whypurifly

  • Miroslav

    Great blog and great tool Millie to predict future activities (political, social, business etc.) in the country or region!

  • Nice to see that international organisations are using power of big data.

    • Millie

      Miroslav, thanks for the comment - we’re starting to work on using this type of tool to improve region-wide analysis.

      Adriana, thx! 🙂 It’s no longer a question of whether someone wants to use it or not, but rather how quickly we learn from big data and make it operational for development! We’re all on a steep learning curve with this one!