17 May 2015
Last week, I took part in a debate on the role of big data in development at re:publica here in Berlin. The session was fun and interesting, and a couple of things from it have been running around my mind ever since, helped along by some other inspiring talks I saw at the conference and conversations I had there.
The biggest one is that, especially in international development, we seem to be assuming that big data holds all sorts of answers. Within the context of the “data revolution”, big data is put on something of a pedestal, and in my opinion we’re putting too much faith in the insights that can actually be gained from big data alone.
Tricia Wang gave an excellent talk on the current obsession with big, quantitative data, and called for qualitative data to be added to big data before we draw any conclusions. She refers to this additional data as thick data: small-sample, qualitative data with cultural context.
She explains it far better than I can, but the gist is this: for us to genuinely draw insights from big data, we need to combine it with real context about the people or actions it reflects. I would go so far as to say that many of the points Tricia raised aren’t only relevant to big data, but also to many smaller quantitative datasets, which can be just as meaningless without context. Suffice it to say, she makes the case that we shouldn’t be drawing insights from big data alone, without supporting qualitative data.
That no data is neutral has been said before, but the implications of this, and the potential for discrimination and badly informed decision-making, seem to be somewhat lost in the narrative I’m seeing around the data revolution. Treating “official” data as something close to authoritative “truth”, or talking about “data as a resource”, glosses over many of the problems that data can both hide and cause. Take, for example, this excerpt from the data revolution report:
Data is a resource, an endless source of fuel for innovation that will power sustainable development, of which we must learn to become effective and responsible stewards. Like any resource, it must be managed for the public good, and to ensure that the benefits flow to all people and not just the few. Data must be available, and must be turned into the information that can be confidently used by people to understand and improve their lives and the world around them.
From A World That Counts, written by the United Nations Secretary-General’s Independent Expert Advisory Group on a Data Revolution for Sustainable Development.
Numbers can hold bias just as much as words can, and in many cases bias within a spreadsheet is a lot harder to find and much easier to hide. Perhaps more worryingly, in my experience the number of people with enough data literacy to understand and mitigate these biases is orders of magnitude smaller than the number of people talking about the benefits data can bring to the international development sector. Concretely, that means asking who is collecting the data, how the numbers are collected and manipulated, and who is (or isn’t) represented. Perhaps most importantly in international development, it means accounting for the huge impact of varying cultural contexts and, as Tricia mentions, the need for deep cultural understanding.
In the debate I mentioned above, we were shown examples of academic research based on Orange’s telecommunications data from Senegal, produced as part of the D4D challenge, but none of the examples we saw had been carried out by Senegalese researchers. It turns out that of the 260 research labs that showed an interest in the data, just 11 came from within Senegal. I can’t help but wonder how differently the data might have been interpreted by people who understood the country and its context.
In the same debate, Nanjira gave a great example of how researchers visiting the Kibera slum in Nairobi badly misinterpreted the data they gathered on women collecting water near their homes, simply because they hadn’t taken human behaviour and cultural differences into account.
But gathering thick data, as Tricia calls it, is in many ways a lot harder than gaining access to big datasets. It requires a considerable investment of time and resources, people who understand the area in question, and the skills to conduct ethnographic research within those communities. Unfortunately, all of that is somewhat less “cool” or impressive than a shiny infographic or a quantitative dataset.
For the data revolution to really lead to positive social impact, I echo Tricia’s sentiment: it can’t consist solely of quantitative big data, and it needs context. I’d highly recommend watching her talk, if you haven’t already!