Software as Craft

I’ve been doing ABC (“always be coding”) lately. I also admired an Icelandic sweater at the CTO Club yesterday. And it got me thinking about the connections between the two.

Mathematics and knitting have a long history together (spoiler alert: I accidentally opened my birthday presents from my sister early: thank you so much AJ for Making Maths with Needlework and Crocheting Adventures with Hyperbolic Planes), and early hackers like Babbage reused equipment (e.g. punched cards) from the textile industries for their code.  But I’m not sure whether creating an app has been described in terms of knitting a sweater before. So here goes.

We have the basic knitting stitches (stockinette, garter, ribs etc) – these to me are like the languages in computer science (Ruby, Python etc).  Each of them is composed of smaller things (individual stitches/if statements, themselves created from lengths of wool/characters) which can themselves be more or less complex, but the basic idea of a foundation is there.

Going up, we have the framework that we’re building in – for a garment, that’s the shape that we’re building it in (you thought a jumper was just a front, back, a neck and two arms? Oh boy…), for a language, that’s a framework like Django, Rails or Kohana.  Both of these have a set of modern conventions (e.g. ORM) that come out of earlier and for-a-while-forgotten idioms.

But even if you have a framework and a stitch in knitting, that’s not the end of the story.  Knowing Python and understanding Django doesn’t make you a good app developer.  And knowing cable stitch and raglan shaping doesn’t make you a great Aran sweater knitter (and no, you don’t want to know what a three-needle bindoff is, honest!).  What makes a great Aran sweater is years of looking at other people’s Aran designs, and years of actually getting down and knitting different designs to really understand the history of why some subpatterns keep recurring.  Basically, if you want to be a great knitter, then knit. Lots (it helps to have small relatives – the mistakes take less time to create, and generally said relatives don’t care so much when your dinosaurs look like strangely-shaped ponies). Design your own sweaters and learn from them why the classic designs are shaped the way they are. Seek out experts, examine their work in detail (in knitting, many of those experts are long dead – in computing, not so much at the moment), listen – really listen – to their experience, and ask “why” they did things a particular way. And just plain keep on building stuff.

And so it goes with code.  If you want to be a great coder, then code.  Read other people’s code. Make mistakes. Commit to something sizable of your own design and learn the hard way. Understand not just the stitches and the shape, but also the reasons for the patterns that you see (and which you rarely see tutorials for: language and framework tutorials are easy to come by; tutorials on how to best structure frameworks, not so much…).

Hmm. Agile. That, perhaps, is more of a patchwork quilt… you can start small and add pieces, swap out pieces that don’t work, but at some point early in the process, changing out the top-level design (unifying patterns and colour scheme) becomes really really painful.

I could head off on a tangent about regional differences in coding style now, but I won’t.

FIXIT: Ushahidi location lists

Today, a conversation amongst crisismappers went something like: “when are we going to migrate to IRC?” (this in response to yet another Skype idiosyncrasy getting in the way of the team), then “not until we have a decent interface for it on all platforms” and “we asked the weekend hackathons for this, but it’s not sexy: you can’t tell people you’re saving the world, or helping starving children with it”.  That’s a whole pile of cynicism and frustration, but behind it are three things: 1) mappers still don’t have all the tools they need, and are relying on people-as-processors to get round that, 2) mappers don’t know how to ask for tools in ways that get them what they need, and 3) hackathons may not be the best place to get those non-sexy tools built.

So. Options. Best is to find a good piece of existing open-source kit that almost meets mappers’ needs, and extend it with the things needed.  Less-good is to build all-new kit – mappers aren’t hackers, and new kit inevitably breaks and needs maintaining and training.  In-between is adding things onto proprietary kit using APIs (if they don’t get removed – yes, I’m looking at you, Skype team), and adapting existing open-source kit that doesn’t meet needs, but could be adapted (with the programming equivalent of a carefully-swung sledgehammer).  But that’s just options from the mapping teams’ perspective: another option is to team up with a coding company that wants to build tools for a similar (or adjacent) market.

I’m just as guilty of not documenting the things that I think need improving.  I’m used to writing “FIXIT” in my code when I know there’s a potential problem in it – so here’s the start of a set of posts about things that could be upgraded.  Some of them, I’ll start trying to fix – and document that in updates to each FIXIT blogpost.

 

[Image: ushahidi_find_location]

There are lots of little things that bug me about the technologies I use in mapping.  One of them is repeating geolocation – specifically, having to find the same addresses multiple times in Ushahidi because I can’t upload a new list of addresses to the “find location” box (see above).  Now, I’m not going to go very far with this thought because it’s quite possible that someone in the Ushahidi community has already fixed this, but here’s the how-and-where (for Ushahidi version 2.7, which will be superseded sometime next year by 3.0):

  • When you press the “find location” button, Ushahidi calls (eventually) a function called geocode($address) in file application/helpers/map.php
  • This calls the Google Maps API: for instance, for the address “Mount Arlington”, it calls http://maps.google.com/maps/api/geocode/json?sensor=false&address=Mount%20Arlington (don’t worry about the %20: it’s the internet’s way of saying “space”).  The API call produces JSON data (go ahead and look at the page – it’s all there), from which Ushahidi then pulls the country, country id, location name, latitude and longitude.
  • Erm. That’s it.
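
For anyone who wants to poke at that call outside Ushahidi, here is a minimal Python sketch of the same two steps: build the URL, then pick the fields out of the JSON. It is not Ushahidi’s PHP code. The function names are mine, the sample reply is hand-written and abridged to the fields used here, and I’m assuming the usual geocoder reply layout (a “results” list with “address_components” and “geometry”).

```python
import json
from urllib.parse import quote, urlencode

# Endpoint as quoted in the post.
GEOCODE_URL = "http://maps.google.com/maps/api/geocode/json"


def build_geocode_url(address):
    """Build the request URL, encoding spaces as %20 rather than +."""
    query = urlencode({"sensor": "false", "address": address}, quote_via=quote)
    return GEOCODE_URL + "?" + query


def extract_location(response_text):
    """Return country, country code, name, latitude and longitude,
    or None if the geocoder came up blank."""
    data = json.loads(response_text)
    if data.get("status") != "OK" or not data.get("results"):
        return None
    top = data["results"][0]
    country = next(
        (c for c in top["address_components"] if "country" in c["types"]),
        {},
    )
    return {
        "country": country.get("long_name"),
        "country_code": country.get("short_name"),
        "name": top["formatted_address"],
        "latitude": top["geometry"]["location"]["lat"],
        "longitude": top["geometry"]["location"]["lng"],
    }


# Hand-written, abridged sample in the shape of a geocoder reply
# (illustrative values, not a captured API response).
SAMPLE = """
{"status": "OK",
 "results": [{"formatted_address": "Mount Arlington, NJ, USA",
   "address_components": [
     {"long_name": "Mount Arlington", "short_name": "Mount Arlington",
      "types": ["locality", "political"]},
     {"long_name": "United States", "short_name": "US",
      "types": ["country", "political"]}],
   "geometry": {"location": {"lat": 40.92, "lng": -74.63}}}]}
"""

print(build_geocode_url("Mount Arlington"))
print(extract_location(SAMPLE))
```

The sketch returns None when there are no results, which is the “comes up blank” case that any fallback would hook into.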

A geolocation team on a deployment is responsible for finding lat/longs for location names.  They usually keep a list of these lat/longs and location names, and that’s usually in a Google spreadsheet.   So what we need here is:

  • An Ushahidi variable holding the address of the geolocators’ spreadsheet.
  • That spreadsheet to be in a recognisable form – e.g. it has a column each for latitude, longitude and placename
  • A piece of code inserted into function geocode($address), that when the Google map API comes up blank, checks the Google spreadsheet for that location instead (or maybe checks the spreadsheet first: that depends on how the geolocation teams usually work). That piece of code will need to use the googledocs API, which is possibly the hardest part of this plan.
  • Maybe even (shock, horror), check the *other* map APIs (openstreetmap etc) too.
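
As a rough illustration of that lookup order (in Python rather than Ushahidi’s PHP, and skipping the googledocs API entirely), the sketch below treats the spreadsheet as rows with placename, latitude and longitude columns, for example from a CSV export. Every name in it is hypothetical, the geocoders are stand-in functions, and the sample row is made up.

```python
import csv
import io


def normalise(name):
    """Lower-case and collapse whitespace so 'Camp  Alpha ' still matches."""
    return " ".join(name.lower().split())


def make_gazetteer(rows):
    """Index spreadsheet rows (placename, latitude, longitude) by place name."""
    return {
        normalise(row["placename"]): (float(row["latitude"]), float(row["longitude"]))
        for row in rows
    }


def find_location(address, geocoders, gazetteer, spreadsheet_first=False):
    """Try each lookup in turn; return the first (lat, lon) found, else None."""
    def from_sheet(addr):
        return gazetteer.get(normalise(addr))

    lookups = list(geocoders)
    if spreadsheet_first:
        lookups.insert(0, from_sheet)
    else:
        lookups.append(from_sheet)
    for lookup in lookups:
        result = lookup(address)
        if result is not None:
            return result
    return None


# Made-up spreadsheet export and a stand-in geocoder that always comes up blank.
SHEET_CSV = "placename,latitude,longitude\nCamp Alpha,18.55,-72.30\n"
gazetteer = make_gazetteer(csv.DictReader(io.StringIO(SHEET_CSV)))


def blank_geocoder(address):
    return None


print(find_location("camp  alpha", [blank_geocoder], gazetteer))
```

Because the geocoders are passed in as a list, checking “the other map APIs” is just a matter of adding another function to that list, and the spreadsheet_first flag covers the question of which source the geolocation team would rather trust.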

None of this is horrendous, but it does take time to do.  Perhaps someone somewhere will think that it’s worth it.

Knowing Ourselves

Excuse me while I geek out on some data.  I’ve been wondering for a while about retention rates in volunteer groups.  And I just happen to have a lot of data about signups for one of them.  So I thought I’d start asking some questions.  The types of question that I want to start asking are:

  • What are the basic demographics for this group (ages etc)?
  • How many of these people were active this year?
  • What’s the geographical distribution… which countries are light on mappers, which timezones are light on mappers?
  • What’s the geographical distribution of active people?
  • How long do people stay before they drop out?

First, 80% of all data science (at the moment) is data cleaning, so I had to do a few things to make this possible.

Clean all the location information – there were two sets of location fields in the original data, and issues included:

  • not listing which countries they were in (generally USA people assuming we all knew that Ohio was in the US),
  • listing multiple countries (which is fair – development people often move between 2 or 3 ‘home’ sites),
  • America being a continent playing at being a country – it makes more sense to break US data into states, so we can see where on the continent people are distributed instead. That also meant looking up state abbreviations (Minnesota, for one) so that the state column was consistent.
  • People living in a dependency (e.g. an island like Madeira) of a country (e.g. Portugal).
  • People who only gave their timezones as an address (also fair – it’s a way round declaring that you’re in a country hostile to mappers; also this only happened with US addresses).
  • People also got confused about US timezones (and I had to look them up too): there are 4 timezones in the contiguous United States – they’re called PST, MST, CST and EST.
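
A few of these fixes are mechanical enough to script. The Python sketch below covers three of them: filling in a missing country when a US state is given, abbreviating state names, and spotting entries that only give a US timezone. It is an illustration under my own assumptions rather than the cleaning I actually ran: the function name is hypothetical and the state table is deliberately tiny.

```python
# A deliberately tiny lookup table; a real one would list every state.
US_STATES = {"minnesota": "MN", "ohio": "OH", "new york": "NY"}
# The four contiguous-US timezone abbreviations mentioned above.
US_TIMEZONES = {"PST", "MST", "CST", "EST"}


def clean_location(country, state):
    """Tidy one (country, state) pair into consistent columns."""
    country = (country or "").strip()
    state = (state or "").strip()
    timezone = None
    if state.upper() in US_TIMEZONES:
        # Only a timezone was given; in this data that only happened for the US.
        timezone, state = state.upper(), ""
        country = country or "USA"
    elif state.lower() in US_STATES:
        state = US_STATES[state.lower()]
        country = country or "USA"
    elif state.upper() in US_STATES.values():
        state = state.upper()
        country = country or "USA"
    return {"country": country, "state": state, "timezone": timezone}


print(clean_location("", "Ohio"))
print(clean_location(None, "est"))
print(clean_location("Portugal", "Madeira"))
```

Multiple countries and dependencies such as Madeira still need a human decision, so the sketch leaves those values untouched.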

File:US-Timezones.svg

European timezones are also less confusing than they look (unless you’re working out whether and when summertime occurs):

(blues = GMT; pinks = CET; yellow = EET; orange = FET!; green = Moscow time)

So I now have an anonymised file (a lot of the work above was to get the address fields to a state where they don’t give anything away), and can start feeding it into Tableau Public (which is free, so you can follow along if you want…).

  • First, I drop the “country” dimension into the middle of the Tableau box – this automatically gives me a map of the world with a dot on every country mentioned.
  • Then I select the “pie” mark types, and drop the “last visit” dimension onto “color” in the “marks” box. This turns the dots on the map into little pie charts, coloured for each year of “last visit”.
  • Then I play with the “size” button a little to get good-sized marks, fiddle with the colours a bit so they don’t show “never visited” as green, and produce this:

[Image: ning_visits_dated]

But that’s just telling me the percentage dropout rates per country. What about absolute rates? So I drop “number of records” onto the “size” box, and get

[Image: ning_visits_sized]

Okay. That’s a lot of Americans. And a lot of countries with very few mappers in them.  But maybe it’s a lot of Americans because it’s a big place… a continent labelled as a country. So before I stop looking at this data, I have a look at the numbers by US state… I click on country, then filter, and select only the USA, then I drop the “state” dimension onto the map, and exclude Hawaii (sorry guys: I know there are two of you over there, but it was messing up the map) to give:

[Image: ning_visits_sized_USA]

Yay! Go New Yorkers!  Or put less emphatically – there appear to be clusters of mappers, who might be local mapping groups.   Looking at those neat pie charts, I start wondering what the growth rate is like in each country – i.e. how many people joined the group when.  And about how long they stayed active after they joined. But first, a question about retention: plotting year and quarter of the mappers’ first visit to the site against their last visit looks like this:

[Image: ning_join_vs_leave]

A few things to say about this.  First, the list of people who’ve never visited the community site just stops after 2011 – probably because a site visit is part of the joining process.  The expected batch of people who look at the site in the quarter they join and then ignore it is there (the diagonal line of bigger dots). But this just tells me who dropped out when… what I really want to see is a simpler graph of how long people have stayed, and whether the date they joined is related to this in any way.  I can’t quite get Tableau to do that yet, but I’m working on it…
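
Outside Tableau, the “how long did people stay” number is a small calculation. Here is a Python sketch of one way to get it, assuming each person’s first and last visit is available as a (year, quarter) pair. The records are made up and the function names are mine.

```python
from collections import defaultdict
from statistics import median


def quarter_index(year, quarter):
    """Number quarters consecutively so that they can be subtracted."""
    return year * 4 + (quarter - 1)


def quarters_active(first, last):
    """Quarters between first and last visit; 0 means they never came back."""
    return quarter_index(*last) - quarter_index(*first)


def tenure_by_join_year(visits):
    """Map join year to the median number of quarters people stayed."""
    groups = defaultdict(list)
    for first, last in visits:
        groups[first[0]].append(quarters_active(first, last))
    return {year: median(spans) for year, spans in sorted(groups.items())}


# Made-up records: ((first_year, first_quarter), (last_year, last_quarter))
visits = [
    ((2010, 4), (2010, 4)),
    ((2010, 4), (2012, 2)),
    ((2011, 1), (2011, 3)),
]
print(tenure_by_join_year(visits))
```

One caveat with any version of this: someone whose last visit is close to the date the data was exported may well still be active, so their span is a lower bound rather than a dropout time.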

Writing an Ignite Talk

Ignite talks are the standard format of events like ICCM and other GIS-focussed events. They look great on stage, and might seem impossible to do if you’re not used to speaking. But it’s not that bad really. You too can write and present an Ignite talk!

So what *is* an Ignite Talk?

An ignite is a 5-minute talk where you supply 20 slides. Each of those slides is shown for 15 seconds before automatically moving to the next one.

5 minutes. That’s not too bad.

How do you start planning an Ignite talk?

Here’s how I do it. This isn’t the “right” way – it’s just one of many – but it’s a place for you to start.

First, know what you want to talk about. For instance, I want to give an ignite talk about giving ignite talks. You know the general area of the event (e.g. “crisismapping”) – what about or around that area excites or worries you? What have you been talking a lot about this year? Tell the stories you’re already telling… for example, this might be Leesa talking about virtual PTSD, Om about organisation, Rose about some VOST work she loved. Or tell a new one you want to explore – “if we could do this, then…”. Write a first sentence about each story.

Then start thinking about what’s important to you about that story. Where are you going for information about it? Who’s done really useful things about it? What would you do if you had unlimited resources? Berkun suggests picking 4 important points to make in your story, but you might have 2, or 5 or 6. Start listing your points for each story.

Then try talking for 5 minutes about each story. It took me years to figure out that a talk isn’t about you standing up and being judged by the audience – it’s a conversation between you and them, a way of getting people to talk about and act on things that you care about. Think about telling your mother or grandmother or best friend about this theme… what would you tell them? Write it down – or if you’re not great at writing things down, either record yourself talking and write it up later, or talk to someone else and get them to write down what you say. And draw pictures (they’ll be useful later).

Outline your script. I usually start a googledoc that looks like this:

 Title of talk
 Slide 1: introduction
 Slide 2: point 1
 …
 Slide 20: thank you and goodbye

I usually have the first slide for an introduction (and getting on the stage), the last slide for thankyous and reiterating those major points (and getting off the stage), and give each of the points an equal number of the remaining 18 slides. At least, that’s where I start – I often realise that some points are bigger than others, and adjust the slides accordingly. Sometimes it makes sense to devote some of the earlier slides to background – that’s fine too. The important thing is that you start writing, and that you know that at this stage it’ll be a long way from the perfect performances you see up on the stage.
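
If you like to see the arithmetic, here is that starting split as a small Python sketch (the function name and the example points are mine): one slide for the introduction, one for the goodbye, and the other 18 shared out as evenly as possible, with any leftover slides going to the earliest points.

```python
SECONDS_PER_SLIDE = 15


def allocate_slides(points, total=20):
    """Return (slide number, label) pairs for an Ignite outline."""
    if not points:
        raise ValueError("need at least one point")
    base, extra = divmod(total - 2, len(points))
    plan = [(1, "introduction")]
    slide = 2
    for i, point in enumerate(points):
        # The first `extra` points get one slide more than the rest.
        for _ in range(base + (1 if i < extra else 0)):
            plan.append((slide, point))
            slide += 1
    plan.append((total, "thank you and goodbye"))
    return plan


plan = allocate_slides(["point 1", "point 2", "point 3", "point 4"])
for number, label in plan:
    print("Slide %d: %s" % (number, label))
print("Total time:", len(plan) * SECONDS_PER_SLIDE, "seconds")
```

With four points that gives five slides each to the first two points and four each to the others. It is only a place to start, and the split is there to be adjusted by hand.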

Start writing your script. You should by now have 1) an outline document, and 2) the text from talking to your friends, grandmother etc. Start putting them together: put your words into your outline, and adjust both of them to fit. Remember that it’s okay to “cheat”: for example, if you want to talk longer about one slide, then repeat it; and go watch some videos of ignite talks (www.crisismappers.net has lots of these) to see how other people do it. At this point, you don’t need to write essays – 15 seconds of talking isn’t much more than one paragraph of text, so a sentence or two per slide is fine.

Find images. You’re going to need something on your slides. At this point, your talk isn’t polished, and that’s a good thing – because when you start looking for images, you’ll probably want to adjust it again. We’re lucky – we do a lot of work that’s visual (e.g. maps and documents) and can be either used directly (jpgs) or captured using a screen grabber (see below). There are also a lot of free images and clipart (cartoon images: try googling “free clipart”) on the internet. Avoid bulletpoints and lots of words if you can – your audience will be reading those rather than listening to you (which isn’t a good thing, no matter how shy you are); using a single word or sentence can be very powerful though, so consider this as an option too.

Tidy up your script. By now, you have 20 images and a script. You remember that 15 seconds per slide? Time to practice it. Pick a random piece of text, find a stopwatch, breathe slowly, talk slowly and read out the text for 15 seconds, leaving a short gap between each sentence. For me, that’s a small paragraph – about 3 sentences. Go back over your script, and first tidy up by eye (editing and moving text so you get your points across in the time that you have available), then time reading out the script for each slide, and adjust until you’re somewhere near 15 seconds, speaking slowly.

Record your talk. Now you have 20 slides and a 5-minute script that matches them. Time to record yourself. Powerpoint allows you to auto-advance slides and include an audio track (see below for details); it also allows you to re-record the audio for each slide, so you can record each slide separately and overwrite anything you’re not sure about. Go do this. And now you have an ignite talk!

Write an abstract. Nearly done. A lot of conferences ask you for an “abstract”, or summary of what the talk is about. You have your story above – write a paragraph that describes it, and send it on in!

Where can you find more advice?

Here’s some advice from people who’ve given ignite talks before:

Useful Tools

Hungry for more

So today I’m at #indieconf, in the blogging class, and asked to write about the future. Hmm. What am I hungry for? Well, apart from that seemingly-contradictory combination of a stable life with lots of adventure and chances to change the world for the better in it, I have to say that I’m hungry for what I was always hungry for: seeing people get more equal chances in this world.

Which, from a crisis and development data nerd, pretty much means more people in the “developing world” having access to global opportunities. But what do we mean by “developing world”? Developing how? For whom? Is this term outdated already in a world full of mobiles, internet, and people in pretty much every country who can access them, travel and become part of the global collective consciousness? I can, and do, help any coworking space, any collective effort to learn and help and be part of the global geek world, but sometimes we need to think about who needs this help – about how they can help each other, about how we don’t have to *be* there to help, and that one of the most beautiful things in the world is to watch a subculture emerge that’s truly local, truly part of the external cultures around its members.

It also means more people who’ve been disadvantaged, or even potentially deeply trashed by things like disasters, being able to get themselves back together faster. And that’s everywhere. One thing that Hurricane Sandy made really really clear is that you don’t have to be in Africa, South America etc to have your life destroyed by a disaster. That people live on the margins in all societies, and that even the richest countries in the world still have people who need help to get themselves back together.

So that’s what I’m hungry for. I need to ask whether it’s the right thing to ask for – especially since it involves effects on a lot of other people, who may or may not need this to happen. Introspection – a useful part of the crisis aid toolkit. And perspective – one thing that years of working with crisis data has taught me is that I cannot look at the world through just my own western, white, middle-class eyes – that I must perhaps think of a skeleton plan then listen very very hard, work with others and adapt. Which is how to feed the hunger. Ask, then listen, then do…

(That was 7 minutes. Perhaps I should do this more often!).

Installing Ushahidi on a Wamp Server

A repeat of some notes I left on the Ushahidi wikisite, just in case they’re useful to anyone who wants to play with Ushahidi on a Windows machine offline.

Yes, yes, I know, it’s Windows, it’s WAMP, it’s difficult… but I want to run an Ushahidi instance offline, on the Windows machine that I take everywhere with me. And that Windows machine is a 64-bit Windows 7 machine which just adds that little bit more complexity to the process. Here’s what I know so far.

The process:

  • Install a WAMP server on your PC. For my 64-bit machine, I have the 64-bit php5.3 version from http://www.wampserver.com/en/. Because I’m using the 64-bit version, I also have to install Visual C++ libraries, e.g. “Microsoft Visual C++ 2010 SP1 Redistributable Package (x64)” from http://www.microsoft.com/en-us/download/details.aspx?id=13523
  • Fix the Curl problem. This is a 64-bit issue: the Curl files that were included in WAMP don’t work properly on 64-bit machines. Go to this blogpage, look for “fixed Curl extensions” (you need this – not the Curl files above them), and find the file for your WAMP’s version of PHP. There are 2 of these: get the one without nts in its title, e.g. for my WAMP server using PHP 5.3.13, that’s php_curl-5.3.13-VC9-x64.zip. Unzip that file, and you’ll see a file called “php_curl.dll” in it. Move php_curl.dll into directory c:/wamp/bin/php/php5.3.13/ext/ (you’ll overwrite the existing copy of php_curl.dll, but that’s okay). You also need to enable Curl by uncommenting a line in two WAMP files: the line to look for is “;extension=php_curl.dll”, and uncommenting means removing the semicolon at the start (for reference, the files are c:/wamp/bin/php/php5.3.13/php.ini and c:/wamp/bin/apache/apache2.2.22/php.ini in my copy of WAMP).
  • Put Ushahidi into the WAMP directories. You can’t just download Ushahidi from http://download.ushahidi.com/ because the git command for updating the submodules (below) won’t work: you have to use git. Luckily, there’s a Windows version of git that makes this easier: http://windows.github.com/. Wamp will have created a directory c:/wamp (unless you told it to put the directory somewhere else); the place to put packages is in c:/wamp/www. Go to c:/wamp/www, and use git to install the web version of Ushahidi into it: I have the command-line version of git on my machine, and type “git clone https://github.com/ushahidi/Ushahidi_Web” into my terminal window from directory c:/wamp/www to do this. NB: In the notes below, I assume that your Ushahidi directory is called c:/wamp/www/ushahidi
  • Update Ushahidi submodules. Open a Windows command prompt, and go to the Ushahidi directory (e.g. type ‘cd c:\wamp\www\ushahidi’ in the command prompt). Type “git submodule update --init” in the command prompt (or you’ll see ui_main.alerts etc instead of labels on the Ushahidi main page). Note that this takes a while…
  • Log out of Skype. You need to do this because Skype interferes with the WAMP server, i.e. WAMP’s “W” icon will go orange, but not green. You can restart Skype once you’ve started up WAMP…
  • Start WAMP.  In Windows7, that’s start menu->all programs->wamp server->start wamp server. You’ll see a little red “W” symbol appear on your toolbar (look at the bottom right hand side of your screen). Click on it, to see the WAMP menu.
  • Create the Ushahidi database. Click ‘phpmyadmin’ on the WAMP menu. Click the “databases” tab, then put your database name (make one up) after “create database” and click “create”.
  • Make the Ushahidi URLs work.  Ushahidi typically uses clean addresses like “http://www.yourushahidi.com/reports” rather than php addresses like “http://www.yourushahidi.com/index.php?kohana_uri=reports”.  You need to: click ‘Apache’ on the WAMP menu, then ‘apache modules’.  This gives you a list of modules.  Scroll down, find “rewrite_module” and click on “rewrite_module” to get a tick next to it.
  • Install Ushahidi. Click ‘localhost’ on the WAMP menu, select ‘ushahidi’ under “Your Projects”, and follow the Ushahidi install instructions.
  • Make the Ushahidi work with localhost. Edit file wamp/www/yourushahidi/application/config/config.php – change the line “$config['site_domain'] = 'yourushahidi';” to “$config['site_domain'] = 'localhost/yourushahidi';”.
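
The Curl step is the one that’s easiest to half-do, because there are two php.ini files to edit. Here is a small Python sketch for checking the text of a php.ini file. It assumes the directive is written as extension=php_curl.dll (the usual form on Windows for this PHP version, so check your own file), and the function name is mine.

```python
import re

# Matches the php_curl directive with or without a leading ';' comment marker.
CURL_LINE = re.compile(
    r"^\s*(;?)\s*extension\s*=\s*php_curl\.dll\s*$", re.IGNORECASE
)


def curl_status(ini_text):
    """Return 'enabled', 'commented out' or 'missing' for the php_curl line."""
    status = "missing"
    for line in ini_text.splitlines():
        match = CURL_LINE.match(line)
        if match:
            if match.group(1) == "":
                return "enabled"
            status = "commented out"
    return status


print(curl_status(";extension=php_curl.dll"))
print(curl_status("extension=php_curl.dll"))
```

To use it, read each of the two php.ini files into a string (for example with open(path).read()) and pass that to curl_status; both files should report “enabled”.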

Things that can go wrong:

  • Wamp says ‘Aestan Tray Menu has stopped working’. This is a 64-bit problem. Fix: either install the 32-bit version of WAMP instead, or install the 64-bit version and the “Microsoft Visual C++ 2010 SP1 Redistributable Package (x64)”.
  • Wamp starts, but the little “W” won’t go green. Wamp fights Skype for a port. Fix: log out of Skype and try again. You can log back into Skype once the W has gone green.
  • Ushahidi install window says “the Curl extension is disabled”. Fix: replace the curl .dll file, as described above; check that curl is uncommented in both WAMP php.ini files (see above).
  • Labels on Ushahidi front page look wrong, e.g. Ui_main.alerts. Fix: type “git submodule update --init” in a command window in the Ushahidi directory.
  • “Not Found. The requested URL /yourushahidi/reports was not found on this server.” appears when you try to access http://localhost/yourushahidi/reports, but address http://localhost/Ushahidi_Web-wvspeed/index.php?kohana_uri=reports works fine. Fix: follow the instructions above to “make the ushahidi urls work”.
  • Your Ushahidi homepage looks weird – you can’t see the map, and the text is all on the left-hand side.  Fix: follow the instructions above to “make the Ushahidi work with localhost”.
  • Other things go wrong. Fix: see Ushahidi wiki or post on the Ushahidi forum.
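
If the “W” refuses to go green, it can help to check whether something else is already sitting on the web server’s port before blaming WAMP. The Python sketch below does that check. I’m assuming the port being fought over is 80, Apache’s default HTTP port (the notes above only say “a port”), and the function name is mine.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=1.0):
    """True if something is already accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 when the connection succeeds.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # Port 80 is an assumption: it is Apache's default HTTP port.
    if port_in_use(80):
        print("Something is already listening on port 80.")
    else:
        print("Port 80 looks free.")
```

Run it before starting WAMP: if the port is already taken while WAMP is stopped, another program (Skype being the usual suspect here) has got there first.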

This blog is being reconstructed…

A small late-night misunderstanding (thinking that backing up the database for a wordpress site would make it easy to reinstate elsewhere) has left me reconstructing this blogsite post-by-post.  Which is not the best way to review where I’ve been over the past 6 years, but it’s interesting nonetheless. Common recurring themes include the importance of treating people as people, not numbers; a combination of data and human reasoning as essential to decisions (yes, but it sadly still needs to be said) and the importance of technology innovation to development.

I’ve made a deal with myself: one new post for every two reconstructed ones.  This could take a while…

Future cities

Cities are apparently the future. All the predictions I’ve seen for the next few decades show the world’s population concentrating in cities, but our development indicators and policies are still listed by nation state. Perhaps they should be wider, for instance by including developing cities on the lists.

I said “developing” there – which begs the question “how are these cities developing?”.  This isn’t just a Las Vegas-style spreading of suburbia across the desert: many of the cities I’ve visited in the past year have shanty towns, and these appear, at least from outside, to be where a lot of the city development is happening (btw, I wanted to use a less emotive word than ‘slum’ here: although it’s what Slum Dwellers International uses, there’s still a lot of negative feeling about it).  From Lagos to Guatemala to Haiti, I’ve seen dozens of homes and businesses under tin roofs looking across at smaller numbers of tower blocks, and wondered “how do these economies fit together”, and “where does it go from here”.

Good old BBC gave me a few more answers… and a few more questions (like how does the nation-based world fit with people this adaptable and informal), and case studies, Medellin and London, of both positive and negative ways that the shanty and non-shanty worlds can start to fit together.

Perhaps it’s the way you look at it.  If you look at the Wikipedia links above, you’ll see shanty towns and slums described in very negative terms… impoverished, illegal, lack of services.   If you hang out with people who live or work in shanty towns, they’re communities and neighbours and businesses and services – and quite possibly the adaptable, informal, majority economic future of the cities they’re part of.  Whichever way you look at it, there are a lot of people in shanty towns, and how (and what) they develop is important.

ACAPS crisis indicators

There are several types of data used by responders in a crisis. Most crisismapping has focused to date on the data generated during and immediately after a crisis: the tweets, messages, reports, alerts and other social media traffic that happens in response to crisis situations and needs.

But a country doesn’t suddenly change socially because a crisis happens.  It doesn’t leap, ready-formed, into a new incarnation where the only thing that’s ever happened is the current earthquake, flood, famine etc.  Countries have histories.  Events happen, societies form, agencies and other countries, if needed, help them to develop and become resilient. And all this activity creates data.

One of the first things that a response agency does, as it goes into a crisis, is create a profile of that country.  How many people are there?  What’s its society like?  What are its existing needs?  Which crises have happened before (and how did people respond to them)?  Which NGOs are already active there?  Where is everything (towns, hospitals, clean water sources etc)?

I’ve been studying the crisis indicators from 3 organisations (UNOCHA, WWHGD, ACAPS), and the development indicators from many more (UNSTATS, UNDP, WHO, World Bank etc).  And the big questions that I’ve been asking are: what do responders going into a crisis need, where can we find that information, and how can we make that faster and easier to do.

IMHO, ACAPS has one of the best indicator lists available. Which is handy because I’m about to co-lead a Standby Task Force deployment to help make obtaining and organizing their indicator data faster and easier.  The numbers that they’re looking for are:

Country size in square kilometres
Average population growth rate
Climate
Corruption Perceptions Index
Country size in comparison to the world
Distribution of poverty (areas, vulnerable groups)
GDP
Gross national income per capita
Human Development Index
Labour force per occupation
Land cover, elevation, soils and geology
Literacy rate
Malnutrition prevalence, height for age (Percentage of children under 5)
Maternal Mortality rate as deaths per 100,000 live births
Mobile cellular subscriptions (per 100 people)
Natural Disasters Risk Index
Percentage of population living below poverty line – urban
Percentage of population living below poverty line – rural
Percentage of the population – rural
Percentage of the population – urban
Population density (people per square km)
Poverty rate
Seasonal migration patterns, by region and reasons
Socio cultural characteristics of the population – ethnic group, language, religion etc
under-5 Mortality Rate as deaths per 1,000 people
Unemployment (Percentage of total labour force)
Vulnerability and Crisis Index

An SBTF team is spending the next week looking for these indicators for a specific country (DR Congo) so we can learn about how difficult that is, alternative sources of information, indicator provenance etc. whilst other teams look for demographics, crisis timelines, existing surveys and ways to generalize and automate parts of this process.  It’s going to be quite an exciting week.

Where do the crisis indicator numbers come from?

I’ve been tracking the provenance of some GIS datasets lately, working on the licenses and attributions that have to be attached to them when they’re released as open data. I’ve also been working on automatically generating development indicator sets for a country – the numbers that help responders understand the state of a country *before* a crisis happens (social problems don’t go away just because an earthquake happens).

A lot of crisismapping comes down to persistence, capability and trust. Trust is a biggie: as someone (Gisli Olafsson?) said, a disaster is not the time to be handing out business cards. We need to trust the people we’re working with, the systems we’re using (within sensible limits and the occasional online equivalent of kicking the case in the right place), and we need to (again within sensible limits) trust the data we’re using. Trust in the data generated during a crisis has been talked about a lot recently (see conversations about verification, spoofing etc). It hasn’t been discussed so much when we’re talking about crisis indicators.

First, it’s not enough just to have a number, even if it’s only used as a rough rule-of-thumb for how bad a prior situation was. We also need to know how much we can trust that number, which means knowing where the number came from: who collected the data, how big a survey it was, how accurate the numbers are likely to be.
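One way to keep that information attached to the number is to store each value as a small record. This is only a sketch: the class and field names are my own invention for illustration, not something from an existing tool, and the example values are the DR Congo maternal mortality figures discussed later in this post.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class IndicatorValue:
    """One reported value of an indicator, plus where it came from.

    All field names are hypothetical, chosen for this example.
    """
    indicator: str                       # e.g. "MMR"
    country: str
    value: float
    year: str                            # text, because some sources quote a range
    published_by: str                    # the databank or report we read it from
    quoted_source: Optional[str] = None  # who the publisher says produced it
    uncertainty: Optional[Tuple[float, float]] = None  # (low, high), if given

    def has_provenance(self) -> bool:
        """True if the publisher names a source for the number."""
        return bool(self.quoted_source)


# A value published with no quoted source.
cia = IndicatorValue("MMR", "DR Congo", 540, "2010",
                     published_by="CIA World Factbook")

# A value published with both a source and an uncertainty range.
trends = IndicatorValue("MMR", "DR Congo", 540, "2010",
                        published_by="Trends in Maternal Mortality: 1990-2010",
                        quoted_source="WHO, UNICEF, UNFPA and the World Bank",
                        uncertainty=(300, 1100))

for record in (cia, trends):
    print(record.published_by, record.has_provenance(), record.uncertainty)
```

Keeping the year as text is a deliberate choice here, because some sources quote a period such as 2006-2010 rather than a single year.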

So, in recent work, we’ve started by looking for sources. Most databanks have a list of sources somewhere in the dataset description (or for individual datapoints, in the dataset footnotes); most maps have these somewhere in the margins. If they don’t, they should, because they can tell us a lot about how much we can trust the numbers.

Take, for example, the MMR, the maternal mortality ratio (annual number of maternal deaths per 100,000 live births), for the Democratic Republic of Congo in the last few years. There are several versions of this number; here’s a quick search for them and their listed sources:

| Source | Value | Year | Quoted source(s) for the number |
|---|---|---|---|
| CIA World Factbook’s DR Congo page | 540 | 2010 | No source quoted. But this is the same number as the World Bank indicator SH.STA.MMRT, so it’s probably from there. |
| World Bank Indicator SH.STA.MMRT | 540 | 2010 | WDI and GDF 2010. The data are estimated with a regression model using information on fertility, birth attendants, and HIV prevalence. Trends in Maternal Mortality: 1990-2010. Estimates Developed by WHO, UNICEF, UNFPA and the World Bank. |
| Trends in Maternal Mortality: 1990-2010 | 540 | 2010 | Explains in great detail how the number was calculated, and which datasources were used for it. Gives an uncertainty range for the number (300-1100), number of deaths (15000), lifetime risk (30) and PM, the percentage of maternal deaths in deaths of women of reproductive age (18.4%). Also explains that the MMR has been rounded to the nearest 10. |
| Data.un.org: UNICEF State of the World’s Children 2010 report | 670 | 2008 | UN_WHO, UNICEF, UNFPA and World Bank. This is probably the same source used in the World Bank figures, but there’s no reference to follow on the data webpage. Periodically, the United Nations Inter-agency Group (WHO, UNICEF, UNFPA and the World Bank) produces internationally comparable sets of maternal mortality data that account for the well-documented problems of under-reporting and misclassification of maternal deaths, including also estimates for countries with no data. Please note that owing to an evolving methodology, these values are not comparable with previously reported maternal mortality ratio “adjusted” values. |
| Data.un.org: UNICEF State of the World’s Children 2010 report: MMR Reported | 550 | 2006-2010 | UN_Nationally representative sources, including household surveys and vital registration. The maternal mortality data in the column headed “reported” refer to data reported by national authorities. |
| Data.un.org: Millennium Development Goals | 670 | 2008 | Trends in Maternal Mortality: 1990-2008. WHO/UNICEF/UNFPA/WB |
| UNICEF State of the World’s Children 2012 report (on unicef.org): Reported rate | 550 | 2006-2010 | The maternal mortality data in the column headed “reported” refer to data reported by national authorities. |
| UNICEF State of the World’s Children report 2012 (on unicef.org): Adjusted rate | 670 | 2006-2010 | The data in the column headed “adjusted” refer to the 2008 United Nations inter-agency maternal mortality estimates that were released in late 2010. Periodically, the United Nations Inter-agency Group (WHO, UNICEF, UNFPA and the World Bank) produces internationally comparable sets of maternal mortality data that account for the well-documented problems of under-reporting and misclassification of maternal deaths, including also estimates for countries with no data. Please note that owing to an evolving methodology, these values are not comparable with previously reported maternal mortality ratio “adjusted” values. Comparable time series on maternal mortality ratios for the years 1990, 1995, 2000, 2005 and 2008 are available at www.childinfo.org. |
| Data.un.org Gender Info: MMR (estimate), female 15-49 yr | 990 | 2000 | WHO_Reproductive Health Indicators Database_Jul2007 (International estimate) |
| Data.un.org Gender Info: MMR (low estimate), female 15-49 yr | 250 | 2000 | WHO_Reproductive Health Indicators Database_Jul2007 (International estimate) |
| Data.un.org Gender Info: MMR (high estimate), female 15-49 yr | 1800 | 2000 | WHO_Reproductive Health Indicators Database_Jul2007 (International estimate) |
| WHO Reproductive Health Indicators Database | | | Database not found. Might have been superseded by WHO’s Global Health Observatory. |
| WHO Global Health Repository global burden of disease death estimates by sex, maternal conditions (GBD code W042) | 34.7 | 2008 | There’s a set of references in the spreadsheet’s Notes page, but no specific reference given for this number. |

We learn several things from this.

  • The numbers are estimates. It’s not always clear that this is so.
  • Even on the same datasite, numbers for the same indicator can differ and aren’t always updated.
  • You can sometimes use the numbers to guess at their source (e.g. the CIA figure) – but you have to be careful about generalizing this.
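The "guess at the source" trick can be sketched in a few lines of Python: group the publishers that quote the same value for the same year, then look at how far apart the quoted values are. The figures are a subset copied from the table above; the function names are hypothetical, and a matching number is only a hint of a shared source, not proof.

```python
from collections import defaultdict

# (published_by, value, year), a subset of the rows in the table above.
figures = [
    ("CIA World Factbook", 540, "2010"),
    ("World Bank SH.STA.MMRT", 540, "2010"),
    ("Trends in Maternal Mortality: 1990-2010", 540, "2010"),
    ("Data.un.org: State of the World's Children 2010", 670, "2008"),
    ("Data.un.org: Millennium Development Goals", 670, "2008"),
    ("Data.un.org Gender Info (estimate)", 990, "2000"),
]


def group_by_value_and_year(rows):
    """Group publishers that quote the same value for the same year."""
    groups = defaultdict(list)
    for published_by, value, year in rows:
        groups[(value, year)].append(published_by)
    return dict(groups)


def spread(rows):
    """Return the (lowest, highest) value quoted across all rows."""
    values = [value for _, value, _ in rows]
    return min(values), max(values)


groups = group_by_value_and_year(figures)
for (value, year), publishers in groups.items():
    # Several publishers under one key suggests a common upstream source.
    print(value, year, publishers)

print("spread:", spread(figures))
```

Grouping on value and year together matters: two sources quoting 670 for different years would not necessarily be copying the same estimate.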

Maternal mortality turned out to have a very strong single source (the Trends in Maternal Mortality reports). But for many development indicators, the numbers from different sources rarely match, and sometimes differ wildly. So if we want a number (or even a couple of numbers) to use, we need to do some detective work. More on this soon.