Online version: Open Data, Democracy and Public Sector Reform

A report based on the research for my MSc Dissertation is now available here*.

Thank you to everyone who contributed to the research whether in discussions, interviews or responding to the survey. As promised, I’ve started to share data from the survey, and will add to this as time allows in coming weeks.

Published for digressions
Over the weeks since I handed in my MSc Dissertation I’ve been trying to work out how best to share the final version. Each time I’ve started to edit it for release I’ve found more areas where I want to develop the argument further, or where I recognise that points I thought were conclusions are in fact the start of new questions. After trying out a few options, I settled on the fantastic Digress.it platform to put a copy of the report online – giving each paragraph its own URL and space for comments and trackbacks.

Hopefully this can help turn a static dissertation into something more dynamic as a tool for helping take forward thinking about the impacts of open government data.


*Note: This is not the copy of the dissertation I submitted. That is still with the University being marked. When I submit a hard and digital library copy later this year I’ll post a link to those as the ‘official’ literature.

Four thoughts on improving open government data

Hadley Beeman has been doing some fantastic work developing conversations and ideas to meet many of the challenges that current open government data users are encountering.

You can read her latest update on the project over here. What I meant to be a short comment turned into a blog-length set of reflections, so – reposted here to log for future reference…

Data is not just for developers

(A small mantra I’m going to end up repeating lots over the next few months just to try and rebalance things a bit).

There are lots of cases where people are actually happy with an Excel spreadsheet with some numbers in. In fact, I suspect they are happier with the spreadsheet than with a flashy website whose new query interface they have to learn, and which sits between them and the government data, with an interesting psychological impact on how much they feel they can trust the numbers.

For example, a charity applying for funding might want the fairly ‘pretty’ and entirely un-programmatically accessible DfE spreadsheet in order to browse for educational attainment statistics across London boroughs and to decide what performance targets to set for themselves in a funding bid.

So in any of the diagrams here there is a need for a direct arrow from the original raw data to re-users – and for thought about how they can get the little bits of extra data they need to work out and understand a spreadsheet. Even if people end up finding data through an API-enabled web-interface searching across datasets, they may end up wanting to go back to the original file to be confident they’ve understood the context of the data and its creation.

The social infrastructure around data-use matters as much as the technical infrastructure

There are always some datasets that are tricky to work with – and aspects of working with data that cannot be 100% automated. In existing use of data.gov.uk data, the collaboration people have gone through to make sense of data has not only been about finding out what categories mean – it has also played a role in them understanding all sorts of other things about a dataset and in finding people who can help them, share code, share insights and so on.

In building on a technical infrastructure to help wider groups of people make use of data, we also need to think about how the social infrastructure is widened.

It’s not just formatting that’s the challenge

Hadley’s post mentioned that “Differences in formatting and, most importantly, undefined codes and values mean that much of the published data can’t be analysed or compared to any other data.”

However, often the problem of comparison is not the file-format or schema of the data – but the very nature of the datasets. E.g. one collected by school-year; one collected by calendar-year.

Either comparison needs normalisation (technical solution: include some sort of standard normalisation modules in here) or meta-data needs to help people understand what is and isn’t possible in terms of comparisons.
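As a rough illustration of what a ‘standard normalisation module’ might do, here is a minimal Python sketch that re-apportions totals collected by school-year onto calendar years. It rests on two assumptions of mine that are not in Hadley’s post: that a school year runs September to August, and that the quantity is spread evenly across the months. The function name and the figures are hypothetical.

```python
def school_to_calendar(school_year_totals):
    """Re-apportion school-year totals onto calendar years.

    Keys are the year a school year starts in (2008 means Sep 2008 to
    Aug 2009). ASSUMPTION: values are spread evenly across the 12 months.
    Returns {calendar_year: (estimated_total, months_covered)}.
    """
    values, months = {}, {}
    for start_year, total in school_year_totals.items():
        # Sep-Dec (4 months) fall in the start year, Jan-Aug (8 months) in the next.
        for year, n_months in ((start_year, 4), (start_year + 1, 8)):
            values[year] = values.get(year, 0.0) + total * n_months / 12
            months[year] = months.get(year, 0) + n_months
    return {year: (values[year], months[year]) for year in sorted(values)}


# Hypothetical figures, purely for illustration.
estimates = school_to_calendar({2008: 120.0, 2009: 60.0})
for year, (value, covered) in estimates.items():
    note = "" if covered == 12 else " (partial: only %d months covered)" % covered
    print(year, value, note)
```

The months-covered figure is the meta-data half of the point: only calendar years with a full 12 months behind them can safely be compared with a dataset that was collected by calendar year in the first place.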

Developers like bulk data

In all the cases I looked at for ODI even when data was available through APIs people liked to grab the full dataset, cache it locally and work with it there.

Any architectural plans would need to be sure they work for people grabbing bulk data.
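The ‘grab it, cache it, work locally’ pattern can be sketched in a few lines of Python. This is only an illustration of the habit described above, not code from any of the cases I looked at: the function name, the cache directory and the example URL are all hypothetical, and the optional `download` argument exists only so the sketch can be tried without a network connection.

```python
import hashlib
import urllib.request
from pathlib import Path


def fetch_bulk(url, cache_dir="data_cache", download=None):
    """Return a local path for `url`, downloading only if it is not cached."""
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    # Hash the URL so that any address maps to a safe file name.
    target = cache / hashlib.sha256(url.encode("utf-8")).hexdigest()
    if not target.exists():
        if download is None:
            # Default: fetch the whole file over HTTP in one go.
            with urllib.request.urlopen(url) as response:
                data = response.read()
        else:
            data = download(url)
        target.write_bytes(data)
    return target
```

Calling `fetch_bulk("http://example.org/dataset.csv")` twice would hit the network once and read from disk the second time. An architecture that only offers paged API queries makes this sort of workflow much harder.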

Submitted.

The University bit is done. Having a few days to breathe – and then I’ll try and work out how best to draw out the practical learning from the project…

Almost there…

After a few weeks battling with a very tight word-limit (note to self: don’t try an exploratory, mixed-method and strongly qualitative research design if faced with a 10,000 word limit in future…) I’ve almost completed the dissertation part of this project.

I believe the dissertation itself needs to be marked before I can share it in full, but over the next few weeks I’ll try and blog some of the elements that didn’t go into the final dissertation write-up, and to put together a presentation of findings with a more practical focus.

Whilst I’ve not managed to blog this research as much as I’d hoped, if from what’s gone so far you’ve got a particular question you think should be addressed in the informal blogging write-ups of the next few weeks, do let me know…

Who is using open government data? Survey analysis

I’ve been uploading more of the raw data from the survey to the survey page today alongside putting together a reflective-journal/open-workbench presentation detailing the analysis behind the ‘Open Government Data User Motivations’ section of my dissertation write up.

The presentation is embedded below or accessible on SlideShare here.

Comments, critiques and questions welcome.

Why put government data online?

I’m revisiting my literature review as I enter the last few weeks of writing up this study for my MSc dissertation, and I wanted to try and get a sense of all the different arguments that have been put forward for the release of Open Government Data. So, here’s the ongoing list of different arguments or reasons for OGD I’ve found across the literature. I’m not making these arguments or assessing them here – just collecting and sharing my notes. I’ll probably update this post as I do some more work on this part of the final dissertation in the next week or so…

The Power of Information Taskforce Report (Feb 2009) suggests that the restriction of data “is bad for democratic expression, the economy and citizen customers” (p. 22) and quotes Gordon Brown speaking in 2007 stating:

‘..to protect individual liberty we should have the freest possible flow of information between government and the people…Public information does not belong to Government, it belongs to the public on whose behalf government is conducted.’

The report also argues that “Data and information are the lifeblood of the knowledge economy.” (p.4) and argues that the release of data can support co-production of services and innovation (p. 14). Suggesting a ‘BBC Backstage‘ model of working with data, the report notes of this:

  • It would create an ongoing source of innovative ideas for the use of government data, some of which may be rolled back into the principal websites whilst others remain free-standing.
  • It has the potential to build stronger working relationships between developers inside and outside government strengthening the capabilities of both parties.
  • And it would provide a useful channel for resolving some of the technical issues around access to government data that is made available under the Public Sector Information reuse regime.

Much of the report’s focus was restricted to geospatial data, the lack of which at that time was seen to prevent the delivery of accurate and innovative services to help people locate public services or find out who runs particular public services.

Putting the Frontline First: Smarter Government (Dec 2009) suggests that opening up data could:

  • “… harness people’s appetite and ability to drive up service standards” and that “In the past, much public service improvement was driven by the force of government targets set by central government. In the future, much more of the pressure for improvement can come from the local level” (§1.3)
  • It argues that “a more informed citizen is a more empowered citizen”.
  • It notes “In a modern democracy citizens rightly expect government to show where money has been spent and what the results have been.”
  • Innovation also featured as a reason: “Data can also be used in innovative ways that bring economic benefits to citizens and businesses by releasing untapped enterprise and entrepreneurship.”

The Conservative Party 2010 Manifesto included a commitment to a ‘Right to data’. Under the heading “Make politics more transparent” it notes:

  • “People will have a right to government data to make the performance of the state transparent.” (p. 69)
  • Data will allow the public to “hold government to account”.
  • Transparency is accorded a very powerful role in phrases such as: “We will ensure British aid money is properly spent by publishing full details of British aid on the DfiD website.”

At the bloggers’ launch of the data.gov.uk developers beta (Sept 2009) Director of Digital Engagement Andrew Stott shared four reasons for open data (my own paraphrasing):

  • Transparency and accountability
  • Empowering citizens to drive public sector reform
  • Releasing the economic and social value of information
  • Putting Britain at the leading edge of semantic web developments

Obama’s US Open Government Memo (2009) frames the open data initiatives in the US in the context of “Transparency, participation and collaboration”.

The possibility of data releases leading to collaborative correction of errors in data is often mentioned in talks / presentations, but has not, as far as I can find, been advanced as a substantive reason for opening access to data.

Tim Berners-Lee in his 2009 design note on putting government data online suggests:

Government data is put online typically for 3 reasons:
1. Increasing citizen awareness of government functions to enable greater accountability;
2. Contributing valuable information about the world; and
3. Enabling the government, the country, and the world to function more efficiently.

Each of these purposes is best served by using Linked Data techniques.

In Unlocking the Potential of Public Sector Information with Semantic Web Technology, Alani et al. (2007) argue that “Public Sector Information (PSI) can make an important contribution to bootstrapping the SW, which in turn will yield many gains.” with the gains cited being “greater efficiency through information sharing and integration to realise broader economic and social gains”.

Rufus Pollock’s Models of Public Sector Information Provision (PDF) and the follow up paper The Economics of Public Sector Information (PDF) make an economic argument for the value that is gained from releasing specific datasets currently managed under trading funds. (Although I’m damned if I can find any way of making sense of the £6bn value claim drawn from these papers that makes it into the Conservative Manifesto. £6bn over what time period?)

Tim O’Reilly in ‘Government as a Platform‘ (published in the wide-ranging Open Government book) argues that “Data is the ‘intel inside’” that not only “is a key enabler of outside innovation” but that it creates significant economic value.

‘Government Data and the Invisible Hand‘ also focusses on the innovation potential of opening access to data – but with an argument about the public benefit when the government can harness the creativity and innovation of the open market and bring information closer to citizens.

David Eaves’s case study of exploring open data on charitable fundraising suggests a many-eyes scrutiny argument for opening data.

It’s not just about machine readable

One of the approaches I’ve been using in trying to analyse the 50 or so different cases of open data use I’ve collected in this study is to divide up the different processes of data use.

I started with a simple schematic, and building on the theoretical reading I’ve been doing around data and information, looked at different cases that took data and tried to give it context, creating information. A distinction between fixed representations of data, and interactive representations of data quickly became apparent. Some uses of data present it in just one way with a fixed context, or present just one extract from a dataset; other uses of data try and provide a way for others to navigate the data, to explore it in the contexts that matter to the viewer, such as when, for example, you search by postcode to just see data about things local to you. I’ve called this a distinction between data->information and data->interface processes of using data.

Many of the cases I looked at, instead of providing interfaces that represented the data, simply provided new ways of accessing the data – through an API, or combined with other datasets. I’ve called these data->data processes.

A few of the cases I looked at are not showing the data to end-users at all, but are using it behind the scenes in order to provide people with a service: for example, using administrative geography data to route reports of faults to the right authorities. I’ve called these data->service uses of data, although my sample doesn’t include enough of them to explore them in depth.
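To make the data->service idea concrete, here is a minimal Python sketch of the fault-routing example. The end-user never sees the geography data: it only decides where their report goes. The area codes, authority names and function name are all hypothetical, and a real service would presumably start from a postcode or map location rather than a ready-made area code.

```python
def route_report(report, authority_by_area):
    """Return the authority responsible for the area a fault report falls in."""
    area = report["area_code"]
    try:
        return authority_by_area[area]
    except KeyError:
        # Better to fail loudly than to send a report to the wrong place.
        raise LookupError("No authority known for area %r" % area) from None


# HYPOTHETICAL lookup table standing in for a released administrative
# geography dataset.
authority_by_area = {
    "AREA-1": "Northshire County Council",
    "AREA-2": "Southtown Borough Council",
}

report = {"description": "Broken street light", "area_code": "AREA-2"}
print(route_report(report, authority_by_area))
```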

However, the set of data uses I was most surprised to find, given the discourse tends to focus on ‘data for developers’, were what I’ve called data->fact uses: where people have downloaded a dataset in whatever form it came, and rooted through it until they found a particular fact they were after. Perhaps it’s a fact they’ve been asking their local authority to give them for years about local school provision, and that they can now find in a nationally released dataset. Perhaps it’s a fact that will help them in writing a funding application for their local charity, enabling them to give real local statistics and set better outcome measures for their project. Perhaps it’s just something they were curious about.

What is interesting about data->fact uses, is that they can exist at the very long-tail of data-use. They may be the bits of data that a developer drops out of an application because it’s only of very niche interest. Or they may be the bits of data that no-one will ever build an application around. Which means an implicit or explicit focus on machine-readable data only misses out a vast range of use cases for open government data. Human readable data can be just as important.

The first three of the five-stars of linked data offer a good model for thinking about the release of open data:

★ make your stuff available on the web (whatever format)
★★ make it available as structured data (e.g. excel instead of image scan of a table)
★★★ make it non-proprietary format (e.g. csv instead of excel)

but far too often these get seen as just steps to be leapt over – on the way to a machine-readable web of data, rather than valuable parts of the release of data in-and-of themselves – incredibly useful to many citizens who aren’t app builders – but who do want to know what their government is doing – and who want to be empowered in their interactions with government, rather than operating in circumstances of informational inequality.

A half-star?

Reflecting on these three-stars, and on the first of the draft Public Data Transparency Principles which states that “Public data policy and practice will be clearly driven by the public and businesses who want and use the data, including what data is released when and in what form”, suggests there may be a valuable 0.5 star before these ones even get started:

★/2 – Publish and keep updated a list of the data you have even if it’s not open yet – and provide a clear way for people to get in touch to talk about opening up datasets.

As many respondents to the Public Data Transparency Principles have noted, not all data will be opened overnight, but making sure citizens searching for facts, as well as developers creating apps, can be drivers of data release could be an important part of the onward process.