
The Shakespeare review: what's the future of UK open data?

Stephan Shakespeare, CEO of YouGov, has published his independent review into open government data in the UK. What's he said, what's he missed, and what will it mean?
Stephan Shakespeare is chief executive of YouGov and chair of the Open Data Strategy Board. What's he advocating? Photograph: guardian.co.uk

The UK is the world leader on open government data, according to YouGov CEO and Open Data Strategy Board chair Stephan Shakespeare, but needs to avoid being the "boffins ... we generate the excitement but don't mint the money".

The finding is one of the core messages of The Shakespeare Review, a government-commissioned report on what should happen next with opening up government data for the benefit of government, business and (of course) citizens.

Open data has the potential to deliver a £2bn boost to the UK economy in the short term, the research concludes, with another £6-7bn further down the line. But to get there, the UK will need a clear, cohesive strategy on what to do next – no more ad-hoc, "out-the-window", chuck-out-the-data approach.

So: what's Shakespeare recommending, and what does it mean? Our take on the key points, and our analysis, is below, alongside recommendations from the report proper.

Twin-track data release

If we wait for data to be perfect quality before releasing it, we'd never get anything released.

That was a view from both the report and from Rohan Silva, the outgoing senior adviser to David Cameron, who is widely (and rightly) credited as one of the key drivers of the government's transparency initiatives. Silva spoke of the struggle to prevent civil servants stalling the open data drive by insisting on perfect quality.

As he put it: "It turned out a lot of the data collected, and occasionally used, in Whitehall was often pretty crap."

Shakespeare's response to this is to suggest two tiers of data release: get stuff out quickly and imperfectly, and then for core datasets (defining core datasets is left to the government), work on getting them improved to top quality:

[T]he perfect should not be the enemy of the good: a simultaneous 'publish early even if imperfect' imperative AND a commitment to a 'high quality core'. This twin-track policy will maximise the benefit within practical constraints. It will reduce the excuses for poor or slow delivery; it says 'get it all out and then improve'.

Getting it done matters

Few people in the UK's open government community doubt the sincerity of the advocates at the very top of Whitehall – including Cabinet Office minister Francis Maude.

Where things go awry is often in the gap between agreeing something should be done, and then actually doing it.

This creates lots of mess. Take, for example, the vaunted publication of every item of government spending over £25,000. That's been agreed, it happens, and even comes out in a standardised format. But every department releases it in a separate Excel sheet for each month.

Councils, with their equivalent releases, do the same. But if a council staffer wants to see if he's being charged more than anyone else, he still has to look manually through hundreds of other websites. 90% of the effort of an open data tool has been done, for only a fraction of the benefit.
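To give a sense of how small the missing step is, here is a minimal sketch of the stitching-together that anyone wanting to compare spending currently has to do for themselves. It assumes each release has already been saved as CSV text; the column names "supplier" and "amount", the function names and the sample figures are all made up for illustration and are not taken from any real release.

```python
import csv
import io


def combine_releases(releases):
    """Merge per-department, per-month CSV texts into one list of rows.

    `releases` maps (department, month) -> CSV text. The column names
    "supplier" and "amount" are hypothetical, chosen for illustration.
    """
    combined = []
    for (department, month), text in releases.items():
        for row in csv.DictReader(io.StringIO(text)):
            combined.append({
                "department": department,
                "month": month,
                "supplier": row["supplier"].strip(),
                "amount": float(row["amount"]),
            })
    return combined


def total_by_department(rows, supplier):
    """Sum what each department paid one supplier across all months."""
    totals = {}
    for row in rows:
        if row["supplier"] == supplier:
            key = row["department"]
            totals[key] = totals.get(key, 0.0) + row["amount"]
    return totals


# Made-up sample data, not real spending figures.
SAMPLE = {
    ("Dept A", "2013-01"): "supplier,amount\nAcme Ltd,30000\nOther Co,26000\n",
    ("Dept A", "2013-02"): "supplier,amount\nAcme Ltd,31000\n",
    ("Dept B", "2013-01"): "supplier,amount\nAcme Ltd,45000\n",
}

rows = combine_releases(SAMPLE)
print(total_by_department(rows, "Acme Ltd"))
```

With real releases the hard part is not the merging itself but finding and downloading hundreds of separate files, and reconciling supplier names that are spelled differently from one body to the next.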

It's likely with this problem in mind that Shakespeare has – sensibly – focused on the need for decent implementation, and an "auditable" plan. That would be a significant and useful development – but one which could easily be opposed by some within Whitehall.

There should also, he says, be someone to make sure that stuff actually happens:


There should be clear leadership for driving the implementation of the National Data Strategy throughout the public sector. There are many committees, boards, overseers and champions of data; but no easily understood, easily accessed, influential mechanism for making things happen.

Companies House, Met Office: we wouldn't start from here

The current trading funds – Companies House et al – are a bugbear of many in the open data movement. They hold masses of data, charge for what they do release, and keep piles more to themselves. But what they produce is important and has been produced reliably for a long time, and people are worried about rocking the boat. Including Shakespeare:

One would be hard-pressed to find any expert who, asked to create new structures for core reference data from scratch, would advocate the current Trading Fund model in today's world of open data ...

But we are not starting afresh, and we have, in the Trading Funds, organisations of high quality which one should hesitate to disrupt ... [but] that does not mean we should not press hard for significant adaptation of the model to the new potential for open data.

Shakespeare added that "open data" needn't mean "free" and needn't mean open "to everyone". Some of this smacks of compromise – likely sensible compromise – but it would be a waste of an opportunity if progress got lost here.

Efforts like OpenCorporates have been working valiantly to get access to this data from the accountability side, while projects like DueDil show the potential scale of business advantage – spotting fraud, default, corporate structures and more – inherent in the information held by the trading funds.

Greater openness in the trading fund data might be where the biggest advantages lie – so they can't (and shouldn't) easily be ignored.

Stealing data's a bit like burglary...

Data protection and privacy are always hot-button topics, and Shakespeare clearly took them seriously, saying citizens had a right to expect even better protection than they have now. But he also warned they shouldn't be allowed to turn into a constant roadblock on advancement.

As a result, his conclusions – which seem sensible and considered – might turn out fairly controversial.

Here's what he said:

No method, including traditional non-digital information storage, is proof against determined wrong-doers. We do not require builders to only build houses that cannot be burgled. We do our best and impose consequences on the burglar not the builder.

We currently have an unrealistic degree of expectation of any data controller to perfectly protect all our data - an attitude that inhibits innovation. Following 'best practice' guidelines should be enough, so long as we are willing to prosecute those who misuse personal data.

Skills to pay the bills

If the UK's going to reap the benefits of all this open data, it needs people trained to take advantage of it, says Shakespeare – another very reasonable point, but a tricky one to do anything about.

That means more university courses, more data scientists in government and elsewhere, and more stats literacy all round. In another economic climate, he added, he'd be pushing for big investment here, for future returns. The report notes:

At the moment, the USA invests massively more than us and continuously reaps the benefits in world-leading business applications of science and technology; yet Britain is capable of being first in this field, given our expertise in data science and the fact we have large, coherent datasets. For example, nowhere in the world has such good health data, due to the scale of the NHS as a single provider. There is huge potential here for building social and economic value if we are willing to invest smartly.

That's all well and good – but as a questioner from the Royal Statistical Society noted, data literacy and statistics are currently much reduced in the new proposed national curriculum, in favour of "traditional" maths. Which, if unamended, borders on madness.

The UK's culture on maths and numbers is also a likely barrier. New Ipsos Mori research shows the mountain to climb: 55% of parents would be most proud if their child was good at writing, versus just 13% at numeracy.

Only 2% of people strongly agree that either politicians, newspapers or TV report statistics accurately.

Only 26% of British adults correctly answered that the probability of tossing two (fair) coins and getting two heads was 25%.

And only 13% said they were "very confident" they understand statistics on government spending cuts.
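For anyone who wants to check the coin question, the answer can be confirmed by simply listing every equally likely outcome of two tosses and counting the ones that are both heads. This is a minimal sketch; the variable names are our own.

```python
from fractions import Fraction
from itertools import product

# Every equally likely outcome of tossing two fair coins:
# (H, H), (H, T), (T, H), (T, T).
outcomes = list(product("HT", repeat=2))

# Only one of those four outcomes is two heads.
favourable = [o for o in outcomes if o == ("H", "H")]

probability = Fraction(len(favourable), len(outcomes))
print(probability, float(probability))
```

One outcome in four is two heads, which is the 25% that only about a quarter of adults picked.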

So, stats fans, there's a lot of work to do there.

Private sector data

Lots of government work is actually carried out by private sector companies, and they should be encouraged to open up more data relating to that, says Shakespeare – though he stops short of saying this should be mandatory, at this stage.

The review also specifically (and repeatedly) cited other open data issues beyond government – most particularly, the need to open up data on medical trials, a cause regularly championed by Ben Goldacre, the UK's unofficial chief nerd.

Where there is a clear public interest in wide access to privately generated data, then there is a strong argument for transparency (for example in publishing all trials of new medicines) ...

A company working with government should be willing to share information about activity in public-private partnerships, as information about activity in public-private partnerships held by private companies is not currently subject to the Freedom of Information Act. This could be greatly enhanced without the need for legislation by creating a field in procurement forms asking for the company's open data policy regarding the sought contract.

A good start – but without legislation, progress in this particular area (which is becoming ever-more important) could be markedly slow.

What's missing

The main thing missing from the report is detail: what should be core data? How fast is fast? The trading funds should work differently, but how so? Who's actually going to implement this? What about [insert pet dataset here]?

Those questions aren't for Shakespeare to answer, but the answers will impact the entire scope of the review.

There are also potentially significant missed opportunities in not specifically addressing criminal justice data in detail: police data is opening up, but court records (and detailed, granular sentencing information) lag significantly behind the rest of the UK government.

How's it gone down?

One bit of reaction to the report matters more than everyone else's, including ours: writing an official report is no use whatsoever if the government isn't going to do anything about it. It's too early to tell how that will work out, but the initial noises from the official response are, at least, good, though they fall short of any firm commitments:

Stephan's excellent review shows how government and business can work together and create new business opportunities. Encouraging business innovation through open data could transform public services and policy making to be more efficient, timely and effective.

We warmly welcome the review and will look at the recommendations carefully to see how they could be implemented to enable wider access to public sector information so that we can strike the right balance between affordability, data security and value for money.

The Open Data Institute wants fast action to implement the report:

This is the time to be bold and ambitious. What happens now on the back of this report is crucial in unlocking the value of open data. If the Government is serious about making data open, it has to be made available and fast.

The Open Knowledge Foundation also seems broadly pleased, noting among other points:

Getting more data released quickly, without agonising over quality concerns, is an excellent recommendation and we look forward to seeing this in practice. Alongside this we welcome the demand for high quality information in the National Core Reference Data plan, including key entity data; such reference data, following clear open standards, will transform what can be done with UK data. The request that Trading Funds should remove restrictive PSI licensing and work towards releasing all raw data for use and reuse is particularly warmly welcomed.

But who says they should be the only ones delivering a verdict? Let us know what you've spotted – or what's been missed out – in the comments below, or via Twitter to me directly @jamesrbuk or to the official @GuardianData account.
