Back in 2006, as we rolled out the first public draft of the Talis Community Licence, the world of data licensing seemed a simple place. Today, the Open Knowledge Foundation’s Data Hub contains 3,888 data sets, many of which are explicitly licensed with respect to the Open Definition. But many are still not explicitly licensed. Over at the UK Government, there are 8,619 data sets today, and an assertion that “in general, the data is licensed under the Open Government License.” Too much still isn’t, of course, but they’re getting there. And then there are the many, many more data sets out on the web, not registered with repositories like the Data Hub or data.gov.uk at all.
More than four years on, how are we really doing?
As a scoping exercise for a larger project that I might be undertaking, I’d be really grateful if you could take a moment to fill in this brief survey [which will open in a new window or tab].
It simply sets out to assess the relative proportions of data that are not openly licensed, that are implicitly open, explicitly open with some home-grown statement, or explicitly open and using a recognised data license like CC0 or one of the Open Data Commons licenses.
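For anyone who wants to run the same tally over their own list of data sets, here is a minimal sketch of the arithmetic. It assumes each data set has already been labelled by hand with one of the four groups described above. The category names, the `license_proportions` function and the sample list are all hypothetical and purely illustrative; they are not survey results.

```python
from collections import Counter

# Hypothetical labels mirroring the four groups described in the text.
CATEGORIES = (
    "not_open",             # not openly licensed
    "implicitly_open",      # open in practice, but no explicit statement
    "explicit_homegrown",   # explicitly open, with a home-grown statement
    "explicit_recognised",  # explicitly open, using a recognised license
)


def license_proportions(labels):
    """Return each category's share of the labelled data sets (0.0 to 1.0)."""
    counts = Counter(labels)
    unknown = set(counts) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unexpected categories: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        # No data sets labelled yet: report zero for every category.
        return {category: 0.0 for category in CATEGORIES}
    return {category: counts[category] / total for category in CATEGORIES}


# Invented sample, for illustration only.
sample = [
    "not_open",
    "implicitly_open",
    "explicit_recognised",
    "explicit_recognised",
]
proportions = license_proportions(sample)
```

The function rejects labels outside the four groups rather than silently counting them, so a typo in a hand-labelled list shows up straight away.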
We’ve seen a welcome burst of enthusiasm for ‘open’ release of data. This has been driven most visibly by government transparency agendas here and overseas. But libraries, the scholarly publishing community and others have also been enthusiastic adopters in recent years. Less welcome has been the sometimes rampant license proliferation. Everyone, it seems, finds something not quite right about one of the licenses on the table. Everyone, it sometimes appears, has a burning desire to create their own license that is just a little bit different, just a little bit closer to their world view. Everyone, perhaps, has a lawyer who sees the opportunity to write themselves a blank cheque alongside a new, ‘better’ license. Every local tweak to a common license, however well-meaning, is a barrier to interoperability. Every new license, however laudable the aims behind its creation, is a further complication to an already complicated picture; another excuse to wait rather than do. Although the meaning and the intent may be the same in all of these licenses, every different set of legalese requires careful, repeated study as everyone else tries to work out whether or not some incompatibility or impediment has (unintentionally, we hope!) been introduced. Unconstrained license proliferation is, simply, bad.
So… I’ll be taking a look at figures from the Data Hub, data.gov.uk and elsewhere, to get some solid numbers on license proliferation, and on the geographies, domains and volumes in which each license is used. I’ll track all of that and more here, when it happens.
Until then, a couple of minutes of your time for the survey will be very valuable in setting the scene. I’d also be grateful for anything you can do to get your peers to complete the survey themselves. The more data we get, the clearer a picture we’ll see. I’ll provide updates on progress with this survey as your responses begin to come in, and make all the results available here.
And if you have data, and it’s even a little bit open, why not take a moment to register it with the Data Hub? That should make it so much easier for others to find.
Image, Open Data Stickers, from Wikimedia Commons.