In a pair of blog posts yesterday, Andreas Blumauer of Austria’s Semantic Web Company touched on an area that has been absorbing my attention recently, and raised some questions worth exploring here.

I am travelling to San Diego next week to speak about the importance of evolving Enterprise attitudes to data. Borrowing some nice turns of phrase from Sir Tim Berners-Lee’s recent TED talk and JP Rangaswami’s keynote to Powered by Cloud, I’ll be suggesting, amongst other things, that enterprises ‘stop hugging their data’ and move ‘from data centre to data centric.’

The Linked Data initiative, which began in March of 2007 as a community project supported by W3C’s Semantic Web Education & Outreach (SWEO) Interest Group (of which I was a member), has been a huge success. Described by Berners-Lee as ‘the Web done right,’ the notion of Linked Data rests upon the acceptance of four simple principles, yet opens the door to previously unanticipated re-use of data scattered across the Web.

The most rapid adoption has, unsurprisingly, been seen in terms of liberally licensed data already visible on the Web in some form. DBpedia, for example, is a community effort to extract structured information from Wikipedia and expose the individual facts for use across the Web. There have also been examples — as always justified by hacker mentality, ‘academic freedom,’ the imprimatur of ‘research,’ or the expectation that the perpetrators are ‘too small’ to be noticed — in which data have been appropriated to the cause without due care and attention to the rights of the data owner, but these isolated cases should certainly not detract from the value of the broader effort.

Public Interest data from organisations such as the BBC has also begun to appear in the ‘Linked Data Cloud’, and the frequency and strength of reciprocal links between participating resources are growing rapidly.

Enterprise data is effectively invisible to this Cloud, which brings me back to Andreas’ first post. In it, he asks:

“Since the [Linked Data] cloud is kind of the basic infrastructure which drives the whole process – this layer should remain a freely accessible one. But how could new business models be built on top of it (and constantly spend money on maintaining and extending the underlying infrastructure)?

Where could enterprises start using Linked Data? Only by retrieving data from the ‘outside’ and mash it up with the ‘inside’ – only one way?”

I can certainly see cases in which cautious corporates will be willing to consume without contributing in return, and there’s clearly work to do in demonstrating the value that they could gain from more balanced participation; participation that should never mean unwillingly ‘giving away’ competitive advantage or sensitive data.

We have an annoying tendency to view data in our databases as an indivisible mass, vigorously and unthinkingly applying the same (expensive) protections to an uninteresting and low-value factoid of underlying context as we do to the core attributes of our next big lead.

Andreas concludes this post by suggesting something very similar to JP Rangaswami’s notion of ‘data centric’:

“Information has no ‘place’ anymore, energy can’t be shipped around the world. We should rethink the meaning of a ‘data store’ and information will flow without flooding us. Linked Data might become the essence.”

Andreas’ second post followed after he’d listened to the most recent episode of the Semantic Web Gang, which I chair. During the show, recorded last month, we discussed the latest release from Thomson Reuters’ Open Calais activity, which sees it embrace Linked Data’s principles whilst continuing to run and grow a viable global business.

Andreas extrapolates from the conversation to suggest that a viable business model for the data-curating Enterprise might be to expose timely and accurate enrichments to the Linked Data ecosystem; enrichments that customers might pay a premium to access more quickly or in more convenient forms than are available for free. He also sees a market for application builders that optimise the flow of information, and both of these are certainly possible.

The Linked Data (or Data Web) opportunity is far greater, though, and too little attention is being devoted to it by Linked Data’s advocates as they concentrate their efforts on big public datasets of the sort Berners-Lee discussed in Long Beach last week. Big public data sets are important, and Berners-Lee is right to suggest that more Open and Linked access to the outputs of scholarship will help in our efforts to tackle many of the world’s ills. There is just as much value locked up inside our commercial enterprises, though, and the rationale that will ultimately lead us to unlock it is quite different.

It is that rationale which we need to get right, almost certainly without mentioning ‘RDF’, ‘Semantic Web,’ or even ‘Open.’

And if you’re in Southern California next week too, why not come and say ‘Hi’…?

