
Before going any further, let’s get a few things crystal clear;
- The recent success of the Linked Data meme is long overdue, very welcome, and entirely capable of carrying the Web of Data far beyond its current niche adherents. A lot of my current work involves arguing that more organisations should adopt this approach;
- The Resource Description Framework, RDF, is a key, and powerful, piece in W3C’s Semantic Web Architecture. Since its earliest days, I have played various parts in advocating the potential of RDF and will continue to do so;
- RDF is an obvious means of publishing — and consuming — Linked Data powerfully, flexibly, and interoperably. I will continue to argue this, and to advocate its wider adoption.
So far, so good.
The problem, I contend, comes when well-meaning and knowledgeable advocates of both Linked Data and RDF conflate the two and infer, imply or assert that ‘Linked Data’ can only be Linked Data if expressed in RDF.
This dogmatism makes me deeply uncomfortable, and I find myself unable to agree with the underlying premise.
The rest of this post attempts to explain why, hopefully more lucidly than I or those with whom I was debating managed on Friday evening via the largely unsuitable medium of the 140 character tweet.
Andy Powell started things off lucidly enough on Friday, asking;
“is there an agreed name for an approach that adopts the 4 principles of #linkeddata minus the phrase, ‘using the standards (RDF, SPARQL)’ ??”
I was amongst those to respond, suggesting as I usually do that;
“well, personally, I’d argue that Linked Data does NOT require that phrase. But I know others disagree…”
Other pieces of that conversation can be extracted from the stream; start by scrolling to the bottom, find Andy’s tweet, and work back toward the top.
It’s worth noting that two of those arguing most vehemently against me were former colleagues Ian Davis and Leigh Dodds. I have massive respect for the technical prowess of both (which is certainly greater than my own), and have learned a great deal from Ian in particular over the years that we have known one another. This issue, though, is one on which we have long disagreed, and it was interesting to see the subject of many a difference of opinion in the bars of various conference hotels spill into this public arena.
Anyway, now let me try to explain what I meant.
Perhaps the most commonly cited definition for Linked Data is the one to which Andy was referring; Sir Tim Berners-Lee’s Linked Data – Design Issues document. It’s worth noting that this document is clearly flagged (in the current version amended on 18 June 2009, at least) as being both a ‘personal view only’ and ‘imperfect but published.’ So a very long way from being a ‘standard,’ ‘specification,’ or ‘definition,’ but certainly still a pretty good starting point, and one to which I often direct clients and others.
Berners-Lee begins,
“The Semantic Web isn’t just about putting data on the web. It is about making links, so that a person or machine can explore the web of data. With linked data, when you have some of it, you can find other, related, data.”
(my emphasis)
That sounds good, doesn’t it? Indeed, we talked about that on the Linked Data panel I moderated at the recent Semantic Technology Conference, and I’ve embedded the video here.
It is the next section of Berners-Lee’s document that is used to validate the view that Linked Data needs RDF;
“1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names
3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)
4. Include links to other URIs. so that they can discover more things.
(my emphasis)”
On one reading, an unambiguous validation of the view with which I disagree. On another, a suggestion of best practice, expressed as part of a ‘personal view’ with which we are perfectly entitled to take issue.
Would the zealots be calmed by the simple insertion of ‘preferably’ or ‘ideally,’ immediately after point three’s second comma? Maybe. Or perhaps the fires of Linked Data’s self-appointed Inquisition would be stoked for Berners-Lee himself.
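To make the two readings concrete, here is a minimal sketch of the four points with the bracketed "(RDF, SPARQL)" left out. Everything in it is an assumption made for illustration: the example.org URIs, the `WEB` dictionary standing in for real servers, and the `look_up` and `discover` helpers. It is not taken from the Design Issues document, and whether data published this way deserves the name Linked Data is exactly the point in dispute.

```python
import json

# Hypothetical in-memory "web": HTTP URIs as names (points 1 and 2),
# each mapped to the description a server might return (point 3).
WEB = {
    "http://example.org/id/alice": {
        "name": "Alice",
        "links": {"knows": "http://example.org/id/bob"},
    },
    "http://example.org/id/bob": {
        "name": "Bob",
        "links": {"worksFor": "http://example.org/id/acme"},
    },
    "http://example.org/id/acme": {"name": "Acme Ltd", "links": {}},
}


def look_up(uri):
    """Stand-in for an HTTP GET: return the JSON text published at `uri`."""
    return json.dumps(WEB[uri])


def discover(start_uri):
    """Follow links (point 4) from one URI and list every URI reached."""
    seen, queue = [], [start_uri]
    while queue:
        uri = queue.pop(0)
        if uri in seen:
            continue
        seen.append(uri)
        description = json.loads(look_up(uri))
        queue.extend(description["links"].values())
    return seen
```

Calling `discover("http://example.org/id/alice")` walks from Alice to Bob to Acme by following links alone. What the sketch cannot do is tell a client that has never seen this layout what `knows` or `worksFor` mean, which is the gap RDF's advocates say it fills.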
Talk of Linked Data, Open Data, the Web of Data and related concepts in recent years has led to a quite remarkable shift in attitude amongst individuals, public bodies and private corporations. Almost everywhere my work takes me, clever people are seriously grappling with the implications of consuming from or contributing to these emerging ecosystems. Not all of their questions have good answers, and not all of the technological, strategic and business implications have necessarily been fully worked through. But these people are asking the questions, and they are asking them in all seriousness. That is a dramatic and welcome shift.
Some, such as the BBC, Thomson Reuters and the UK Government’s Central Office of Information, are sufficiently persuaded of the benefits to take risks and to open the previously closed in taking a lead. Others will follow, as fears are assuaged, doubts eased, and benefits realised.
Despite this undoubted progress, the green shoots of a Linked Data ecology remain delicate. By moving from a message that stresses the value of unambiguous and web-addressable naming (HTTP URIs), providing ‘useful information,’ and enabling people to ‘discover more things’ by linking, toward a message that elevates one of the best mechanisms (RDF) for achieving this to become the only permissible approach, we do the broader aims great harm.
Yes, those already in the club will probably be very pleased with the purity and functionality of the toys in their playground. But they will have barred a far larger group with data to share, a willingness to learn, and an enthusiasm to engage. At best, they will have slowed the growth of the pool of Linked Data quite dramatically. At worst, they will have created an increasingly irrelevant backwater that more pragmatic people will simply route around. Perhaps, in their pragmatism, those people will now never look seriously at RDF and its power, scared away by the fervour of those who sought to elevate it too high, and too fast.
What are we after? More Linked Data, or more RDF? I sincerely hope it’s the former.
So let’s see loads more Linked Data, and plenty of evangelism as to why RDF could be the best way to do it. But let’s not ostracise the vast majority of potential participants, contributors and beneficiaries in the world of Linked Data, just because they haven’t wholeheartedly embraced RDF yet.
Paul Miller works at the interface between the worlds of Cloud Computing and the Semantic Web, providing the insights that enable you to exploit the next wave as we approach the World Wide Database.
Comments until now.
Paul,
Very interesting and valid question, indeed. IMHO, a clear answer from my side: yes, RDF is, next to URIs and HTTP, at the core of linked data. The *model*, I mean, not necessarily a specific serialisation: compare this with the initial focus on RDF/XML in the linked data world and our effort to promote RDFa [0] and you’ll see that there are parallels. I’d claim that virtually any structured data source, such as RDBs, XML, CSV, etc., can be transformed to RDF (e.g. RDB2RDF, GRDDL, etc.) and hence is able to participate in the linked data cloud (given HTTP URIs are used and an interlinking exists and/or can be established).
Second aspect I’d like to throw into the discussion is to understand linked data as a sort of specialised RESTful architecture (in the ROA sense of [1] and [2], read-only mainly, so far) where we still have to sort out some issues [3].
KUTGW!
Cheers,
Michael
[0] http://ld2sd.deri.org/lod-ng-tutorial/
[1] http://oreilly.com/catalog/9780596529260/
[2] http://dret.net/netdret/docs/soa-rest-www2009/rest#(11)
[3] http://dret.typepad.com/dretblog/2009/05/rest-and-rdf-granularity.html
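The claim in the comment above, that a source such as CSV can be transformed to RDF, can be sketched roughly as follows. This is a hand-rolled illustration, not RDB2RDF or GRDDL. The `BASE` namespace, the `csv_to_ntriples` name, and the URI patterns for rows and columns are all hypothetical, and it assumes simple single-line values and identifiers that need no URL escaping.

```python
import csv
import io

BASE = "http://example.org/"  # hypothetical namespace, not from the comment


def csv_to_ntriples(text, id_column):
    """Turn each non-id CSV cell into one N-Triples line with a literal object."""
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        # Mint an HTTP URI for the record from its id column.
        subject = f"<{BASE}id/{row[id_column]}>"
        for column, value in row.items():
            if column == id_column:
                continue
            # Mint an HTTP URI for the column (the attribute).
            predicate = f"<{BASE}prop/{column}>"
            # Escape backslashes and double quotes inside the literal.
            literal = value.replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'{subject} {predicate} "{literal}" .')
    return lines
```

Given the text `"id,name,city\n1,Alice,Leeds\n"` and `id_column="id"`, this yields two statements about `<http://example.org/id/1>`. Every value comes out as a plain literal, so the interlinking that the comment mentions still has to be established separately.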
>Would the zealots be calmed by the simple insertion of ‘preferably’ or ‘ideally,’
>immediately after point three’s second comma?
Oooh, our own filioque controversy!
>message that elevates one of the best mechanisms (RDF) for achieving this
I think this would be easier to discuss if you could suggest some of the other mechanisms for achieving everything in TimBL’s four points about Linked Data, minus the bolded part above. (Everyone writes code from scratch in their favorite programming language?) I wouldn’t want to see the discussion lead to debates about specific advantages and disadvantages of each approach, but instead to generalize from the selection to gain insight about why the RDF data model was essential or non-essential to this.
To build on what Kingsley has said, the entity/attribute/value combination is a great abstraction to help data reach across boundaries, and standards to make this easier should be a big help.
I think that it’s useful to preserve the idea of the linked data meme together with the specific implementation embodied in the “four rules” articulated in TimBL’s design issues document, which is why I’ve started talking about “4-rules linked data”. I also think that it’s premature to lock “linked data” onto a specific implementation based on RDF. Posting MARC records on the web is not linked data, but you could imagine a linked data implementation that includes that. Well, maybe not a plausible one, but certainly a possible one.
There are many unanswered questions about which elements of the linked data battle plan will survive contact with “the enemy”. Thanks for articulating this, Paul.
Paul, I agree with every word. The cause of the semantic web is being weakened by the “zealots” as you put it and I think the inquisition analogy is right on. Many of us have experienced its wrath and witnessed its narrow vision… While I understand their wish to clearly define technical terms, the intensity of the response that comments like yours (or mine, in the past) invariably provoke makes me feel like there is more at work, likely rooted as usual in the self-interest of each one of those voices, and a fear of “letting the baby go”. Or it may also be the result of a quest for technical perfection, and the belief that Linked Data won’t work well without RDF. Maybe, but why don’t you prove it instead of asserting it, and in the meantime let us see for ourselves.
Technically, there are pros and cons to using RDF, and the fact that it has gathered so much criticism, next to the praises, would suffice to launch a formal debate in any self-proclaimed democratic institution. It’s not too late. Such a debate would allow us to explore alternatives to RDF, using the other pieces of the stack and maybe marrying them to other technologies. It would also surface the best usage cases for RDF. Given the slow adoption and the progress of its “light” version RDFa, I think it’s safe to say that RDF will only impose itself through its intrinsic qualities and not because of the royal quality of its father. Let’s discuss the former and compare them with other approaches, starting with simply tying concepts in web pages and relational databases to URIs.
Yes the same mistake was made with the rise of the web.
Once you had URIs and HTTP you already had plain text which is a perfectly good way to encode content. By adopting the STANDARD convention of HTML, all sorts of existing text based formats with their various mark ups were locked out. That locked out a lot of content that already existed and required anyone who wanted to play to convert existing content into an HTML format.
Of course it did have the small side effect that to consume web content you only needed a browser that understood one convention i.e. html.
The same is true of RDF. XML is the equivalent of ASCII in this regard. Sure it is a good way to write down data, but it isn’t sufficient to actually use that data unless you understand the various special conventions.
RDF gives you a standard way to understand TYPES of data that you have never seen before. You simply cannot do that with XML alone. You must build a convention at a different level from syntax, which can be expressed as XML. We have; it’s called RDF!
Ask yourself the question. Why hasn’t the linking of data taken off before? If there is all this data out there, why didn’t it just get linked together?
Because linking between different conventions isn’t very useful.
The problem has never been the linking of data; that is easy as soon as you have URIs. It is meaningfully linking different data so that you have something useful, not just a mess. This itself then pulls in more data. Just as we have seen with the growth of the web and just as we are now seeing with the growth of “linked data”.
Linked data certainly needs to be *linked*, and after that, it’s pretty important to describe the relationship that each link between resources represents (i.e. “this is a link to a parent resource”, “this is link to a resource that represents a place nearby to this place”).
Once you have that, the idea of a triple emerges almost by itself, and what you have is suddenly starting to look very much like RDF. If your format is not RDF, then it’s likely to be convertible to RDF fairly trivially.
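As a rough illustration of how a triple "emerges" once links carry a described relationship, the sketch below flattens a made-up description of one resource into (subject, relationship, target) tuples. The `to_triples` function, the place URIs and the relationship names are hypothetical. The relationship names are plain strings here, whereas in RDF proper they would themselves be URIs.

```python
def to_triples(subject_uri, description):
    """Flatten {relationship: target or [targets]} into (s, p, o) tuples."""
    triples = []
    for relationship, targets in description.items():
        # Accept either a single target or a list of targets.
        if not isinstance(targets, list):
            targets = [targets]
        for target in targets:
            triples.append((subject_uri, relationship, target))
    return triples


# Hypothetical description of one place, with typed links to other places.
leeds = {
    "parent": "http://example.org/place/yorkshire",
    "nearby": [
        "http://example.org/place/bradford",
        "http://example.org/place/york",
    ],
}
triples = to_triples("http://example.org/place/leeds", leeds)
```

Each tuple pairs the resource with one relationship and one target, which is the shape of an RDF statement. How trivial the remaining step is depends on how easily those relationship names can be mapped to shared URIs.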
[...] Miller, a good friend and ex-colleague, has been having a tough time arguing that perhaps Linked Data doesn’t need RDF. Don’t misunderstand that, he thinks RDF is a Good Thing and Best Practice for Linked Data. [...]
Paul,
An important discussion that should bring critical clarity.
As per my tweets, what is data in your world view? In mine it’s about “units of observation”: I encounter something and express its sense of being (in my worldview) by identifying it. We’ve always done this btw. What’s different now is that the Web allows us to use HTTP based Identifiers (HTTP URIs).
When I observe and Identify something (i.e. Tag it), I also want to do the same for the constellation of characteristics associated with my “unit(s) of observation” which is where Attribute & Value pairs come into play i.e., they too can be endowed with Identifiers.
The RDF (Resource Description Framework) Data Model is just an example of an EAV/CR model [1], and I think the deeper question is: can you have an “HTTP based Web of Linked Data” without the combination of HTTP Identifiers (HTTP URIs) and an Entity-Attribute-Value + Classes & Relationships model?
The answer to the question above is an emphatic “NO”
Any DBMS system that maps its record and attribute identifiers to HTTP URIs ultimately plays well with the Linked Data meme.
Links:
1. http://tr.im/t1SM
Kingsley
[...] So Paul asked recently: Does Linked Data need RDF? If you drink a certain sort of coffee, I guess you are familiar with my answer: What else? [...]
Paul,
I thought this topic was worth writing up as a blog post [1], essentially a +1 to what Toby says, and a bit beyond …
Cheers,
Michael
[1] http://webofdata.wordpress.com/2009/07/20/what-else/
This is a good and fair question to answer.
My response is in comments to http://webofdata.wordpress.com/2009/07/20/what-else/#comments … essentially “yep, go crazy … more data is good, whatever format”.
I do believe it will be RDF that provides the over-arching Web that links together scattered datasets to make something bigger and more integrated. But each dataset may well have value in other formats too.
Paul,
Is RDF a Data Model or a Format re. this discussion? The answer to this question is of utmost importance re. coherence.
Paul – great thoughts here, thank you for sharing. I didn’t read your post as an attack on RDF, I think it’s productive to look at systems and ask ourselves “what would we do if we didn’t have $foo?” In this case the question is, “could we build the web of data without RDF?”
Without even addressing the question of should we do it, I think there’s no reason we couldn’t. I can easily imagine a web of data filled with HTTP URI’s that output JSON.
Ross,
We need to be clear about the topic of conversation here. Are we talking about a Data Model or a Data Format?
Kingsley
Very interesting discussion! I am glad, glad it is happening. Maybe we can get somewhere. Science can help answer the question: let’s set up some tests! We call it ‘benchmarking’. Take a set of case studies, 5-6 different examples, and wrap a LOD model around them, using RDF on the one hand, and using other means on the other. Then evaluate the pros and cons in terms of speed, reliability, resource consumption etc. The results will be partial until the test set is big enough, but we can build a library of test cases and will be able to base our evaluations on facts, rather than just opinions… collectively sourced test results will also be at the heart of research innovation…
Hi Paul – interesting article and ongoing discussion.
We should talk soon. Our work is related to all this, though without the emphasis on linking.
I think this is a question of how you want to pick your technical battles, and what mix of social and technological work and infrastructure is most likely to yield the intended result. There’s a real danger that “linked data” could be as empty a phrase as “open” without care.
The microdata in HTML5 discussions suggests to me that the first thing that goes out the window when you accept RDF as optional (or more typically, a more pejorative unneeded overkill) is ironically the feature most important to both RDF and linked data: the URI (microdata allows one to use string or reverse DNS identifiers instead for property names and types).
[...] recent success…” July 20, 2009 Paul Miller writes: The recent success of the Linked Data meme is long overdue, very welcome, and entirely capable of [...]
Good post for discussion, Paul. To amplify what Michael said above: The RDF family provides a metadata umbrella that non-RDF can fit under. It’s possible to avoid religious arguments by allowing alternatives as long as they can be converted to fit under the umbrella. RDF’s advantages as the only real metadata umbrella that exists–and one that allows inferencing of what’s under the umbrella–will mean it will be the lingua franca, but DBMS systems are ubiquitous and other formats will continue to be used as well. The need to accommodate and convert or translate them must be assumed. People who don’t understand the value of a base layer of RDF graphs may suggest a performance bakeoff that won’t get at the real advantages, such as inferencing, that RDF provides.
All,
You need a data model for metadata. Without a data model for Metadata you will not have Linked Data anywhere (including HTTP based networks such as the World Wide Web).
RDF is part Data Model and part Data Representation Formats re. Metadata. It is simply an EAV/CR based data model. Remember, RDF/XML != RDF, so the JSON comparisons or suggestions are non sequitur.
There’s no Linked Data on the Web without a data model that intrinsically accommodates HTTP URIs for identifying Entities (Subjects).
There’s no Linked Data injected into the Web if the DBMS data that’s being plugged into the Web doesn’t accommodate HTTP URIs for record ids, field ids, and field values.
Bottom line, we are simply talking about link granularity re. the Linked Data meme. The lowest layer of link granularity is facilitated by EAV/CR models like RDF since they are scoped to the Datum level (as opposed to Data Container level).
As of this time I know of no other EAV/CR based Data Model that intrinsically accommodates HTTP URIs. Thus, the Linked Data meme and RDF are inextricably linked, and for the right reasons. That said, implementation details like RDF don’t need to be emphasized in the Linked Data meme rules.
Remember, most important point of all re. HTTP URIs: they implicitly bind an Entity (Subject) and negotiated representations of its Metadata via a single HTTP URI.
The powerful HTTP URI feature above has never been delivered, to date, with the degree of platform agnosticism inherent to HTTP.
Kingsley
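The distinction drawn above between a data model and its representation formats can be illustrated with a small sketch: one set of Entity-Attribute-Value rows, serialised two ways depending on what the client asks for. Everything here is an assumption made for illustration, including the sample data, the function names, the ad hoc JSON layout, and an `Accept` handling that is reduced to an exact string match with no q-values or wildcards.

```python
import json

# One entity described as Entity-Attribute-Value rows (hypothetical data).
TRIPLES = [
    ("http://example.org/id/alice", "http://example.org/prop/name", "Alice"),
]


def as_ntriples(triples):
    """Serialise the rows as N-Triples lines with literal objects."""
    return "\n".join(f'<{s}> <{p}> "{o}" .' for s, p, o in triples)


def as_json(triples):
    """Serialise the same rows as nested JSON (an ad hoc layout)."""
    doc = {}
    for s, p, o in triples:
        doc.setdefault(s, {}).setdefault(p, []).append(o)
    return json.dumps(doc, sort_keys=True)


FORMATS = {"application/n-triples": as_ntriples, "application/json": as_json}


def represent(accept):
    """Pick a serialisation of the same rows from a simplified Accept value."""
    serialise = FORMATS.get(accept, as_ntriples)  # fall back to N-Triples
    return serialise(TRIPLES)
```

Both `represent("application/n-triples")` and `represent("application/json")` describe the same entity, attribute and value behind a single identifier. Only the bytes on the wire differ, which is the sense in which a format such as RDF/XML is not the same thing as the RDF model.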
[...] you to everyone who took the time to share a wide range of views in response to yesterday’s post in its comments, on Twitter, and out on your own blogs. Although reduced to silence throughout the [...]
What I was trying to say about JSON was that you could use it as both a model and a format, just like RDF + N3. It may be flat, it may be nested, the point is you could basically take a table and denormalize it to the cell level in whatever format/language you’d like. To me what this says is that RDF isn’t so much about the technology as it is a way of doing things.
On a related note, I’ve always thought it was a stretch to call RDF a model; to me, a model is something that looks the same no matter what you put into it. These are just thoughts, not trolling. Viva el Web of Data!
@rossbates
Ross,
RDF is comprised of a model and framework. Of course I accept that my comment may be somewhat revisionist since the initial incarnation was about RDF/XML with the proper partitioning of model and data representations coming much later.
The tweak to RDF messaging for the most part coincides with the “Linked Data” meme. Ditto the emergence of alternative data representation formats for the model such as N3, Turtle etc.
As of today, it is best to look at RDF as being comprised of a model and a plethora of data model representational formats (including the most recent RDF/JSON addition). The model also honors the dictum you outline above since you can consistently negotiate representations (via HTTP GETs) of the same Entity-Attribute-Value or Subject-Predicate-Object graph no matter what you successfully put into it.
I blogged further about this at http://www.semanticsincorporated.com/2009/07/if-linked-data-is-a-brand-it-has-big-problems-to-address.html
[...] read a few articles through blogs lately regarding Linked Data and the Semantic Web. Ross Bates, Paul Miller, Ian Davis, and Semantics Incorporated have all explored the ideas of Linked Data and Web [...]