<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
		xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd"
	xmlns:media="http://search.yahoo.com/mrss/"
>

<channel>
	<title>Paul Miller - The Cloud of Data &#187; Open Data</title>
	<atom:link href="http://cloudofdata.com/tag/open-data/feed/" rel="self" type="application/rss+xml" />
	<link>http://cloudofdata.com</link>
	<description>Linked Data, Cloud Computing, Semantic Web, SaaS, PaaS, more</description>
	<lastBuildDate>Thu, 17 May 2012 15:04:40 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
	<copyright>Licensed under the Creative Commons Attribution License, version 3.0 http://creativecommons.org/licenses/by/3.0/</copyright>
	<managingEditor>paul.miller@cloudofdata.com (Paul Miller)</managingEditor>
	<webMaster>paul.miller@cloudofdata.com (Paul Miller)</webMaster>
	<ttl>1440</ttl>
	<image>
		<url>http://cloudofdata.com/logo144x144.jpg</url>
		<title>Paul Miller - The Cloud of Data</title>
		<link>http://cloudofdata.com</link>
		<width>144</width>
		<height>144</height>
	</image>
	<itunes:subtitle>Conversations with the executives shaping Cloud Computing and the Semantic Web.</itunes:subtitle>
	<itunes:summary>Linked Data, Cloud Computing, Semantic Web, SaaS, PaaS, more</itunes:summary>
	<itunes:keywords>Cloud Computing, Semantic Web, Linked Data, Open Data, SaaS, PaaS</itunes:keywords>
	<itunes:category text="Technology" />
	<itunes:category text="Business" />
	<itunes:author>Paul Miller</itunes:author>
	<itunes:owner>
		<itunes:name>Paul Miller</itunes:name>
		<itunes:email>paul.miller@cloudofdata.com</itunes:email>
	</itunes:owner>
	<itunes:block>no</itunes:block>
	<itunes:explicit>no</itunes:explicit>
	<itunes:image href="http://cloudofdata.com/logo300x300.jpg" />
		<item>
		<title>Data Market Chat: Rufus Pollock and Irina Bolychevsky discuss the Open Knowledge Foundation and CKAN</title>
		<link>http://cloudofdata.com/2012/03/ckan/</link>
		<comments>http://cloudofdata.com/2012/03/ckan/#comments</comments>
		<pubDate>Thu, 01 Mar 2012 17:00:49 +0000</pubDate>
		<dc:creator>Paul Miller</dc:creator>
				<category><![CDATA[data market chat]]></category>
		<category><![CDATA[data markets]]></category>
		<category><![CDATA[Open Data]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Podcast]]></category>
		<category><![CDATA[SaaS]]></category>
		<category><![CDATA[ckan]]></category>
		<category><![CDATA[data.gov.uk]]></category>
		<category><![CDATA[DataMarket]]></category>
		<category><![CDATA[Irina Bolychevsky]]></category>
		<category><![CDATA[okfn]]></category>
		<category><![CDATA[Open Knowledge Foundation]]></category>
		<category><![CDATA[rufus pollock]]></category>

		<guid isPermaLink="false">http://cloudofdata.com/?p=1948</guid>
		<description><![CDATA[The Open Knowledge Foundation (OKFN) promotes the creation, dissemination and use of &#8216;open knowledge.&#8217; As part of this activity OKFN developed a data repository called CKAN, and has seen this become increasingly important to a range of data dissemination activities such as data.gov.uk and publicdata.eu. In this podcast I talk with OKFN Director Rufus Pollock [...]]]></description>
			<content:encoded><![CDATA[<div class="wp-caption alignright" style="width: 310px"><a href="http://en.wikipedia.org/wiki/File:Expendituremap.jpg" target="_blank"><img class="zemanta-img-inserted zemanta-img-configured" title="Screenshot of expenditure map app, using data...." src="http://cloudofdata.com/wp-content/uploads/2012/02/300px-Expendituremap.jpg" alt="Screenshot of expenditure map app, using data...." width="300" height="244" /></a><p class="wp-caption-text">Image via Wikipedia</p></div>
<p>The <a href="http://okfn.org/">Open Knowledge Foundation</a> (OKFN) promotes the creation, dissemination and use of &#8216;<a href="http://opendefinition.org/okd/">open knowledge</a>.&#8217; As part of this activity OKFN developed a data repository called <a href="http://ckan.org/">CKAN</a>, and has seen this become increasingly important to a range of data dissemination activities such as data.gov.uk and publicdata.eu.</p>
<p>In this podcast I talk with OKFN Director <a href="http://uk.linkedin.com/pub/rufus-pollock/48/863/a">Rufus Pollock</a> and CKAN Product Owner <a href="http://uk.linkedin.com/pub/irina-bolychevsky/b/11a/91">Irina Bolychevsky</a>, to learn more about CKAN, its use in the context of open data, and the wider implications for dissemination of <em>any</em> data (whether open or closed).</p>
<p></p>
<p><em>Following up on <a href="http://cloudofdata.com/2012/01/nurturing-the-market-for-data-markets/">a blog post that I wrote at the start of 2012</a>, this is the tenth in <a href="http://cloudofdata.com/category/podcast/data-market-chat/">an ongoing series of podcasts with key stakeholders in the emerging category of Data Markets</a>.</em></p>
<h6 class="zemanta-related-title" style="font-size: 1em;">Related articles</h6>
<ul class="zemanta-article-ul">
<li class="zemanta-article-ul-li"><a href="http://www.readwriteweb.com/hack/2012/02/open-knowledge-releases-open-d.php" target="_blank">Open Knowledge Releases Open Data Handbook 1.0</a> (readwriteweb.com)</li>
<li class="zemanta-article-ul-li"><a href="http://cloudofdata.com/2012/02/data-market-chat-leigh-dodds-discusses-kasabi/" target="_blank">Data Market Chat: Leigh Dodds discusses Kasabi</a> (cloudofdata.com)</li>
<li class="zemanta-article-ul-li"><a href="http://radar.oreilly.com/2012/02/data-public-good.html" target="_blank">Data for the public good</a> (radar.oreilly.com)</li>
<li class="zemanta-article-ul-li"><a href="http://cloudofdata.com/2012/02/data-market-chat-stephen-ogrady-of-redmonk-examines-the-bigger-picture/" target="_blank">Data Market Chat: Stephen O&#8217;Grady of RedMonk examines the bigger picture</a> (cloudofdata.com)</li>
<li class="zemanta-article-ul-li"><a href="http://cloudofdata.com/2012/02/data-market-chat-nick-edouard-discusses-buzzdata/" target="_blank">Data Market Chat: Nick Edouard discusses BuzzData</a> (cloudofdata.com)</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://cloudofdata.com/2012/03/ckan/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
			<enclosure url="http://cloudofdata.com/podpress_trac/feed/1948/0/20120223-ckan.mp3" length="25187937" type="audio/mpeg" />
		<itunes:duration>0:52:23</itunes:duration>
		<itunes:subtitle>The Open Knowledge Foundation (OKFN) promotes the creation, dissemination and use of &#8216;open knowledge.&#8217; As part of this activity OKFN developed a data repository called CKAN, and has seen this become increasingly impor[...]</itunes:subtitle>
		<itunes:summary>The Open Knowledge Foundation (OKFN) promotes the creation, dissemination and use of &#8216;open knowledge.&#8217; As part of this activity OKFN developed a data repository called CKAN, and has seen this become increasingly important to a range of data dissemination activities such as data.gov.uk and publicdata.eu.
In this podcast I talk with OKFN Director Rufus Pollock and CKAN Product Owner Irina Bolychevsky, to learn more about CKAN, its use in the context of open data, and the wider implications for dissemination of any data (whether open or closed).

Following up on a blog post that I wrote at the start of 2012, this is the tenth in an ongoing series of podcasts with key stakeholders in the emerging category of Data Markets.</itunes:summary>
		<itunes:keywords>Podcast, SaaS</itunes:keywords>
		<itunes:author>Paul Miller</itunes:author>
		<itunes:explicit>no</itunes:explicit>
		<itunes:block>no</itunes:block>
	</item>
		<item>
		<title>Open is good &#8211; but encouragement better than mandate</title>
		<link>http://cloudofdata.com/2012/02/open-is-good-but-encouragement-better-than-mandate/</link>
		<comments>http://cloudofdata.com/2012/02/open-is-good-but-encouragement-better-than-mandate/#comments</comments>
		<pubDate>Mon, 06 Feb 2012 14:13:23 +0000</pubDate>
		<dc:creator>Paul Miller</dc:creator>
				<category><![CDATA[Open Data]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[1OdataLicenseEU]]></category>
		<category><![CDATA[Andrés Nin]]></category>
		<category><![CDATA[Creative Commons]]></category>
		<category><![CDATA[epsi]]></category>
		<category><![CDATA[epsiplatform]]></category>
		<category><![CDATA[neelie kroes]]></category>
		<category><![CDATA[Open Data Commons]]></category>
		<category><![CDATA[open licence]]></category>
		<category><![CDATA[open license]]></category>
		<category><![CDATA[psi directive]]></category>

		<guid isPermaLink="false">http://cloudofdata.com/?p=1801</guid>
		<description><![CDATA[Openness is undeniably cool right now, at least if you move in the slightly odd circles that I do. Openly available scientific papers are disrupting the world of scholarly publishing (which may not be all good, but that&#8217;s a post for another day). Openly available university courses are finally beginning to work out how to [...]]]></description>
			<content:encoded><![CDATA[<div class="wp-caption alignright" style="width: 310px"><a href="http://commons.wikipedia.org/wiki/File:Open_Data_stickers.jpg"><img class="zemanta-img-inserted zemanta-img-configured" title="English: Open Data stickers" src="http://cloudofdata.com/wp-content/uploads/2011/07/300px-Open_Data_stickers5.jpg" alt="English: Open Data stickers" width="300" height="225" /></a><p class="wp-caption-text">Image via Wikipedia</p></div>
<p>Openness is undeniably cool right now, at least if you move in the slightly odd circles that I do. Openly available scientific papers are disrupting the world of scholarly publishing (which may not be all good, but that&#8217;s a post for another day). Openly available university courses are finally beginning to work out how to offer meaningful accreditation to students. Openly accessible data from government agencies around the world bulks out almost every data marketplace, and anchors many an analysis. Openly available code for cloud infrastructure or networking is challenging the hold of the tech world&#8217;s giants. Everywhere you look, &#8216;incumbents&#8217; are apparently being &#8216;challenged&#8217; and &#8216;disrupted&#8217; by the power of open.</p>
<p>The truth, of course, is a little more complex and a lot more nuanced, as business models shift and evolve just like they always have. In sustainable systems, some people still need to be rewarded (often through being paid) for their effort. And in sustainable systems, <em>paying</em> someone can often be a pretty straightforward means of ensuring that you have a throat to choke if something breaks; big companies adopting open source often seek a proper financial relationship with someone who installs and maintains the &#8216;free&#8217; software or hardware they&#8217;re depending upon.</p>
<p>One area of openness that I&#8217;ve been involved with for about ten years is that of open licensing for both creative works and data. And it&#8217;s come a very long way.</p>
<p>Here in Europe, for example, the (badly flawed) 2003 <a href="http://en.wikipedia.org/wiki/PSI_Directive">Public Sector Information Directive</a> is under review, and there&#8217;s every likelihood that the replacement will make a number of sensible moves toward greater openness, transparency, and reusability for publicly funded data. As <a href="http://epsiplatform.eu/content/single-eu-open-data-license-campaign">the EPSI Platform site notes</a> today, Andrés Nin proposes going a step further than the European Commission is currently contemplating, by <a href="http://actuable.es/peticiones/say-to-neeliekroeseu-we-want-single-opendata-licence-in-the">instituting a common open license across Europe</a>:</p>
<blockquote><p>&#8220;The creation of a single public information re-use space in Europe requires much more, it requires a common European OpenData license applicable to all data generated by European public administrations.&#8221;</p></blockquote>
<p>I would certainly welcome a <em>model license</em> that European member states might be enabled to use. I&#8217;d also welcome — and support — vigorous efforts to dissuade individual member states or ministries from their usual practice of tweaking and otherwise modifying perfectly good documents in order to demonstrate how &#8216;special&#8217; or &#8216;different&#8217; their circumstances apparently are. When will they all realise that they are neither as special nor as different as they like to think?</p>
<p>But — and it&#8217;s a big but — it seems unwise, premature, and unhelpful to even begin to suggest that such a license might be mandated across Europe. It isn&#8217;t required, and attempts to develop a single document that everyone could accept would be an unhelpful distraction that would result in something so bureaucratic, so ringed in opt-outs and prevarications, as to be utterly worthless. It would also, in all likelihood, be one of those exercises in which the process very quickly subsumed the point. A prime candidate for, in the words of an old boss, being too busy to be effective.</p>
<h6 class="zemanta-related-title" style="font-size: 1em;">Related articles</h6>
<ul class="zemanta-article-ul">
<li class="zemanta-article-ul-li"><a href="http://r.zemanta.com/?u=http%3A//www.guardian.co.uk/science/2012/jan/27/academic-publishers-enemies-science-wrong&amp;a=72496211&amp;rid=76056481-0aaf-4346-84b0-0ed02aeddf27&amp;e=c5c38559b96c2a50e9bb649290e600df">Branding academic publishers &#8216;enemies of science&#8217; is offensive and wrong</a> (guardian.co.uk)</li>
<li class="zemanta-article-ul-li"><a href="http://opendotdotdot.blogspot.com/2011/12/open-data-europe-starts-to-get-it.html">Open Data: Europe Starts to Get It</a> (opendotdotdot.blogspot.com)</li>
<li class="zemanta-article-ul-li"><a href="http://thenextweb.com/eu/2011/12/12/open-data-in-europe-gets-a-huge-boost-from-new-eu-rules/">Open Data in Europe gets a huge boost from new EU rules</a> (thenextweb.com)</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://cloudofdata.com/2012/02/open-is-good-but-encouragement-better-than-mandate/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>CloudCamp London: the Big Data Special</title>
		<link>http://cloudofdata.com/2012/01/cloudcamp-london-the-big-data-special/</link>
		<comments>http://cloudofdata.com/2012/01/cloudcamp-london-the-big-data-special/#comments</comments>
		<pubDate>Wed, 25 Jan 2012 21:59:14 +0000</pubDate>
		<dc:creator>Paul Miller</dc:creator>
				<category><![CDATA[Big Data]]></category>
		<category><![CDATA[Cloud computing]]></category>
		<category><![CDATA[Linked Data]]></category>
		<category><![CDATA[CloudCamp]]></category>
		<category><![CDATA[Open Data]]></category>

		<guid isPermaLink="false">http://cloudofdata.com/?p=1761</guid>
		<description><![CDATA[The CloudCamp unconference returned to London for the 14th time this evening, regaling a capacity crowd in the Crypt below Clerkenwell&#8217;s St James Church with several hours of discussion and debate on the somewhat elusive topic of &#8216;Big Data&#8217;. Rather rough notes of the proceedings follow, after the break. LEF&#8217;s Simon Wardley kicked proceedings off as [...]]]></description>
			<content:encoded><![CDATA[<div class="wp-caption alignright" style="width: 250px"><a href="http://www.flickr.com/photos/48889057888@N01/6259499293"><img class="zemanta-img-inserted zemanta-img-configured" title="Big Data" src="http://cloudofdata.com/wp-content/uploads/2011/07/6259499293_b577b94cfd_m3.jpg" alt="Big Data" width="240" height="160" /></a><p class="wp-caption-text">Image by Kevin Krejci via Flickr</p></div>
<p>The <a href="http://cloudcamp.org/">CloudCamp</a> unconference <a href="http://cloudcamp.org/london">returned to London</a> for <a href="http://cloudcamplondon14.eventbrite.co.uk/">the 14th time</a> this evening, regaling a capacity crowd in the Crypt below Clerkenwell&#8217;s St James Church with several hours of discussion and debate on the somewhat elusive topic of &#8216;Big Data&#8217;.</p>
<p>Rather rough notes of the proceedings follow, after the break.<span id="more-1761"></span></p>
<p><a href="http://lef.csc.com/">LEF</a>&#8217;s <a href="http://blog.gardeviance.org/">Simon Wardley</a> kicked proceedings off as usual, once again managing to pepper an on-topic canter through the subject with a seemingly never-ending stream of Flickr images of cats… and analogies to electricity. You possibly had to be there? His core message, though? There&#8217;s nothing new under the sun… and the cycles of change just keep on coming.</p>
<p>Next, Peter Matthews from CA Labs, on &#8220;is big data mutually compatible with the cloud?&#8221; Erm, yes. Data volumes with big data are so large that it&#8217;s difficult to move them around… which creates opportunities for lock-in that vendors may wish to seize. And then he was out of time.</p>
<p>Next, Fujitsu&#8217;s Mark Wilson on &#8216;Structuring Big Data.&#8217; He&#8217;s actually talking about <em>Linked</em> Data, a topic I&#8217;ve dug into before here and over on semanticweb.com &#8211; Linked Data could be, or might be, the effective realisation of the decade-old Semantic Web dream. Big Data means masses of unstructured or semi-structured content, presenting a management headache of previously unanticipated proportions. Linked Data, he argues, creates the mechanism to link all of this data together from across disparate sources. Yes, but it&#8217;s easier to say than to do… And in 5 minutes he really couldn&#8217;t explain enough to persuade the audience. Linked Data should be &#8220;the optimal reference source,&#8221; he said. It should be &#8220;a broker for all data sources,&#8221; and we should &#8220;think about integration, not duplication.&#8221; Yeeeeees… But.</p>
<p>Next, Canonical&#8217;s Nick Barcet, talking around scalability, Ubuntu, package management, configuration management, etc. Not wholly sure what the point was, I&#8217;m afraid.</p>
<p>Next, Chris Swan from UBS &#8211; big data and security. &#8220;If you&#8217;ve got security controls that aren&#8217;t properly monitored, then they don&#8217;t matter.&#8221;</p>
<p>Next, Tom Leyden of Amplidata &#8211; Big &#8220;Unstructured&#8221; Data in the Cloud. Data storage to increase 30x over the next decade, but staff will only increase 50% over the same period. Challenge in the 90s, as existing storage and analysis technologies struggled to cope with new data volumes. Seeing similar problems today with data streaming from sensor web, etc. Traditional file systems cannot cope. Object Storage the way forward?</p>
<p>Next, Alex Farquhar &#8211; &#8220;Cloud v Big Data.&#8221; Not really versus… but intersection of the two. Too much discussion of his company, Forward. Just talking about how his company uses cloud to provision IT resources. Might work as a conference presentation or case study &#8211; not sure it fits as a 5 minute lightning chat. Around 60TB of data at Forward. Diverse and vital. Using Hadoop cluster &#8211; 24 nodes on-premise. Rationale (proximity to the cluster) seemed odd. That <em>can</em> be true, but not clear that it really needs to be the case here?</p>
<p>Next, Alaric Snell-Pym, on Scaling Hadoop. Trying to overcome Hadoop&#8217;s I/O bottleneck. Explaining basics of Hadoop and Map/Reduce &#8211; no one else has. Explains use of HDFS and &#8216;selective reading&#8217; to manage lots of small tables and overcome the problems of I/O.</p>
<p>Next, Matt Wood from Amazon. Talking about genetics and the human genome. It&#8217;s an analogy. Human Genome Project took years and millions of dollars. Development of gene sequencing machines led to a step change &#8211; dramatic drop in cost of sequencing DNA. Like the cloud, anyone? But… the machines create an analysis challenge, because they generate so much data. Cloud offers &#8220;collection of productivity tools&#8221; to help scientists work with this data collaboratively and (relatively) affordably. A perfect example of a lightning presentation, unlike most of those that preceded it.</p>
<p>And finally, an impromptu slot from HP&#8217;s Joe Weinman. A quick overview of current thinking behind his latest book. This one could have gone for <em>much</em> longer… Good stuff.</p>
<p>And that&#8217;s the lightning talks finished. Now, the panel, and Simon Wardley&#8217;s search for &#8220;experts&#8221; and &#8220;volunteers.&#8221;</p>
<p>…and unfortunately, your scribe was &#8216;volunteered&#8217; as an &#8216;expert&#8217; by Mr Wardley… and here end the notes. It <em>was</em> great to have Amazon&#8217;s Werner Vogels sneak in, and lob comments into the panel, though&#8230;</p>
<p>Great event, though with the usual mix of people you wish could have talked for longer&#8230; and people you wish hadn&#8217;t spoken.</p>
<h6 class="zemanta-related-title" style="font-size: 1em;">Related articles</h6>
<ul class="zemanta-article-ul">
<li class="zemanta-article-ul-li"><a href="http://venturebeat.com/2012/01/24/big-data-server-efficiency/">The brave new world of big data &amp; Hadoop</a> (venturebeat.com)</li>
<li class="zemanta-article-ul-li"><a href="http://techcrunch.com/2012/01/25/big-vcs-invest-in-big-data-startup-continuuity/">Big VCs Invest In Big Data Startup Continuuity</a> (techcrunch.com)</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://cloudofdata.com/2012/01/cloudcamp-london-the-big-data-special/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Top Level Domain for data answers the wrong question</title>
		<link>http://cloudofdata.com/2012/01/top-level-domain-for-data-answers-the-wrong-question/</link>
		<comments>http://cloudofdata.com/2012/01/top-level-domain-for-data-answers-the-wrong-question/#comments</comments>
		<pubDate>Wed, 11 Jan 2012 14:41:35 +0000</pubDate>
		<dc:creator>Paul Miller</dc:creator>
				<category><![CDATA[Big Data]]></category>
		<category><![CDATA[Cloud computing]]></category>
		<category><![CDATA[Enterprise Computing]]></category>
		<category><![CDATA[Linked Data]]></category>
		<category><![CDATA[Open Data]]></category>
		<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[Web 3.0]]></category>
		<category><![CDATA[content negotiation]]></category>
		<category><![CDATA[Cybersquatting]]></category>
		<category><![CDATA[Data]]></category>
		<category><![CDATA[data publishing]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[Data sharing]]></category>
		<category><![CDATA[Data Web]]></category>
		<category><![CDATA[Domain name]]></category>
		<category><![CDATA[Domain Name System]]></category>
		<category><![CDATA[ICANN]]></category>
		<category><![CDATA[Open University]]></category>
		<category><![CDATA[Southampton University]]></category>
		<category><![CDATA[Stephen Wolfram]]></category>
		<category><![CDATA[TLD]]></category>
		<category><![CDATA[Top-level domain]]></category>
		<category><![CDATA[Wolfram Research]]></category>

		<guid isPermaLink="false">http://cloudofdata.com/?p=1640</guid>
		<description><![CDATA[British-born computer scientist Stephen Wolfram sees ongoing efforts to extend the Internet&#8217;s top-level domains (TLDs) beyond the familiar .com, .org, .uk etc as an opportunity to raise the profile of machine-readable data. In a blog post published yesterday, he argues that a new .data domain would increase &#8220;exposure of data on the internet—and [provide] added impetus for [...]]]></description>
			<content:encoded><![CDATA[<div class="wp-caption alignright" style="width: 310px"><a href="http://commons.wikipedia.org/wiki/File:Stephen_Wolfram_PR.jpg"><img class="zemanta-img-inserted zemanta-img-configured" title="English: Publicity photo of en:Stephen Wolfram." src="http://cloudofdata.com/wp-content/uploads/2011/07/300px-Stephen_Wolfram_PR2.jpg" alt="English: Publicity photo of en:Stephen Wolfram." width="300" height="428" /></a><p class="wp-caption-text">Image of Stephen Wolfram via Wikipedia</p></div>
<p>British-born computer scientist <a class="zem_slink" title="Stephen Wolfram" href="http://en.wikipedia.org/wiki/Stephen_Wolfram" rel="wikipedia">Stephen Wolfram</a> sees ongoing efforts to extend the Internet&#8217;s top-level domains (<a class="zem_slink" title="Top-level domain" href="http://en.wikipedia.org/wiki/Top-level_domain" rel="wikipedia">TLDs</a>) beyond the familiar .com, .org, .uk etc as an opportunity to raise the profile of machine-readable data. <a href="http://blog.stephenwolfram.com/2012/01/a-data-top-level-internet-domain/">In a blog post published yesterday</a>, he argues that a new .data domain would increase &#8220;exposure of data on the internet—and [provide] added impetus for organizations to expose data in a way that can efficiently be found and accessed.&#8221; Whilst wholly in favour of Wolfram&#8217;s stated aim, I can&#8217;t help feeling that his suggested solution is at best unnecessary and at worst a worrying segregation of data from the &#8216;proper&#8217; web that everyone else will continue to exploit.</p>
<p>Back in June of last year, the body responsible for coordinating the global domain name system <a href="http://arstechnica.com/business/news/2011/06/icann-approves-plan-to-vastly-expand-top-level-domains.ars">approved a plan to permit new top-level domains</a> (the letters after the final dot in an internet address — the .com in cloudofdata.<strong>com</strong>, the .uk in bbc.co.<strong>uk</strong>, the .edu in harvard.<strong>edu</strong>). Until recently, these top-level domains have been tightly controlled, with a small set of generic domains (<a class="zem_slink" title=".edu" href="http://en.wikipedia.org/wiki/.edu" rel="wikipedia">.edu</a>, <a class="zem_slink" title=".gov" href="http://en.wikipedia.org/wiki/.gov" rel="wikipedia">.gov</a>, <a class="zem_slink" title=".mil" href="http://en.wikipedia.org/wiki/.mil" rel="wikipedia">.mil</a>, <a href="http://en.wikipedia.org/wiki/.org">.org</a>, etc), a larger set of country domains (<a href="http://en.wikipedia.org/wiki/.uk">.uk</a>, <a href="http://en.wikipedia.org/wiki/.fi">.fi</a>, <a href="http://en.wikipedia.org/wiki/.nz">.nz</a>, etc) and one or two others such as <a href="http://en.wikipedia.org/wiki/.eu">.eu</a>. <a href="http://www.wired.com/epicenter/2012/01/icann-pushes-ahead-with-january-12-launch-for-new-top-level-domains/">From tomorrow</a>, anyone with $185,000 will be able to submit a proposal to create and manage a new top level domain, and it&#8217;s possible that there could eventually be <em>thousands</em> of them. Wolfram is keen to ensure that data doesn&#8217;t miss out on the &#8216;opportunity.&#8217;</p>
<p>As Wolfram himself recognises, there is already an awful lot of machine-readable data on the web. Some of it sits embedded within the web pages that humans read, with specially formatted code waiting to be triggered by the calendars, the address books, or the browser plugins of site visitors. Some of it is packaged up in data files, offered for download. And some of it waits inside a database, ready to be delivered in response to an API call or a query typed into a web form.</p>
<p>There is a growing enthusiasm for exposing this data for reuse. Government transparency agendas have driven public sector data sites like <a href="http://data.gov.uk">data.gov.uk</a> and <a href="http://data.gov/">data.gov</a>. Similarly, efforts such as <a href="http://data.open.ac.uk/">data.open.ac.uk</a> and <a href="http://data.southampton.ac.uk">data.southampton.ac.uk</a> see universities beginning to consciously collect data sets together and offer them up for reuse. Similar efforts in the commercial world are less easy to point to, but that reticence has nothing whatsoever to do with the lack of a ford.data, boeing.data, ge.data or astrazeneca.data domain!</p>
<p>In some ways, the convention for gathering significant chunks of data on a data.xxx.yyy site echoes Wolfram&#8217;s intention, but with a number of advantages. Data without context is far less valuable than data with context. Much of that context may be inferred from the domain in which the data lives, with data delivered from a .gov or .edu (or .gov.uk or .ac.uk) site perhaps interpreted differently to data hosted on .com, .biz, or .xxx. Southampton University, the Open University, and the US Federal Government are able to gather data up and make it available for download via their existing data. sites if they choose. This offers human visitors to their sites a degree of convenience, whilst retaining the power and brand attributes of their existing domain. Gov.data, gov.uk.data, open.ac.uk.data, southampton.ac.uk.data, though? All are messy, in ways that Wolfram&#8217;s own wolfram.data would admittedly not be, and all are simply additional registrations that the institutions would have to pay for in order to stop someone else grabbing the domain.</p>
<p>At the end of the day, the machines don&#8217;t actually care. The existing data.open.ac.uk-type sites are human conveniences, not machine enablers. The computers, and the software they run, are quite capable of crawling the public web and finding accessible data wherever it lies on a site. There are plenty of reasons to continue embedding little snippets of data inside human readable web pages, regardless of whether you have a data.wolfram.com or a wolfram.data site. <a href="http://en.wikipedia.org/wiki/Content_negotiation">Content negotiation</a> is becoming increasingly capable, such that there really is no need for what Wolfram calls a &#8216;parallel construct to the ordinary web&#8217; at all. A human being arriving at a web site sees human readable content, whilst various software tools would <a href="http://www.w3.org/TR/cooluris/#implementation">automatically</a> be presented with very different data or functions, optimised to their capabilities and requirements.</p>
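<p>To make the content negotiation point concrete, here is a minimal Python sketch of how a server might pick between a human-readable page and machine-readable data at the same URL, based on the <code>Accept</code> header the client sends. The function names (<code>parse_accept</code>, <code>matches</code>, <code>negotiate</code>) and the list of available formats are illustrative assumptions, not any particular site&#8217;s implementation. The matching is deliberately simplified: it ranks by q-value only and ignores the specificity rules of full HTTP negotiation.</p>

```python
def parse_accept(header):
    """Parse an HTTP Accept header into (media_type, q) pairs, highest q first."""
    prefs = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        pieces = [p.strip() for p in part.split(";")]
        media_type = pieces[0].lower()
        q = 1.0  # a media type without an explicit q-value counts as 1.0
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media_type, q))
    # sorted() is stable, so equal q-values keep the order the client sent
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


def matches(pattern, media_type):
    """Return True if an Accept pattern such as 'text/*' covers media_type."""
    if pattern == "*/*":
        return True
    if pattern.endswith("/*"):
        return media_type.startswith(pattern[:-1])
    return pattern == media_type


def negotiate(accept_header, available, default="text/html"):
    """Pick the first available format that the client's best preference covers."""
    for pattern, q in parse_accept(accept_header):
        if q <= 0:
            continue  # q=0 means the client explicitly refuses this type
        for media_type in available:
            if matches(pattern, media_type):
                return media_type
    return default


# Hypothetical formats offered for a single resource
formats = ["text/html", "application/rdf+xml", "text/turtle"]

# A typical browser asks for HTML first
print(negotiate("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", formats))
# A data-hungry client asks for RDF instead
print(negotiate("application/rdf+xml, text/turtle;q=0.9", formats))
```

<p>The first call selects <code>text/html</code> and the second <code>application/rdf+xml</code>: same address, different representation, and no separate domain required.</p>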
<p>By all means, let us show the curious some of the existing techniques that work in making data more easily accessible. By all means, let us identify the gaps, the issues, the problems (<em>none</em> of which a new TLD even begins to address). Yes, let us definitely and unambiguously set about &#8220;highlighting the exposure of data on the internet—and providing added impetus for organizations to expose data in a way that can efficiently be found and accessed.&#8221;</p>
<p>But please, let us not be distracted by the false hope that adding yet another TLD to the babel that ICANN is about to unleash can do anything more than consign data to some online ghetto, wallowing unwanted, unloved and unused as companies and their customers lavish love, attention, and clicks upon the .com domain over on the &#8216;proper&#8217; web.</p>
<p><em>Thanks to <a href="http://www.eurecom.fr/~troncy/">Raphaël Troncy</a>, whose <a href="https://twitter.com/rtroncy/status/156850031670988800">tweet</a> first drew the story to my attention.</em></p>
<h6 class="zemanta-related-title" style="font-size: 1em;">Related articles</h6>
<ul class="zemanta-article-ul">
<li class="zemanta-article-ul-li"><a href="http://techcrunch.com/2012/01/10/computers-data-domains/">Is It Time For Computers To Have Their Own .Data Domains?</a> (techcrunch.com)</li>
<li class="zemanta-article-ul-li"><a href="http://www.wired.com/epicenter/2012/01/icann-pushes-ahead-with-january-12-launch-for-new-top-level-domains/">ICANN Pushes Ahead With January 12 Launch For New Top-Level Domains</a> (wired.com)</li>
<li class="zemanta-article-ul-li"><a href="http://www.wired.com/epicenter/2012/01/icaan-president-beckstrom/all/1">The biggest change in DNS since Dot-Com</a> (wired.com)</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://cloudofdata.com/2012/01/top-level-domain-for-data-answers-the-wrong-question/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Trust, Big Data, Semantics, Data Marketplaces, and More Trust</title>
		<link>http://cloudofdata.com/2011/02/trust-big-data-semantics-data-marketplaces-and-more-trust/</link>
		<comments>http://cloudofdata.com/2011/02/trust-big-data-semantics-data-marketplaces-and-more-trust/#comments</comments>
		<pubDate>Sun, 27 Feb 2011 18:21:37 +0000</pubDate>
		<dc:creator>Paul Miller</dc:creator>
				<category><![CDATA[Big Data]]></category>
		<category><![CDATA[Cloud computing]]></category>
		<category><![CDATA[Linked Data]]></category>
		<category><![CDATA[Open Data]]></category>
		<category><![CDATA[GigaOM]]></category>
		<category><![CDATA[gigaompro]]></category>
		<category><![CDATA[Microsoft windows azure data market]]></category>
		<category><![CDATA[rosslyn Analytics]]></category>
		<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[semanticweb_com]]></category>
		<category><![CDATA[strataconf]]></category>

		<guid isPermaLink="false">http://cloudofdata.com/?p=1518</guid>
		<description><![CDATA[I&#8217;ve had a few posts published over the weekend, picking up some things I have written about before. These are: My latest monthly column on SemanticWeb.com: Big Data Presents a Big Opportunity? My latest weekly wrap-up on GigaOMPro: Rosslyn Analytics, Microsoft Finding Value in Data Aggregation The teaser piece on GigaOM&#8217;s public Cloud site: In [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve had a few posts published over the weekend, picking up some things I have written about before. These are:</p>
<ul>
<li>My latest monthly column on SemanticWeb.com: <em><a href="http://semanticweb.com/big-data-presents-a-big-opportunity_b17764">Big Data Presents a Big Opportunity?</a></em></li>
<li>My latest weekly wrap-up on GigaOMPro: <em><a href="http://pro.gigaom.com/2011/02/rosslyn-analytics-microsoft-finding-value-in-data-aggregation">Rosslyn Analytics, Microsoft Finding Value in Data Aggregation</a></em></li>
<li>The teaser piece on GigaOM&#8217;s public Cloud site: <em><a href="http://gigaom.com/cloud/in-exploiting-the-data-market-trust-is-key/">In Exploiting the Data Market, Trust Is Key</a></em></li>
</ul>
<p>I spot a theme building&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://cloudofdata.com/2011/02/trust-big-data-semantics-data-marketplaces-and-more-trust/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Off to Santa Clara for O&#8217;Reilly&#8217;s Strata Conference</title>
		<link>http://cloudofdata.com/2011/01/off-to-santa-clara-for-oreillys-strata-conference/</link>
		<comments>http://cloudofdata.com/2011/01/off-to-santa-clara-for-oreillys-strata-conference/#comments</comments>
		<pubDate>Tue, 25 Jan 2011 15:08:06 +0000</pubDate>
		<dc:creator>Paul Miller</dc:creator>
				<category><![CDATA[Cloud computing]]></category>
		<category><![CDATA[Open Data]]></category>
		<category><![CDATA[Web 3.0]]></category>
		<category><![CDATA[Big Data]]></category>
		<category><![CDATA[BigData]]></category>
		<category><![CDATA[California]]></category>
		<category><![CDATA[edd dumbill]]></category>
		<category><![CDATA[O'Reilly Media]]></category>
		<category><![CDATA[oreilly]]></category>
		<category><![CDATA[Santa Clara]]></category>
		<category><![CDATA[strata]]></category>
		<category><![CDATA[strataconf]]></category>

		<guid isPermaLink="false">http://cloudofdata.com/?p=1410</guid>
		<description><![CDATA[I&#8217;m off to California this weekend, heading for Santa Clara and O&#8217;Reilly Media&#8217;s inaugural Big Data conference, Strata. There are some great sessions on the Programme, and I look forward to comparing the diverse ways in which Big Data concepts and methods are being put to work across a range of market segments. I also [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://strataconf.com" target="_blank" class="broken_link"><img class="alignright size-full wp-image-1413" style="margin: 5px; border: 0px initial initial;" title="Attending Strata" src="http://cloudofdata.com/wp-content/uploads/2011/01/strata2011_attending_125x125.jpg" alt="" width="125" height="125" /></a>I&#8217;m off to California this weekend, heading for Santa Clara and <a href="http://oreilly.com/">O&#8217;Reilly Media</a>&#8217;s inaugural <a class="zem_slink" title="Big data" rel="wikipedia" href="http://en.wikipedia.org/wiki/Big_data">Big Data</a> conference, <a href="http://strataconf.com/strata2011">Strata</a>.</p>
<p>There are some great sessions on the <a href="http://strataconf.com/strata2011/public/schedule/grid">Programme</a>, and I look forward to comparing the diverse ways in which Big Data concepts and methods are being put to work across a range of market segments. I also look forward to exploring answers to <a href="http://cloudofdata.com/2010/11/is-there-a-disconnect-between-big-data-and-the-web-of-data/">some of the questions I posed back in November</a>.</p>
<p>As usual, the diary is filling up with meetings, briefings, and <a href="http://strataconf.com/strata2011/public/schedule/share/03f67957da9917584420fed0083cd787">the odd conference session</a>, but there are still <a href="http://tungle.me/PaulMiller">some gaps to fill</a>. If you&#8217;re at the event &#8211; or in the area &#8211; and want a chat, why not <a href="http://tungle.me/PaulMiller">grab one of the slots over on Tungle</a>?</p>
<h6 class="zemanta-related-title" style="font-size: 1em;">Related articles</h6>
<ul class="zemanta-article-ul">
<li class="zemanta-article-ul-li"><a href="http://radar.oreilly.com/2010/12/six-months-after-what-is-data.html">Six months after &#8220;What is data science?&#8221;</a> (radar.oreilly.com)</li>
<li class="zemanta-article-ul-li"><a href="http://flowingdata.com/2011/01/06/oreilly-strata-conference-only-a-few-days-left-for-discounted-early-registration/">O&#8217;Reilly Strata Conference: Only a few days left for early registration + reader discount</a> (flowingdata.com)</li>
<li class="zemanta-article-ul-li"><a href="http://www.lockergnome.com/it/2010/12/07/big-data-oreilly-strata-conference/">Big Data Examined At Inaugural O&#8217;Reilly Strata Conference</a> (lockergnome.com)</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://cloudofdata.com/2011/01/off-to-santa-clara-for-oreillys-strata-conference/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Is there a disconnect between Big Data and the Web of Data?</title>
		<link>http://cloudofdata.com/2010/11/is-there-a-disconnect-between-big-data-and-the-web-of-data/</link>
		<comments>http://cloudofdata.com/2010/11/is-there-a-disconnect-between-big-data-and-the-web-of-data/#comments</comments>
		<pubDate>Tue, 30 Nov 2010 16:36:08 +0000</pubDate>
		<dc:creator>Paul Miller</dc:creator>
				<category><![CDATA[Cloud computing]]></category>
		<category><![CDATA[Enterprise Computing]]></category>
		<category><![CDATA[Linked Data]]></category>
		<category><![CDATA[Open Data]]></category>
		<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[Web 3.0]]></category>
		<category><![CDATA[Big Data]]></category>
		<category><![CDATA[BigData]]></category>
		<category><![CDATA[Defrag]]></category>
		<category><![CDATA[Glue]]></category>
		<category><![CDATA[LinkedData]]></category>
		<category><![CDATA[OpenData]]></category>
		<category><![CDATA[strataconf]]></category>
		<category><![CDATA[structureconf]]></category>

		<guid isPermaLink="false">http://cloudofdata.com/?p=1308</guid>
		<description><![CDATA[&#8216;Big Data&#8217; is currently capturing the imagination, attracting hype, investment and ambitious startups in almost equal measure. Kim and Eric Norlin&#8217;s excellent Defrag and Glue events have gained big-name company, with O&#8217;Reilly&#8217;s Strata and GigaOM&#8217;s Structure both set to arrive in the first quarter of 2011. Venture firms like IA Ventures have emerged, specifically [...]]]></description>
			<content:encoded><![CDATA[<div class="zemanta-img" style="margin: 1em; display: block;">
<div>
<dl class="wp-caption alignright" style="width: 310px;">
<dt class="wp-caption-dt"><a href="http://commons.wikipedia.org/wiki/File:WorldWideWebAroundWikipedia.png"><img title="A data visualization of Wikipedia as part of t..." src="http://upload.wikimedia.org/wikipedia/commons/thumb/b/b9/WorldWideWebAroundWikipedia.png/300px-WorldWideWebAroundWikipedia.png" alt="A data visualization of Wikipedia as part of t..." width="300" height="216" /></a></dt>
<dd class="wp-caption-dd zemanta-img-attribution" style="font-size: 0.8em;">Image via <a href="http://commons.wikipedia.org/wiki/File:WorldWideWebAroundWikipedia.png">Wikipedia</a></dd>
</dl>
</div>
</div>
<p>&#8216;<a class="zem_slink" title="Big data" rel="wikipedia" href="http://en.wikipedia.org/wiki/Big_data">Big Data</a>&#8216; is currently capturing the imagination, attracting hype, investment and ambitious startups in almost equal measure. Kim and Eric Norlin&#8217;s excellent <a href="http://www.defragcon.com/">Defrag</a> and <a href="http://www.gluecon.com/">Glue</a> events have gained big-name company, with <a href="http://conferences.oreillynet.com/">O&#8217;Reilly</a>&#8216;s <a href="http://strataconf.com/strata2011">Strata</a> and <a href="http://gigaom.com/events/">GigaOM</a>&#8216;s <a href="http://gigaom.com/bigdata/">Structure</a> both set to arrive in the first quarter of 2011. Venture firms like <a href="http://www.iaventurepartners.com/">IA Ventures</a> have emerged, specifically targeted at finding, funding, and profiting from the <em>big</em> Big Data idea. Giants of the web from <a class="zem_slink" title="Yahoo!" rel="homepage" href="http://www.yahoo.com">Yahoo!</a> and <a href="http://www.amazon.com/">Amazon</a> to <a class="zem_slink" title="Twitter" rel="homepage" href="http://twitter.com">Twitter</a> and <a class="zem_slink" title="Facebook" rel="homepage" href="http://facebook.com">Facebook</a> solve their own Big Data problems in very different ways, contributing valuable code and experience to the community whilst simultaneously diluting focus and adding to the cacophony.</p>
<p>Flippantly reckoned by many to be &#8216;anything that requires more than a single machine to run,&#8217; the Big Data reality remains somewhat harder to pin down. To those seeking routine business insight, that mammoth Excel spreadsheet they laboriously query overnight at the end of each month might quite justifiably be thought of as &#8216;Big.&#8217; At the other end of the scale, data wizards scorn anything that doesn&#8217;t require a room full of servers, a mountain of empty pizza boxes, and the careful construction of a bespoke data ingest, management and querying system atop the most bare-bones version of the Linux kernel they can find. Somewhere between the two, a growing mass of cheaply gathered data holds out the promise of invaluable insight. Remote sensors, web clickstreams, social graph interactions, purchaser (and non-purchaser) behaviours. All these, and more, have much to tell planners, builders, makers, sellers, and buyers. If only we could formulate the right questions. If only we could devise the right sampling strategies. If only we had big enough machines to ask lots of questions using lots of sampling strategies. If only we had big enough machines to not bother sampling at all.</p>
<p>On the hardware side of things, even humble domestic laptops typically ship with at least two cores these days; two separate little computers ready to do the data processor&#8217;s bidding. Four, eight, sixteen and more cores are not far behind, but mainstream software products typically fail to exploit anything more than a single core. Push Excel as hard as you like, and it won&#8217;t do more than take <em>one</em> of your computer&#8217;s multiple cores to the max. On that 12-core Mac Pro you persuaded the boss to buy, only one core will be hard at work on your data. Twitter, Mail, YouTube, and ripping DVDs will each be giving other cores a little light exercise whilst others sit idly by, waiting for the arrival of operating systems and applications capable of exploiting multi-core power. The same is true as jobs grow and move to run across multiple machines, whether under your desk, in your data centre, or out in the Cloud. Those big datasets need to be carved up and shared amongst the available computers before any analysis takes place. You&#8217;re typically not accessing a &#8216;big computer in the Cloud&#8217; at all&#8230; but lots of relatively small (commodity) computers, and it takes careful planning and smart software to manage the division and recombination of those jobs in a cost-effective manner. Projects such as <a href="http://db.cs.berkeley.edu/jmh/">Joseph Hellerstein</a>&#8217;s Berkeley Orders of Magnitude (<a href="http://boom.cs.berkeley.edu/">BOOM</a>) begin to demonstrate some of the potential for working natively with multiple processors, but there&#8217;s a long way to go before those advances reach the mainstream.</p>
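<p>The &#8216;division and recombination&#8217; described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the job is a plain average, the helper names (<code>split</code>, <code>partial_sum</code>, <code>combine</code>, <code>mean</code>, <code>parallel_mean</code>) are hypothetical, and the standard library&#8217;s <code>multiprocessing.Pool</code> stands in for the far more capable schedulers that real Big Data tools provide.</p>

```python
from multiprocessing import Pool


def split(values, parts):
    """Carve a list into `parts` contiguous chunks of near-equal size."""
    size, extra = divmod(len(values), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(values[start:end])
        start = end
    return chunks


def partial_sum(chunk):
    """Work done independently on each chunk: return (sum, count)."""
    return (sum(chunk), len(chunk))


def combine(partials):
    """Recombine the per-chunk results into a single overall mean."""
    total = count = 0
    for chunk_sum, chunk_count in partials:
        total += chunk_sum
        count += chunk_count
    if count == 0:
        raise ValueError("cannot take the mean of no values")
    return total / count


def mean(values, parts, mapper=map):
    """Split, process each chunk with `mapper`, then recombine."""
    return combine(mapper(partial_sum, split(values, parts)))


def parallel_mean(values, workers=4):
    """Same calculation, with each chunk handed to a separate worker process."""
    with Pool(processes=workers) as pool:
        return mean(values, workers, mapper=pool.map)
```

<p>Calling <code>mean(list(range(10)), 3)</code> runs the chunks one after another on a single core and returns 4.5. <code>parallel_mean</code> does the same arithmetic but spreads the chunks across worker processes; on platforms that spawn rather than fork new processes, it should be called from under an <code>if __name__ == "__main__":</code> guard. The point is that the split and combine steps have to be designed in, which is exactly the work most desktop software never does.</p>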
<p><a href="http://en.wikipedia.org/wiki/Hadoop">Hadoop</a>, <a href="http://en.wikipedia.org/wiki/Apache_Cassandra">Cassandra</a>, <a class="zem_slink" title="MapReduce" rel="wikipedia" href="http://en.wikipedia.org/wiki/MapReduce">MapReduce</a>, <a href="http://en.wikipedia.org/wiki/Dynamo_(storage_system)">Dynamo</a>, <a href="http://en.wikipedia.org/wiki/Project_Voldemort#SNA_LinkedIn">Voldemort</a>. These, and more, are solutions developed by the likes of Yahoo!, Facebook, Google, Amazon and <a class="zem_slink" title="LinkedIn" rel="homepage" href="http://www.linkedin.com">LinkedIn</a> to tackle the influx of data that each faced &#8211; and for which each had failed to find an existing solution. Hadoop, with the addition of <a href="http://www.cloudera.com/">Cloudera</a>&#8217;s commercial polish, is rapidly emerging as the front runner for an off-the-shelf Big Data solution, but all of these tools remain rather narrow in their abilities. Find the type of data or the nature of query for which each of these was built and its performance will be unbeatable, but we are a very long way from Big Data&#8217;s equivalent of the jack-of-all-trades SQL-powered relational database of old.</p>
<p>And there, for many enterprises, lies the problem. Useful Google searches require the crawler, index and UI to do a relatively small number of essentially similar tasks, very quickly, very cost-effectively, and at massive scale. Focus on that finite set of problems, and you build a solution that delivers the experience we&#8217;ve all come to know. Each type of data manipulation or analysis requires a different tool, differently optimised, with the inevitable result that a typically diverse organisation may require a plethora of Big Data tools to get their work done. Or they might just continue to muddle along with Oracle or <a class="zem_slink" title="MySQL" rel="homepage" href="http://www.mysql.com">MySQL</a>, churning inefficiently through their data analysis jobs for interminably long periods of time. These relational database tools are understood, they are mature, and they get the job done. Except in the most data-intensive industries, they have a market presence that will be difficult to disrupt.</p>
<p>The Big Data space is seeing remarkable innovation, but there is a long way to go in order to lift it out of the domain of the technically proficient specialist and place it on desktops across the organisation. As IA Ventures&#8217; Brad Gillespie notes, &#8220;Excel is where the world&#8217;s data lives&#8230; [and] Big Data has to get to that place&#8230; so that a CMO can leverage it directly.&#8221;</p>
<p>And in all of this ferment of innovation, to return to the title of the post, it strikes me that Big Data is becoming disconnected from the fabric of the web itself. Oh, much of the data certainly <em>comes</em> from the Web, and a lot of it might even be queried on the Web after processing. But, somewhere along the line, the <em>linkedness</em> of the Web has either been forgotten or ignored. That rich set of connections, interconnections and associations has been reduced to a table, an index, or a (large) set of key-value pairs. And in the process, something fundamental has gone away.</p>
<p>This is enough for now, though. Looking more closely at different Big Data approaches, and exploring the potential for re-introducing the Web must wait for future posts.</p>
<h6 class="zemanta-related-title" style="font-size: 1em;">Related articles</h6>
<ul class="zemanta-article-ul">
<li class="zemanta-article-ul-li"><a href="http://venturebeat.com/2010/10/26/cloudera-raises-25m-to-help-deal-with-the-enterprise-data-deluge/">Cloudera raises $25M to help deal with the enterprise data deluge</a> (venturebeat.com)</li>
<li class="zemanta-article-ul-li"><a href="http://radar.oreilly.com/2010/10/strata-week-building-data-star.html">Strata Week: Building data startups</a> (radar.oreilly.com)</li>
<li class="zemanta-article-ul-li"><a href="http://www.readwriteweb.com/cloud/2010/09/hadoop-and-a-critique-on-geek.php">Big Data and a Critique of Geek Culture</a> (readwriteweb.com)</li>
<li class="zemanta-article-ul-li"><a href="http://www.nytimes.com/external/gigaom/2010/10/30/30gigaom-big-data-and-nosql-march-to-the-enterprise-73963.html">Big Data and NoSQL March to the Enterprise</a> (nytimes.com)</li>
<li class="zemanta-article-ul-li"><a href="http://news.cnet.com/8301-21546_3-20023969-10253464.html?part=rss&amp;subj=news">Does &#8216;big data&#8217; equal big opportunity for storage vendors?</a> (news.cnet.com)</li>
<li class="zemanta-article-ul-li"><a href="http://blog.programmableweb.com/2010/11/29/new-york-times-event-shows-the-promise-of-big-data/">New York Times Event Shows the Promise of Big Data</a> (programmableweb.com)</li>
<li class="zemanta-article-ul-li"><a href="http://www.readwriteweb.com/enterprise/2010/11/executives-are-addicted-to-big.php">Overwhelmed Executives Still Crave Big Data, Says Survey</a> (readwriteweb.com)</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://cloudofdata.com/2010/11/is-there-a-disconnect-between-big-data-and-the-web-of-data/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Apps, App Stores, and Government Data</title>
		<link>http://cloudofdata.com/2010/10/apps-app-stores-and-government-data/</link>
		<comments>http://cloudofdata.com/2010/10/apps-app-stores-and-government-data/#comments</comments>
		<pubDate>Tue, 26 Oct 2010 09:11:08 +0000</pubDate>
		<dc:creator>Paul Miller</dc:creator>
				<category><![CDATA[Open Data]]></category>
		<category><![CDATA[app]]></category>
		<category><![CDATA[app store]]></category>
		<category><![CDATA[ec]]></category>
		<category><![CDATA[epsi]]></category>
		<category><![CDATA[epsiplatform]]></category>
		<category><![CDATA[epsiplus]]></category>
		<category><![CDATA[European Commission]]></category>
		<category><![CDATA[psi]]></category>
		<category><![CDATA[psi directive]]></category>
		<category><![CDATA[Public sector]]></category>
		<category><![CDATA[public sector information]]></category>

		<guid isPermaLink="false">http://cloudofdata.com/?p=1268</guid>
		<description><![CDATA[A short report that I was commissioned to write for the European Public Sector Information Platform has just been published. The rise of the App: a PSI opportunity? introduces (smartphone) apps and app stores to those in European governments responsible for meeting their obligations under the 2003 Public Sector Information (PSI) Directive. [...]]]></description>
			<content:encoded><![CDATA[<div class="zemanta-img" style="margin: 1em; display: block;">
<div>
<dl class="wp-caption alignright" style="width: 310px;">
<dt class="wp-caption-dt"><a href="http://commons.wikipedia.org/wiki/File:Olympic_Swimming_Pool_-_Fast_Lane.JPG"><img title="Olympic Swimming Pool Fast Lane Category:Outdo..." src="http://upload.wikimedia.org/wikipedia/commons/thumb/e/e1/Olympic_Swimming_Pool_-_Fast_Lane.JPG/300px-Olympic_Swimming_Pool_-_Fast_Lane.JPG" alt="Olympic Swimming Pool Fast Lane Category:Outdo..." width="300" height="200" /></a></dt>
<dd class="wp-caption-dd zemanta-img-attribution" style="font-size: 0.8em;">Image via <a href="http://commons.wikipedia.org/wiki/File:Olympic_Swimming_Pool_-_Fast_Lane.JPG">Wikipedia</a></dd>
</dl>
</div>
</div>
<p>A short report that I was commissioned to write for the <a href="http://www.epsiplatform.eu/">European Public Sector Information Platform</a> has just been published.</p>
<p><em><a href="http://www.epsiplatform.eu/topic_reports/topic_report_no_18_the_rise_of_the_app_a_psi_opportunity">The rise of the App: a PSI opportunity?</a></em> introduces (smartphone) apps and app stores to those in European governments responsible for meeting their obligations under the 2003 <a href="http://ec.europa.eu/information_society/policy/psi/actions_eu/policy_actions/index_en.htm">Public Sector Information (PSI) Directive</a>.</p>
<p>Unfortunately somewhat tangential to the more recent (and cooler?) enthusiasm for Open Data, governments&#8217; compliance with the <a class="zem_slink" title="Directive on the re-use of public sector information" rel="wikipedia" href="http://en.wikipedia.org/wiki/Directive_on_the_re-use_of_public_sector_information">PSI Directive</a> has largely failed to engage the community of active and enthusiastic developers who might build compelling tools atop all that data.</p>
<p>Although elected officials might love showing their friends a council-branded iPhone app that knows where all the publicly-funded swimming pools within a single local government area are located, is that <em>really</em> a useful tool for anybody? Would it not be more useful to see the data made available in forms, formats and locations regularly frequented by communities of third party developers? Then you might see all the swimming pools, you might cross (mostly meaningless) local government boundaries, and you might pull in other leisure activities, so that a real user can ask &#8216;where can I swim?&#8217; or &#8216;where can I go and have some fun?&#8217; instead of the rather unlikely &#8216;where can I swim in a council swimming pool?&#8217; If you care that much about swimming in the pools of your local council, won&#8217;t you know where they are? And if you don&#8217;t, how likely are you to download an app to answer the question? Once you&#8217;ve answered it once (surely a Google query, rather than an app download) the app is useless.</p>
<p><em>As well as</em> being formally released via council, region, agency and national web sites, should freely reusable public sector data not be <em>actively</em> contributed to <a class="zem_slink" title="Factual" rel="homepage" href="http://www.factual.com/">Factual</a>, <a class="zem_slink" title="Infochimps" rel="homepage" href="http://infochimps.org/">Infochimps</a> and the like? <em>So long as a free copy is available somewhere</em>, is it really a problem if someone else can take that data, add value to it, and make a little bit of money?</p>
<p>Public Sector Information is wide-ranging, comprehensive, and authoritative. It is truly insane for these rich resources not to underpin a wealth of applications originating in both the public and private sectors.</p>
<p>All we need to do is abolish some of the weirder licensing restrictions, disabuse <em>some</em> governments of the idea that PSI will make them rich, and make the data easy to find, easy to select, easy to get, easy to integrate, and easy to keep current. Easy, huh? Let&#8217;s do it.</p>
<p><em>The European Commission recognises that the PSI Directive is due a refresh, and is <a href="http://ec.europa.eu/yourvoice/ipm/forms/dispatch?form=psidirective2010">currently consulting</a> on next steps in this area.</em></p>
<h6 class="zemanta-related-title" style="font-size: 1em;">Related articles</h6>
<ul class="zemanta-article-ul">
<li class="zemanta-article-ul-li"><a href="http://go.theregister.com/feed/www.theregister.co.uk/2010/09/13/european_commission_info_consultation/">Europe begins revamp of rules on re-use of public info</a> (go.theregister.com)</li>
<li class="zemanta-article-ul-li"><a href="http://r.zemanta.com/?u=http%3A//www.guardian.co.uk/uk/2010/oct/25/transport-for-london-cuts-off-app-data&amp;a=27095506&amp;rid=811f41a9-080b-409e-88c2-ea0bfc5cc85f&amp;e=b6b84b880b004597228a56817ae262de">Transport for London locks app users out of online travel data feed</a> (guardian.co.uk)</li>
<li class="zemanta-article-ul-li"><a href="http://radar.oreilly.com/2010/10/is-there-a-government-app-for.html">&#8220;Shiny app syndrome&#8221; and Gov 2.0</a> (radar.oreilly.com)</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://cloudofdata.com/2010/10/apps-app-stores-and-government-data/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>&#8216;Open&#8217; good, but there&#8217;s plenty of room for &#8216;almost open&#8217; and &#8216;not open&#8217; too</title>
		<link>http://cloudofdata.com/2010/10/open-good-but-theres-plenty-of-room-for-almost-open-and-not-open-too/</link>
		<comments>http://cloudofdata.com/2010/10/open-good-but-theres-plenty-of-room-for-almost-open-and-not-open-too/#comments</comments>
		<pubDate>Wed, 13 Oct 2010 13:23:36 +0000</pubDate>
		<dc:creator>Paul Miller</dc:creator>
				<category><![CDATA[Open Data]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Animal Farm]]></category>
		<category><![CDATA[Data sharing]]></category>
		<category><![CDATA[George Orwell]]></category>
		<category><![CDATA[Open science data]]></category>
		<category><![CDATA[Ordnance Survey]]></category>
		<category><![CDATA[Russian Revolution]]></category>

		<guid isPermaLink="false">http://cloudofdata.com/?p=1167</guid>
		<description><![CDATA[Image by Ben Templesmith via Flickr Towards the end of George Orwell&#8217;s allegorical take on the Russian Revolution and the rise of Stalinism, the pigs of Animal Farm take on the trappings of the humans they supplanted, shifting ideologically from &#8216;Four Legs Good, Two Legs Bad&#8217; to declare &#8216;Four Legs Good, Two Legs Better!&#8217; as they rise to stand on [...]]]></description>
			<content:encoded><![CDATA[<div class="zemanta-img" style="margin: 1em; display: block;">
<div>
<dl class="wp-caption alignright" style="width: 164px;">
<dt class="wp-caption-dt"><a href="http://www.flickr.com/photos/24905220@N00/3145162135"><img title="Animal Farm" src="http://farm4.static.flickr.com/3211/3145162135_9a9492b1b5_m.jpg" alt="Animal Farm" width="154" height="240" /></a></dt>
<dd class="wp-caption-dd zemanta-img-attribution" style="font-size: 0.8em;">Image by <a href="http://www.flickr.com/photos/24905220@N00/3145162135">Ben Templesmith</a> via Flickr</dd>
</dl>
</div>
</div>
<p>Towards the end of George Orwell&#8217;s allegorical take on the Russian Revolution and the rise of Stalinism, the pigs of <em><a class="zem_slink freebase/en/animal_farm" title="Animal Farm: Centennial Edition" rel="amazon" href="http://www.amazon.com/Animal-Farm-Centennial-George-Orwell/dp/0452284244%3FSubscriptionId%3D0G81C5DAZ03ZR9WH9X82%26tag%3Dcloofdat-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D165953%26creativeASIN%3D0452284244">Animal Farm</a></em> take on the trappings of the humans they supplanted, shifting ideologically from &#8216;Four Legs Good, Two Legs Bad&#8217; to declare &#8216;Four Legs Good, Two Legs Better!&#8217; as they rise to stand on their hind legs.</p>
<p>The pigs&#8217; dogmatism forces a series of increasingly convoluted rationalisations, until they end up professing exactly the opposite of their original position. Black, it seems, really <em>can</em> be white&#8230; but there&#8217;s absolutely no room for grey.</p>
<p>With data, current moves toward &#8216;open&#8217; are certainly to be lauded, and we should continue to demonstrate the benefits of more equitable access in persuading those who have yet to realise the opportunities for rethinking their business.</p>
<p>However, I&#8217;ve been in too many situations recently where persuasion, encouragement and demonstration have been cast aside in favour of browbeating, castigation and vitriol. Anyone who fails to immediately throw open the doors to their data vaults is, the argument increasingly seems to go, cruelly, wantonly, and entirely unreasonably standing in the path of truth, justice, and the {insert name of country} way. The language is intemperate, and the unspoken undercurrent of feeling seems almost to lump these evil data hoarders with the most vile underminers of social cohesion.</p>
<p>Nonsense.</p>
<p>Open Data is a good thing, and we could benefit from an awful lot more of it. But the arguments surely shouldn&#8217;t be religious (&#8216;Open&#8217; is better than &#8216;Closed&#8217;) or so polarised that compromise typically becomes impossible. Instead, we need collectively to demonstrate the value of change, and we need to understand and respect the positions of the market&#8217;s incumbents. Current practice should never be accepted as an <em>excuse</em> for lack of change, but all too often it may actually mask quite a good set of <em>reasons</em>.</p>
<p>Where data are currently sold, can we (as was to some extent done for the Ordnance Survey) calculate the costs of data collection, curation and sale, and demonstrate convincingly that <em>more</em> money could be made by removing that initial barrier to access?</p>
<p>Where a data holder participates in an existing data sharing arrangement with their peers, surely we can gather the evidence to demonstrate the likely effect of opening parts of that value chain&#8230; without destabilising an otherwise useful set of collaborations?</p>
<p>Where large quantities of low value data (such as a customer&#8217;s address) are stored and managed alongside highly valuable business data (the facts of a customer relationship), we can certainly set about demonstrating the ways in which a more open approach could pay dividends; instead of managing that postcode yourself, share a little in order to benefit from the work done by others on tracking past, current, and future changes of address.</p>
<p>Then again, maybe we should just scream and swear at all those data-hoarding dinosaurs, without trying to understand them or engage with their fears, concerns, and counter-arguments. It&#8217;s much easier that way.</p>
<p><em>Four Legs Good. Two Legs Often Quite Good, Too</em>!</p>
<h6 class="zemanta-related-title" style="font-size: 1em;">Related articles</h6>
<ul class="zemanta-article-ul">
<li class="zemanta-article-ul-li"><a href="http://radar.oreilly.com/2010/10/the-black-market-for-data.html">The black market for data</a> (radar.oreilly.com)</li>
<li class="zemanta-article-ul-li"><a href="http://www.techvibes.com/blog/gtec-2010-david-eaves-on-open-data-just-do-it">GTEC 2010: David Eaves on Open Data: &#8216;Just Do It!&#8217;</a> (techvibes.com)</li>
<li class="zemanta-article-ul-li"><a href="http://blogs.talis.com/nodalities/2010/08/the-linked-open-data-and-pavlova.php">The Linked Open Data and Pavlova</a> (blogs.talis.com)</li>
<li class="zemanta-article-ul-li"><a href="http://r.zemanta.com/?u=http%3A//www.guardian.co.uk/technology/datablog/2010/oct/13/free-data-nottingham-classes&amp;a=26313411&amp;rid=23070966-2fcd-40de-9234-e58f3edaf41a&amp;e=8d903551cdcbff112b530ca7949e59dc">Nottingham University offers masterclasses in dealing with open data &#8211; for free of course</a> (guardian.co.uk)</li>
<li class="zemanta-article-ul-li"><a href="http://cloudofdata.com/2010/07/talking-with-richard-stirling-about-progress-with-data-gov-uk/">Talking with Richard Stirling about progress with data.gov.uk</a> (cloudofdata.com)</li>
<li class="zemanta-article-ul-li"><a href="http://www.v3.co.uk/v3/news/2271319/ico-launches-consultation">ICO launches consultation on data sharing code of practice</a> (v3.co.uk)</li>
<li class="zemanta-article-ul-li"><a href="http://www.readwriteweb.com/archives/when_open_data_is_bad.php">How Open Data is Used Against the Poor</a> (readwriteweb.com)</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://cloudofdata.com/2010/10/open-good-but-theres-plenty-of-room-for-almost-open-and-not-open-too/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Repositories in the Cloud? Why on earth not?!</title>
		<link>http://cloudofdata.com/2010/02/repositories-in-the-cloud-why-on-earth-not/</link>
		<comments>http://cloudofdata.com/2010/02/repositories-in-the-cloud-why-on-earth-not/#comments</comments>
		<pubDate>Sun, 21 Feb 2010 18:05:42 +0000</pubDate>
		<dc:creator>Paul Miller</dc:creator>
				<category><![CDATA[Cloud computing]]></category>
		<category><![CDATA[Open Data]]></category>
		<category><![CDATA[Academic publishing]]></category>
		<category><![CDATA[Amazon Web Services]]></category>
		<category><![CDATA[Andy Powell]]></category>
		<category><![CDATA[Archives]]></category>
		<category><![CDATA[AWS]]></category>
		<category><![CDATA[Colleges and Universities]]></category>
		<category><![CDATA[Eduserv]]></category>
		<category><![CDATA[Higher Education]]></category>
		<category><![CDATA[infochimps]]></category>
		<category><![CDATA[Institutional repository]]></category>
		<category><![CDATA[JISC]]></category>
		<category><![CDATA[Open access]]></category>
		<category><![CDATA[Panton Principles]]></category>
		<category><![CDATA[repcloud]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[Software as a service]]></category>

		<guid isPermaLink="false">http://cloudofdata.com/?p=932</guid>
		<description><![CDATA[To be honest, I&#8217;ve never fully understood Higher Education&#8217;s penchant for building &#8216;institutional repositories.&#8217; These frequently under-populated aggregations of academic papers produced by &#8216;research active&#8217; employees of a particular university appear aligned almost exclusively to vaguely expressed institutional imperatives, and seem largely unrelated to either the selfish aspirations of the contributing authors or the tangible [...]]]></description>
			<content:encoded><![CDATA[<p>To be honest, I&#8217;ve never fully understood Higher Education&#8217;s penchant for building &#8216;<a class="zem_slink freebase/en/institutional_repository" title="Institutional repository" rel="wikipedia" href="http://en.wikipedia.org/wiki/Institutional_repository">institutional repositories</a>.&#8217; These frequently under-populated aggregations of academic papers produced by &#8216;research active&#8217; employees of a particular university appear aligned almost exclusively to vaguely expressed institutional imperatives, and seem largely unrelated to either the selfish aspirations of the contributing authors or the tangible relationships they painstakingly construct with others across their chosen discipline. The &#8216;repository&#8217; all too often appears a bureaucratic solution to a problem that the supposed beneficiaries do not recognise; a technological aberration that sits outside the conversational flow of the Web to which it is only tenuously attached.</p>
<p>Furthermore, &#8216;<a class="zem_slink freebase/en/open_access" title="Open access (publishing)" rel="wikipedia" href="http://en.wikipedia.org/wiki/Open_access_%28publishing%29">Open Access</a>&#8217; and &#8216;Repository&#8217; typically go hand in hand. If you support Open Access you need a repository, and if you question the role of repositories you&#8217;re in the pocket of evil publishers who want to lock up everything ever written and lease reading rights back to the employers of those who wrote the stuff in the first place.</p>
<p>Nonsense.</p>
<p>Open Access is an important component of today&#8217;s scholarly ecosystem. It&#8217;s not the only answer, and it&#8217;s not perfect, but it <em>does</em> have a significant part to play. Institutions have a role in preserving, disseminating and exploiting the work of their employees, but these are very different tasks that may benefit from different solutions. In too many cases, the repository is by default seen as a preservation mechanism <em>and</em> a dissemination vehicle, and as such it may fail to cost-effectively achieve either aim.</p>
<p>There are some large, well known, and research-intensive institutions where it might be possible to make a compelling argument for projecting a strong institutional image around a single &#8216;home&#8217; for all of that research output. Never mind, for a moment, that so much research today is the result of inter-institutional collaboration, or that the eminent researcher might wish to take &#8216;their&#8217; research publications with them as they move from Oxford to Harvard to York during their glittering career.</p>
<p>Alongside those institutions sit a plethora of others where research of equal quality is also being conducted; there just, maybe, isn&#8217;t quite as much of it. Bombarded by &#8216;advice&#8217; and funding, and desperate to keep up with the <a class="zem_slink freebase/en/russell_group" title="Russell Group" rel="wikipedia" href="http://en.wikipedia.org/wiki/Russell_Group">Russell Group</a>, ever more institutions blindly join the repository cult and wonder why their new toys do not fill to overflowing with the jewels of scholarly erudition.</p>
<p>As research becomes increasingly data-rich, the whole cycle looks set to repeat. The recently released <a href="http://pantonprinciples.org/">Panton Principles</a> for <a class="zem_slink freebase/en/open_data" title="Open Data" rel="wikipedia" href="http://en.wikipedia.org/wiki/Open_Data">Open Data</a> in Science are to be welcomed, but I&#8217;ll bet the institutional response will all too often be the commissioning of a &#8216;data repository&#8217; to sit alongside the &#8216;publication repository&#8217; they already don&#8217;t use.</p>
<p>All of which is a rather long-winded way of introducing the fact that Eduserv&#8217;s <a class="zem_slink" title="Andy Powell" rel="twitter" href="http://twitter.com/andypowe11">Andy Powell</a> has asked me to facilitate a breakout afternoon on &#8216;Policy Issues&#8217; at the <a href="http://www.eduserv.org.uk/events/repcloud" class="broken_link">Repositories in the Cloud</a> event <a href="http://www.eduserv.org.uk/research">Eduserv</a> and <a class="zem_slink freebase/en/joint_information_systems_committee" title="Joint Information Systems Committee" rel="wikipedia" href="http://en.wikipedia.org/wiki/Joint_Information_Systems_Committee">JISC</a> are holding in London on Tuesday.</p>
<blockquote><p>&#8220;This free event, organised jointly by Eduserv and the JISC, will bring together software developers, repository managers, service providers, funding and advisory bodies to discuss the potential policy and technical issues associated with <strong>cloud computing</strong> and the delivery of <strong>repository services</strong> in UK HEIs.&#8221;</p></blockquote>
<p>In a post on 11 February, <a href="http://efoundations.typepad.com/efoundations/2010/02/repositories-and-the-cloud-tell-us-your-views.html">Andy invited participants to share some of their views</a> ahead of the meeting, and on 19 February <a href="http://efoundations.typepad.com/efoundations/2010/02/in-the-clouds.html">he wrote about some of his own thoughts</a>.</p>
<p>Like Andy, I struggled somewhat to nail down a coherent set of thoughts about the issue of pushing today&#8217;s repositories into the Cloud. On one level, I wonder whether the vast majority of institutions with small (and relatively low traffic) repositories would see much of a tangible efficiency gain or cost saving by moving off an in-house computer to rent an equivalent <a class="zem_slink freebase/en/virtual_machine" title="Virtual machine" rel="wikipedia" href="http://en.wikipedia.org/wiki/Virtual_machine">Virtual Machine</a> from Amazon, Rackspace, or any of their competitors. If we&#8217;re talking about IT systems within a typical university, there are others (email, calendaring, pools of compute resource for research jobs, etc) that appear more immediately compelling for the shift Cloud-ward. Which is not to say that there isn&#8217;t a clear opportunity for someone trusted to step into this space and offer a <a class="zem_slink freebase/en/software_as_a_service" title="Software as a service" rel="wikipedia" href="http://en.wikipedia.org/wiki/Software_as_a_service">SaaS</a> repository to which institutions might affordably subscribe. Eduserv? Mimas? Edina? The British Library? The National Archives? Duraspace? Any could, and if we&#8217;re not ready for something more then at least one probably should.</p>
<p>However, a bolder reconsideration of what repositories <em>are</em> and what they&#8217;re <em>for</em> might very well lead to something interesting, sustainable, and perfectly suited for benefitting from Cloud Computing&#8217;s strengths.</p>
<p>Why does a paper have to be &#8216;deposited&#8217; in a repository? Why does a single paper with three authors from three institutions have to be deposited in three separate institutional repositories? Why does that same paper have to be deposited – separately – in the subject repository favoured by scholars in the relevant discipline? Why does the institution&#8217;s very reasonable desire to protect, preserve, promote and disseminate its excellence mean that it has to run systems in perpetuity that preserve and permit access? Why do we address the fundamentally different (perhaps even contradictory) problems of access and preservation in the same system? Why can&#8217;t the individual researcher easily assemble a view across their publication history, regardless of the institution within which they happened to reside as they wrote each paper? Why don&#8217;t the assemblages of papers reflect personal, professional and disciplinary relationships, alongside (or instead of) the contractual accident of employee-employer relationships? Why isn&#8217;t the wealth of metadata implicit to any publication (authors, subjects, dates, citations, and more) available and actionable, both inside the repository and far beyond it across the Web? Why isn&#8217;t there a tight and active association between the paper and the data from which its findings were derived (something for which <em><a href="http://intarch.ac.uk/">Internet Archaeology</a></em> was demonstrating utility a very long time ago)?</p>
<p>Scholarly papers principally comprise text, augmented by the occasional static image. They&#8217;re not big, and they don&#8217;t tend to change very fast. In many ways, they represent a fairly easy problem set with which to work. As more and more data becomes key to research in a growing number of subject areas, the problems are set to become far larger and far more difficult. For individual universities to even consider replicating the process by which they all ended up with their repositories of text surely seems madness in this data-rich environment. Even with levels of uptake as low as those seen in too many text repositories, the issues of data management, curation, access and dissemination are too great to be sensibly solved in the institutional machine room. Services like <a href="http://infochimps.org/">InfoChimps</a> and Amazon&#8217;s own <a href="http://aws.amazon.com/publicdatasets/">Public Data Sets</a> offering show some of the ways that we might begin to work with data at scale. Might we, for example, come to recognise, as Amazon has, that it&#8217;s actually cheaper and quicker to entrust large data sets to FedEx than to transmit them over the Internet?</p>
<p>&#8216;The answer&#8217; might be some central service for the community, funded by JISC like the Arts &amp; Humanities Data Service (AHDS) of old. Or it might be something different, something nimbler, more responsive, more flexible to individual, institutional, and disciplinary requirements, and something more scalable to new disciplines; institutional support for and use of <em>existing</em> Cloud infrastructures extending far beyond UK Higher Education, aligned with a clear understanding of the separation between preservation and access.</p>
<p>I certainly don&#8217;t have all the answers, but I do believe that simply asking whether or not we should move existing repositories to the Cloud is to miss the point. Rather, we should ask what role the Cloud might play in addressing the business requirements to which the institutional repository was our initial – faltering – response. The answer might very well be &#8216;None,&#8217; but I doubt it.</p>
<p>I look forward to Tuesday&#8217;s discussion. I&#8217;m not going there to push my personal view that individual institutions frequently shouldn&#8217;t be building, running or populating their own repositories at all. I&#8217;m going there to facilitate the discussion those in the room want to have, and to learn from their experiences and their perspectives.</p>
<h6 class="zemanta-related-title" style="font-size: 1em;">Related articles by Zemanta</h6>
<ul class="zemanta-article-ul">
<li class="zemanta-article-ul-li"><a href="http://scholarlykitchen.sspnet.org/2010/01/07/citation-advantage-for-mandated-open-access-articles/">Does a Citation Advantage Exist for Mandated Open Access Articles?</a> (scholarlykitchen.sspnet.org)</li>
<li class="zemanta-article-ul-li"><a href="http://hangingtogether.org/?p=770">Scholarly content and the cliff edge: the place of subject &#8216;repositories&#8217;</a> (hangingtogether.org)</li>
<li class="zemanta-article-ul-li"><a href="http://www.downes.ca/cgi-bin/page.cgi?post=51742">Scholarly Communications must be Scalable</a> (downes.ca)</li>
<li class="zemanta-article-ul-li"><a href="http://opendotdotdot.blogspot.com/2010/02/beyond-open-access-open-publishing.html">Beyond Open Access: Open Publishing</a> (opendotdotdot.blogspot.com)</li>
<li class="zemanta-article-ul-li"><a href="http://www.scienceblog.com/cms/57-college-presidents-declare-support-public-access-publicly-funded-research-us-25470.html" class="broken_link">57 college presidents declare support for public access to publicly funded research in the US</a> (scienceblog.com)</li>
<li class="zemanta-article-ul-li"><a href="http://r.zemanta.com/?u=http%3A//www.guardian.co.uk/education/2010/feb/11/academics-in-aspic-says-mandelson&amp;a=12898526&amp;rid=f65ff066-66fd-42d9-bc76-113bd6066317&amp;e=5236f562a8baffa164e8623f52cd7d44">Mandelson says academics are &#8216;set in aspic&#8217;</a> (guardian.co.uk)</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://cloudofdata.com/2010/02/repositories-in-the-cloud-why-on-earth-not/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>

