<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd"
	xmlns:media="http://search.yahoo.com/mrss/"
>

<channel>
	<title>Paul Miller - The Cloud of Data &#187; Amazon Web Services</title>
	<atom:link href="http://cloudofdata.com/tag/amazon-web-services/feed/" rel="self" type="application/rss+xml" />
	<link>http://cloudofdata.com</link>
	<description>Linked Data, Cloud Computing, Semantic Web, SaaS, PaaS, more</description>
	<lastBuildDate>Fri, 10 Feb 2012 10:46:51 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
	<copyright>Licensed under the Creative Commons Attribution License, version 3.0 http://creativecommons.org/licenses/by/3.0/</copyright>
	<managingEditor>paul.miller@cloudofdata.com (Paul Miller)</managingEditor>
	<webMaster>paul.miller@cloudofdata.com (Paul Miller)</webMaster>
	<ttl>1440</ttl>
	<image>
		<url>http://cloudofdata.com/logo144x144.jpg</url>
		<title>Paul Miller - The Cloud of Data</title>
		<link>http://cloudofdata.com</link>
		<width>144</width>
		<height>144</height>
	</image>
	<itunes:subtitle>conversations with the executives shaping Cloud Computing and the Semantic Web.</itunes:subtitle>
	<itunes:summary>Linked Data, Cloud Computing, Semantic Web, SaaS, PaaS, more</itunes:summary>
	<itunes:keywords>Cloud Computing, Semantic Web, Linked Data, Open Data, SaaS, PaaS</itunes:keywords>
	<itunes:category text="Technology" />
	<itunes:category text="Business" />
	<itunes:author>Paul Miller</itunes:author>
	<itunes:owner>
		<itunes:name>Paul Miller</itunes:name>
		<itunes:email>paul.miller@cloudofdata.com</itunes:email>
	</itunes:owner>
	<itunes:block>no</itunes:block>
	<itunes:explicit>no</itunes:explicit>
	<itunes:image href="http://cloudofdata.com/logo300x300.jpg" />
		<item>
		<title>Strata Conference 2011, Day 2 Keynotes</title>
		<link>http://cloudofdata.com/2011/02/strata-conference-2011-day-2-keynotes/</link>
		<comments>http://cloudofdata.com/2011/02/strata-conference-2011-day-2-keynotes/#comments</comments>
		<pubDate>Wed, 02 Feb 2011 18:13:07 +0000</pubDate>
		<dc:creator>Paul Miller</dc:creator>
				<category><![CDATA[Big Data]]></category>
		<category><![CDATA[Cloud computing]]></category>
		<category><![CDATA[Open Data]]></category>
		<category><![CDATA[Amazon Web Services]]></category>
		<category><![CDATA[BigData]]></category>
		<category><![CDATA[edd dumbill]]></category>
		<category><![CDATA[Hilary Mason]]></category>
		<category><![CDATA[Mark Madsen]]></category>
		<category><![CDATA[Thomson Reuters]]></category>
		<category><![CDATA[Werner Vogels]]></category>
		<category><![CDATA[Windows Azure DataMarket]]></category>
		<category><![CDATA[Zane Adam]]></category>

		<guid isPermaLink="false">http://cloudofdata.com/?p=1490</guid>
		<description><![CDATA[Day 2, and after yesterday&#8217;s tutorials the conference is really getting going. Here&#8217;s a stream of consciousness from the morning&#8217;s keynotes at this sold-out event. Conference chair Edd Dumbill is introducing things, talking about William Smith&#8217;s nineteenth century map of geological strata in the British Isles, the rise of industrialisation, and the move to towns. [...]]]></description>
			<content:encoded><![CDATA[<p>Day 2, and after yesterday&#8217;s tutorials the conference is really getting going.</p>
<p>Here&#8217;s a stream of consciousness from the morning&#8217;s keynotes at this sold-out event.</p>
<p><span id="more-1490"></span></p>
<p>Conference chair <a class="zem_slink" title="Edd Dumbill" rel="homepage" href="http://twitter.com/edd">Edd Dumbill</a> is introducing things, talking about <a href="http://en.wikipedia.org/wiki/William_Smith_(geologist)">William Smith</a>&#8217;s nineteenth century map of geological strata in the British Isles, the rise of industrialisation, and the move to towns. Edd suggests that a similar set of inflections is happening today in the world of data; &#8216;the start of something big.&#8217;</p>
<p>&#8220;In the same way that the industrial revolution changed what it meant to be human, the data revolution is changing what it means to be alive.&#8221;</p>
<p>The first of this morning&#8217;s keynotes; Hilary Mason from link shortener <a class="zem_slink" title="bit.ly" rel="homepage" href="http://bit.ly">bit.ly</a>.</p>
<p>Data and the people who work with data; &#8220;The state of the data union is strong.&#8221; Data scientists have an identity &#8211; a place to rally around &#8211; with Strata.</p>
<p>We have accomplished much, begging, borrowing and stealing from lots of domains. We have the tools. We have the capacity to spin up infrastructure in the Cloud. We have the algorithms to explore data, and to learn from it.</p>
<p>The most important thing we have now that we didn&#8217;t have before&#8230; is momentum. People are paying attention.</p>
<p>There are still challenges though. Timeliness of data is an issue, especially in real-time. We need to develop systems that can do robust analysis against a moving stream of data. We need to be able to store data in ways that let us operate on it in real-time. Hadoop&#8230; amazing &#8216;because I can run a query and get the result back before I forget why I submitted the query in the first place.&#8217; We need training. We need imagination, not more ad optimisation networks. We have a real opportunity to do something better.</p>
<p>Opportunities (expressed in context of bit.ly); Bit.ly gets lots of data from people shrinking web links. They learn a lot about people; what they like, what they want, what they&#8217;re doing. bit.ly also gets rich segmentation data; location, context, etc. bit.ly sees global data, for example clicks on bit.ly links from Egyptian domains.</p>
<p>Now that we have all this data, it offers a window on to the world. What can we do with it? Make the world a better place? What would <em>you</em> do with all of this data?</p>
<p>Next up, James Powell from <a class="zem_slink" title="Reuters" rel="homepage" href="http://reuters.com">Thomson Reuters</a> to talk about privacy and behavioural data in B2B contexts. Thomson Reuters gathers large amounts of global data, and filters it for customers. Time and context key; 700,000 updates a second through financial systems, 5,000,000 documents per day served through <a class="zem_slink" title="OpenCalais" rel="homepage" href="http://www.opencalais.com">Open Calais</a>, etc. Thomson Reuters interested in ways to filter information better.</p>
<p>Need to think about B2B implications of behavioural data, especially as we sell/exchange increasing volumes of data with partners. Consumers <em>reasonably</em> comfortable with giving up some personal data in return for a &#8216;better&#8217; product (Amazon recommendations, etc), that probably doesn&#8217;t scale to the enterprise. For example, Open Calais customers submitting large numbers of dummy queries to obfuscate what they&#8217;re really looking for&#8230;</p>
<p>Key problem that needs to be addressed is ambiguity; many systems in this space still rely upon implicit assumptions, whilst the enterprise is used to explicit contracts. Tension &#8211; or recipe for disaster?</p>
<p>Keys to success &#8211; need to treat behavioural data differently/better, and avoid the mistake of simply continuing consumer trends.</p>
<p>Next, Mark Madsen from Third Nature, talking about &#8216;the Mythology of Big Data.&#8217;</p>
<p>Lots of assumptions underlying conversations about Big Data. &#8216;Every technology carries within itself the seeds of its own destruction.&#8217; Code is a commodity; things that a lot of people have built profitable careers around have started to move down-market. Libraries, packages, etc make it easier for third parties to stitch things together rather than start from scratch.</p>
<p>The central myth underlying Big Data that&#8217;s erupted over the past 18-24 months; the myth of the gold rush. <em>Everyone</em> wants to be a data scientist. But just like the gold rush, success takes capital. It takes corporate engagement, and infrastructure. The &#8216;myth tells us you can go it alone&#8230; and you can&#8217;t.&#8217;</p>
<p>1950s-60s &#8211; data as product. 1970s-80s &#8211; data as byproduct. 1990s-2000s &#8211; data as asset. 2010 onwards &#8211; data as substrate (data as the basis for competition). &#8216;The real data revolution is in business structure and processes and how they use information.&#8217;</p>
<p>Using Big Data; the point isn&#8217;t necessarily about &#8216;Big.&#8217; Much valuable data inside an enterprise is only GB or TB in size. We get tied up in &#8216;big&#8217; way too much. It&#8217;s not really about data either; it&#8217;s about <em>applying</em> data. Without an application, it&#8217;s trivia.</p>
<p>Next, Amazon CTO <a class="zem_slink" title="Werner Vogels" rel="homepage" href="http://www.allthingsdistributed.com">Werner Vogels</a>. An overview of how <a class="zem_slink" title="Amazon Web Services" rel="homepage" href="http://aws.amazon.com/">Amazon Web Services</a> look at the data processing being done on their infrastructure by customers&#8230; Government, Finance, Commerce, Pharma&#8230; all making use of tools. Plugging <em>The Fourth Paradigm</em> book from Microsoft Research (which is very good).</p>
<p>Vogels &#8211; big data is big data when your data sets become so large that you have to innovate to manage them. Customers view big data as collection and curation of data for competitive advantage&#8230; with the presumption that bigger is better. For recommendations etc, that is probably true.</p>
<p>There are a number of categories of data, where quality is far more important than quantity.</p>
<p>In the past, data tended to be collected to answer questions. Now, trend to collecting as much as possible before developing the questions you want answered, and the algorithms you will need to use for the analysis.</p>
<p>To do this, you should not be worried by data storage, data processing, etc &#8211; which is why you should embrace the scalable Cloud.</p>
<p>Data analysis pipeline; collect &#8211; store &#8211; organise &#8211; analyse &#8211; share.</p>
<p><a class="zem_slink" title="AWS Import/Export" rel="homepage" href="http://aws.amazon.com/importexport">AWS Import/Export</a> &#8211; &#8220;you shouldn&#8217;t underestimate the bandwidth of a FedEx box.&#8221; Indeed.</p>
<p>&#8220;This is Day 1 for Cloud infrastructure.&#8221;</p>
<p>Next up, Microsoft&#8217;s Zane Adam talking about data marketplaces. Windows Azure <a href="https://datamarket.azure.com/">DataMarket</a>; <a class="zem_slink" title="Data as a Service" rel="wikipedia" href="http://en.wikipedia.org/wiki/Data_as_a_Service">Data as a Service</a>, free or at cost. One stop shop for data (one of many one stop shops, unfortunately!) DataMarket <em>is</em> interesting&#8230; but this is far too much of a product pitch for the keynote track.</p>
<p>90 days since launch &#8211; 5,000+ subscriptions, 3 Million transactions to date. Given Microsoft&#8217;s presence and reach, aren&#8217;t those figures a bit low?</p>
<p>&#8220;There&#8217;s a lot of data out there&#8230; but it&#8217;s not all good.&#8221; A Data Marketplace gives customers access to good data. Does it? Do Microsoft vet every fact in a submitted data set? What would a single bad data set do to the marketplace&#8217;s brand recognition?</p>
]]></content:encoded>
			<wfw:commentRss>http://cloudofdata.com/2011/02/strata-conference-2011-day-2-keynotes/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>In a world of niche Clouds, how do you define a useful niche?</title>
		<link>http://cloudofdata.com/2010/12/in-a-world-of-niche-clouds-how-do-you-define-a-useful-niche/</link>
		<comments>http://cloudofdata.com/2010/12/in-a-world-of-niche-clouds-how-do-you-define-a-useful-niche/#comments</comments>
		<pubDate>Tue, 14 Dec 2010 13:08:20 +0000</pubDate>
		<dc:creator>Paul Miller</dc:creator>
				<category><![CDATA[Cloud computing]]></category>
		<category><![CDATA[Enterprise Computing]]></category>
		<category><![CDATA[IaaS]]></category>
		<category><![CDATA[Amazon Web Services]]></category>
		<category><![CDATA[Andy Powell]]></category>
		<category><![CDATA[Data center]]></category>
		<category><![CDATA[Eduserv]]></category>
		<category><![CDATA[FleSSR]]></category>
		<category><![CDATA[JISC]]></category>
		<category><![CDATA[Joint Information Systems Committee]]></category>
		<category><![CDATA[Rackspace]]></category>
		<category><![CDATA[VMware]]></category>

		<guid isPermaLink="false">http://cloudofdata.com/?p=1393</guid>
		<description><![CDATA[There are a couple of interesting posts on the blog of the UK&#8217;s FleSSR project, detailing their efforts to work out how feasible it might be to offer a new Cloud service to universities. More on that in a moment. I don&#8217;t think I&#8217;ve ever really been convinced by the argument that everything will end [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://geekandpoke.typepad.com/geekandpoke/2008/05/simply-explaine.html" target="_blank"><img class="alignright size-medium wp-image-1396" style="margin: 0px; border: 0px initial initial;" title="Simply Explained - Cloud Computing" src="http://cloudofdata.com/wp-content/uploads/2010/12/cloud-explained-300x214.jpg" alt="" width="300" height="214" /></a>There are a couple of interesting posts on the blog of the UK&#8217;s FleSSR project, detailing their efforts to work out how feasible it might be to offer a new Cloud service to universities. More on that in a moment.</p>
<p>I don&#8217;t think I&#8217;ve ever really been convinced by the argument that <em>everything</em> will end up in the data centres of <a class="zem_slink" title="Amazon EC2" rel="homepage" href="http://aws.amazon.com/ec2/">Amazon</a>.</p>
<p>The straightforward provision of commodity Cloud Computing is an important &#8211; and growing &#8211; area, and one that will continue to expand as interfaces become simpler, FUD is challenged, and prices maintain their relentless march towards the bottom. <em>Everyone</em> has <em>something</em> they could usefully, sensibly, and cost-effectively run in a commodity Cloud such as those offered by <a href="http://aws.amazon.com/">Amazon</a>, <a class="zem_slink" title="Rackspace" rel="homepage" href="http://www.rackspace.com">Rackspace</a>, <a href="http://www.flexiant.com/">Flexiant</a>, and others. In <em>this</em> space, basic stability, security and reliability combine with a compelling &#8211; and diminishing &#8211; pricing proposition to create commodity services targeted squarely to lowest common denominator functionality. Here, market forces may (inevitably?) lead to an eventual reduction in the number of providers. Cost, although not the only consideration, is both important and compelling. Although markets like competition, there may even be a single winner here, one day.</p>
<p>Layered all around the basic, routine, grunt-work computation that these commodity public clouds handle so well, many organisations find themselves having to cope with a wide range of <em>other</em> use cases and data sets. Some require specialist hardware (like the <a class="zem_slink" title="Graphics processing unit" rel="wikipedia" href="http://en.wikipedia.org/wiki/Graphics_processing_unit">GPUs</a> that Amazon has <a href="http://aws.typepad.com/aws/2010/11/new-ec2-instance-type-the-cluster-gpu-instance.html">recently begun selling access to</a>). Some demand particular regulatory and legislative hoops to be jumped through. Some have quirky requirements around latency in data transfer or speed of in-CPU processing. Some have <em>lots</em> of data, and issues with regard to getting the stuff from one location to another with a sensible balance between transfer cost and time.</p>
<p>All of these are certainly capable of being addressed in the Cloud, but the economics and the business rationale begin to shift. For the data owner, cost may no longer be quite so significant a factor. Reliability may matter more, or speed, or the audit trail. For the Cloud provider, these requirements no longer look like the lowest common denominator. It&#8217;s not cost-effective to provide these capabilities to <em>everyone</em> and still keep the price low. It becomes more sensible to segment, to divide, and to create bespoke offerings of various kinds. Some of these services require such specific things in terms of network topology, physical building layout, and staff expertise that it may even become counter-productive to have these services in the same building as the commodity Cloud. Here, there&#8217;s plenty of room for new entrants, plenty of scope for competition, and plenty of opportunity to differentiate in terms of price, location, support, and a host of other factors. This segment of the Cloud is only just getting started.</p>
<p>In these contexts, we see compelling arguments made for on-premise private clouds, off-premise private clouds, hybrid clouds, community clouds and the rest. Some of the arguments made in favour of private and hybrid certainly are part of the FUD we see in this space, but beneath the noise, the security scares, and the vested interests of SysAdmins and sellers of data centre components, there lies a grain of truth. Not everything is most sensibly run on a cheap VM, rented from Amazon (or Rackspace, or whoever) with your credit card, and physically located half way round the planet.</p>
<p>Unfortunately, it can be difficult to make sensible decisions about which type of cloud works best in each situation, and large swathes of the market are doing everything in their power to add to the confusion.</p>
<p>Having accepted that the basic offering from a public cloud provider is not the solution for my particular requirements, where do I turn next?</p>
<p>Do I listen to the (convincing) pitch from a vendor of &#8216;community cloud&#8217; solutions for my domain? If I&#8217;m in Healthcare, they come with HIPAA and European Data Protection Directive, and all sorts of other accreditations. For dealing with sensitive patient data, this may be just what I need&#8230; but does the wily salesman <em>also</em> persuade me to run staff email and the hospital volleyball club website on this over-specified (and expensive) infrastructure?</p>
<p>Do I listen to the (convincing) pitch from a vendor of virtualisation software? If I&#8217;ve got a reasonably sized data centre with some life left in it, I may see the value of virtualising all of that expensive hardware, and running current applications in house more efficiently. But instead of gradually reducing my in-house costs, do I continue to add more machines as current ones reach end of life, or as new requirements come along?</p>
<p>Do I listen to the (convincing) pitch from my co-location facility, which happily sells me a &#8216;private cloud&#8217; that may fail to deliver some of the economies of scale so central to the main Cloud proposition?</p>
<p>Do I listen to the horror stories, stick my head in the sand, and simply keep ordering servers until every single one of my competitors undercuts my costs and I go out of business?</p>
<p>These, and more, are certainly possible. But let&#8217;s return to that UK project I mentioned right at the start.</p>
<p>Flexible Services for the Support of Research (<a href="http://flessr.blogspot.com/">FleSSR</a>) is</p>
<blockquote><p>&#8220;a new cloud pilot project looking at utilising hybrid private-public IaaS cloud infrastructure to provide computational and data services to the academic research community. The project is a collaboration between the Oxford e-Research Center, IT Service @ University of Reading, e-Science Centre @ STFC, Eduserv, EoverI, Eucalyptus INC and Canonical Ltd.&#8221;</p></blockquote>
<p>The ten month project is funded by the Joint Information Systems Committee (<a href="http://www.jisc.ac.uk">JISC</a>), an organisation that supports the innovative use of IT across UK universities.</p>
<p>Now, to a degree, the project&#8217;s mindset must be influenced by its partners. IT staff at Reading and STFC are incumbents with turf to protect (or new vistas to discover, map, and claim). Eduserv has a new data centre that they&#8217;d like to fill with willing clients. It would be easy to be cynical, but knowing some of the people involved, I see no real reason to be. It is perfectly reasonable to suggest that a &#8216;community&#8217; the size of UK Higher Education would realise value in replicating less (not nothing) at every university campus across the country, and bringing much of that together in some sort of Cloud. That Cloud might use public infrastructure, or it might be served up from an organisation such as Eduserv, which is known to the community, aware of the community&#8217;s requirements, quirks and foibles, and (importantly) not-for-profit (and therefore cheaper?).</p>
<p>Personally, I&#8217;d always rather presumed that an organisation like Eduserv (or JISC itself) would act on behalf of the community to procure a competitive price on access to the resources of Amazon, Rackspace, or one of the others. I&#8217;m not convinced that <em>most</em> UK research computation needs any sort of special treatment that couldn&#8217;t be met from Amazon&#8217;s Dublin data centre&#8230; unless the community itself can somehow beat &#8211; and continue to beat &#8211; Amazon on price. Somewhat surprisingly, that&#8217;s exactly what some calculations in <a href="http://flessr.blogspot.com/2010/12/costs-of-storage-in-cloud.html">two</a> <a href="http://flessr.blogspot.com/2010/12/costs-of-building-storage-for-cloud.html">posts</a> by Eduserv&#8217;s Andy Powell suggest could happen. By including network costs and other charges over and above the basic storage cost, Andy finds Amazon, Rackspace and Dropbox to be more expensive than anticipated, and posits that Eduserv (connected to every UK university free of charge via JISC&#8217;s high speed <a href="http://www.ja.net/">JANET</a> service, and constrained in the ways it can generate profit from services sold to universities by its charitable status) might actually work out cheaper.</p>
<p>There&#8217;s a lot of work to do in terms of fleshing out the assumptions behind some of Andy&#8217;s figures, but the whole industry certainly benefits when people conduct exercises like these out in the open, for all to see. If Andy has made mistakes, the vendors should be quick to jump in and correct them. If his assumptions miss the mark, public debate can redress the balance.</p>
<p>The Cloud is not all about price. But more transparency around the true cost of computing in the Cloud &#8211; and in your data centre &#8211; means that we can all make more informed decisions.</p>
<p>Thanks for sharing, Andy &#8211; and hopefully readers will be willing and able to look over your calculations and share their own views.</p>
<p><strong>Note</strong>: <em>this post was conceived and written in the United Kingdom. By reading this post you agree to comply with UK usage, and will henceforth pronounce the word &#8216;niche&#8217; from the title as &#8216;neesh,&#8217; not &#8216;nitch.&#8217; Or maybe not.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://cloudofdata.com/2010/12/in-a-world-of-niche-clouds-how-do-you-define-a-useful-niche/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Repositories in the Cloud? Why on earth not?!</title>
		<link>http://cloudofdata.com/2010/02/repositories-in-the-cloud-why-on-earth-not/</link>
		<comments>http://cloudofdata.com/2010/02/repositories-in-the-cloud-why-on-earth-not/#comments</comments>
		<pubDate>Sun, 21 Feb 2010 18:05:42 +0000</pubDate>
		<dc:creator>Paul Miller</dc:creator>
				<category><![CDATA[Cloud computing]]></category>
		<category><![CDATA[Open Data]]></category>
		<category><![CDATA[Academic publishing]]></category>
		<category><![CDATA[Amazon Web Services]]></category>
		<category><![CDATA[Andy Powell]]></category>
		<category><![CDATA[Archives]]></category>
		<category><![CDATA[AWS]]></category>
		<category><![CDATA[Colleges and Universities]]></category>
		<category><![CDATA[Eduserv]]></category>
		<category><![CDATA[Higher Education]]></category>
		<category><![CDATA[infochimps]]></category>
		<category><![CDATA[Institutional repository]]></category>
		<category><![CDATA[JISC]]></category>
		<category><![CDATA[Open access]]></category>
		<category><![CDATA[Panton Principles]]></category>
		<category><![CDATA[repcloud]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[Software as a service]]></category>

		<guid isPermaLink="false">http://cloudofdata.com/?p=932</guid>
		<description><![CDATA[To be honest, I&#8217;ve never fully understood Higher Education&#8217;s penchant for building &#8216;institutional repositories.&#8217; These frequently under-populated aggregations of academic papers produced by &#8216;research active&#8217; employees of a particular university appear aligned almost exclusively to vaguely expressed institutional imperatives, and seem largely unrelated to either the selfish aspirations of the contributing authors or the tangible [...]]]></description>
			<content:encoded><![CDATA[<p>To be honest, I&#8217;ve never fully understood Higher Education&#8217;s penchant for building &#8216;<a class="zem_slink freebase/en/institutional_repository" title="Institutional repository" rel="wikipedia" href="http://en.wikipedia.org/wiki/Institutional_repository">institutional repositories</a>.&#8217; These frequently under-populated aggregations of academic papers produced by &#8216;research active&#8217; employees of a particular university appear aligned almost exclusively to vaguely expressed institutional imperatives, and seem largely unrelated to either the selfish aspirations of the contributing authors or the tangible relationships they painstakingly construct with others across their chosen discipline. The &#8216;repository&#8217; all too often appears a bureaucratic solution to a problem that the supposed beneficiaries do not recognise; a technological aberration that sits outside the conversational flow of the Web to which it is only tenuously attached.</p>
<p>Furthermore, &#8216;<a class="zem_slink freebase/en/open_access" title="Open access (publishing)" rel="wikipedia" href="http://en.wikipedia.org/wiki/Open_access_%28publishing%29">Open Access</a>&#8217; and &#8216;Repository&#8217; typically go hand in hand. If you support Open Access you need a repository, and if you question the role of repositories you&#8217;re in the pocket of evil publishers who want to lock up everything ever written and lease reading rights back to the employers of those who wrote the stuff in the first place.</p>
<p>Nonsense.</p>
<p>Open Access is an important component of today&#8217;s scholarly ecosystem. It&#8217;s not the only answer, and it&#8217;s not perfect, but it <em>does</em> have a significant part to play. Institutions have a role in preserving, disseminating and exploiting the work of their employees, but these are very different tasks that may benefit from different solutions. In too many cases, the repository is by default seen as a preservation mechanism <em>and</em> a dissemination vehicle, and as such it may fail to cost-effectively achieve either aim.</p>
<p>There are some large, well known, and research-intensive institutions where it might be possible to make a compelling argument for projecting a strong institutional image around a single &#8216;home&#8217; for all of that research output. Never mind, for a moment, that so much research today is the result of inter-institutional collaboration, or that the eminent researcher might wish to take &#8216;their&#8217; research publications with them as they move from Oxford to Harvard to York during their glittering career.</p>
<p>Alongside those institutions sit a plethora of others where research of equal quality is also being conducted; there just, maybe, isn&#8217;t quite as much of it. Bombarded by &#8216;advice&#8217; and funding, and desperate to keep up with the <a class="zem_slink freebase/en/russell_group" title="Russell Group" rel="wikipedia" href="http://en.wikipedia.org/wiki/Russell_Group">Russell Group</a>, ever-more institutions blindly join the repository cult and wonder why their new toys do not fill to overflowing with the jewels of scholarly erudition.</p>
<p>As research becomes increasingly data-rich, the whole cycle looks set to repeat. The recently released <a href="http://pantonprinciples.org/">Panton Principles</a> for <a class="zem_slink freebase/en/open_data" title="Open Data" rel="wikipedia" href="http://en.wikipedia.org/wiki/Open_Data">Open Data</a> in Science are to be welcomed, but I&#8217;ll bet the institutional response will all too often be the commissioning of a &#8216;data repository&#8217; to sit alongside the &#8216;publication repository&#8217; they already don&#8217;t use.</p>
<p>All of which is a rather long-winded way of introducing the fact that Eduserv&#8217;s <a class="zem_slink" title="Andy Powell" rel="twitter" href="http://twitter.com/andypowe11">Andy Powell</a> has asked me to facilitate a breakout afternoon on &#8216;Policy Issues&#8217; at the <a href="http://www.eduserv.org.uk/events/repcloud" class="broken_link">Repositories in the Cloud</a> event <a href="http://www.eduserv.org.uk/research">Eduserv</a> and <a class="zem_slink freebase/en/joint_information_systems_committee" title="Joint Information Systems Committee" rel="wikipedia" href="http://en.wikipedia.org/wiki/Joint_Information_Systems_Committee">JISC</a> are holding in London on Tuesday.</p>
<blockquote><p>&#8220;This free event, organised jointly by Eduserv and the JISC, will bring together software developers, repository managers, service providers, funding and advisory bodies to discuss the potential policy and technical issues associated with <strong>cloud computing</strong> and the delivery of <strong>repository services</strong> in UK HEIs.&#8221;</p></blockquote>
<p>In a post on 11 February, <a href="http://efoundations.typepad.com/efoundations/2010/02/repositories-and-the-cloud-tell-us-your-views.html">Andy invited participants to share some of their views</a> ahead of the meeting, and on 19 February <a href="http://efoundations.typepad.com/efoundations/2010/02/in-the-clouds.html">he wrote about some of his own thoughts</a>.</p>
<p>Like Andy, I struggled somewhat to nail down a coherent set of thoughts about the issue of pushing today&#8217;s repositories into the Cloud. On one level, I wonder whether the vast majority of institutions with small (and relatively low traffic) repositories would see much of a tangible efficiency gain or cost saving by moving off an in-house computer to rent an equivalent <a class="zem_slink freebase/en/virtual_machine" title="Virtual machine" rel="wikipedia" href="http://en.wikipedia.org/wiki/Virtual_machine">Virtual Machine</a> from Amazon, Rackspace, or any of their competitors. If we&#8217;re talking about IT systems within a typical university, there are others (email, calendaring, pools of compute resource for research jobs, etc) that appear more immediately compelling for the shift Cloud-ward. Which is not to say that there isn&#8217;t a clear opportunity for someone trusted to step into this space and offer a <a class="zem_slink freebase/en/software_as_a_service" title="Software as a service" rel="wikipedia" href="http://en.wikipedia.org/wiki/Software_as_a_service">SaaS</a> repository to which institutions might affordably subscribe. Eduserv? Mimas? Edina? The British Library? The National Archives? Duraspace? Any could, and if we&#8217;re not ready for something more then at least one probably should.</p>
<p>However, a bolder reconsideration of what repositories <em>are</em> and what they&#8217;re <em>for</em> might very well lead to something interesting, sustainable, and perfectly suited for benefitting from Cloud Computing&#8217;s strengths.</p>
<p>Why does a paper have to be &#8216;deposited&#8217; in a repository? Why does a single paper with three authors from three institutions have to be deposited in three separate institutional repositories? Why does that same paper have to be deposited – separately – in the subject repository favoured by scholars in the relevant discipline? Why does the institution&#8217;s very reasonable desire to protect, preserve, promote and disseminate its excellence mean that it has to run systems in perpetuity that preserve and permit access? Why do we address the fundamentally different (perhaps even contradictory) problems of access and preservation in the same system? Why can&#8217;t the individual researcher easily assemble a view across their publication history, regardless of the institution within which they happened to reside as they wrote each paper? Why don&#8217;t the assemblages of papers reflect personal, professional and disciplinary relationships, alongside (or instead of) the contractual accident of employee-employer relationships? Why isn&#8217;t the wealth of metadata implicit to any publication (authors, subjects, dates, citations, and more) available and actionable, both inside the repository and far beyond it across the Web? Why isn&#8217;t there a tight and active association between the paper and the data from which its findings were derived (something for which <em><a href="http://intarch.ac.uk/">Internet Archaeology</a></em> was demonstrating utility a very long time ago)?</p>
<p>Scholarly papers principally comprise text, augmented by the occasional static image. They&#8217;re not big, and they don&#8217;t tend to change very fast. In many ways, they represent a fairly easy problem set with which to work. As more and more data becomes key to research in a growing number of subject areas, the problems are set to become far larger and far more difficult. For individual universities to even consider replicating the process by which they all ended up with their repositories of text surely seems madness in this data-rich environment. Even with levels of uptake as low as those seen in too many text repositories, the issues of data management, curation, access and dissemination are too great to be sensibly solved in the institutional machine room. Services like <a href="http://infochimps.org/">InfoChimps</a> and Amazon&#8217;s own <a href="http://aws.amazon.com/publicdatasets/">Public Data Sets</a> offering show some of the ways that we might begin to work with data at scale. Might we, for example, come to recognise, as Amazon has, that it&#8217;s actually cheaper and quicker to entrust large data sets to FedEx than to transmit them over the Internet?</p>
<p>&#8216;The answer&#8217; might be some central service for the community, funded by JISC like the Arts &amp; Humanities Data Service (AHDS) of old. Or it might be something different, something nimbler, more responsive, more flexible to individual, institutional, and disciplinary requirements, and something more scalable to new disciplines; institutional support for and use of <em>existing</em> Cloud infrastructures extending far beyond UK Higher Education, aligned with a clear understanding of the separation between preservation and access.</p>
<p>I certainly don&#8217;t have all the answers, but I do believe that simply asking whether or not we should move existing repositories to the Cloud is to miss the point. Rather, we should ask what role the Cloud might play in addressing the business requirements to which the institutional repository was our initial – faltering – response. The answer might very well be &#8216;None,&#8217; but I doubt it.</p>
<p>I look forward to Tuesday&#8217;s discussion. I&#8217;m not going there to push my personal view that individual institutions frequently shouldn&#8217;t be building, running or populating their own repositories at all. I&#8217;m going there to facilitate the discussion those in the room want to have, and to learn from their experiences and their perspectives.</p>
<h6 class="zemanta-related-title" style="font-size: 1em;">Related articles by Zemanta</h6>
<ul class="zemanta-article-ul">
<li class="zemanta-article-ul-li"><a href="http://scholarlykitchen.sspnet.org/2010/01/07/citation-advantage-for-mandated-open-access-articles/">Does a Citation Advantage Exist for Mandated Open Access Articles?</a> (scholarlykitchen.sspnet.org)</li>
<li class="zemanta-article-ul-li"><a href="http://hangingtogether.org/?p=770">Scholarly content and the cliff edge: the place of subject &#8216;repositories&#8217;</a> (hangingtogether.org)</li>
<li class="zemanta-article-ul-li"><a href="http://www.downes.ca/cgi-bin/page.cgi?post=51742">Scholarly Communications must be Scalable</a> (downes.ca)</li>
<li class="zemanta-article-ul-li"><a href="http://opendotdotdot.blogspot.com/2010/02/beyond-open-access-open-publishing.html">Beyond Open Access: Open Publishing</a> (opendotdotdot.blogspot.com)</li>
<li class="zemanta-article-ul-li"><a href="http://www.scienceblog.com/cms/57-college-presidents-declare-support-public-access-publicly-funded-research-us-25470.html" class="broken_link">57 college presidents declare support for public access to publicly funded research in the US</a> (scienceblog.com)</li>
<li class="zemanta-article-ul-li"><a href="http://r.zemanta.com/?u=http%3A//www.guardian.co.uk/education/2010/feb/11/academics-in-aspic-says-mandelson&amp;a=12898526&amp;rid=f65ff066-66fd-42d9-bc76-113bd6066317&amp;e=5236f562a8baffa164e8623f52cd7d44">Mandelson says academics are &#8216;set in aspic&#8217;</a> (guardian.co.uk)</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://cloudofdata.com/2010/02/repositories-in-the-cloud-why-on-earth-not/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Amazon tethers balloons for now; attention turns to crunching data in the Cloud with Elastic MapReduce web service</title>
		<link>http://cloudofdata.com/2009/04/amazon-tethers-balloons-for-now-attention-turns-to-crunching-data-in-the-cloud-with-elastic-mapreduce-web-service/</link>
		<comments>http://cloudofdata.com/2009/04/amazon-tethers-balloons-for-now-attention-turns-to-crunching-data-in-the-cloud-with-elastic-mapreduce-web-service/#comments</comments>
		<pubDate>Thu, 02 Apr 2009 10:58:39 +0000</pubDate>
		<dc:creator>Paul Miller</dc:creator>
				<category><![CDATA[Administrivia]]></category>
		<category><![CDATA[Cloud computing]]></category>
		<category><![CDATA[Enterprise Computing]]></category>
		<category><![CDATA[Linked Data]]></category>
		<category><![CDATA[PaaS]]></category>
		<category><![CDATA[SaaS]]></category>
		<category><![CDATA[Web 2.0]]></category>
		<category><![CDATA[Web 3.0]]></category>
		<category><![CDATA[Amazon Elastic Compute Cloud]]></category>
		<category><![CDATA[Amazon Web Services]]></category>
		<category><![CDATA[Cloudera]]></category>
		<category><![CDATA[Data mining]]></category>
		<category><![CDATA[Elastic Computing]]></category>
		<category><![CDATA[Elasticity]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Jeff Barr]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Web service]]></category>
		<category><![CDATA[Web Services]]></category>

		<guid isPermaLink="false">http://cloudofdata.com/?p=494</guid>
		<description><![CDATA[Amid mounting international concern that the guidance lasers aboard Jeff Bezos&#8217; new Floating Amazon Cloud Environment would interfere with Rudolph&#8217;s sense of direction, sources close to the Amazon Web Services team tell me that they&#8217;ve been forced to alter priorities and switch attention to an early release of the next product on [...]]]></description>
			<content:encoded><![CDATA[<div class="zemanta-img" style="margin: 1em; display: block;">
<div>
<dl class="wp-caption alignright" style="width: 212px;">
<dt class="wp-caption-dt"><a href="http://commons.wikipedia.org/wiki/Image:DIN_4844-2_Warnung_vor_Laserstrahl_D-W010.svg"><img title="Warning for laserbeam, symbol D-W010 according..." src="http://upload.wikimedia.org/wikipedia/commons/thumb/1/16/DIN_4844-2_Warnung_vor_Laserstrahl_D-W010.svg/202px-DIN_4844-2_Warnung_vor_Laserstrahl_D-W010.svg.png" alt="Warning for laserbeam, symbol D-W010 according..." width="202" height="177" /></a></dt>
<dd class="wp-caption-dd zemanta-img-attribution" style="font-size: 0.8em;">Image via <a href="http://commons.wikipedia.org/wiki/Image:DIN_4844-2_Warnung_vor_Laserstrahl_D-W010.svg">Wikipedia</a></dd>
</dl>
</div>
</div>
<p><em>Amid mounting international concern that the guidance lasers aboard <a class="zem_slink" title="Jeff Bezos" rel="crunchbase" href="http://www.crunchbase.com/person/jeff-bezos">Jeff Bezos</a>&#8217; new <a href="http://aws.typepad.com/aws/2009/03/up-up-and-away-cloud-computing-reaches-for-the-sky.html">Floating Amazon Cloud Environment</a> would interfere with <a href="http://en.wikipedia.org/wiki/Rudolph_the_Red-Nosed_Reindeer">Rudolph</a>&#8217;s sense of direction, sources close to the <span class="zem_slink">Amazon</span> Web Services team tell me that they&#8217;ve been forced to alter priorities and switch attention to an early release of the next product on their roadmap.</em></p>
<p>Today sees the release of <a href="http://aws.amazon.com/">Amazon</a>&#8217;s latest web service, the <a href="http://en.wikipedia.org/wiki/Hadoop">Hadoop</a>-powered <a href="http://aws.amazon.com/elasticmapreduce/">Elastic MapReduce</a>:</p>
<blockquote><p>&#8220;Using Amazon Elastic MapReduce, you can instantly provision as much or as little capacity as you like to perform data-intensive tasks for applications such as web indexing, data mining, log file analysis, machine learning, financial analysis, scientific simulation, and bioinformatics research. Amazon Elastic MapReduce lets you focus on crunching or analyzing your data without having to worry about time-consuming set-up, management or tuning of Hadoop clusters or the compute capacity upon which they sit.&#8221;</p></blockquote>
<p>The company&#8217;s <a href="http://phx.corporate-ir.net/phoenix.zhtml?c=176060&amp;p=irol-newsArticle&amp;ID=1272550&amp;highlight=">press release</a> quotes Adam Selipsky, VP for Product Management &amp; Developer Relations, who notes:</p>
<blockquote><p><span class="ccbnTxt">&#8220;Some researchers and developers already run Hadoop on Amazon EC2, and many of them have asked for even simpler tools for large-scale data analysis. Amazon Elastic MapReduce makes crunching in the cloud much easier as it dramatically reduces the time, effort, complexity and cost of performing data-intensive tasks.&#8221;</span></p></blockquote>
<p><span class="ccbnTxt">MapReduce was brought to prominence by Google, and is one of the principal techniques the company uses to break massive data sets into manageable chunks suitable for cost-effective processing on the commodity hardware for which it is known. The abstract for <a href="http://labs.google.com/papers/mapreduce.html" class="broken_link">a Google research paper on the topic</a> outlines the value proposition reasonably succinctly:</span></p>
<blockquote><p><span class="ccbnTxt">&#8220;MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real world tasks are expressible in this model, as shown in the paper.</span></p>
<p>Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program&#8217;s execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system.</p>
<p>Our implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation processes many terabytes of data on thousands of machines. Programmers find the system easy to use: hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google&#8217;s clusters every day.<span>&#8221;</span></p></blockquote>
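<p>To make the map and reduce roles described in that abstract a little more concrete, here is a minimal single-process sketch in Python that counts words. It is an illustration only: the function names (<code>map_words</code>, <code>reduce_counts</code>, <code>run_mapreduce</code>) and the two sample documents are hypothetical, and nothing here is Google&#8217;s, Hadoop&#8217;s or Amazon&#8217;s actual API. A real run-time would spread the map and reduce calls across many machines and cope with failures; this stand-in simply runs them in a loop.</p>

```python
from collections import defaultdict
from itertools import chain


def map_words(doc_id, text):
    """Map step: emit an intermediate (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in text.split()]


def reduce_counts(word, counts):
    """Reduce step: merge all intermediate values that share the same key."""
    return word, sum(counts)


def run_mapreduce(documents, mapper, reducer):
    """Hypothetical single-process stand-in for the distributed run-time system."""
    # Map phase: apply the mapper to every input key/value pair.
    intermediate = chain.from_iterable(
        mapper(key, value) for key, value in documents.items()
    )
    # Shuffle phase: group intermediate values by their intermediate key.
    grouped = defaultdict(list)
    for key, value in intermediate:
        grouped[key].append(value)
    # Reduce phase: one reducer call per distinct intermediate key.
    return dict(reducer(key, values) for key, values in grouped.items())


# Illustrative input only; any mapping of document id to text would do.
documents = {"doc1": "the cloud of data", "doc2": "data in the cloud"}
word_counts = run_mapreduce(documents, map_words, reduce_counts)
print(word_counts)
```

<p>The point of the model is that only the two small functions are specific to the problem; everything inside <code>run_mapreduce</code> is what a service such as Elastic MapReduce would take off your hands.</p>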
<p><span><a href="http://hadoop.apache.org/">Hadoop</a> is a <a href="http://www.yahoo.com/">Yahoo!</a>-nurtured Open Source equivalent to Google&#8217;s MapReduce, managed as a project of the <a href="http://apache.org/">Apache Software Foundation</a>, and reputedly scalable to handle many petabytes of data distributed across thousands of CPUs.</span></p>
<p><span>As Adam noted in the press release, customers (such as the <em>New York Times</em> and Netflix) are already using Hadoop on Amazon&#8217;s Web Services. Today&#8217;s announcement makes it easier to commission (and decommission) the required compute resources cost-effectively and transparently. This is the &#8216;elasticity&#8217; referred to in the new service&#8217;s name, and is an increasingly important aspect of the current generation of Cloud-based compute services; much of the economic value proposition lies in <em>only</em> using (and therefore paying for) the resources you actually need to complete a task. If demand increases, the number of (virtual) machines available should rapidly increase to cope, and they should shut back down just as rapidly when the demand passes:</span></p>
<blockquote><p><span>&#8220;</span>Amazon Elastic MapReduce enables you to use as many or as few compute instances running Hadoop as you want. You can commission one, hundreds, or even thousands of instances to process gigabytes, terabytes, or even petabytes of data. And, you can run as many job flows concurrently as you wish. You can instantly spin up large Hadoop job flows which will start processing within minutes, not hours or days. When your job flow completes, unless you specify otherwise, the service automatically tears down your instances.<span>&#8221;</span></p></blockquote>
<p>Elastic MapReduce is <em>currently</em> available only in Amazon&#8217;s US region (<span>so non-US customers can <em>use</em> the service; they just have to be able/willing to transfer the data beyond their borders), and is priced in addition to the underlying EC2 instances: Elastic MapReduce adds a further $US0.015 per hour (yes, one and a half cents per hour) to a $US0.10 per hour &#8216;small&#8217; instance, and a further $US0.12 per hour to a $US0.80 per hour &#8216;extra large&#8217; instance.</span></p>
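<p>As a rough worked example of that pricing, the sketch below adds the Elastic MapReduce surcharge to the EC2 hourly rate and multiplies by instances and hours. The four prices are the ones quoted in this post; the job size (ten &#8216;small&#8217; instances for three hours) and the <code>job_flow_cost</code> helper are assumptions of mine for illustration, and the sum ignores storage, data transfer and any rounding of partial hours.</p>

```python
from decimal import Decimal

# Hourly prices in US dollars, as quoted in this post (April 2009).
EC2_HOURLY = {"small": Decimal("0.10"), "extra_large": Decimal("0.80")}
EMR_SURCHARGE = {"small": Decimal("0.015"), "extra_large": Decimal("0.12")}


def job_flow_cost(instance_type, instances, hours):
    """Estimate compute cost: (EC2 rate + Elastic MapReduce surcharge) x instances x hours."""
    hourly = EC2_HOURLY[instance_type] + EMR_SURCHARGE[instance_type]
    return hourly * instances * hours


# Hypothetical job flow: ten 'small' instances running for three hours.
example_cost = job_flow_cost("small", 10, 3)
print(f"Estimated cost: $US{example_cost}")
```

<p>On those assumptions the surcharge is a small fraction of the bill; most of what you pay is still the underlying EC2 time.</p>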
<p><span>Elastic MapReduce is another nice example of slow, incremental improvement to Amazon&#8217;s core Web Services offer. </span></p>
<p><span>It remains to be seen, as developers get down to using it for real, whether it&#8217;s pitched as a low-end disruptor that simply rounds out another piece of the emerging AWS whole, or whether it&#8217;s a viable competitor in its own right to the recently announced <a href="http://www.cloudera.com/">Cloudera</a>, which sees taking Hadoop to mainstream enterprise customers as its <em>raison d&#8217;&#234;tre</em>:</span></p>
<blockquote><p><span>&#8220;</span><span>Cloudera</span> can help you install, configure and run <span>Hadoop</span> for large-scale data processing and analysis. <a href="http://www.cloudera.com/hadoop">Get Cloudera&#8217;s Distribution for Hadoop</a> and start working with <span>Big Data</span> today.<span>&#8221;</span></p></blockquote>
<p><span><strong>Update:</strong> Amazon&#8217;s Jeff Barr provides a lot more detail in <a href="http://aws.typepad.com/aws/2009/04/announcing-amazon-elastic-mapreduce.html">a post to the AWS Blog</a>.</span></p>
<h6 class="zemanta-related-title" style="font-size: 1em;">Related articles by Zemanta</h6>
<ul class="zemanta-article-ul">
<li class="zemanta-article-ul-li"><a href="http://blog.makezine.com/archive/2009/03/learn_to_develop_software_for_hadoo.html?CMP=OTC-0D6B48984890" class="broken_link">Learn to develop software for Hadoop</a> (makezine.com)</li>
<li class="zemanta-article-ul-li"><a href="http://www.businessweek.com/the_thread/techbeat/archives/2009/01/amazon_blows_aw.html?campaign_id=rss_blog_techbeat">Amazon Blows Away Fourth Quarter Earnings Expectations</a> (businessweek.com)</li>
<li class="zemanta-article-ul-li"><a href="http://perspectives.mvdirona.com/2009/03/29/GrandChallengesInDatabaseSelfManagement.aspx">Grand Challenges in Database Self-Management</a> (perspectives.mvdirona.com)</li>
<li class="zemanta-article-ul-li"><a href="http://news.cnet.com/8301-13846_3-10128773-62.html?part=rss&amp;subj=news">Cloud platforms of the future: Hadoop and Eucalyptus</a> (news.cnet.com)</li>
<li class="zemanta-article-ul-li"><a href="http://www.vnunet.com/vnunet/news/2238595/cloudera-aims-hadoop-commercial" class="broken_link">Cloudera aims to take Hadoop commercial</a> (vnunet.com)</li>
<li class="zemanta-article-ul-li"><a href="http://www.theregister.co.uk/2009/02/24/the_meta_cloud/">The Meta Cloud &#8211; Flying data centers enter fourth dimension</a> (theregister.co.uk)</li>
<li class="zemanta-article-ul-li"><a href="http://www.theregister.co.uk/2009/03/16/cloudera_hadoop_launch/">Cloudera floats commercial Hadoop distro</a> (theregister.co.uk)</li>
<li class="zemanta-article-ul-li"><a href="http://www.johnmwillis.com/open-source/free-cool-tools-for-educators/">Free Cool Tools for Educators</a> (johnmwillis.com)</li>
<li class="zemanta-article-ul-li"><a href="http://gigaom.com/2009/03/15/hadoop-focussed-startup-cloudera-raises-5-million/">Hadoop-Focussed Startup, Cloudera Raises $5 Million</a> (gigaom.com)</li>
<li class="zemanta-article-ul-li"><a href="http://r.zemanta.com/?u=http%3A//www10.nytimes.com/2009/03/17/technology/business-computing/17cloud.html%3F_r%3D5%26partner%3Drss%26amp%3Bemc%3Drss&amp;a=3806096&amp;rid=cc35c960-1691-40a0-9105-85d2e9f2278b&amp;e=f09840de83c2bd30d346c0e96c8bdf9b">Hadoop, a Free Software Program, Finds Uses Beyond Search</a> (nytimes.com)</li>
<li class="zemanta-article-ul-li"><a href="http://insidehpc.com/2009/03/17/cloudera-launched-to-offer-commercialized-hadoop/">Cloudera launched to offer commercialized Hadoop</a> (insidehpc.com)</li>
<li class="zemanta-article-ul-li"><a href="http://battellemedia.com/archives/004872.php">Cloudera</a> (battellemedia.com)</li>
<li class="zemanta-article-ul-li"><a href="http://dorai.wordpress.com/2009/03/17/linklog-streaming-data-distributed-execution-engines/">LinkLog: Streaming Data, Distributed Execution Engines</a> (dorai.wordpress.com)</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://cloudofdata.com/2009/04/amazon-tethers-balloons-for-now-attention-turns-to-crunching-data-in-the-cloud-with-elastic-mapreduce-web-service/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Sun, IBM, and the value of a comprehensive proposition</title>
		<link>http://cloudofdata.com/2009/03/sun-ibm-and-the-value-of-a-comprehensive-proposition/</link>
		<comments>http://cloudofdata.com/2009/03/sun-ibm-and-the-value-of-a-comprehensive-proposition/#comments</comments>
		<pubDate>Wed, 18 Mar 2009 11:38:34 +0000</pubDate>
		<dc:creator>Paul Miller</dc:creator>
				<category><![CDATA[Cloud computing]]></category>
		<category><![CDATA[Enterprise Computing]]></category>
		<category><![CDATA[IaaS]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Accenture]]></category>
		<category><![CDATA[Amazon Web Services]]></category>
		<category><![CDATA[Dell]]></category>
		<category><![CDATA[Don Clark]]></category>
		<category><![CDATA[EDS]]></category>
		<category><![CDATA[Grid Computing]]></category>
		<category><![CDATA[Hewlett-Packard]]></category>
		<category><![CDATA[IBM]]></category>
		<category><![CDATA[Jonathan Schwartz]]></category>
		<category><![CDATA[Larry Dignan]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Solaris]]></category>
		<category><![CDATA[Sun Microsystems]]></category>
		<category><![CDATA[Wall Street Journal]]></category>

		<guid isPermaLink="false">http://cloudofdata.com/?p=428</guid>
		<description><![CDATA[Twitter is aflutter once again this morning, this time over a Wall Street Journal suggestion that &#8216;IBM in talks to buy Sun.&#8217; I am not able to comment on the veracity of the rumour itself, but it&#8217;s clear that Sun needs to do something in order to strengthen its position in a [...]]]></description>
			<content:encoded><![CDATA[<div class="zemanta-img" style="margin: 1em; display: block;">
<div>
<dl class="wp-caption alignright" style="width: 212px;">
<dt class="wp-caption-dt"><a href="http://en.wikipedia.org/wiki/Image:Sun_Microsystems_logo.svg"><img title="Sun Microsystems" src="http://upload.wikimedia.org/wikipedia/en/thumb/c/c8/Sun_Microsystems_logo.svg/202px-Sun_Microsystems_logo.svg.png" alt="Sun Microsystems" width="202" height="87" /></a></dt>
<dd class="wp-caption-dd zemanta-img-attribution" style="font-size: 0.8em;">Image via <a href="http://en.wikipedia.org/wiki/Image:Sun_Microsystems_logo.svg">Wikipedia</a></dd>
</dl>
</div>
</div>
<p><a href="http://www.twitter.com">Twitter</a> is <a href="http://search.twitter.com/search?q=&amp;ands=IBM+Sun&amp;phrase=&amp;ors=&amp;nots=&amp;tag=&amp;lang=all&amp;from=&amp;to=&amp;ref=&amp;near=&amp;within=15&amp;units=mi&amp;since=2009-03-17&amp;until=&amp;rpp=50">aflutter</a> once again this morning, this time over a <em><a class="zem_slink" title="The Wall Street Journal" rel="homepage" href="http://www.wsj.com/">Wall Street Journal</a></em> suggestion that &#8216;<a href="http://online.wsj.com/article/SB123735970806267921.html">IBM in talks to buy Sun</a>.&#8217; I am not able to comment on the veracity of the rumour itself, but it&#8217;s clear that <a href="http://www.sun.com/">Sun</a> needs to do something in order to strengthen its position in a competitive market. Selling to <a href="http://www.ibm.com/">IBM</a> is certainly one route, but an easier one might be the provision of a more complete Sun-badged proposition.</p>
<p>Elsewhere on WSJ.com this morning, in news that seems extremely unlikely to be unconnected, <a href="http://blogs.wsj.com/digits/2009/03/18/sun-like-others-has-its-head-in-the-clouds/">Don Clark reports</a> on Sun&#8217;s</p>
<blockquote><p>&#8220;plans to offer its own cloud-style services. Sun also plans to offer software, as well as hardware, to other companies that want to build clouds.&#8221;</p></blockquote>
<p>Alongside competitive enterprise server hardware and Sun&#8217;s widely used stable of open source software (<a class="zem_slink" title="Solaris (operating system)" rel="homepage" href="http://sun.com/solaris/">Solaris</a>, <a href="http://java.com/">Java</a>, <a class="zem_slink" title="MySQL" rel="homepage" href="http://www.mysql.com">MySQL</a>, <a class="zem_slink" title="OpenOffice.org" rel="homepage" href="http://www.openoffice.org/">OpenOffice</a>, etc), this latest announcement of &#8216;Sun Cloud Storage&#8217; (equivalent to Amazon&#8217;s <a class="zem_slink" title="Amazon S3" rel="homepage" href="http://aws.amazon.com/s3">Simple Storage Service</a>, S3) and &#8216;Sun Cloud Compute&#8217; (equivalent to Amazon&#8217;s <a href="http://aws.amazon.com/ec2/">Elastic Compute Cloud</a>, EC2) should make Sun a serious player in the Cloud Computing space in a way that their abortive <a href="http://en.wikipedia.org/wiki/Sun_Grid">network.com</a> never really did.</p>
<p>So why is anyone discussing either a desire on Sun&#8217;s part to sell, or a desire on IBM&#8217;s part to consider buying?</p>
<p>I&#8217;ve greatly enjoyed the insights of Sun CEO <a href="http://www.sun.com/aboutsun/executives/schwartz/bio.jsp">Jonathan Schwartz</a>, especially as enunciated most recently on <a href="http://blogs.sun.com/jonathan/">his blog</a> in two videos discussing <a href="http://blogs.sun.com/jonathan/entry/step_one_adoption">community adoption of Sun&#8217;s open source software</a> and <a href="http://blogs.sun.com/jonathan/entry/commercial_innovation_3_of_4">the commercial models Sun deploys to monetise that community</a>. Despite Jonathan&#8217;s arguments, though, it seems to me that Sun lacks a fundamental piece of the whole; an effective and highly visible professional services arm. IBM has this. <a class="zem_slink" title="Hewlett-Packard" rel="homepage" href="http://www.hp.com">HP</a>, with the purchase of <a href="http://www.eds.com/">EDS</a>, has this. <a href="http://www.accenture.com/">Accenture</a> and gang <em>are</em> this, but nothing makes them choose to use or recommend Sun over its competitors today.</p>
<p>As Jonathan discusses in the first of the videos I pointed to (YouTube <a href="http://www.youtube.com/watch?v=Oro3faNPxGY">version</a> embedded below, in two parts), Sun has been successful in encouraging use and innovation around a suite of open source operating systems, tools and applications.</p>
<p style="text-align: center;"><iframe title="YouTube video player" class="youtube-player" type="text/html" width="425" height="344" src="http://www.youtube.com/embed/Oro3faNPxGY" frameborder="0" allowFullScreen="true"> </iframe></p>
<p style="text-align: center;"><iframe title="YouTube video player" class="youtube-player" type="text/html" width="425" height="344" src="http://www.youtube.com/embed/gsVErU22krw" frameborder="0" allowFullScreen="true"> </iframe></p>
<p>Indeed, it was little more than a year ago that the company <a href="http://www.mysql.com/news-and-events/sun-to-acquire-mysql.html">announced plans</a> to spend some $800 million in acquiring European open source web database company MySQL. The problem is that these solutions are <em>all freely downloadable from the Web</em>, and the inevitable professional services and consultancy work associated with enterprise delivery — which could generate so much revenue — goes to far more companies than just Sun.</p>
<p>Alongside the software, Sun has a competitive range of hardware offerings in the enterprise space, and sells these in competition with IBM, HP, Dell and the rest.</p>
<p>By omitting a compelling and enveloping professional services proposition, Sun damages its own ability to capitalise upon its software and hardware efforts. Potential customers download Sun software, and then run it on anything; Sun gets a very small slice of the hardware sales. Sun isn&#8217;t doing <em>badly</em> at selling hardware, but maybe a more rounded services proposition would enable them to do <em>better</em>, despite Jonathan&#8217;s points in the commercial innovation video.</p>
<p style="text-align: center;"><iframe title="YouTube video player" class="youtube-player" type="text/html" width="425" height="344" src="http://www.youtube.com/embed/WdjYndoFvcc" frameborder="0" allowFullScreen="true"> </iframe></p>
<p>With more emphasis on offering a comprehensive package of solutions — whilst not removing choice and the vibrant open source community of which Jonathan speaks — might Sun not be a more obvious choice for customers in need of services and support?</p>
<p>An acquisition might, <a href="http://blogs.zdnet.com/BTL/?p=14817">as Larry Dignan writes</a>, make sense. But there&#8217;s plenty of life left in a standalone Sun, too&#8230; <em>if</em> it can monetise more of those downloading free software or steer more of those who &#8216;just need a server&#8217; towards one with a Sun badge on the front. Professional Services are the road to both.</p>
<h6 class="zemanta-related-title" style="font-size: 1em;">Related articles by Zemanta</h6>
<ul class="zemanta-article-ul">
<li class="zemanta-article-ul-li"><a href="http://news.cnet.com/8301-10787_3-10156025-60.html?part=rss&amp;subj=news">Sun&#8217;s missing mojo: MIA until when?</a> (news.cnet.com)</li>
<li class="zemanta-article-ul-li"><a href="http://news.cnet.com/8301-1001_3-10163603-92.html?part=rss&amp;subj=news">IBM, Amazon foreshadow bevy of connecting clouds</a> (news.cnet.com)</li>
<li class="zemanta-article-ul-li"><a href="http://gigaom.com/2009/03/16/cisco%25e2%2580%2599s-data-center-moves-who-wins-who-loses/" class="broken_link">Cisco&#8217;s Data Center Moves: Who Wins, Who Loses?</a> (gigaom.com)</li>
<li class="zemanta-article-ul-li"><a href="http://gigaom.com/2009/03/16/ciscos-data-center-play-reinvents-the-server/">Cisco&#8217;s Data Center Play Reinvents The Server</a> (gigaom.com)</li>
<li class="zemanta-article-ul-li"><a href="http://r.zemanta.com/?u=http%3A//www.infoworld.com/article/08/12/10/Sun_takes_another_swing_at_cloud_computing_1.html&amp;a=2196914&amp;rid=52943ae8-2a45-4eb4-9ca8-535fc678c6f6&amp;e=cccdd59591b5d1c81f32f0f8697f15c6">Sun takes another swing at cloud computing</a> (infoworld.com)</li>
<li class="zemanta-article-ul-li"><a href="http://news.cnet.com/8301-13505_3-10192593-16.html?part=rss&amp;subj=news">Sun CEO: Open source = free advertising</a> (news.cnet.com)</li>
<li class="zemanta-article-ul-li"><a href="http://www.it-sideways.com/2009/02/sun-microsystem-business-revenues.html">Sun Microsystem Business Revenues (Latest Discovery)</a> (it-sideways.com)</li>
<li class="zemanta-article-ul-li"><a href="http://r.zemanta.com/?u=http%3A//www.infoworld.com/article/09/03/05/Schwartz-Not-worried-about-Suns-future_1.html&amp;a=3585344&amp;rid=52943ae8-2a45-4eb4-9ca8-535fc678c6f6&amp;e=53cc76b82dab7f990392f1c9e0ac56e7">Schwartz: Not worried about Sun&#8217;s future</a> (infoworld.com)</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://cloudofdata.com/2009/03/sun-ibm-and-the-value-of-a-comprehensive-proposition/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Berkman Center unveils fascinating insight into media trends, with a little Semantic Web goodness from Calais under the hood</title>
		<link>http://cloudofdata.com/2009/03/berkman-center-unveils-fascinating-insight-into-media-trends-with-a-little-semantic-web-goodness-from-calais-under-the-hood/</link>
		<comments>http://cloudofdata.com/2009/03/berkman-center-unveils-fascinating-insight-into-media-trends-with-a-little-semantic-web-goodness-from-calais-under-the-hood/#comments</comments>
		<pubDate>Wed, 11 Mar 2009 13:20:43 +0000</pubDate>
		<dc:creator>Paul Miller</dc:creator>
				<category><![CDATA[Cloud computing]]></category>
		<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[Amazon Web Services]]></category>
		<category><![CDATA[Barak Pridor]]></category>
		<category><![CDATA[Berkman Center for Internet & Society]]></category>
		<category><![CDATA[bloggers]]></category>
		<category><![CDATA[blogging]]></category>
		<category><![CDATA[blogs]]></category>
		<category><![CDATA[ClearForest]]></category>
		<category><![CDATA[Ethan Zuckerman]]></category>
		<category><![CDATA[Harvard University]]></category>
		<category><![CDATA[media]]></category>
		<category><![CDATA[Media Cloud]]></category>
		<category><![CDATA[newspaper]]></category>
		<category><![CDATA[Open Calais]]></category>
		<category><![CDATA[Stephen Schultze]]></category>
		<category><![CDATA[Thomson Reuters]]></category>
		<category><![CDATA[TV News]]></category>
		<category><![CDATA[Yochai Benkler]]></category>

		<guid isPermaLink="false">http://cloudofdata.com/?p=386</guid>
		<description><![CDATA[Harvard University&#8217;s Berkman Center for Internet &#38; Society unveiled their Media Cloud research tool today, bringing Semantic Web goodness from Thomson Reuters&#8217; Calais and affordably scalable Cloud oomph from Amazon Web Services together in powering exploration of a fascinating topic. As the press release notes, &#8220;Media Cloud was conceived by Berkman Fellow [...]]]></description>
			<content:encoded><![CDATA[<div class="zemanta-img" style="margin: 1em; display: block;">
<div>
<dl class="wp-caption alignright" style="width: 212px;">
<dt class="wp-caption-dt"><a href="http://en.wikipedia.org/wiki/Image:Harvard_Wreath_Logo_1.svg"><img title="Harvard University" src="http://upload.wikimedia.org/wikipedia/en/thumb/3/3a/Harvard_Wreath_Logo_1.svg/202px-Harvard_Wreath_Logo_1.svg.png" alt="Harvard University" width="202" height="202" /></a></dt>
<dd class="wp-caption-dd zemanta-img-attribution" style="font-size: 0.8em;">Image via <a href="http://en.wikipedia.org/wiki/Image:Harvard_Wreath_Logo_1.svg">Wikipedia</a></dd>
</dl>
</div>
</div>
<p>Harvard University&#8217;s <a href="http://cyber.law.harvard.edu/">Berkman Center for Internet &amp; Society</a> unveiled their <a href="http://www.mediacloud.org/">Media Cloud</a> research tool today, bringing Semantic Web goodness from Thomson Reuters&#8217; <a href="http://www.opencalais.com/">Calais</a> and affordably scalable Cloud oomph from <a href="http://aws.amazon.com/">Amazon Web Services</a> together in powering exploration of a fascinating topic.</p>
<p>As the <a href="http://www.opencalais.com/press-releases/harvard-berkman-centers-media-cloud-offer-insights-media-trends-opencalais">press release</a> notes,</p>
<blockquote><p>&#8220;Media Cloud was conceived by Berkman Fellow Ethan Zuckerman and Berkman Faculty Co-Director Yochai Benkler.  It was inspired by their debate over whether the blogosphere largely echoed traditional media or was instead a source for original news and democratic agenda-setting.</p>
<p>&#8216;While daily newspapers struggle for survival, political, niche and special interest blogs continue to capture consumer interest,&#8217; said Yochai Benkler, Faculty Co-Director of the Berkman Center. &#8216;In the midst of this upheaval, it is difficult to know where stories begin, who sets the agenda, and how these dramatic changes impact news coverage on the whole.  We created Media Cloud to help researchers and the public get quantitative answers to these challenging questions.&#8217;&#8221;</p></blockquote>
<p>The site itself <a href="http://www.mediacloud.org/about-2/" class="broken_link">provides more detail</a>, notably,</p>
<blockquote><p>&#8220;Print newspapers are declaring bankruptcy nationwide. High-profile blogs are proliferating. Media companies are exploring new production techniques and business models in a landscape that is increasingly dominated by the Internet. In the midst of this upheaval, it is difficult to know what is actually happening to the shape of our news. Beyond one-off anecdotes or painstaking manual content analysis, there are few ways to examine the emerging news ecosystem.&#8221;</p></blockquote>
<p>By systematically harvesting full-text content from both new and traditional news sources and passing it through Calais for entity extraction, the Berkman team is able to build a coherent and normalised pool of news that they &#8211; and others &#8211; can begin to interrogate in order to answer pressing questions as to the shifting news landscape. Today&#8217;s site is just the beginning, and the team has ambitious plans to increase the number of sources they harvest, to provide ever-richer visualisations on their site, and to expose public APIs to the underlying data in order that third parties can consume it for their own purposes.</p>
<p>Ahead of the launch I spoke with <a href="http://clearforest.com/AboutUs/ManagementTeam.asp#1" class="broken_link">Barak Pridor</a>, CEO of <a href="http://thomsonreuters.com/">Thomson Reuters</a>&#8217; <a href="http://clearforest.com/">ClearForest</a> subsidiary, and Berkman Fellow <a href="http://cyber.law.harvard.edu/people/SSchultze">Stephen Schultze</a>. Both stressed the beta nature of the site, and the fact that the team are still working to optimise a number of areas. That aside, I was impressed by some of the early results.</p>
<p>Barak pointed to the role Calais is playing in enabling the Berkman team to conduct comparisons within and between different news sources, extracting key entities (personal names, companies, places, etc) from the stream of text and normalising the various ways in which we refer to them (IBM, International Business Machines, etc).</p>
<p>Stephen discussed some of the background to the project, and was keen to emphasise the <em>platform</em> nature of their activity; although focussed on a destination web site today, the intention is to expose much of the data via a series of APIs that third parties will be able to consume. In the context of recent comparable moves from the <a href="http://open.blogs.nytimes.com/2009/02/25/announcing-the-times-newswire-api/"><em>New York Times</em></a> and Britain&#8217;s <a href="http://www.guardian.co.uk/media/2009/mar/10/guardian-open-platform"><em>Guardian</em></a>, this is clearly to be welcomed and will accelerate the innovation around this incomparable pool of content.</p>
<p>Whilst the site does not currently expose the full richness of the data being harvested, it is already possible to see a number of interesting <a href="http://www.mediacloud.org/visualizations/" class="broken_link">visualisations</a>.</p>
<p style="text-align: center;"><a title="bbc-wsj-fox-maps" href="http://www.mediacloud.org/visualizations/?tagset=13&amp;chart_is_log=true&amp;pivotterm=mugabe&amp;viz_type=map&amp;media_source[1]=BBC&amp;media_source[2]=Wall+Street+Journal&amp;media_source[3]=FOX+News&amp;media_id[1]=1094&amp;media_id[2]=1150&amp;media_id[3]=1092&amp;submit=Submit+Query" class="broken_link"><img class="attachment wp-att-388 aligncenter" style="margin: 8px;" src="http://cloudofdata.com/wp-content/uploads/2009/03/bbc-wsj-fox-maps.png" alt="bbc-wsj-fox-maps" width="378" height="660" /></a></p>
<p>The maps above, for example, compare the coverage devoted to all the world&#8217;s countries by three very different news organisations: the BBC, the Wall Street Journal and FOX News. The apparent non-coverage of Iran is surprising, and suggests an algorithm still in need of tweaking. That apart, it&#8217;s striking to note the very similar spread of coverage. European snobbishness about the insularity of US news &#8211; and the farcical nature of FOX &#8216;news&#8217; &#8211; may not survive exposure to this data, although we <em>can</em>, of course, reassure ourselves that &#8216;coverage&#8217; doesn&#8217;t always equate to &#8216;insight,&#8217; &#8216;analysis&#8217; or &#8216;truth.&#8217; As more data become available, even those preconceived notions may face an overdue battering.</p>
<p>The system is already capable of addressing more complex questions, and exploring temporal aspects of the ways in which stories break, spread and relate to one another across different media. Stephen shared a number of evolving visualisations that made it possible for me to explore several trends at once, and I look forward to these tools arriving on the site.</p>
<p>The project is directly funded by the Berkman Center at present, and a desire to enrich and stabilise the platform means that significant addition of new news resources may be some months off. I, for one, look forward to seeing more non-US resources such as the UK&#8217;s <em>Financial Times</em> and <em>Guardian</em>, or their international equivalents.</p>
<p>The project&#8217;s APIs are also being finalised at this point and will be released in due course, along with the source code upon which Media Cloud runs. It will be interesting to see the balance between API access and software download at that point.</p>
<p>Media Cloud represents an excellent example of the uses to which the technologies I cover can be put. It&#8217;s not a Semantic Web project. It&#8217;s not a Cloud Computing project. It&#8217;s an intriguing exploration that puts those technologies (invisibly) to work in helping to get the job done. That&#8217;s the way it should be.</p>
<p>And, as Stephen commented when asked what he was most interested to see,</p>
<blockquote><p>&#8220;the most amazing stuff will be what other folks use it for in asking and answering their own questions.&#8221;</p></blockquote>
<p>Indeed.</p>
<p>Thomson Reuters is already supporting this project by contributing Calais and some technical expertise. I wonder whether some of our more enlightened news organisations might like to help the Berkman team get more data in there, faster? As more newspapers close each week, and as we (supposedly) become ever-more insular in the choices we make about the news we consume, there&#8217;s a real need to understand the ways in which the media are changing. Many eyes on this data set is one sure way to ensure that the commentary moving forward is informed and informative.</p>
<h6 class="zemanta-related-title" style="font-size: 1em;">Related articles by Zemanta</h6>
<ul class="zemanta-article-ul">
<li class="zemanta-article-ul-li"><a href="http://crooksandliars.com/susie-madrak/lawsuit-determine-fair-use-blog-links">Lawsuit to Determine Fair Use for Blog Links, Headlines</a> (crooksandliars.com)</li>
<li class="zemanta-article-ul-li"><a href="http://www.paidcontent.org/entry/419-newspapers-suddenly-adapt-to-socal-media-nearly-60-percent-offer-user-g/">Newspapers Suddenly Adapt To Socal Media; Nearly 60 Percent Offer User-Gen Content</a> (paidcontent.org)</li>
<li class="zemanta-article-ul-li"><a href="http://www.mathewingram.com/work/2008/11/26/yes-twitter-is-a-source-of-journalism/">Yes, Twitter is a source of journalism</a> (mathewingram.com)</li>
<li class="zemanta-article-ul-li"><a href="http://www.xconomy.com/boston/2009/01/13/boston-journalists-launch-globalpostcom-alternative-to-traditional-medias-shrinking-international-coverage/">Boston Journalists Launch GlobalPost.com, Alternative to Traditional Media&#8217;s Shrinking International Coverage</a> (xconomy.com)</li>
<li class="zemanta-article-ul-li"><a href="http://blogs.harvardbusiness.org/now-new-next/2009/02/whats-the-best-business-model.html">What&#8217;s the Best Business Model for Newspapers?</a> (blogs.harvardbusiness.org)</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://cloudofdata.com/2009/03/berkman-center-unveils-fascinating-insight-into-media-trends-with-a-little-semantic-web-goodness-from-calais-under-the-hood/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>My podcast conversation about Cloud Computing with Nick Carr</title>
		<link>http://cloudofdata.com/2009/02/my-podcast-conversation-with-about-cloud-computing-with-nick-carr/</link>
		<comments>http://cloudofdata.com/2009/02/my-podcast-conversation-with-about-cloud-computing-with-nick-carr/#comments</comments>
		<pubDate>Mon, 02 Feb 2009 08:44:39 +0000</pubDate>
		<dc:creator>Paul Miller</dc:creator>
				<category><![CDATA[Cloud computing]]></category>
		<category><![CDATA[Podcast]]></category>
		<category><![CDATA[Amazon Web Services]]></category>
		<category><![CDATA[Big Switch]]></category>
		<category><![CDATA[Cloud]]></category>
		<category><![CDATA[does IT matter]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[nicholas carr]]></category>
		<category><![CDATA[Nick Carr]]></category>
		<category><![CDATA[the big switch]]></category>
		<category><![CDATA[utility computing]]></category>

		<guid isPermaLink="false">http://cloudofdata.com/?p=303</guid>
		<description><![CDATA[Nick Carr&#8217;s most recent book, The Big Switch [UK, US], was published in January of 2008. Whether by luck or judgement, he caught the meme of the moment and became closely associated with growing interest in the notion of &#8216;Cloud Computing&#8217; throughout last year. The paperback edition of Nick&#8217;s book has just been published, and [...]]]></description>
			<content:encoded><![CDATA[<p><a title="bigswitch-cover" href="http://nicholasgcarr.com/bigswitch/"><img class="attachment wp-att-311 alignright" style="margin: 6px;" src="http://cloudofdata.com/wp-content/uploads/2009/01/bigswitch-cover.jpg" alt="bigswitch-cover" width="160" height="243" /></a>Nick Carr&#8217;s most recent book, <em><a class="zem_slink" title="The Big Switch" rel="homepage" href="http://www.nicholasgcarr.com/bigswitch/">The Big Switch</a></em> [<a href="http://www.amazon.co.uk/gp/product/0393333949?ie=UTF8&amp;tag=thinkingabout-21&amp;linkCode=as2&amp;camp=1634&amp;creative=19450&amp;creativeASIN=0393333949">UK</a><img style="border:none !important; margin:0px !important;" src="http://www.assoc-amazon.co.uk/e/ir?t=thinkingabout-21&amp;l=as2&amp;o=2&amp;a=0393333949" border="0" alt="" width="1" height="1" />, <a href="http://www.amazon.com/gp/product/0393333949?ie=UTF8&amp;tag=cloofdat-20&amp;linkCode=as2&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0393333949">US</a><img style="border:none !important; margin:0px !important;" src="http://www.assoc-amazon.com/e/ir?t=cloofdat-20&amp;l=as2&amp;o=1&amp;a=0393333949" border="0" alt="" width="1" height="1" />], was published in January of 2008. Whether by luck or judgement, he caught the <a class="zem_slink" title="Meme" rel="wikipedia" href="http://en.wikipedia.org/wiki/Meme">meme</a> of the moment and became closely associated with growing interest in the notion of &#8216;<a class="zem_slink" title="Cloud computing" rel="wikipedia" href="http://en.wikipedia.org/wiki/Cloud_computing">Cloud Computing</a>&#8217; throughout last year.</p>
<p>The paperback edition of Nick&#8217;s book has just been published, and includes an additional chapter that introduces &#8216;The Cloud 20&#8217;: representative examples of the different ways in which companies big and small, new and old are engaging with the Cloud Computing phenomenon.</p>
<p>I was delighted when Nick agreed to speak with me over the weekend, and <a href="http://blogs.talis.com/nodalities/2009/02/nick-carr-talks-about-cloud-computing-and-the-big-switch.php">the resulting podcast has just been released so you can listen in on our conversation</a>.</p>
<p>As well as talking about the book, which most readers of this blog have doubtless already read, we looked more broadly at some of the trends shaping consumer and corporate attitudes to the Cloud. How does the current economic climate, for example, affect the rate of adoption of <a class="zem_slink" title="Utility computing" rel="wikipedia" href="http://en.wikipedia.org/wiki/Utility_computing">utility computing</a> or of Cloud-based applications such as those from Google?</p>
<p><a href="http://blogs.talis.com/nodalities/2009/02/nick-carr-talks-about-cloud-computing-and-the-big-switch.php">Have a listen</a>, and let us know what <em>you</em> think&#8230;</p>
<h6 class="zemanta-related-title" style="font-size: 1em;">Related articles by Zemanta</h6>
<ul class="zemanta-article-ul">
<li class="zemanta-article-ul-li"><a href="http://www.techpluto.com/cloud-computing-characteristics/">What kinda apps are best suited for &#8216;Cloud deployment&#8217; : 4 Solutions</a> (techpluto.com)</li>
<li class="zemanta-article-ul-li"><a href="http://www.johnmwillis.com/other/alternate-cloud-20-list/">Alternate Cloud 20 List</a> (johnmwillis.com)</li>
<li class="zemanta-article-ul-li"><a href="http://q-ontech.blogspot.com/2009/01/national-higher-ed-computing-cloud.html">A National Higher Ed Computing Cloud</a> (q-ontech.blogspot.com)</li>
<li class="zemanta-article-ul-li"><a href="http://www.fiberevolution.com/2008/10/review-the-big-switch.html">Review: The Big Switch</a> (fiberevolution.com)</li>
<li class="zemanta-article-ul-li"><a href="http://insidehpc.com/2009/01/15/eli-lilly-using-cloud-computing/">Eli Lilly using cloud computing</a> (insidehpc.com)</li>
<li class="zemanta-article-ul-li"><a href="http://www.roughtype.com/archives/2009/01/the_cloud_20.php">The Cloud 20</a> (roughtype.com)</li>
<li class="zemanta-article-ul-li"><a href="http://www.johnmwillis.com/cloud-computing/cloud-cafe-30-what-is-a-cloud-from-the-beginning/">Cloud Cafe #30 What is a Cloud from the Beginning</a> (johnmwillis.com)</li>
<li class="zemanta-article-ul-li"><a href="http://www.elasticvapor.com/2008/11/overcast-conversations-on-cloud.html">Overcast: Conversations on Cloud Computing</a> (elasticvapor.com)</li>
<li class="zemanta-article-ul-li"><a href="http://www.elasticvapor.com/2008/11/eweek-podcast-evolution-of-cloud.html">eweek Podcast: The Evolution of Cloud Computing</a> (elasticvapor.com)</li>
<li class="zemanta-article-ul-li"><a href="http://www.elasticvapor.com/2009/01/unified-ontology-for-cloud-computing.html">A Unified Ontology for Cloud Computing</a> (elasticvapor.com)</li>
<li class="zemanta-article-ul-li"><a href="http://news.cnet.com/Year-in-review-The-cloud-soars/2009-7345_3-6248570.html?part=rss&amp;subj=news">Year in review: The &#8216;cloud&#8217; soars</a> (news.cnet.com)</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://cloudofdata.com/2009/02/my-podcast-conversation-with-about-cloud-computing-with-nick-carr/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Powered by Cloud conference, London</title>
		<link>http://cloudofdata.com/2009/01/powered-by-cloud-conference-london/</link>
		<comments>http://cloudofdata.com/2009/01/powered-by-cloud-conference-london/#comments</comments>
		<pubDate>Mon, 19 Jan 2009 10:26:49 +0000</pubDate>
		<dc:creator>Paul Miller</dc:creator>
				<category><![CDATA[Cloud computing]]></category>
		<category><![CDATA[Amazon]]></category>
		<category><![CDATA[Amazon Web Services]]></category>
		<category><![CDATA[AWS]]></category>
		<category><![CDATA[Business]]></category>
		<category><![CDATA[Business model]]></category>
		<category><![CDATA[Elastichosts]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[Powered By Cloud]]></category>
		<category><![CDATA[Rightscale]]></category>
		<category><![CDATA[salesforce.com]]></category>
		<category><![CDATA[Sam Sethi]]></category>
		<category><![CDATA[Simon Wardley]]></category>

		<guid isPermaLink="false">http://cloudofdata.com/?p=261</guid>
		<description><![CDATA[Event organisers are feeling the squeeze as advertising, travel and &#8216;training&#8217; budgets present easy targets to Finance Directors seeking to balance their books in the current economic climate. Amidst announcement after announcement of cancelled and radically down-sized trade shows and conferences, one bright spot in the event management space appears to be [...]]]></description>
			<content:encoded><![CDATA[<div class="zemanta-img">
<div>
<dl class="wp-caption alignright" style="margin: 1em; float: right; display: block; width: 212px;">
<dt class="wp-caption-dt"><a href="http://commons.wikipedia.org/wiki/Image:Houses.of.parliament.overall.arp.jpg"><img title="The British Houses of Parliament, London" src="http://upload.wikimedia.org/wikipedia/commons/thumb/e/e4/Houses.of.parliament.overall.arp.jpg/202px-Houses.of.parliament.overall.arp.jpg" alt="The British Houses of Parliament, London" width="202" height="152" /></a></dt>
<dd class="wp-caption-dd zemanta-img-attribution" style="font-size: 0.8em;">Image via <a href="http://commons.wikipedia.org/wiki/Image:Houses.of.parliament.overall.arp.jpg">Wikipedia</a></dd>
</dl>
</div>
</div>
<p>Event organisers are feeling the squeeze as advertising, travel and &#8216;training&#8217; budgets present easy targets to Finance Directors seeking to balance their books in the current economic climate.</p>
<p>Amidst announcement after announcement of cancelled and radically down-sized trade shows and conferences, one bright spot in the event management space appears to be anything related to &#8216;The Cloud.&#8217; There is an understandable perception that Cloud Computing will save money, so that ticks boxes back at HQ. There is also a perception that a sound understanding of the Cloud (and yes, it&#8217;s more than simply outsourcing your Data Centre to save some money) will position companies to come out of this economic downturn extremely well placed to exploit new opportunities and grow.</p>
<p>One of those events to cross my radar just before Christmas was <a href="http://www.poweredbycloud.com/">Powered By Cloud</a>, which is being held in London &#8211; just around the corner from the UK Parliament &#8211; on 2 and 3 February. According to the site, attendees will learn:</p>
<blockquote><p>&#8220;What does [the Cloud] mean for your business model?</p>
<p>How fast will this happen?</p>
<p>How can I make money from Cloud Computing?</p>
<p>What technologies will be used?</p>
<p>What are the implications for consumers, privacy and security?</p>
<p>What is the future of Cloud Computing?&#8221;</p></blockquote>
<p>Speakers on the programme look like a nice mix of solutions providers, customers and analysts, and include <a href="http://aws.amazon.com/">Amazon</a>&#8217;s <a href="http://www.linkedin.com/in/simonebrunozzi">Simone Brunozzi</a>, <a href="http://www.linkedin.com/pub/1/710/b78">Dave Armstrong</a> from <a class="zem_slink" title="Google" rel="homepage" href="http://google.com">Google</a>, <a href="http://www.linkedin.com/pub/0/776/6a5">Woodson Martin</a> from <a class="zem_slink" title="Salesforce.com" rel="homepage" href="http://www.salesforce.com/">Salesforce</a>, <a href="http://www.rightscale.com/">Rightscale</a> CEO <a href="http://www.linkedin.com/pub/0/b1/b71">Michael Crandell</a>, <a href="http://www.elastichosts.com/">Elastichosts</a> CEO <a href="http://www.linkedin.com/in/richardjdavies">Richard Davies</a>, <a href="http://www.linkedin.com/in/simonwardley">Simon Wardley</a> and <a href="http://www.linkedin.com/in/samsethi">Sam Sethi</a>.</p>
<p>If you&#8217;ve got a couple of days to spare, can convince the Finance Director to agree (tell &#8217;em Cloud Computing saves money&#8230;), and can get to London then this looks like a pretty good investment for that diminished travel budget. It <em>might</em> even be worth enduring Heathrow to reach.</p>
<p>And, thanks to Philip Low at event organisers <a href="http://broad-group.com/">BroadGroup</a>, here&#8217;s something that might even make the Finance Director smile&#8230; If you use discount code &#8216;<strong>SPKR</strong>&#8217; when you register, you can get in cheaper and save even more money. Enjoy!</p>
<h6 class="zemanta-related-title" style="font-size: 1em;">Related articles by Zemanta</h6>
<ul class="zemanta-article-ul">
<li class="zemanta-article-ul-li"><a href="http://arnoldit.com/wordpress/2008/11/04/clouds-merge-search-challenge-emerges/">Clouds Merge, Search Challenge Emerges</a></li>
<li class="zemanta-article-ul-li"><a href="http://news.cnet.com/8301-1001_3-10143315-92.html?part=rss&amp;subj=news">Salesforce.com rolls out Service Cloud</a></li>
<li class="zemanta-article-ul-li"><a href="http://www.infoworld.com/article/08/12/10/Sun_takes_another_swing_at_cloud_computing_1.html">Sun takes another swing at cloud computing</a></li>
<li class="zemanta-article-ul-li"><a href="http://news.cnet.com/8301-1001_3-10125537-92.html?part=rss&amp;subj=news">Nebulous cloud computing</a></li>
<li class="zemanta-article-ul-li"><a href="http://news.zdnet.com/2424-9595_22-255222.html">HP dismisses cloud &#8216;hype&#8217;</a></li>
<li class="zemanta-article-ul-li"><a href="http://r.zemanta.com/?u=http%3A//www.guardian.co.uk/commentisfree/2008/nov/05/computers-clouds-storage-servers-technology&amp;a=1703847&amp;rid=866e725a-5351-4376-88de-053368b55712&amp;e=9b29d9c255c0e0952aa1f5e88cc2b6c2">Editorial: The trend for storing information on remote servers known as &#8216;clouds&#8217; is a good thing</a></li>
<li class="zemanta-article-ul-li"><a href="http://insidehpc.com/2009/01/15/eli-lilly-using-cloud-computing/">Eli Lilly using cloud computing</a></li>
<li class="zemanta-article-ul-li"><a href="http://blogs.ft.com/techblog/2009/01/is-the-pentagon-ready-to-stick-its-head-in-the-cloud/">Is the Pentagon ready to stick its head in the cloud?</a></li>
<li class="zemanta-article-ul-li"><a href="http://ericbrown.com/top-issues-for-cios.htm">Top Issues for CIO&#8217;s</a></li>
<li class="zemanta-article-ul-li"><a href="http://www.elasticvapor.com/2008/10/someone-say-recession-not-in-cloud.html">Someone say Recession? Not in the Cloud.</a></li>
<li class="zemanta-article-ul-li"><a href="http://freelanceswitch.com/the-business-of-freelancing/6-ways-to-help-your-business-weather-the-economic-storm/">6 Ways to Help Your Business Weather the Economic Storm</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://cloudofdata.com/2009/01/powered-by-cloud-conference-london/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Amazon Public Data Sets bring the Cloud of Data closer</title>
		<link>http://cloudofdata.com/2008/12/amazon-public-data-sets-bring-the-cloud-of-data-closer/</link>
		<comments>http://cloudofdata.com/2008/12/amazon-public-data-sets-bring-the-cloud-of-data-closer/#comments</comments>
		<pubDate>Tue, 16 Dec 2008 15:32:08 +0000</pubDate>
		<dc:creator>Paul Miller</dc:creator>
				<category><![CDATA[Cloud computing]]></category>
		<category><![CDATA[Linked Data]]></category>
		<category><![CDATA[PaaS]]></category>
		<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[Amazon EC2]]></category>
		<category><![CDATA[Amazon S3]]></category>
		<category><![CDATA[Amazon Web Services]]></category>
		<category><![CDATA[Amazon.com]]></category>
		<category><![CDATA[Cloud of Data]]></category>
		<category><![CDATA[Google App Engine]]></category>
		<category><![CDATA[Licensing]]></category>
		<category><![CDATA[Open Data]]></category>
		<category><![CDATA[Open Data Commons]]></category>
		<category><![CDATA[Public Domain]]></category>
		<category><![CDATA[Talis]]></category>

		<guid isPermaLink="false">http://cloudofdata.com/?p=199</guid>
		<description><![CDATA[It began, as so many things do these days, with an idle tweet. On 21 November, Amazon Web Services&#8217; Deepak Singh pointed to a new page describing the company&#8217;s &#8216;Public Data Sets on Amazon Web Services.&#8217; Lidija Davis covered the news for ReadWriteWeb two days later and on 4 December [...]]]></description>
			<content:encoded><![CDATA[<div class="zemanta-img">
<div>
<dl class="wp-caption alignright" style="margin: 1em; float: right; display: block; width: 210px;">
<dt class="wp-caption-dt"><a href="http://www.crunchbase.com/company/amazon"><img title="Image representing Amazon as depicted in Crunc..." src="http://www.crunchbase.com/assets/images/resized/0000/3898/3898v1-max-450x450.jpg" alt="Image representing Amazon as depicted in Crunc..." width="200" height="89" /></a></dt>
<dd class="wp-caption-dd zemanta-img-attribution" style="font-size: 0.8em;">Image via <a href="http://www.crunchbase.com">CrunchBase</a>, source unknown</dd>
</dl>
</div>
</div>
<p>It began, as so many things do these days, with an idle <a href="http://en.wikipedia.org/wiki/Twitter">tweet</a>.</p>
<p>On 21 November, <a href="http://aws.amazon.com/">Amazon Web Services</a>&#8217; <a href="http://mndoci.com/blog/about/" class="broken_link">Deepak Singh</a> <a href="http://twitter.com/mndoci/status/1016646762">pointed</a> to a new page describing the company&#8217;s &#8216;<a href="http://aws.amazon.com/publicdatasets/">Public Data Sets on Amazon Web Services</a>.&#8217;</p>
<p><a href="http://www.readwriteweb.com/about_Lidija.php">Lidija Davis</a> <a href="http://www.readwriteweb.com/archives/amazon_web_services_seeks_publ.php">covered the news</a> for ReadWriteWeb two days later and on 4 December Amazon issued its <a href="http://phx.corporate-ir.net/phoenix.zhtml?c=176060&amp;p=irol-newsArticle&amp;ID=1232302&amp;highlight=">formal press release</a>, prompting a flurry of coverage from Mike Arrington at <a href="http://www.techcrunch.com/2008/12/04/amazon-launches-public-data-sets-to-ease-research/">TechCrunch</a>, Larry Dignan at <a href="http://blogs.zdnet.com/BTL/?p=11081">ZDNet</a>, Krishnan Subramanian at <a href="http://www.cloudave.com/link/amazon-tries-to-lure-scientific-community-into-the-clouds">CloudAve</a>, and many others.</p>
<p>Alongside broader discussion of this move, members of the <a href="http://www.w3.org/">W3C</a>-backed <a href="http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData">Linking Open Data project</a> delved into the synergies via their <a href="http://lists.w3.org/Archives/Public/public-lod/">public mailing list</a> and Linking Open Data enthusiast <a class="zem_slink" title="Kingsley Idehen" rel="crunchbase" href="http://www.crunchbase.com/person/kingsley-idehen">Kingsley Idehen</a>&#8217;s <a href="http://www.openlinksw.com/">company</a> issued a <a href="http://www.earthtimes.org/articles/show/openlink-bolsters-semantic-web-vision,648977.shtml">press release</a> suggesting ways in which their products might fit within this shifting data landscape.</p>
<p>So what have Amazon done, what does it mean, and how does it &#8216;bring the Cloud of Data closer&#8217; as the title of this post suggests?</p>
<p>Amazon&#8217;s <a href="http://aws.amazon.com/publicdatasets/">web page</a> describes their offer quite succinctly:</p>
<blockquote><p>&#8220;Public Data Sets on <span class="caps">AWS</span> provides a centralized repository of public data sets that can be seamlessly integrated into <span class="caps">AWS</span> cloud-based applications.  <span class="caps">AWS</span> is hosting the public data sets at no charge for the community, and like all <span class="caps">AWS</span> services, users pay only for the compute and storage they use for their own applications.&#8221;</p></blockquote>
<p>As Krishnan noted in his post,</p>
<blockquote><p>&#8220;By doing this, Amazon is helping research community save money on storage and  bandwidth costs associated with assessing these public data from any EC2  instances they use in their research. When the data in question is in hundreds  of terabytes or petabytes, we are talking about huge cost savings here.&#8221;</p></blockquote>
<p>In addition, OpenLink&#8217;s <a href="http://www.earthtimes.org/articles/show/openlink-bolsters-semantic-web-vision,648977.shtml">press release</a> gives an indication of the efficient manner in which services and data <em>already hosted by Amazon</em> can be plugged together as needed:</p>
<blockquote><p>&#8220;As a vital contribution to the momentum behind the burgeoning Web of Linked Data, [OpenLink's product] Virtuoso provides a simple deployment mechanism for highly integrated knowledge bases emerging from the Linking Open Data community. For example, it is now possible to deploy personal or service-specific renditions of <a href="http://dbpedia.org/About">DBpedia</a> within 1.5 hours, compared to an 8 &#8211; 22 hour effort when performed from scratch.&#8221;<br />
(my links)</p></blockquote>
<p>By offering free hosting for public data, then, Amazon are doing the wider community a huge service. Much of the data there today is reasonably readily available from other sources, so the biggest immediate benefits are those of speed and cost outlined above by Krishnan and OpenLink. For anyone already using Amazon&#8217;s Web Services to power their applications, or thinking of doing so, this is yet another reason to consider Amazon.</p>
<p><a class="zem_slink" title="Harvard Medical School" rel="homepage" href="http://hms.harvard.edu/">Harvard Medical School</a>&#8217;s Dr. Peter Tonellato was quoted in Amazon&#8217;s release, and he is unlikely to be alone:</p>
<blockquote><p>&#8220;<span class="ccbnTxt">Public Data Sets on AWS will enable me and many of my colleagues to collaborate with each other by sharing our commonly used data sets, research environments and tools. We can set up a controlled environment in minutes, run our computational analysis for a couple of hours, and shut down the environment. Our results are completely repeatable. I only pay for the compute time I use, and more importantly I can spend more time focusing on research, not downloading and setting up computational infrastructure.</span>&#8221;</p></blockquote>
<p>The bigger long-term contribution of this Amazon initiative may actually lie with data that are difficult or impossible to find online today. In a previous existence at the <a href="http://ads.ahds.ac.uk/">Archaeology Data Service</a> (ADS), for example, my colleagues and I were always being contacted by individuals and organisations with data that they <em>wanted</em> to see online; individuals and organisations that lacked the skills, resources or mandate to mount and maintain the data themselves. How many of those organisations will <a href="http://aws.amazon.com/publicdatasets/#3">beat a path to Amazon&#8217;s door</a> now&#8230; and what sort of resource might we see emerge as a result?</p>
<p>However&#8230;</p>
<p>Krishnan concludes his post with a reality-check, commenting:</p>
<blockquote><p>&#8220;this data stored on AWS servers are useful only if the researchers use Amazon  EC2 for their computing needs&#8230; even if  they could tap into it from external platforms, it doesn’t mean much if these public datasets are  accessible using some kind of API from their original source itself.&#8221;</p></blockquote>
<p>In other words, much (most? all?) of the advantage Amazon is offering evaporates if developers then have to pull the hosted data off Amazon&#8217;s servers and into their own applications running locally or via a competing Cloud provider such as Google.</p>
<p>Although the way in which it is recognised and monetised is finally shifting, data is still valuable, and Amazon (and others) clearly recognise the benefits of enticing users to entrust data to <em>their</em> offering, whilst (almost) imperceptibly making it that little bit more painful to use the data somewhere else.</p>
<p>Kingsley Idehen is <a href="http://www.earthtimes.org/articles/show/openlink-bolsters-semantic-web-vision,648977.shtml">quoted</a> as saying,</p>
<blockquote><p>&#8220;The Web&#8217;s potential as a globally distributed information space that plugs into disparate databases has never been in question. What has remained unclear is how a federated Web of linked databases would be delivered in a manner consistent with the Web&#8217;s core architecture, without compromising its simplicity.&#8221;</p></blockquote>
<p>It is in moving us toward this open vision that Amazon&#8217;s offering (although undoubtedly an important step along the way) is ultimately lacking. For that, we may well require the open and linked approach of Semantic Web offerings from companies such as <a href="http://www.talis.com/platform/">Talis</a> and Kingsley&#8217;s OpenLink. These recognise the futility of expecting all data to migrate to a single service provider, whilst still ensuring that those on the &#8216;inside&#8217; may gain the benefits of proximity on the network, pre-computation of certain indices, etc. Amazon and its services clearly have a place within that emerging ecosystem, but it is a place that they will need to share with others.</p>
<p>The worthwhile philanthropic aspects of Amazon&#8217;s announcement apart, the company is certainly doing its part to evangelise the benefits of moving data to the Cloud, and this is to be wholeheartedly welcomed.</p>
<p>CIOs are recognising the benefits of Cloud-based computation, and their resistance to the loss of control implied by individual cost centres&#8217; embracing of SaaS solutions such as Salesforce is diminishing. The proposition of accessing <em>data</em> in the Cloud, at will, is even more profound, and the benefits to be gained require careful and compelling explanation in the face of inevitable fears regarding issues such as data integrity.</p>
<p>Showing everyone the benefits to be gained in sharing disparate <em>public</em> data sets is one more step along the way to widespread acceptance of the value in easing restrictions over access to more sensitive resources.</p>
<h6 class="zemanta-related-title" style="font-size: 1em;">Related articles by Zemanta</h6>
<ul class="zemanta-article-ul">
<li class="zemanta-article-ul-li"><a href="http://www.cloudave.com/link/the-evolution-of-an-all-encompassing-world-of-clouds">The evolution of an all encompassing world of clouds</a></li>
<li class="zemanta-article-ul-li"><a href="http://www.readwriteweb.com/archives/amazon_web_services_bigger_than_amazon.php">Amazon Web Services: Bigger Than Amazon</a></li>
<li class="zemanta-article-ul-li"><a href="http://www.xconomy.com/seattle/2008/12/04/public-data-sets-go-on-amazons-cloud/">Public Data Goes on Amazon&#8217;s Cloud</a></li>
<li class="zemanta-article-ul-li"><a href="http://blogs.ft.com/techblog/2008/12/the-amazon-cloud-no-longer-a-mid-altantic-kludge/">The Amazon Cloud: no longer a mid-Altantic kludge</a></li>
<li class="zemanta-article-ul-li"><a href="http://www.cloudave.com/link/amazon-tries-to-lure-scientific-community-into-the-clouds">Amazon Tries to Lure Scientific Community into the Clouds</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://cloudofdata.com/2008/12/amazon-public-data-sets-bring-the-cloud-of-data-closer/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
	</channel>
</rss>

