Paul Miller

The Cloud of Data


Data Market Chat: Hjálmar Gíslason discusses DataMarket.com

With Iceland’s DataMarket.com, founder Hjálmar Gíslason is on his fourth startup and ready to expand overseas. Focused upon becoming “Google for numbers,” DataMarket concerns itself with collecting and providing access to quantitative data: numbers from governments, international agencies, and commercial providers around the world.

Alongside the business of collecting data and making it available for download, DataMarket has invested in providing tools with which users can visualize data (typically in the form of a graph) and even compare results from diverse sources. Hjálmar sees these tools as part of a strategy to ensure that it is

“more desirable to use data on DataMarket than at the source.”

Hjálmar also discusses his view that four characteristics of data make it profitably sellable: proprietariness, timeliness, analysis, and curation.

Have a listen to learn more about DataMarket, and to hear Hjálmar’s thoughts on an industry segment that his company has done much to shape. And check back on Tuesday for the next podcast in the series: Ian Davis of Kasabi.

Following up on a blog post that I wrote at the start of 2012, this is the third in a series of podcasts with key stakeholders in the emerging category of Data Markets. Other conversations, all of which will be published here, have been scheduled with AggData, BuzzData, Factual, Infochimps, Kasabi, and Microsoft. I am still adding conversations to the series, and intend to talk with more companies and with analysts and investors with insight to share. 


CloudCamp London: the Big Data Special

The CloudCamp unconference returned to London for the 14th time this evening, regaling a capacity crowd in the Crypt below Clerkenwell’s St James Church with several hours of discussion and debate on the somewhat elusive topic of ‘Big Data’.

Rather rough notes of the proceedings follow, after the break.


TOSCA may prove a prescient name for new cloud standards effort

Last week, open standards body OASIS unveiled yet another shiny new standards effort. The OASIS Topology and Orchestration Specification for Cloud Applications (TOSCA) Technical Committee hopes to make it “easier to deploy cloud applications without vendor lock-in,” and to support moving from one cloud to another. The usual suspects — the likes of IBM, CA, and Cisco — are on board. The usual holdouts — Google and Amazon, of course — are not. So what is TOSCA trying to achieve? How does it fit alongside all the dead, dying, or ponderously deliberating cloud standardisation efforts that have gone before? And without the giants of the cloud, is there really any point bothering?

As I’ve probably mentioned before, involvement in various national and international standardisation efforts played a big part in my early career. I went to the working group meetings in odd (but often beautiful) locations. I participated in the conference calls. I engaged on the mailing lists. I drafted and edited and reviewed the documents. I completely buy into the idea that there is a place for foundational standards, developed through consensus-building and maintained for the long haul by organisations that stand apart from the vested interests and their competing agendas.

I also believe that there’s a time and a place for these standardisation efforts. Do it too soon, and we end up ossifying something that needs to be in a state of flux. When you don’t know what the best way to prepare a meal is, it’s too soon to print the recipe book. We need to try different approaches, and we need to be able to throw away the attempts that didn’t work out. More worryingly, standardisation efforts can be used for political ends. They can be little more than a rod with which to beat the (usually dominant) competition. At best a distraction, or a talking shop for those unwilling or unable to just get on and do something. At worst, one amongst a toolchest of dirty tricks in a broader war for hearts, minds, and — ultimately — wallets.

The cloud market is a fascinating place. There are leaders and there are followers. There is innovation, and there is competition. There is agreement, and there is debate. For all the rhetoric, and all the posturing, we really don’t yet know the right answer to many of the cloud’s questions.

Maybe TOSCA and the Open Data Center Alliance and IEEE and the rest are still too early, and should be content to let the market thrash out a few more of these issues before anyone tries to write anything down? And when it is time to write some stuff down, let’s make sure we focus on specific, finite, tangible, atomic tasks rather than “the cloud.” As Dave Roberts commented in regard to TOSCA’s scope:

“That goal is so large, that I think it’s probably unbounded. When problems get unbounded, the best you can ever hope to achieve is to solve a large enough subset of the problem that the solution is still interesting. If you can’t achieve that, people ignore the solution because it fundamentally doesn’t help them. There is always an ‘interesting’ part of the problem space that they have to solve a different way, and that undercuts the use of the partial ‘solution.’”

And as for Tosca? Things didn’t end well for her, did they? Might TOSCA’s fate, too, be sealed?


Data Market Chat: Chris Hathaway discusses AggData

Chris Hathaway sees basic location information scattered across the websites of hundreds, or thousands, of coffee shop chains, hotel groups, and fast food joints, but argues that it’s almost impossible to do anything more sophisticated with the data than find your closest Starbucks. His company, AggData, is attempting to fill what he sees as a gap in the market: scraping addresses and other facts off company websites to create simple files of store locations that can then be enriched with coordinate data and sold.

Customers for this data include competitors, market researchers, consultants, and even the companies themselves; as is so often the case, it can be easier to buy data on store locations from a third party than to find the authoritative sources within your own organisation. AggData is strongest in the US today, but also offers a growing body of data for other countries. Although the data files are structurally simple, Chris sees plenty of opportunity to continue collecting and selling data to a growing community of customers.

Unlike Factual, which was the focus of last week’s podcast, AggData is not currently interested in combining data from different sources. Customers download separate files on the locations of Starbucks, Peet’s and Tim Hortons, rather than a single aggregated set of coffee shop locations. The AggData model is also predicated upon using their own scripts to extract data from third party sites; asked if he would accept a file of Walmart store locations supplied by Walmart, Hathaway explained why he would, and does, decline.

Have a listen to learn more about AggData, and to hear Chris’ perspectives on the potential role of semantic technologies in making his job easier. And check back on Thursday for the next podcast in the series: Hjálmar Gíslason of DataMarket.com.

Following up on a blog post that I wrote at the start of 2012, this is the second in a series of podcasts with key stakeholders in the emerging category of Data Markets. Other conversations, all of which will be published here, have been scheduled with BuzzData, DataMarket.com, Factual, Infochimps, Kasabi, and Microsoft. I am still adding conversations to the series, and intend to talk with more companies and with analysts and investors with insight to share. 


Data Market Chat: Tyler Bell discusses Factual

Having received some $27 million in investment from big names like Andreessen Horowitz, LA-based Factual is one of the better funded examples of a ‘data marketplace.’ But Tyler Bell, the company’s Director of Product, is not sure that Factual necessarily fits most people’s perception of what a data marketplace should be.

Focussed — for now — upon aggregating location data, Factual provides access by API or download to a pool of over 55 million places in the US and other territories. A key differentiator for the company is their investment in cleaning and harmonising information drawn from multiple sources. API-based services such as Crosswalk and Resolve enable developers to cope with the very different ways in which third party services like Yelp, Foursquare and Gowalla reference a single restaurant or coffee shop.
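
To make the idea of reconciling references a little more concrete, here is a minimal sketch of a crosswalk-style lookup. It is not Factual's API, and nothing in it comes from the podcast: the table, the function names (`canonical_id`, `same_place`), and every identifier are invented for illustration. It only shows the general shape of the problem, in which each service uses its own ID for a place and a shared canonical ID lets a developer tell when two references mean the same coffee shop.

```python
from typing import Dict, Optional, Tuple

# Hypothetical crosswalk table: (service, service-specific ID) -> canonical place ID.
# Every identifier here is made up for the purposes of the example.
CROSSWALK: Dict[Tuple[str, str], str] = {
    ("yelp", "blue-door-cafe-la"): "place-001",
    ("foursquare", "4b5e1f0a"): "place-001",
    ("gowalla", "73112"): "place-001",
    ("yelp", "corner-diner-la"): "place-002",
}


def canonical_id(service: str, service_id: str) -> Optional[str]:
    """Return the shared canonical ID for a service's own reference, or None if unknown."""
    return CROSSWALK.get((service.lower(), service_id))


def same_place(ref_a: Tuple[str, str], ref_b: Tuple[str, str]) -> bool:
    """True when two third-party references resolve to the same canonical place."""
    a = canonical_id(*ref_a)
    b = canonical_id(*ref_b)
    return a is not None and a == b
```

With a table like this, `same_place(("yelp", "blue-door-cafe-la"), ("foursquare", "4b5e1f0a"))` would report a match, while an ID missing from the table simply fails to resolve. The hard part, and presumably where the investment in cleaning and harmonising goes, is building and maintaining the table itself.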

Tyler suggests, though, that location data may just be the start:

“Factual doesn’t necessarily want to be a location-only company. Really what we’re doing is we’re cutting our teeth on location now, and places… It’s just a wonderful way to learn how to refine your business and of course how to refine your technology stack… But for the immediate future, you’ll see us focus primarily on places.”

Have a listen to learn more about Factual, and to hear some of Tyler’s perspectives on the utility of good, comprehensive data. And check back on Tuesday for the next podcast in the series: Chris Hathaway of AggData.

Following up on a blog post that I wrote at the start of 2012, this is the first in a series of podcasts with key stakeholders in the emerging category of Data Markets. Future conversations, all of which will be published here, have been scheduled with AggData, BuzzData, DataMarket.com, Infochimps, Kasabi, and Microsoft. I am still adding conversations to the series, and intend to talk with more companies and with analysts and investors with insight to share.
