From Microsoft’s Azure Data Marketplace to the eponymous DataMarket, or Infochimps, Factual, and Kasabi, there’s resurgent interest in the venerable business of collecting, curating, and commercialising data created by others. But despite investment and innovation, there isn’t yet matching evidence of much use, or even interest, amongst prospective customers. In principle, at least, these data markets should be providing valid, viable, and valuable services to a market that is potentially enormous. So why aren’t more users rushing to get at these sites?
In many ways, the core concept of the data marketplace is nothing new. Companies like Bloomberg, Nielsen and Experian have built (extremely) profitable businesses by aggregating data, quality checking it, and selling it on. Often their customers could have gone directly to the source(s) and paid far less, but they don’t. The convenience and quality assurance of dealing with a single — reputable — source is perceived to have value. A brand like Bloomberg’s is associated with trustworthiness and authority, and the brand of the marketplace is far more prominent than the data sets upon which it is built.
Similar sites have also served the needs of those seeking data for free, with IBM’s ManyEyes project, Freebase (acquired by Google), Hans Rosling’s Gapminder or The Guardian’s Data Store amongst those typically mentioned. Current government enthusiasm for ‘transparency’ has fed all of these sites with data, and led to the creation of large government-specific data repositories such as data.gov.uk.
The commercial services like Bloomberg have tended to focus upon specific domains (finance, in Bloomberg’s case) or types of data. They have also tended to be eye-wateringly expensive; aimed squarely at the small market segment for whom the data are mission-critical and the fees are affordable. The free services like Gapminder also tend to focus (global development statistics in this case). Other, perhaps, than experiments like ManyEyes, both the free and the commercial sites tended to aim for a degree of comprehensiveness and authority. They wanted to become the place to turn for their type of data.
But for the new generation of data markets, the picture becomes far less clear. They tend to be catholic in their data acquisition policies, they typically don’t even attempt comprehensiveness, they mix free (almost all of them hold identical large swathes of government data from the US, the UK, and elsewhere) with commercial data, and they continue to feel their way toward business models that might prove sustainable for the long haul. Perhaps more seriously, they appear almost schizophrenic with respect to brand projection, attempting to push both their own brand and those of the data sets they host in ways that can confuse far more often than they enlighten.
In attempting to differentiate themselves, today’s data markets are seeking to add features and functionality in order to be seen as far more than simply places to buy third-party data. They want to become recognised for quality assurance, for data enrichment, or for tools and capabilities that make working with the data easier or more powerful. They want to become sticky, and they want to be seen as different from their competitors. The trick, though, is to explain those features and those differences in ways that make sense to potential customers. Those customers will ultimately pay for functionality and utility, not for gimmicks or under-the-hood technological distinctions that have no real impact upon getting on with the job in hand. Are today’s data markets describing their features in ways that help prospective customers to understand why they should be chosen over the alternatives? Not really. At least, not yet.
Also, as RedMonk’s Stephen O’Grady touched upon amongst a set of related issues, we’ve really not begun to see much evidence of price competition. There are too few suppliers, each with their fiercely loyal bands of tame users (‘customers’), and too few people prepared to shop around for the best deal.
The new data markets are still young. Understandably, they are still feeling their way in order to understand what the market wants, how much it is prepared to pay for what it wants, how large the market might be, and what their individual niche within that broader market might look like. Earlier models, based upon almost monopolistic domination of specific verticals and polarised pricing, offer some lessons but are ultimately unsatisfactory blueprints for this more competitive, open, and complex environment. Beyond specific domains like finance (which may be ripe for disruption), the data markets must struggle to convince prospective customers that they have something of value to offer. Those customers may already have their own processes for obtaining data. They may generate the data themselves, or expect — as so many do — to be able to access what they need for free. They are perhaps suspicious of data produced by third parties who are, in other contexts, their competitors, and they are almost certainly unwilling to allow ‘the competition’ to benefit from their own data. They invariably do not understand the costs associated with gathering and quality-assuring data, or the challenge of preparing different data sets in order that they may meaningfully be combined. And into this, the fledgling data markets must insert themselves, market themselves, and sell themselves. They must change behaviours, they must challenge presumptions, they must alter working practices, and they must persuade their new customers that all of this pain is worth paying for. A tall order, indeed, but necessary if any of them are to realise their potential.
The European Commission, at least, begins to comprehend the scale of the challenge. A set of projects is currently being finalised, and this year will see European SMEs given the funding to bootstrap a number of new data sources. With Commission funding, it is hoped, the chosen projects will be able to explore models by which data can be created, curated, shared and re-used in a manner that is cost-effective and ultimately sustainable. The funding should enable these projects to reach viable scale, and give participants the freedom to explore alternative commercial models. The projects will be announced shortly, but only time will tell if the funding and the incentives are sufficient to break through the barriers that prevented any of these markets from forming by themselves.
But outside the rather artificial bubble created by European public funding, there is a lot of work to do. Investors are intrigued by the opportunity, but still wary of it. Infochimps is spending its way through over $1.5 million of investment, Factual has almost $30 million, and companies like Talis and Microsoft are making not-insignificant investments in their own efforts. We’re all still experimenting, but with the real market for these services currently falling far short of the money at stake, it surely can’t be long before investors start asking harder questions. Back in 2010, Pete Soderling and Pete Forde described data as a $100 billion market. The data markets may be after a significant chunk of that but, today, they’re not even close.
The ways that data markets are attempting to differentiate themselves, and the work being done to understand the market opportunity here, will have to wait for subsequent posts.
Disclosures: I am a former employee of and current shareholder in Kasabi’s parent company, Talis. The European Commission is, from time to time, a client.
