Thomson Reuters’ Open Calais team have clearly been busy, with several announcements at the Semantic Technology Conference here in San Jose.
On 15 June the company rolled out version 4.1 of Open Calais, embracing Spanish-language content and the notion of ‘social tags’:
“OpenCalais is a great semantic data extraction engine. If you write an article about the relative merits of Porsche and BMW at the test track in Leipzig, we’ll diligently identify Porsche and BMW as companies and Leipzig as a geography. We’ll create Linked Data URIs to represent these things and open up access to the Linked Data ecosystem so you can enhance your article with other content assets.
But… sometimes you just want a great description. The kind of tags a human would put on the article. Like “Car racing” or “Automobiles”. The kind of tag that would, for example, be very searchable and therefore … SEO’able (that definitely is not a word).
In 4.1 we’re introducing OpenCalais Social Tags. Social Tags is our attempt to emulate how a human might tag the document. Social Tags does some fairly sophisticated analysis of your entire document and maps it to a knowledgebase based on Wikipedia and other assets. From that process we generate Social Tags.”
This morning, they followed through with a further pair of announcements. Firstly, CNET has been joined by The Huffington Post, DailyMe and the UK’s Mail Online in integrating Open Calais into their workflow.
One of the more interesting aspects of the earlier CNET announcement was the contribution of data back into the pool:
“CNET joins Thomson Reuters as one of the first commercial media companies to publish core data assets for public, programmatic use on the open semantic Web. CNET will leverage OpenCalais’ connection to the rapidly expanding ‘Linked Data cloud’ to allow its original content — such as tech product reviews on laptops, TVs, smart phones, and digital cameras; news articles and blog posts from its CNET News editorial staff; and parts of its core technology product catalog – to be available for public use.”
It will be interesting to see whether these latest media properties are able and willing to do something similar.
The second element announced today in many ways mirrors Amazon’s recent AWS Import/Export service. Thomson Reuters, too, have recognised that it remains impractical to move large quantities of data over the network, and have introduced ‘Archive Express’, which will process up to 20 million documents from a physical storage device within 24 hours, free of charge.