The Blogosphere system of record – not another blog search engine (or why Google got it wrong)
It is not enough to support finding blogs that contain a submitted word or phrase. It is not even enough to find tagged blogs. That is just one use case out of the many needed to support the blogging world.
If you want to support bloggers, you need to look at the entire picture. Google, in building Blog Search, continued to index blog content the same way it indexes web sites, which is not so useful, as you’ll see in a minute. Technorati, on the other hand, got it right from the start.
Let me start by defining the term “system of record”. Wikipedia’s definition of a system of record is correct but maybe too abstract. The key element is that a system of record contains both the raw data (the content) and the metadata that helps support different business uses.
The difference between Technorati and Google is that Google has the raw data and some indexes. Technorati has the content and the rich metadata!
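To make the distinction concrete, here is a minimal Python sketch of an entry that carries both content and metadata. The `BlogRecord` class, its fields, the two search helpers and the sample data are all hypothetical and only illustrate the idea; they say nothing about how Google or Technorati actually store their data.

```python
from dataclasses import dataclass, field


@dataclass
class BlogRecord:
    """One entry in a hypothetical blog system of record."""
    url: str
    content: str                                   # raw data: the text of the blog
    metadata: dict = field(default_factory=dict)   # what else is known about the blog


def keyword_search(records, phrase):
    """Content-only lookup: uses the raw data and nothing else."""
    return [r.url for r in records if phrase.lower() in r.content.lower()]


def metadata_search(records, **wanted):
    """Metadata lookup: every requested key must match exactly."""
    return [
        r.url for r in records
        if all(r.metadata.get(k) == v for k, v in wanted.items())
    ]


# Made-up sample data, for illustration only.
records = [
    BlogRecord("a.example", "Notes on the system of record",
               {"category": "technology", "claimed": True}),
    BlogRecord("b.example", "My system of record for recipes",
               {"category": "food", "claimed": False}),
]

print(keyword_search(records, "system of record"))
print(metadata_search(records, category="technology", claimed=True))
```

The keyword search returns both blogs because both contain the phrase. Only the metadata lets you narrow the result to claimed technology blogs.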
Why is this metadata so important? To answer this question we need to understand what bloggers are doing.
So what are bloggers doing?
Most bloggers are driven by clear objectives and look for supporting services to accomplish them. They like to be seen, i.e. to find readers and to be able to interact with them. They like to cooperate with other bloggers or to react to other blog posts. Bloggers like to share content with others. Bloggers love comments, traffic and communities.
If you agree with me here, then you’ll agree that it is not enough to show you a list of blog posts that contain the phrase “system of record”.
So what do we have today to support the blogosphere:
- Blog authoring tools like WordPress, Blogger, TypePad and more that help us write, present, organize, tag and bookmark content, and that connect bloggers with their communities through different gadgets and plug-ins (like the ones for Facebook and Yahoo’s MyBlogLog).
- Social networks and bookmarking that help to socialize new posts – I won’t even bother listing the numerous options in this category.
- Readers and news aggregators to scale reading new posts
- Blog search engines and directories
- Technorati and StumbleUpon, which do a few more things
The few more things that Technorati does:
- Make the connection between the blog and the blogger’s profile – allow bloggers to claim a blog
- Show reactions to the blog posts – blog authority
- Show what is hot (what’s percolating in the blog world now)
- Show the top 100 blogs
- Rank blogs relative to other blogs
- Categorize blogs (directory)
- Show what is popular and rising
- Find new content and keep the existing information current – spidering, pinging
To provide these capabilities, Technorati has to collect, build, update and store a lot of information beyond the actual blog content. I would claim that Technorati holds today the best blogosphere system of record. And I would add that they organize the data in such a way that Technorati is now capable of supporting new business use cases.
One example I can think of is adding a function where the focus shifts from the blog post to the blogger: a blogger segmentation tool. One combination would be finding bloggers by category (technology, video stream), rank (mid to top) and preferred media (audio, video).
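As a rough sketch, the segmentation idea could look like this in Python. The `Blogger` fields, the rank convention (1 is the top rank), the `segment` function and the sample data are all made up for illustration; nothing here reflects an actual Technorati API or data model.

```python
from dataclasses import dataclass


@dataclass
class Blogger:
    name: str
    category: str      # e.g. "technology"
    rank: int          # assumed convention: 1 is the top-ranked blogger
    media: frozenset   # media the blogger prefers, e.g. {"audio", "video"}


def segment(bloggers, category, best_rank, worst_rank, media):
    """Return bloggers in a category whose rank lies in [best_rank, worst_rank]
    and who prefer at least one of the requested media, ordered by rank."""
    wanted = set(media)
    hits = [
        b for b in bloggers
        if b.category == category
        and best_rank <= b.rank <= worst_rank
        and wanted & b.media
    ]
    return sorted(hits, key=lambda b: b.rank)


# Made-up sample data, for illustration only.
bloggers = [
    Blogger("alice", "technology", 40, frozenset({"video", "text"})),
    Blogger("bob", "technology", 9000, frozenset({"audio"})),
    Blogger("carol", "cooking", 12, frozenset({"video"})),
    Blogger("dave", "technology", 15, frozenset({"text"})),
    Blogger("erin", "technology", 7, frozenset({"audio"})),
]

# Technology bloggers ranked mid to top who prefer audio or video.
for b in segment(bloggers, "technology", 1, 1000, {"audio", "video"}):
    print(b.name, b.rank)
```

Note that the query never looks at the text of a post. It runs entirely on metadata, which is the kind of use case a content-only index cannot serve.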
I plan to write another post to discuss blogger clustering in more detail, with the intention of showing what additional metadata can be added to the blogosphere system of record. Hopefully, with new data, new business use cases can be supported.
Full disclosure: I don’t work for or invest in Technorati. I just appreciate people who appreciate data and treat it the right way :)