shift | Webnomena

Real-time search – the missing piece

November 11, 2009 Keren Dagan

Shifting the problem from finding content to finding people for search, discovery and filtering is not enough.

The evolution of finding new and engaging content:

Step 1: We started by searching for engaging content using search engines like Google or blog search engines/directories such as Technorati. These search engines operate web crawlers that scan the web for new information, then index (categorize) and rank web pages using different algorithms. As time went by we started adding blog feeds (in the RSS and Atom formats) to our feed reader of choice, such as Google Reader.
Results: with some effort we managed to find great bloggers to follow, but new content was slow to arrive and slow to discover, and even after a while we ended up without enough variety. No wonder it was a dead-end!
Step 2: step #1, plus finding the people behind the content and following their feeds on social media tools (Twitter, FriendFeed, Facebook, etc.).
Results: initially we got faster and richer content, but it got messy very quickly (especially when we auto-followed back). It was also overwhelming at times, and lots of people shared the same content (whether lame or great). Add to the cacophony of the feed stream the fact that people use these channels for chatting with their peers, sharing thoughts and feelings, and promoting their businesses, products and services, and we end up with yet another dead-end!
Step 3: step #2, plus lists. Now we can group people into categorized Twitter lists and follow their tweets.
Results: now the content is a little less messy because we have more control over the data filtering. Building your own list is very slow and tedious at the moment, but you can use other people’s lists via Listorious or tweepml. On the flip side, it requires coming up with a new process for scanning the list timelines (how frequently? whom to give more attention? when to add or remove tweeps?), and you can easily end up with too many lists. The worst part is that the people on a list do not always share only content that matches the list’s category. Bottom line, it is somewhat better than step #2 but not by much. Another dead-end?

Content by people

In step #1 we let the crawler find and categorize the content, and it was up to us to find it. In steps #2 and #3 we shifted to searching for people and then let them drive content to us. This time the crowd took care of the categorization tasks: finding and matching people to domains of knowledge. People categorized themselves and others, built many great lists, followed other lists (an indication of popularity) and shared them for us to grab.

The shift

In moving from step #1 to step #2 we shifted the content discovery problem to a people discovery problem. With this shift we gained a great deal in scale, enlisting the entire web community to search for new content. We accelerated discovery and knowledge gain. We also gained speed over RSS and the web crawler. Going from step #1 to step #3, the focus shifted from filtering content to filtering people (lists).

Small pause to recap: we have categorized content thanks to search engines and tags, and we have people grouped by category thanks to the crowd, but we still have a lot of noise.

The missing step

In my opinion, we are missing a step. I think that we ought to get back to computerized categorization. We need a crawler to categorize and rank the data in the context of the list.
I would like to be able to filter a list’s timeline view by links only, by discussion threads only, and, even more importantly, by content that matches the list’s definition in the first place.
If I follow a list that discusses mobile phone technology, I want to see only content related to mobile phone technology.
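
To make the idea a bit more concrete, here is a minimal sketch in Python of the kind of list-timeline filter described above. It is only an illustration under stated assumptions, not how Twitter or any list service actually works: the `Tweet` record, its `in_reply_to` field, the keyword set that stands in for the list's definition, and the naive word-overlap score are all hypothetical choices made for this example.

```python
import re
from dataclasses import dataclass
from typing import Optional

URL_RE = re.compile(r"https?://\S+")
WORD_RE = re.compile(r"[a-z0-9']+")


@dataclass
class Tweet:
    """Hypothetical tweet record; not the shape of any real API response."""
    author: str
    text: str
    in_reply_to: Optional[str] = None  # assumed field: who is being replied to


def has_link(tweet: Tweet) -> bool:
    """True if the tweet text contains an http(s) URL."""
    return URL_RE.search(tweet.text) is not None


def is_discussion(tweet: Tweet) -> bool:
    """Treat replies (or tweets that open with @name) as discussion threads."""
    return tweet.in_reply_to is not None or tweet.text.startswith("@")


def topic_score(tweet: Tweet, keywords: set) -> int:
    """Naive relevance: number of list keywords found in the text (URLs removed)."""
    words = set(WORD_RE.findall(URL_RE.sub(" ", tweet.text.lower())))
    return len(words & keywords)


def filter_timeline(timeline, keywords, links_only=False,
                    threads_only=False, min_score=1):
    """Keep tweets that pass the chosen filters, most relevant first."""
    kept = []
    for tweet in timeline:
        if links_only and not has_link(tweet):
            continue
        if threads_only and not is_discussion(tweet):
            continue
        score = topic_score(tweet, keywords)
        if score >= min_score:
            kept.append((score, tweet))
    # sorted() is stable, so ties keep their original timeline order
    ranked = sorted(kept, key=lambda pair: pair[0], reverse=True)
    return [tweet for _, tweet in ranked]


# Made-up timeline for a list about mobile phone technology
MOBILE = {"mobile", "phone", "android", "iphone", "handset"}
timeline = [
    Tweet("alice", "New Android handset review http://example.com/review"),
    Tweet("bob", "Great coffee this morning!"),
    Tweet("carol", "@alice does that phone support tethering?", in_reply_to="alice"),
    Tweet("dave", "Check out my photos http://example.com/photos"),
]

on_topic = filter_timeline(timeline, MOBILE)
links = filter_timeline(timeline, MOBILE, links_only=True)
threads = filter_timeline(timeline, MOBILE, threads_only=True)
print([t.author for t in on_topic])
print([t.author for t in links])
print([t.author for t in threads])
```

Exact keyword matching is far too crude for real use (it would miss "phones" or "smartphone", for instance), and a real system would probably need to look at the linked pages as well, which is where the crawler comes back in. The sketch only shows that the three filters (links only, threads only, on-topic only) can be combined in the context of a single list.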

Picture credit orangeacid

Categories: Method, Monitoring, Observations, Software Tags: content, discovery, Listorious, Real-time, search, shift, tweepml, Twitter, twitter lists