
Mashuper’s reality

In his post “Could Someone Explain Technorati”, Chris Brogan wonders about the consistency, accuracy and reliability of the Technorati service. I can’t explain the behavior of the system over there, but I can share some of my experience dealing with the challenges of using online APIs (web services) and their data. The objective here is to help other mashupers prepare better for future integration efforts across multiple web services. Since it appears that the mashuper community is growing faster than the web service providers, I’m sure that more fellow API consumers have stories of their own to share. I will be happy to hear them.

I see three participants’ perspectives in this “love triangle”: the web site visitor, the mashuper (the API consumer) and the service provider.

My visitor experience:

Chris Brogan talks about his experience from the user perspective in his post. I have nothing to add here, but I would say that if I were a service provider, satisfying my loyal community would be my top concern. In the case from Chris’s post, maybe the way to deal with this is to monitor for exceptions (a drastic rise or fall in rank or authority).

My mashup experience:

As I mentioned in some of my earlier posts (here, here and here), I’m working on a small project for finding productive bloggers by monitoring for consistent improvements in their Technorati rank. So I now monitor the rank of over 800 bloggers on a frequent basis. I post some of the results to a designated Twitter account: blogmon.
The first set of challenges is dealing with volatile data:

  • Sometimes I see no authority (inboundblogs) in the results.
  • Sometimes there is no valid last update date in the results: <lastupdate>1970-01-01 00:00:00 GMT</lastupdate>
  • Most of the time there is no author (the user did not add one).
  • Sometimes there are no tags (the user did not add any).
  • Sometimes, as Chris mentioned, the rank is off for a short period of time.

For example, see the rank history of Seth Godin’s blog:

last update    rank    authority
2/12/2008      19      8599
2/25/2008      18      8697
3/17/2008      19      8658
3/22/2008      16      8827
4/10/2008      15      8946
4/19/2008      16      8882
4/23/2008      17      8819
5/1/2008       243
5/12/2008      17      8828
5/14/2008      16      8863
5/20/2008      15      8890

These are the details that a consumer of volatile online data must plan for and find ways to compensate for.

Some suggestions:

  • Check the validity of the date.
  • Don’t count on the last result alone, i.e. search for the last valid result and monitor over time.
  • Be prepared to post partial results (e.g. no top tags or author).
  • Most important: guard your data, i.e. protect what you take from the service and store in your records.
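
The first three suggestions can be sketched in a few lines of Python. This is only an illustration, not the code my project runs: the record layout (a dict with lastupdate, rank, authority, author and tags keys) and all function names are assumptions, and the sample history is made up to look like the table above (the times and the epoch-dated entry are invented).

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_last_update(text):
    """Return an aware datetime, or None for a missing, malformed or epoch date."""
    if not text:
        return None
    try:
        parsed = datetime.strptime(text.strip(), "%Y-%m-%d %H:%M:%S GMT")
    except ValueError:
        return None
    parsed = parsed.replace(tzinfo=timezone.utc)
    # 1970-01-01 00:00:00 is the placeholder seen when there is no real date.
    return parsed if parsed > EPOCH else None


def is_valid(record):
    """A record is trusted only with a real date, a rank and an authority."""
    return (
        parse_last_update(record.get("lastupdate")) is not None
        and record.get("rank") is not None
        and record.get("authority") is not None
    )


def last_valid(history):
    """Walk the stored history backwards and return the newest valid record."""
    for record in reversed(history):
        if is_valid(record):
            return record
    return None


def summarize(record):
    """Build a report line that tolerates a missing author or missing tags."""
    author = record.get("author") or "unknown author"
    tags = ", ".join(record.get("tags") or []) or "no tags"
    return f"rank {record['rank']}, authority {record['authority']} ({author}; {tags})"


# Hypothetical stored history, oldest first.
history = [
    {"lastupdate": "2008-04-23 00:00:00 GMT", "rank": 17, "authority": 8819},
    {"lastupdate": "2008-05-01 00:00:00 GMT", "rank": 243, "authority": None},
    {"lastupdate": "1970-01-01 00:00:00 GMT", "rank": 17, "authority": 8828},
]

best = last_valid(history)
print(summarize(best) if best else "nothing trustworthy to report")
```

Note that the check only catches records with missing or placeholder fields. A rank that is simply wrong but complete would still need a comparison against the previous values.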

 
The next set of challenges has to do with the web service behavior:

  • I got the following error once or twice: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host.
  • Some API requests come back with:

<HTML>
     <HEAD>
          <META HTTP-EQUIV="REFRESH" CONTENT="2; URL=http://api.technorati.com/bloginfo?url=****&key=****&version=0.9&start=1&limit=30&claim=0&highlight=0">
          <TITLE>****</TITLE>
     </HEAD>
     <BODY>
          <CENTER>
               <IMG SRC="***">
               <BR>
          </CENTER>
     </BODY>
</HTML>

** I intentionally masked the URL, title, image and my developer key with ****

This result can crash your system if it is not handled.

  • Finally, and I get this one a lot :)

<?xml … "http://api.technorati.com/dtd/tapi-002.xml">
<tapi version="1.0">
    <document>
        <result>
            <error>You have used up your daily allotment of Technorati API queries.</error>
        </result>
    </document>
</tapi>
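
Before trusting a response, it helps to sort it into one of the cases above. Here is a minimal Python sketch, assuming the body arrives as a string. The function name classify_response and the four labels are mine, not part of the Technorati API; the only thing taken from the service is the document/result/error nesting shown in the sample.

```python
import xml.etree.ElementTree as ET


def classify_response(body):
    """Return ("html", None), ("invalid", reason), ("error", message) or ("ok", root)."""
    text = body.lstrip()
    # An HTML refresh page instead of XML: do not try to parse it as a result.
    if text[:5].lower() == "<html" or text[:14].lower() == "<!doctype html":
        return ("html", None)
    try:
        root = ET.fromstring(text)
    except ET.ParseError as exc:
        return ("invalid", str(exc))
    # The service reports problems (e.g. the daily quota) inside <error>.
    error = root.find("./document/result/error")
    if error is not None:
        return ("error", (error.text or "").strip())
    return ("ok", root)


QUOTA_BODY = """<tapi version="1.0">
    <document>
        <result>
            <error>You have used up your daily allotment of Technorati API queries.</error>
        </result>
    </document>
</tapi>"""

kind, detail = classify_response(QUOTA_BODY)
print(kind, detail)
```

With the quota case detected explicitly, the caller can stop sending requests for the day instead of burning through the remaining blog list.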

Some suggestions:

  • I can’t picture my dev world without exception handling. In this specific case it is the ultimate protection against unexpected web service behavior. So guard every call, every XML load and every parsing step by wrapping it in a try/catch block.
  • Logging: log expected and unexpected behavior for later analysis and recovery.
  • Build the system so that exceptions are caught and logged, but execution can move on to the next task.
  • This is something that I learned from a smart Army officer: “If there is a doubt, there is no doubt.” Basically, it is better not to report at all than to report inaccurate data.
  • Find ways to minimize the API calls, e.g. I ask for tags only when I find a blog worth reporting on.
  • A thought: I’m not an expert in XML and DTDs, but could it be that using a DTD slows down the web service? Is it really necessary on read-only calls? If you know more about it, please share with me/us.
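
The catch, log and move on pattern from the list above could look roughly like this in Python. It is a sketch under assumptions: monitor, fetch and handle are hypothetical names, the two fake functions only simulate a dropped connection and doubtful data, and the .example URLs are placeholders.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("rank_monitor")  # hypothetical logger name


def monitor(blog_urls, fetch, handle):
    """Run fetch and handle for each blog; log failures and keep going.

    Returns two lists of URLs: (reported, skipped).
    """
    reported, skipped = [], []
    for url in blog_urls:
        try:
            body = fetch(url)   # may raise on a dropped connection
            handle(url, body)   # may raise on unparseable or doubtful data
        except Exception:
            # Record the full traceback for later analysis, then move on.
            log.exception("skipping %s", url)
            skipped.append(url)
            continue
        log.info("reported %s", url)
        reported.append(url)
    return reported, skipped


def fake_fetch(url):
    """Stand-in for the real API call; fails for one simulated host."""
    if "down" in url:
        raise ConnectionResetError("connection closed by remote host")
    return "<tapi/>"


def fake_handle(url, body):
    """Stand-in for parsing and reporting; refuses to report doubtful data."""
    if "doubt" in url:
        raise ValueError("rank looks wrong, not reporting")


reported, skipped = monitor(
    ["http://a.example", "http://down.example", "http://doubt.example", "http://b.example"],
    fake_fetch,
    fake_handle,
)
print(reported, skipped)
```

Raising from handle when the data is doubtful is one way to apply “if there is a doubt, there is no doubt”: the blog is skipped and logged rather than reported with a bad rank.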

    

[Image: TechnoratiLookingForTheMonster]

About the service:

I can’t say much about what a web service provider feels or experiences (I’m sure that Ian Kallen from Technorati has a lot to share on this subject), but I want to say a few things:

  • Please don’t get this post wrong: I’m a fan of Technorati. I use it, I deeply appreciate their service, and I’m thankful for the option of using the APIs. As I said earlier, the intention is to share my experience so that you can better prepare for such an effort.
  • I guess that it is hard to estimate the load on the system with such growth in the number of mashupers out there. So my heart is with them.
  • There are two more threats that the web service provider needs to guard against, and I’m sure they consume some energy: abuse and malicious attacks, both aimed at the hard-gathered data and its environments.

 

One last comment: ironically, I have had no problems with Twitter so far :) but I’m aware of the pain that some Twitter API users suffer occasionally.

