Google Engine | Webnomena

Outsourcing Data Producer’s open API development and support – a new business opportunity?

July 5, 2008 Keren Dagan 3 comments

So you built a new Web 2.0-like service. It gets some traction and people are crowding in. The site just published an open API. Before you know it, the system crumbles under the weight of its own success. Sound familiar?

If your API is “read only”, i.e. it only offers a way to pull some of the data out of the system, you’re only in half the trouble. If your API allows “writing” too, i.e. modifying system records in the database, things get really interesting. An example of read only: Technorati provides blog, blogger and post information to Internet bots and badges. Twitter is an example of an API that supports both read and write. Bots built on the Twitter API can fetch members’ status updates as well as automate the posting of status messages.

Btw, both of the above examples are Data Producers that are working night and day to scale their open API support.

The problems are generally the same, and so are the solutions: performance (throughput and latency), hardware sizing and cost, traffic pattern predictability, load balancing, throttling, caching, stateless web nodes, multi-casting, table partitioning (which means having a skilled=$$$ DBA to build a high-availability database), backup, API formats (there are too many of them), message queuing, redundancy, recovery, security, quality of service (premium services), statistics, logging, error handling, monitoring, abuse protection, you name it.
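To make one item on that list concrete, here is a minimal sketch of throttling with a token bucket, one common way to cap how fast a single API client can call in. The class name, parameters and numbers are my own assumptions for illustration, not something taken from any of the services mentioned in this post.

```python
import time


class TokenBucket:
    """Hypothetical per-client throttle: allows short bursts, caps the sustained rate."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = rate_per_sec      # tokens added per second
        self.capacity = capacity      # maximum burst size
        self.tokens = float(capacity)
        self.clock = clock            # injectable so the logic can be tested
        self.last = clock()

    def allow(self):
        """Return True if the request may proceed, False if it should be rejected."""
        now = self.clock()
        # Refill in proportion to the elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

An API front end would keep one bucket per API key and answer with an error (or a "slow down" response) whenever `allow()` returns False, so one greedy bot cannot starve everybody else.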

Gnip is a new start-up founded by Eric Marcoullier that is working to address some of these common problems. Reading their blog shows how much thought and sweat go into addressing these scalability problems. They also aim to address some of the other pain points in the open API arena, like consistent APIs and identity discovery. Having a consistent/normalized entity ID across multiple web services could remove one of the biggest obstacles today to using WYSIWYG mashup tools like Popfly, but this is for another post :).

If you fit the profile described in the first paragraph, reading more about them could help, but this is not all this post is about.

Let’s assume that the “read” part is getting better thanks to services like Gnip (ping in reverse), the same way that blogging platforms improved the indexing of new posts by using ping services. Now bots and mashup services don’t need to be “busy waiting” on the API. What about the “write” part? What can be done to make it reusable and scalable?
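On the read side, the difference between “busy waiting” and being pinged can be sketched in a few lines. This is only an illustration of the push idea under my own assumptions: the `PingHub` class and its methods are hypothetical and are not Gnip’s actual API.

```python
class PingHub:
    """Hypothetical in-memory hub: the producer publishes once, the hub pushes to consumers."""

    def __init__(self):
        self.subscribers = {}  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        """A bot or mashup registers once instead of polling the API in a loop."""
        self.subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic, update):
        """The Data Producer sends one update; returns how many consumers were notified."""
        delivered = 0
        for callback in self.subscribers.get(topic, []):
            callback(update)
            delivered += 1
        return delivered
```

With this shape the Data Producer handles one `publish` call per update, no matter how many bots are listening, instead of answering every bot’s repeated “anything new?” request.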

I think it all comes down to a new opportunity here: outsourcing the entire open API development and support, and saving a bundle. Here are some ways to save on this effort through consolidation:

Reusing hardware through hosting solutions, whether physical or virtual, like Amazon EC2 or Google App Engine
Reusing technology implementation and integration, like memcached, Terracotta and many more
Reusing expertise – Database Administrators and Security experts
Protocol and metadata standards
Monitoring tools and techniques

Saving: blood, sweat, tears, grief and reputation (in other words avoiding embarrassment).

The bottom line is that, in my opinion, Gnip takes it a good distance forward, but there is room for another reusable, consistent and scalable layer between the Data Producer and Gnip.

What do you think?

Categories: Method, Observations, Software Tags: Bot, Data Producers, EC2, Eric Marcoullier, Gnip, Google Engine, memcached, Open API, Outsourcing, Ping Service, Popfly, Reusable, Scalability, TerraCotta
