Wednesday, December 08, 2010


Since most of my friends here have asked me questions on SEO, I thought I'd share this presentation that I posted on SlideShare. This was a pre-engagement presentation. It contains over 100 slides and covers many dimensions of SEO. Enjoy!

Monday, December 06, 2010

Disaster @ Tumblr

Tumblr has been down for more than 12 hours due to an issue with their database cluster. Here is the comment I left on

This is the freshest lesson for entrepreneurs and startups:
- Learn to value your data
- Implement a high availability plan
- Plan a disaster recovery strategy

“Tumblr likely has the resources to recover…”

I really hope that holds true, but remember: data is the only irreplaceable asset of an organization. Once it’s gone, it’s gone.

When I was handling the disaster at Fotolog (massive database corruption when our SAN crashed), I couldn’t find any company or consulting firm ready to handle the situation and help with data recovery. It was a miracle that I came across the concept of DUDE (Data Unloading by Data Extraction) and started writing InnoDB data recovery programs in sheer desperation. In the case of Fotolog, we had all the basic infrastructure in place for redundancy and high availability. The component that caused the disaster was the one we relied upon most: “the financial grade strength SAN.”

The point I am trying to make is that having cash in the bank, a large user base and really smart engineers doesn’t guarantee that your data will be safe in case of a disaster.

Times like these can be incredibly stressful for those handling the situation. I feel for the folks at Tumblr and am hoping for a speedy recovery.

Good luck Tumblr guys! You’re in my thoughts.


Friday, December 03, 2010

Sequoia backs MongoDB with $6.5M investment

Some exciting news coming from 10Gen, the company behind MongoDB. It announced today that Sequoia is investing $6.5M in its high-performance, document-oriented (BSON), key-value based NoSQL solution that supports automatic sharding and dynamic queries. Foursquare, Disqus, Etsy, Sourceforge, eVite, EventBrite and the New York Times all use MongoDB. The features this young NoSQL solution offers are truly impressive. See the MongoDB page on my Big Data Low Latency site for a quick review of MongoDB.

I had the opportunity to meet with Roelof Botha a few months ago, as Sequoia was looking to invest in the NoSQL space and was evaluating both hardware and software solutions to the big data challenge. Since then I have been eager to hear which of the many startups in the NoSQL space would receive Sequoia's blessing. Now we know :)

Wednesday, December 01, 2010

Video: Netflix's migration to AWS cloud

Found this video regarding Netflix's migration to Amazon's AWS cloud very informative. Enjoy!

Cloud Migration Whitepapers

Amazon's AWS team has published a series of whitepapers covering various scenarios for migrating into AWS cloud infrastructure. Links to these whitepapers are provided below for your convenience:
Cloud Migration
- Migrating applications to the AWS cloud
- Migrating web application
- Migrating batch processing applications
- Migrating backend processing pipelines

Big Data: Freedom or Something Else?

Googling around, I came across Bradford Cross' article, Big Data Is Less About Size, And More About Freedom. Bradford writes, "The scale of data and computations is an important issue, but the data age is less about the raw size of your data, and more about the cool stuff you can do with it."

Even though the article makes some good points, I'm not sure I can agree with Bradford's point of view here. As an architect, when I think in terms of Big Data, the ability to do "cool stuff" is probably the last thing that crosses my mind. Big Data, to me, is about ensuring constant response time as the data grows in size without sacrificing functionality.

What do you think Big Data is about? Is it merely about being able to do 'cool stuff' with your data? Is it about ensuring constant access/response times? Or is it about something else? I'm eager to hear your thoughts.

Tuesday, November 30, 2010


Alaric Snell-Pym asks: why choose between SQL and NoSQL? Why can't you use both in your infrastructure?

"NoSQL engines abandon SQL for the chance to have more flexible data models and softer semantics for update operations - but they also abandon it because it’s a lot of work to implement. And, creating a new database from scratch, they’re keen on solving the interesting hard problems (such as replicated data storage), rather than following the well-trodden path of writing SQL parsers and query planners, with a few decades of catching up with the competition ahead of them."

Hate the dirty recruitment tactics

I hate it when recruiters reach out to you with a message indicating that they are looking to fill 'key positions' and, when you follow up, the tone changes to "we're just looking for engineers." This happens all the time, and the latest company to use this recruitment tactic is LinkedIn. Guys, can't you decide whether you are looking to fill a 'key position' or just an engineering position before reaching out to candidates? I can see that mentioning a 'key position' will get a candidate's attention, but this is just a low-level tactic.

Lean Startups and Scalability

I wrote this as a reply to "Does Lean Startup Methodology Apply to Consumer Startups?" However, due to comment length restrictions on that blog, I am posting my comment here and welcome your thoughts.

"An enterprise will pilot products and iterate with a vendor: Let's run a 6 month consulting engagement/pilot to evaluate if this new database solves the problem."

Only an enterprise where there is a major disconnect between management and engineering will opt for this path. In enterprises where the needed data I/O patterns are understood, taking such a path may spell disaster.

The primary problem I see with the 'lean startup' methodology is that it blindly tells entrepreneurs to close their eyes, cut corners and just get the product to market without fully understanding future scalability needs. Scalability doesn't have to be sacrificed in order to build a lean startup, except in those cases where there is no architect on board.

Many entrepreneurs take the route of relying on frameworks in the early days, only to be haunted by the effects as growth happens.

Case studies exist where massive infrastructures with low latencies have been built without blowing budgets or needing to re-architect. The concept of lean startups is great, but incomplete (especially as it is being preached).

When the right architects and team are on board, building the right way also becomes the lean way.

<update> Let's not forget that data is the most valuable asset of an organization and every migration is a great opportunity for a screwup. Do you really want to migrate it from database to database as you sit with your fingers crossed, hoping that the latest vendor will solve your problem? Or does it make more sense to invest in an experienced architect and then make decisions rather than shooting in the dark? I'll let you decide the rest.

Monday, November 22, 2010

Probably the worst way to deal with a stuck query...

is to disable a customer's account for more than 24 hours without any warning whatsoever. This happened to one of my accounts and I'm beyond furious at the database and network administrators of Seriously, guys, I don't know of a more unprofessional way of dealing with a stuck query.

Wednesday, October 27, 2010

MySQL at Facebook

Mark your calendars for Nov 2 as Mark Callaghan and Facebook's MySQL team will be talking about how MySQL is used at Facebook.