Monday, December 06, 2010

Disaster @ Tumblr

Tumblr has been down for more than 12 hours due to an issue with their database cluster. Here is the comment I left on GigaOm.com:

This is the freshest lesson for entrepreneurs and startups:
- Learn to value your data
- Implement a high availability plan
- Plan a disaster recovery strategy

“Tumblr likely has the resources to recover…”

I really hope that holds true, but remember: data is the only irreplaceable asset of an organization. Once it’s gone, it’s gone.

When I was handling the disaster at Fotolog (massive database corruption when our SAN crashed), I couldn’t find any company or consulting firm ready to handle the situation and help with data recovery. It was a miracle that I came across the concept of DUDE (Data Unloading by Data Extraction) and started writing InnoDB data recovery programs in sheer desperation. In Fotolog’s case, we had all the basic infrastructure in place for redundancy and high availability. The component that caused the disaster was the one we relied on most: “the financial grade strength SAN.”

The point I am trying to make is that having cash in the bank, a large user base, and really smart engineers is no guarantee that your data will be safe in a disaster.

Times like these can be incredibly stressful for those handling the situation. I feel for the folks at Tumblr and am hoping for a speedy recovery.

Good luck Tumblr guys! You’re in my thoughts.

Frank

1 comment:

Kedar said...

Right Said!! I've not yet forgotten magnolia.