Showing posts with label disaster is inevitable. Show all posts

Monday, December 06, 2010

Disaster @ Tumblr

Tumblr has been down for more than 12 hours due to an issue with their database cluster. Here is the comment I left on GigaOm.com:

This is the freshest lesson for entrepreneurs and startups:
- Learn to value your data
- Implement a high availability plan
- Plan a disaster recovery strategy

“Tumblr likely has the resources to recover…”

I really hope that holds true, but remember: data is the only irreplaceable asset of an organization. Once it’s gone, it’s gone.

When I was handling the disaster at Fotolog (massive database corruption when our SAN crashed), I couldn’t find any company or consulting firm ready to handle the situation and help with data recovery. It was a miracle that I came across the concept of DUDE (Data Unloading by Data Extraction) and started writing InnoDB data recovery programs out of sheer desperation. In the case of Fotolog, we had all the basic infrastructure in place for redundancy and high availability. The component that caused the disaster was the one we relied on most: “the financial grade strength SAN.”

The point I am trying to make is that cash in the bank, a large user base, and really smart engineers don’t guarantee that your data will be safe in a disaster.

Times like these can be incredibly stressful for those handling the situation. I feel for the folks at Tumblr and am hoping for a speedy recovery.

Good luck Tumblr guys! You’re in my thoughts.

Frank

Sunday, July 20, 2008

S3 suffers major outage

"Funny how Amazon doesn't use S3 to store any assets for amazon.com" (tweet by @gruber)


Amazon's S3 suffered a major outage today, knocking many websites offline. The S3 outage started at approximately 12:00 PM EST, and the last time I checked, at 11:11 PM EST, SmugMug, a popular photo hosting site that uses S3 extensively, was still down.

- S3 down for more than 7 hours
- S3 outage, 7 hours and counting
- S3 down again
- Amazon failure downs Web 2.0 sites
- Amazon's S3 experiencing outage

Sunday, June 01, 2008

Disaster is Inevitable - Must shut down generators

Disaster is really inevitable. Even with all its investment in redundant power, ThePlanet (formerly EV1 and RackShack) had to shut down the backup generators at its H1 data center on the instructions of the fire crew. This happened after a wire short in a faulty transformer led to an explosion that knocked out one of the walls, ultimately bringing 9,000 servers down. Luckily, no one was injured.

This just goes to show that having redundant power and backup generators doesn't mean a data center is immune to disaster. IIRC, ThePlanet's last disaster was blamed on backup generators not kicking in properly.

Although the servers weren't damaged, I wonder how many MyISAM repairs will need to be run once they come back online.

- The Planet Status Update
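After an unclean shutdown like this one, MyISAM tables that were being written to can come back marked as crashed, and the usual routine is to run `CHECK TABLE` and then `REPAIR TABLE` on whatever fails (or `myisamchk` while the server is stopped). Below is a minimal sketch of the triage step, assuming you have already fetched the `CHECK TABLE` result set as `(Table, Op, Msg_type, Msg_text)` tuples with whatever client library you use. The helper names, the sample rows, and the exact set of "healthy" status messages are assumptions for illustration, not output from a real server.

```python
# Msg_text values treated as healthy on a "status" row (assumption: verify
# against the CHECK TABLE output of your MySQL version).
HEALTHY_STATUS = {"OK", "Table is already up to date"}


def tables_needing_repair(check_rows):
    """Return tables whose CHECK TABLE rows show an error or no healthy status.

    check_rows: iterable of (Table, Op, Msg_type, Msg_text) tuples, the four
    columns of a CHECK TABLE result set.
    """
    order = []       # table names in first-seen order
    healthy = {}     # table -> True once a healthy status row is seen
    errored = set()  # tables with at least one "error" row
    for table, _op, msg_type, msg_text in check_rows:
        if table not in healthy:
            order.append(table)
            healthy[table] = False
        if msg_type == "error":
            errored.add(table)
        elif msg_type == "status" and msg_text in HEALTHY_STATUS:
            healthy[table] = True
    return [t for t in order if t in errored or not healthy[t]]


def quote_ident(name):
    """Quote 'db.table' as `db`.`table`, doubling any embedded backticks."""
    return ".".join("`" + part.replace("`", "``") + "`" for part in name.split(".", 1))


def repair_statements(tables):
    """Build one REPAIR TABLE statement per table that failed the check."""
    return [f"REPAIR TABLE {quote_ident(t)};" for t in tables]


# Hypothetical CHECK TABLE output after the power loss.
rows = [
    ("photos.comments", "check", "status", "OK"),
    ("photos.images", "check", "warning", "Table is marked as crashed"),
    ("photos.images", "check", "error", "Corrupt"),
    ("photos.users", "check", "status", "OK"),
]

for stmt in repair_statements(tables_needing_repair(rows)):
    print(stmt)  # REPAIR TABLE `photos`.`images`;
```

Flagging a table unless it explicitly reports a healthy status errs on the side of repairing too much, which is usually the cheaper mistake after a power event.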

Saturday, April 26, 2008

Disaster is Inevitable -- SQL Injection: Poorly Written Code and No Backups!

Let me start out by saying: the best response to a disaster is a backup you can count on.

I found a scary story today about hundreds of thousands of websites running Microsoft IIS and SQL Server being hit by Internet-wide SQL injection attacks. The story, originally reported by F-Secure, is now on Slashdot as well.

On the IIS forum, panic is visible. Those who had backups are breathing a sigh of relief, like one administrator who commented, "We have been hit by this as well. Lucky backup ran last night just prior to the attack."

Others without backups are just screwed.

F-Secure reports in an update to the story, "Do note that this attack doesn't use any vulnerabilities in any of those two applications. What makes this attack possible is poorly written ASP and ASPX (.net) code."

Although this attack targets IIS and SQL Server, there are lessons to be learned for sites using other servers and databases. There are several guides available on the Internet that show you how to secure your application against SQL injection attacks, like this one, which focuses on securing PHP and MySQL applications.

In my "Disaster is Inevitable--Are you Ready" presentation at this year's MySQL Conference (yes, I have read Baron's post), I covered a few types of disasters. However, I missed an important kind: disasters caused by SQL injection. My next presentation on this topic will certainly cover it. BTW, if you missed my presentation, you can thank Artem Russakovskii, who took meticulous notes that you can read.

What saddens me are comments like, "but we have all patches applied to the version we are using." There is, of course, a disconnect here in understanding the problem.

Patches don't secure you against SQL injection attacks; properly written code does. Sanity checking is very important!

Replication as a backup method won't help against SQL Injection
Based on my survey, a disturbingly high number of sites use replication as their backup strategy. If replication is your sole method of backup, then beware: replication won't save you from an SQL injection disaster, because the injected statements are faithfully replicated to every slave. Unless, of course, you have time-delayed slaves and are able to stop replication before the slaves are affected.
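To see why a delayed slave helps where an ordinary one doesn't, here is a toy simulation (not MySQL's actual replication code): a slave applies each event from the master's log only once that event is at least `delay` seconds old. The event list, the timestamps, the 30-minute delay, and the `apply_until` helper are all hypothetical; for real MySQL slaves, Maatkit's mk-slave-delay is one tool that provides this kind of lag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    ts: int   # when the master executed the statement, in seconds
    sql: str


def apply_until(events, now, delay):
    """Return the events a slave lagging `delay` seconds has applied by `now`.

    Toy model: an event is applied only once it is at least `delay` seconds
    old, and events are applied strictly in master-log order.
    """
    applied = []
    for ev in events:
        if ev.ts + delay > now:
            break  # this event, and everything after it, is still held back
        applied.append(ev)
    return applied


# Hypothetical master log; the last statement is the injected one.
binlog = [
    Event(0, "INSERT INTO posts (id, body) VALUES (1, 'hello')"),
    Event(600, "INSERT INTO posts (id, body) VALUES (2, 'world')"),
    Event(3000, "UPDATE posts SET body = CONCAT(body, '<script src=evil.js>')"),
]

noticed_at = 3600  # the attack is spotted and replication is stopped here
for name, delay in [("ordinary slave", 0), ("30-minute delayed slave", 1800)]:
    applied = apply_until(binlog, noticed_at, delay)
    infected = any("evil.js" in ev.sql for ev in applied)
    print(f"{name}: applied {len(applied)} events, infected={infected}")
```

The ordinary slave has already applied all three events, including the injected `UPDATE`, while the delayed slave has applied only the two legitimate inserts. The catch is the one the post points out: you have to notice the attack and stop replication within the delay window; after that, the delayed slave is just as poisoned as the others.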

Every year there are a number of backup-related presentations at the MySQL Conference. All but one of the following were presented this year:

- What do you mean there's no backup? -- A timeless presentation by Mike Kruckenberg and Jay Pipes originally presented in 2006.
- Backup and Recovery Basics by Kai Voigt
- MySQL Backups go near continuous by David Wartell
- MySQL Online Backup: An In-depth presentation by Chuck Bell
- Online Backup, Open Replication and a world of contribution by Lars Thalmann and Chuck Bell
- Performing MySQL Backups using LVM Snapshots by Lenz Grimmer
- Top 5 Considerations While Setting Up Your MySQL Backups