Sunday, January 27, 2008

I WILL NOT BLOB

A comment was left for me by Roland and that made me want to write my thoughts.

My thoughts have changed on this topic over time. Primarily, because I now work on a large scale infrastructure where I can see how things would have been if BLOBs were stored in database. Most of the following discussion keeps the size and scale of my work load in mind and is targeted towards those who are interested in evaluating BLOBs for a work load, access patterns, budget and performance requirements similar to ours. The following are my thoughts and opinions.

When I started my current job, we were near 4 million members and just over 100 million photos. There were a lot of performance issues at that time. Today we have 14 million members and reaching 400 million photos and topping 152 million page views a day with at times reaching more than 900,000 photos a day and 8.9 million comments a day. During this time, on the database end, we only recommissioned one new test database server as production server in our guestbook cluster. Other than that one "addition", all our growth is being currently served from the same number of purchased servers/instances in our photo, friend/favorite and guestbook clusters. In that sense, one can say we have been scaling up with MySQL and Solaris 10 on Sun hardware.

There are more than available features, cost and quantity discount factors to take into account when making a decision about the DBMS that should be used for a particular environment and architecture. Just because a data type (that you weren't going to use) is available in a DBMS, shouldn't be the reason to pick it.

When you're in charge of data, you have to wear many hats to make decisions. As a data architect, your job is to take a principle based approach that is not influenced primarily by what extra features a DBMS offers but rather takes into account the "organization's strategic goals," and its budget. Experience in architectures is an important requirement to being an effective data architect as an approach considered optimal in one environment may not be equally ideal in another environment especially given unique constraints of each organization. Same is true from a database administrator point of view except that now maintainability, performance and scalability requirements are of utmost concern to you.

I was not the decision maker when my company decided to store images on file system, however, I couldn't have been more happier. If given the choice tomorrow, I would go with the decision of keeping photos outside the databases.

For an extreme high performance environment like ours where there is constant explosive growth, BLOBs could not have been the SOLUTION. In fact, had we stored nearly 400,000,000 photos (we are about to reach that milestone probably tomorrow) in *a* DBMS, we would be not only be crippling our performance and forcing our existing database infrastructure to become mostly unusable. For us, it's not just about storing 400 million photos. It's also about handling load that is generated by serving more than 4 billion page views a month. I would be very interested in seeing if any company, that is similar in specs like us and has intense performance requirements per server, like ours, uses BLOBs to store their images.

The concept of databases is generally suited to storing a very large number of objects that are small in size. Here, both "very large" and "small" are relative to each industry and requirements of the system. When you look at overall picture, file systems are more suited to handling large objects, especially if the large object consists of an image.

The decision of whether or not to store images as BLOBs MUST NOT BE DONE based on what your DBMS offers (even if its Oracle, SQL Server or MySQL). If you primarily base that decision based on your DBMS offering alone, you have done a great disservice to your organization in my opinion. The decision should be based on your performance requirements, scalability needs, ease of maintenance, cost of future features to be added to the application etc., and not on what you are more comfortable using.


Just for a moment, we have to think outside the box to be able to understand the argument of why storing BLOBs in file system is a wiser decision especially in a data-intensive web based environment.

When a client browser requests a page, it parses it out for presence of any media files and then makes a separate request to retrieve and display each media file. Typically, you can query the database for the data elements and then for each media request made later, simply use the file system to serve the request. If, however, you store media files as BLOBs, then for each media request you will need to communicate with database to get the contents of the requested media file. This can put unnecessary load on your database and increase I/O contention on your database server. Then, suddenly, you have your database server doing a lot of IO activity and your plans to keep disk I/O to minimum and serving your working data set mostly from memory goes hay wire. Some would argue that the cost of opening files is greater than accessing it from database but I wonder how long that would hold true for large media files.

For my company the size of the images is in double-digit (TT) terabytes. The size of databases is in multiple, but single digit terabytes (T). It makes no sense to me now that a database administrator would agree to bloat their databases to add TT more terabytes when they could have kept them at T terabytes and kept the other TT terabytes away from the database.

Unless the BLOB is kept in a separate table, say photo_blob, from the main table, say photo, you are going to make schema changes a nightmare for yourself, since you will be performing migrations on an additional TT terabytes. Suddenly, upgrading your database to a new version would seem like a slow walk in hell.

Same remains true for backups of database servers. By keeping the media files away from your database, you relieve your database from having to process TT terabytes of data for backup purposes which can be performed separately using file system backup tools.

If you use snapshot software to take database backups, there is something else to worry about. Snapshot utilities such as fssnap can create snapshots very quickly but as data is written to the original volume, the backup store files are also updated. This means in large environments as write load on database increases, so does your performance issues while the snapshot is held. With BLOBs you are now keeping snapshots much longer on database server so the additional TT terabytes can be backed up. While this happens, your performance will keep crippling. Instead of releasing snapshots within 5 hours, you are now waiting more than 24 hours. Yes, you will still have to backup the TT terabytes of media files, but at least you won't be making it the headache of your poor database server who already has so many things to worry about. By keeping media and its associated metadata separately, you can apply optimal backup techniques to each environment.

Oh, and what happens when you have to reset replication and move extra TT terabytes to slaves?

And let's not forget about disaster recovery issues where you need to recover data from crashed databases. More than half of your data would avoid having to be recovered in case of a database crash, if stored on a separate file system.

As we move towards higher and higher resolution devices, the size of the media files will continue to grow. 20-30MB image files are not rare anymore. Do you really want to store high resolution media in your database? If not, will you store low resolution media in database and high resolution on file system? Why?

Then there is the whole issue of fragmentation of database storage compared to file system storage. Not to mention making a mess by mixing sequential I/O and random I/O instead of separating them and benefiting from the separation.

This is not just an issue with MySQL. A 2006 study conducted by Microsoft researchers states:
"One surprising (to us at least) conclusion of our work is that storage fragmentation is the main determinant of the break-even point in the tradeoff... In essence, filesystems seem to have better fragmentation handling than databases and this drives the break-even point down from about 1MB to about 256KB."


What about Portability? Image formats such as PNG and JPEG are platform independent, whereas your database files may not be.

Of course, there are issues with garbage collection when you store your media outside of database. However, these issues should be trivial to solve for any experienced developer.

The decision to store BLOBs on file system makes even more sense from financial point of view. If an organization of our size was to move to Oracle or SQL Server, then they can significantly reduce their licensing costs by storing TT terabytes of media files on file system. I wonder how much our Oracle licensing costs would be to store and serve TT terabytes. Even with MySQL, our license fee for support would be significantly more for additional servers employed to serve TT terabytes of media.

I'll end with this thought: If you're storing your BLOBs, especially large media files in database, aren't you choosing the most expensive file system available?

Further reading:
To BLOB or Not to BLOB: Large Object Storage in a Database or a Filesystem.
To BLOB Or Not To BLOB by Deep Select.
When to put images in MySQL by Sheeri

13 comments:

rpbouman said...

Hi Frank,

interesting and well-written post. My comment was indeed intended in the way you eloquently describe it in your article:

"There are more than available features, cost and quantity discount factors to take into account when making a decision about the DBMS that should be used for a particular environment and architecture."

and

"For us, it's not just about storing 400 million photos. It's also about handling load that is generated by serving more than 4 billion page views a month."

Anyway - it's great to hear your point of view - thanks for sharing it.

Roland

Unknown said...

hi there, thanks for this post. There's a dearth of anecdotal evidence on the topic of "db or fs?" ... some more resources: a thread on the tomcat dev list recently explored this too ...

http://mail-archives.apache.org/mod_mbox/tomcat-users/200802.mbox/%3c47B08812.8000006@compulsivecreative.com%3e

http://mail-archives.apache.org/mod_mbox/tomcat-users/200802.mbox/%3ced5105470802100246n41aa5c12w87ec26494bc0ea3d@mail.gmail.com%3e

Anonymous said...

Flickr has a massive image database. They store images in the MySQL database. I have not seen any performance issues with Flickr.

Frank said...

Hi anonymous,

From comments on http://www.highscalability.com/flickr-architecture and http://sheeri.com/archives/39 and this presentation by Cal Henderson (http://www.niallkennedy.com/blog/uploads/flickr_php.pdf), Flickr only stores image reference in db.

If you believe otherwise, please update me and provide reference.

thanks!

Frank said...

hi brien,

thank you for the links.

frank

Anonymous said...

Hello Frank,

I stand corrected. Flickr stores only image refs in the DB, the actual image files are stored on the file system.

Thanks for the informative article.

Anonymous said...

thanks for this.I have finally made up my mind on this issue.

Michael said...

I think your points are valid and frankly, BLOBs should be stored in a file system.

There are some situations that make the decision less clear cut. For example:

Synchronizing backups between filesystem and database to ensure recovery would contain matching data.

Also, crashes of the filesystem could render some files invalid. DBs tend to handle such matters well with their transaction logs.

If you require BLOBs to be inserted/updated in a transaction you will find yourself rolling your own transaction code to work in conjunction with database transactions.

...Michael...

Anonymous said...

With regards to transactions and backups, ensuring the data is on disk before you commit the pointer in the DB is sufficient for most cases. For backups, if you're inserting as above, then you can always safely dump the DB then the FS. Sure the FS might contain more data than in the DB, but it won't contain less.

I guess I'm not seeing the overhead as much of a detraction.

Artem Russakovskii said...

Excellent post, Frank. Made up my mind about BLOBs as well. I wonder if Paul and PBXT will be ever able to solve the issues here.

Anonymous said...

Thank you for posting this.

The opinions and advice from experts like yourself are very helpful for the less knowledgeable, like myself.

Thanks again.

Anonymous said...

Thanks for your article. How do handle transactions when you are dealing with a file system.

-Manoj

test said...

Thank you for the article. I was going to go the DB route, just because I think is a clean way to go. There should be a mysql data type that just use the file system and keep the reference on the db making completely transparaent the file implementation.