MySQL Consulting and NoSQL Consulting: MySQL DBA: oracle

The purpose of this post isn't by any way to bad mouth MySQL but rather to show the impact of lack of qualified DBAs to those who use MySQL.

I was shocked last year, when a CTO of a large company confided in me and said basically that they are so tired of searching for qualified MySQL DBAs that they may switch to Oracle. At some point, I guess the frustration alone can justify an otherwise unneeded cap-ex of such magnitude.

Today, I hear that ValueCentric, a pharmaceutical technology consulting firm based in New York, has decided to let go of MySQL in their environment and instead switched to Oracle on Oracle Linux.

"as the firm expanded and began taking on bigger clients like AstraZeneca, Roche and P&G Pharmaceuticals, company officials became concerned over MySQL's ability to keep up with rapidly growing data stores and complex quality-of-service requirements."

Now, MySQL may not be an ideal fit for many

"We were faced with some pretty big issues around performance, both in processing and on the back end, as well as reporting, where we had problems with queries returning on time and being able to handle the processing of data at night -- or whenever it came in -- in an expedient manner," Janca explained. "We needed to step back and see where we were headed strategically, do an assessment and try to remove some risk that we thought we had in continuing down the path with MySQL."

This gives the sense that MySQL cannot hold up as a company grows. However, to me, it seems like they had someone with little or no MySQL experience working on it. Or worst, their data architect did a horrible job of architecting their data and reporting environment. Or even worse, they hired services of a consultant who showed them the only way out of their issues was MySQL. There isn't a shortage of such consultants, you know.

Folks, MySQL is a tool and you need to know how to use that tool to get the job done. Just because you have been able to successfully open a tool doesn't mean that you know how to use it.

No one argues the need to have a competent DBA when one is using Oracle. However, the notion of Open Source makes people somehow believe that MySQL should run great out of the box even if the database architecture of the company sucks.

One of the biggest mistakes companies and IT managers make, IMHO, is to hire a data architect with minimal MySQL experience. They think their data can be architected by someone who doesn't necessarily has the expertise with underlying database. This can be a disastrous call as a competent data architect must possess an in-depth knowledge of the underlying database.

ValueCentric had also become disenchanted with MySQL's disaster-recovery capabilities and support services, added John LoFaso, ValueCentric's director of technical operations.

Those of you who attended my talk at MySQL Camp II, know the hell I went through on 7.5.07, when our SAN crashed leaving more than 1TB of InnoDB data severely corrupt (won't stay started even with innodb_force_recovery set to 6). For sure, I wasn't able to tap any readily-available resource to recover from this disaster. However, I was able to write programs myself to extract data from InnoDB tablespaces and the recovery rate was truly impressive. However, I realize not everyone will be wanting to write programs themselves especially in the times of extremely high stress (i.e. when disaster has occurred). This is one area that is definitely still ignored by MySQL. Peter Zaitsev recently blogged about releasing a DR tool for InnoDB however, I don't think that has happened yet.

"The support was very sketchy, and we just couldn't rely on it," he said.

Ok, since this is a somewhat touchy subject, for me and many others so I will mainly let it pass. I do however feel that you need to have the highest support plan to really get MySQL to help you. We, for one, have paid for support but hardly use it (MySQL software is so great).

I find it pretty amusing that while people use MYSQL, their budget for MySQL support is just pennies compared to when their support budget for Oracle. I believe had they spent same amount of resources on MySQL support, they would be a happy MySQL user.

One thing I will say is that we are preferring to bring in independent consultants for our next major project, though, that may/may not qualify as support to some.

So in case you are wondering why ValueCentric didn't go with Microsoft?

"because of "impersonal" support operations" and because "The goal was that we needed a partner, not an 800-number"

I believe in one thing: MySQL has a remarkable community, remarkable product, remarkable leadership and a remarkable vision so why not make the support remarkable as well? Right now, it may be great support, but it certainly isn't remarkable. And to know the difference between great and remarkable, well, you gotta read Purple Cow. (hint: they are both opposite of each other).

Going back to the shortage of competent MySQL DBAs, this is something that, I think, MySQL can artificially overcome by making their support remarkable.

However, to be fair to MySQL, the support is all there, however as with most things in life, it has a price. Ok, I seem to be in a catch 22 situation so I guess, enough for today.

BTW, thanks Ronald, for giving me Purple Cow as a gift in 2006. The book has really changed my view of thinking.

A comment was left for me by Roland and that made me want to write my thoughts.

My thoughts have changed on this topic over time. Primarily, because I now work on a large scale infrastructure where I can see how things would have been if BLOBs were stored in database. Most of the following discussion keeps the size and scale of my work load in mind and is targeted towards those who are interested in evaluating BLOBs for a work load, access patterns, budget and performance requirements similar to ours. The following are my thoughts and opinions.

When I started my current job, we were near 4 million members and just over 100 million photos. There were a lot of performance issues at that time. Today we have 14 million members and reaching 400 million photos and topping 152 million page views a day with at times reaching more than 900,000 photos a day and 8.9 million comments a day. During this time, on the database end, we only recommissioned one new test database server as production server in our guestbook cluster. Other than that one "addition", all our growth is being currently served from the same number of purchased servers/instances in our photo, friend/favorite and guestbook clusters. In that sense, one can say we have been scaling up with MySQL and Solaris 10 on Sun hardware.

There are more than available features, cost and quantity discount factors to take into account when making a decision about the DBMS that should be used for a particular environment and architecture. Just because a data type (that you weren't going to use) is available in a DBMS, shouldn't be the reason to pick it.

When you're in charge of data, you have to wear many hats to make decisions. As a data architect, your job is to take a principle based approach that is not influenced primarily by what extra features a DBMS offers but rather takes into account the "organization's strategic goals," and its budget. Experience in architectures is an important requirement to being an effective data architect as an approach considered optimal in one environment may not be equally ideal in another environment especially given unique constraints of each organization. Same is true from a database administrator point of view except that now maintainability, performance and scalability requirements are of utmost concern to you.

I was not the decision maker when my company decided to store images on file system, however, I couldn't have been more happier. If given the choice tomorrow, I would go with the decision of keeping photos outside the databases.

For an extreme high performance environment like ours where there is constant explosive growth, BLOBs could not have been the SOLUTION. In fact, had we stored nearly 400,000,000 photos (we are about to reach that milestone probably tomorrow) in *a* DBMS, we would be not only be crippling our performance and forcing our existing database infrastructure to become mostly unusable. For us, it's not just about storing 400 million photos. It's also about handling load that is generated by serving more than 4 billion page views a month. I would be very interested in seeing if any company, that is similar in specs like us and has intense performance requirements per server, like ours, uses BLOBs to store their images.

The concept of databases is generally suited to storing a very large number of objects that are small in size. Here, both "very large" and "small" are relative to each industry and requirements of the system. When you look at overall picture, file systems are more suited to handling large objects, especially if the large object consists of an image.

The decision of whether or not to store images as BLOBs MUST NOT BE DONE based on what your DBMS offers (even if its Oracle, SQL Server or MySQL). If you primarily base that decision based on your DBMS offering alone, you have done a great disservice to your organization in my opinion. The decision should be based on your performance requirements, scalability needs, ease of maintenance, cost of future features to be added to the application etc., and not on what you are more comfortable using.

Just for a moment, we have to think outside the box to be able to understand the argument of why storing BLOBs in file system is a wiser decision especially in a data-intensive web based environment.

When a client browser requests a page, it parses it out for presence of any media files and then makes a separate request to retrieve and display each media file. Typically, you can query the database for the data elements and then for each media request made later, simply use the file system to serve the request. If, however, you store media files as BLOBs, then for each media request you will need to communicate with database to get the contents of the requested media file. This can put unnecessary load on your database and increase I/O contention on your database server. Then, suddenly, you have your database server doing a lot of IO activity and your plans to keep disk I/O to minimum and serving your working data set mostly from memory goes hay wire. Some would argue that the cost of opening files is greater than accessing it from database but I wonder how long that would hold true for large media files.

For my company the size of the images is in double-digit (TT) terabytes. The size of databases is in multiple, but single digit terabytes (T). It makes no sense to me now that a database administrator would agree to bloat their databases to add TT more terabytes when they could have kept them at T terabytes and kept the other TT terabytes away from the database.

Unless the BLOB is kept in a separate table, say photo_blob, from the main table, say photo, you are going to make schema changes a nightmare for yourself, since you will be performing migrations on an additional TT terabytes. Suddenly, upgrading your database to a new version would seem like a slow walk in hell.

Same remains true for backups of database servers. By keeping the media files away from your database, you relieve your database from having to process TT terabytes of data for backup purposes which can be performed separately using file system backup tools.

If you use snapshot software to take database backups, there is something else to worry about. Snapshot utilities such as fssnap can create snapshots very quickly but as data is written to the original volume, the backup store files are also updated. This means in large environments as write load on database increases, so does your performance issues while the snapshot is held. With BLOBs you are now keeping snapshots much longer on database server so the additional TT terabytes can be backed up. While this happens, your performance will keep crippling. Instead of releasing snapshots within 5 hours, you are now waiting more than 24 hours. Yes, you will still have to backup the TT terabytes of media files, but at least you won't be making it the headache of your poor database server who already has so many things to worry about. By keeping media and its associated metadata separately, you can apply optimal backup techniques to each environment.

Oh, and what happens when you have to reset replication and move extra TT terabytes to slaves?

And let's not forget about disaster recovery issues where you need to recover data from crashed databases. More than half of your data would avoid having to be recovered in case of a database crash, if stored on a separate file system.

As we move towards higher and higher resolution devices, the size of the media files will continue to grow. 20-30MB image files are not rare anymore. Do you really want to store high resolution media in your database? If not, will you store low resolution media in database and high resolution on file system? Why?

Then there is the whole issue of fragmentation of database storage compared to file system storage. Not to mention making a mess by mixing sequential I/O and random I/O instead of separating them and benefiting from the separation.

This is not just an issue with MySQL. A 2006 study conducted by Microsoft researchers states:

"One surprising (to us at least) conclusion of our work is that storage fragmentation is the main determinant of the break-even point in the tradeoff... In essence, filesystems seem to have better fragmentation handling than databases and this drives the break-even point down from about 1MB to about 256KB."

What about Portability? Image formats such as PNG and JPEG are platform independent, whereas your database files may not be.

Of course, there are issues with garbage collection when you store your media outside of database. However, these issues should be trivial to solve for any experienced developer.

The decision to store BLOBs on file system makes even more sense from financial point of view. If an organization of our size was to move to Oracle or SQL Server, then they can significantly reduce their licensing costs by storing TT terabytes of media files on file system. I wonder how much our Oracle licensing costs would be to store and serve TT terabytes. Even with MySQL, our license fee for support would be significantly more for additional servers employed to serve TT terabytes of media.

I'll end with this thought: If you're storing your BLOBs, especially large media files in database, aren't you choosing the most expensive file system available?

Further reading:
To BLOB or Not to BLOB: Large Object Storage in a Database or a Filesystem.
To BLOB Or Not To BLOB by Deep Select.
When to put images in MySQL by Sheeri

MySQL Consulting and NoSQL Consulting: MySQL DBA

Tuesday, January 29, 2008

MySQL dumped - Oracle loaded - Whose fault is it?

Sunday, January 27, 2008

I WILL NOT BLOB