Wednesday, April 30, 2008
There are a few slides that were edited out of the video for reasons beyond my control. However, you should still be able to enjoy most of the video.
There is one point related to this video that I would like to make: Based on my particular experience, I was led to believe that the Solaris 10 kernel had the same issue as the Linux kernel related to swappiness and swapping, where the kernel starts putting more importance on maintaining the file system cache than on the mysqld process. However, towards the end of the session, it was pointed out by a Sun engineer (thanks!) that there must be something else going on, as UFS on Solaris 10 shouldn't exhibit this behavior and a process shouldn't be swapped out in favor of maintaining the file system cache. I am having this issue on 3 of my servers and I am currently working with Sun engineers to get to the bottom of it.
Jesse Robbins, chair for Velocity conference graciously provided a 20% discount coupon as a comment on my blog post.
Early registration is about to end, but I find it really interesting that many slots still mention TBC (to be confirmed). I would have expected the schedule to be fully determined by now. However, I still believe this should be a great conference to attend.
Earlier I wrote about my proposed session being rejected at Velocity Conference which was a big disappointment especially since my presentation was about a top 13 website in the world. Wasn't that the point of this conference to begin with? There are several sessions at this conference that have been presented several times at other conferences including MySQL and a little Google search turns up the slides. So some company's 'secret sauce' is worth repeating and others not? Oh well, no hard feelings. As I said, I still think there would be some interesting sessions.
Let me know if you are planning to attend the conference. I will be flying to SFO on Sunday evening and flying back on Wednesday afternoon.
Sunday, April 27, 2008
Today, I noticed Don is featured on Sun's customer success stories page:
Don MacAskill is the CEO and Chief Geek of SmugMug, a photo and now hi-def video (using H.264) sharing site with a successful business model behind it.
I initially met Don last year at the MySQL Conference when my then boss told me that he is interested in meeting him. That was my introduction to Smugmug. I was impressed by SmugMug's presentation of photos and the care they took to make your photos and galleries look awesome.
This year, as members of Smugmug, my wife and I got to interact with Don on a personal level.
We had several suggestions related to how our Smugmug experience could be improved and Don listened very carefully. One of the things I was most interested in seeing implemented was blocking Smugmug subdomains from being indexed if a customer is hosting them on their own subdomain.
I was truly impressed by how much Don thinks and cares about his members. It isn't a surprise that he runs a very successful site. From my conversations with Don, it seems there are many interesting projects Don and his team are working on and I can't wait to see them implemented. Almost all of the projects we heard about were focused on customers. No wonder Smugmug has a high customer retention rate.
Technology wise, I am a fan of decisions Don has made to run Smugmug. He uses MySQL, S3, EC2 for processing and video conversion, Solaris 10 and Sun hardware.
Despite being the CEO, Don is the MySQL guy at Smugmug. His latest blog post, Death of MySQL read replication highly exaggerated, was a good-natured jab at the discussion Brian Aker started, with Arjen Lentz and me jumping in.
In the following video, Don Grantham interviews Don MacAskill (yup, two Dons together) about Smugmug's relationship with Sun and the challenges of running a successful Web 2.0 business with more than 350,000 paying customers and more than 300,000,000 photos. As you can see in the video, customer satisfaction is more important than growth to Smugmug.
Since I joined Smugmug, several of my friends including Ronald Bradford have also joined. You can view my galleries by clicking on the image below and Ronald's photos from the MySQL conference by clicking on the image underneath:
If you use Smugmug as well, drop your Smugmug URL as a comment (of course, only if you want to share).
To stay up to date with exciting stuff happening at Smugmug, check out Don's blog.
I probably won't be able to cover everyone I met (sorry about that) but I intend to cover as many as possible. There will be no order in which I cover people. Also, there is no secret agenda and of course whatever I say is just my personal opinion. Just whenever I have a few thoughts ready about someone, they will pop out :)
Saturday, April 26, 2008
Found a scary story today about hundreds of thousands of websites using Microsoft IIS and SQL Servers being affected by Internet-wide SQL injection attacks. The story originally reported by F-Secure is now on Slashdot as well.
On the IIS forum, panic is visible. Those who had backups are breathing a sigh of relief like one administrator who commented, "We have been hit by this as well. Lucky backup ran last night just prior to the attack."
Others without backups are just screwed.
F-secure reports in an update to the story, "Do note that this attack doesn't use any vulnerabilities in any of those two applications. What makes this attack possible is poorly written ASP and ASPX (.net) code."
Although this attack is targeted towards IIS and SQL Server, there are lessons to be learned for sites using other servers and databases. There are several guides available on the Internet that will show you how to secure your application against SQL Injection attacks, like this one that is focused on securing PHP and MySQL applications.
In this year's "Disaster is Inevitable--Are you Ready" presentation at the MySQL Conference (Yes, I have read Baron's post), I covered a few types of disasters. However, I missed an important kind of disaster: ones that are caused by SQL Injection. My next presentation on this topic will certainly cover this. BTW, if you missed my presentation, you can thank Artem Russakovskii, who took meticulous notes that you can read.
What saddens me is comments like, "but we have all patches applied to the version we are using." There is, of course, a disconnect here as far as understanding the problem is concerned.
Patches don't secure you against SQL injection attacks; properly written code does. Sanity checking every input is very important!
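As a minimal sketch of what "properly written code" means here, the example below contrasts building SQL by string concatenation with a parameterized query. It uses Python's built-in sqlite3 module only so that it runs anywhere; the table, the column names and the sample input are made up for illustration. The same idea should carry over to any PHP/MySQL or ASP/SQL Server driver that supports bound parameters.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "a-secret"), ("bob", "b-secret")],
)

def find_unsafe(name):
    # BAD: user input is pasted straight into the SQL text.
    query = "SELECT name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()

def find_safe(name):
    # GOOD: the driver binds the value; it is never parsed as SQL.
    return conn.execute(
        "SELECT name FROM users WHERE name = ?", (name,)
    ).fetchall()

malicious = "nobody' OR '1'='1"
unsafe_rows = find_unsafe(malicious)  # the injected OR clause matches every row
safe_rows = find_safe(malicious)      # no user has that literal name
print(len(unsafe_rows), len(safe_rows))
```

The unsafe version returns every row in the table for this input, while the parameterized version returns nothing, and no patch level changes that.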
Replication as a backup method won't help against SQL Injection
Based on my survey, a disturbingly high number of sites use replication as their backup strategy. If replication is your sole method of backup, then beware: it isn't going to help against SQL injection based disasters, because the damaging statements replicate to the slaves just like any other write. Unless, of course, you have time delayed slaves and are able to stop replication before the slaves are affected.
Every year there are a number of backup related presentations at the MySQL Conference. All but one of the following were presented this year:
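To make the time delayed slave idea concrete, here is a toy model in Python. It is not real replication code: the `Event` class, the timestamps and the statements are all invented for the example, and it only assumes that a delayed slave applies an event once it is older than the configured delay.

```python
from dataclasses import dataclass

@dataclass
class Event:
    ts: int          # when the master executed the statement (seconds)
    statement: str

def apply_delayed(events, now, delay, stop_before_ts=None):
    """Return the statements a delayed slave would have applied by `now`.

    Events newer than `now - delay` are still waiting to be applied.
    If someone spots the bad statement in time and sets `stop_before_ts`,
    nothing at or after that timestamp is applied.
    """
    applied = []
    for ev in sorted(events, key=lambda e: e.ts):
        if ev.ts > now - delay:
            break  # not old enough yet
        if stop_before_ts is not None and ev.ts >= stop_before_ts:
            break  # replication halted before the damage
        applied.append(ev.statement)
    return applied

events = [
    Event(100, "INSERT good row"),
    Event(200, "UPDATE good row"),
    Event(300, "UPDATE pages SET body = 'injected junk'"),
]

# A normal slave (no delay) has already copied the damage at t=310.
realtime = apply_delayed(events, now=310, delay=0)
# A slave running one hour behind has not, which leaves time to react.
delayed = apply_delayed(events, now=310, delay=3600)
# Later, with replication stopped just before the bad event, only good data lands.
recovered = apply_delayed(events, now=5000, delay=3600, stop_before_ts=300)
print(len(realtime), len(delayed), len(recovered))
```

The delay only buys reaction time. If nobody notices the attack within that window, the delayed slave ends up just as damaged as the master.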
- What do you mean there's no backup? -- A timeless presentation by Mike Kruckenberg and Jay Pipes originally presented in 2006.
- Backup and Recovery Basics by Kai Voigt
- MySQL Backups go near continuous by David Wartell
- MySQL Online Backup: An In-depth presentation by Chuck Bell
- Online Backup, Open Replication and a world of contribution by Lars Thalmann and Chuck Bell
- Performing MySQL Backups using LVM Snapshots by Lenz Grimmer
- Top 5 Considerations While Setting Up Your MySQL Backups
Friday, April 25, 2008
I really enjoyed being a keynote panelist with my peers. We were seated according to our Alexa ranking with the highest ranking YouTube on the right side. Even though I was representing the thirteenth largest site, our traffic compared to Facebook and YouTube was humbling.
All of the keynote panelists met early in the morning to get equipped with microphones and to go over the format.
See the video (below) to hear some funny "can't say" answers by Paul Tuckfield. I wish Google wouldn't make him be so secretive about numbers such as how many database servers they run. Does that really give away YouTube's secrets?
Following are some photos, videos and links to notes from the keynote.
From left to right:
Monty Taylor (MySQL), Matt Ingenthron (Sun), John Allspaw (Flickr), Farhan "Frank" Mashraqi (Fotolog), Domas Mituzas (MySQL/Wikipedia), Jeff Rothschild (Facebook) and Paul Tuckfield (YouTube)
Jam packed ballroom during the keynote.
Above Photos copyright: James Duncan Davidson.
Kaj Arnö leads the scaling MySQL keynote panel discussion.
Me getting animated.
Domas Mituzas, Jeff Rothschild and Paul Tuckfield at the keynote.
Matt answers a question as everyone listens
More photos from the keynote session are available at http://photos.mashraqi.com.
Video of keynote session:
- Sheeri/Technocation: Download, Play
- A short video by Zack Urlocker
Notes from scaling up or out keynote:
- Biographies of keynote panelists
- Keith Murphy: Scaling MySQL - - Up or Out? Panel @ UC
- Ronald Bradford: Scaling Wisdom
- Venu Anuganti: Notes from Scaling MySQL Up or Out
Thursday, April 24, 2008
Unfortunately, I do not have a FrSIRT account currently (need to get one ASAP) so I couldn't dig into this vulnerability further. However, I am dying to learn more about it.
Wednesday, April 23, 2008
Java users can especially thank Sun now. Also this supports Sun's vision of Open Source.
"We've been engaging with the open-source community for Java to finish off the OpenJDK project, and the specific thing that we've been working on with them is clearing the last bits that we didn't have the rights," to distribute, Sands said.
"Over the past year, we have pretty much removed most of those encumbrances," Sands said. Work still needs to be done to offer the Java sound engine and SNMP code via open source; that effort is expected to be completed this year. Developers, though, may be able to proceed without a component like the sound engine, Sands said.
Source: Yahoo News
I think Monty found the right environment to work in.
Update: The original post mentioned "Java now fully Open Source"; however, as the article points out, Java is expected to become fully open source later this year. I wonder how much of a role the MySQL Conference played in this announcement coming earlier.
Tuesday, April 22, 2008
There are fewer than 100 tickets left. If you are attending and use MySQL, Solaris 10 or Sun hardware in your environment, I would love to chat with you.
And, there are no presentations :)
------ EVENT DETAILS ----
What: MashBash NYC : Mashable’s NYC Spring Party!
Who: 2,500 Sold Out Crowd, 400 Mashable VIP Tickets on Balcony, Grandmaster Flash starts the night off
When: Friday, May 16th, 2008
Drinks: Open Bar, 8 - 10 pm sponsored by Kluster
Where: Webster Hall, 125 East 11th Street, New York, NY
Schedule for the Evening: 8 - 10 pm: Mashable is hosting an exclusive 400 person VIP event on the 2nd Floor Balcony of Webster Hall’s Grand Ballroom. There will be an open bar sponsored by Kluster.com.
10:00 pm: Doors open to the public, a 2500 person sold out crowd
10:15 pm: Opening for Mashable’s VIP guests is none other than the legendary Grandmaster Flash.
Midnight till 4 am+: Mashable’s VIP guests are welcome to stay in the VIP area all night for music from acts including MSTRKRFT, L.A. Riots and more.
Monday, April 21, 2008
This year's conference was the best ever for me. I have a lot of people to thank and a lot to blog about. The number of pings I have received about my lack of blogging during the conference is truly humbling. However, I did have a good reason for not being able to blog.
First, I was presenting three sessions, with two on the final day of the conference. Since I have the habit of continuously revising my presentations, that put a little bit of pressure on me. A big thanks to all those who came to my sessions.
Second, I was given a great opportunity to be a keynote panelist at the "Scaling up or out" session at the MySQL Conference. If you missed the keynote, you can watch the full video of the keynote posted by Sheeri.
Third, my wife, a few friends and I were invited on a trip of a lifetime by the hardcore community evangelists at Proven Scaling (Jeremy Cole, Eric Bergen and Mike Griffiths). We had a great time visiting Yosemite National Park (more on this later). This was my first time in nine years without checking email or being on the Internet.
Now that I am back, I intend to put all my thoughts regarding the conference and the trip as blog posts in the coming days so stay tuned.
Sunday, April 13, 2008
Wednesday, April 09, 2008
The message reads,
Warning: Facebook detected a potential scam to steam your account!
To prevent future problems, please reset your password.
Also, I heard in the news today that a significant percentage of scams are now targeted towards social networking sites.
Of course, it goes without saying that one should not use their "important" passwords with social networking sites.
Then on Friday evening we will be going to visit more family in Monterey. We will arrive at Hyatt Regency, Santa Clara, on Sunday afternoon.
Once at Hyatt, I will be happy to give a ride to anyone going to the Pre-Conference dinner.
After the conference, my plan is to spend time with a few friends. I will be flying back home on a red-eye on Sunday night.
Like previous conferences, I can't wait to see all my old and new friends.
My passions include InnoDB, memcached, BLOBs, Latent Semantic Analysis, Ruby on Rails (why won't it scale), SEO, monetization, Solaris 10, Sun hardware, Hadoop, Lucene, replication and Blue Moon :) I would love to meet and talk with other users passionate about similar stuff.
Monday, April 07, 2008
Brian makes an important point in a comment to my post regarding backup. He points out that "Backups are always onerous on IO" and that a better way to back up is to use slaves or a standby master (if using multi master replication).
If you *must* run backups on a production server, then ibbackup becomes very important as it doesn't affect performance as much as the evil snapshots created by snapshot tools like fssnap and LVM. I have found that in our case purchasing ibbackup licenses was worth every penny.
In our environment, running backups using copy-on-write snapshots was killing performance. Writes would start stalling several hours into the backup process. It didn't help that backups would take 27 hours to finish. I moved most systems to using ibbackup and for those systems running backups hasn't been an issue at all.
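A rough way to see why this hurts is to count I/O operations in a toy model. The sketch below assumes copy-on-first-write behavior: while a snapshot is held, the first write to any block must first preserve the old contents (one read plus one write to the snapshot area) before the new data lands. Real snapshot tools differ in chunk sizes and details, and the workload numbers are made up, so treat this as an illustration rather than a measurement.

```python
import random

def count_io(writes, snapshot_active):
    """Count physical I/O operations for a sequence of block writes.

    Assumed model: without a snapshot each write costs 1 I/O. With a
    snapshot, the first write to a block costs 2 extra I/Os to preserve
    the old contents; later writes to that block cost 1 again.
    """
    preserved = set()
    ios = 0
    for block in writes:
        if snapshot_active and block not in preserved:
            ios += 2            # read old block, write it to the snapshot store
            preserved.add(block)
        ios += 1                # the write the database actually asked for
    return ios

random.seed(42)  # fixed seed so the run is repeatable
# Hypothetical workload: 10,000 writes spread over 5,000 blocks.
writes = [random.randrange(5000) for _ in range(10_000)]

plain = count_io(writes, snapshot_active=False)
with_snapshot = count_io(writes, snapshot_active=True)
print(plain, with_snapshot, round(with_snapshot / plain, 2))
```

In this model the extra cost grows with the number of distinct blocks touched while the snapshot exists, which is why a backup that holds a snapshot for many hours on a write-heavy server is so much worse than one that holds it briefly.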
Of course, if you must back up production servers, take snapshots to back up everything except the databases. That way the snapshots will be held for a much shorter period and you can continue backing up databases using ibbackup.
What about mysqldump? I don't consider mysqldump an appropriate tool for periodic backups. I can see it working for small databases but running it on enterprise level databases for daily backups is just not going to be feasible.
I would love to discuss backups more at the conference. I also would like to evaluate some of the backup vendors exhibiting at the conference.
By making App Engine available only for Python, Google is giving the language a big boost.
Amazon's EC2 (Elastic Compute Cloud) allows developers to choose their own stack. Furthermore, Amazon's S3 allows third party applications to connect directly. With Google App Engine, it seems one must interact with BigTable through a Python application.
Here's what Google's AppEngine promises developers:
- Write code once and deploy
- Absorb spikes in traffic
- Easily integrate with other Google services
Google App Engine is limited to first 10,000 developers
The website for Google App Engine (http://appengine.google.com/) goes live at 12:00 AM EST tonight. Only the first 10,000 developers will be given beta accounts. So hurry now before you are left out.
What is offered
The current limits imposed by Google include:
- 500 MB storage
- 200 million megacycles/day CPU time
- 10 GB bandwidth per day
Google App Engine Pricing
During the beta period, the service is completely free. Google has not announced the pricing for after the beta period ends.
I tried gaining an account right at 12:01 AM but thanks to Google "profiling" (which they have complete right to :) ), I got the following message:
Unfortunately, space is limited during Google App Engine's preview release. As we expand, we'll invite more developers, but for now you'll have to wait.
Would you like to be notified by email when space becomes available?
It seems like an "invite only" service. If you have invites or figure out how to get an account, please let me know. I'd love to get one.
Many thanks to Nick Johnson of Google and others for sending me invites. Also, thanks to those who posted a comment. I was able to get an account and couldn't be happier.
- Google Jumps Head First Into Web Services With Google AppEngine.
- Google App Engine readies for brawl with Amazon
- Google Launching App Engine for Python Developers
- Google Cloud Now on Tap for Python Developers
"The apps all appear on the appspot.com domain. Each developer currently gets three application ids. When apps are uploaded they will appear at http://application-id.appspot.com. Developers can, of course, bring their own domains. You can see the current set of apps in the application gallery. I love the Appspot domain name; it's an homage of sorts to Blogspot and fits in nicely with Jotspot."
- App Engine: Host your Python Apps with Google
- Google App Engine Blog- Introducing Google App Engine
For those unfamiliar with BigTable:
Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance. These applications place very different demands on Bigtable, both in terms of data size (from URLs to web pages to satellite imagery) and latency requirements (from backend bulk processing to real-time data serving). Despite these varied demands, Bigtable has successfully provided a flexible, high-performance solution for all of these Google products. In this paper we describe the simple data model provided by Bigtable, which gives clients dynamic control over data layout and format, and we describe the design and implementation of Bigtable.
This kinds of usernames usually indicate spammers. If you feel like this was in err, contact the wiki administrator.
Return to MySQLConf2008CommunityDinner.
Ok, whatever. I then tried again with a "non spammer" username and multiple email addresses but kept getting this:
A database query syntax error has occurred. This may indicate a bug in the software. The last attempted database query was:
(SQL query hidden)
from within function "User::addToDatabase". MySQL returned error "1062: Duplicate entry '' for key 3 (localhost)".
So there seems to be an issue with MySQL Forge registration.
Sunday, April 06, 2008
Backup is irrelevant for those of you who care about this discussion. LVM/ZFS snapshots are the rule of the land.
While I agree with most of Brian's statements in the article, I respectfully disagree with the statement above, especially the bolded part. Copy-on-write snapshots are EVIL for very large databases operating in a high I/O environment and backup, by no means, is entirely irrelevant. Please correct me if I am wrong but it is my understanding that both LVM and ZFS implement copy-on-write snapshots. Backup may be irrelevant for most sites but not for us.
If, however, by "irrelevant" Brian meant that it is not important in choosing one database over another, I can agree with that. Why? Because no one benchmarks backup methodologies until the backup process starts becoming a major PITA.
Backup methods can be a performance killer when dealing with very large databases. If you're interested in finding out why, and more importantly how, ask me at the conference, come to my scaling MySQL and InnoDB on Solaris session, or check on this blog after the conference.
Believe it, or not, she hammered me with all sorts of questions. I spent some time answering her questions. I scanned my brain to gather more evidence to support myself including that at work we are moving and staying away from replication as much as possible.
Then, I got busy writing the post about Facebook using MySQL replication to update Memcached. After publishing the post, I checked Planet MySQL and found Arjen discussing (and agreeing with) Brian Aker's post about "The Death of Read Replication."
At that point, I simply turned my MacBook screen towards my wife and smiled :)
I consider Brian's post a brave one from MySQL point of view as I can imagine not everyone at Sun/MySQL will be happy about this. I appreciate his can
However, what Brian says about replication, caching and memcached is very true. memcached is an incredibly important part of our infrastructure. It doesn't have the painful latency associated with MySQL replication. It requires much less hassle to set up, reset and scale. Like Facebook and all other major Web 2.0 sites, we have a considerably large memcached farm that allows us to serve our ever increasing demand.
P.S. Just to be clear, I highly favor using master-master replication for high availability and a small number of slaves. I just don't favor investing money in slaves alone for scaling reads.
P.P.S. I will leave you with a quote from Arjen's post:
"What needs to be fixed is distributed writes. And economically!"
This is very smart! I am curious about how they implemented this. I wonder if by "replication stream" they are just referring to binary logs. The article didn't mention whether they hacked MySQL to do synchronous replication as well, like Google. That would be really neat: synchronous replication that updates memcached.
Synchronous or not, the idea is still uber cool and I would love to see more discussion from Planet MySQL community regarding this.
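One way to picture the idea, purely as a guess and not Facebook's actual implementation: something reads the stream of row changes and drops the matching cache entries so the next read repopulates them from the database. In the sketch below a dict stands in for memcached and a list of tuples stands in for the replication stream; the event format and the key scheme are invented for the example.

```python
# Stand-ins for real systems: a dict plays the role of memcached and a
# list of tuples plays the role of the replication stream.
cache = {"user:1": {"name": "old name"}, "user:2": {"name": "bob"}}

replication_stream = [
    ("UPDATE", "user", 1),
    ("INSERT", "user", 3),
    ("DELETE", "user", 2),
]

def cache_key(table, primary_key):
    # Hypothetical key scheme: "<table>:<primary key>".
    return f"{table}:{primary_key}"

def apply_to_cache(stream, cache):
    """Drop the cached copy of every row the stream says has changed.

    Returns the list of keys that were actually removed.
    """
    invalidated = []
    for _operation, table, primary_key in stream:
        key = cache_key(table, primary_key)
        if cache.pop(key, None) is not None:
            invalidated.append(key)
    return invalidated

invalidated = apply_to_cache(replication_stream, cache)
print(invalidated, cache)
```

Invalidating rather than rewriting the cached value keeps the consumer simple, at the cost of one extra database read per changed row.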
Making replication possible for Brian Aker's memcached storage engine for MySQL could, in the future, be another way to make MySQL replicate to memcached. Brian's blog post shows:
ENGINE=MEMCACHE DEFAULT CHARSET=latin1 CONNECTION='localhost,piggy,bitters'
The multiple host specification looks very interesting. I would definitely love to talk about this with the brains at the conference.
Also, something like this would make a nice candidate for programs like Google summer of code.
Thanks to my colleague and friend A. Lee for bringing this to my attention.
Friday, April 04, 2008
We'll have to take a hit in certain database operations to benefit from this 62% gain. (Update: however, luckily, those operations do not occur everyday.) I will be presenting results of my benchmarks and information at the MySQL conference. If you are evaluating Sun servers for MySQL, you will find my session very interesting.
Now, I can't wait to receive a bunch of T5120s to replace all our db servers.
I have been wanting to write about KickFire but I certainly won't be able to beat Baron. He does a wonderful job of capturing what KickFire is and presenting a detailed insight for PlanetMySQL readers.
Like Baron, I only provided consulting and didn't get a chance to actually play with the solution. If KickFire is able to deliver what they have been promising then I can see them becoming a major solution provider to MySQL community.
I can't wait for Kickfire's keynote. Should be very interesting for those interested in giving MySQL scalability a whole new meaning.
Thursday, April 03, 2008
I was made the offer to play golf today at our weekly manager's meeting. Why will I miss it? Because I will be in California, speaking at the MySQL Conference.
There are several Sun related interesting events happening in New York during the time I will be in California for the MySQL conference. This would have been a great chance for me to mingle with the top executives and talent at Sun.
I feel sad for missing this opportunity but very excited as the conference time comes closer and closer. Can't wait to see old friends and make new ones.
The discussions were very interesting and informational. Some of the topics (that I am allowed to discuss publicly) were PNFS, QFS, LDAP for large scale authentication, Sun's new servers developed with Fujitsu and Sun's storage solutions.
Architecture wise, I was able to gain some more insight into UltraSparc T1/T2, Sparc M series, and M1 vs M2 architecture. Yes, there was clarification needed every time someone said T2 and T1 to differentiate T1000s and T2000s from UltraSparc T1 (Niagara 1) and UltraSparc T2 (Niagara 2). Someone please tell Sun they can use other letters of the alphabet to describe their servers and series.
The food, though very small in portions, was just out of this world. I can't wait to take my wife there.
Wednesday, April 02, 2008
My morning started with checking servers, then heading to PlanetMySQL where I found the "sad" news. My wife and I spent the next hour discussing nothing but Ronald and every topic we could think of related to his 'situation'. In the back of my mind, I was thinking that this could be a joke, but then I thought I knew Ronald well enough that he wouldn't play a joke like this. Of course, I was wrong.
When I got Ronald's message saying "April Fools!" my response was "I HATE YOU!!!!"
In the evening, when I talked to a very good mutual friend, Marc, I found he was equally "mad" at Ronald. Today, I see that we were not alone and poor Jay was very worried.
I would love to form a coalition of all those affected by this so we can take revenge :)
"Web companies, big and small, face many of the same challenges: sites must be faster, infrastructure needs to scale, and everything must be available to customers at all times, no matter what. Velocity is the place to obtain the crucial skills and knowledge to build successful web sites that are fast, scalable, resilient, and highly available."
When the call for papers was open for Velocity, I submitted a talk proposal regarding cutting MySQL IO for cost effective scaling and performance optimization.
Fotolog is one of the largest sites on the Internet. We are ranked 13th most visited site by Alexa and 3rd most active social network by ComScore. In the past two years, we have experienced and continue to experience incredible growth. By focusing on efficient data modeling and cutting I/O, we have literally pushed the limits of optimization and scalability when it comes to MySQL.
Learning today that my session was not accepted obviously came as a major disappointment to me. While I truly respect the conference chair's decision, I believe my session would have been useful for those who are experiencing strong growth but cannot afford to re-architect their database backend for one reason or another.
There is some good news as well: While Velocity rejected my proposal, I am presenting a somewhat similar session at this year's MySQL Conference. The session is called "Optimizing MySQL and InnoDB on Solaris 10 for World's Largest Photo Blogging Community". If you're attending the conference and interested in knowing how you can push the limits of your MySQL database servers on Solaris, don't forget to attend my session. It will be a lot of fun, I promise!
I am also presenting two more talks at the MySQL Conference, Disaster is Inevitable—Are You Prepared? and The Power of Lucene.