Googling around, I came across Bradford Cross' article, "Big Data Is Less About Size, And More About Freedom." Bradford writes, "The scale of data and computations is an important issue, but the data age is less about the raw size of your data, and more about the cool stuff you can do with it."
Even though the article makes some good points, I'm not sure I can agree with Bradford's point of view here. As an architect, when I think in terms of Big Data, the ability to do "cool stuff" is probably the last thing that crosses my mind. Big Data, to me, is about ensuring constant response time as the data grows in size without sacrificing functionality.
What do you think Big Data is about? Is it merely about being able to do 'cool stuff' with your data? Is it about ensuring constant access/response times? Or is it about something else? I'm eager to hear your thoughts.
Specializing in big data deployments using MySQL / NoSQL Solutions. Topics: [mysql tutorial] [database design] [mysql data types] [mysql commands] [mysql dump] [database development] [mysql training] [mysql scalability] [mysql sharding] [mysql performance tuning]
Showing posts with label scalability. Show all posts
Wednesday, December 01, 2010
Tuesday, November 30, 2010
Lean Startups and Scalability
I wrote this as a reply to "Does Lean Startup Methodology Apply to Consumer Startups?" However, due to comment length restrictions on that blog, I am posting my comment here and welcome your thoughts.
"An enterprise will pilot products and iterate with a vendor: Let's run a 6 month consulting engagement/pilot to evaluate if this new database solves the problem."
Only an enterprise with a major disconnect between management and engineering will opt for this path. In enterprises where the required data I/O patterns are understood, taking such a path may spell disaster.
The primary problem I see with the 'lean startup' methodology is that it tells entrepreneurs to close their eyes, cut corners and just get the product to market without fully understanding future scalability needs. Scalability doesn't have to be sacrificed in order to build a lean startup, except in those cases where there is no architect on board.
Many entrepreneurs take the framework route in the early days, only to discover its haunting effects as growth happens.
Case studies exist where massive, low-latency infrastructures have been built without blowing budgets and without needing to be re-architected. The concept of lean startups is great, but incomplete (especially as it is being preached).
When the right architects and team are on board, building the right way also becomes the lean way.
Update: Let's not forget that data is the most valuable asset of an organization and every migration is a great opportunity for a screwup. Do you really want to migrate it from database to database as you sit with your fingers crossed, hoping that the latest vendor will solve your problem? Or does it make more sense to invest in an experienced architect and then make decisions rather than shooting in the dark? I'll let you decide the rest.
Labels:
architect,
leanstartup,
lowlatency,
scalability
Sunday, March 01, 2009
FriendFeed uses MySQL to store "Schema-less" data
Came across an interesting post by Bret (co-founder of FriendFeed) about how FriendFeed uses MySQL to store "schema-less" data. According to the post, they weren't having issues with scaling existing features but rather they were experiencing pain when trying to add features.
Now the way they are using MySQL is interesting and bizarre at the same time. At a very high level, it seems their approach is to use an RDBMS as if it were a column-oriented database. Of course, it makes me wonder: why not just use a column-oriented database? I need to read the post again in the morning (I'm too tired right now, so I just gave it a quick glance).
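For readers who haven't seen the idea before, the general shape of storing "schema-less" data in an RDBMS can be sketched roughly as follows. This is my own illustration and not FriendFeed's code: the table names (`entities`, `index_user_id`), the JSON encoding and the use of Python's built-in `sqlite3` in place of MySQL are all assumptions made so that the example runs anywhere.

```python
import json
import sqlite3
import uuid


def open_store():
    """Create an in-memory store (sqlite3 stands in for MySQL here)."""
    db = sqlite3.connect(":memory:")
    # One generic table: the body is an opaque serialized document, so adding
    # a new attribute to an entity never requires an ALTER TABLE.
    db.execute("CREATE TABLE entities (id TEXT PRIMARY KEY, body TEXT NOT NULL)")
    # A separate narrow table plays the role of an index on one attribute.
    db.execute(
        "CREATE TABLE index_user_id ("
        "user_id TEXT NOT NULL, entity_id TEXT NOT NULL, "
        "PRIMARY KEY (user_id, entity_id))"
    )
    return db


def put_entity(db, entity):
    """Store a dict as a serialized body and record it in the user_id index."""
    entity = dict(entity)
    entity.setdefault("id", uuid.uuid4().hex)
    with db:  # commit both writes together, roll back on error
        db.execute(
            "INSERT OR REPLACE INTO entities (id, body) VALUES (?, ?)",
            (entity["id"], json.dumps(entity)),
        )
        if "user_id" in entity:
            # Limitation of this sketch: stale index rows are not cleaned up
            # if an entity is later saved with a different user_id.
            db.execute(
                "INSERT OR REPLACE INTO index_user_id (user_id, entity_id) "
                "VALUES (?, ?)",
                (entity["user_id"], entity["id"]),
            )
    return entity["id"]


def find_by_user(db, user_id):
    """Look up entities through the index table, then decode their bodies."""
    rows = db.execute(
        "SELECT e.body FROM index_user_id AS i "
        "JOIN entities AS e ON e.id = i.entity_id "
        "WHERE i.user_id = ? ORDER BY e.id",
        (user_id,),
    )
    return [json.loads(body) for (body,) in rows]
```

The trade-off is visible even at this size: new attributes cost nothing to add, but every attribute you want to query by needs its own index table that the application has to keep in sync.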
I am very interested in hearing thoughts from my peers at Planet MySQL regarding this approach. They seem to have gone to great lengths to go this route. What issues and benefits do you see in this approach, and do you ever see yourself taking this route? I, for one, am not entirely convinced by this approach or that it can really scale down the road. Also, if it were someone other than FriendFeed going down that route, I might have actually lost my temper and yelled :)
Side note: FriendFeed is growing fast, and it would have been cool if Bret were speaking at one of the three upcoming MySQL events in April.
Sunday, November 09, 2008
Scalability As A Functional Or Non Functional Requirement
I am currently tasked with writing a Software Requirements Specification (SRS) document for a project. Effective sharding (based on specific criteria) and scalability are key requirements of the project.
Scalability is traditionally classified as a non-functional requirement. My question to the community: if scalability is crucial to a project, would it still be classified as a non-functional requirement? Are there cases when scalability requirements would be best classified as functional requirements?
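As an illustration only, a sharding rule of the kind such a specification would have to pin down could look like the sketch below. It assumes the criterion is a user ID and that the number of shards is fixed; both the shard count and the function name are hypothetical and not taken from the actual project.

```python
import hashlib

NUM_SHARDS = 8  # hypothetical value; an SRS would also state how this may grow


def shard_for(user_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a sharding key to a shard number deterministically."""
    # Use a stable hash: Python's built-in hash() is randomized per process
    # for strings, so it cannot be used to route data consistently.
    digest = hashlib.md5(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

Note that with plain modulo hashing, changing `num_shards` remaps most keys, which is exactly the kind of constraint that blurs the line between a functional and a non-functional requirement.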
Saturday, May 31, 2008
Michael Arrington Asks Twitter a Few Tough Questions
Michael Arrington of TechCrunch asks Twitter a few questions. I have only included a sample list below but you should read his blog post for all the questions:
- Is it true that you only have a single master MySQL server running replication to two slaves, and the architecture doesn’t auto-switch to a hot backup when the master goes down?
- Do you really have a grand total of three physical database machines that are POWERING ALL OF TWITTER?
- Is it true that the only way you can keep Twitter alive is to have somebody sit there and watch it constantly, and then manually switch databases over and re-build when one of the slaves fail?
A 'yes' answer to any of these questions by Twitter would be disturbing, to say the least. However, it wouldn't be surprising, as companies expect databases to just somehow magically work without creating and supporting a proper architecture. High availability doesn't come cheap, and reputation is everything for a company.
I find it amusing that Twitter isn't even looking for a DBA. Maybe that's considered a job for the SA over there :)
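To show what "auto-switching" involves at its very simplest, here is a sketch of the decision an automated failover has to make: when the master stops answering, promote the healthy replica (slave) that has applied the most of the master's log. Everything in it is hypothetical, including the `Server` record, its field names and the health flag. Real tooling also has to fence off the old master and repoint the remaining replicas, which this sketch leaves out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Server:
    name: str
    healthy: bool
    # Hypothetical field: how far into the master's log this server has applied.
    applied_position: int = 0


def choose_new_master(master: Server, replicas: List[Server]) -> Optional[Server]:
    """Return the replica to promote, or None if no promotion should happen.

    None covers two cases: the master is still healthy, or no healthy
    replica is available to take over.
    """
    if master.healthy:
        return None
    candidates = [r for r in replicas if r.healthy]
    if not candidates:
        return None
    # Prefer the replica that is least behind, to minimize lost writes.
    return max(candidates, key=lambda r: r.applied_position)
```

Even this toy version makes the point: the logic is not hard, but someone has to design, build and test it before the master goes down, not after.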
Labels:
high availability,
mysql,
scalability,
twitter
Wednesday, April 30, 2008
Optimizing MySQL and InnoDB on Solaris 10 for World's Largest Photo Blogging Community - Video
The video of one of my three sessions, "Optimizing MySQL and InnoDB on Solaris 10 for World's Largest Photo Blogging Community", presented at MySQL Conference & Expo 2008 has been uploaded by Sheeri. I am very thankful to her for doing all the hard work and making it available.
A few slides were edited out of the video for reasons beyond my control. However, you should still be able to enjoy most of it.
There is one point related to this video that I would like to make. Based on my own experience, I was led to believe that the Solaris 10 kernel had the same issue as the Linux kernel related to swappiness and swapping, where the kernel starts putting more importance on maintaining the file system cache than on the mysqld process. However, towards the end of the session, a Sun engineer pointed out (thanks!) that there must be something else going on, as UFS on Solaris 10 shouldn't exhibit this behavior and a process shouldn't be swapped out in favor of maintaining the file system cache. I am seeing this issue on 3 of my servers and am currently working with Sun engineers to get to the bottom of it.
Labels:
architecture,
innodb,
mysql,
optimization,
performance,
scalability,
solaris 10
Velocity Conference -- Web Performance and Operations Conference

Jesse Robbins, chair of the Velocity conference, graciously provided a 20% discount coupon as a comment on my blog post.
Early registration is about to end, but I find it really interesting that many slots still say TBC (to be confirmed). I would have expected the schedule to be fully determined by now. Still, I believe this should be a great conference to attend.
Earlier I wrote about my proposed session being rejected by the Velocity Conference, which was a big disappointment, especially since my presentation was about a top-13 website in the world. Wasn't that the point of this conference to begin with? Several sessions at this conference have already been presented several times at other conferences, including the MySQL Conference, and a little Google searching turns up the slides. So some companies' 'secret sauce' is worth repeating and others' is not? Oh well, no hard feelings. As I said, I still think there will be some interesting sessions.
Let me know if you are planning to attend the conference. I will be flying to SFO on Sunday evening and flying back on Wednesday afternoon.
Labels:
burlingame,
events,
operations,
performance,
scalability,
velocity
Friday, April 25, 2008
Scaling Up Or Out - Keynote at MySQL Conference 2008
At this year's MySQL Conference I was invited to be a panelist on the "Scaling MySQL Up Or Out" keynote. Other keynote panelists included Jeff Rothschild (VP of technology at Facebook and a consulting partner with Accel Partners), Paul Tuckfield (DBA at YouTube), John Allspaw (manager of operations engineering at Flickr) and Domas Mituzas (DBA at Wikipedia). There were also representatives from MySQL (Monty Taylor) and Sun (Matt Ingenthron).
I really enjoyed being a keynote panelist with my peers. We were seated according to our Alexa ranking, with the highest-ranked site, YouTube, on the right side. Even though I was representing the thirteenth largest site, our traffic compared to Facebook and YouTube was humbling.
All of the keynote panelists met early in the morning to get equipped with microphones and to go over the format.
See the video (below) to hear some funny "can't say" answers by Paul Tuckfield. I wish Google wouldn't make him be so secretive about numbers such as how many database servers they run. Would that really give away YouTube's secrets?
Following are some photos, videos and links to notes from the keynote.

From left to right:
Monty Taylor (MySQL), Matt Ingenthron (Sun), John Allspaw (Flickr), Farhan "Frank" Mashraqi (Fotolog), Domas Mituzas (MySQL/Wikipedia), Jeff Rothschild (Facebook) and Paul Tuckfield (YouTube)

Jam packed ballroom during the keynote.
Above Photos copyright: James Duncan Davidson.

Kaj Arnö leads the scaling MySQL keynote panel discussion.

Me getting animated.

Domas Mituzas, Jeff Rothschild and Paul Tuckfield at the keynote.

Matt answers a question as everyone listens
More photos from the keynote session are available at http://photos.mashraqi.com.
Video of keynote session:
- Sheeri/Technocation: Download, Play
- A short video by Zack Urlocker
Notes from scaling up or out keynote:
- Biographies of keynote panelists
- Keith Murphy: Scaling MySQL - - Up or Out? Panel @ UC
- Ronald Bradford: Scaling Wisdom
- Venu Anuganti: Notes from Scaling MySQL Up or Out
Labels:
mysql,
scalability,
scale out,
scale up
Wednesday, April 02, 2008
Velocity Conference
O'Reilly's Velocity Conference is happening this year on June 23-24 in Burlingame, CA. The Velocity site describes this new conference as:
"Web companies, big and small, face many of the same challenges: sites must be faster, infrastructure needs to scale, and everything must be available to customers at all times, no matter what. Velocity is the place to obtain the crucial skills and knowledge to build successful web sites that are fast, scalable, resilient, and highly available."
When the call for papers was open for Velocity, I submitted a talk proposal on cutting MySQL I/O for cost-effective scaling and performance optimization.
Fotolog is one of the largest sites on the Internet. We are ranked the 13th most visited site by Alexa and the 3rd most active social network by ComScore. In the past two years, we have experienced, and continue to experience, incredible growth. By focusing on efficient data modeling and cutting I/O, we have pushed the limits of optimization and scalability when it comes to MySQL.
Learning today that my session was not accepted obviously came as a major disappointment to me. While I truly respect the conference chair's decision, I believe my session would have been useful for those who are experiencing strong growth but cannot afford to re-architect their database backend for one reason or another.
There is some good news as well: While Velocity rejected my proposal, I am presenting a somewhat similar session at this year's MySQL Conference. The session is called "Optimizing MySQL and InnoDB on Solaris 10 for World's Largest Photo Blogging Community". If you're attending the conference and interested in knowing how you can push the limits of your MySQL database servers on Solaris, don't forget to attend my session. It will be a lot of fun, I promise!
I am also presenting two more talks at the MySQL Conference, Disaster is Inevitable—Are You Prepared? and The Power of Lucene.
Labels:
mysql conference,
scalability,
velocity
Wednesday, October 24, 2007
Scaling MySQL with Solaris 10: Webinar
On Wednesday, October 24, at 2 PM EST, I will be presenting an Information Week webinar about scaling MySQL on Solaris 10. The webinar is free to attend and is sponsored by Sun. It will be followed by a live Q&A session.
Register for the webinar
DESCRIPTION
Fotolog is the world's largest photo blogging social network, boasting more than 700,000 new photos per day and more than 3 billion page views a month. More than 11 million fotologgers communicate and connect through their photos on Fotolog. With the help of Solaris 10 and MySQL Enterprise, Fotolog has scaled to become a top 20 destination on the Internet according to Alexa.
In this TechWebcast, Sun Microsystems' partner Fotolog shows creative ways used to scale their Web 2.0 architecture and MySQL Enterprise with Solaris 10's advanced features. Unlike typical scalability presentations, which focus on the entire stack, Fotolog will emphasize the role that understanding one's application, operating system and storage engine can play in addressing scalability challenges.
Web 2.0 applications world-wide rely on Sun to deliver the right combination of product innovation and capability, coupled with market leading partner solutions like Fotolog and MySQL.
Representatives from both Sun and MySQL will be present to take your questions. Hope to see you there!
Update: The recording (with audio) of this webinar is now available on the Information Week website. Please note that registration is still required; after registering, you should be able to view the recording.
Labels:
mysql,
presentations,
scalability,
solaris 10,
sun,
sun hardware
Wednesday, April 25, 2007
Fotolog: Scaling the world's largest photo blogging community
Yesterday my talk, "Scaling the world's largest photo blogging community", went very well and I couldn't be happier. There were a lot of questions from the audience at the end, which made me really happy, as it was a clear sign that my presentation wasn't flying over their heads :)
Thank you to all those who attended. I will be posting the slides from my talk later tonight (it sucks that Blogger doesn't have file upload).