Wednesday, June 01, 2011

MySQL for Big Data

An excerpt from an article on MySQL for big data, published in Dow Jones Venture Wire by Scott Denne.

There is one possible solution to the problem that doesn't include companies having to buy new software tools or even an all-new database: With the right expertise, MySQL can be engineered to handle almost any data-intensive application. The only problem is that there's a shortage of people who have the expertise to make it work.

"There's a big time gap until we, as an industry, think we have data under control," said Frank Mashraqi, chief technology officer at MyLawsuit.com and former database chief at Fotolog Inc., a photo blogging site. "The roadmap to getting that expertise is very difficult and time doesn't allow for it."

Monday, February 07, 2011

Presenting "Real-Life Use Cases From Data Administration Hell" at LAMySQL

If you're in the Los Angeles area on Feb 15, come hear my talk at LAMySQL, inspired by lessons from real-life experiences. In addition to hearing a unique and interesting talk, you can win an AppleTV thanks to the awesome folks at @NoodleYard.

Real-Life Use Cases From Data Administration Hell

Data is the most valuable asset of an organization because it's irreplaceable.

Yet, we hear about f**k ups related to data administration every day by startups and organizations of all sizes. Sometimes it's no one's fault. Sometimes it's the fault of a drunk friend who shouldn't have been [wherever he was] in the first place.

Yet, at other times, the disaster could have been prevented. Sometimes, these f**k ups are caused by bad design. Sometimes, it's a bad query that made it into the production branch. Sometimes, it's a human error that ruins the day.

Ever had a bad query slip through QA? Or a configuration option that you thought would help the situation? Sometimes, the resulting disaster could have been prevented if those operating had simply followed the rules. Sometimes it's the absence of rules that leads to a disaster. Sometimes, the "acts of prevention" worsen the impact of the disaster. Sometimes it's the overconfidence of those administering data.

Imagine deleting the wrong record, or deleting from the wrong server. Or not treating the only SAN as a SPOF. Sometimes, the f**k up has been happening for years, yet no one realized or fixed it.

Sometimes, the f**k up is created intentionally. By focusing on things other than operational and capacity requirements. Sometimes, a small error threatens the very existence of a company.

At least once, that of a $100M company.

These f**k ups happen everywhere. At organizations of all sizes.

In this talk, Frank Mashraqi will explore real-life inspired, breath-taking (anonymized) use cases that created data administration hell for an organization. He will also explore how, if at all, these f**k ups could have been avoided.

This session presents an opportunity to learn from the costly real-life data administration mistakes of others, and which strategies can help you avoid getting caught off guard.

Bio

With more than a decade of scalability, disaster recovery and engineering management experience under his belt, Frank specializes in building and scaling NoSQL and SQL based platforms for graph processing and big data deployments using low concurrencies.

He is an expert in audience acquisition through organic search engine optimization and audience monetization through cutting-edge technologies such as re-targeting, social targeting and influencer targeting.

His past experience includes co-founding a graph processing company that applies advanced sociological theories to online advertising, scaling Fotolog to help it become the 13th most visited site on the Internet, and advising companies like Betaworks, Bitly, TwitterFeed, Chartbeat and ShermansTravel.

He holds a BBA in Accounting and a BS in Computer Information Systems.


PS: Many thanks to JoeDevon for inviting me to speak

PPS: I'll be driving from the Bay Area so if anyone is interested in riding with me from SF to LA and back, let me know :)

Wednesday, December 08, 2010

SEO / SEM SWOT

Since most of my friends here have asked me questions on SEO, I thought I'd share this presentation that I posted on SlideShare. This was a pre-engagement presentation. It contains over 100 slides and covers many dimensions of SEO. Enjoy!

Monday, December 06, 2010

Disaster @ Tumblr

Tumblr has been down for more than 12 hours due to an issue with their database cluster. Here is the comment I left on GigaOm.com:

This is the freshest lesson for entrepreneurs and startups:
- Learn to value your data
- Implement a high availability plan
- Plan a disaster recovery strategy

“Tumblr likely has the resources to recover…”

I really hope that holds true, but remember, data is the only irreplaceable asset of an organization. Once it’s gone, it’s gone.

When I was handling the disaster at Fotolog (massive database corruption when our SAN crashed), I couldn’t find any company or consulting firm ready to handle the situation and help with data recovery. It was a miracle that I came across the concept of DUDE (Data Unloading by Data Extraction) and started writing InnoDB data recovery programs in sheer desperation. In the case of Fotolog, we had all the basic infrastructure in place for redundancy and high availability. The component that caused the disaster was the one we relied upon most: “the financial grade strength SAN.”

The point I am trying to make is having access to cash in the bank + large userbase + really smart engineers doesn’t provide any guarantee that your data will be safe in case of a disaster.

Times like these put incredible stress on those handling the situation. I feel for the folks at Tumblr and am hoping for a speedy recovery.

Good luck Tumblr guys! You’re in my thoughts.

Frank

Friday, December 03, 2010

Sequoia backs MongoDB with $6.5M investment

Some exciting news coming from 10Gen, the company behind MongoDB. It announced today that Sequoia is investing $6.5M in its high performance, document-oriented (BSON), key-value based NoSQL solution that supports automatic sharding and dynamic queries. Foursquare, Disqus, Etsy, Sourceforge, eVite, EventBrite and New York Times are all users of 10Gen. The features this young NoSQL solution offers are truly impressive. See the MongoDB page on my Big Data Low Latency site for a quick review of MongoDB.

I had the opportunity to meet with Roelof Botha a few months ago, as Sequoia was looking to invest in the NoSQL space and was evaluating both hardware and software solutions for the big data challenge. Since then I have been eager to hear which of the many startups in the NoSQL space would receive Sequoia's blessing. Now we know :)

Wednesday, December 01, 2010

Video: Netflix's migration to AWS cloud

Found this video regarding Netflix's migration to Amazon's AWS cloud very informative. Enjoy!

Cloud Migration Whitepapers

Amazon's AWS team has published a series of whitepapers covering various scenarios for migrating into AWS cloud infrastructure. Links to these whitepapers are provided below for your convenience:
Cloud Migration
- Migrating applications to the AWS cloud
- Migrating web application
- Migrating batch processing applications
- Migrating backend processing pipelines

Big Data: Freedom or Something Else?

Googling around, I came across Bradford Cross' article, Big Data Is Less About Size, And More About Freedom. Bradford writes, "The scale of data and computations is an important issue, but the data age is less about the raw size of your data, and more about the cool stuff you can do with it."

Even though the article makes some good points, I'm not sure I can agree with Bradford's point of view here. As an architect, when I think in terms of Big Data, the ability to do "cool stuff" is probably the last thing that crosses my mind. Big Data, to me, is about ensuring constant response time as the data grows in size without sacrificing functionality.

What do you think Big Data is about? Is it merely about being able to do 'cool stuff' with your data? Is it about ensuring constant access/response times? Or is it about something else? I'm eager to hear your thoughts.

Tuesday, November 30, 2010

SQL and NoSQL

Alaric Snell-Pym asks: why choose between SQL and NoSQL? Why can't you use both in your infrastructure?

"NoSQL engines abandon SQL for the chance to have more flexible data models and softer semantics for update operations - but they also abandon it because it’s a lot of work to implement. And, creating a new database from scratch, they’re keen on solving the interesting hard problems (such as replicated data storage), rather than following the well-trodden path of writing SQL parsers and query planners, with a few decades of catching up with the competition ahead of them."

Hate the dirty recruitment tactics

I hate it when recruiters reach out to you with a message indicating that they are looking for 'key positions' and when you follow up, the tone changes to "we're just looking for engineers." This happens all the time and the latest company to play this recruitment tactic is LinkedIn. Guys, can't you decide whether you are looking to fill a 'key position' or just an engineering position before reaching out to candidates? I can see that mentioning 'key position' will get a candidate's attention but this is just a low level tactic.

Lean Startups and Scalability

I wrote this as a reply to "Does Lean Startup Methodology Apply to Consumer Startups?" However, due to comment length restrictions on that blog, I am posting my comment here and welcome your thoughts.

"An enterprise will pilot products and iterate with a vendor: Let's run a 6 month consulting engagement/pilot to evaluate if this new database solves the problem."

Only an enterprise where there is a major disconnect between management and engineering will opt for this path. In enterprises where the needed data I/O patterns are understood, taking such a path may spell disaster.

The primary problem I see with the 'lean startup' methodology is that it blindly tells entrepreneurs to close their eyes, cut corners and just get the product to market without fully understanding future scalability needs. Scalability doesn't have to be sacrificed in order to build a lean startup, except in those cases where there is no architect on board.

Many entrepreneurs take the route of frameworks in the early days, only to discover the haunting effects as growth happens.

Case studies exist where massive infrastructures with low latencies have been built without blowing budgets or needing to re-architect the infrastructure. The concept of lean startups is great, but incomplete (especially as it is being preached).

When the right architects and team are on board, building the right way also becomes the lean way.

<update> Let's not forget that data is the most valuable asset of an organization and every migration a great opportunity for a screwup. Do you really want to migrate it around from database to database as you sit with your fingers crossed hoping that the latest vendor will solve your problem? Or does it make more sense to invest in an experienced architect and then make decisions rather than shooting in the dark? I'll let you decide the rest.

Monday, November 22, 2010

Probably the worst way to deal with a stuck query...

is to disable a customer's account for more than 24 hours without any warning whatsoever. This happened to one of my accounts and I'm beyond furious at the database and network administrators of HostGator.com. Seriously, guys, I don't know of a more unprofessional way of dealing with a stuck query.

Wednesday, October 27, 2010

MySQL at Facebook

Mark your calendars for Nov 2 as Mark Callaghan and Facebook's MySQL team will be talking about how MySQL is used at Facebook.

Thursday, April 02, 2009

Hadoop Elastic MapReduce by AWS

Amazon today launched a beta of its Elastic MapReduce (hosted Hadoop). This is exciting and just in time for my upcoming session, "Hadoop and MySQL: Friends with benefits," at the MySQL Conference & Expo.

I can't wait to try it out!

Wednesday, March 18, 2009

Community One East - What will Sun announce?

I will be attending Community One tomorrow and on Thursday at the Marriott Marquis Hotel, New York, NY. I am especially looking forward to the announcements tomorrow, which sound very interesting :)

The first day is a free event featuring:

* Cloud Platforms – Development and deployment in the cloud.
* Social and Collaborative Platforms – Social networks and Web 2.0 trends.
* RIAs and Scripting – Rich Internet Applications, scripting and tools.
* Web Platforms – Dynamic languages, databases, and Web servers.
* Server-side Platforms – SOA, tools, application servers, and databases.
* Mobile Development – Mobile platforms, devices, tools and application development.
* Operating Systems and Infrastructure – Operating systems and virtualization.
* Free and Open – Open-source projects, business models, and trends.

The second day of the event is focused on Deep Dives with two half-day sessions on MySQL and two full-day sessions on Java and Web development. I will be attending the session, "Using Java EE and SOA to Architect and Design Robust Enterprise Applications."

Following the conference, I will be a panelist at a Cloud Computing Seminar at Microsoft's office in NY. It's going to be a long but exciting day!

It will be great to catch up with old and new friends at the event.

Monday, March 02, 2009

Cloud Computing - Executive Seminar

Tomorrow, I'll be attending the Executive Seminar on Cloud Computing at NASDAQ MarketSite (NY). Speakers include Dr. Werner Vogels and Mårten Mickos (ex-CEO of MySQL). Big thanks to Amazon and RightScale, who were able to accommodate my RSVP even though registration had formally closed.

I hope to be able to catch up with Mårten Mickos during the event. In case I do succeed in catching up, is there any question you want me to ask him? You can email me or post a comment.

It's funny that the event site still shows Mårten's title as "SVP of Sun Microsystems’ Database Group."

Sunday, March 01, 2009

FriendFeed uses MySQL to store "Schema-less" data

Came across an interesting post by Bret (co-founder of FriendFeed) about how FriendFeed uses MySQL to store "schema-less" data. According to the post, they weren't having issues with scaling existing features but rather they were experiencing pain when trying to add features.

Now the way they are using MySQL is interesting and bizarre at the same time. At a very high level, it seems their approach is to use an RDBMS as if it were a column-oriented database. Of course, it makes me wonder: why not just use a column-oriented database? I need to read the post again in the morning (too tired right now so just gave it a quick glance).

I am very interested in hearing thoughts from my peers at Planet MySQL regarding this approach. They seem to have gone to great lengths to go this route. What issues and benefits do you see in this approach, and do you ever see yourself taking this route? I, for one, am not entirely convinced by this approach or that it can really scale down the road. Also, if it was someone other than FriendFeed going down that route, I might have actually lost my temper and yelled :)
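
For readers who haven't seen the pattern, here is a rough sketch of what storing "schema-less" data in a relational database can look like: each entity goes into one table as an opaque blob, and anything you want to query by gets its own small index table. This is only an illustration and not FriendFeed's code. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table names (entities, index_user_id), the JSON encoding and the helper functions are all assumptions made up for the example.

```python
import json
import sqlite3
import uuid


def open_store():
    """Create an in-memory database with one blob table and one index table."""
    conn = sqlite3.connect(":memory:")
    # The database never looks inside `body`; it is just an opaque blob.
    conn.execute("CREATE TABLE entities (id TEXT PRIMARY KEY, body TEXT NOT NULL)")
    # Hypothetical index table: one row per (user_id, entity) pair.
    conn.execute(
        "CREATE TABLE index_user_id ("
        " user_id TEXT NOT NULL,"
        " entity_id TEXT NOT NULL,"
        " PRIMARY KEY (user_id, entity_id))"
    )
    return conn


def save_entity(conn, entity):
    """Store the entity as a JSON blob and refresh its index row."""
    entity = dict(entity)
    entity.setdefault("id", uuid.uuid4().hex)
    with conn:  # commit on success, roll back on error
        # INSERT OR REPLACE is SQLite syntax; a MySQL version would differ.
        conn.execute(
            "INSERT OR REPLACE INTO entities (id, body) VALUES (?, ?)",
            (entity["id"], json.dumps(entity)),
        )
        conn.execute(
            "DELETE FROM index_user_id WHERE entity_id = ?", (entity["id"],)
        )
        if "user_id" in entity:
            conn.execute(
                "INSERT INTO index_user_id (user_id, entity_id) VALUES (?, ?)",
                (entity["user_id"], entity["id"]),
            )
    return entity["id"]


def find_by_user(conn, user_id):
    """Look up entities through the index table, then decode the blobs."""
    rows = conn.execute(
        "SELECT e.body FROM index_user_id i"
        " JOIN entities e ON e.id = i.entity_id"
        " WHERE i.user_id = ? ORDER BY e.id",
        (user_id,),
    ).fetchall()
    return [json.loads(body) for (body,) in rows]
```

In this sketch, adding a new attribute to an entity needs no ALTER TABLE because the blob simply carries the extra key, while making a new attribute searchable means creating and backfilling another index table. That trade-off is exactly what I'd want to see tested at scale.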

Side note: FriendFeed is growing fast, and it would have been cool if Bret were speaking at one of the three upcoming MySQL events in April.

Thursday, November 13, 2008

OpenSQL Camp Starts Tomorrow!

So the good news is that the inaugural OpenSQL Camp is going to be an awesome event with mouth-watering sessions by noted experts. The bad news (for me) is that I won't be attending it, which makes me sad. I cannot leave town because my wife could go into labor any time now.

The session list looks great! Congratulations and thanks to Baron, Sheeri, Ronald, all the sponsors and contributors for organizing the first OpenSQL Camp.

I will be watching PlanetMySQL closely for juicy blog posts. Hopefully, the one and only Sheeri is taking her camcorder!

Monday, November 10, 2008

Stack Overflow: Q&A Site

Today I discovered Stack Overflow, a collaborative site that focuses on technical questions. You can ask questions related to any language, apparently without having to register. The site is currently in beta. There are also a few MySQL questions that are currently unanswered.

Sunday, November 09, 2008

Scalability As A Functional Or Non Functional Requirement

I am currently tasked with writing a Software Requirements Specification (SRS) document for a project. Effective sharding (based on specific criteria) and scalability are key requirements of the project.

Scalability is traditionally classified as a non-functional requirement. My question to the community: if scalability is crucial to a project, would it still be classified as a non-functional requirement? Are there cases when scalability requirements would be best classified as functional requirements?
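
To make the sharding requirement a bit more concrete, here is a minimal sketch of routing a record to a shard by a key. It is not taken from the project: the function name shard_for, the choice of a user id as the sharding criterion and the use of SHA-256 are all assumptions for illustration.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a sharding key (e.g. a user id) to a shard number in [0, num_shards)."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    # Use a stable hash: Python's built-in hash() for strings is randomized
    # per process, so it would not route consistently across servers.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


if __name__ == "__main__":
    for user_id in ("user:1", "user:2", "user:3"):
        print(user_id, "-> shard", shard_for(user_id, 4))
```

Written this way, a statement such as "all records for a given user must land on the same shard" becomes something you can test directly. Note that plain modulo routing remaps most keys when the number of shards changes, so a real design would need a plan for resharding.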