Thursday, April 26, 2007

InnoDB Performance Optimization

I am sitting in the session "InnoDB Performance Optimization" by Heikki Tuuri and Peter Zaitsev. This is going to be the last session of a very enjoyable and informative MySQL Conference and Expo 2007.

General Application Design is Paramount
Of course, you should design your schema, indexes and queries right. Storage engines aspects are often fine tuning. Storage engine selection may affect your schema layout and indexes.

Each storage engine has unique design and operating properties. App written for 1 storage engine may not perform best with other storage engines. Special optimizations exist for each storage engine.

There are always transactions with InnoDB, even if you don't use them explicitly. There is a cost with transactions. If running in autocommit mode, each transaction commit overhead for each statement.

You should wrap multiple updates in the same transaction for efficient operation (SET AUTOCOMMIT=0; ... COMMIT; ... COMMIT'_

HT points that InnoDB cannot purge deleted rows if transactions are open. Transaction commit doesn't affect query caching. If you run SELECT inside transaction Query cache may not be used.

You should make sure that you're catching deadlocks and wait timeouts. Transactions should not be too large. Deadlocks are normal and you should expect them. In app, try to reduce them but they will occur. If you run SELECTs you don't get deadlocks normally. Your application should deal with deadlocks appropriately.

Do not use LOCK tables as LOCK TABLES is designed to work with table level locking storage engines. With tow level lock storage engines, transactions are better choice. The behavior of LOCK TABLES is different in MySQL versions and depends on --innodb_table_locks. It can give problems for portable applications if you port from MySQL4 to later. Behavior may not be what you expect from MyISAM.

PRIMARY KEY CLUSTERING: It is very important. Accessing data by PK is faster than other keys and is true for both in-memory and disk based accesses. You should try to do most lookups by primary key.

In InnoDB data is clustered by PK and sequential PK values will likely have data on the same page. PK range and prefix lookups are also very efficient. It can be used to cluster data that is accessed together.

If storing user messages, you can use (user_id, message_id) primary key to keep all users messages in a few pages.

PK creates a "covering index" for any set of fields in the PK because PK holds all the data

What is the cost of clustered primary key:
PK in random order are costly and lead to table fragmentation. PK inserts should normally be in an ASC order. If you can, load data in PK order. At times, you may find that changing primary key to auto_increment is a good idea.

If you don't specify one, then InnoDB will create a clustered key internally, so you are better off defining one and using it.

Primary Key updates are very expensive and requires row data physically to be moved from one place in the index to another. The row has to be deleted and then inserted after which all secondary keys are updated. Never update primary key, change it to AUTO-INCREMENT. HT: If you need to update primary key then there is something wrong with your schema. Generally not a good schema/application design either.

What is a huge PK? It all depends on the access patterns of your application. Benchmark different keys.

What about 32-bit hash? That's like shooting yourself. Its really long and will be in non-sequential order.

If really large primary key then use AUTO-INC and demote your original primary key to unique key.

Keep your primary key short as secondary indexes use primary key to refer to the clustering index which makes PK value part of any/every index.

Long PK makes your indexes slow and long so you are better off reducing them.

Go easy on UNIQUE indexes as they do not use the "insert buffer" which can speed up index updates significantly. Indexes in InnoDB are not prefix compressed so they can take much more space then for MyISAM indexes (upto 10 times larger for the same structure). You should really avoid excessive or unneeded indexes. Remember MySQL will at most use one index for a query, not two.

Keep your updates fully indexed or you can see unexpected locking problems. A statement such as DELETE FROM users WHERE name="peter" may have to lock all rows in the table if the column name is not indexed. HT: MySQL 5.1 read-committed then only those rows are locked that are actually deleted.

AUTO-INCREMENT may limit scalability: AUTO-INC INSERTS may use table level locks (but only to the end of the INSERT statement, not transaction) even if you specify the AUTO-INC column value.

AUTO-INCREMENT limits scalability for concurrent inserts. HT: they have a patch and they are deciding on whether to implement it in 5.1.

MVCC: Complements row level locking to get even better concurrency. Standard SELECT statements set no locks, just reads appropriate row version.

LOCK IN SHARE MODE, FOR UPDATE modifiers can be done to do locking reads.

Even long running selects do not block updates to the table or other selects. Overly long queries (transactions) are bad for performance as a lot of unpurged versions accumulate. InnoDB can only purge rows when no active transactions are running. READ COMMITTED can ease the problem in such cases. InnoDB can only remove a row version when no transactions are open which can read it. HT: READ COMMITTED releases the snapshots at SELECT statements and doesn't keep it for the duration of transaction. READ COMMITTED is better for long transactions as it will let InnoDB purge transactions in background.

How MVCC plays with indexes? When you have index, reference is stored in index. When you update an index more and more versions are in the index that can slow index operations as well.

For UPDATE and LOCK IN SHARE MODE: Locking selects are executed in read committed mode because you cannot lock a row which doesn't exist. Because of this the results of these queries can be different than for standard SELECTs.

SELECT ... FOR UPDATE always has to access row data page to set the lock, so it can't run index covered queries which can further slow down queries significantly.

Reducing deadlocks: Deadlocks are normal for a transactional database. Non-locking SELECT statements do not deadlock with InnoDB. You should make sure to handle deadlocks and lock wait timeouts in your application. You should also make sure that your transactions lock data in the same order when possible.

Your update chunks should be smaller. You can achieve that by chopping transactions.

If you are going to update most of the selected rows then you are better off using SELECT ... FOR UPDATE. External locking can be used to avoid problem such as application level locks, SELECT GET_LOCK('mylock') etc.

Affect of isolation modes over performance:
Isolation modes in InnoDB can be set globally, per connection, or per transaction.

READ-UNCOMMITTED: If you are fine with dirty reads. Performance improvement is still limited.

Foreign Keys Performance: InnoDB checks foreign keys as soon as a row is updated, no batching is performed or checks delayed till transaction commit. FKs have their own cost and often have serious performance overhead. They are ideal for maintaining data consistency.

FKs increase amount of row level locking done and can make it spread to a lot of tables besides the one being updated.

Restrict number of open transactions: InnoDB performs best with a limited number of opern transactions and running queries. Multiple running queries may cause a lot of thrashing bumping into each other. Work is being done to improve performance in such cases. InnoDB_thread_concurrency can be used to restrict number of threads in InnoDB kernel. Many open transactions make lock graph building more cplicated and increase some other overhead. Whenever possible, you should keep a limited number of queries running at the same time by queuing on application side.

Guidelines for setting concurrency level: Manual says set it higher or lower. It really depends on application and number of hard drives.

HK: if you are worried about thread thrashing then set thread concurrency to 1.

INSERT...SELECT statement runs locking select. It is required for logical level replication to work properly. The problem goes away with MySQL 5.1 row level replication and the READ COMMITTED isolation level. Behavior is the same whenever you have log-bin enabled or not, to keep things consistent. innodb_locks_unsafe_for_binlog helps in 5.0 but your replication may get broken so do it if you know what you are doing. It also disables next-key locking.

SELECT...INTO OUTFILE + LOAD DATA INFILE can be often use as non-blocking safe alternative.

COUNT(*) facts and myths:
MyTH: "InnoDB doesn't handle count(*) queries well": Most count(*) queries are executed same way by all storage engines if you have a WHERE clause, otherwise you InnoDB will have to perform a full table scan.

FACT: InnoDB doesn't optimize count(*) queries without the where clause

You can use triggers and counter table to work it around or use SHOW table status like 'users' to get an approximation.

InnoDB and GROUP COMMIT: GC lets you commit several outstanding transactions with single log write. You can improve performance dramatically especially if no RAID with BBU. In 5.0, group commit does not work with binary logging due to a way XA (distributed transactions) support was implemented. Watch out for it if upgrading from MySQL 4.0.

Heikki will now talk further about server tuning settings:: Everything starts with innodb_buffer_pool_size which specifies main InnoDB buffer: data and index pages, insert buffer and locks are stored here. Most activity of InnoDB runs inside buffer pool. Why innoDB has its own cache? It's more efficient than OS cache. THe buffer pool is very important for performance on large data sets and much more efficient than OS cache especially for writes. InnoDB has to bypass OS buffering for writes. You can set buffer pool to 70-80% of memory for dedicated InnoDB-Only MySQL. Default value is just 8M, independent of available memory, just make sure to configure it.

InnoDB_additional_mem_pool: just stores dictionary information and is automatically increased, do not set too high. HK just uses 10MB for it. In future they may remove it altogether.

InnoDB_log_file_size: dramatically affects write performance. You should keep it high. However, high values increase recovery time. Check how large logs you can afford. Combined log file size should be 50% of the buffer pool memory.there is a 4GB total size limit.

InnoDB_log_files_in_group: this is the number of files of specified size that are used for log. There is usually no need to change default value.

InnoDB_log_buffer_size: You shouldn't set over 2-8MB unless you use huge BLOBs. Log file is flushed at least once per second anyway. Check innodb_os_log_written growth to see how actively your logs are written. innoDB logs are physio-logical, not page based so they are very compact.

InnoDB_flush_logs_at_trx_commit: By default logs are flushed to the disk at each transaction commit. This is required for ACID guarantees and is expensive. You can set to 2 or 0 if you can afford losing transactions for last one second or so (ie if you are using it as MyISAM tables replacement). Value 2 is nice if mysqld crashes then all your transactions are preserved, it is only bad in case of power crash or OS crash.

InnoDB log resizing: is not as simple as changing option and restarting. You need to shut down MySQL server and make sure it shuts down normally (check error log for errors to confirm). Then move away InnoDB log files ib_log*. Then start the MySQL server and check error log files to see if it successfully created new log files.

InnoDB_flush_method: specifies a way InnoDB will work with OS file system. On Windows unbuffered IO mode is always used. On Unix you can use fsync() or O_SYNC/O_DSYNC for flushing files. fsync() is usually faster. Always use default one or O_DIRECT (Peter).

InnoDB_file_per_table: allows InnoDB to store each table in its own file and makes disk management easier. Main tablespace is still needed for system needs. It can help to spread tables to multiple disks. It also allows you to reclaim space if a table is dropped. Sometimes slower for writes as fsync() is called sequentially. Can increase startup/shutdown time with large number of tables. I asked Heikki but unfortunately there are no plans to remove dependency on main tablespace as there are rollback segments that need to be there :((

Will there be InnoDB tablespace per database? No plans. Flexible tablespaces are similar to Oracle.

Other file IO settings:
innodb_autoextend_increment: HT doesn't think this is important. It specifies growth increment for shared tablespace (not for per table tablespaces). Larger values allow to reduce fragmentation.

InnoDB_file_io_threads: changes number of IO threads on Windows only. Note that all 4 threads are doing different jobs. In future it may change?

InnoDB_open_files: number of files used for per table tablespaces. You should increase it you have a lot of tables. No stats are available so far to show number of re-opens InnoDB needs to do.

innodb_support_xa: if set to 0 it reduces work InnoDB should do on transaction commit, however binlog can get out of sync.

So how do you minimize restart times: InnoDB buffer pool may have a lot of unflushed data and shutdown may take very long time. If you need to minimize downtime then

SET GLOBAL innodb_max_dirty_pages_pct=0

Watch for InnoDB_buffer_pool_pages_dirty in SHOW STATUS. As it gets close to 0, shut down the server.

InnoDB doesn't remove rows on delete and old row versions on update because these may be needed by other transactions. Purge thread is used to clean up these unused rows. In some workloads, the purge thread may not be able to keep up and the tablespace will grow witout bounds. CHECK TRANSACTIONS section in SHOW INNODB STATUS.

innodb_max_purge_lag limits number of transactions which have updated/deleted rows. This will delay insert/updates so purge thread can keep up. What is a good value of innodb_max_purge_lag? HT will set it to 100,000. It is measured in history list which contains transactions. Million transactions for this is too much.

innodb_commit_concurrency: umber of threads that are allowed at commit stage at the same time.

Questions: Are there any plans to make InnoDB's output to error log in a more defined/structured format?
Why does InnoDB show status sometimes only shows X transactions running and then truncates.


arathael said...

Great note pal!

I wish there were conferences like those in Mexico... When I say words like DB Optimization or Performance Audit everybody says ¿HUH?...


Frank said...


until I moved to NY from GA, I experienced the same :)


Bipin said...

Gr8 article