This post is my first one after the blog got a new design! Yayyy! I hope you guys enjoy it and remember: feedback is always welcome!
One of the most common questions that I receive is how to improve MySQL replication performance. Or to fix a replica (old slave nomenclature) falling behind the source (old master nomenclature).
So, let’s start with some solutions and consider their advantages and disadvantages.
Source Parameters
binlog_transaction_dependency_tracking: The default value is COMMIT_ORDER and means that the dependency information is generated from the source’s commit timestamps. Changing this parameter to WRITESET allows MySQL to parallelize tuples that are not updating the same value.
If you are running MySQL 5.7, before setting the binlog_transaction_dependency_tracking, you need to set the transaction_write_set_extraction parameter.
Drawbacks: NONE
binlog_group_commit_sync_delay: Controls how many microseconds the binary log commit waits before synchronizing the binary log file to disk. In other words, increasing this parameter allows MySQL to write more transactions in the same commit group, increasing the level of parallelism.
Drawbacks: Increasing this value too much will make the application write slower
Recommended Master Settings
Do not forget to add the settings to your my.cnf file to survive restarts.
1 2 3 4 5 6 7 8 |
master [localhost:46009] {msandbox} ((none)) > set global binlog_group_commit_sync_delay = 3000; Query OK, 0 rows affected (0.00 sec) master [localhost:46009] {msandbox} ((none)) > set global transaction_write_set_extraction=XXHASH64; Query OK, 0 rows affected (0.00 sec) master [localhost:46009] {msandbox} ((none)) > set global binlog_transaction_dependency_tracking=writeset; Query OK, 0 rows affected (0.00 sec) |
Replica Parameters
sync_binlog: MySQL server synchronizes its binary log to disk (using fsync()) before the commit of every transaction. This option is the safest choice but brings a significant drop in performance.
Drawbacks: If the replica crashes, its binary log might lose some transactions.
innodb_flush_log_at_trx_commit: This is probably the most well know parameter by MySQL DBAs. Setting the innodb_flush_log_at_trx_commit to 0 or 2 disables the strict ACID compliance. The value of 2 instructs InnoDB to write into log files after transaction commit, but don’t fsync. Instead, it fsyncs log files every second. This way, you can lose 1 second of updates in case if OS crashes. With modern hardware that supports up to 1M inserts per second, this can be quite a huge number. Version 0 is even worse: you may lose up to 1 second of transactions even if mysqld crashes, not only the operating system.
Drawbacks: If MySQL or the OS crashes, data can be lost.
slave_parallel_type: Transactions that are part of the same binary log group commit on a source are applied in parallel on a replica. (Hence the importance of setting binlog_group_commit_sync_delay in the master). The default option (DATABASE) only parallelizes transactions that come from different databases.
Drawbacks: NONE
slave_parallel_workers: We need to do some math to calculate this number. That’s because it depends on the workload that the replica is handling. To estimate the potential parallelism execute the following on the REPLICA:
1 2 3 4 5 |
UPDATE performance_schema.setup_consumers SET ENABLED = 'YES' WHERE NAME LIKE 'events_transactions%'; UPDATE performance_schema.setup_instruments SET ENABLED = 'YES', TIMED = 'YES' WHERE NAME = 'transaction'; |
And create the view:
1 2 3 4 5 6 7 |
use sys; CREATE VIEW mts_summary_trx AS select performance_schema.events_transactions_summary_by_thread_by_event_name.THREAD_ID AS THREAD_ID, performance_schema.events_transactions_summary_by_thread_by_event_name.COUNT_STAR AS COUNT_STAR from performance_schema.events_transactions_summary_by_thread_by_event_name where performance_schema.events_transactions_summary_by_thread_by_event_name.THREAD_ID in (select performance_schema.replication_applier_status_by_worker.THREAD_ID from performance_schema.replication_applier_status_by_worker); |
Finally, you need to run the query to evaluate the parallelism:
1 2 |
SELECT SUM(count_star) FROM mts_summary_trx INTO @total; SELECT 100*(COUNT_STAR/@total) AS PCT_USAGE FROM mts_summary_trx; |
Now, let’s see an example and analyze the results:
1 2 3 4 5 6 7 8 9 |
+-----------+ | PCT_USAGE | +-----------+ | 41.5845 | | 31.4046 | | 13.0119 | | 11.0909 | | 2.9081 | +-----------+ |
From the result above, we can observe that two slave threads are performing 72% of the replication. Furthermore, one of the threads is applying only 2.9% of the effort. So, if we analyze the numbers, it is interesting to set slave_parallel_workers value to 4.
Drawbacks: Setting a high value can cause high CPU usage and be ineffective; on the other hand, setting to a low value might cap the replication performance.
Recommended Replica Settings
Do not forget to add the settings to your my.cnf file to survive restarts.
1 2 3 4 5 6 7 8 9 10 11 |
slave1 [localhost:46010] {msandbox} (sys) > set global sync_binlog = 0; Query OK, 0 rows affected (0.00 sec) slave1 [localhost:46010] {msandbox} (sys) > set global innodb_flush_log_at_trx_commit=0; Query OK, 0 rows affected (0.00 sec) slave1 [localhost:46010] {msandbox} (sys) > set global slave_parallel_type = LOGICAL_CLOCK ; Query OK, 0 rows affected (0.00 sec) slave1 [localhost:46010] {msandbox} (sys) > set global slave_parallel_workers = 4; Query OK, 0 rows affected (0.00 sec) |
Final Considerations
I recommend having GTID enabled for some known issues. I recommend reading this blog post:
https://www.percona.com/blog/2015/01/29/multi-threaded-replication-with-mysql-5-6-use-gtids/
As you could observe, most of the decisions involve a balance between performance and the risk of losing some data on the replica. Some DBAs to avoid this, they create multiple replicas while others assume the risk, and in case of a crash, they recreate the slave from the master.
I hope it helped! See you!
Enabling innodb_flush_log_at_trx_commit=0 having chance of data loss if Mysql crashes. In that case 1 is recommended value?
Hi Ramesh,
Yes, the default value (1) is the safest choice and ensure ACID properties.
Excelente post Vinnie, parabéns
Obrigado Aldo 🙂