MySQL Compressed Binary Logs

On a busy server, the binary logs can end up being one of the largest contributors to amount of disk space used. That means higher I/O, larger backups (you are backing up your binary logs, right?), potentially more network traffic when replicas fetch the logs, and so on. In general, binary logs compress well, so it has been a long time wish for a feature that allowed you to compress the logs while MySQL are still using them. Starting from MySQL 8.0.20 that is now possible. I will take a look at the new feature in this post.

Configuration

The binary log compression feature is controlled by two variables, one for enabling the feature and another to specify the compression level. These are summarized in the below table.

Variable NameDefault
Value
Allowed
Values
binlog_transaction_compressionOFFOFF/ON
binlog_transaction_compression_level_zstd31-22

The two option names are fairly self explanatory. binlog_transaction_compression specifies whether compression is enabled or not, and binlog_transaction_compression_level_zstd specifies the compression level. The high level, the (in principle – see also the tests later in the blog) better compression at the cost of increased CPU.

Both options can be set dynamically both at the global and session scopes. However, it is not allowed to change the session values in the middle of a transaction. If you do that, you get an error like this:

mysql> SET SESSION binlog_transaction_compression = ON;
ERROR: 1766 (HY000): The system variable binlog_transaction_compression cannot be set when there is an ongoing transaction.

While it is easy to configure binary log compression, there are some limitations, you should be aware of.

Limitations

The short version of the limitations boils down to only transactional row events are compressed. Statement based events, GTID information, rotation events, row-based events for non-transactional tables, etc. are not compressed. Also if you have a transaction and add non-transactional changes, then neither will be compressed.

If you use all default values for your binary logs and use the InnoDB storage engine (the default), then compression will work. For the complete list of limitations, see Binary Log Transaction Compression in the manual.

Information

This also means that each transaction is compressed on its own. See the examples later in the blog for more on what that means.

As I usually preach, monitoring is key to understanding your system. What does the binary log compression feature support in terms of monitoring?

Monitoring

There are two ways to monitor the performance of the binary log compression feature. A Performance Schema table with compression statistics and new stage events.

The binary_log_transaction_compression_stats table in the Performance Schema includes statistics since the last restart of MySQL (or last time the table was truncated) for the compression. The table has two rows for the binary log, one for compressed events and one for events that are not compressed. Replicas will similarly have two rows for the relay log. An example of the content as well as the binary logs at the same time is:

mysql> SELECT * FROM binary_log_transaction_compression_stats\G
*************************** 1. row ***************************
                            LOG_TYPE: BINARY
                    COMPRESSION_TYPE: ZSTD
                 TRANSACTION_COUNTER: 15321
            COMPRESSED_BYTES_COUNTER: 102796461
          UNCOMPRESSED_BYTES_COUNTER: 252705572
              COMPRESSION_PERCENTAGE: 59
                FIRST_TRANSACTION_ID: 74470a0c-8ea4-11ea-966e-080027effed8:30730
  FIRST_TRANSACTION_COMPRESSED_BYTES: 313
FIRST_TRANSACTION_UNCOMPRESSED_BYTES: 363
         FIRST_TRANSACTION_TIMESTAMP: 2020-05-07 19:26:37.744437
                 LAST_TRANSACTION_ID: 74470a0c-8ea4-11ea-966e-080027effed8:46058
   LAST_TRANSACTION_COMPRESSED_BYTES: 712
 LAST_TRANSACTION_UNCOMPRESSED_BYTES: 1627
          LAST_TRANSACTION_TIMESTAMP: 2020-05-07 19:38:14.149782
*************************** 2. row ***************************
                            LOG_TYPE: BINARY
                    COMPRESSION_TYPE: NONE
                 TRANSACTION_COUNTER: 20
            COMPRESSED_BYTES_COUNTER: 5351
          UNCOMPRESSED_BYTES_COUNTER: 5351
              COMPRESSION_PERCENTAGE: 0
                FIRST_TRANSACTION_ID: 74470a0c-8ea4-11ea-966e-080027effed8:30718
  FIRST_TRANSACTION_COMPRESSED_BYTES: 116
FIRST_TRANSACTION_UNCOMPRESSED_BYTES: 116
         FIRST_TRANSACTION_TIMESTAMP: 2020-05-07 19:26:37.508155
                 LAST_TRANSACTION_ID: 74470a0c-8ea4-11ea-966e-080027effed8:31058
   LAST_TRANSACTION_COMPRESSED_BYTES: 116
 LAST_TRANSACTION_UNCOMPRESSED_BYTES: 116
          LAST_TRANSACTION_TIMESTAMP: 2020-05-07 19:30:30.840767
2 rows in set (0.0026 sec)

mysql> SHOW BINARY LOGS;
+---------------+-----------+-----------+
| Log_name      | File_size | Encrypted |
+---------------+-----------+-----------+
| binlog.000142 |       240 | No        |
| binlog.000143 |      4933 | No        |
| binlog.000144 |  28238118 | No        |
| binlog.000145 |  24667167 | No        |
| binlog.000146 |  39221771 | No        |
| binlog.000147 |  11944631 | No        |
| binlog.000148 |       196 | No        |
+---------------+-----------+-----------+
7 rows in set (0.0005 sec)

The example is generated with GTIDs enabled, and all of the binary logs except binlog.000142 have been created after the last restart.

This shows that there has been 15341 transactions (74470a0c-8ea4-11ea-966e-080027effed8:30718-46058) of which 15321 have been compressed with an average of 59% compression (compressing 252,705,572 bytes to 102,796,461 bytes). There are also statistics for the first and most recent transaction that was compressed. Similarly there have been 20 “transactions” that could not be compressed.

One thing is to know the compression rate, you also need to know the price you pay in terms of the overhead of performing the compression and decompression. That can be monitored using two stage events in the Performance Schema:

  • stage/sql/Compressing transaction changes.
  • stage/sql/Decompressing transaction changes.

(Yes, the period is part of the name.)

Neither are enabled by default and for the instruments to be collection, you will also need to enable the events_stages_current consumer. For example to enable these in the configuration file:

[mysqld]
performance-schema-instrument = "stage/sql/%Compressing transaction changes.=ON"
performance-schema-consumer-events-stages-current = ON

Warning

There is some overhead by enabling instrumentation of stages. Make sure you test the impact first, if you consider enabling these on a production system.

You can compare that with the wait/io/file/sql/binlog event (enabled by default) which is the time spent doing I/O. For example:

mysql> SELECT EVENT_NAME, COUNT_STAR,
              FORMAT_PICO_TIME(SUM_TIMER_WAIT) AS total_latency,
              FORMAT_PICO_TIME(MIN_TIMER_WAIT) AS min_latency,
              FORMAT_PICO_TIME(AVG_TIMER_WAIT) AS avg_latency,
              FORMAT_PICO_TIME(MAX_TIMER_WAIT) AS max_latency
         FROM performance_schema.events_stages_summary_global_by_event_name
        WHERE EVENT_NAME LIKE 'stage/sql/%transaction changes.'\G
*************************** 1. row ***************************
   EVENT_NAME: stage/sql/Compressing transaction changes.
   COUNT_STAR: 15321
total_latency: 6.10 s
  min_latency: 22.00 ns
  avg_latency: 397.96 us
  max_latency: 982.12 ms
*************************** 2. row ***************************
   EVENT_NAME: stage/sql/Decompressing transaction changes.
   COUNT_STAR: 0
total_latency:   0 ps
  min_latency:   0 ps
  avg_latency:   0 ps
  max_latency:   0 ps
2 rows in set (0.0008 sec)

mysql> SELECT *
         FROM sys.io_global_by_wait_by_latency
        WHERE event_name = 'sql/binlog'\G
*************************** 1. row ***************************
   event_name: sql/binlog
        total: 27537
total_latency: 4.83 min
  avg_latency: 10.51 ms
  max_latency: 723.92 ms
 read_latency: 138.25 us
write_latency: 290.69 ms
 misc_latency: 4.82 min
   count_read: 37
   total_read: 1002 bytes
     avg_read:   27 bytes
  count_write: 16489
total_written: 99.26 MiB
  avg_written: 6.16 KiB
1 row in set (0.0015 sec)

In this case, I have used the sys.io_global_by_wait_by_latency view as a short cut as it automatically makes the latencies human readable. For the latencies of the compression/decompression stages, the FORMAT_PICO_TIME() function convert them.

In this example, MySQL has spent 6.21 seconds compressing binary logs averaging just under 400 microseconds per transaction. For comparison, 4.8 minutes have been spent doing I/O on the binary log files.

You should check the time spent on writing and reading the binary log files before enabling compression, so you can determine the change in performance. You should also check the change in CPU usage.

MySQL 8 Query Performance Tuning

Book

If you want to learn more about MySQL performance tuning, then I have written MySQL 8 Query Performance Tuning published by Apress.

The book is available from Apress (print and DRM free ePub+PDF), Amazon (print and Kindle), Barnes & Noble (print), and others.

In the above outputs, it showed the compression was 59%, but how is it for different kinds of workloads?

Examples – Workloads

In order to determine how well the compression works, I have performed a series of tasks and compared the size of the binary logs with and without compression. For comparison, I have also tried to manually compress the binary logs from the series of tests without compression to see the optimal compression (as opposed to the per-transaction compression used by MySQL). The tests have been performed with the default configuration except the settings necessary for the given test.

The following workloads have been performed:

  • Bulk load: Loading the employees sample database.
  • Bulk update: Updating the salary column of all rows in the employees.salaries table: UPDATE employees.salaries SET salary = salary + 1.
  • Bulk load: Populating four tables with 100,000 rows using sysbench as preparation for the oltp_read_write benchmark.
  • OLTP workload: Running the sysbench oltp_read_write benchmark with --events=15000 using the four tables from the previous test.
  • Single row deletes: Deleting all 100,000 rows in one of the tables from the sysbench test. The rows are deleted one by one which represents a worst case scenario for the compression as the transactions are very small and there is only the before image in the binary log for each deleted row.

The single row deletes can be performed easily using MySQL Shell, for example in the Python language mode:

from datetime import datetime

for i in range(100000):
    if i % 5000 == 0:
        print('{0}: i = {1}'.format(datetime.now(), i))
    session.run_sql('DELETE FROM sbtest.sbtest1 WHERE id = {0}'.format(i+1))
print('{0}: i = 1000000'.format(datetime.now()))

The size of each task generates uncompressed binary logs in the range of 25 MiB to 82 MiB.

The tests were performed with the following settings:

  • No compression
  • Compression enabled
  • Encrypted but no compressed
  • Encryption and compression enabled
  • No compression in MySQL + compressing with zstd

The compression with zstd was chosen as MySQL uses the Zstandard compression algorithm. If you want to try yourself, you can download the Zstandard source code from Facebook's GitHub repository which also includes the compilation instructions.

The resulting size of the binary log in bytes for each combination can be found in the following table.

TestNormalCompressEncryptEncrypt +
Compress
zstd
Load employees6637863928237933663791512823863426892387
Bulk Update8569832024667051856988322466767724779953
sysbench prepare7636774039221052763682523922180639775067
sysbench run261902001193371326190712119365618468625
Single Row Deletes4730015639492400473006683963768416525219

That the encrypted binary log compresses as well as the unencrypted suggest that the compression is done before the encryption. (Encrypted data does not compress well.) Because there is no difference whether encryption is enabled, only the normal (uncompressed), compressed, and zstd results will be discussed further. The binary log sizes can also be seen in this figure:

Binary log sizes for different compressions schemes.
Binary log sizes for different compressions schemes.

Unsurprising the bulk loads and update perform the best with compressed binary log being 29% to 51% of the uncompressed size. The sysbench OLTP workload also compresses well ending up being 46% of the size. Also unsurprising, the single row deletes do not compress nearly as well with the compressed binary log being 83% of the size of the uncompressed binary log.

When comparing the binary logs compressed by MySQL with those compressed manually with zstd, the file size is around the same for the bulk loads reflecting that for large transactions, compressing on a per-transaction basis works as well as compressing the whole file. As the transaction size becomes smaller, the relative efficiency of the per-transaction compression reduces which is particularly visible for the single row deletes.

The other factor to consider is the compression level.

Examples – Compression Level

There are some oddities when it comes to the compression level, and the short story is that there is no need to change the setting.

The first oddity is that the allowed values are 1-22 but zstd only supports levels 1-19. There is nothing in the MySQL documentation explaining the difference.

The second oddity is that there is effectively no change in the size of the compressed binary logs by changing the value of binlog_transaction_compression_level_zstd as can be seen from the table:

Level/TestMySQLzstd
LoadOLTPLoadOLTP
12823814211935450344832078531545
32823793311933713268923878468625
112823812811902669247371946639524
192823824611937664188671875724300
222823812511910022

For comparison, the compression at levels 1, 3, 11, and 19 are included compressing the binary log manually with zstd. “Load” is for loading the employees database and OLTP is the sysbench oltp_read_write benchmark. The data can also be seen in this figure:

The binary log size as a function of the compression level.
The binary log size as a function of the compression level.

As it can be seen, there is essentially no difference in the file size irrespective of the compression level used in MySQL whereas for zstd the file size reduces as expected with increased compression level. For level 1 with the load test MySQL even compresses significantly better than zstd. It is like the compression level is never being set in MySQL.

Information

One possible explanation is that Zstandard supports training the algorithm for a given type of data (creating a dictionary). This particularly helps improving compression of small data. I do not know whether MySQL uses a dictionary and if so whether that makes all compression levels work roughly equal.

Conclusion

The new binary log transaction compression works great and can be a great way to reduce the amount of I/O, disk usage, and network usage. Should you enable it?

Most likely you will benefit from enabling binary log compression unless you are very strapped for CPU. From these tests, it is likely the disk space occupied by binary logs can be roughly cut in half, but as always it is workload dependent and you should test with your workload.

Advice

You can also enable compression at the protocol level for the transmission of the binary logs. There is no reason to both enable binary log transaction compression and the protocol compression.

On the other hand, there is not reason at present to change the compression level.