MySQL Compressed Binary Logs

On a busy server, the binary logs can end up being one of the largest contributors to amount of disk space used. That means higher I/O, larger backups (you are backing up your binary logs, right?), potentially more network traffic when replicas fetch the logs, and so on. In general, binary logs compress well, so it has been a long time wish for a feature that allowed you to compress the logs while MySQL are still using them. Starting from MySQL 8.0.20 that is now possible. I will take a look at the new feature in this post.


The binary log compression feature is controlled by two variables, one for enabling the feature and another to specify the compression level. These are summarized in the below table.

Variable NameDefault

The two option names are fairly self explanatory. binlog_transaction_compression specifies whether compression is enabled or not, and binlog_transaction_compression_level_zstd specifies the compression level. The high level, the (in principle – see also the tests later in the blog) better compression at the cost of increased CPU.

Both options can be set dynamically both at the global and session scopes. However, it is not allowed to change the session values in the middle of a transaction. If you do that, you get an error like this:

mysql> SET SESSION binlog_transaction_compression = ON;
ERROR: 1766 (HY000): The system variable binlog_transaction_compression cannot be set when there is an ongoing transaction.

While it is easy to configure binary log compression, there are some limitations, you should be aware of.


The short version of the limitations boils down to only transactional row events are compressed. Statement based events, GTID information, rotation events, row-based events for non-transactional tables, etc. are not compressed. Also if you have a transaction and add non-transactional changes, then neither will be compressed.

If you use all default values for your binary logs and use the InnoDB storage engine (the default), then compression will work. For the complete list of limitations, see Binary Log Transaction Compression in the manual.


This also means that each transaction is compressed on its own. See the examples later in the blog for more on what that means.

As I usually preach, monitoring is key to understanding your system. What does the binary log compression feature support in terms of monitoring?


There are two ways to monitor the performance of the binary log compression feature. A Performance Schema table with compression statistics and new stage events.

The binary_log_transaction_compression_stats table in the Performance Schema includes statistics since the last restart of MySQL (or last time the table was truncated) for the compression. The table has two rows for the binary log, one for compressed events and one for events that are not compressed. Replicas will similarly have two rows for the relay log. An example of the content as well as the binary logs at the same time is:

mysql> SELECT * FROM binary_log_transaction_compression_stats\G
*************************** 1. row ***************************
                            LOG_TYPE: BINARY
                    COMPRESSION_TYPE: ZSTD
                 TRANSACTION_COUNTER: 15321
            COMPRESSED_BYTES_COUNTER: 102796461
                FIRST_TRANSACTION_ID: 74470a0c-8ea4-11ea-966e-080027effed8:30730
         FIRST_TRANSACTION_TIMESTAMP: 2020-05-07 19:26:37.744437
                 LAST_TRANSACTION_ID: 74470a0c-8ea4-11ea-966e-080027effed8:46058
          LAST_TRANSACTION_TIMESTAMP: 2020-05-07 19:38:14.149782
*************************** 2. row ***************************
                            LOG_TYPE: BINARY
                    COMPRESSION_TYPE: NONE
                 TRANSACTION_COUNTER: 20
                FIRST_TRANSACTION_ID: 74470a0c-8ea4-11ea-966e-080027effed8:30718
         FIRST_TRANSACTION_TIMESTAMP: 2020-05-07 19:26:37.508155
                 LAST_TRANSACTION_ID: 74470a0c-8ea4-11ea-966e-080027effed8:31058
          LAST_TRANSACTION_TIMESTAMP: 2020-05-07 19:30:30.840767
2 rows in set (0.0026 sec)

| Log_name      | File_size | Encrypted |
| binlog.000142 |       240 | No        |
| binlog.000143 |      4933 | No        |
| binlog.000144 |  28238118 | No        |
| binlog.000145 |  24667167 | No        |
| binlog.000146 |  39221771 | No        |
| binlog.000147 |  11944631 | No        |
| binlog.000148 |       196 | No        |
7 rows in set (0.0005 sec)

The example is generated with GTIDs enabled, and all of the binary logs except binlog.000142 have been created after the last restart.

This shows that there has been 15341 transactions (74470a0c-8ea4-11ea-966e-080027effed8:30718-46058) of which 15321 have been compressed with an average of 59% compression (compressing 252,705,572 bytes to 102,796,461 bytes). There are also statistics for the first and most recent transaction that was compressed. Similarly there have been 20 “transactions” that could not be compressed.

One thing is to know the compression rate, you also need to know the price you pay in terms of the overhead of performing the compression and decompression. That can be monitored using two stage events in the Performance Schema:

  • stage/sql/Compressing transaction changes.
  • stage/sql/Decompressing transaction changes.

(Yes, the period is part of the name.)

Neither are enabled by default and for the instruments to be collection, you will also need to enable the events_stages_current consumer. For example to enable these in the configuration file:

performance-schema-instrument = "stage/sql/%Compressing transaction changes.=ON"
performance-schema-consumer-events-stages-current = ON


There is some overhead by enabling instrumentation of stages. Make sure you test the impact first, if you consider enabling these on a production system.

You can compare that with the wait/io/file/sql/binlog event (enabled by default) which is the time spent doing I/O. For example:

              FORMAT_PICO_TIME(SUM_TIMER_WAIT) AS total_latency,
              FORMAT_PICO_TIME(MIN_TIMER_WAIT) AS min_latency,
              FORMAT_PICO_TIME(AVG_TIMER_WAIT) AS avg_latency,
              FORMAT_PICO_TIME(MAX_TIMER_WAIT) AS max_latency
         FROM performance_schema.events_stages_summary_global_by_event_name
        WHERE EVENT_NAME LIKE 'stage/sql/%transaction changes.'\G
*************************** 1. row ***************************
   EVENT_NAME: stage/sql/Compressing transaction changes.
   COUNT_STAR: 15321
total_latency: 6.10 s
  min_latency: 22.00 ns
  avg_latency: 397.96 us
  max_latency: 982.12 ms
*************************** 2. row ***************************
   EVENT_NAME: stage/sql/Decompressing transaction changes.
total_latency:   0 ps
  min_latency:   0 ps
  avg_latency:   0 ps
  max_latency:   0 ps
2 rows in set (0.0008 sec)

mysql> SELECT *
         FROM sys.io_global_by_wait_by_latency
        WHERE event_name = 'sql/binlog'\G
*************************** 1. row ***************************
   event_name: sql/binlog
        total: 27537
total_latency: 4.83 min
  avg_latency: 10.51 ms
  max_latency: 723.92 ms
 read_latency: 138.25 us
write_latency: 290.69 ms
 misc_latency: 4.82 min
   count_read: 37
   total_read: 1002 bytes
     avg_read:   27 bytes
  count_write: 16489
total_written: 99.26 MiB
  avg_written: 6.16 KiB
1 row in set (0.0015 sec)

In this case, I have used the sys.io_global_by_wait_by_latency view as a short cut as it automatically makes the latencies human readable. For the latencies of the compression/decompression stages, the FORMAT_PICO_TIME() function convert them.

In this example, MySQL has spent 6.21 seconds compressing binary logs averaging just under 400 microseconds per transaction. For comparison, 4.8 minutes have been spent doing I/O on the binary log files.

You should check the time spent on writing and reading the binary log files before enabling compression, so you can determine the change in performance. You should also check the change in CPU usage.

MySQL 8 Query Performance Tuning


If you want to learn more about MySQL performance tuning, then I have written MySQL 8 Query Performance Tuning published by Apress.

The book is available from Apress (print and DRM free ePub+PDF), Amazon (print and Kindle), Barnes & Noble (print), and others.

In the above outputs, it showed the compression was 59%, but how is it for different kinds of workloads?

Examples – Workloads

In order to determine how well the compression works, I have performed a series of tasks and compared the size of the binary logs with and without compression. For comparison, I have also tried to manually compress the binary logs from the series of tests without compression to see the optimal compression (as opposed to the per-transaction compression used by MySQL). The tests have been performed with the default configuration except the settings necessary for the given test.

The following workloads have been performed:

  • Bulk load: Loading the employees sample database.
  • Bulk update: Updating the salary column of all rows in the employees.salaries table: UPDATE employees.salaries SET salary = salary + 1.
  • Bulk load: Populating four tables with 100,000 rows using sysbench as preparation for the oltp_read_write benchmark.
  • OLTP workload: Running the sysbench oltp_read_write benchmark with --events=15000 using the four tables from the previous test.
  • Single row deletes: Deleting all 100,000 rows in one of the tables from the sysbench test. The rows are deleted one by one which represents a worst case scenario for the compression as the transactions are very small and there is only the before image in the binary log for each deleted row.

The single row deletes can be performed easily using MySQL Shell, for example in the Python language mode:

from datetime import datetime

for i in range(100000):
    if i % 5000 == 0:
        print('{0}: i = {1}'.format(, i))
    session.run_sql('DELETE FROM sbtest.sbtest1 WHERE id = {0}'.format(i+1))
print('{0}: i = 1000000'.format(

The size of each task generates uncompressed binary logs in the range of 25 MiB to 82 MiB.

The tests were performed with the following settings:

  • No compression
  • Compression enabled
  • Encrypted but no compressed
  • Encryption and compression enabled
  • No compression in MySQL + compressing with zstd

The compression with zstd was chosen as MySQL uses the Zstandard compression algorithm. If you want to try yourself, you can download the Zstandard source code from Facebook's GitHub repository which also includes the compilation instructions.

The resulting size of the binary log in bytes for each combination can be found in the following table.

TestNormalCompressEncryptEncrypt +
Load employees6637863928237933663791512823863426892387
Bulk Update8569832024667051856988322466767724779953
sysbench prepare7636774039221052763682523922180639775067
sysbench run261902001193371326190712119365618468625
Single Row Deletes4730015639492400473006683963768416525219

That the encrypted binary log compresses as well as the unencrypted suggest that the compression is done before the encryption. (Encrypted data does not compress well.) Because there is no difference whether encryption is enabled, only the normal (uncompressed), compressed, and zstd results will be discussed further. The binary log sizes can also be seen in this figure:

Binary log sizes for different compressions schemes.
Binary log sizes for different compressions schemes.

Unsurprising the bulk loads and update perform the best with compressed binary log being 29% to 51% of the uncompressed size. The sysbench OLTP workload also compresses well ending up being 46% of the size. Also unsurprising, the single row deletes do not compress nearly as well with the compressed binary log being 83% of the size of the uncompressed binary log.

When comparing the binary logs compressed by MySQL with those compressed manually with zstd, the file size is around the same for the bulk loads reflecting that for large transactions, compressing on a per-transaction basis works as well as compressing the whole file. As the transaction size becomes smaller, the relative efficiency of the per-transaction compression reduces which is particularly visible for the single row deletes.

The other factor to consider is the compression level.

Examples – Compression Level

There are some oddities when it comes to the compression level, and the short story is that there is no need to change the setting.

The first oddity is that the allowed values are 1-22 but zstd only supports levels 1-19. There is nothing in the MySQL documentation explaining the difference.

The second oddity is that there is effectively no change in the size of the compressed binary logs by changing the value of binlog_transaction_compression_level_zstd as can be seen from the table:


For comparison, the compression at levels 1, 3, 11, and 19 are included compressing the binary log manually with zstd. “Load” is for loading the employees database and OLTP is the sysbench oltp_read_write benchmark. The data can also be seen in this figure:

The binary log size as a function of the compression level.
The binary log size as a function of the compression level.

As it can be seen, there is essentially no difference in the file size irrespective of the compression level used in MySQL whereas for zstd the file size reduces as expected with increased compression level. For level 1 with the load test MySQL even compresses significantly better than zstd. It is like the compression level is never being set in MySQL.


One possible explanation is that Zstandard supports training the algorithm for a given type of data (creating a dictionary). This particularly helps improving compression of small data. I do not know whether MySQL uses a dictionary and if so whether that makes all compression levels work roughly equal.


The new binary log transaction compression works great and can be a great way to reduce the amount of I/O, disk usage, and network usage. Should you enable it?

Most likely you will benefit from enabling binary log compression unless you are very strapped for CPU. From these tests, it is likely the disk space occupied by binary logs can be roughly cut in half, but as always it is workload dependent and you should test with your workload.


You can also enable compression at the protocol level for the transmission of the binary logs. There is no reason to both enable binary log transaction compression and the protocol compression.

On the other hand, there is not reason at present to change the compression level.

Apress Blog: MySQL Performance Tuning Best Practices

To celebrate the publishing of my new book MySQL 8 Query Performance Tuning, the Apress team invited me (thanks Jonathan and Liz) to write a post for the Apress blog. I decided to write about my top six best practices:

  • Be wary of best practices
  • Monitor
  • Work methodically
  • Consider the full stack
  • Make small, incremental changes
  • Understand the change

Yes, my first best practice is to be wary of best practices. Read why I added that and the other best practices at MySQL Performance Tuning Best Practices.

MySQL 8.0.20: Index-Level Optimizer Hints

MySQL introduced optimizer hints in version 5.7 and greatly extended the feature in MySQL 8. One thing that has been missing though is the ability to specify index hints using the syntax of optimizer hints. This has been improved of in MySQL 8.0.20 with the introduction of index-level optimizer hints for the FORCE and IGNORE versions of the index hints. This blog will look at the new index hint syntax.

Example of using index-level optimizer hints.


Do not add index hints – neither using the old or new style – unless you really need them. When you add index hints, you limit the options of the optimizer which can prevent the optimizer obtaining the optimal query plan as new optimizer improvements are implemented or the data changes.

On the other hand, if you really have a query where ANALYZE TABLE and increasing the number of pages analyzed in the random index dives do not help you, index hints can be very useful to ensure optimal performance.

The Short Story

To make a long story short, consider this query in 8.0.19 and earlier:

SELECT ci.CountryCode, co.Name AS Country, ci.Name AS City, ci.District
  FROM co IGNORE INDEX (Primary)
                  ON ci.CountryCode = co.Code
 WHERE co.Continent = 'Asia'
 ORDER BY ci.CountryCode, ci.ID;

This query has two index hints, IGNORE INDEX in the second line and USE INDEX FOR ORDER BY in the third line.

In MySQL 8.0.20, you can write the query as:

       ci.CountryCode, co.Name AS Country, ci.Name AS City, ci.District
  FROM co
       INNER JOIN ci
                  ON ci.CountryCode = co.Code
 WHERE co.Continent = 'Asia'
 ORDER BY ci.CountryCode, ci.ID;


However, note that there seems to be a bug, so the ORDER_INDEX() hint makes the optimizer choose a plan like NO_JOIN_INDEX() for the same index is also specified. The workaround is to also add the JOIN_INDEX().

Let's take a look which index-level optimizer hints that have been added and how they map to the old index hints.

The Details

There are four pairs of new index hints which all maps back to the old style hints as in the below table.

New HintOld Hint

The new hints support all of the usual features of optimizer hints such as specifying the query block for a hint, adding them inline in subqueries, etc. For the full details, see the manual and the release notes (which has an extensive description of the new hints).

If you need to specify multiple index hints, there are two ways to accomplish it depending on whether the indexes are on the same table or not. Consider a query on the table where you will not allow neither the primary key nor the CountryCode index to be used. You can accomplish that as in this example:

SELECT /*+ NO_INDEX(ci PRIMARY, CountryCode) */
       ID, CountryCode, Name, District, Population
  FROM ci
 WHERE Population > 1000000;

On the other hand, if the indexes are on different tables, then you will have to specify multiple hints. Let's say you want to force the optimizer to choose the hash join algorithm when joining the country and city tables in the world database by ignoring the primary key on the country table and the CountryCode index on the city table (effectively forcing the join not to use an index irrespective of the join order). In this case, you can use the NO_INDEX() hint twice, once on each table:

SELECT /*+ NO_INDEX(co PRIMARY) NO_INDEX(ci CountryCode) */
       ci.CountryCode, co.Name AS Country, ci.Name AS City, ci.District
  FROM co
       INNER JOIN ci ON ci.CountryCode = co.Code
 WHERE co.Continent = 'Asia';

How About USE INDEX?

The old index hint syntax also includes the USE INDEX variant, but none of the new index-level optimizer hints corresponds to it. So, what do you do if you want to specify the softer USE INDEX rather than FORCE INDEX? First of all, in that case there is a good chance, you do not need the index hint at all, so try to remove it and verify whether the optimizer uses the same query plan. If so, it is better to remove it, so the optimizer can use the optimal join strategy as new optimizer features become available or the data changes.

If you really need the USE INDEX hint, the simplest is to continue to use the old syntax. However, be aware that you cannot mix the old and new syntax hints (in that case, the old hints are ignored).

Alternative, you can simulate USE INDEX by specifying all other applicable indexes in a NO_INDEX() optimizer hint. For example, consider the world.countrylanguage table:

mysql> SHOW CREATE TABLE countrylanguage\G
*************************** 1. row ***************************
       Table: countrylanguage
Create Table: CREATE TABLE `countrylanguage` (
  `CountryCode` char(3) NOT NULL DEFAULT '',
  `Language` char(30) NOT NULL DEFAULT '',
  `IsOfficial` enum('T','F') NOT NULL DEFAULT 'F',
  `Percentage` float(4,1) NOT NULL DEFAULT '0.0',
  PRIMARY KEY (`CountryCode`,`Language`),
  KEY `CountryCode` (`CountryCode`),
  CONSTRAINT `countryLanguage_ibfk_1` FOREIGN KEY (`CountryCode`) REFERENCES `country` (`Code`)
1 row in set (0.0008 sec)

If you want to find all languages spoken in Australia and for some reason want to either use the CountryCode index or no index at all, then you can tell the optimizer to ignore the primary key:

       CountryCode, Language, IsOfficial, Percentage
  FROM world.countrylanguage cl
 WHERE CountryCOde = 'AUS';

That is equivalent of the old syntax hints with USE INDEX (CountryCode):

SELECT CountryCode, Language, IsOfficial, Percentage
  FROM world.countrylanguage cl USE INDEX (CountryCode)
 WHERE CountryCOde = 'AUS';

While this works the same and is simple enough with just one alternative index, it is not so easy in the general case. To be sure your NO_INDEX() optimizer hint is the same as the USE INDEX index hint, you will need to fetch the list of indexes and add them to the NO_INDEX() hint each time you execute the query. That is the reason for the suggestion to keep using the old style index hint if you rely on USE INDEX.

New Book: MySQL 8 Query Performance Tuning

I have over the last few years been fortunate to have two books published through Apress, Pro MySQL NDB Cluster which I wrote together with Mikiya Okuno and MySQL Connector/Python Revealed. With the release of MySQL 8 around a year ago, I started to think of how many changes there has been in the last few MySQL versions. Since MySQL 5.6 was released as GA in early 2013, some of the major features related to performance tuning includes the Performance Schema which was greatly changed in 5.6, histograms, EXPLAIN ANALYZE, hash joins, and visual explain. Some of these are even unique in MySQL 8.

So, I was thinking that it could be interesting to write a book that focuses on performance tuning in MySQL 8. In order to try to limit the scope somewhat (which as you can see from the page count that I was not too successful with), I decided to mainly look at the topics related to query performance. I proposed this to my acquisition editor Jonathan Gennick, and he was very interested. Oracle whom I worked for at the time was also interested (thanks Adam Dixon, Victoria Reznichenko, Edwin DeSouza, and Rich Mason for supporting me and approving the project). Also thanks to the Apress editors and staff who has been involved including but not limited to Jonathan Gennick, Jill Balzano, Laura Berendson, and Arockia Rajan Dhurai.

Now around a year later, the final result is ready: MySQL 8 Query Performance Tuning. If you are interested, you can read more about the content and/or buy it at Apress, Amazon, and others book shops:

ShopPaper BookE-Book
ApressSoftcoverDRM free ePub/PDF
Amazon USSoftcoverMobi (Kindle)

Book Structure

The book is divided into six parts with a total of 27 chapters. I have attempted to keep each chapter relatively self-contained with the aim that you can use the book as a reference book. The drawback of this choice is that there is some duplication of information from time to time. An example is Chapter 18 which describes the more theoretical side of locks and how to monitor locks, and Chapter 22 which provides practical examples of investigating lock contention. Chapter 22 naturally draws on the information in Chapter 18, so some of the information is repeated. This was a deliberate choice, and I hope it helps you reduce the amount of page flipping to find the information you need.

The six parts progressively move you through the topics starting with some basic background and finishing with more solution-oriented tasks. The first part starts out discussing the methodology, benchmarks, and test data. The second part focuses on the sources of information such as the Performance Schema. The third part covers the tools such as MySQL Shell used in this book. The fourth part provides the theoretical background used in the last two parts. The fifth part focuses on analyzing queries, transactions, and locks. Finally, the sixth part discusses how to improve performance through the configuration, query optimization, replication, and caching. There are cases where some content is a little out of place, like all replication information is contained in a single chapter.


Part I: Getting Started

Part I introduces you to the concepts of MySQL query performance tuning. This includes some high-level considerations, of which some are not unique to MySQL (but are of course discussed in the context of MySQL). The four chapters are

  • Chapter 1 – MySQL Performance Tuning
    This introductory chapter covers some high-level concepts of MySQL performance tuning such as the importance of considering the whole stack and the lifecycle of a query.
  • Chapter 2 – Query Tuning Methodology
    It is important to work in an effective way to solve performance problems. This chapter introduces a methodology to work effectively and emphasizes the importance of working proactively rather than doing firefighting.
  • Chapter 3 – Benchmarking with Sysbench
    It is often necessary to use benchmarks to determine the effect of a change. This chapter introduces benchmarking in general and specifically discusses the Sysbench tool including how to create your own custom benchmarks.
  • Chapter 4 – Test Data
    The book mostly uses a few standard test databases which are introduced in this chapter.

Part II: Sources of Information

MySQL exposes information about the performance through a few sources. The Performance Schema, the sys schema, the Information Schema, and the SHOW statement are introduced in each their chapter. There are only relatively few examples of using these sources in this part; however, these four sources of information are used extensively in the remainder of the book. If you are not already familiar with them, you are strongly encouraged to read this part. Additionally, the slow query log is covered. The five chapters are

  • Chapter 5 – The Performance Schema
    The main source of performance related information in MySQL is – as the name suggests – the Performance Schema. This chapter introduces the terminology, the main concepts, the organization, and the configuration.
  • Chapter 6 – The sys Schema
    The sys schema provides reports through predefined views and utilities in stored functions and programs. This chapter provides an overview of what features are available.
  • Chapter 7 – The Information Schema
    If you need metadata about the MySQL and the databases, the Information Schema is the place to look. It also includes important information for performance tuning such as information about indexes, index statistics, and histograms. This chapter provides an overview of the views available in the Information Schema.
  • Chapter 8 – SHOW Statements
    The SHOW statements are the oldest way to obtain information ranging from which queries are executing to schema information. This chapter relates the SHOW statements to the Information Schema and Performance Schema and covers in somewhat more detail the SHOW statements without counterparts in the two schemas.
  • Chapter 9 – The Slow Query Log
    The traditional way to find slow queries is to log them to the slow query log. This chapter covers how to configure the slow query log, how to read the log events, and how to aggregate the events with the mysqldump utility.

Part III: Tools

MySQL provides several tools that are useful when performing the daily work as well as specialized tasks. This part covers three tools ranging from monitoring to simple query execution. This book uses Oracle’s dedicated MySQL monitoring solution (requires commercial subscription but is also available as a trial) as an example of monitoring. Even if you are using other monitoring solutions, you are encouraged to study the examples as there will be a large overlap. These three tools are also used extensively in the remainder of the book. The three chapters in this part are

  • Chapter 10 – MySQL Enterprise Monitor
    Monitoring is one of the most important aspects of maintaining a stable and well-performing database. This chapter introduces MySQL Enterprise Monitor (MEM) and shows how you can install the trial and helps you navigate and use the graphical user interface (GUI).
  • Chapter 11 – MySQL Workbench
    MySQL provides a graphical user interface through the MySQL Workbench product. This chapter shows how you can install and use it. In this book, MySQL Workbench is particularly important for its ability to create diagrams – known as Visual Explain – representing the query execution plans.
  • Chapter 12 – MySQL Shell
    One of the newest tools around from Oracle for MySQL is MySQL Shell which is a second-generation commandline client with support for executing code in both SQL, Python, and JavaScript. This chapter gets you up to speed with MySQL Shell and teaches you about its support for using external code modules, its reporting infrastructure, and how to create custom modules, reports, and plugins.

Part IV: Schema Considerations and the Query Optimizer

In Part IV, there is a change of pace, and the focus moves to the topics more directly related to performance tuning starting with topics related to the schema, the query optimizer, and locks. The six chapters are

  • Chapter 13 – Data Types
    In relational databases, each column has a data type. This data type defines which values can be stored, which rules apply when comparing two values, how the data is stored, and more. This chapter covers the data types available in MySQL and gives guidance on how to decide which data types to use.
  • Chapter 14 – Indexes
    An index is used to locate data, and a good indexing strategy can greatly improve the performance of your queries. This chapter covers the index concepts, considerations about indexes, index types, index features, and more. It also includes a discussion on how InnoDB uses indexes and how to come up with an indexing strategy.
  • Chapter 15 – Index Statistics
    When the optimizer needs to determine how useful an index is and how many rows match a condition on an indexed value, it needs information on the data in the index. This information is index statistics. This chapter covers how index statistics work in MySQL, how to configure them, monitoring, and updating the index statistics.
  • Chapter 16 – Histograms
    If you want the optimizer to know how frequent a value occurs for a given column, you need to create a histogram. This is a new feature in MySQL 8, and this chapter covers how histograms can be used, their internals, and how to query the histogram metadata and statistics.
  • Chapter 17 – The Query Optimizer
    When you execute a query, it is the query optimizer that determines how to execute it. This chapter covers the tasks performed by the optimizer, join algorithms, join optimizations, configuration of the optimizer, and resource groups.
  • Chapter 18 – Locking Theory and Monitoring
    One of the problems that can cause the most frustration is lock contention. The first part of this chapter explains why locks are needed, lock access levels, and lock types (granularities). The second part of the chapter goes into what happens when a lock cannot be obtained, how to reduce lock contention, and where to find information about locks.

Part V: Query Analysis

With the information from Part IV, you are now ready to start analyzing queries. This includes finding the queries for further analysis and then analyzing the query using EXPLAIN or the Performance Schema. You also need to consider how transactions work and investigate lock contention when you have two or more queries fighting for the same locks. The four chapters are

  • Chapter 19 – Finding Candidate Queries for Optimization
    Whether part of the daily maintenance or during an emergency, you need to find the queries that you need to analyze and potentially optimize. This chapter shows how you can use the Performance Schema, the sys schema, MySQL Workbench, your monitoring solution, and the slow query log to find the queries that are worth looking into.
  • Chapter 20 – Analyzing Queries
    Once you have a candidate query, you need to analyze why it is slow or impacts the system too much. The main tool is the EXPLAIN statement which provides information about the query plan chosen by the optimizer. How to generate and read – including examples – the query plans using EXPLAIN is the main focus of the chapter. You can also use the optimizer trace to get more information on how the optimizer arrived at the selected query plan. An alternative way to analyze queries is to use the Performance Schema and sys schema to break queries down into smaller parts.
  • Chapter 21 – Transactions
    InnoDB executes everything as a transaction, and transactions is an important concept. Proper use of transactions ensures atomicity, consistency, and isolation. However, transactions can also be the cause of severe performance and lock problems. This chapter discusses how transactions can become a problem and how to analyze them.
  • Chapter 22 – Diagnosing Lock Contention
    This chapter goes through four scenarios with lock contention (flush locks, metadata locks, record-level locks, and deadlocks) and discusses the symptoms, the cause, how to set up the scenario, the investigation, the solution, and how to prevent problems.

Part VI: Improving Queries

You have found your problem queries and analyzed them and their transaction to understand why they are underperforming. But how do you improve the queries? This chapter goes through the most important configuration options not covered elsewhere, how to change the query plan, schema changes and bulk loading, replication, and caching as means to improve the performance. The five chapters are

  • Chapter 23 – Configuration
    MySQL requires resources when executing a query. This chapter covers the best practices for configuring these resources and the most important configuration options that are not covered in other discussions. There is also an overview of the data lifecycle in InnoDB as background for the discussion of configuring InnoDB.
  • Chapter 24 – Change the Query Plan
    While the optimizer usually does a good job at finding the optimal query execution plan, you will from time to time have to help it on its way. It may be that you end up with full table scans because no indexes exist or the existing indexes cannot be used. You may also wish to improve the index usage, or you may need to rewrite complex conditions or entire queries. This chapter covers these scenarios as well as shows how you can use the SKIP LOCKED clause to implement a queue system.
  • Chapter 25 – DDL and Bulk Data Load
    When you perform schema changes or load large data sets into the system, you ask MySQL to perform a large amount of work. This chapter discusses how you can improve the performance of such tasks including using the parallel data load feature of MySQL Shell. There is also a section on general data load considerations which also applies to data modifications in general and shows the difference between sequential and random order inserts. That discussion is followed by considerations on what this means for the choice of primary key.
  • Chapter 26 – Replication
    The ability to replicate between instances is a popular feature in MySQL. From a performance point of view, replication has two sides: you need to ensure replication performs well, and you can use replication to improve performance. This chapter discusses both sides of the coin including covering the Performance Schema tables that can be used to monitor replication.
  • Chapter 27 – Caching
    One way to improve the performance of queries is to not execute them at all, or at least avoid executing part of the query. This chapter discusses how you can use caching tables to reduce the complexity of queries and how you can use Memcached, the MySQL InnoDB Memcached plugin, and ProxySQL to avoid executing the queries altogether.

I hope you will enjoy the book.

Connector/Python C Extension Prepared Statement Support

MySQL Connector/Python 8 made the C Extension the default for the platform/Python version combinations supporting it. One thing that was missing from the C Extension implementation (unless you used the _mysql_connector module) was support for prepared statements. That has been taken care of with the release of version 8.0.17.

The two main advantages of using prepared statements are security and performance. The security comes in as you can pass query parameters and have them applied server-side, so you are sure they are quoted and escaped correctly taking the data type into consideration. The performance benefit happens, when you execute the same query (except for the parameters) several times as MySQL will prepare it only for the first execution and then reuse the prepared statement – that is where the name comes from.

You use the prepared statements with the C Extension in the same way as for the pure Python implementation – by setting the prepared argument to True when creating a cursor. The simplest way to explain is to show an example.

import mysql.connector

connect_args = {
    "user": "root",
    "host": "localhost",
    "port": 3306,
    "password": "password",
    "use_pure": False,

db = mysql.connector.connect(**connect_args)
cursor = db.cursor(prepared=True)

sql = "SELECT * FROM WHERE ID = %s"
city_ids = [130, 456, 3805]

print("  ID  Name            Country  District         Popluation")
print("-" * 58)
fmt = "{0:4d}  {1:14s}  {2:^7s}  {3:15s}  {4:10d}"
for city_id in city_ids:
    cursor.execute(sql, (city_id,))
    city = cursor.fetchone()


In the connection arguments, use_pure is set to False. Since that is the default, it is not needed, but it has been added here to make it explicit that the C Extension is used.


Do not hardcode the connection arguments in your programs. It is done here to keep the example simple, but it is both insecure and inflexible to do in real programs.

When the cursor is created in line 12, prepared is set to True making it a prepared statement cursor. To verify that, the cursor is printed in the next line.

You create the statement by adding the string %s as a placeholder where you want to add the parameters to the query. You can then keep executing the query. In the example, the query is executed for three different IDs. (Yes, for this example, all three cities could have been fetched in one query, but imagine this query is used as part of a larger application where the three cities are not required at the same time. This could for example be for three independent user requests.) The parameter is provided as a tuple to the execute() method of the cursor. The output of the program is:

CMySQLCursorPrepared: (Nothing executed yet)

  ID  Name            Country  District         Popluation
 130  Sydney            AUS    New South Wales     3276207
 456  London            GBR    England             7285000
3805  San Francisco     USA    California           776733

Notice that the cursor uses the class CMySQLCursorPrepared, which is the prepared statement cursor class for the C Extension.

MySQL Connector/Python Revealed


If you want to learn more about MySQL Connector/Python, then I have written MySQL Connector/Python Revealed published by Apress. The book both covers the traditional Python Database API (PEP 249) and the X DevAPI which is new as of MySQL 8.

The book is available from Apress (print and DRM free ePub+PDF), Amazon (print and Kindle), Barnes & Noble (print), and others.

Have fun coding.

More Statistics for Slow Queries: log_slow_extra

The slow query log is the trusted old method of recording slow query, so the database administrator can determine which queries are in the most need for optimization. Since MySQL 5.6, it has to some extend been overshadowed by the Performance Schema which has lower overhead and thus allows collecting statistics about all queries. The slow query log has one major advantage though: the data is persisted. In MySQL 8.0.14 which was recently released, there is an improvement for the slow query log: additional statistics about the recorded queries.

The slow query log with log_slow_extra enabled.
The slow query log with log_slow_extra enabled.


Thanks for Facebook for contributing a patch for the new feature.

The additional information is not recorded by default. To enable the feature, enable the log_slow_extra option:

mysql> SET PERSIST log_slow_extra = ON;
Query OK, 0 rows affected (0.05 sec)

Here, SET PERSIST is used, so the configuration change is persisted when MySQL is restarted. If you just want to try the feature, you can use SET GLOBAL, then decide later whether you want to keep it enabled.

That is all that is required. You can now execute a “slow” query and take a look at the recorded data. An easy way to execute a slow query is to execute DO SLEEP(...) where you can replace ... with the number of seconds you want to sleep. However that is not a very interesting query. Another option is to lower long_query_time to ensure your query is captured. If you set long_query_time to zero, all queries are recorded:

mysql> SET SESSION long_query_time = 0.0;
Query OK, 0 rows affected (0.01 sec)

Here SET SESSION is used so only the current connection is affected.

Be Aware

The slow query log does have overhead. It is for most systems not recommended to set long_query_time to zero for all connections all the time. Doing so can cause performance problems.

Finally, execute a query:

mysql> SELECT * FROM ORDER BY Population DESC LIMIT 10;
| ID   | Name             | CountryCode | District         | Population |
| 1024 | Mumbai (Bombay)  | IND         | Maharashtra      |   10500000 |
| 2331 | Seoul            | KOR         | Seoul            |    9981619 |
|  206 | São Paulo        | BRA         | São Paulo        |    9968485 |
| 1890 | Shanghai         | CHN         | Shanghai         |    9696300 |
|  939 | Jakarta          | IDN         | Jakarta Raya     |    9604900 |
| 2822 | Karachi          | PAK         | Sindh            |    9269265 |
| 3357 | Istanbul         | TUR         | Istanbul         |    8787958 |
| 2515 | Ciudad de México | MEX         | Distrito Federal |    8591309 |
| 3580 | Moscow           | RUS         | Moscow (City)    |    8389200 |
| 3793 | New York         | USA         | New York         |    8008278 |
10 rows in set (0.04 sec)

The resulting slow query log record is:

# Time: 2019-01-31T07:24:20.518505Z
# User@Host: root[root] @ localhost [::1]  Id:    15
# Query_time: 0.001827  Lock_time: 0.000087 Rows_sent: 10  Rows_examined: 4089 Thread_id: 15 Errno: 0 Killed: 0 Bytes_received: 0 Bytes_sent: 694 Read_first: 1 Read_last: 0 Read_key: 1 Read_next: 0 Read_prev: 0 Read_rnd: 0 Read_rnd_next: 4080 Sort_merge_passes: 0 Sort_range_count: 0 Sort_rows: 10 Sort_scan_count: 1 Created_tmp_disk_tables: 0 Created_tmp_tables: 0 Start: 2019-01-31T07:24:20.516678Z End: 2019-01-31T07:24:20.518505Z
SET timestamp=1548919460;

The extra data can be a little hard to see due to the formatting of the blog – the extra fields are:

  • Thread_id: 15
  • Errno: 0
  • Killed: 0
  • Bytes_received: 0
  • Bytes_sent: 694
  • Read_first: 1
  • Read_last: 0
  • Read_key: 1
  • Read_next: 0
  • Read_prev: 0
  • Read_rnd: 0
  • Read_rnd_next: 4080
  • Sort_merge_passes: 0
  • Sort_range_count: 0
  • Sort_rows: 10
  • Sort_scan_count: 1
  • Created_tmp_disk_tables: 0
  • Created_tmp_tables: 0
  • Start: 2019-01-31T07:24:20.516678Z
  • End: 2019-01-31T07:24:20.518505Z

As comparison, here is the information for the same query with log_slow_extra = OFF:

# Time: 2019-01-31T07:24:06.100447Z
# User@Host: root[root] @ localhost [::1]  Id:    15
# Query_time: 0.002286  Lock_time: 0.000133 Rows_sent: 10  Rows_examined: 4089
SET timestamp=1548919446;


Install MySQL Enteprise Monitor (MEM) 3.0 Using Limited Resources

MySQL Enterprise Monitor (MEM) is the monitoring solution offered as part of MySQL Enterprise Edition and MySQL Cluster Carrier Grade Edition. In this blog, I will not go into details of the features of MEM, but rather focus on the case where you want to install MEM 3.0 to try it out on your personal desktop or laptop.

A trial version (the full product, but the can only be used for 30 days) is available from Oracle's Software Delivery Cloud. If you are a MySQL customer, it is recommended that you download MEM from My Oracle Support (MOS).

Once you have downloaded and extracted the installation binaries, you can start the installation. You have the choice between using a GUI, text based, and unattended install. Here I will use the GUI install, but if you want to try one of the other install options, launch the installer with the –help option to get more information.

MEM 3.0 consists of two parts:

  • The Service Manager
  • Agents

Only the Service Manager is required, so that will be the first thing to install. As this is intended to show how you can test MEM, I will use a normal user rather than root for the installation.

It is also recommended to take a look at the MEM 3.0 Reference Manual.

Installation Wizard

When launching the installer the first screen asks which language to use – you have the choice of English and Japanese:

Installing the MEM 3.0 Service Manager - Step 1: Choose language
Step 1: Choose language

Next is an information screen that you should ensure you keep track of the usernames and passwords entered during the installation process:

Installing the MEM 3.0 Service Manager - Step 2: Remember the usernames and passwords entered during the installation process
Step 2: Remember the usernames and passwords entered during the installation process

The actual install process is now ready to start:

Installing the MEM 3.0 Service Manager - Step 3: Ready to start the actual install process
Step 3: Ready to start the actual install process

The following steps are to configure the installation – the first of which is to decide where to install the Service Manager:

Installing the MEM 3.0 Service Manager - Step 4: Choose the installation directory
Step 4: Choose the installation directory

The Service Manager will need three ports:

  • Tomcat Server Port: For the web UI when using non-SSL connections
  • Tomcat Shutdown Port: Used internally to shutdown the web server
  • Tomcat SSL Port: For the web UI when using SSL connections
Installing the MEM 3.0 Service Manager - Step 5: Choose the port numbers
Step 5: Choose the port numbers

The Service Manager uses a MySQL database to store the data collected. The next screen allows you to choose between using the MySQL database bundled with MEM or an existing one. Using the bundled instance has the advantage that MEM will configure it automatically and upgrades can be easier, however it will mean running two MySQL instances if you already have MySQL installed. For a test instance using the bundled instance also has the advantage that it's easy to uninstall the whole installation again, so we will use the bundled instance in this example:

Installing the MEM 3.0 Service Manager - Step 6: Choose whether to use the bundled MySQL database or an existing one
Step 6: Choose whether to use the bundled MySQL database or an existing one

The final of the pre-installation configuration is to choose the username and password to use for the connection to the MySQL database. This is the username and password that you were reminded of earlier to make sure you remember:

Installing the MEM 3.0 Service Manager - Step 7: Specify username and password for the Service Manager to connect to the MySQL database storing the data collected through the monitoring
Step 7: Specify username and password for the Service Manager to connect to the MySQL database storing the data collected through the monitoring

Next a note that because we have chosen to install the Service Manager as a non-root user, it will not be started automatically when the server is restarted:

Installing the MEM 3.0 Service Manager - Step 8: Info that MEM will not start automatically when not installed as root
Step 8: Info that MEM will not start automatically when not installed as root

Ready to Install:

Installing the MEM 3.0 Service Manager - Step 9: Configuration completed
Step 9: Configuration completed – ready to install

The Service Manager is now being installed – this will take a little while as it both includes copying all the files in place as well as configuring and starting the web server and the bundled MySQL database:

Installing the MEM 3.0 Service Manager - Step 10: Installation is in progress
Step 10: Installation is in progress

MEM includes an uninstall script if you need to remove the Service Manager again:

Installing the MEM 3.0 Service Manager - Step 11: Files are now installed
Step 11: Files are now installed

To improve security MEM 3.0 by default uses SSL. The installation process adds a self-signed certificate, but you can choose to install your own certificate later:

Installing the MEM 3.0 Service Manager - Step 12: Information that MEM is using SSL with a self-signed certificate by default
Step 12: Information that MEM is using SSL with a self-signed certificate by default

The installer is now done and it is time to launch the UI:

Installing the MEM 3.0 Service Manager - Step 13: Ready to launch the Service Manager
Step 13: Ready to launch the Service Manager

Unattended Install

If you go through the installation process often, it is more convenient to use the unattended installation as it allows you to automate the installation. To perform the same installation as above using the unattended installation you can execute:

shell$ ./mysqlmonitor- \
            --mode unattended --unattendedmodeui none \
            --installdir /home/jesper/mysql/enterprise/monitor \
            --adminuser service_manager --adminpassword mypassword \
            --dbport 13306 --mysql_installation_type bundled \
            --tomcatport 18080 --tomcatshutdownport 18005 \
            --tomcatsslport 18443

As several of the values are the default ones, you can skip some of the options, but they are included here for completeness.

When the above command completes, continue with the post-installation configuration as you would have done using the installation wizard.

Post-Installation Configuration

Once the Service Manager UI has opened in your browser there is a little post-installation configuration to take care of:

  • You need to create an admin user for the web UI
  • You need to create a user agents can use if you choose to install any agents
  • You can choose whether to have the Service Manager automatically check for updates and how for how long time to keep historical data
Installing the MEM 3.0 Service Manager - Step 14: Post installation configuration
Step 14: Post installation configuration

The last step if to choose your timezone and which locale to use in the user interface:

Installing the MEM 3.0 Service Manager - Step 15: Choose timezone and the locale
Step 15: Choose timezone and the locale

After this you are ready to play around with the monitor. If you choose to install agents, the steps are similar.

Reducing Memory Footprint

One thing to be aware of though is that the default configuration used by the Service Manager is aimed at a small to medium production installation. If you for example installed MEM in a virtual machine or computer with limited memory available, the default configuration may not be optimal.

With the installation above just completed, the memory usage of the web server (Tomcat) and the bundled MySQL database is:

15653 jesper    20   0 3794m 924m  13m S  1.0 23.4   4:46.01 java
15586 jesper    20   0 3425m 608m 9588 S  2.6 15.4   2:23.88 mysqld

So around 1.5G resident memory. That is without adding any additional agents and/or MySQL instances to monitor.

So when I use MEM for testing, the first thing I do is to change a few configuration values to reduce the memory usage. The configuration options I change are located in two files (paths are given relative to the installation directory):

  • mysql/my.cnf – options related to the bundled MySQL database
  • apache-tomcat/bin/ – options related to the web server


For the MySQL database I change two setting:

  • innodb_buffer_pool_size – this is by default 768M, but if you only monitor a couple of MySQL instances, something like 100M is enough
  • table_definition_cache – this is by default 2048. One side effect of the value being so large is that the auto-sizing of the Performance Schema considers the installation as a large instance, so the Performance Schema tables are made large. Reducing the value to 800 is enough for testing and will reduce the memory usage of the Performance Schema with several hundred megabytes.

So my mysql/my.cnf looks something like this after the changes:

# Main Caches
table_definition_cache = 800

# InnoDB
innodb_buffer_pool_size = 100M


It is only necessary to edit one line in the Tomcat configuration file – the default settings are:

JAVA_OPTS="-Xmx768M -Xms768M -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/home/jesper/mysql/enterprise/monitor/apache-tomcat/temp -XX:+UseParallelOldGC -XX:MaxPermSize=512M"

I change that to:

JAVA_OPTS="-Xmx256M -Xms256M -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/home/jesper/mysql/enterprise/monitor/apache-tomcat/temp -XX:+UseParallelOldGC -XX:MaxPermSize=200M"

I.e. the three setting I have changed are:

  • -Xmx (maximum  heap size) from 768M to 256M
  • -Xms (minimum heap size) from 768M to 256M
  • -XX:MaxPermSize from 512M to 200M

Enabling the New Configurations

It requires restarting the Service Manager to make the new configurations take effect. You can do the restart by going into the installation directory and execute:

monitor$ ./ restart
Using CATALINA_BASE:   /home/jesper/mysql/enterprise/monitor/apache-tomcat
Using CATALINA_HOME:   /home/jesper/mysql/enterprise/monitor/apache-tomcat
Using CATALINA_TMPDIR: /home/jesper/mysql/enterprise/monitor/apache-tomcat/temp
Using JRE_HOME:        /home/jesper/mysql/enterprise/monitor/java
Using CLASSPATH:       /home/jesper/mysql/enterprise/monitor/apache-tomcat/bin/bootstrap.jar:/home/jesper/mysql/enterprise/monitor/apache-tomcat/bin/tomcat-juli.jar
Using CATALINA_PID:    /home/jesper/mysql/enterprise/monitor/apache-tomcat/temp/
Stopping tomcat service . [ OK ]
Stopping mysql service . [ OK ]
./ : mysql  started
131128 13:58:59 mysqld_safe Logging to '/home/jesper/mysql/enterprise/monitor/mysql/runtime/mysqld.log'.
131128 13:58:59 mysqld_safe Starting mysqld daemon with databases from /home/jesper/mysql/enterprise/monitor/mysql/data/
Using CATALINA_BASE:   /home/jesper/mysql/enterprise/monitor/apache-tomcat
Using CATALINA_HOME:   /home/jesper/mysql/enterprise/monitor/apache-tomcat
Using CATALINA_TMPDIR: /home/jesper/mysql/enterprise/monitor/apache-tomcat/temp
Using JRE_HOME:        /home/jesper/mysql/enterprise/monitor/java
Using CLASSPATH:       /home/jesper/mysql/enterprise/monitor/apache-tomcat/bin/bootstrap.jar:/home/jesper/mysql/enterprise/monitor/apache-tomcat/bin/tomcat-juli.jar
Using CATALINA_PID:    /home/jesper/mysql/enterprise/monitor/apache-tomcat/temp/

The memory usage is now somewhat smaller:

19521 jesper    20   0 2847m 544m  15m S  2.0 13.8   1:51.61 java
19484 jesper    20   0 2407m 160m 9296 S  0.3  4.1   0:09.21 mysqld