Automatic update of InnoDB Persistent Statistics Never Triggers

Since MySQL 5.6 InnoDB has supported persistent index statistics. This means that when you restart MySQL, InnoDB does not have to recalculate the statistics for the tables, rather it can read the statistics from its persistent storage. This has several advantages over the transient statistics, but as it turns out, there is also a catch: MySQL may under some circumstances never get around to update the index statistics. This particularly affects instances that are restarted frequently and tables with a large number of rows.

Information

Persistent statistics are the default, so unless you have explicitly chosen not to use them, then you are.

Background

Before I dive into why there are scenarios where persistent statistics are never updated, it is necessary to recap how persistent statistics work. Every time the statistics are updated, the result is stored in the mysql.innodb_index_stats table with the index statistic of the primary key doubling as table statistics in mysql.innodb_table_stats. These tables are normal InnoDB tables, so you can query them to learn what information InnoDB stores. For example using the world.city table (the world database can be downloaded from https://dev.mysql.com/doc/index-other.html):

mysql> SHOW CREATE TABLE world.city\G
*************************** 1. row ***************************
       Table: city
Create Table: CREATE TABLE `city` (
  `ID` int NOT NULL AUTO_INCREMENT,
  `Name` char(35) NOT NULL DEFAULT '',
  `CountryCode` char(3) NOT NULL DEFAULT '',
  `District` char(20) NOT NULL DEFAULT '',
  `Population` int NOT NULL DEFAULT '0',
  PRIMARY KEY (`ID`),
  KEY `CountryCode` (`CountryCode`),
  CONSTRAINT `city_ibfk_1` FOREIGN KEY (`CountryCode`) REFERENCES `country` (`Code`)
) ENGINE=InnoDB AUTO_INCREMENT=4080 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.0019 sec)

mysql> SELECT *
         FROM mysql.innodb_table_stats
        WHERE database_name = 'world'
              AND table_name = 'city';
+---------------+------------+---------------------+--------+----------------------+--------------------------+
| database_name | table_name | last_update         | n_rows | clustered_index_size | sum_of_other_index_sizes |
+---------------+------------+---------------------+--------+----------------------+--------------------------+
| world         | city       | 2021-08-22 12:03:00 |   4046 |                   25 |                        7 |
+---------------+------------+---------------------+--------+----------------------+--------------------------+
1 row in set (0.0007 sec)

mysql> SELECT *
         FROM mysql.innodb_index_stats
        WHERE database_name = 'world'
              AND table_name = 'city';
+---------------+------------+-------------+---------------------+--------------+------------+-------------+-----------------------------------+
| database_name | table_name | index_name  | last_update         | stat_name    | stat_value | sample_size | stat_description                  |
+---------------+------------+-------------+---------------------+--------------+------------+-------------+-----------------------------------+
| world         | city       | CountryCode | 2021-08-22 12:03:00 | n_diff_pfx01 |        232 |           6 | CountryCode                       |
| world         | city       | CountryCode | 2021-08-22 12:03:00 | n_diff_pfx02 |       4079 |           6 | CountryCode,ID                    |
| world         | city       | CountryCode | 2021-08-22 12:03:00 | n_leaf_pages |          6 |        NULL | Number of leaf pages in the index |
| world         | city       | CountryCode | 2021-08-22 12:03:00 | size         |          7 |        NULL | Number of pages in the index      |
| world         | city       | PRIMARY     | 2021-08-22 12:03:00 | n_diff_pfx01 |       4046 |          20 | ID                                |
| world         | city       | PRIMARY     | 2021-08-22 12:03:00 | n_leaf_pages |         24 |        NULL | Number of leaf pages in the index |
| world         | city       | PRIMARY     | 2021-08-22 12:03:00 | size         |         25 |        NULL | Number of pages in the index      |
+---------------+------------+-------------+---------------------+--------------+------------+-------------+-----------------------------------+
7 rows in set (0.0362 sec)

The exact values may differ for you as the calculation of index statistics is not exact.

Tip

Notice how the index on the CountryCode column includes the ID column. InnoDB organises the rows according to the clustered index (“index-organised table” in Oracle DB terms) using the primary key if present as the clustered index. Since the primary key is what is used to locate the rows, it is required as being part of the index, so for non-unique secondary indexes, InnoDB exposes the primary key at the end of the index, so the optimiser can take advantage of it

Important for this discussion is that InnoDB automatically updates the index statistic in the background every time 10% of the table has been modified, though only changes that actually has potential to change the statistics are counted. For example, if you with the default table definition update the population, you can update the entire table without triggering an update of the index statistics as there is no index on the Population column. To avoid that the index statistics get updated continuously on small tables, the index statistics calculation can at most be triggered every 10 seconds for a table.

The catch is that the number of rows that have been updates since the last update of the index statistics is stored in memory and not persisted when you restart MySQL. This means that if you restart MySQL often enough compared to the size of your tables, then the index statistics will never get updated. Once scenario where this is particularly likely to occur is for instances that are regularly shut down to create cold backups. In that case you need to be particularly careful as instances restored from the backups will come up with out of date index statistics which can drastically impact performance. However, even for instances that are rarely restarted, you may encounter the issue on tables with a very large number of rows; consider a table with 10 billion rows, then the automatic update of the index statistics only triggers every time 1 billion rows have been updated which may not happen all that frequently.

Danger

If you regularly shut down MySQL to create cold backups, you must update the index statistics manually before bringing an instance into rotation after restoring a backup. Otherwise that instance may have very poor index statistics causing severe performance issues. More on this below.

Let us look at an example to see how this works in practice.

MySQL 8 Query Performance Tuning

Book

The book is available from Apress (print and DRM free ePub+PDF), Amazon (print and Kindle), Barnes & Noble (print), and others.

Example

In order to illustrate how the background updates of index statistics work, let us look at an example. We will again use the world.city table as it is small enough to easily trigger the automatic updates. We will add an index to the Population column as the population is easy to update:

mysql> ALTER TABLE world.city
         ADD INDEX (Population);
Query OK, 0 rows affected (0.9746 sec)

Records: 0  Duplicates: 0  Warnings: 0

The number of rows in the city table for China and India combines to cover more than 10% of the table while each country on its own has less than 10% of the cities:

mysql> SELECT CountryCode, COUNT(*)
         FROM world.city
        WHERE CountryCode IN ('CHN', 'IND')
        GROUP BY CountryCode;
+-------------+----------+
| CountryCode | COUNT(*) |
+-------------+----------+
| CHN         |      363 |
| IND         |      341 |
+-------------+----------+
2 rows in set (0.0013 sec)

Thus we can update the population of the cities in China and India to trigger an update of the index statistics (but remember there must be a 10 second delay between triggering recalculations of the index statistics).

No Restart

First, we will try to update the population of the Chinese and Indian cities without a restart in between updating the cities of each country:

mysql> SELECT last_update, stat_name
         FROM mysql.innodb_index_stats
        WHERE database_name = 'world'
              AND table_name = 'city'
              AND index_name = 'Population';
+---------------------+--------------+
| last_update         | stat_name    |
+---------------------+--------------+
| 2021-08-22 12:49:25 | n_diff_pfx01 |
| 2021-08-22 12:49:25 | n_diff_pfx02 |
| 2021-08-22 12:49:25 | n_leaf_pages |
| 2021-08-22 12:49:25 | size         |
+---------------------+--------------+
4 rows in set (0.0010 sec)

mysql> UPDATE world.city
          SET Population = Population * 1.10
        WHERE CountryCode = 'CHN';
Query OK, 363 rows affected (0.4259 sec)

Rows matched: 363  Changed: 363  Warnings: 0

mysql> SELECT last_update, stat_name
         FROM mysql.innodb_index_stats
        WHERE database_name = 'world'
          AND table_name = 'city'
          AND index_name = 'Population';
+---------------------+--------------+
| last_update         | stat_name    |
+---------------------+--------------+
| 2021-08-22 12:49:25 | n_diff_pfx01 |
| 2021-08-22 12:49:25 | n_diff_pfx02 |
| 2021-08-22 12:49:25 | n_leaf_pages |
| 2021-08-22 12:49:25 | size         |
+---------------------+--------------+
4 rows in set (0.0007 sec)

mysql> UPDATE world.city
          SET Population = Population * 1.10
        WHERE CountryCode = 'IND';
Query OK, 341 rows affected (0.1024 sec)

Rows matched: 341  Changed: 341  Warnings: 0

mysql> SELECT last_update, stat_name
         FROM mysql.innodb_index_stats
        WHERE database_name = 'world'
              AND table_name = 'city'
              AND index_name = 'Population';
+---------------------+--------------+
| last_update         | stat_name    |
+---------------------+--------------+
| 2021-08-22 12:50:29 | n_diff_pfx01 |
| 2021-08-22 12:50:29 | n_diff_pfx02 |
| 2021-08-22 12:50:29 | n_leaf_pages |
| 2021-08-22 12:50:29 | size         |
+---------------------+--------------+
4 rows in set (0.0012 sec)

The result is the expected that InnoDB updates the index statistics after the cities of both countries have been updated.

With Restart

Now let us try the same example, but add a restart between updating the cities of China and the cities of India:

mysql> ANALYZE TABLE world.city;
+------------+---------+----------+----------+
| Table      | Op      | Msg_type | Msg_text |
+------------+---------+----------+----------+
| world.city | analyze | status   | OK       |
+------------+---------+----------+----------+
1 row in set (0.1527 sec)

mysql> SELECT last_update, stat_name
         FROM mysql.innodb_index_stats
        WHERE database_name = 'world'
              AND table_name = 'city'
              AND index_name = 'Population';
+---------------------+--------------+
| last_update         | stat_name    |
+---------------------+--------------+
| 2021-08-22 12:51:48 | n_diff_pfx01 |
| 2021-08-22 12:51:48 | n_diff_pfx02 |
| 2021-08-22 12:51:48 | n_leaf_pages |
| 2021-08-22 12:51:48 | size         |
+---------------------+--------------+
4 rows in set (0.0008 sec)

mysql> UPDATE world.city
          SET Population = Population * 1.10
        WHERE CountryCode = 'CHN';
Query OK, 363 rows affected (0.1065 sec)

Rows matched: 363  Changed: 363  Warnings: 0

mysql> RESTART;
Query OK, 0 rows affected (0.0011 sec)

The global session got disconnected..
Attempting to reconnect to 'mysqlx://myuser@localhost:33060/world'..
The global session was successfully reconnected.

mysql> SELECT last_update, stat_name
         FROM mysql.innodb_index_stats
        WHERE database_name = 'world'
              AND table_name = 'city'
              AND index_name = 'Population';
+---------------------+--------------+
| last_update         | stat_name    |
+---------------------+--------------+
| 2021-08-22 12:51:48 | n_diff_pfx01 |
| 2021-08-22 12:51:48 | n_diff_pfx02 |
| 2021-08-22 12:51:48 | n_leaf_pages |
| 2021-08-22 12:51:48 | size         |
+---------------------+--------------+
4 rows in set (0.0438 sec)

mysql> UPDATE world.city
          SET Population = Population * 1.10
        WHERE CountryCode = 'IND';
Query OK, 341 rows affected (0.1299 sec)

Rows matched: 341  Changed: 341  Warnings: 0

mysql> SELECT last_update, stat_name
         FROM mysql.innodb_index_stats
        WHERE database_name = 'world'
              AND table_name = 'city'
              AND index_name = 'Population';
+---------------------+--------------+
| last_update         | stat_name    |
+---------------------+--------------+
| 2021-08-22 12:51:48 | n_diff_pfx01 |
| 2021-08-22 12:51:48 | n_diff_pfx02 |
| 2021-08-22 12:51:48 | n_leaf_pages |
| 2021-08-22 12:51:48 | size         |
+---------------------+--------------+
4 rows in set (0.0007 sec)

This examples starts by analysing the world.city table as otherwise some of the rows updated in the previous example would have counted towards the 10% for the next calculation of the index statistics. (The trigger of the automatic updates does not happen at statement boundaries but rather as soon as the 10% has been reached.) The RESTART command (new in MySQL 8) is used to restart MySQL.

Advice

The RESTART command only works if MySQL is run under an “angel” process such as systemd, mysqld_safe, or as a Microsoft Windows service.

The important point is that this time, the update of the population of the Indian cities does not trigger an update of the index statistics.

Workaround

There are several configuration options for InnoDB persistent statistics both globally and per table, but unfortunately none that can solve this issues. Optimally there would be a way to have InnoDB automatically – as a background task – update the statistics at a regular interval on a per table basis.

The workaround is to create your own scheduled job to regularly look for tables that need their index statistics updated. You can use the last_update column in mysql.innodb_table_stats table to look for tables that are candidate for a manual update:

mysql>  SELECT database_name, table_name, n_rows, last_update
          FROM mysql.innodb_table_stats
         WHERE database_name = 'world';
+---------------+-----------------+--------+---------------------+
| database_name | table_name      | n_rows | last_update         |
+---------------+-----------------+--------+---------------------+
| world         | city            |   4046 | 2021-08-22 12:51:48 |
| world         | country         |    239 | 2021-08-22 12:49:15 |
| world         | countrylanguage |    984 | 2021-08-22 12:49:35 |
+---------------+-----------------+--------+---------------------+
3 rows in set (0.0007 sec)

Optionally, you can include the estimated number of rows to skip small tables where the automatic update will get to them as required.

You can then update the index statistics using the ANALYZE TABLE statement like:

mysql> ANALYZE TABLE world.city, world.country, world.countrylanguage;
+-----------------------+---------+----------+----------+
| Table                 | Op      | Msg_type | Msg_text |
+-----------------------+---------+----------+----------+
| world.city            | analyze | status   | OK       |
| world.country         | analyze | status   | OK       |
| world.countrylanguage | analyze | status   | OK       |
+-----------------------+---------+----------+----------+
3 rows in set (0.7794 sec)

Alternatively the mysqlcheck program can be used:

shell$ mysqlcheck --user=root --password --host=localhost --port=3306 --analyze world city country countrylanguage
Enter password: ********
world.city OK
world.country OK
world.countrylanguage OK

If you want to update all tables in a schema, you do not even need to specify the table names.

Warning

Before MySQL 8.0.24 (or 5.6.38/5.7.20 if you use Percona's binaries), it was dangerous to execute ANALYZE TABLE on a production system with “long running” queries. The reason is that ANALYZE TABLE would trigger an implicit table flush (but only after ANALYZE TABLE had returned making it hard to diagnose) meaning that until all queries using the table had completed, the implicit flush would block all new queries using the table from running.

Monitoring

If you want to determine the number of rows that has been updated and counted towards triggering the next automatic calculation of index statistics, you can query the information_schema.INNODB_TABLESTATS view (INNODB_SYS_TABLESTATS in MySQL 5.6 and 5.7):

mysql> ANALYZE TABLE world.city;
+------------+---------+----------+----------+
| Table      | Op      | Msg_type | Msg_text |
+------------+---------+----------+----------+
| world.city | analyze | status   | OK       |
+------------+---------+----------+----------+
1 row in set (0.1706 sec)

mysql> SELECT *
         FROM information_schema.INNODB_TABLESTATS
        WHERE NAME = 'world/city'\G
*************************** 1. row ***************************
         TABLE_ID: 4159
             NAME: world/city
STATS_INITIALIZED: Uninitialized
         NUM_ROWS: 0
 CLUST_INDEX_SIZE: 0
 OTHER_INDEX_SIZE: 0
 MODIFIED_COUNTER: 0
          AUTOINC: 4080
        REF_COUNT: 1
1 row in set (0.5694 sec)

mysql> UPDATE world.city
          SET Population = Population * 1.10
        WHERE CountryCode = 'CHN';
Query OK, 363 rows affected (0.7183 sec)

Rows matched: 363  Changed: 363  Warnings: 0

mysql> SELECT *
         FROM information_schema.INNODB_TABLESTATS
        WHERE NAME = 'world/city'\G
*************************** 1. row ***************************
         TABLE_ID: 4159
             NAME: world/city
STATS_INITIALIZED: Initialized
         NUM_ROWS: 4046
 CLUST_INDEX_SIZE: 25
 OTHER_INDEX_SIZE: 12
 MODIFIED_COUNTER: 363
          AUTOINC: 4080
        REF_COUNT: 2
1 row in set (0.0017 sec)

Notice the MODIFIED_COUNTER column. The STATS_INITIALIZED means whether the information exposed in this view have been loaded into memory and not whether the index statistics exist. Using this view, you can easily verify that restarting MySQL resets MODIFIED_COUNTER to 0.

I have worked with MySQL databases since 2006 both as an SQL developer, a database administrator, and for more than eight years as part of the Oracle MySQL Support team. I have spoken at MySQL Connect and Oracle OpenWorld on several occasions. I have contributed to the sys schema and four Oracle Certified Professional (OCP) exams for MySQL 5.6 to 8.0. I have written four books, all published at Apress.

0 Comments on “Automatic update of InnoDB Persistent Statistics Never Triggers

Leave a Reply

Your email address will not be published.

*

This site uses Akismet to reduce spam. Learn how your comment data is processed.