What Do I/O Latencies and Bytes Mean in the Performance and sys Schemas?

The MEM InnoDB File I/O Graphs showing a peak in latency and bytes at 12:51.

The Performance Schema and sys schema are great for investigating what is going on in MySQL, including performance issues. In my work in MySQL Support, I have several times been asked whether a peak in the InnoDB Data File I/O – Latency graph in MySQL Enterprise Monitor (MEM), or some values from the corresponding tables and views in the Performance Schema and sys schema, are cause for concern. This blog will discuss what these observations mean and how to use them.

The Tables and Views Involved

This blog will look into three sources in the Performance Schema for I/O latencies, so let’s first take a look at those tables. The three Performance Schema tables are:

  • events_waits_summary_global_by_event_name: with the event name set to wait/io/table/sql/handler or wait/io/file/%. This is the table used for the waits_global_by_latency and wait_classes_global_by_% views in the sys schema.
  • table_io_waits_summary_by_table: this is the table used for the schema_table_statistics% views in the sys schema.
  • file_summary_by_instance: this is the table used for the io_global_by_file_by% views in the sys schema.

These are also the sources used in MySQL Enterprise Monitor for the InnoDB Data File I/O graphs shown above. Let’s take a look at an example of each of the tables.


The events_waits_summary_global_by_event_name table has aggregate data for wait events grouped by the event name. For the purpose of this discussion, it is the table and file wait/io events that are of interest. An example of the data returned is:

mysql> SELECT *
         FROM performance_schema.events_waits_summary_global_by_event_name
        WHERE EVENT_NAME = 'wait/io/table/sql/handler'
              OR EVENT_NAME LIKE 'wait/io/file/%'
        ORDER BY SUM_TIMER_WAIT DESC
        LIMIT 5;
| EVENT_NAME                           | COUNT_STAR | SUM_TIMER_WAIT | MIN_TIMER_WAIT | AVG_TIMER_WAIT | MAX_TIMER_WAIT |
| wait/io/file/innodb/innodb_log_file  |    4003029 | 97371921796204 |        5371684 |       24324412 |   180878537324 |
| wait/io/table/sql/handler            |    4049559 | 43338753895440 |         494128 |       10702072 |   138756437188 |
| wait/io/file/innodb/innodb_data_file |      25850 | 11395061934996 |              0 |      440814508 |    43029060332 |
| wait/io/file/sql/binlog              |      20041 |  1316679917820 |              0 |       65699088 |    25816580304 |
| wait/io/file/sql/io_cache            |       2439 |    68212824648 |        1448920 |       27967360 |      628484180 |
5 rows in set (0.00 sec)

mysql> SELECT EVENT_NAME, COUNT_STAR,
              sys.format_time(SUM_TIMER_WAIT) AS SumTimerWait,
              sys.format_time(MIN_TIMER_WAIT) AS MinTimeWait,
              sys.format_time(AVG_TIMER_WAIT) AS AvgTimerWait,
              sys.format_time(MAX_TIMER_WAIT) AS MaxTimerWait
         FROM performance_schema.events_waits_summary_global_by_event_name
        WHERE EVENT_NAME = 'wait/io/table/sql/handler'
              OR EVENT_NAME LIKE 'wait/io/file/%'
        ORDER BY SUM_TIMER_WAIT DESC
        LIMIT 5;
| EVENT_NAME                           | COUNT_STAR | SumTimerWait | MinTimeWait | AvgTimerWait | MaxTimerWait |
| wait/io/file/innodb/innodb_log_file  |    4003029 | 1.62 m       | 5.37 us     | 24.32 us     | 180.88 ms    |
| wait/io/table/sql/handler            |    4049559 | 43.34 s      | 494.13 ns   | 10.70 us     | 138.76 ms    |
| wait/io/file/innodb/innodb_data_file |      25853 | 11.40 s      | 0 ps        | 440.78 us    | 43.03 ms     |
| wait/io/file/sql/binlog              |      20041 | 1.32 s       | 0 ps        | 65.70 us     | 25.82 ms     |
| wait/io/file/sql/io_cache            |       2439 | 68.21 ms     | 1.45 us     | 27.97 us     | 628.48 us    |
5 rows in set (0.01 sec)

mysql> SELECT *
         FROM sys.waits_global_by_latency
        WHERE events = 'wait/io/table/sql/handler'
              OR events LIKE 'wait/io/file/%'
        LIMIT 5;
| events                               | total   | total_latency | avg_latency | max_latency |
| wait/io/file/innodb/innodb_log_file  | 4003029 | 1.62 m        | 24.32 us    | 180.88 ms   |
| wait/io/table/sql/handler            | 4049559 | 43.34 s       | 10.70 us    | 138.76 ms   |
| wait/io/file/innodb/innodb_data_file |   25874 | 11.43 s       | 441.88 us   | 43.03 ms    |
| wait/io/file/sql/binlog              |   20131 | 1.32 s        | 65.66 us    | 25.82 ms    |
| wait/io/file/sql/io_cache            |    2439 | 68.21 ms      | 27.97 us    | 628.48 us   |
5 rows in set (0.01 sec)

These three queries show the same data, just obtained and displayed in different ways.

In the result there are two groups of events: the wait/io/table events (wait/io/table/sql/handler is the only event in this group, which is why it can be listed explicitly) and the wait/io/file group.

The table events are for accessing data in tables. It does not matter whether the data is cached in the buffer pool or accessed on disk. In this table and view, there is no distinction between different types of access (read, write, etc.).

The file events are, as the name suggests, for actually accessing files. There is one file event per file type. For example, in the output there are the wait/io/file/innodb/innodb_log_file event for accessing the InnoDB redo log files, the wait/io/file/innodb/innodb_data_file event for accessing the InnoDB data files themselves, the wait/io/file/sql/binlog event for the binary log files, etc.

In the second query, all of the timings are wrapped in the sys.format_time() function. The timings returned by the Performance Schema are in picoseconds (10^-12 second) which are somewhat hard for us humans to read. The sys.format_time() function converts the picoseconds to human readable strings. When you sort, however, make sure you sort by the original non-converted values.


Use the sys.format_time() function to convert the picoseconds to a human readable value, but do only so for the returned row; sort and filter by the original values. The Performance Schema always returns timings in picoseconds irrespective of the timer used internally for the event.

The sys schema by default returns the timing as human readable values. If you need the values in picoseconds prefix the table name with x$, for example sys.x$waits_global_by_latency. The sys schema includes an ORDER BY clause in most views. For the waits_global_by_latency view, the default ordering is by the total latency, so there is no need to add an ORDER BY clause in this example.
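To make the picosecond-to-string conversion concrete, here is a simplified Python re-implementation of the idea behind sys.format_time(). This is illustrative only – the real function is defined in the sys schema itself and handles more cases:

```python
# A simplified sketch of sys.format_time(); not the actual sys schema code.
def format_time(picoseconds: int) -> str:
    """Convert a Performance Schema timing in picoseconds to a readable string."""
    units = [
        ("ps", 1),
        ("ns", 10**3),
        ("us", 10**6),
        ("ms", 10**9),
        ("s", 10**12),
        ("m", 60 * 10**12),
        ("h", 3600 * 10**12),
    ]
    # Pick the largest unit that keeps the value at or above 1.
    name, factor = units[0]
    for unit_name, unit_factor in units:
        if picoseconds >= unit_factor:
            name, factor = unit_name, unit_factor
    return f"{picoseconds / factor:.2f} {name}"

# The SUM_TIMER_WAIT for wait/io/file/innodb/innodb_log_file above:
print(format_time(97371921796204))  # 1.62 m
```

Feeding it the raw SUM_TIMER_WAIT value from the first query produces the "1.62 m" shown by the second and third queries.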


The table_io_waits_summary_by_table Performance Schema table and the schema_table_statistics% sys schema views are related to the wait/io/table/sql/handler event just discussed. They provide information about the amount of time spent per table. Unlike the wait/io/table/sql/handler row in the wait_events_% tables, they split the time spent into reads and writes, and further into fetch, insert, update, and delete operations. The read and write columns are the aggregates of the corresponding read and write operations, respectively. Since fetch is the only read operation, the read and fetch columns will have the same values.

The table and view show the table I/O, i.e. the access to table data irrespective of whether the data is accessed in-memory or on disk. This is similar to the wait/io/table/sql/handler event. An example of the result of querying the table and view for the employees.salaries table is:

mysql> SELECT *
         FROM performance_schema.table_io_waits_summary_by_table
        WHERE OBJECT_SCHEMA = 'employees'
              AND OBJECT_NAME = 'salaries'\G
*************************** 1. row ***************************
   OBJECT_SCHEMA: employees
     OBJECT_NAME: salaries
      COUNT_STAR: 2844047
  SUM_TIMER_WAIT: 31703741623644
  MIN_TIMER_WAIT: 4975456
  AVG_TIMER_WAIT: 11147072
  MAX_TIMER_WAIT: 138756437188
      COUNT_READ: 0
     COUNT_WRITE: 2844047
 SUM_TIMER_WRITE: 31703741623644
 AVG_TIMER_WRITE: 11147072
 MAX_TIMER_WRITE: 138756437188
    COUNT_INSERT: 2844047
SUM_TIMER_INSERT: 31703741623644
MAX_TIMER_INSERT: 138756437188
1 row in set (0.00 sec)

mysql> SELECT *
         FROM sys.schema_table_statistics
        WHERE table_schema = 'employees'
              AND table_name = 'salaries'\G
*************************** 1. row ***************************
     table_schema: employees
       table_name: salaries
    total_latency: 31.70 s
     rows_fetched: 0
    fetch_latency: 0 ps
    rows_inserted: 2844047
   insert_latency: 31.70 s
     rows_updated: 0
   update_latency: 0 ps
     rows_deleted: 0
   delete_latency: 0 ps
 io_read_requests: 493
          io_read: 7.70 MiB
  io_read_latency: 611.04 ms
io_write_requests: 8628
         io_write: 134.91 MiB
 io_write_latency: 243.19 ms
 io_misc_requests: 244
  io_misc_latency: 2.50 s
1 row in set (0.05 sec)

In this case it shows that there have been mostly writes – inserts – for the table. The sys schema view effectively joins with performance_schema.file_summary_by_instance to produce the io_% columns, so in the schema_table_statistics view, fetch and read are not synonyms.

So, what is it the file_summary_by_instance table shows that is different from the “table I/O” that has been the topic of the first two tables? Let’s see.


Unlike the two previous tables, the file_summary_by_instance shows how much time is spent on actual file I/O and how much data is accessed. This makes the file_summary_by_instance table and the corresponding sys schema views very useful for determining where time is spent doing disk I/O and which files have the most data accesses on disk.

An example of using the Performance Schema and two of the sys schema views is:

mysql> SELECT *
         FROM performance_schema.file_summary_by_instance
        WHERE FILE_NAME LIKE '%\\\\employees\\\\salaries.ibd'\G
*************************** 1. row ***************************
                FILE_NAME: C:\ProgramData\MySQL\MySQL Server 8.0\Data\employees\salaries.ibd
               EVENT_NAME: wait/io/file/innodb/innodb_data_file
    OBJECT_INSTANCE_BEGIN: 2230271392896
               COUNT_STAR: 9365
           SUM_TIMER_WAIT: 3351594917896
           MIN_TIMER_WAIT: 11970500
           AVG_TIMER_WAIT: 357885020
           MAX_TIMER_WAIT: 40146113948
               COUNT_READ: 493
           SUM_TIMER_READ: 611040652056
           MIN_TIMER_READ: 65421052
           AVG_TIMER_READ: 1239433224
           MAX_TIMER_READ: 16340582272
              COUNT_WRITE: 8628
          SUM_TIMER_WRITE: 243186542696
          MIN_TIMER_WRITE: 11970500
          AVG_TIMER_WRITE: 28185588
          MAX_TIMER_WRITE: 1984546920
               COUNT_MISC: 244
           SUM_TIMER_MISC: 2497367723144
           MIN_TIMER_MISC: 154910196
           AVG_TIMER_MISC: 10235113564
           MAX_TIMER_MISC: 40146113948
1 row in set (0.00 sec)

mysql> SELECT *
         FROM sys.io_global_by_file_by_latency
        WHERE file = '@@datadir\\employees\\salaries.ibd'\G
*************************** 1. row ***************************
         file: @@datadir\employees\salaries.ibd
        total: 9365
total_latency: 3.35 s
   count_read: 493
 read_latency: 611.04 ms
  count_write: 8628
write_latency: 243.19 ms
   count_misc: 244
 misc_latency: 2.50 s
1 row in set (0.09 sec)

mysql> SELECT *
         FROM sys.io_global_by_file_by_bytes
         WHERE file = '@@datadir\\employees\\salaries.ibd'\G
*************************** 1. row ***************************
         file: @@datadir\employees\salaries.ibd
   count_read: 493
   total_read: 7.70 MiB
     avg_read: 16.00 KiB
  count_write: 8628
total_written: 134.91 MiB
    avg_write: 16.01 KiB
        total: 142.61 MiB
    write_pct: 94.60
1 row in set (0.10 sec)

This example is from Microsoft Windows, and as always when backslashes are in play, it is fun to try to determine the appropriate number of backslashes to use. When specifying the file name with LIKE, you need four backslashes per backslash in the file name; when using = you need two backslashes.
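A small Python sketch (not MySQL code) of the two parsing passes shows why four backslashes per backslash are needed with LIKE:

```python
# The file-name pattern as typed in the SQL statement: four backslashes
# per backslash in the actual path (a raw string keeps them literal).
typed = r"%\\\\employees\\\\salaries.ibd"

# Pass 1: the SQL parser reads the string literal, halving the backslashes.
after_literal = typed.replace("\\\\", "\\")
print(after_literal)  # %\\employees\\salaries.ibd

# Pass 2: LIKE uses backslash as its escape character, halving them again.
after_like = after_literal.replace("\\\\", "\\")
print(after_like)     # %\employees\salaries.ibd
```

With the = operator there is only the string-literal pass, which is why two backslashes per backslash suffice there.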

Again, the values are split into reads and writes (though not as detailed as before with fetch, insert, update, and delete – that is not known at the level where the file I/O happens). The miscellaneous values include everything that is not considered reads or writes; this includes opening and closing the file.

The sys schema queries not only format the timings, but also the path and the bytes. This is done using the sys.format_path() and sys.format_bytes() functions, respectively.
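As with the timings, the byte formatting is easy to mimic. Here is a simplified Python re-implementation of the idea behind sys.format_bytes() (illustrative only – the real function is defined in the sys schema):

```python
# A simplified sketch of sys.format_bytes(); not the actual sys schema code.
def format_bytes(num_bytes: int) -> str:
    """Convert a byte count to a readable string using binary units."""
    for unit, factor in (("TiB", 2**40), ("GiB", 2**30),
                         ("MiB", 2**20), ("KiB", 2**10)):
        if num_bytes >= factor:
            return f"{num_bytes / factor:.2f} {unit}"
    return f"{num_bytes} bytes"

# 8628 writes of 16 KiB pages is roughly the write volume seen above:
print(format_bytes(8628 * 16 * 1024))  # 134.81 MiB
```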

From the output, it can be seen that even though no rows were ever fetched (read) from the employees.salaries table (as found in the previous output), there has still been some read file I/O. This is what the sys.schema_table_statistics view reflected.

So, what does all of this mean? The graph in MySQL Enterprise Monitor showed that there were six seconds of file I/O latency for the InnoDB data files. Is that bad? As often with these kinds of questions, the answer is: “it depends”.

What to Make of the Values

We have a case where the graphs in MySQL Enterprise Monitor show a peak for the latency and bytes used doing I/O on the InnoDB data files. This is actual disk I/O. But what exactly does that mean, and should the alarm sound?

Let’s first refresh our memory on how the graphs looked:

The MEM InnoDB File I/O Graphs showing a peak in latency and bytes at 12:51.

If you are not using MySQL Enterprise Monitor, you may have similar graphs from your monitoring solution, or you may have obtained latency and bytes values from the tables and views discussed in this blog.

The latency graph shows that we have done six seconds of I/O. What does that mean? It is the aggregate I/O latency during the period the data was collected. In this case, the data is plotted for every minute, so in the one minute around 12:51, a total of six seconds out of the 60 seconds was spent doing I/O. Suddenly, the six seconds do not sound so bad. Similarly for the bytes: around 4.6MiB of data was read or written.
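To put such numbers in perspective, you can convert the aggregate latency per collection interval into a rough utilization percentage:

```python
# Aggregate I/O latency reported for one graph interval.
io_seconds = 6.0         # seconds of I/O latency in the one-minute interval
interval_seconds = 60.0  # the graph plots one data point per minute

utilization_pct = 100.0 * io_seconds / interval_seconds
print(f"{utilization_pct:.0f}% of the interval was spent doing I/O")  # 10%
```

Because the latencies of concurrent operations are summed, the percentage can exceed 100% when multiple I/O threads are active, so treat it as a rough indicator only.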

In general, the values obtained either from the monitoring graphs or from the underlying tables cannot on their own be used to conclude whether there is a problem or not. They just show how much I/O was done at different times.

The same is true for the values from the Performance Schema. On their own they do not tell you much. You can almost say they are too neutral – they just state how much work was done, not whether it was too much or too little.

A more useful way to use this data is when a problem is reported. This can be a system administrator reporting that the disks are 100% utilized, or end users reporting that the system is slow. Then you can go and look at what happened. If the disk I/O was unusually high at that point in time, it is likely related, and you can continue your investigation from there.

There are more reports in MySQL Enterprise Monitor, both as time series graphs and as point-in-time snapshots. The point-in-time snapshots often use the sys schema views but allow sorting. An example is the Database File I/O report:

MEM's Database File I/O Report

MySQL Workbench also provides performance reports based on the sys schema. The equivalent to the previous report is the Top File I/O Activity Report:

MySQL Workbench's Top File I/O Activity Report

The MySQL Workbench performance reports also allow you to export the report or copy the query used to generate it, so you can execute it manually.

With respect to the wait/io/table/sql/handler events, remember that I/O here does not mean disk I/O but table I/O. So, all that accumulating time for these events means – including when looking at the per-table values – is that the data in the table is being used. There are also per-index values in the table_io_waits_summary_by_index_usage Performance Schema table and the schema_index_statistics sys schema view, which have the same meaning. Whether a given usage is too high depends on many things; however, the values can again be useful when investigating what is causing a problem.

For the example data from this blog, it was triggered by loading the employees sample database. So, there was no problem. If you want to put data into your database, it must necessarily be written to the data file, otherwise it will not be persisted. However, if you think the data import took too long, you can use the data as part of your investigation on how to make the import faster.

The conclusion is that you should not panic if you see the I/O latencies peak. On their own they just mean that more work was done during that period than usual. Without context, a peak is no worse than a dip (which could indicate that something is preventing work from being done, or that there is a natural slow period). Instead, use the I/O latencies and bytes together with other observations to understand where MySQL is spending time and for which files data is being read and written.


If you want to understand more about the Performance Schema tables and sys schema views discussed in this blog, I recommend that you look at the following references:

The GitHub repository for the sys schema includes all the definitions of the views (and other objects). Since these are written in plain SQL, they are very useful to see in more depth where the data is coming from. The GitHub website allows you to browse through each of the files. Each sys schema object is defined in its own file.

Configuring the MySQL Shell Prompt

With the introduction of MySQL Shell 8.0, the second major version of the new command-line tool for MySQL, a new and feature-rich prompt was introduced. Unlike the prompt of the traditional mysql command-line client, it does not just say mysql> by default. Instead, it comes as a colour-coded spectacle.

The default prompt is great, but for one reason or another it may be that you want to change the prompt. Before getting to that, let’s take a look at the default prompt, so the starting point is clear.

The Default Prompt

An example of the default prompt can be seen in the screen shot below. As you can see, there are several parts to the prompt, each carrying its own information.

MySQL Shell with the default font.

There are six parts. From left to right, they are:

  • Status: Whether it is a production system or whether the connection is lost. This part is not included in the above screen shot.
  • MySQL: Just a reminder that you are working with a MySQL database.
  • Connection: Which host you are connected to (localhost), which port (33060 – the X Protocol port), and that SSL is being used.
  • Schema: The current default schema.
  • Mode: Whether you are using JavaScript (JS), Python (Py), or SQL (SQL) to enter commands.
  • End: As per tradition, the prompt ends with a >.

Depending on your current status, one or more of the parts may be missing. For example, the connection information will only be present when you have an active connection to a MySQL Server instance.

The prompt works well on a black background with brightly coloured text, as in the screen shot, but for other background and text colours it may not – or you may simply want different colours to signify whether you are connected to a development or production system. You may also find the prompt too verbose if you are recording a video or writing training material. So, let’s move on and find out how the prompt is configured.

The Prompt Configuration

Since the prompt is not just a simple string, it is also somewhat more complex to configure it than just setting an option. The configuration is done in a JSON object stored in a file named prompt.json (by default – you can change this – more about that later).

The location of prompt.json depends on your operating system:

  • Linux and macOS: ~/.mysqlsh/prompt.json – that is in the .mysqlsh directory in the user’s home directory.
  • Microsoft Windows: %AppData%\MySQL\mysqlsh\prompt.json – that is, in the AppData\Roaming\MySQL\mysqlsh directory under the user’s home directory.

If the file does not exist, MySQL Shell falls back on a system default. For example, on an Oracle Linux 7 installation, the file /usr/share/mysqlsh/prompt/prompt_256.json is used. This is also the template that is copied to %AppData%\MySQL\mysqlsh\prompt.json on a Microsoft Windows 10 installation.

The MySQL Shell installation includes several templates that you can choose from. These are:

  • prompt_16.json: A coloured prompt limited to the 16/8 ANSI colours and attributes.
  • prompt_256.json: The prompt uses 256 indexed colours. This is the one used by default on both Oracle Linux 7 and Microsoft Windows 10.
  • prompt_256inv.json: Similar to prompt_256.json, but with an “invisible” background colour (it just uses the same as for the terminal) and with different foreground colours.
  • prompt_256pl.json: Same as prompt_256.json but with extra symbols. This requires a Powerline patched font, such as the one installed with the Powerline project. It adds a padlock to the prompt when you use SSL to connect to MySQL and uses “arrow” separators.
  • prompt_256pl+aw.json: Same as prompt_256pl.json but with “awesome symbols”. This additionally requires the awesome symbols to be included in the Powerline font.
  • prompt_classic.json: This is a very basic prompt that just shows mysql-js>, mysql-py>, or mysql-sql> based on the mode in use.
  • prompt_nocolor.json: Gives the full prompt information, but completely without colours. An example of a prompt is: MySQL [localhost+ ssl/world] JS>

These are templates that you can use as is or modify to suit your needs and preferences. One way to pick a theme is to copy the template file into the location of your user’s prompt definition. The templates can be found in the prompt directory of the installation, for example:

  • Oracle Linux 7 RPM: /usr/share/mysqlsh/prompt/
  • Microsoft Windows: C:\Program Files\MySQL\MySQL Shell 8.0\share\mysqlsh\prompt

Another option is to define the MYSQLSH_PROMPT_THEME environment variable to point to the file you want to use. The value should be the full path to the file. This is particularly useful if you want to try the different templates to see which works best for you. For example, to use the prompt_256inv.json template from the command prompt on Microsoft Windows:

C:\> set MYSQLSH_PROMPT_THEME=C:\Program Files\MySQL\MySQL Shell 8.0\share\mysqlsh\prompt\prompt_256inv.json

Which gives the prompt:

The prompt when using the prompt_256inv.json template.

If none of the templates work for you, you can also dive in at the deep end of the pool and create your own configuration.

Creating Your Own Configuration

It is not completely trivial to create your own configuration, but if you use the template that is closest to the configuration you want as a starting point, it is not difficult either.

A good source of help to create the perfect prompt is also the README.prompt file that is located in the same directory as the template files. The README.prompt file contains the specification for the configuration.

Instead of going through the specification in details, let’s take a look at the prompt_256.json template and discuss some parts of it. Let’s start at the end of the file:

  "segments": [
      "classes": ["disconnected%host%", "%is_production%"]
      "text": " My",
      "bg": 254,
      "fg": 23
      "separator": "",
      "text": "SQL ",
      "bg": 254,
      "fg": 166
      "classes": ["disconnected%host%", "%ssl%host%session%"],
      "shrink": "truncate_on_dot",
      "bg": 237,
      "fg": 15,
      "weight": 10,
      "padding" : 1
      "classes": ["noschema%schema%", "schema"],
      "bg": 242,
      "fg": 15,
      "shrink": "ellipsize",
      "weight": -1,
      "padding" : 1
      "classes": ["%Mode%"],
      "text": "%Mode%",
      "padding" : 1

This is where the elements of the prompt are defined. There are a few things that are interesting to note here.

First, notice that there is an object with the classes disconnected%host% and %is_production%. The names inside the %s are variables defined in the same file or variables that come from MySQL Shell itself (it has variables such as the host and port). For example, is_production is defined as:

  "variables" : {
    "is_production": {
        "match" : {
            "pattern": "*;%host%;*",
            "value": ";%env:PRODUCTION_SERVERS%;"
        "if_true" : "production",
        "if_false" : ""

So, a host is considered to be a production instance if it is included in the environment variable PRODUCTION_SERVERS. When there is a match, an additional element is inserted at the beginning of the prompt to make it clear that you are working with a production system:

Connected to a production system.
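The match rule can be sketched in Python (a hypothetical re-implementation for illustration – MySQL Shell does this internally): the expanded value string is matched against the pattern using shell-style wildcards:

```python
from fnmatch import fnmatchcase

def is_production(host: str, production_servers: str) -> bool:
    # Mimic the prompt's rule: the value string built from the
    # PRODUCTION_SERVERS environment variable is matched against the
    # pattern '*;<host>;*' using shell-style wildcards.
    value = ";" + production_servers + ";"
    pattern = "*;" + host + ";*"
    return fnmatchcase(value, pattern)

print(is_production("db1.example.com", "db1.example.com;db2.example.com"))  # True
print(is_production("localhost", "db1.example.com;db2.example.com"))        # False
```

The host names here are purely illustrative; wrapping both sides in semicolons ensures that only complete entries in the list match.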

The second thing to note about the list of elements is that there are some special functions such as shrink, which can be used to define how the text is kept relatively short. For example, the host uses truncate_on_dot, so only the part before the first dot in the hostname is displayed if the full hostname is too long. Alternatively, ellipsize can be used to add … after the truncated value.

Third, the background and foreground colours are defined using the bg and fg elements respectively. This allows you to completely customize the prompt to your liking with respect to colours. The colour can be specified in one of the following ways:

  • By Name: There are a few colours that are known by name: black, red, green, yellow, blue, magenta, cyan, white.
  • By Index: A value between 0 and 255 (both inclusive) where 0 is black, 63 light blue, 127 magenta, 193 yellow, and 255 is white.
  • By RGB: Use a value in the #rrggbb format. This requires that the terminal supports TrueColor.
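To preview how an indexed colour looks in your terminal, you can print the corresponding ANSI escape sequences yourself (these are standard xterm 256-colour escapes, nothing MySQL Shell specific):

```python
def colour_sample(index: int, text: str) -> str:
    """Wrap text in a 256-colour foreground escape (SGR 38;5;n) and a reset."""
    return f"\033[38;5;{index}m{text}\033[0m"

# The indexes mentioned above: black, light blue, magenta, yellow, white.
for idx in (0, 63, 127, 193, 255):
    print(colour_sample(idx, f"colour {idx}"))
```

Running this in a 256-colour-capable terminal shows each sample in its colour, which makes it easy to pick bg and fg values for your own prompt.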


If you want to do more than make a few tweaks to an existing template, read the README.prompt file to see the full specification including a list of supported attributes and built-in variables. These may change in the future as more features are added.

One group of built-in variables that deserves an example is the ones that in some way depend on the environment or the MySQL instance you are connected to. These are:

  • %env:varname%: This uses an environment variable. The way it is determined whether you are connected to a production server is an example of how an environment variable can be used.
  • %sysvar:varname%: This uses the value of a global system variable from MySQL. That is, the value returned by SELECT @@global.varname.
  • %sessvar:varname%: Similar to the previous but using a session system variable.
  • %status:varname%: This uses the value of a global status variable from MySQL. That is, the value returned by SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME = 'varname'.
  • %sessstatus:varname%: Similar to the previous, but using a session status variable.
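Conceptually, the prompt renderer substitutes each %source:name% token with a value looked up from the corresponding source. Here is a hypothetical Python sketch of that interpolation (the function and the sample data are illustrative, not MySQL Shell code):

```python
import re

def expand(template: str, sources: dict) -> str:
    # Replace tokens such as %sysvar:version% using a lookup table of
    # source name -> {variable name: value}. Unknown tokens become "".
    def repl(match):
        source, name = match.group(1), match.group(2)
        return str(sources.get(source, {}).get(name, ""))
    return re.sub(r"%(\w+):(\w+)%", repl, template)

sources = {"sysvar": {"version": "8.0.12"}, "status": {"Threads_connected": 3}}
print(expand("MySQL %sysvar:version%", sources))  # MySQL 8.0.12
```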

If you for example want to include the MySQL version (of the instance you are connected to) in the prompt, you can add an element like:

      "separator": "",
      "text": "%sysvar:version%",
      "bg": 250,
      "fg": 166

The resulting prompt is:

Including the MySQL Server version in the prompt.

What next? Now it is your turn to play with MySQL Shell. Enjoy.

MySQL 8.0.12: Instant ALTER TABLE

There are many nice changes included in the MySQL 8.0.12 release that was published a couple of days ago. One of the most exciting is the ability to make instant schema changes to tables. This blog will look into why I think that is a stand-out change.

I will not go into details about the implementation other than noting that the new data dictionary in MySQL 8.0 has allowed for a nice implementation of the patch that was contributed by the Tencent Games DBA Team. If you are interested in learning more about the implementation, I recommend reading the blog by Bin Su (MySQL 8.0: InnoDB now supports Instant ADD COLUMN) and the worklog implementing the feature (WL#11250: Support Instant Add Column).


Thanks to the Tencent Games DBA Team who contributed the patch for this feature.

Why Do We Need Instant ALTER TABLE?

As you may know, InnoDB has supported online schema changes since version 5.6. So, a fair question is why that is no longer good enough. The thing is that while online schema changes are very nice and allow you to make changes to the data (DML statements) while the schema change is made, there are still some problems:

  • Online ALTER TABLE still requires a metadata lock for the duration of the operation. That is, it is not possible to make other schema changes until the ALTER TABLE has completed.
  • In replication setups, the SQL thread handling the online schema change will still block other changes coming through the replication channel from occurring. This means that an online schema change that takes an hour suddenly makes the replication lag an hour.
  • This is compounded in chained replication setups, where the delay on the last instances in the topology is multiplied by the number of times the change has been replicated. Suddenly this schema change that did not really affect the replication master becomes a major pain and delays the deployment of the application upgrade that relies on the schema changes.
  • Even though the change is online, it still is heavy on resources: disk for storing the extra copy of the table when the change cannot be made in-place, CPU and disk activity that may cause other queries to become slow, etc.
  • There is a limit to the amount of DML changes that can be made during the schema change. If you make too many changes, the schema change will fail.

So, in short, online schema changes are only the first stop. Instant schema changes are the future.

Which Changes Can Be Made Instantly?

While it would be great if all schema changes could be made instantly, unfortunately that is not the case. Just as not all schema changes can be made online, there are limitations to the new instant feature. In short, the changes that can be made instantly must only affect the metadata of the table. The metadata is stored in the data dictionary. The changes that can be made with the instant ALTER TABLE feature as per 8.0.12 are:

  • Adding a new column as the last column in the table.
  • Adding a generated virtual column.
  • Dropping a generated virtual column.
  • Setting a default value for an existing column.
  • Dropping the default value for an existing column.
  • Changing the list of values allowed for a column with the ENUM or SET data types. A requirement is that the storage size does not change for the column.
  • Change whether the index type is set explicitly for an existing index.
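The storage-size requirement for ENUM and SET is worth spelling out. An ENUM, for example, uses 1 byte for up to 255 members and 2 bytes beyond that, so appending members is only instant while the storage size stays the same. A small sketch of the rule (assuming the standard MySQL ENUM storage sizes):

```python
def enum_storage_bytes(member_count: int) -> int:
    # MySQL stores an ENUM value as 1 byte for up to 255 members
    # and 2 bytes for up to 65,535 members.
    return 1 if member_count <= 255 else 2

# Appending a member is instant only while the storage size is unchanged:
print(enum_storage_bytes(255))  # 1 - adding up to the 255th member is fine
print(enum_storage_bytes(256))  # 2 - the 256th member grows the column
```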

And who knows, maybe later the feature can be extended to cover more changes. There are also a few limitations that are good to be aware of:

  • The row format cannot be COMPRESSED.
  • The table cannot have a fulltext index.
  • Tables in the data dictionary cannot use the instant algorithm.
  • Temporary tables are not supported.

How to Ensure You are Using the Expected Algorithm?

One problem with schema changes is that there are different algorithms depending on the schema change. Currently there are three different algorithms:

  • INSTANT: the change completes very quickly (yes, not quite instantly) as only the metadata in the data dictionary needs to be updated.
  • INPLACE: the changes are made within the existing table, i.e. a complete table copy is avoided.
  • COPY: the table is copied into a new copy with the new definition.

By default, MySQL chooses the algorithm doing the least work. That is, INSTANT if that is supported, otherwise INPLACE if that is supported, and finally COPY. Additionally, there is the concept of locking which can be set to either NONE, SHARED, or EXCLUSIVE.
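The default choice can be thought of as picking the first supported algorithm in order of least work – a conceptual sketch (illustrative only, not server code):

```python
def choose_algorithm(supported: set) -> str:
    """Pick the least expensive of the algorithms the schema change supports."""
    for algorithm in ("INSTANT", "INPLACE", "COPY"):
        if algorithm in supported:
            return algorithm
    raise ValueError("no supported algorithm")

# For example, a schema change supporting INPLACE and COPY but not INSTANT:
print(choose_algorithm({"INPLACE", "COPY"}))  # INPLACE
```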

So, how do you ensure you are not ending up with a copying ALTER TABLE taking exclusive locks when you thought the operation was going to be instant? The answer is to explicitly set the algorithm and lock type. That way, MySQL will throw an error if your schema change is not compatible with the requested algorithm. The same principle can be used to force a full rebuild of the table if you, for example, have a small table and don’t see a reason to worry about instantly added columns (more on that later).

I will give some examples of specifying the ALGORITHM and LOCK options to ALTER TABLE later. However, we first need an example table to play with.


Always specify the ALGORITHM and LOCK options explicitly to avoid unpleasant surprises.

Creating an Example Table

For the purpose of the upcoming examples of instant schema changes, it is necessary to have a table to play with. The table will be the testtbl table in the my_schema schema. The table can be created and populated with sample data by using MySQL Shell – this allows us to take advantage of the support for scripting. The table will have one million rows.


To limit the amount of scrolling, the prompt in MySQL Shell has been changed to just show the mode (JS, PY, SQL).

JS> \connect root@localhost
Creating a session to 'root@localhost'
Please provide the password for 'root@localhost': ********
Save password for 'root@localhost'? [Y]es/[N]o/Ne[v]er (default No):
Fetching schema names for autocompletion... Press ^C to stop.
Your MySQL connection id is 17 (X protocol)
Server version: 8.0.12 MySQL Community Server - GPL
No default schema selected; type \use <schema> to set one.

JS> \sql
Switching to SQL mode... Commands end with ;

SQL> CREATE SCHEMA my_schema;
Query OK, 1 row affected (0.0509 sec)

SQL> \use my_schema
Default schema set to `my_schema`.
Fetching table and column names from `my_schema` for auto-completion... Press ^C to stop.

SQL> CREATE TABLE testtbl (
 ...   id int unsigned NOT NULL auto_increment,
 ...   val varchar(36) NOT NULL,
 ...   PRIMARY KEY (id)
 ... ) ENGINE=InnoDB;
Query OK, 0 rows affected (0.0497 sec)
SQL> \py
Switching to Python mode...

Py> \use my_schema
Default schema `my_schema` accessible through db.

Py> db.testtbl

Py> import uuid

Py> for i in range(1000):
...     session.start_transaction()
...     stmt = db.testtbl.insert("val")
...     for j in range(1000):
...         stmt = stmt.values(uuid.uuid1().hex)
...     stmt.execute()
...     session.commit()
Query OK, 0 rows affected (0.0029 sec)

Py> db.testtbl.select("COUNT(*)")
| COUNT(*) |
|  1000000 |
1 row in set (0.2476 sec)

Note how this takes advantage of the feature in MySQL Shell where the db object in JavaScript and Python mode has the tables as properties. Thus it is possible to refer to the testtbl table in the my_schema schema as db.testtbl after using \use my_schema to read the database objects. The data is inserted in 1000 batches of 1000 rows using a double loop.

Now it is time to play with schema changes.

Schema Changes Examples

For comparison, let’s first add a column using the old algorithms. First, add the column val2 using the copying algorithm – note this is not supported online, so a shared lock is taken:

SQL> ALTER TABLE testtbl
 ...       ADD COLUMN val2 varchar(36) DEFAULT NULL,
 ...       ALGORITHM=COPY, LOCK=SHARED;
Query OK, 1000000 rows affected (5.3952 sec)


The test is made on a laptop – the timings themselves are not relevant, but they can be used to compare the time taken for the three algorithms.

So that took around 5.4 seconds. That is not too bad, but we can do much better than that. Let’s try the in-place algorithm. Not only will that require less work (though still a fair bit), it is also possible to perform the change online.

SQL> ALTER TABLE testtbl
 ...       ADD COLUMN val3 varchar(36) DEFAULT NULL,
 ...       ALGORITHM=INPLACE, LOCK=NONE;
Query OK, 0 rows affected (1.7511 sec)

So, this is a factor of three faster despite the table having an extra column compared to the original. And it is possible to execute DML statements against the table while the change is made. That is a big win. But let’s move on to the finale: the instant ALTER TABLE:

SQL> ALTER TABLE testtbl
 ...       ADD COLUMN val4 varchar(36) DEFAULT NULL,
 ...       ALGORITHM=INSTANT;
Query OK, 0 rows affected (0.0490 sec)

Wow – that’s less than 5/100 of a second – that’s instant enough for me. Note how the LOCK option is not specified here. The instant algorithm is always as lock-free as possible, so the LOCK option is not allowed.

There is a little more to it than this. As mentioned by Bin Su, there are some trade-offs, such as the row size not being checked upfront. So, in some cases it can be useful to know whether a table contains instantly added columns. This can be seen through the innodb_tables view in the Information Schema on the data dictionary:

Py> schema = session.get_schema("information_schema")
Query OK, 1000 items affected (0.0091 sec)

Py> inno_tables = schema.get_table("INNODB_TABLES")
Query OK, 0 rows affected (0.0002 sec)

Py> inno_tables.select("NAME", "INSTANT_COLS").where("INSTANT_COLS > 0")
| NAME              | INSTANT_COLS |
| my_schema/testtbl |            4 |
1 row in set (0.0405 sec)

Now what is that? 4 columns? But we only added one of the columns using the instant algorithm. What the INSTANT_COLS column shows is how many columns existed before the first instant column was added. In the example the columns id, val, val2, and val3 existed, then val4 was added using the instant algorithm. For tables that have never had any columns added instantly, the value of INSTANT_COLS is 0.

Want to Read More?

This blog has just been an introduction to the new MySQL 8.0.12 feature of instant schema changes. I recommend that you read Bin Su’s blog as well as the documentation in the MySQL reference manual to fully understand the feature:


The new feature allowing instant schema changes is a great way to avoid time-consuming operations. Particularly in replication setups, where the replicated statement blocks the applier while it executes, the difference between an online and an instant schema change is like night and day.

MySQL 8.0.12 can be downloaded from MySQL Community Downloads or you can install it through one of the “native installers” such as MySQL Installer for Microsoft Windows or one of our Linux repositories (Yum, APT, SUSE). If you are a customer and need the commercial version of MySQL Server, I recommend downloading it from Patches & Updates in My Oracle Support (MOS); there is also a 30-day trial version available from Oracle Software Delivery Cloud.

Please do not hesitate to provide feedback. Bugs and feature requests can be logged in the MySQL Bugs database.

Which Character Set Should You Use in MySQL?

MySQL supports a long list of character sets, and one of its strong points is that you can use different character sets per column. This is very flexible, but which character set should you use?

If you do not want to read the whole blog, the summary is: Use utf8mb4, particularly in MySQL 8.0 with one of the UCA 9.0.0 based collations.

Before I reach this conclusion, let’s start out looking at what a character set is.

What is a Character Set?

Characters are wonderful things that can take all kinds of shapes ranging from the familiar (for English speakers at least) Latin characters as those used to write this blog, over Arabic (مرحبا بالعالم) and Asian characters such as simplified Chinese (你好,世界) to emojis (❤️🐬). This is all cool, but computers do not deal in characters, they work in bits, so how are all of these wonderful characters represented?

The answer is that character sets are used to encode the characters into bits. Given the number of known characters (Unicode 11.0 contains 137,374 characters), 18 bits – so effectively 3 bytes – are needed just to cover them all. That is significant.
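That bit count can be double-checked quickly (the character count is taken from the text above):

```python
# Smallest number of bits needed to give each of the 137,374
# Unicode 11.0 characters a distinct code.
unicode_11_characters = 137374
bits_needed = unicode_11_characters.bit_length()
print(bits_needed)  # 18, i.e. effectively 3 bytes
```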

In the good old days when computers were new, it was mostly English that was used, and using 18 bits per character was just not feasible. A popular home computer in the 1980s, the Commodore 64, was an 8-bit computer with a total of 64 kB of RAM – you just could not get away with using multiple bytes per character.

In the early days a popular character set was ASCII, which uses 7 bits to encode each character. This allows for 128 unique characters, which is OK for plain English but does not even allow for all of the Western European accented characters to be included. One answer to that is the Latin character sets, such as ISO 8859-1 (Latin-1) and Windows-1252. These use 8 bits and thus support 256 characters each. Together they provide complete coverage for the languages using Latin letters, but you have to choose the right one for the language you are writing, and in any case 8 bits do not help much if you need to write something in Chinese, which has thousands of characters.
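A quick illustration of those limits, using Python’s codecs for the character sets just mentioned:

```python
# "é" fits in one byte in Latin-1 and in Windows-1252, but cannot
# be represented in 7-bit ASCII at all.
assert "é".encode("latin-1") == b"\xe9"
assert "é".encode("cp1252") == b"\xe9"
try:
    "é".encode("ascii")
except UnicodeEncodeError:
    print("é is not an ASCII character")
```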

The current solution is to use Unicode characters. The Unicode Consortium defines code points for the characters included. As mentioned, in Unicode 11.0 there are just over 137000 characters. This covers most usages (yes, there are actually still characters missing). There is a bit more to it than that, however. One thing is the code points defined by the Unicode Consortium, but these still need bit representations for computers. There are several solutions to that, with UTF-8 being the most commonly used nowadays.

UTF-8 is a variable-width encoding that uses one to four bytes to represent a Unicode code point. Part of the success of UTF-8 is that the characters used in ASCII have the same encoding in UTF-8 – that is, if you have a document in ASCII, you can just say it is now in UTF-8 and all works well. That made it easy for the large number of ASCII-based websites to migrate to Unicode. The remaining characters use two or more bytes. While the compatibility with ASCII has been great for adoption and helps keep the size of documents in English down, it also has some downsides. One is that some other languages use more bytes than needed; another is that scanning a string is relatively expensive, as you must decode each character to know where the next character begins.

Before moving on, let’s have a quick look at some actual encodings in different character sets, to see the difference (UCS2 is a two-byte fixed width Unicode character set):


The comparison shows how there clearly are some relations between these character sets, but also how only UTF-8 can represent all four of the test characters. So, from this comparison the strengths of UTF-8 are starting to show, but also the weaknesses: the character Å requires two bytes in UTF-8 (and UCS-2) but only one in Latin-1, and the Chinese character 界 requires three bytes in UTF-8 but only two in UCS-2.
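These byte counts can be verified with a quick check; Python’s utf-16-be codec is used here as a stand-in for UCS-2 (they agree for characters on the Basic Multilingual Plane):

```python
# Number of bytes per character in different encodings.
assert len("A".encode("utf-8")) == 1       # same as ASCII
assert len("Å".encode("latin-1")) == 1
assert len("Å".encode("utf-8")) == 2
assert len("界".encode("utf-16-be")) == 2  # UCS-2 stand-in
assert len("界".encode("utf-8")) == 3
print("all byte counts as described")
```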

So, what does this mean in the context of MySQL? Let’s take a look.

MySQL Character Sets

Until MySQL 8.0 the default character set in MySQL was Latin-1 (named latin1). This was a convenient character set in many ways; for example, it was fixed width, so finding the Nth character in a string was fast, and it could store text for most Western European languages. However, as discussed, Latin-1 is not what is used in this day and age – the world has moved on to UTF-8. So, in MySQL 8.0 the change was made to make utf8mb4 the default character set.

Stop a minute – what is utf8mb4? How does that differ from the UTF-8 that was discussed in the previous section? Well, it is the same thing. Unfortunately, when UTF-8 was first implemented in MySQL, it was limited to three bytes (in MySQL 8.0 called utf8mb3). This predates my involvement with MySQL, but a guess for this choice is that it is related to internal temporary tables, which in MySQL 5.7 and earlier use the MEMORY storage engine when possible. Internal temporary tables are for example used to store the result of a subquery and for sorting. The MEMORY storage engine only supports fixed-width columns, so a varchar(10) column would be treated as a char(10) column in an in-memory internal temporary table. With utf8mb4 that would mean 40 bytes; with the choice of a 3-byte implementation it would mean 30 bytes. Furthermore, until the emergence of emojis, it was rarely required to use more than three bytes in UTF-8.
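The arithmetic behind that design choice is simple:

```python
# A varchar(10) treated as fixed width by the MEMORY engine must
# reserve the worst-case number of bytes per character.
n_chars = 10
print(n_chars * 4)  # utf8mb4: 40 bytes per value
print(n_chars * 3)  # utf8mb3: 30 bytes per value
```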

Anyway, MySQL has support for a wide range of character sets to suit your specific needs. In MySQL 8 there are a total of 41 character sets to choose from. The details of the character sets can be found in the information_schema.CHARACTER_SETS table:

mysql-sql> SELECT *
             FROM information_schema.CHARACTER_SETS;
| CHARACTER_SET_NAME | DEFAULT_COLLATE_NAME | DESCRIPTION                     | MAXLEN |
| armscii8           | armscii8_general_ci  | ARMSCII-8 Armenian              |      1 |
| ascii              | ascii_general_ci     | US ASCII                        |      1 |
| big5               | big5_chinese_ci      | Big5 Traditional Chinese        |      2 |
| binary             | binary               | Binary pseudo charset           |      1 |
| cp1250             | cp1250_general_ci    | Windows Central European        |      1 |
| cp1251             | cp1251_general_ci    | Windows Cyrillic                |      1 |
| cp1256             | cp1256_general_ci    | Windows Arabic                  |      1 |
| cp1257             | cp1257_general_ci    | Windows Baltic                  |      1 |
| cp850              | cp850_general_ci     | DOS West European               |      1 |
| cp852              | cp852_general_ci     | DOS Central European            |      1 |
| cp866              | cp866_general_ci     | DOS Russian                     |      1 |
| cp932              | cp932_japanese_ci    | SJIS for Windows Japanese       |      2 |
| dec8               | dec8_swedish_ci      | DEC West European               |      1 |
| eucjpms            | eucjpms_japanese_ci  | UJIS for Windows Japanese       |      3 |
| euckr              | euckr_korean_ci      | EUC-KR Korean                   |      2 |
| gb18030            | gb18030_chinese_ci   | China National Standard GB18030 |      4 |
| gb2312             | gb2312_chinese_ci    | GB2312 Simplified Chinese       |      2 |
| gbk                | gbk_chinese_ci       | GBK Simplified Chinese          |      2 |
| geostd8            | geostd8_general_ci   | GEOSTD8 Georgian                |      1 |
| greek              | greek_general_ci     | ISO 8859-7 Greek                |      1 |
| hebrew             | hebrew_general_ci    | ISO 8859-8 Hebrew               |      1 |
| hp8                | hp8_english_ci       | HP West European                |      1 |
| keybcs2            | keybcs2_general_ci   | DOS Kamenicky Czech-Slovak      |      1 |
| koi8r              | koi8r_general_ci     | KOI8-R Relcom Russian           |      1 |
| koi8u              | koi8u_general_ci     | KOI8-U Ukrainian                |      1 |
| latin1             | latin1_swedish_ci    | cp1252 West European            |      1 |
| latin2             | latin2_general_ci    | ISO 8859-2 Central European     |      1 |
| latin5             | latin5_turkish_ci    | ISO 8859-9 Turkish              |      1 |
| latin7             | latin7_general_ci    | ISO 8859-13 Baltic              |      1 |
| macce              | macce_general_ci     | Mac Central European            |      1 |
| macroman           | macroman_general_ci  | Mac West European               |      1 |
| sjis               | sjis_japanese_ci     | Shift-JIS Japanese              |      2 |
| swe7               | swe7_swedish_ci      | 7bit Swedish                    |      1 |
| tis620             | tis620_thai_ci       | TIS620 Thai                     |      1 |
| ucs2               | ucs2_general_ci      | UCS-2 Unicode                   |      2 |
| ujis               | ujis_japanese_ci     | EUC-JP Japanese                 |      3 |
| utf16              | utf16_general_ci     | UTF-16 Unicode                  |      4 |
| utf16le            | utf16le_general_ci   | UTF-16LE Unicode                |      4 |
| utf32              | utf32_general_ci     | UTF-32 Unicode                  |      4 |
| utf8               | utf8_general_ci      | UTF-8 Unicode                   |      3 |
| utf8mb4            | utf8mb4_0900_ai_ci   | UTF-8 Unicode                   |      4 |
41 rows in set (0.0010 sec)

Notice how there are several UTF and other Unicode character sets including utf8 and utf8mb4. The utf8 character set is the 3-byte implementation. The name utf8 has now been deprecated in favour of utf8mb3 to make it explicit that you are using an implementation that can use at most three bytes per character.
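The practical difference shows up with characters outside the Basic Multilingual Plane, such as emojis, which need the fourth byte that utf8mb3 does not have:

```python
# The dolphin emoji has a code point above U+FFFF and therefore
# encodes to four bytes in UTF-8 - one more than utf8mb3 allows.
assert ord("🐬") > 0xFFFF
assert len("🐬".encode("utf-8")) == 4
print("utf8mb3 cannot store this character; utf8mb4 can")
```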

The table has four columns: the character set name, the default collation, a description, and the maximum number of bytes per character. The first and the last two columns are clear enough, but what is a collation? Let’s discuss that.

What is a Collation?

It is all well and good that you now know how to encode characters, but one of the main uses of databases is to compare data. Are two values the same or is one larger than the other? For numbers this is in general well defined (floating point comparisons and release numbers are examples where it is less clear), for example 10 is greater than 5.

However, for strings it is not that simple. Different languages have different rules for the order of the characters, and there are questions such as whether an accented character should be considered the same as the base character and whether upper and lower case should be considered the same. An example is that in Danish and Norwegian, the character Ø comes before Å, but in Swedish Å comes before Ö (Ø and Ö are essentially the same letter).
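A small check shows why simple code-point ordering is not enough; the sort below contradicts the Danish/Norwegian rule just described:

```python
# Sorting by raw Unicode code point puts Å (U+00C5) before Ø (U+00D8),
# the opposite of the Danish/Norwegian alphabet order.
assert sorted(["Ø", "Å"]) == ["Å", "Ø"]
print("code-point order: Å before Ø")
```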

The rules that define the order of the characters and whether to take accents and capitalization into account are called collations. There are many collations for the same character set. In MySQL 8.0 there are no fewer than 73 collations to choose from for the utf8mb4 character set alone (more about utf8mb4 later).

The collations that can be used with a given character set can be found in the information_schema.COLLATIONS table. For example, to see the 73 collations available for utf8mb4:

mysql-sql> SELECT *
             FROM information_schema.COLLATIONS
            WHERE CHARACTER_SET_NAME = 'utf8mb4';
| COLLATION_NAME             | CHARACTER_SET_NAME | ID  | IS_DEFAULT | IS_COMPILED | SORTLEN | PAD_ATTRIBUTE |
| utf8mb4_0900_ai_ci         | utf8mb4            | 255 | Yes        | Yes         |       0 | NO PAD        |
| utf8mb4_0900_as_ci         | utf8mb4            | 305 |            | Yes         |       0 | NO PAD        |
| utf8mb4_0900_as_cs         | utf8mb4            | 278 |            | Yes         |       0 | NO PAD        |
| utf8mb4_bin                | utf8mb4            |  46 |            | Yes         |       1 | PAD SPACE     |
| utf8mb4_croatian_ci        | utf8mb4            | 245 |            | Yes         |       8 | PAD SPACE     |
| utf8mb4_cs_0900_ai_ci      | utf8mb4            | 266 |            | Yes         |       0 | NO PAD        |
| utf8mb4_cs_0900_as_cs      | utf8mb4            | 289 |            | Yes         |       0 | NO PAD        |
| utf8mb4_czech_ci           | utf8mb4            | 234 |            | Yes         |       8 | PAD SPACE     |
| utf8mb4_danish_ci          | utf8mb4            | 235 |            | Yes         |       8 | PAD SPACE     |
| utf8mb4_da_0900_ai_ci      | utf8mb4            | 267 |            | Yes         |       0 | NO PAD        |
| utf8mb4_da_0900_as_cs      | utf8mb4            | 290 |            | Yes         |       0 | NO PAD        |
| utf8mb4_de_pb_0900_ai_ci   | utf8mb4            | 256 |            | Yes         |       0 | NO PAD        |
| utf8mb4_de_pb_0900_as_cs   | utf8mb4            | 279 |            | Yes         |       0 | NO PAD        |
| utf8mb4_eo_0900_ai_ci      | utf8mb4            | 273 |            | Yes         |       0 | NO PAD        |
| utf8mb4_eo_0900_as_cs      | utf8mb4            | 296 |            | Yes         |       0 | NO PAD        |
| utf8mb4_esperanto_ci       | utf8mb4            | 241 |            | Yes         |       8 | PAD SPACE     |
| utf8mb4_estonian_ci        | utf8mb4            | 230 |            | Yes         |       8 | PAD SPACE     |
| utf8mb4_es_0900_ai_ci      | utf8mb4            | 263 |            | Yes         |       0 | NO PAD        |
| utf8mb4_es_0900_as_cs      | utf8mb4            | 286 |            | Yes         |       0 | NO PAD        |
| utf8mb4_es_trad_0900_ai_ci | utf8mb4            | 270 |            | Yes         |       0 | NO PAD        |
| utf8mb4_es_trad_0900_as_cs | utf8mb4            | 293 |            | Yes         |       0 | NO PAD        |
| utf8mb4_et_0900_ai_ci      | utf8mb4            | 262 |            | Yes         |       0 | NO PAD        |
| utf8mb4_et_0900_as_cs      | utf8mb4            | 285 |            | Yes         |       0 | NO PAD        |
| utf8mb4_general_ci         | utf8mb4            |  45 |            | Yes         |       1 | PAD SPACE     |
| utf8mb4_german2_ci         | utf8mb4            | 244 |            | Yes         |       8 | PAD SPACE     |
| utf8mb4_hr_0900_ai_ci      | utf8mb4            | 275 |            | Yes         |       0 | NO PAD        |
| utf8mb4_hr_0900_as_cs      | utf8mb4            | 298 |            | Yes         |       0 | NO PAD        |
| utf8mb4_hungarian_ci       | utf8mb4            | 242 |            | Yes         |       8 | PAD SPACE     |
| utf8mb4_hu_0900_ai_ci      | utf8mb4            | 274 |            | Yes         |       0 | NO PAD        |
| utf8mb4_hu_0900_as_cs      | utf8mb4            | 297 |            | Yes         |       0 | NO PAD        |
| utf8mb4_icelandic_ci       | utf8mb4            | 225 |            | Yes         |       8 | PAD SPACE     |
| utf8mb4_is_0900_ai_ci      | utf8mb4            | 257 |            | Yes         |       0 | NO PAD        |
| utf8mb4_is_0900_as_cs      | utf8mb4            | 280 |            | Yes         |       0 | NO PAD        |
| utf8mb4_ja_0900_as_cs      | utf8mb4            | 303 |            | Yes         |       0 | NO PAD        |
| utf8mb4_ja_0900_as_cs_ks   | utf8mb4            | 304 |            | Yes         |      24 | NO PAD        |
| utf8mb4_latvian_ci         | utf8mb4            | 226 |            | Yes         |       8 | PAD SPACE     |
| utf8mb4_la_0900_ai_ci      | utf8mb4            | 271 |            | Yes         |       0 | NO PAD        |
| utf8mb4_la_0900_as_cs      | utf8mb4            | 294 |            | Yes         |       0 | NO PAD        |
| utf8mb4_lithuanian_ci      | utf8mb4            | 236 |            | Yes         |       8 | PAD SPACE     |
| utf8mb4_lt_0900_ai_ci      | utf8mb4            | 268 |            | Yes         |       0 | NO PAD        |
| utf8mb4_lt_0900_as_cs      | utf8mb4            | 291 |            | Yes         |       0 | NO PAD        |
| utf8mb4_lv_0900_ai_ci      | utf8mb4            | 258 |            | Yes         |       0 | NO PAD        |
| utf8mb4_lv_0900_as_cs      | utf8mb4            | 281 |            | Yes         |       0 | NO PAD        |
| utf8mb4_persian_ci         | utf8mb4            | 240 |            | Yes         |       8 | PAD SPACE     |
| utf8mb4_pl_0900_ai_ci      | utf8mb4            | 261 |            | Yes         |       0 | NO PAD        |
| utf8mb4_pl_0900_as_cs      | utf8mb4            | 284 |            | Yes         |       0 | NO PAD        |
| utf8mb4_polish_ci          | utf8mb4            | 229 |            | Yes         |       8 | PAD SPACE     |
| utf8mb4_romanian_ci        | utf8mb4            | 227 |            | Yes         |       8 | PAD SPACE     |
| utf8mb4_roman_ci           | utf8mb4            | 239 |            | Yes         |       8 | PAD SPACE     |
| utf8mb4_ro_0900_ai_ci      | utf8mb4            | 259 |            | Yes         |       0 | NO PAD        |
| utf8mb4_ro_0900_as_cs      | utf8mb4            | 282 |            | Yes         |       0 | NO PAD        |
| utf8mb4_ru_0900_ai_ci      | utf8mb4            | 306 |            | Yes         |       0 | NO PAD        |
| utf8mb4_ru_0900_as_cs      | utf8mb4            | 307 |            | Yes         |       0 | NO PAD        |
| utf8mb4_sinhala_ci         | utf8mb4            | 243 |            | Yes         |       8 | PAD SPACE     |
| utf8mb4_sk_0900_ai_ci      | utf8mb4            | 269 |            | Yes         |       0 | NO PAD        |
| utf8mb4_sk_0900_as_cs      | utf8mb4            | 292 |            | Yes         |       0 | NO PAD        |
| utf8mb4_slovak_ci          | utf8mb4            | 237 |            | Yes         |       8 | PAD SPACE     |
| utf8mb4_slovenian_ci       | utf8mb4            | 228 |            | Yes         |       8 | PAD SPACE     |
| utf8mb4_sl_0900_ai_ci      | utf8mb4            | 260 |            | Yes         |       0 | NO PAD        |
| utf8mb4_sl_0900_as_cs      | utf8mb4            | 283 |            | Yes         |       0 | NO PAD        |
| utf8mb4_spanish2_ci        | utf8mb4            | 238 |            | Yes         |       8 | PAD SPACE     |
| utf8mb4_spanish_ci         | utf8mb4            | 231 |            | Yes         |       8 | PAD SPACE     |
| utf8mb4_sv_0900_ai_ci      | utf8mb4            | 264 |            | Yes         |       0 | NO PAD        |
| utf8mb4_sv_0900_as_cs      | utf8mb4            | 287 |            | Yes         |       0 | NO PAD        |
| utf8mb4_swedish_ci         | utf8mb4            | 232 |            | Yes         |       8 | PAD SPACE     |
| utf8mb4_tr_0900_ai_ci      | utf8mb4            | 265 |            | Yes         |       0 | NO PAD        |
| utf8mb4_tr_0900_as_cs      | utf8mb4            | 288 |            | Yes         |       0 | NO PAD        |
| utf8mb4_turkish_ci         | utf8mb4            | 233 |            | Yes         |       8 | PAD SPACE     |
| utf8mb4_unicode_520_ci     | utf8mb4            | 246 |            | Yes         |       8 | PAD SPACE     |
| utf8mb4_unicode_ci         | utf8mb4            | 224 |            | Yes         |       8 | PAD SPACE     |
| utf8mb4_vietnamese_ci      | utf8mb4            | 247 |            | Yes         |       8 | PAD SPACE     |
| utf8mb4_vi_0900_ai_ci      | utf8mb4            | 277 |            | Yes         |       0 | NO PAD        |
| utf8mb4_vi_0900_as_cs      | utf8mb4            | 300 |            | Yes         |       0 | NO PAD        |
73 rows in set (0.0023 sec)

The most interesting column in this discussion is the name. As you can see there is a pattern to the names, and it is possible to identify up to four parts in the name:

  • First the character set name.
  • Then which language it is for, for example ja for Japanese. A special “language” is binary, which means the bytes are compared directly one by one.
  • Then whether it is a UCA 9.0.0 based collation. The UCA 9.0.0 based collations have 0900 in the name.
  • Then up to three modifiers depending on whether it is accent and/or case sensitive or insensitive.

There are a total of five accent and case modifiers:

  • ai: accent insensitive, ä is equal to a.
  • as: accent sensitive, ä is not equal to a.
  • ci: case insensitive, a is equal to A.
  • cs: case sensitive, a is not equal to A.
  • ks: kana sensitive (only for Japanese).
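The naming pattern can be illustrated with a small, unofficial helper (this is not how MySQL parses the names; it is just a sketch of the four parts):

```python
def parse_collation(name):
    """Rough split of a MySQL collation name into charset, language,
    UCA 9.0.0 flag, and accent/case/kana modifiers (illustrative only;
    names such as utf8mb4_general_ci have no language or UCA part)."""
    charset, *rest = name.split("_")
    modifiers = [p for p in rest if p in ("ai", "as", "ci", "cs", "ks")]
    uca_900 = "0900" in rest
    language = [p for p in rest if p != "0900" and p not in modifiers]
    return charset, language, uca_900, modifiers

print(parse_collation("utf8mb4_ja_0900_as_cs_ks"))
# ('utf8mb4', ['ja'], True, ['as', 'cs', 'ks'])
```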

This brings us on to MySQL 8.0 and character sets and collations.

MySQL 8.0

Good job of making it this far. You are almost at the end now.

As part of the work to make utf8mb4 the default character set in MySQL 8.0, a large amount of work was put into making MySQL work more efficiently with the UTF-8 character set and to make it more useful for the users. Some of the changes include:

  • The UCA 9.0.0 collations (with 0900 in the name) are new in MySQL 8.0. MySQL works faster with these new collations than the old collations.
  • In-memory internal temporary tables now use the new TempTable storage engine by default. It supports variable-width columns, making it much more efficient with UTF-8 than earlier versions of MySQL. From version 8.0.12 (just released) the TempTable storage engine also supports BLOBs.

Another thing to be aware of is that the new X DevAPI – which for example allows NoSQL access both to the MySQL Document Store with JSON documents and to SQL tables – expects all query results to be UTF-8.

For these reasons it is recommended to use utf8mb4 with one of the new UCA 9.0.0 collations for most uses in MySQL 8.0. The default collation for utf8mb4 in MySQL 8.0 is utf8mb4_0900_ai_ci. This is a good collation for general use cases, but feel free to use one of the more specific collations if that works better for your application. For example, use utf8mb4_ja_0900_as_cs or utf8mb4_ja_0900_as_cs_ks if you need correct Japanese comparisons (such as the fix for the infamous Sushi = Beer bug).
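For example, a table following the recommendation could be created as follows (the table and columns are illustrative; in MySQL 8.0 this character set and collation are also what you get without specifying anything):

```sql
CREATE TABLE city (
    id int unsigned NOT NULL PRIMARY KEY,
    name varchar(100) NOT NULL
) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci;
```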

Character sets and collations are big topics. If you are interested in knowing more about MySQL 8.0 and character sets, there is a series of blogs on the MySQL Server Blog, for example:

Additionally, the MySQL reference manual has a chapter with 15 sections about character sets and collations: Chapter 10 Character Sets, Collations, Unicode.

Overview of the MySQL Server Architecture

Sometimes it can be useful to take a step back and look at the world from a bit larger distance than usual. So in this blog, I will take a look at the high level architecture of MySQL Server.


This is meant as a simplified overview and does not include all details.

Overview of the MySQL Server Architecture

For the discussion I will be referring to the following figure that shows some of the features and plugins of MySQL. Orange boxes are available both for the community version and the commercial (enterprise) version, whereas red means the plugin is exclusive to the commercial version. Ellipsoid elements are plugins, whereas square (yes, with rounded corners) elements indicate other features. The figure is based on MySQL 5.7.20, where the audit log can be used to block access to tables.
At the top there are the various connectors and APIs that can be used in the application to connect to MySQL. As the mysql command-line client uses the C API, this is included in this part of the figure as well. There are more connectors and APIs than I could fit into the figure; for the complete list see the documentation index.

The large darkish box represents the server. At the top there is the connection handling where there are two choices: use one thread per connection or use the commercial Thread Pool plugin. One thread per connection is the original way of handling new connections and, as the name implies, it simply creates a new thread for each new connection. The MySQL Enterprise Thread Pool will limit the overall number of threads by re-using them. This leads to better scalability, particularly in the case of many connections executing short queries. After the initial thread handling, it is possible to send the connection through the optional Connection-Control plugin, which will throttle the connections for an account when there are multiple attempts to log in with the wrong password.

Once the connection has been cleared to move ahead, the query is checked against the Query Cache if it is a SELECT statement and the Query Cache is enabled. For most workloads the Query Cache will be an overhead that does not justify the potential gain, so in practice it is recommended to disable it completely.


The Query Cache has been deprecated and is removed in MySQL 8.0.3 and later.

Next is the optional Query Rewrite plugin which allows the DBA to define which queries should be rewritten based on normalized queries like those in the Performance Schema digest summary table. For example, to change the query SELECT 130 to SELECT * FROM world.City WHERE ID = 130 – and the same for all other IDs, the following rule can be used:

INSERT INTO query_rewrite.rewrite_rules (pattern, replacement)
VALUES ('SELECT ?', 'SELECT * FROM world.City WHERE ID = ?');

CALL query_rewrite.flush_rewrite_rules();

This assumes the Query Rewrite plugin has already been installed.

It is not until now that the query actually reaches the parser (this will not happen if the query has a match in the Query Cache). After the parser has processed the query, it is known which schemas, tables, columns, etc. will be required to answer the query, which means it is also possible to verify whether the user is allowed to use these objects.

After the parser, the MySQL Enterprise Firewall will be invoked if it is installed and enabled. The firewall compares the query’s digest to a whitelist of digests. If the query is not in the whitelist, it can be recorded or blocked (depending on the configuration).

The last plugin is the MySQL Enterprise Audit plugin. In MySQL 5.7.20 and later, the audit log can be used to block access to a schema or a table in addition to logging the access. This is when the plugin checks whether the access is allowed.
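For reference, a sketch of installing the audit plugin (the shared-library suffix depends on the platform; use audit_log.dll on Windows):

```sql
-- Install MySQL Enterprise Audit
INSTALL PLUGIN audit_log SONAME 'audit_log.so';

-- Verify the plugin is active
SELECT PLUGIN_NAME, PLUGIN_STATUS
  FROM INFORMATION_SCHEMA.PLUGINS
 WHERE PLUGIN_NAME = 'audit_log';
```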

Finally, the optimizer will determine the query plan and the query will be executed. The query execution will in almost all cases involve requesting data from or sending data to one of the storage engines. MySQL supports several storage engines through the pluggable storage engine architecture. Nowadays, the main storage engine is InnoDB, but several others exist, including the NDBCluster storage engine that is used in MySQL NDB Cluster to execute queries through the SQL nodes (MySQL Server instances).

If You Want to Learn More

One nice way to investigate the order in which plugins and features are executed is to use the Performance Schema. For example, to investigate the above features, enable the following instruments and consumers in the MySQL configuration file (or online by executing UPDATE statements against the Performance Schema setup tables):

performance_schema_instrument                              = %=OFF
performance_schema_instrument                              = %alog%=ON
performance_schema_instrument                              = %firewall%=ON
performance_schema_instrument                              = %/Query_cache%=ON
performance_schema_instrument                              = %query cache%=ON
performance_schema_instrument                              = %rewriter%=ON
performance_schema_instrument                              = statement/%=ON
performance_schema_instrument                              = stage/%=ON
performance_schema_consumer_events_statements_history_long = ON
performance_schema_consumer_events_stages_current          = ON
performance_schema_consumer_events_stages_history_long     = ON
performance_schema_consumer_events_waits_current           = ON
performance_schema_consumer_events_waits_history_long      = ON

Then use two connections – one for monitoring and one for executing the query. Disable instrumentation for the monitoring connection:

mysql> CALL sys.ps_setup_disable_thread(CONNECTION_ID());
| summary           |
| Disabled 1 thread |
1 row in set (0.01 sec)

Query OK, 0 rows affected (0.01 sec)

Determine the Performance Schema thread ID for the connection that will be executing queries:

mysql> SELECT THREAD_ID
         FROM performance_schema.threads
        WHERE PROCESSLIST_USER = 'testuser';
| THREAD_ID |
|        30 |
1 row in set (0.07 sec)

This assumes the two connections use different usernames. If that is not the case, then you will need to adjust the above query.
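If both connections do use the same username, one option (a sketch using the sys schema helper function) is to determine the thread ID from the connection's own processlist ID:

```sql
-- Execute in the connection that will run the test query:
-- returns that connection's Performance Schema thread ID
SELECT sys.ps_thread_id(CONNECTION_ID()) AS thread_id;

-- Or, from the monitoring connection, look it up by processlist ID
-- (replace 30 with the Id shown in SHOW PROCESSLIST for the test connection)
SELECT sys.ps_thread_id(30) AS thread_id;
```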

Optionally, to make it easier to find the events just for the test query, truncate all tables in the Performance Schema (this will lose all historical data, so is most appropriate for a test system):

mysql> CALL sys.ps_truncate_all_tables(false);
| summary             |
| Truncated 44 tables |
1 row in set (0.03 sec)

Query OK, 0 rows affected (0.03 sec)

Now execute the query – for example one that gets rewritten by the Query Rewriter plugin using the rule above:

mysql> SELECT 130;
| ID  | Name   | CountryCode | District        | Population |
| 130 | Sydney | AUS         | New South Wales |    3276207 |
1 row in set, 1 warning (0.08 sec)

mysql> SHOW WARNINGS;
| Level | Code | Message                                                                                             |
| Note  | 1105 | Query 'SELECT 130' rewritten to 'SELECT * FROM world.City WHERE ID = 130' by a query rewrite plugin |
1 row in set (0.00 sec)

Finally, the Performance Schema events can be retrieved:

mysql> (SELECT EVENT_ID, 'Statement' AS Type, EVENT_NAME FROM performance_schema.events_statements_history_long WHERE THREAD_ID = 30)
       UNION ALL
       (SELECT EVENT_ID, 'Stage' AS Type, EVENT_NAME FROM performance_schema.events_stages_history_long WHERE THREAD_ID = 30)
       UNION ALL
       (SELECT EVENT_ID, 'Wait' AS Type, EVENT_NAME FROM performance_schema.events_waits_history_long WHERE THREAD_ID = 30)
        ORDER BY EVENT_ID;
| EVENT_ID | Type      | EVENT_NAME                                             |
|        1 | Statement | statement/sql/select                                   |
|        2 | Stage     | stage/sql/starting                                     |
|        3 | Wait      | wait/synch/rwlock/rewriter/LOCK_plugin_rewriter_table_ |
|        4 | Wait      | wait/synch/rwlock/firewall/users_lock                  |
|        5 | Stage     | stage/sql/checking permissions                         |
|        6 | Stage     | stage/sql/Opening tables                               |
|        7 | Wait      | wait/synch/mutex/alog/LOCK_reopen                      |
|        8 | Wait      | wait/synch/mutex/alog/LOCK_block                       |
|        9 | Wait      | wait/synch/mutex/alog/LOCK_block                       |
|       10 | Stage     | stage/sql/init                                         |
|       11 | Stage     | stage/sql/System lock                                  |
|       12 | Stage     | stage/sql/optimizing                                   |
|       13 | Stage     | stage/sql/statistics                                   |
|       14 | Stage     | stage/sql/preparing                                    |
|       15 | Stage     | stage/sql/executing                                    |
|       16 | Stage     | stage/sql/Sending data                                 |
|       17 | Stage     | stage/sql/end                                          |
|       18 | Stage     | stage/sql/query end                                    |
|       19 | Stage     | stage/sql/closing tables                               |
|       20 | Stage     | stage/sql/freeing items                                |
|       21 | Stage     | stage/sql/cleaning up                                  |
|       22 | Statement | statement/sql/show_warnings                            |
|       23 | Stage     | stage/sql/starting                                     |
|       24 | Wait      | wait/synch/rwlock/rewriter/LOCK_plugin_rewriter_table_ |
|       25 | Wait      | wait/synch/rwlock/firewall/users_lock                  |
|       26 | Stage     | stage/sql/query end                                    |
|       27 | Stage     | stage/sql/closing tables                               |
|       28 | Stage     | stage/sql/freeing items                                |
|       29 | Stage     | stage/sql/cleaning up                                  |
29 rows in set (0.00 sec)

Have fun.

New Book: Pro MySQL NDB Cluster

It is with great pleasure that I can announce that a new book dedicated to MySQL NDB Cluster has just been released. The book Pro MySQL NDB Cluster is written by my colleague Mikiya Okuno and myself and is a nearly 700-page deep dive into the world of MySQL NDB Cluster. The book is published by Apress.

Tip: There are several ways to cluster MySQL. This book is about the product MySQL Cluster (often called MySQL NDB Cluster to clarify which cluster it is). There is also MySQL InnoDB Cluster, clustering using replication, and clustering through operating system or hardware features. Pro MySQL NDB Cluster is only about the former.

We very much hope you will enjoy the book. Feedback and questions are most welcome, for example on Twitter (@nippondanji and @JWKrogh).

Note: At the time of writing, only the eBook is available for purchase. A softcover version will follow as soon as it has been possible to print it; this can also be pre-ordered now. – Update: The softcover version of the book is now also available.

The book is divided into five parts and 20 chapters.

Part I – The Basics

The first part provides some background information on the various parts in MySQL NDB Cluster and how it works. The chapters are:

  • Chapter 1: Architecture and Core Concepts
  • Chapter 2: The Data Nodes

Part II – Installation and Configuration

The second part focuses on the installation and configuration related topics, including replication between clusters. The chapters are:

  • Chapter 3: System Planning
  • Chapter 4: Configuration
  • Chapter 5: Installation
  • Chapter 6: Replication

Part III – Daily Tasks and Maintenance

In the third part, the topics include tasks that are part of the daily routine as a database administrator plus a tutorial where the tasks discussed in parts II and III are handled through MySQL Cluster Manager. The chapters are:

  • Chapter 7: The NDB Management Client and Other NDB Utilities
  • Chapter 8: Backups and Restores
  • Chapter 9: Table Maintenance
  • Chapter 10: Restarts
  • Chapter 11: Upgrades and Downgrades
  • Chapter 12: Security Considerations
  • Chapter 13: MySQL Cluster Manager

Part IV – Monitoring and Troubleshooting

The fourth part continues with two topics that are also part of the daily routine: monitoring and troubleshooting. The chapters are:

  • Chapter 14: Monitoring Solutions and the Operating System
  • Chapter 15: Sources for Monitoring Data
  • Chapter 16: Monitoring MySQL NDB Cluster
  • Chapter 17: Typical Troubles and Solutions

Part V – Development and Performance Tuning

The final part covers topics related to development and to tuning the cluster with respect to performance. The chapters are:

  • Chapter 18: Developing Applications Using SQL with MySQL NDB Cluster
  • Chapter 19: MySQL NDB Cluster as a NoSQL Database
  • Chapter 20: MySQL NDB Cluster and Application Performance Tuning

InnoDB Locks Analysis: Why is Blocking Query NULL and How To Find More Information About the Blocking Transaction?

This post was originally published on the MySQL Support Team Blog at https://blogs.oracle.com/mysqlsupport/entry/innodb_locks_analysis_why_is on 14 April 2017.

Consider the scenario that you execute a query. You expect it to be fast – typically subsecond – but now it does not return until after 50 seconds (the value of innodb_lock_wait_timeout) and then it returns with an error:

mysql> UPDATE world.City SET Population = Population + 999 WHERE ID = 130;
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

You continue to investigate the issue using the sys.innodb_lock_waits view or the underlying Information Schema tables (INNODB_TRX, INNODB_LOCKS and INNODB_LOCK_WAITS).


The above Information Schema tables with lock and lock waits information have been moved to the Performance Schema in MySQL 8.0 as the data_locks and data_lock_waits tables. The sys schema view, however, works the same.

However, when you query the lock information, the blocking query is returned as NULL. What does that mean, and how do you proceed from there to get information about the blocking transaction?

Setting Up an Example

Before proceeding, let’s set up the example which will be investigated later in the blog. The example can be set up as follows (do not disconnect Connection 1 when the queries have been executed):

Step 1 – Connection 1:

Connection 1> START TRANSACTION;
Query OK, 0 rows affected (0.00 sec)

Connection 1> UPDATE world.City SET Population = Population + 1 WHERE ID = 130;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

Connection 1> UPDATE world.City SET Population = Population + 1 WHERE ID = 131;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

Connection 1> UPDATE world.City SET Population = Population + 1 WHERE ID = 132;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

Connection 1> UPDATE world.City SET Population = Population + 1 WHERE ID = 133;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

Step 2 – Connection 2:

Connection 2> UPDATE world.City SET Population = Population + 999 WHERE ID = 130;

Step 3 – Connection 3:

The following output from sys.innodb_lock_waits, taken while Connection 2 is still blocked, shows that the blocking query is NULL (slightly reformatted):

Connection 3> SELECT * FROM sys.innodb_lock_waits\G
*************************** 1. row ***************************
                wait_started: 2017-04-15 09:54:53
                    wait_age: 00:00:03
               wait_age_secs: 3
                locked_table: `world`.`City`
                locked_index: PRIMARY
                 locked_type: RECORD
              waiting_trx_id: 2827
         waiting_trx_started: 2017-04-15 09:54:53
             waiting_trx_age: 00:00:03
     waiting_trx_rows_locked: 1
   waiting_trx_rows_modified: 0
                 waiting_pid: 5
               waiting_query: UPDATE world.City SET Populati ... opulation + 999
                              WHERE ID = 130
             waiting_lock_id: 2827:24:6:41
           waiting_lock_mode: X
             blocking_trx_id: 2826
                blocking_pid: 3
              blocking_query: NULL
            blocking_lock_id: 2826:24:6:41
          blocking_lock_mode: X
        blocking_trx_started: 2017-04-15 09:54:46
            blocking_trx_age: 00:00:10
    blocking_trx_rows_locked: 4
  blocking_trx_rows_modified: 4
     sql_kill_blocking_query: KILL QUERY 3
sql_kill_blocking_connection: KILL 3
1 row in set, 3 warnings (0.00 sec)

Connection 3> SHOW WARNINGS\G
*************************** 1. row ***************************
  Level: Warning
   Code: 1681
Message: 'INFORMATION_SCHEMA.INNODB_LOCK_WAITS' is deprecated
         and will be removed in a future release.
*************************** 2. row ***************************
  Level: Warning
   Code: 1681
Message: 'INFORMATION_SCHEMA.INNODB_LOCKS' is deprecated and
         will be removed in a future release.
*************************** 3. row ***************************
  Level: Warning
   Code: 1681
Message: 'INFORMATION_SCHEMA.INNODB_LOCKS' is deprecated and
         will be removed in a future release.
3 rows in set (0.00 sec)

The warnings only occur in MySQL 5.7.14 and later, as the InnoDB lock tables have been moved to the Performance Schema in MySQL 8.0. It is recommended to use the sys.innodb_lock_waits view as that is updated accordingly in MySQL 8.0.
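In MySQL 8.0 the equivalent raw data can be queried from the Performance Schema directly; a minimal sketch (column names from the 8.0 data lock tables):

```sql
-- MySQL 8.0: who waits for whom, at the engine lock level
SELECT REQUESTING_THREAD_ID, BLOCKING_THREAD_ID,
       REQUESTING_ENGINE_LOCK_ID, BLOCKING_ENGINE_LOCK_ID
  FROM performance_schema.data_lock_waits;

-- Details of the individual locks, including the locked data
SELECT ENGINE_TRANSACTION_ID, THREAD_ID, OBJECT_SCHEMA, OBJECT_NAME,
       LOCK_TYPE, LOCK_MODE, LOCK_STATUS, LOCK_DATA
  FROM performance_schema.data_locks;
```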

Investigating Idle Transactions

To investigate idle transactions, you need to use the Performance Schema to get this information. First determine the Performance Schema thread id for the blocking transaction. For this you need the blocking_pid, in the above example:

                blocking_pid: 3

and use this with the performance_schema.threads table like:

Connection 3> SELECT THREAD_ID FROM performance_schema.threads WHERE PROCESSLIST_ID = 3;
|        28 |
1 row in set (0.00 sec)

For the following queries, insert the thread ID found above in the THREAD_ID = … WHERE clauses.

To get the latest query executed, use the performance_schema.events_statements_current table or the sys.session view:

Connection 3> SELECT THREAD_ID, SQL_TEXT
                FROM performance_schema.events_statements_current
               WHERE THREAD_ID = 28;
| THREAD_ID | SQL_TEXT                                                         |
|        28 | UPDATE world.City SET Population = Population + 1 WHERE ID = 133 |
1 row in set (0.00 sec)


Connection 3> SELECT * FROM sys.session WHERE thd_id = 28\G
*************************** 1. row ***************************
                thd_id: 28
               conn_id: 3
                  user: root@localhost
                    db: NULL
               command: Sleep
                 state: NULL
                  time: 447
     current_statement: NULL
     statement_latency: NULL
              progress: NULL
          lock_latency: 117.00 us
         rows_examined: 1
             rows_sent: 0
         rows_affected: 1
            tmp_tables: 0
       tmp_disk_tables: 0
             full_scan: NO
        last_statement: UPDATE world.City SET Population = Population + 1 WHERE ID = 133
last_statement_latency: 321.06 us
        current_memory: 0 bytes
             last_wait: NULL
     last_wait_latency: NULL
                source: NULL
           trx_latency: NULL
             trx_state: ACTIVE
        trx_autocommit: NO
                   pid: 7717
          program_name: mysql
1 row in set (0.08 sec)

In this case this does not explain why the lock is held, as the last query updated a different row than the one where the lock issue occurs. However, if the events_statements_history consumer is enabled (it is by default in MySQL 5.7 and later), the events_statements_history table will include the last 10 statements (by default) executed for the connection:

Connection 3> SELECT THREAD_ID, SQL_TEXT
                FROM performance_schema.events_statements_history
               WHERE THREAD_ID = 28
               ORDER BY EVENT_ID;
| THREAD_ID | SQL_TEXT                                                         |
|        28 | SELECT DATABASE()                                                |
|        28 | NULL                                                             |
|        28 | show databases                                                   |
|        28 | show tables                                                      |
|        28 | START TRANSACTION                                                |
|        28 | UPDATE world.City SET Population = Population + 1 WHERE ID = 130 |
|        28 | UPDATE world.City SET Population = Population + 1 WHERE ID = 131 |
|        28 | UPDATE world.City SET Population = Population + 1 WHERE ID = 132 |
|        28 | UPDATE world.City SET Population = Population + 1 WHERE ID = 133 |
9 rows in set (0.00 sec)

So now the history of the blocking transaction can be seen, and it is possible to determine why the locking issue occurs.


The history also includes some queries executed before the transaction started. These are not related to the locking issue.

If transaction monitoring is also enabled (only available in MySQL 5.7 and later), it is possible to get more information about the transaction and automatically limit the query of the history to the current transaction. Transaction monitoring is not enabled by default. To enable it, use:

mysql> UPDATE performance_schema.setup_consumers
          SET ENABLED = 'YES'
        WHERE NAME = 'events_transactions_current';
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> UPDATE performance_schema.setup_instruments
          SET ENABLED = 'YES'
        WHERE NAME = 'transaction';
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0


This must be done before either of the transactions is started. Only transactions started after the transaction monitoring is enabled will be instrumented.

If the above was enabled before the blocking transaction started, you can get more details about the blocking transaction as:

Connection 3> SELECT *
                FROM performance_schema.events_transactions_current
               WHERE THREAD_ID = 28\G
*************************** 1. row ***************************
                      THREAD_ID: 28
                       EVENT_ID: 12
                   END_EVENT_ID: NULL
                     EVENT_NAME: transaction
                          STATE: ACTIVE
                         TRX_ID: NULL
                           GTID: AUTOMATIC
                  XID_FORMAT_ID: NULL
                      XID_GTRID: NULL
                      XID_BQUAL: NULL
                       XA_STATE: NULL
                         SOURCE: transaction.cc:209
                    TIMER_START: NULL
                      TIMER_END: NULL
                     TIMER_WAIT: NULL
                    ACCESS_MODE: READ WRITE
                     AUTOCOMMIT: NO
               NESTING_EVENT_ID: 11
1 row in set (0.00 sec)

And to get the statement history of the transaction:

Connection 3> SELECT t.THREAD_ID, s.SQL_TEXT
                FROM performance_schema.events_transactions_current t
                     INNER JOIN performance_schema.events_statements_history s
                             ON s.THREAD_ID = t.THREAD_ID
                            AND s.NESTING_EVENT_ID = t.EVENT_ID
               WHERE t.THREAD_ID = 28
               ORDER BY s.EVENT_ID;
| THREAD_ID | SQL_TEXT                                                         |
|        28 | UPDATE world.City SET Population = Population + 1 WHERE ID = 130 |
|        28 | UPDATE world.City SET Population = Population + 1 WHERE ID = 131 |
|        28 | UPDATE world.City SET Population = Population + 1 WHERE ID = 132 |
|        28 | UPDATE world.City SET Population = Population + 1 WHERE ID = 133 |
4 rows in set (0.00 sec)
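To find candidate idle transactions proactively, before a lock wait times out, the InnoDB transaction list can be joined with the processlist; a sketch:

```sql
-- Idle connections that still have an open (possibly lock-holding) transaction
SELECT trx.trx_mysql_thread_id AS processlist_id,
       trx.trx_id, trx.trx_started,
       trx.trx_rows_locked, trx.trx_rows_modified
  FROM information_schema.INNODB_TRX trx
       INNER JOIN information_schema.PROCESSLIST pl
               ON pl.ID = trx.trx_mysql_thread_id
 WHERE pl.COMMAND = 'Sleep';
```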

Working Around MySQL Cluster Push Down Limitations Using Subqueries

This post was originally published on the MySQL Support Team Blog at https://blogs.oracle.com/mysqlsupport/entry/working_around_mysql_cluster_push on 5 August 2016.

I worked on an issue recently where a query was too slow when executed in MySQL Cluster. The issue was that Cluster has some restrictions when it comes to condition pushdown.

As an example of this, consider the following query using the employees sample database. The query takes a look at the average salary based on how many years the employee has been with the company. As the latest hire date in the database is in January 2000, the query uses 1 February 2000 as the reference date.

Initially the query performs like (performance is with two data nodes and all nodes in the same virtual machine on a laptop, so the timings are not necessarily representative of a production system, though the improvements should be repeatable):

mysql> SELECT FLOOR(DATEDIFF('2000-02-01', hire_date)/365) AS LengthOfService,
              COUNT(DISTINCT employees.emp_no) AS NoEmployees, AVG(salary) AS AvgSalary
         FROM salaries
              STRAIGHT_JOIN employees ON employees.emp_no = salaries.emp_no
        WHERE salaries.from_date <= '2000-02-01' AND salaries.to_date >= '2000-02-01'
        GROUP BY LengthOfService;
| LengthOfService | NoEmployees | AvgSalary  |
|               0 |        1230 | 53051.4211 |
|               1 |        3667 | 54488.7478 |
|               2 |        5893 | 55670.2121 |
|               3 |        8305 | 57162.7269 |
|               4 |       10647 | 58557.9498 |
|               5 |       12804 | 60132.0652 |
|               6 |       15267 | 61769.0817 |
|               7 |       17540 | 62992.4075 |
|               8 |       19290 | 64468.6666 |
|               9 |       21737 | 66047.8462 |
|              10 |       24056 | 67671.1557 |
|              11 |       26488 | 68748.2062 |
|              12 |       28294 | 70409.9716 |
|              13 |       30249 | 72006.3509 |
|              14 |       32572 | 73478.7101 |
|              15 |         297 | 74331.7798 |
16 rows in set (23.89 sec)

The straight join is needed as the performance is better than leaving the join order up to the optimizer.

The schema for the two tables in use is:

mysql> SHOW CREATE TABLE employees\G
*************************** 1. row ***************************
       Table: employees
Create Table: CREATE TABLE `employees` (
  `emp_no` int(11) NOT NULL,
  `birth_date` date NOT NULL,
  `first_name` varchar(14) NOT NULL,
  `last_name` varchar(16) NOT NULL,
  `gender` enum('M','F') NOT NULL,
  `hire_date` date NOT NULL,
  PRIMARY KEY (`emp_no`)
) ENGINE=ndbcluster DEFAULT CHARSET=latin1
1 row in set (0.01 sec)

mysql> SHOW CREATE TABLE salaries\G
*************************** 1. row ***************************
       Table: salaries
Create Table: CREATE TABLE `salaries` (
  `emp_no` int(11) NOT NULL,
  `salary` int(11) NOT NULL,
  `from_date` date NOT NULL,
  `to_date` date NOT NULL,
  PRIMARY KEY (`emp_no`,`from_date`),
  KEY `emp_no` (`emp_no`),
) ENGINE=ndbcluster DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

Why this poor performance? Looking at the EXPLAIN plan is the first step:

mysql> EXPLAIN EXTENDED SELECT FLOOR(DATEDIFF('2000-02-01', hire_date)/365) AS LengthOfService,
              COUNT(DISTINCT employees.emp_no) AS NoEmployees, AVG(salary) AS AvgSalary
         FROM salaries
              STRAIGHT_JOIN employees ON employees.emp_no = salaries.emp_no
        WHERE salaries.from_date <= '2000-02-01' AND salaries.to_date >= '2000-02-01'
        GROUP BY LengthOfService\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: salaries
         type: ALL
possible_keys: PRIMARY,emp_no
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 2844047
     filtered: 100.00
        Extra: Using where with pushed condition ((`employees`.`salaries`.`from_date` <= '2000-02-01') and (`employees`.`salaries`.`to_date` >= '2000-02-01')); Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: employees
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: employees.salaries.emp_no
         rows: 1
     filtered: 100.00
        Extra: NULL
2 rows in set, 3 warnings (0.00 sec)

The EXPLAIN plan itself does not look bad – the index usage is as expected. However, note the 3 warnings – one is the usual rewritten query after the application of rewriting and optimizer rules, but the two others give more information about why the performance is not what would be expected:

*************************** 1. row ***************************
  Level: Note
   Code: 1003
Message: Table 'salaries' is not pushable: GROUP BY cannot be done using index on grouped columns.
*************************** 2. row ***************************
  Level: Note
   Code: 1003
Message: Table 'employees' is not pushable: GROUP BY cannot be done using index on grouped columns.
*************************** 3. row ***************************

Here it can be seen that the tables are not pushable, because they are involved in the GROUP BY.

This gave me an idea. A couple of years ago, I wrote a post on the MySQL Server Blog: Better Performance for JOINs Not Using Indexes. The materialized temporary tables created for derived tables can also include auto-generated keys that can improve the join performance significantly. What if a similar tactic is used with the above query?

Changing the join on the employees table to join a subquery:

mysql> SELECT FLOOR(DATEDIFF('2000-02-01', hire_date)/365) AS LengthOfService,
              COUNT(DISTINCT employees.emp_no) AS NoEmployees, AVG(salary) AS AvgSalary
         FROM salaries
              STRAIGHT_JOIN (SELECT * FROM employees) AS employees ON employees.emp_no = salaries.emp_no
        WHERE salaries.from_date <= '2000-02-01' AND salaries.to_date >= '2000-02-01'
        GROUP BY LengthOfService;
16 rows in set (6.64 sec)

That is more than a factor 3 improvement.

The new EXPLAIN plan is:

*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: salaries
         type: ALL
possible_keys: PRIMARY,emp_no
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 2844047
        Extra: Using where with pushed condition; Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: <derived2>
         type: ref
possible_keys: <auto_key0>
          key: <auto_key0>
      key_len: 4
          ref: employees.salaries.emp_no
         rows: 10
        Extra: NULL
*************************** 3. row ***************************
           id: 2
  select_type: DERIVED
        table: employees
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 300024
        Extra: NULL
3 rows in set (0.01 sec)

Note how <auto_key0> is used for the join on the derived table. This is what helps make the rewrite work.

Actually we can do a bit better. The above subquery selects all columns from the employees table even though only the emp_no and hire_date columns are used. So the next rewrite is:

mysql> SELECT FLOOR(DATEDIFF('2000-02-01', hire_date)/365) AS LengthOfService,
              COUNT(DISTINCT employees.emp_no) AS NoEmployees, AVG(salary) AS AvgSalary
         FROM salaries
              STRAIGHT_JOIN (SELECT emp_no, hire_date FROM employees) AS employees ON employees.emp_no = salaries.emp_no
        WHERE salaries.from_date <= '2000-02-01' AND salaries.to_date >= '2000-02-01'
        GROUP BY LengthOfService;
16 rows in set (5.74 sec)

The improvement from including only the needed columns will vary. For example, if the extra data causes the internal temporary table to be converted into an on-disk table, or prevents a covering index from being used, choosing only the columns needed becomes more important.

A final slight improvement can be gained by also converting the salaries table into a subquery:

mysql> SELECT FLOOR(DATEDIFF('2000-02-01', hire_date)/365) AS LengthOfService,
              COUNT(DISTINCT employees.emp_no) AS NoEmployees, AVG(salary) AS AvgSalary
         FROM (SELECT emp_no, salary
                 FROM salaries
                WHERE salaries.from_date <= '2000-02-01' AND salaries.to_date >= '2000-02-01'
              ) AS salaries
              STRAIGHT_JOIN (SELECT emp_no, hire_date FROM employees) AS employees ON employees.emp_no = salaries.emp_no
        GROUP BY LengthOfService;
16 rows in set (5.36 sec)

The total speedup is from a little under 24 seconds to a little over 5 seconds, or more than a factor 4.

Note: the above rewrite will not work out of the box in MySQL Cluster 7.5. The reason is that it is based on MySQL Server 5.7, where the optimizer can perform more advanced rewrites than was possible in MySQL 5.6. So, by default, the above subqueries will be rewritten to normal joins and you end up with the original query again. To use the tactic of subqueries in Cluster 7.5, it is first necessary to disable the optimizer switch that allows the optimizer to rewrite the subqueries to normal joins:

mysql> SET optimizer_switch='derived_merge=off';
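Note that the optimizer switch is sticky for the session (or globally, if set with SET GLOBAL); a sketch of scoping it to just the one query:

```sql
-- Disable derived table merging for this session only
SET SESSION optimizer_switch = 'derived_merge=off';

-- ... run the subquery-based version of the query here ...

-- Restore the default behavior afterwards
SET SESSION optimizer_switch = 'derived_merge=on';
```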

Who Holds the Metadata Lock? MySQL 5.7.3 Brings Help

In MySQL 5.5, metadata locks were introduced to ensure consistency for some operations that previously could cause problems. For example, if a transaction SELECTed from a table and another connection changed the structure of the table before the transaction was committed, the two operations would be written in the reverse order to the binary log, which could prevent replaying the binary log.

However, there was no way to see who held metadata locks. SHOW PROCESSLIST would show who was waiting for metadata locks, and often you could guess who held them, but it was not always so. Consider for example the following:

mysql> SELECT Id, db, Command, Time, State, Info FROM information_schema.PROCESSLIST;
| Id   | db                 | Command | Time | State                           | Info                                                                          |
|    1 | NULL               | Daemon  |    8 | Waiting for next activation     | NULL                                                                          |
|    2 | performance_schema | Query   |    0 | executing                       | SELECT Id, db, Command, Time, State, Info FROM information_schema.PROCESSLIST |
|    3 | world              | Sleep   |   39 |                                 | NULL                                                                          |
|    4 | world              | Query   |   35 | Waiting for table metadata lock | ALTER TABLE City COMMENT='Cities'                                             |
| 1111 | world              | Sleep   |   32 |                                 | NULL                                                                          |
| 1108 | world              | Sleep   |   47 |                                 | NULL                                                                          |
| 1113 | world              | Sleep   |   18 |                                 | NULL                                                                          |
| 1112 | world              | Connect |   23 | Waiting for table metadata lock | SELECT COUNT(*) INTO @count FROM view_city WHERE CountryCode = in_CountryCode |
| 1109 | world              | Sleep   |   44 |                                 | NULL                                                                          |
| 1114 | world              | Connect |    8 | Waiting for table metadata lock | SELECT COUNT(*) INTO @count FROM view_city WHERE CountryCode = in_CountryCode |
10 rows in set (0.00 sec)

In this case it happens to be the connection with Id = 3 that holds the metadata lock, but that is not obvious from the above output.
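A blocking scenario like this can be reproduced with two connections; a minimal sketch (the table and ALTER statement follow the example output above):

```sql
-- Connection with Id = 3: read from the table inside an open transaction.
-- The SHARED_READ metadata lock on City is held until COMMIT or ROLLBACK.
START TRANSACTION;
SELECT * FROM world.City LIMIT 1;

-- Connection with Id = 4: the ALTER requires an EXCLUSIVE metadata lock,
-- so it blocks with the state "Waiting for table metadata lock".
ALTER TABLE world.City COMMENT='Cities';

-- Back in connection 3: committing releases the metadata lock,
-- and the ALTER can proceed.
COMMIT;
```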

In MySQL 5.7.3 this has changed. There is a new Performance Schema table, metadata_locks, which shows which metadata locks are held and which are pending. This can be used to answer the question for a case like the one above.

First, instrumentation of metadata locks must be enabled. To do that, enable the wait/lock/metadata/sql/mdl instrument in setup_instruments. Additionally, the global_instrumentation consumer must be enabled in setup_consumers.

Currently the wait/lock/metadata/sql/mdl instrument is not enabled by default. I have created a feature request to consider enabling it by default, but obviously whether that will happen also depends on the performance impact. Update: The instrument is enabled by default in MySQL 8.0.

To enable metadata lock instrumentation:

mysql> UPDATE setup_consumers SET ENABLED = 'YES' WHERE NAME = 'global_instrumentation';
Query OK, 1 row affected (0.01 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> UPDATE setup_instruments SET ENABLED = 'YES' WHERE NAME = 'wait/lock/metadata/sql/mdl';
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0


The global_instrumentation consumer is enabled by default.

With the metadata lock instrumentation enabled, it is now easy to see who holds the metadata lock (I have excluded the connection's own metadata locks here, as I am only interested in the metadata lock contention going on between the other queries):

mysql> SELECT OBJECT_TYPE, OBJECT_SCHEMA, OBJECT_NAME, LOCK_TYPE, LOCK_STATUS,
    ->        THREAD_ID, PROCESSLIST_ID, PROCESSLIST_INFO
    ->   FROM performance_schema.metadata_locks
    ->        INNER JOIN performance_schema.threads ON THREAD_ID = OWNER_THREAD_ID
    ->  WHERE PROCESSLIST_ID <> CONNECTION_ID();
| OBJECT_TYPE | OBJECT_SCHEMA | OBJECT_NAME | LOCK_TYPE           | LOCK_STATUS | THREAD_ID | PROCESSLIST_ID | PROCESSLIST_INFO                                                              |
| TABLE       | world         | City        | SHARED_READ         | GRANTED     |        22 |              3 | NULL                                                                          |
| GLOBAL      | NULL          | NULL        | INTENTION_EXCLUSIVE | GRANTED     |        23 |              4 | ALTER TABLE City COMMENT='Cities'                                             |
| SCHEMA      | world         | NULL        | INTENTION_EXCLUSIVE | GRANTED     |        23 |              4 | ALTER TABLE City COMMENT='Cities'                                             |
| TABLE       | world         | City        | SHARED_UPGRADABLE   | GRANTED     |        23 |              4 | ALTER TABLE City COMMENT='Cities'                                             |
| TABLE       | world         | City        | EXCLUSIVE           | PENDING     |        23 |              4 | ALTER TABLE City COMMENT='Cities'                                             |
| TABLE       | world         | view_city   | SHARED_READ         | GRANTED     |      1185 |           1166 | SELECT COUNT(*) INTO @count FROM view_city WHERE CountryCode = in_CountryCode |
| TABLE       | world         | City        | SHARED_READ         | PENDING     |      1185 |           1166 | SELECT COUNT(*) INTO @count FROM view_city WHERE CountryCode = in_CountryCode |
| TABLE       | world         | view_city   | SHARED_READ         | GRANTED     |      1186 |           1167 | SELECT COUNT(*) INTO @count FROM view_city WHERE CountryCode = in_CountryCode |
| TABLE       | world         | City        | SHARED_READ         | PENDING     |      1186 |           1167 | SELECT COUNT(*) INTO @count FROM view_city WHERE CountryCode = in_CountryCode |
| TABLE       | world         | view_city   | SHARED_READ         | GRANTED     |      1187 |           1168 | SELECT COUNT(*) INTO @count FROM view_city WHERE CountryCode = in_CountryCode |
| TABLE       | world         | City        | SHARED_READ         | PENDING     |      1187 |           1168 | SELECT COUNT(*) INTO @count FROM view_city WHERE CountryCode = in_CountryCode |
| TABLE       | world         | view_city   | SHARED_READ         | GRANTED     |      1188 |           1169 | SELECT COUNT(*) INTO @count FROM view_city WHERE CountryCode = in_CountryCode |
| TABLE       | world         | City        | SHARED_READ         | PENDING     |      1188 |           1169 | SELECT COUNT(*) INTO @count FROM view_city WHERE CountryCode = in_CountryCode |
| TABLE       | world         | view_city   | SHARED_READ         | GRANTED     |      1189 |           1170 | SELECT COUNT(*) INTO @count FROM view_city WHERE CountryCode = in_CountryCode |
| TABLE       | world         | City        | SHARED_READ         | PENDING     |      1189 |           1170 | SELECT COUNT(*) INTO @count FROM view_city WHERE CountryCode = in_CountryCode |
| TABLE       | world         | view_city   | SHARED_READ         | GRANTED     |      1190 |           1171 | SELECT COUNT(*) INTO @count FROM view_city WHERE CountryCode = in_CountryCode |
| TABLE       | world         | City        | SHARED_READ         | PENDING     |      1190 |           1171 | SELECT COUNT(*) INTO @count FROM view_city WHERE CountryCode = in_CountryCode |
| TABLE       | world         | view_city   | SHARED_READ         | GRANTED     |      1191 |           1172 | SELECT COUNT(*) INTO @count FROM view_city WHERE CountryCode = in_CountryCode |
| TABLE       | world         | City        | SHARED_READ         | PENDING     |      1191 |           1172 | SELECT COUNT(*) INTO @count FROM view_city WHERE CountryCode = in_CountryCode |
19 rows in set (0.00 sec)

So in this case there is a metadata lock GRANTED to process list Id 3, whereas the other connections have a PENDING metadata lock on the City table.
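With the same table it is possible to go a step further and pair each pending lock with the granted lock that is blocking it. A sketch of such a query (written for this blog, not part of the Performance Schema itself) is below; MySQL 5.7.9 and later also ship the sys.schema_table_lock_waits view, which performs a similar analysis:

```sql
SELECT p.PROCESSLIST_ID   AS waiting_pid,
       g.PROCESSLIST_ID   AS blocking_pid,
       ml_p.OBJECT_SCHEMA,
       ml_p.OBJECT_NAME,
       ml_p.LOCK_TYPE     AS waiting_lock_type
  FROM performance_schema.metadata_locks ml_p
       INNER JOIN performance_schema.metadata_locks ml_g
          ON ml_g.OBJECT_TYPE   = ml_p.OBJECT_TYPE
             AND ml_g.OBJECT_SCHEMA = ml_p.OBJECT_SCHEMA
             AND ml_g.OBJECT_NAME   = ml_p.OBJECT_NAME
             AND ml_g.LOCK_STATUS   = 'GRANTED'
             AND ml_g.OWNER_THREAD_ID <> ml_p.OWNER_THREAD_ID
       INNER JOIN performance_schema.threads p ON p.THREAD_ID = ml_p.OWNER_THREAD_ID
       INNER JOIN performance_schema.threads g ON g.THREAD_ID = ml_g.OWNER_THREAD_ID
 WHERE ml_p.LOCK_STATUS = 'PENDING'
       AND ml_p.OBJECT_TYPE = 'TABLE';
```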

MySQL Connect 2013: ps_tools

During my talk and hands on labs at MySQL Connect 2013 I used several views, functions, and procedures from ps_helper and ps_tools.

The slides from the two talks can be downloaded from Thanks For Attending MySQL Connect or from the MySQL Connect 2013 Content Catalogue.

You can read more about ps_helper on Mark Leith’s blog and you can download ps_helper from github.

To get ps_tools, download ps_tools.tgz from here. Once unpacked there is a ps_tools_5x.sql file for each of the supported versions. The tools presented at MySQL Connect were all based on MySQL 5.6 and 5.7. Note that several of the included tools are not particularly useful on their own but are meant as utility functions for some of the other tools. The actual implementations are organised into subdirectories corresponding to the earliest version each tool applies to. For a few tools, such as ps_reset_setup(), there are multiple versions depending on the MySQL version they apply to.

Several of the tools have a help text at the top of the source file.

The main tools are:

Function ps_thread_id():

Returns the thread id for a given connection. Specify NULL to get the thread id for the current connection. For example:

mysql> SELECT ps_tools.ps_thread_id(8063);
| ps_tools.ps_thread_id(8063) |
|                        8113 |
1 row in set (0.00 sec)
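Under the hood this is presumably just a lookup in the threads table; the equivalent queries using the Performance Schema directly are:

```sql
-- Thread id for a given connection (processlist) id:
SELECT THREAD_ID
  FROM performance_schema.threads
 WHERE PROCESSLIST_ID = 8063;

-- Thread id for the current connection:
SELECT THREAD_ID
  FROM performance_schema.threads
 WHERE PROCESSLIST_ID = CONNECTION_ID();
```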

View ps_setup_consumers:

Similar to performance_schema.setup_consumers, but includes a column to display whether the consumer is effectively enabled. See also slide 10 from CON 5282.

Function substr_count():

See also A Couple of Substring Functions: substr_count() and substr_by_delim().

Function substr_by_delim():

See also A Couple of Substring Functions: substr_count() and substr_by_delim().

Function ps_account_enabled():

Check whether a given account is enabled according to performance_schema.setup_actors. For example:

mysql> SELECT ps_tools.ps_account_enabled('localhost', 'root');
| ps_tools.ps_account_enabled('localhost', 'root') |
| YES                                              |
1 row in set (0.03 sec)
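A simplified version of the check can be made directly against setup_actors; a sketch, assuming plain pattern matching of the host and user against the stored values (the actual function may apply the matching rules differently):

```sql
-- setup_actors stores host/user patterns (e.g. '%'), so match the
-- literal account values against the stored patterns with LIKE:
SELECT IF(COUNT(*) > 0, 'YES', 'NO') AS enabled
  FROM performance_schema.setup_actors
 WHERE 'localhost' LIKE HOST
       AND 'root' LIKE USER;
```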

View ps_accounts_enabled:

Lists each account and whether it is enabled according to performance_schema.setup_actors.

Procedure ps_setup_tree_instruments():

Creates a tree displaying whether instruments are enabled. See Easier Overview of Current Performance Schema Setting for more details.

Procedure ps_setup_tree_actors_by_host():

Creates a tree displaying whether accounts are enabled according to setup_actors, grouped by host. See Easier Overview of Current Performance Schema Setting for more details.

Procedure ps_setup_tree_actors_by_user():

Creates a tree displaying whether accounts are enabled according to setup_actors, grouped by user. See Easier Overview of Current Performance Schema Setting for more details.

Procedure ps_setup_tree_consumers():

Creates a tree displaying whether consumers are effectively enabled. See Easier Overview of Current Performance Schema Setting and slide 11 from CON 5282 for more details.

Procedure ps_dump_thread_stack():

This is the same as in ps_helper with one bug fix. See also slides 30-31 from CON 5282.

Procedure ps_enable_all():

Enable all consumers and instruments.

Procedure ps_reset_setup():

Reset consumers, instruments, actors, and threads to the defaults (without taking my.cnf into consideration).

View is_innodb_lock_waits:

Show information about waiting locks for InnoDB.
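In MySQL 5.6 this kind of information is available from the InnoDB tables in the Information Schema; a query in the same spirit (the view's exact columns may differ) is:

```sql
SELECT r.trx_mysql_thread_id AS waiting_pid,
       r.trx_query           AS waiting_query,
       b.trx_mysql_thread_id AS blocking_pid,
       b.trx_query           AS blocking_query
  FROM information_schema.INNODB_LOCK_WAITS w
       INNER JOIN information_schema.INNODB_TRX r
          ON r.trx_id = w.requesting_trx_id
       INNER JOIN information_schema.INNODB_TRX b
          ON b.trx_id = w.blocking_trx_id;
```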

View is_engine_data:

Summary of the amount of data and indexes grouped by storage engine.
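Such a summary can be derived directly from the Information Schema; a minimal sketch is:

```sql
-- Total data and index size per storage engine, in MB:
SELECT ENGINE,
       ROUND(SUM(DATA_LENGTH)  / 1024 / 1024, 2) AS DataMB,
       ROUND(SUM(INDEX_LENGTH) / 1024 / 1024, 2) AS IndexMB
  FROM information_schema.TABLES
 WHERE ENGINE IS NOT NULL
 GROUP BY ENGINE;
```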

View ps_memory_by_thread_by_event_raw:

The amount of memory used grouped by thread and event without any formatting and ordering.

View ps_memory_current_by_thread_by_event:

The same as ps_memory_by_thread_by_event_raw, but with formatted output, ordered by current usage.

View ps_memory_high_by_thread_by_event:

The same as ps_memory_by_thread_by_event_raw, but with formatted output, ordered by the high watermark.
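These memory views are presumably built on the memory_summary_by_thread_by_event_name table introduced with the memory instrumentation in MySQL 5.7; a raw query in the same spirit is:

```sql
SELECT t.PROCESSLIST_ID,
       m.EVENT_NAME,
       m.CURRENT_NUMBER_OF_BYTES_USED,
       m.HIGH_NUMBER_OF_BYTES_USED
  FROM performance_schema.memory_summary_by_thread_by_event_name m
       INNER JOIN performance_schema.threads t USING (THREAD_ID)
 WHERE m.CURRENT_NUMBER_OF_BYTES_USED > 0;
```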

Procedure schema_unused_routines():

Lists unused stored procedures and functions.