Slides and Workbooks From Oracle OpenWorld & CodeOne

First of all, thanks to everyone who attended my sessions at the recent Oracle OpenWorld and Code One in San Francisco. It was a great privilege to be allowed to make a number of presentations.

All of the workbooks and scripts from the hands-on labs (HOL) and the slides from the talks have been made available at OpenWorld’s and Code One’s session catalogues. You can download the files by using the OpenWorld catalog searching for my name or the session number (see below). Click on the download icon for each of the presentations you are interested in.

Click on the download link in the Oracle OpenWorld session catalog to download the presentations.
Click on the download icon in the Oracle OpenWorld session catalog to download the presentations.

For the hands-on labs the downloadable file includes the workbook as well as the scripts related to the exercises. The workbook contains the instructions for setting up the system used for the exercises as well as the exercises themselves and some additionaly background information. For the talks, the download consists of a PDF version of the slides.

The three hands-on labs and three talks I had were:

  • DEV5957: Develop Python Applications with MySQL Connector/Python
    This covered MySQL Connector/Python in general from installation to best practices. The talk focused on using the connector with SQL tables using both SQL statements and NoSQL CRUD methods. If you are interested in how I installed MySQL Connector/Python on iPad (the screen shot on in the right hand side of slide showing the pip command), see my previous blog about installing MySQL Connector/Python in Pythonista 3.
  • DEV5959: Python and the MySQL Document Store
    This covered how to use MySQL Connector/Python (and a little of MySQL Shell in Python mode) with the MySQL JSON Document Store using the NoSQL API (the X DevAPI).
  • HOL1703: A Practical Introduction to the MySQL Document Store
    This hands-on lab used the MySQL Shell in Python mode to use the MySQL Document Store including an introduction to the CRUD methods. The lab also includes a comparison of implementing the same X DevAPI program using Python, Node.js, and Java.
  • HOL1706: Developing Modern Applications with the MySQL Document Store and NodeJS
    This lab is similar to HOL1703 except it mainly uses Node.js programs to use the MySQL Document Store.
  • HOL2986: Using MySQL Common Table Expressions and Window Functions
    An introduction to the new MySQL 8.0 query features: common table expressions (CTEs) and the window functions that can be used for analytic queries.
  • THT6703: NoSQL + SQL = MySQL
    A brief introduction to MySQL, MySQL 8, and how you can use it both as a NoSQL document store and a traditional SQL database.

Enjoy.

MySQL Server 8.0.13: Thanks for the 10 Facebook and Community Contributions

MySQL 8.0.13 was released this week. There are several exciting changes including functional indexes and using general expressions as the default value for your columns. So, I will recommend you to get MySQL 8.0.13 installed and try out the new features. You can read about changed in the release notes section of the MySQL documentation and in Geir’s release blog.

Thank you for the contributions to MySQL 8.0.13

However, what I would like to focus on in this blog is the external contributions that has been included in this release. There are five patches contributed by Facebook as well as five contributions from other MySQL users.

The patches contributed by Facebook are:

  • The MySQL client library now returns better error messages for OpenSSL errors. (Bug #27855668, Bug #90418)
  • The optimizer now supports a Skip Scan access method that enables range access to be used in previously inapplicable situations to improve query performance. For more information, see Skip Scan Range Access Method. (Bug #26976512, Bug #88103)
  • A new Performance Schema stage, waiting for handler commit, is available to detect threads going through transaction commit. (Bug #27855592, Bug #90417)
  • For mysqldump --tables output, file names now always include a .txt or .sql suffix, even for file names that already contain a dot. Thanks to Facebook for the contribution. (Bug #28380961, Bug #91745)
  • Failure to create a temporary table during a MyISAM query could cause a server exit. Thanks to Facebook for the patch. (Bug #27724519, Bug #90145)

Other contributions are:

  • Previously, file I/O performed in the I/O cache in the mysys library was not instrumented, affecting in particular file I/O statistics reported by the Performance Schema about the binary log index file. Now, this I/O is instrumented and Performance Schema statistics are accurate. Thanks to Yura Sorokin for the contribution. (Bug #27788907, Bug #90264)
  • Performance for locating user account entries in the in-memory privilege structures has been improved. Thanks to Eric Herman for the contribution. (Bug #27772506, Bug #90244)
  • InnoDB: A helper class was introduced to improve performance associated with reading from secondary keys when there are multiple versions of the same row. Thanks to Domas Mituzas for the contribution. (Bug #25540277, Bug #84958)
  • Replication: When the binlog_group_commit_sync_delay system variable is set to a wait time to delay synchronization of transactions to disk, and the binlog_group_commit_sync_no_delay_count system variable is also set to a number of transactions, the MySQL server exits the wait procedure if the specified number of transactions is reached before the specified wait time is reached. The server manages this process by checking on the transaction count after a delta of one tenth of the time specified by binlog_group_commit_sync_delay has elapsed, then subtracting that interval from the remaining wait time. If rounding during calculation of the delta meant that the wait time was not a multiple of the delta, the final subtraction of the delta from the remaining wait time would cause the value to be negative, and therefore to wrap to the maximum wait time, making the commit hang. The data type for the remaining wait time has now been changed so that the value does not wrap in this situation, and the commit can proceed when the original wait time has elapsed. Thanks to Yan Huang for the contribution. (Bug #28091735, Bug #91055)
  • Replication: In code for replication slave reporting, a rare error situation raised an assertion in debug builds, but in release builds, returned leaving a mutex locked. The mutex is now unlocked before returning in this situation. Thanks to Zsolt Parragi for the patch. (Bug #27448019, Bug #89421)

Thanks a lot for the contributions.

MySQL 8: Performance Schema Digests Improvements

Since MySQL 5.6, the digest feature of the MySQL Performance Schema has provided a convenient and effective way to obtain statistics of queries based on their normalized form. The feature works so well that it has almost completely (from my experience) replaced the connector extensions and proxy for collecting query statistics for the Query Analyzer (Quan) in MySQL Enterprise Monitor (MEM).

MySQL 8 adds further improvements to the digest feature in the Performance Schema including a sample query with statistics for each digest, percentile information, and a histogram summary. This blog will explore these new features.

The MySQL Enterprise Monitor Query Analyzer
MySQL Enterprise Monitor is one of the main users of the Performance Schema digests for its Query Analyzer.

Let’s start out looking at the the good old summary by digest table.

Query Sample

The base table for digest summary information is the events_statements_summary_by_digest table. This has been around since MySQL 5.6. In MySQL 8.0 it has been extended with six columns of which three have data related to a sample query will be examined in this section.

The three sample columns are:

  • QUERY_SAMPLE_TEXT: An actual example of a query.
  • QUERY_SAMPLE_SEEN: When the sample query was seen.
  • QUERY_SAMPLE_TIMER_WAIT: How long time the sample query took to execute (in picoseconds).

As an example consider the query SELECT * FROM world.city WHERE id = <value>. The sample information for that query as well as the digest and digest text (normalized query) may look like:

mysql> SELECT DIGEST, DIGEST_TEXT, QUERY_SAMPLE_TEXT, QUERY_SAMPLE_SEEN,
              sys.format_time(QUERY_SAMPLE_TIMER_WAIT) AS SampleTimerWait
         FROM performance_schema.events_statements_summary_by_digest
        WHERE DIGEST_TEXT LIKE '%`world` . `city`%'\G
*************************** 1. row ***************************
           DIGEST: 9431aed9635923565d7bc92cc36d6411c3abb9f52d2c22715be21b5472e3c366
      DIGEST_TEXT: SELECT * FROM `world` . `city` WHERE `ID` = ?
QUERY_SAMPLE_TEXT: SELECT * FROM world.city WHERE ID = 130
QUERY_SAMPLE_SEEN: 2018-10-09 17:19:20.500944
  SampleTimerWait: 17.34 ms
1 row in set (0.00 sec)

There are a few things to note here:

  • The digest in MySQL 8 is a sha256 hash whereas in 5.6 and 5.7 it was an md5 hash.
  • The digest text is similar to the normalized query that the mysqldumpslow script can generate for queries in the slow query log; just that the Performance Schema uses a question mark as a placeholder.
  • The QUERY_SAMPLE_SEEN value is in the system time zone.
  • The sys.format_time() function is in the query used to convert the picoseconds to a human readable value.

The maximum length of the sample text is set with the performance_schema_max_sql_text_length option. The default is 1024 bytes. It is the same option that is used for the SQL_TEXT columns in the statement events tables. It requires a restart of MySQL to change the value. Since the query texts are stored in several contexts and some of the Performance Schema tables can have thousands of rows, do take care not to increase it beyond what you have memory for.

How is the sample query chosen? The sample is the slowest example of a query with the given digest. If the performance_schema_max_digest_sample_age option is set to a non-zero value (the default is 60 seconds) and the existing sample is older than the specified value, it will always be replaced.

The events_statements_summary_by_digest also has another set of new columns: percentile information.

Percentile Information

Since the beginning, the events_statements_summary_by_digest table has included some statistical information about the query times for a given digest: the minimum, average, maximum, and total query time. In MySQL 8 this has been extended to include information about the 95th, 99th, and 99.9th percentile. The information is available in the QUANTILE_95, QUANTILE_99, and QUANTILE_999 column respectively. All of the values are in picoseconds.

What does the new columns mean? Based on the histogram information of the query (see the next section), MySQL calculates a high estimate of the query time. For a given digest, 95% of the executed queries are expected to be faster than the query time given by QUANTILE_95. Similar for the two other columns.

As an example consider the same digest as before:

mysql> SELECT DIGEST, DIGEST_TEXT, QUERY_SAMPLE_TEXT,
              sys.format_time(SUM_TIMER_WAIT) AS SumTimerWait,
              sys.format_time(MIN_TIMER_WAIT) AS MinTimerWait,
              sys.format_time(AVG_TIMER_WAIT) AS AvgTimerWait,
              sys.format_time(MAX_TIMER_WAIT) AS MaxTimerWait,
              sys.format_time(QUANTILE_95) AS Quantile95,
              sys.format_time(QUANTILE_99) AS Quantile99,
              sys.format_time(QUANTILE_999) AS Quantile999
         FROM performance_schema.events_statements_summary_by_digest
        WHERE DIGEST_TEXT LIKE '%`world` . `city`%'\G
*************************** 1. row ***************************
           DIGEST: 9431aed9635923565d7bc92cc36d6411c3abb9f52d2c22715be21b5472e3c366
      DIGEST_TEXT: SELECT * FROM `world` . `city` WHERE `ID` = ?
QUERY_SAMPLE_TEXT: SELECT * FROM world.city WHERE ID = 130
     SumTimerWait: 692.77 ms
     MinTimerWait: 50.32 us
     AvgTimerWait: 68.92 us
     MaxTimerWait: 17.34 ms
       Quantile95: 104.71 us
       Quantile99: 165.96 us
      Quantile999: 363.08 us
1 row in set (0.00 sec)

Having the 95th, 99th, and 99.9th percentile helps predict the performance of a query and show the spread of the query times. Even more information about the spread can be found using the new family member: histograms.

Histograms

Histograms is a way to put the query execution times into buckets, so it is possible to see how the query execution times spread. This can for example be useful to see how evenly the query time is. The average query time may be fine, but if that is based on some queries executing super fast and others very slow, it will still result in unhappy users and customers.

The MAX_TIMER_WAIT column of the events_statements_summary_by_digest table discussed this far shows the high watermark, but it does not say whether it is a single outlier or a result of general varying query times. The histograms give the answer to this.

Using the query digest from earlier in the blog, the histogram information for the query can be found in the events_statements_histogram_by_digest table like:

mysql> SELECT BUCKET_NUMBER,
              sys.format_time(BUCKET_TIMER_LOW) AS TimerLow,
              sys.format_time(BUCKET_TIMER_HIGH) AS TimerHigh,
              COUNT_BUCKET, COUNT_BUCKET_AND_LOWER, BUCKET_QUANTILE
         FROM performance_schema.events_statements_histogram_by_digest
        WHERE DIGEST = '9431aed9635923565d7bc92cc36d6411c3abb9f52d2c22715be21b5472e3c366'
              AND COUNT_BUCKET > 0
        ORDER BY BUCKET_NUMBER;
+---------------+-----------+-----------+--------------+------------------------+-----------------+
| BUCKET_NUMBER | TimerLow  | TimerHigh | COUNT_BUCKET | COUNT_BUCKET_AND_LOWER | BUCKET_QUANTILE |
+---------------+-----------+-----------+--------------+------------------------+-----------------+
|            36 | 50.12 us  | 52.48 us  |          524 |                    524 |        0.052400 |
|            37 | 52.48 us  | 54.95 us  |         2641 |                   3165 |        0.316500 |
|            38 | 54.95 us  | 57.54 us  |          310 |                   3475 |        0.347500 |
|            39 | 57.54 us  | 60.26 us  |          105 |                   3580 |        0.358000 |
|            40 | 60.26 us  | 63.10 us  |           48 |                   3628 |        0.362800 |
|            41 | 63.10 us  | 66.07 us  |         3694 |                   7322 |        0.732200 |
|            42 | 66.07 us  | 69.18 us  |          611 |                   7933 |        0.793300 |
|            43 | 69.18 us  | 72.44 us  |          236 |                   8169 |        0.816900 |
|            44 | 72.44 us  | 75.86 us  |          207 |                   8376 |        0.837600 |
|            45 | 75.86 us  | 79.43 us  |          177 |                   8553 |        0.855300 |
|            46 | 79.43 us  | 83.18 us  |          236 |                   8789 |        0.878900 |
|            47 | 83.18 us  | 87.10 us  |          186 |                   8975 |        0.897500 |
|            48 | 87.10 us  | 91.20 us  |          203 |                   9178 |        0.917800 |
|            49 | 91.20 us  | 95.50 us  |          116 |                   9294 |        0.929400 |
|            50 | 95.50 us  | 100.00 us |          135 |                   9429 |        0.942900 |
|            51 | 100.00 us | 104.71 us |          105 |                   9534 |        0.953400 |
|            52 | 104.71 us | 109.65 us |           65 |                   9599 |        0.959900 |
|            53 | 109.65 us | 114.82 us |           65 |                   9664 |        0.966400 |
|            54 | 114.82 us | 120.23 us |           59 |                   9723 |        0.972300 |
|            55 | 120.23 us | 125.89 us |           40 |                   9763 |        0.976300 |
|            56 | 125.89 us | 131.83 us |           34 |                   9797 |        0.979700 |
|            57 | 131.83 us | 138.04 us |           33 |                   9830 |        0.983000 |
|            58 | 138.04 us | 144.54 us |           27 |                   9857 |        0.985700 |
|            59 | 144.54 us | 151.36 us |           16 |                   9873 |        0.987300 |
|            60 | 151.36 us | 158.49 us |           25 |                   9898 |        0.989800 |
|            61 | 158.49 us | 165.96 us |           20 |                   9918 |        0.991800 |
|            62 | 165.96 us | 173.78 us |            9 |                   9927 |        0.992700 |
|            63 | 173.78 us | 181.97 us |           11 |                   9938 |        0.993800 |
|            64 | 181.97 us | 190.55 us |           11 |                   9949 |        0.994900 |
|            65 | 190.55 us | 199.53 us |            4 |                   9953 |        0.995300 |
|            66 | 199.53 us | 208.93 us |            6 |                   9959 |        0.995900 |
|            67 | 208.93 us | 218.78 us |            6 |                   9965 |        0.996500 |
|            68 | 218.78 us | 229.09 us |            6 |                   9971 |        0.997100 |
|            69 | 229.09 us | 239.88 us |            3 |                   9974 |        0.997400 |
|            70 | 239.88 us | 251.19 us |            2 |                   9976 |        0.997600 |
|            71 | 251.19 us | 263.03 us |            2 |                   9978 |        0.997800 |
|            72 | 263.03 us | 275.42 us |            2 |                   9980 |        0.998000 |
|            73 | 275.42 us | 288.40 us |            4 |                   9984 |        0.998400 |
|            74 | 288.40 us | 302.00 us |            2 |                   9986 |        0.998600 |
|            75 | 302.00 us | 316.23 us |            2 |                   9988 |        0.998800 |
|            76 | 316.23 us | 331.13 us |            1 |                   9989 |        0.998900 |
|            78 | 346.74 us | 363.08 us |            3 |                   9992 |        0.999200 |
|            79 | 363.08 us | 380.19 us |            2 |                   9994 |        0.999400 |
|            80 | 380.19 us | 398.11 us |            1 |                   9995 |        0.999500 |
|            83 | 436.52 us | 457.09 us |            1 |                   9996 |        0.999600 |
|           100 | 954.99 us | 1.00 ms   |            1 |                   9997 |        0.999700 |
|           101 | 1.00 ms   | 1.05 ms   |            1 |                   9998 |        0.999800 |
|           121 | 2.51 ms   | 2.63 ms   |            1 |                   9999 |        0.999900 |
|           162 | 16.60 ms  | 17.38 ms  |            1 |                  10000 |        1.000000 |
+---------------+-----------+-----------+--------------+------------------------+-----------------+
49 rows in set (0.02 sec)

In this example, 3694 times (the COUNT_BUCKET column) when the query were executed, the query time was between 63.10 microseconds and 66.07 microseconds, so the execution time matched the interval of bucket number 41. There has been at total of 7322 executions (the COUNT_BUCKET_AND_LOWER column) of the query with a query time of 66.07 microseconds or less. This means that 73.22% (the BUCKET_QUANTILE column) of the queries have a query time of 66.07 microseconds or less.

In addition to the shown columns, there is SCHEMA_NAME and DIGEST (which together with BUCKET_NUMBER form a unique key). For each digest there are 450 buckets with the width of the bucket (in terms of difference between the low and high timers) gradually becoming larger and larger. The first, middle, and last five buckets are:

mysql> SELECT BUCKET_NUMBER,
              sys.format_time(BUCKET_TIMER_LOW) AS TimerLow,
              sys.format_time(BUCKET_TIMER_HIGH) AS TimerHigh
         FROM performance_schema.events_statements_histogram_by_digest
        WHERE DIGEST = '9431aed9635923565d7bc92cc36d6411c3abb9f52d2c22715be21b5472e3c366'
              AND (BUCKET_NUMBER < 5 OR BUCKET_NUMBER > 444 OR BUCKET_NUMBER BETWEEN 223 AND 227);
+---------------+-----------+-----------+
| BUCKET_NUMBER | TimerLow  | TimerHigh |
+---------------+-----------+-----------+
|             0 | 0 ps      | 10.00 us  |
|             1 | 10.00 us  | 10.47 us  |
|             2 | 10.47 us  | 10.96 us  |
|             3 | 10.96 us  | 11.48 us  |
|             4 | 11.48 us  | 12.02 us  |
|           223 | 275.42 ms | 288.40 ms |
|           224 | 288.40 ms | 302.00 ms |
|           225 | 302.00 ms | 316.23 ms |
|           226 | 316.23 ms | 331.13 ms |
|           227 | 331.13 ms | 346.74 ms |
|           445 | 2.11 h    | 2.21 h    |
|           446 | 2.21 h    | 2.31 h    |
|           447 | 2.31 h    | 2.42 h    |
|           448 | 2.42 h    | 2.53 h    |
|           449 | 2.53 h    | 30.50 w   |
+---------------+-----------+-----------+
15 rows in set (0.02 sec)

The bucket thresholds are fixed and thus the same for all digests. There is also a global histogram in the events_statements_histogram_global.

This includes the introduction to the new Performance Schema digest features. As monitoring tools start to use this information, it will help create a better monitoring experience. Particularly the histograms will benefit from being shown as graphs.

Replication Monitoring with the Performance Schema

The traditional way to monitor replication in MySQL is the SHOW SLAVE STATUS command. However as it will be shown, it has its limitations and in MySQL 5.7 and 8.0 the MySQL developers have started to implement the information as Performance Schema tables. This has several advantages including better monitoring of the replication delay in MySQL 8.0. This blog discusses why SHOW SLAVE STATUS should be replaced with the Performance Schema tables.

The Setup

The replication setup that will be used for the examples in this blog can be seen in the following figure.

Replication Setup with Multi-Source and Chained Replication
Replication Setup with Multi-Source and Chained Replication

There are two source instances (replication masters). Source 1 replicates to the Relay instance (i.e. it acts both as a replica and source in the setup). The Relay instance replicates to the Replica instance which also replicates from Source 2. That is, the Replica instance uses multi-source replication to replicate from the Source 1Relay chain as well as directly from Source 2.

This blog will use the Replica instance to demonstrate SHOW SLAVE STATUS and Performance Schema replication tables.

SHOW SLAVE STATUS

The SHOW SLAVE STATUS command has been around since the addition of replication to MySQL. Thus it is familiar to all database administrators who have been working with replication. It is also very simple to use, and it is easy to remember the syntax. So far so good. However, it also has some limitations.

Let’s take a look at how the output of SHOW SLAVE STATUS looks in the setup described above:

*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 127.0.0.1
                  Master_User: replication
                  Master_Port: 3308
                Connect_Retry: 60
              Master_Log_File: binlog.000001
          Read_Master_Log_Pos: 61422958
               Relay_Log_File: relaylog-relay.000005
                Relay_Log_Pos: 59921651
        Relay_Master_Log_File: binlog.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: sakila.%
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 59921443
              Relay_Log_Space: 61423581
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: Yes
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 49
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 11
                  Master_UUID: 7616e9d1-c868-11e8-92f0-080027effed8
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 81c777c0-c86f-11e8-9031-080027effed8:28-81
            Executed_Gtid_Set: 2165e6e2-c870-11e8-8818-080027effed8:1-39,
81c777c0-c86f-11e8-9031-080027effed8:1-80
                Auto_Position: 1
         Replicate_Rewrite_DB: 
                 Channel_Name: relay
           Master_TLS_Version: 
       Master_public_key_path: 
        Get_master_public_key: 0
*************************** 2. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 127.0.0.1
                  Master_User: replication
                  Master_Port: 3307
                Connect_Retry: 60
              Master_Log_File: binlog.000002
          Read_Master_Log_Pos: 366288
               Relay_Log_File: relaylog-source_2.000005
                Relay_Log_Pos: 181612
        Relay_Master_Log_File: binlog.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: sakila.%
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 366288
              Relay_Log_Space: 182074
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: Yes
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 2
                  Master_UUID: 2165e6e2-c870-11e8-8818-080027effed8
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 2165e6e2-c870-11e8-8818-080027effed8:28-39
            Executed_Gtid_Set: 2165e6e2-c870-11e8-8818-080027effed8:1-39,
81c777c0-c86f-11e8-9031-080027effed8:1-80
                Auto_Position: 1
         Replicate_Rewrite_DB: 
                 Channel_Name: source_2
           Master_TLS_Version: 
       Master_public_key_path: 
        Get_master_public_key: 0
2 rows in set (0.22 sec)

The first thought: that’s a lot of lines of output. That is one of the issues – there is no way to limit the output. So summarize some of the issues with SHOW SLAVE STATUS:

  • There is no support for filter conditions or choosing which columns to include, so the output is very verbose. In this case with two replication channels, all available data for both channels are always included.
  • The order of the columns reflects the order they were added rather than how they logically belong together. Over the years many new columns have been added as new features have been added or the feature has been changed from an option configured in my.cnf to an option configured with CHANGE MASTER TO. For example the channel name is the fourth column from the end even if it would be more natural to have it as the first column (as it’s it the “primary key” of the output).
  • For multi-threaded appliers (replication_slave_workers > 1) there is no information for the individual workers.
  • Information related to the connection (I/O thread) and applier (SQL thread) as well configuration and status are mixed.
  • What does Seconds_behind_master mean? For the source_2 channel (the direct replication from Source 2 it is relatively easy to understand, but for the relay channel is it relative to Source 1 (yes) or to Replica (no)? More about this later.

To look at what can be done to solve these limitations, let’s look at the Performance Schema.

Performance Schema Replication Tables

To overcome these limitations, MySQL 5.7 introduced a series of replication tables in the Performance Schema. These have been extended in MySQL 8.0 to make them even more useful. One of the advantages of using the Performance Schema is that queries are executed as regular SELECT statements with the usual support for choosing columns and manipulating them and to apply a WHERE clause. First, let’s take a look at which replication related tables that are available.

Overview of Tables

As of MySQL 8.0.12 there are 11 replication related tables. The tables are:

  • log_status: This table is new in MySQL 8 and provides information about the binary log, relay log, and InnoDB redo log in a consistent manner (i.e. all values are for the same point in time).
  • Applier:
  • Connection:
    • replication_connection_configuration: The configuration of each of the replication channels.
    • replication_connection_status: The status of the replication channels. In MySQL 8 this includes information about the timestamps showing when the currently queuing transaction was originally committed, when it was committed on the immediate source instance, and when it was written to the relay log. This makes it possible to describe much more accurately what the replication delay is.
  • Group Replication:

The Group Replication tables will not be discussed further.

Since the information from SHOW SLAVE STATUS has been split up into several tables, it can be useful to take a look at how the information map.

Old Versus New

The following table shows how to get from a column in the SHOW SLAVE STATUS output to a table and column in the Performance Schema. The channel name is present in all of the Performance Schema replication tables (it’s the primary key or part of it). The replication filters and rewrite rules are split into two tables. The I/O and SQL thread states can be found in the performance_schema.threads by joining using the THREAD_ID column for the corresponding threads.

SHOW SLAVE STATUSPerformance Schema
ColumnTableColumn
Slave_IO_StatethreadsPROCESSLIST_STATE
Master_Hostreplication_connection_configurationHOST
Master_Userreplication_connection_configurationUSER
Master_Portreplication_connection_configurationPORT
Connect_Retryreplication_connection_configurationCONNECTION_RETRY_INTERVAL
Master_Log_File
Read_Master_Log_Pos
Relay_Log_Filelog_statusREPLICATION->>'$.channels[*].relay_log_file'
Relay_Log_Poslog_statusREPLICATION->>'$.channels[*].relay_log_position'
Relay_Master_Log_File
Slave_IO_Runningreplication_connection_statusSERVICE_STATE
Slave_SQL_Runningreplication_applier_status_by_coordinatorSERVICE_STATE
Replicate_Do_DBreplication_applier_filters
replication_applier_global_filters
Replicate_Ignore_DBreplication_applier_filters
replication_applier_global_filters
Replicate_Do_Tablereplication_applier_filters
replication_applier_global_filters
Replicate_Ignore_Tablereplication_applier_filters
replication_applier_global_filters
Replicate_Wild_Do_Tablereplication_applier_filters
replication_applier_global_filters
Replicate_Wild_Ignore_Tablereplication_applier_filtersreplication_applier_global_filters
Last_Errno
Last_Error
Skip_Counter
Exec_Master_Log_Pos
Relay_Log_Space
Until_Condition
Until_Log_File
Until_Log_Pos
Master_SSL_Allowedreplication_connection_configurationSSL_ALLOWED
Master_SSL_CA_Filereplication_connection_configurationSSL_CA_FILE
Master_SSL_CA_Pathreplication_connection_configurationSSL_CA_PATH
Master_SSL_Certreplication_connection_configurationSSL_CERTIFICATE
Master_SSL_Cipherreplication_connection_configurationSSL_CIPHER
Master_SSL_Keyreplication_connection_configurationSSL_KEY
Seconds_Behind_Master
Master_SSL_Verify_Server_Certreplication_connection_configurationSSL_VERIFY_SERVER_CERTIFICATE
Last_IO_Errnoreplication_connection_statusLAST_ERROR_NUMBER
Last_IO_Errorreplication_connection_statusLAST_ERROR_MESSAGE
Last_SQL_Errnoreplication_applier_status_by_worker
replication_applier_status_by_coordinator
LAST_ERROR_NUMBER
Last_SQL_Errorreplication_applier_status_by_worker
replication_applier_status_by_coordinator
LAST_ERROR_MESSAGE
Replicate_Ignore_Server_Ids
Master_Server_Id
Master_UUIDreplication_connection_statusSOURCE_UUID
Master_Info_File
SQL_Delayreplication_applier_configurationDESIRED_DELAY
SQL_Remaining_Delayreplication_applier_statusREMAINING_DELAY
Slave_SQL_Running_StatethreadsPROCESSLIST_STATE
Master_Retry_Countreplication_connection_configurationCONNECTION_RETRY_COUNT
Master_Bindreplication_connection_configurationNETWORK_INTERFACE
Last_IO_Error_Timestampreplication_connection_statusLAST_ERROR_TIMESTAMP
Last_SQL_Error_Timestampreplication_applier_status_by_worker
replication_applier_status_by_coordinator
LAST_ERROR_TIMESTAMP
Master_SSL_Crlreplication_connection_configurationSSL_CRL_FILE
Master_SSL_Crlpathreplication_connection_configurationSSL_CRL_PATH
Retrieved_Gtid_Setreplication_connection_statusRECEIVED_TRANSACTION_SET
Executed_Gtid_Set
Auto_Positionreplication_connection_configurationAUTO_POSITION
Replicate_Rewrite_DBreplication_applier_filters
replication_applier_global_filters
Channel_Name
Master_TLS_Versionreplication_connection_configurationTLS_VERSION
Master_public_key_pathreplication_connection_configurationPUBLIC_KEY_PATH
Get_master_public_keyreplication_connection_configurationGET_PUBLIC_KEY

As it can be seen, there are a few columns from SHOW SLAVE STATUS that do not have any corresponding tables and columns in the Performance Schema yet. One that probably is familiar to many as the main mean of monitoring the replication lag is the the Seconds_Behind_Master column.  This is no longer needed. It is now possible to get a better value using the timestamp columns in the replication_applier_status_by_coordinator, replication_applier_status_by_worker, and replication_connection_status tables. Talking about that, it is time to see the Performance Schema replication tables in action.

Examples

The rest of the blog shows example outputs each of the replication tables (except the ones related to Group Replication) in the Performance Schema. For some of the tables there is a short discussion following the output. The queries have been executed in rapid succession after the above SHOW SLAVE STATUS output was generated. As the outputs have been generated using separate queries, they do not correspond to the exact same point in time.

log_status

The log_status table shows the replication and engine log data so the data is consistent:

mysql> SELECT SERVER_UUID, JSON_PRETTY(LOCAL) AS LOCAL,
              JSON_PRETTY(REPLICATION) AS REPLICATION,
              STORAGE_ENGINES
         FROM log_status\G
*************************** 1. row ***************************
    SERVER_UUID: 302f0073-c869-11e8-95bb-080027effed8
          LOCAL: {
  "gtid_executed": "2165e6e2-c870-11e8-8818-080027effed8:1-39,\n81c777c0-c86f-11e8-9031-080027effed8:1-80",
  "binary_log_file": "binlog.000002",
  "binary_log_position": 60102708
}
    REPLICATION: {
  "channels": [
    {
      "channel_name": "relay",
      "relay_log_file": "relaylog-relay.000005",
      "relay_log_position": 61423166
    },
    {
      "channel_name": "source_2",
      "relay_log_file": "relaylog-source_2.000005",
      "relay_log_position": 181612
    }
  ]
}
STORAGE_ENGINES: {"InnoDB": {"LSN": 120287720, "LSN_checkpoint": 118919036}}
1 row in set (0.03 sec)

replication_applier_configuration

The replication_applier_configuration table shows the configuration of the applier threads:

mysql> SELECT * FROM replication_applier_configuration;
+--------------+---------------+
| CHANNEL_NAME | DESIRED_DELAY |
+--------------+---------------+
| relay        |             0 |
| source_2     |             0 |
+--------------+---------------+
2 rows in set (0.04 sec)

replication_applier_filters

The replication_applier_filters table shows the channel specific replication filters:

mysql> SELECT * FROM replication_applier_filters\G
*************************** 1. row ***************************
 CHANNEL_NAME: relay
  FILTER_NAME: REPLICATE_WILD_IGNORE_TABLE
  FILTER_RULE: world.%
CONFIGURED_BY: STARTUP_OPTIONS_FOR_CHANNEL
 ACTIVE_SINCE: 2018-10-05 20:49:48.185078
      COUNTER: 0
1 rows in set (0.18 sec)

There is one filter specifically for the relay channel: the channel will ignore changes to tables in the world schema and the filter was set using the the replicate_wild_ignore_table option in the MySQL configuration file.

replication_applier_global_filters

The replication_applier_global_filters table shows the replication filters that are shared for all channels:

mysql> SELECT * FROM replication_applier_global_filters\G
*************************** 1. row ***************************
  FILTER_NAME: REPLICATE_WILD_IGNORE_TABLE
  FILTER_RULE: sakila.%
CONFIGURED_BY: CHANGE_REPLICATION_FILTER
 ACTIVE_SINCE: 2018-10-05 20:48:54.341004
1 row in set (0.02 sec)

There is also one global replication filter. This has been set using the CHANGE REPLICATION FILTER statement.

replication_applier_status

The replication_applier_status table shows the overall status for the applier threads:

mysql> SELECT * FROM replication_applier_status;
+--------------+---------------+-----------------+----------------------------+
| CHANNEL_NAME | SERVICE_STATE | REMAINING_DELAY | COUNT_TRANSACTIONS_RETRIES |
+--------------+---------------+-----------------+----------------------------+
| relay        | ON            |            NULL |                          0 |
| source_2     | ON            |            NULL |                          0 |
+--------------+---------------+-----------------+----------------------------+
2 rows in set (0.05 sec)

replication_applier_status_by_coordinator

The replication_applier_status_by_coordinator table shows the status for the coordinator when multi-threaded appliers has been configured (slave_parallel_workers > 1):

mysql> SELECT * FROM replication_applier_status_by_coordinator\G
*************************** 1. row ***************************
                                         CHANNEL_NAME: relay
                                            THREAD_ID: 55
                                        SERVICE_STATE: ON
                                    LAST_ERROR_NUMBER: 0
                                   LAST_ERROR_MESSAGE: 
                                 LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00.000000
                           LAST_PROCESSED_TRANSACTION: 81c777c0-c86f-11e8-9031-080027effed8:81
 LAST_PROCESSED_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP: 2018-10-05 21:07:52.286116
LAST_PROCESSED_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP: 2018-10-05 21:08:10.692561
    LAST_PROCESSED_TRANSACTION_START_BUFFER_TIMESTAMP: 2018-10-05 21:08:10.843893
      LAST_PROCESSED_TRANSACTION_END_BUFFER_TIMESTAMP: 2018-10-05 21:08:11.142150
                               PROCESSING_TRANSACTION: 
     PROCESSING_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
    PROCESSING_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
        PROCESSING_TRANSACTION_START_BUFFER_TIMESTAMP: 0000-00-00 00:00:00.000000
*************************** 2. row ***************************
                                         CHANNEL_NAME: source_2
                                            THREAD_ID: 59
                                        SERVICE_STATE: ON
                                    LAST_ERROR_NUMBER: 0
                                   LAST_ERROR_MESSAGE: 
                                 LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00.000000
                           LAST_PROCESSED_TRANSACTION: 2165e6e2-c870-11e8-8818-080027effed8:39
 LAST_PROCESSED_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP: 2018-10-05 20:52:12.422129
LAST_PROCESSED_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP: 2018-10-05 20:52:12.422129
    LAST_PROCESSED_TRANSACTION_START_BUFFER_TIMESTAMP: 2018-10-05 20:52:13.010969
      LAST_PROCESSED_TRANSACTION_END_BUFFER_TIMESTAMP: 2018-10-05 20:52:13.519174
                               PROCESSING_TRANSACTION: 
     PROCESSING_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
    PROCESSING_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
        PROCESSING_TRANSACTION_START_BUFFER_TIMESTAMP: 0000-00-00 00:00:00.000000
2 rows in set (0.06 sec)

This is an example of MySQL 8 has detailed information about the timings were for the various stages of the applied events. For example for the relay channel, it can be seen that for the last processed transaction, it took 18 seconds from the transaction was committed on Source 1 (original commit) until it was committed on Relay (immediate commit), but then it only took around half a second until the coordinate was done processing the transaction (i.e. sending it to a worker). Which brings us to the status by worker.

replication_applier_status_by_worker

The replication_applier_status_by_worker table shows the status for each worker thread:

mysql> SELECT * FROM replication_applier_status_by_worker\G
*************************** 1. row ***************************
                                           CHANNEL_NAME: relay
                                              WORKER_ID: 1
                                              THREAD_ID: 56
                                          SERVICE_STATE: ON
                                      LAST_ERROR_NUMBER: 0
                                     LAST_ERROR_MESSAGE: 
                                   LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00.000000
                               LAST_APPLIED_TRANSACTION: 81c777c0-c86f-11e8-9031-080027effed8:80
     LAST_APPLIED_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP: 2018-10-05 21:07:27.721738
    LAST_APPLIED_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP: 2018-10-05 21:07:50.387138
         LAST_APPLIED_TRANSACTION_START_APPLY_TIMESTAMP: 2018-10-05 21:07:50.526808
           LAST_APPLIED_TRANSACTION_END_APPLY_TIMESTAMP: 2018-10-05 21:08:09.296713
                                   APPLYING_TRANSACTION: 81c777c0-c86f-11e8-9031-080027effed8:81
         APPLYING_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP: 2018-10-05 21:07:52.286116
        APPLYING_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP: 2018-10-05 21:08:10.692561
             APPLYING_TRANSACTION_START_APPLY_TIMESTAMP: 2018-10-05 21:08:10.855825
                 LAST_APPLIED_TRANSACTION_RETRIES_COUNT: 0
   LAST_APPLIED_TRANSACTION_LAST_TRANSIENT_ERROR_NUMBER: 0
  LAST_APPLIED_TRANSACTION_LAST_TRANSIENT_ERROR_MESSAGE: 
LAST_APPLIED_TRANSACTION_LAST_TRANSIENT_ERROR_TIMESTAMP: 0000-00-00 00:00:00.000000
                     APPLYING_TRANSACTION_RETRIES_COUNT: 0
       APPLYING_TRANSACTION_LAST_TRANSIENT_ERROR_NUMBER: 0
      APPLYING_TRANSACTION_LAST_TRANSIENT_ERROR_MESSAGE: 
    APPLYING_TRANSACTION_LAST_TRANSIENT_ERROR_TIMESTAMP: 0000-00-00 00:00:00.000000
*************************** 2. row ***************************
                                           CHANNEL_NAME: relay
                                              WORKER_ID: 2
                                              THREAD_ID: 57
                                          SERVICE_STATE: ON
                                      LAST_ERROR_NUMBER: 0
                                     LAST_ERROR_MESSAGE: 
                                   LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00.000000
                               LAST_APPLIED_TRANSACTION: 
     LAST_APPLIED_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
    LAST_APPLIED_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
         LAST_APPLIED_TRANSACTION_START_APPLY_TIMESTAMP: 0000-00-00 00:00:00.000000
           LAST_APPLIED_TRANSACTION_END_APPLY_TIMESTAMP: 0000-00-00 00:00:00.000000
                                   APPLYING_TRANSACTION: 
         APPLYING_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
        APPLYING_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
             APPLYING_TRANSACTION_START_APPLY_TIMESTAMP: 0000-00-00 00:00:00.000000
                 LAST_APPLIED_TRANSACTION_RETRIES_COUNT: 0
   LAST_APPLIED_TRANSACTION_LAST_TRANSIENT_ERROR_NUMBER: 0
  LAST_APPLIED_TRANSACTION_LAST_TRANSIENT_ERROR_MESSAGE: 
LAST_APPLIED_TRANSACTION_LAST_TRANSIENT_ERROR_TIMESTAMP: 0000-00-00 00:00:00.000000
                     APPLYING_TRANSACTION_RETRIES_COUNT: 0
       APPLYING_TRANSACTION_LAST_TRANSIENT_ERROR_NUMBER: 0
      APPLYING_TRANSACTION_LAST_TRANSIENT_ERROR_MESSAGE: 
    APPLYING_TRANSACTION_LAST_TRANSIENT_ERROR_TIMESTAMP: 0000-00-00 00:00:00.000000
*************************** 3. row ***************************
                                           CHANNEL_NAME: source_2
                                              WORKER_ID: 1
                                              THREAD_ID: 60
                                          SERVICE_STATE: ON
                                      LAST_ERROR_NUMBER: 0
                                     LAST_ERROR_MESSAGE: 
                                   LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00.000000
                               LAST_APPLIED_TRANSACTION: 2165e6e2-c870-11e8-8818-080027effed8:39
     LAST_APPLIED_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP: 2018-10-05 20:52:12.422129
    LAST_APPLIED_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP: 2018-10-05 20:52:12.422129
         LAST_APPLIED_TRANSACTION_START_APPLY_TIMESTAMP: 2018-10-05 20:52:13.520916
           LAST_APPLIED_TRANSACTION_END_APPLY_TIMESTAMP: 2018-10-05 20:52:14.302957
                                   APPLYING_TRANSACTION: 
         APPLYING_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
        APPLYING_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
             APPLYING_TRANSACTION_START_APPLY_TIMESTAMP: 0000-00-00 00:00:00.000000
                 LAST_APPLIED_TRANSACTION_RETRIES_COUNT: 0
   LAST_APPLIED_TRANSACTION_LAST_TRANSIENT_ERROR_NUMBER: 0
  LAST_APPLIED_TRANSACTION_LAST_TRANSIENT_ERROR_MESSAGE: 
LAST_APPLIED_TRANSACTION_LAST_TRANSIENT_ERROR_TIMESTAMP: 0000-00-00 00:00:00.000000
                     APPLYING_TRANSACTION_RETRIES_COUNT: 0
       APPLYING_TRANSACTION_LAST_TRANSIENT_ERROR_NUMBER: 0
      APPLYING_TRANSACTION_LAST_TRANSIENT_ERROR_MESSAGE: 
    APPLYING_TRANSACTION_LAST_TRANSIENT_ERROR_TIMESTAMP: 0000-00-00 00:00:00.000000
*************************** 4. row ***************************
                                           CHANNEL_NAME: source_2
                                              WORKER_ID: 2
                                              THREAD_ID: 61
                                          SERVICE_STATE: ON
                                      LAST_ERROR_NUMBER: 0
                                     LAST_ERROR_MESSAGE: 
                                   LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00.000000
                               LAST_APPLIED_TRANSACTION: 
     LAST_APPLIED_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
    LAST_APPLIED_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
         LAST_APPLIED_TRANSACTION_START_APPLY_TIMESTAMP: 0000-00-00 00:00:00.000000
           LAST_APPLIED_TRANSACTION_END_APPLY_TIMESTAMP: 0000-00-00 00:00:00.000000
                                   APPLYING_TRANSACTION: 
         APPLYING_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
        APPLYING_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
             APPLYING_TRANSACTION_START_APPLY_TIMESTAMP: 0000-00-00 00:00:00.000000
                 LAST_APPLIED_TRANSACTION_RETRIES_COUNT: 0
   LAST_APPLIED_TRANSACTION_LAST_TRANSIENT_ERROR_NUMBER: 0
  LAST_APPLIED_TRANSACTION_LAST_TRANSIENT_ERROR_MESSAGE: 
LAST_APPLIED_TRANSACTION_LAST_TRANSIENT_ERROR_TIMESTAMP: 0000-00-00 00:00:00.000000
                     APPLYING_TRANSACTION_RETRIES_COUNT: 0
       APPLYING_TRANSACTION_LAST_TRANSIENT_ERROR_NUMBER: 0
      APPLYING_TRANSACTION_LAST_TRANSIENT_ERROR_MESSAGE: 
    APPLYING_TRANSACTION_LAST_TRANSIENT_ERROR_TIMESTAMP: 0000-00-00 00:00:00.000000
4 rows in set (0.16 sec)

The timestamps for the relay channel’s workers (only one has been active as it can be seen) can be used to see that the last transaction took around 19 seconds to apply and it was committed also 19 seconds after it committed on the immediate source (the Relay instance).

You can compare this delay of 19 seconds with the 49 seconds claimed by Seconds_Behind_Master in the SHOW SLAVE STATUS output. Why the difference? Seconds_Behind_Master is really the time from the original source started to execute the current transaction until now. So that includes the time it took to execute the transaction not only on Source 1 but also on Relay and the time used until now on Replica.

replication_connection_configuration

The replication_connection_configuration table shows the configuration for each connection to the source of the replication:

mysql> SELECT * FROM replication_connection_configuration\G
*************************** 1. row ***************************
                 CHANNEL_NAME: relay
                         HOST: 127.0.0.1
                         PORT: 3308
                         USER: replication
            NETWORK_INTERFACE: 
                AUTO_POSITION: 1
                  SSL_ALLOWED: YES
                  SSL_CA_FILE: 
                  SSL_CA_PATH: 
              SSL_CERTIFICATE: 
                   SSL_CIPHER: 
                      SSL_KEY: 
SSL_VERIFY_SERVER_CERTIFICATE: NO
                 SSL_CRL_FILE: 
                 SSL_CRL_PATH: 
    CONNECTION_RETRY_INTERVAL: 60
       CONNECTION_RETRY_COUNT: 86400
           HEARTBEAT_INTERVAL: 30.000
                  TLS_VERSION: 
              PUBLIC_KEY_PATH: 
               GET_PUBLIC_KEY: NO
*************************** 2. row ***************************
                 CHANNEL_NAME: source_2
                         HOST: 127.0.0.1
                         PORT: 3307
                         USER: replication
            NETWORK_INTERFACE: 
                AUTO_POSITION: 1
                  SSL_ALLOWED: YES
                  SSL_CA_FILE: 
                  SSL_CA_PATH: 
              SSL_CERTIFICATE: 
                   SSL_CIPHER: 
                      SSL_KEY: 
SSL_VERIFY_SERVER_CERTIFICATE: NO
                 SSL_CRL_FILE: 
                 SSL_CRL_PATH: 
    CONNECTION_RETRY_INTERVAL: 60
       CONNECTION_RETRY_COUNT: 86400
           HEARTBEAT_INTERVAL: 30.000
                  TLS_VERSION: 
              PUBLIC_KEY_PATH: 
               GET_PUBLIC_KEY: NO
2 rows in set (0.01 sec)

replication_connection_status

The replication_connection_status table shows the status of each connection:

mysql> SELECT * FROM replication_connection_status\G
*************************** 1. row ***************************
                                      CHANNEL_NAME: relay
                                        GROUP_NAME: 
                                       SOURCE_UUID: 7616e9d1-c868-11e8-92f0-080027effed8
                                         THREAD_ID: 54
                                     SERVICE_STATE: ON
                         COUNT_RECEIVED_HEARTBEATS: 28
                          LAST_HEARTBEAT_TIMESTAMP: 2018-10-05 21:03:49.684895
                          RECEIVED_TRANSACTION_SET: 81c777c0-c86f-11e8-9031-080027effed8:28-81
                                 LAST_ERROR_NUMBER: 0
                                LAST_ERROR_MESSAGE: 
                              LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00.000000
                           LAST_QUEUED_TRANSACTION: 81c777c0-c86f-11e8-9031-080027effed8:81
 LAST_QUEUED_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP: 2018-10-05 21:07:52.286116
LAST_QUEUED_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP: 2018-10-05 21:08:10.692561
     LAST_QUEUED_TRANSACTION_START_QUEUE_TIMESTAMP: 2018-10-05 21:08:10.835992
       LAST_QUEUED_TRANSACTION_END_QUEUE_TIMESTAMP: 2018-10-05 21:08:10.842133
                              QUEUEING_TRANSACTION: 
    QUEUEING_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
   QUEUEING_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
        QUEUEING_TRANSACTION_START_QUEUE_TIMESTAMP: 0000-00-00 00:00:00.000000
*************************** 2. row ***************************
                                      CHANNEL_NAME: source_2
                                        GROUP_NAME: 
                                       SOURCE_UUID: 2165e6e2-c870-11e8-8818-080027effed8
                                         THREAD_ID: 58
                                     SERVICE_STATE: ON
                         COUNT_RECEIVED_HEARTBEATS: 47
                          LAST_HEARTBEAT_TIMESTAMP: 2018-10-05 21:08:12.589150
                          RECEIVED_TRANSACTION_SET: 2165e6e2-c870-11e8-8818-080027effed8:28-39
                                 LAST_ERROR_NUMBER: 0
                                LAST_ERROR_MESSAGE: 
                              LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00.000000
                           LAST_QUEUED_TRANSACTION: 2165e6e2-c870-11e8-8818-080027effed8:39
 LAST_QUEUED_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP: 2018-10-05 20:52:12.422129
LAST_QUEUED_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP: 2018-10-05 20:52:12.422129
     LAST_QUEUED_TRANSACTION_START_QUEUE_TIMESTAMP: 2018-10-05 20:52:12.486156
       LAST_QUEUED_TRANSACTION_END_QUEUE_TIMESTAMP: 2018-10-05 20:52:12.486263
                              QUEUEING_TRANSACTION: 
    QUEUEING_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
   QUEUEING_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
        QUEUEING_TRANSACTION_START_QUEUE_TIMESTAMP: 0000-00-00 00:00:00.000000
2 rows in set (0.02 sec)

Similar to the applier timestamps, the connection timestamps can be used to check the lag caused by the connection threads fetching the transactions from its source’s binary log and writing it to its own relay log. For the relay channel it took around 0.14 second from the transaction was committed on the immediate source (the Relay instance) until the connection thread started to write the transaction to the relay log (LAST_QUEUED_TRANSACTION_START_QUEUE_TIMESTAMP) and then further 0.007 second to complete the write.

Conclusion

The replication related tables in the Performance Schema, particularly the MySQL 8 version of them, provides a very useful way to get information about the replication configuration and status. As a support engineer myself, I look forward to have access to the new timestamp values, when I investigate replication delays.

There is still some work remaining such. For example, it could be useful to know when the transaction started on the original source. That way it is possible to compare the execution time between a source and a replica. However, the current information that is available is already a great improvement.

Shutdown and Restart Statements

There are various ways to shutdown MySQL. The traditional cross platform method is to use the shutdown command in the mysqladmin client. One drawback is that it requires shell access; another is that it cannot start MySQL again automatically. There are platform specific options that can perform a restart such as using systemctl on Linux or install MySQL as a service on Microsoft Windows. What I will look at here though is the built in support for stopping and restarting MySQL using SQL statements.

MySQL 5.7 added the SHUTDOWN statement which allows you to shut down MySQL using the MySQL command-line client or MySQL Shell. The command is straight forward to use:

The SHUTDOWN command available in MySQL 5.7 and later.
The SHUTDOWN command available in MySQL 5.7 and later.

You will need the SHUTDOWN privilege to use the statement – this is the same as it required to use mysqladmin to shutdown MySQL. There is one gotcha to be aware of with the SHUTDOWN statement: it only works with the old (traditional) MySQL protocol. If you attempt to use it when connected to MySQL using the new X Protocol, you get the error: ERROR: 3130: Command not supported by pluggable protocols as shown in the next example:

Executing SHUTDOWN when connected through the X Protocol causes error 3130.
Executing SHUTDOWN when connected through the X Protocol causes error 3130.

The RESTART statement, on the other hand, works through both protocols and also requires the SHUTDOWN privilege:

The RESTART command available in MySQL 8.0.
The RESTART command available in MySQL 8.0.

For the restart to work, it is necessary that MySQL has been started in presence of a “monitoring service”. This is the default on Microsoft Windows (to disable the monitoring service start MySQL with --no-monitor). On Linux the monitoring service can for example be systemd or mysqld_safe.

As an example of where the RESTART statement comes in handy is for MySQL Shell’s AdminAPI for administrating a MySQL InnoDB Cluster cluster. MySQL Shell can when connected to MySQL Server 8.0 use the new SET PERSIST syntax to make the required configuration changes and then use the RESTART statement to restart the instance to make non-dynamic configuration changes take effect.

The SHUTDOWN and RESTART statements may not be the most important changes in MySQL 5.7 and 8.0, but they can be handy to know of in some cases.

InnoDB Progress Information

MySQL has since version 5.7 had support for progress information for some queries. As promised in my previous post, I will here discuss how you can use that to get progress information for ALTER TABLE on InnoDB tables.

Background and Setup

Progress information is implemented through the Performance Schema using the stage events. In version 8.0.12 there are currently seven stages that can provide this information for  ALTER TABLE statements on InnoDB tables. In MySQL 8, it is easy to list the stages capable of reporting progress information by using the setup_instruments Performance Schema table:

mysql> SELECT NAME, ENABLED, TIMED
         FROM performance_schema.setup_instruments
        WHERE NAME LIKE 'stage/innodb/alter table%'
              AND PROPERTIES = 'progress';
+------------------------------------------------------+---------+-------+
| NAME                                                 | ENABLED | TIMED |
+------------------------------------------------------+---------+-------+
| stage/innodb/alter table (end)                       | YES     | YES   |
| stage/innodb/alter table (flush)                     | YES     | YES   |
| stage/innodb/alter table (insert)                    | YES     | YES   |
| stage/innodb/alter table (log apply index)           | YES     | YES   |
| stage/innodb/alter table (log apply table)           | YES     | YES   |
| stage/innodb/alter table (merge sort)                | YES     | YES   |
| stage/innodb/alter table (read PK and internal sort) | YES     | YES   |
+------------------------------------------------------+---------+-------+
7 rows in set (0.00 sec)

This also shows how the setup_instruments table in MySQL 8 has some additional information about the instruments such as properties and documentation (not included in the output). Adding this information is still work in progress.

MySQL 5.7 does not provide as easy a way to obtain the instruments providing progress information. Instead you need to consult the reference manual. However, the principle in using the feature is the same.

As you can see, all of the instruments are enabled and timed by default. What is not enabled by default, however, is the consumer that can make the information available:

mysql> SELECT NAME, ENABLED,
              sys.ps_is_consumer_enabled(NAME) AS EnabledWithHierarchy
         FROM performance_schema.setup_consumers
        WHERE NAME = 'events_stages_current';
+-----------------------+---------+----------------------+
| NAME                  | ENABLED | EnabledWithHierarchy |
+-----------------------+---------+----------------------+
| events_stages_current | NO      | NO                   |
+-----------------------+---------+----------------------+
1 row in set (0.01 sec)

Since the consumers form a hierarchical system, the sys schema function ps_is_consumer_enabled() is used to show whether the consumer is enabled taking the whole hierarchy into consideration.

In order to use the progress information, you need to enable the events_stages_current consumer. This is the consumer that is responsible for keeping the performance_schema.events_stages_current table up to date, i.e. record the current (or latest if there is no current stage) for each thread. With the default Performance Schema settings, the rest of the hierarchy is enabled. To enable event_stages_current and verify it will be consuming instruments, you can use the following queries:

mysql> UPDATE performance_schema.setup_consumers
          SET ENABLED = 'YES'
        WHERE NAME = 'events_stages_current';
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> SELECT NAME, ENABLED,
              sys.ps_is_consumer_enabled(NAME) AS EnabledWithHierarchy
         FROM performance_schema.setup_consumers
        WHERE NAME = 'events_stages_current';
+-----------------------+---------+----------------------+
| NAME                  | ENABLED | EnabledWithHierarchy |
+-----------------------+---------+----------------------+
| events_stages_current | YES     | YES                  |
+-----------------------+---------+----------------------+
1 row in set (0.00 sec)

That is it. Now you can monitor the progress of the queries that uses the stages with progress information.

Warning

The more parts of the Performance Schema that is enabled and the more fine grained monitoring, the more overhead. Stages are not the worst with respect to overhead; nevertheless it is recommended you keep an eye on the affect of enabling the events_stages_current consumer.

Monitoring Progress

The base for monitoring the progress information is the performance_schema.events_stages_current table. There are two columns of interest for this discussion:

  • WORK_COMPLETED: The amount of work that is reported to have been completed.
  • WORK_ESTIMATED: The estimated amount of work that needs to be done.

For InnoDB ALTER TABLE the estimated amount of work is for the entire operation. That said, the estimate may be revised during the process, so it may happen that the if you calculate the percentage it decreases as time goes. However, in general the percentage (100% * WORK_COMPLETED/WORK_ESTIMATED) will increase steadily until the operation completes at 100%.

To learn more about how the progress information works, the following pages in the manual are recommended:

For now, let’s look at an example.

Example

For the example, the salaries table in the employees sample database will be used. The table is sufficiently large that it will be possible to query the progress while adding a column using the INPLACE algorithm. As discussed in MySQL 8.0.12: Instant ALTER TABLE, it is possible to add a column instantly, but for the purpose of this example, the INPLACE algorithm illustrates the progress information feature better. The query that will be executed is:

ALTER TABLE salaries ADD COLUMN new_col int NOT NULL DEFAULT 0, ALGORITHM=INPLACE;

The performance_schema.events_stages_current table can be joined with the performance_schema.events_statements_current to show the query and progress. For example:

mysql> SELECT stmt.THREAD_ID, stmt.SQL_TEXT, stage.EVENT_NAME AS State,
              stage.WORK_COMPLETED, stage.WORK_ESTIMATED,
              ROUND(100*stage.WORK_COMPLETED/stage.WORK_ESTIMATED, 2) AS CompletedPct
         FROM performance_schema.events_statements_current stmt
              INNER JOIN performance_schema.events_stages_current stage
                      ON stage.THREAD_ID = stmt.THREAD_ID
                         AND stage.NESTING_EVENT_ID = stmt.EVENT_ID\G
*************************** 1. row ***************************
     THREAD_ID: 63857
      SQL_TEXT: ALTER TABLE salaries ADD COLUMN new_col int NOT NULL DEFAULT 0, ALGORITHM=INPLACE
         State: stage/innodb/alter table (read PK and internal sort)
WORK_COMPLETED: 8906
WORK_ESTIMATED: 27351
  CompletedPct: 32.56
1 row in set (0.00 sec)

There is another way though. Instead of using the performance_schema.events_stages_current table directly, an easier way is to use the sys.session view. This is an advanced process list that includes much more information than the usual SHOW PROCESSLIST statement including progress information. The performance of sys.session has been improved with more than an order of magnitude in MySQL 8 by the addition of indexes to the Performance Schema tables making it highly useful.

Querying the sys.session view for sessions showing progress information while the ALTER TABLE is in progress returns an output similar to the following example:

mysql> SET @sys.statement_truncate_len = 85;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT thd_id, conn_id, db, command, state, current_statement,
              statement_latency, progress, current_memory, program_name
         FROM sys.session
        WHERE progress IS NOT NULL\G
*************************** 1. row ***************************
           thd_id: 63857
          conn_id: 63818
               db: employees
          command: Query
            state: alter table (merge sort)
current_statement: ALTER TABLE salaries ADD COLUMN new_col int NOT NULL DEFAULT 0, ALGORITHM=INPLACE
statement_latency: 4.22 s
         progress: 49.39
   current_memory: 464.27 KiB
     program_name: MySQLWorkbench
1 row in set (0.06 sec)

In the example, the @sys.statement_truncate_len user variable is set to 85. By default the sys schema ensures the current statement is at most 64 characters long. In order to avoid truncation in this case, the truncate length is increased to 85.

The example output shows that the progress is at around 49%. It is important to note that is an estimate and not an exact number. The ALTER TABLE is performing a merge sort at the time, and the query has been running for 4.22 seconds.

A couple of other interesting columns are included. It can be seen the connection is using 464KiB at the time. In MySQL 8 memory instrumentation is enabled by default (in MySQL 5.7 you need to enable it yourself). Additionally, the name of the program executing the query is MySQLWorkbench, that is the query originates from MySQL Workbench.

So, next time you plan a large ALTER TABLE operation, consider enabling the events_stages_current consumer, so you can follow the progress.

Configuring the MySQL Shell Prompt

With the introduction of MySQL Shell 8.0, the second major version of the new command-line tool for MySQL, a new and rich featured prompt was introduced. Unlike the prompt of the traditional mysql command-line client, it does not just say mysql> by default. Instead it comes in a colour coded spectacle.

The default prompt is great, but for one reason or another it may be that you want to change the prompt. Before getting to that, let’s take a look at the default prompt, so the starting point is clear.

The Default Prompt

An example of the default prompt can be seen in the screen shot below. As you can see, there are several parts to the prompt, each carrying its information.

MySQL Shell with the default font.
MySQL Shell with the default font.

There are six parts. From left to right, they are:

  • Status: Whether it is a production system or whether the connection is lost. This part is not included in the above screen shot.
  • MySQL: Just a reminder that you are working with a MySQL database.
  • Connection: Which host you are connected to (localhost), which port (33060 – to the X protocol port), and that SSL is being used.
  • Schema: The current default schema.
  • Mode: Whether you are using JavaScript (JS), Python (Py), or SQL (SQL) to enter commands.
  • End: As per tradition, the prompt ends with a >.

Depending on your current status one or more of the parts may be missing. For example, the configuration options will only be present, when you have an active connection to a MySQL Server instance.

The prompt works well on a black background and thus brightly coloured text as in the screen shot, but for some other background and text colours, it is not so – or you may simply want different colours to signify which whether you are connected to a development or production system. You may also find the prompt too verbose, if you are recording a video or writing training material. So, let’s move on and find out how the prompt is configured.

The Prompt Configuration

Since the prompt is not just a simple string, it is also somewhat more complex to configure it than just setting an option. The configuration is done in a JSON object stored in a file named prompt.json (by default – you can change this – more about that later).

The location of prompt.json depends on your operating system:

  • Linux and macOS: ~/.mysqlsh/prompt.json – that is in the .mysqlsh directory in the user’s home directory.
  • Microsoft Windows: %AppData%\MySQL\mysqlsh\prompt.json – that is in AppData\Roaming\MySQL\mysqlsh directory from the user’s home directory.

If the file does not exist, MySQL Shell falls back on a system default. For example, on Oracle Linux 7 installation, the file /usr/share/mysqlsh/prompt/prompt_256.json is used. This is also the template that is copied to %AppData%\MySQL\mysqlsh\prompt.json on Microsoft Windows 10 installation.

The MySQL Shell installation includes several templates that you can choose from. These are:

  • prompt_16.json: A coloured prompt limited to use 16/8 color ANSI colours and attributes.
  • prompt_256.json: The prompt uses 256 indexed colours. This is the one that are used by default both on Oracle Linux 7 and Microsoft Windows 10.
  • prompt_256inv.json: Similar to prompt_256.json, but with an “invisible” background colour (it just uses the same as for the terminal) and with different foreground colours.
  • prompt_256pl.json: Same as prompt_256.json but with extra symbols. This Powerline patched font such as the one that is installed with the Powerline project. This will add a padlock with the prompt when you use SSL to connect to MySQL and use “arrow” separators.
  • prompt_256pl+aw.json: Same as prompt_256pl.json but with “awesome symbols”. This additionally requires the awesome symbols to be included in the Powerline font.
  • prompt_classic.json: This is a very basic prompt that just shows mysql-js>, mysql-py>, or mysql-sql> based on the mode in use.
  • prompt_nocolor.json: Gives the full prompt information, but completely without colours. An example of a prompt is: MySQL [localhost+ ssl/world] JS>

These are templates that you can use as is or modify to suite yours needs and preferences. One way to pick a theme is to copy the template file into the location of your user’s prompt definition. The templates can be found in the prompt directory of the installation, for example:

  • Oracle Linux 7 RPM: /usr/share/mysqlsh/prompt/
  • Microsoft Windows: C:\Program Files\MySQL\MySQL Shell 8.0\share\mysqlsh\prompt

Another option is to define the MYSQLSH_PROMPT_THEME environment variable to point to the file you want to use. The value should be the full path to the file. This is particularly useful if you want to try the different template to see what works best for you. For example, to use the prompt_256inv.json template from the command prompt on Microsoft Windows:

C:\> set MYSQLSH_PROMPT_THEME=C:\Program Files\MySQL\MySQL Shell 8.0\share\mysqlsh\prompt\prompt_256inv.json

Which gives the prompt:

The prompt when using the prompt_256inv.json template.
The prompt when using the prompt_256inv.json template.

If none of the templates work for you, you can also dive in at the deep end of the pool and create your own configuration.

Creating Your Own Configuration

It is not completely trivial to create your own configuration, but if you use the template that is closest to the configuration you want as a starting point, it is not difficult either.

A good source of help to create the perfect prompt is also the README.prompt file that is located in the same directory as the template files. The README.prompt file contains the specification for the configuration.

Instead of going through the specification in details, let’s take a look at the prompt_256.json template and discuss some parts of it. Let’s start at the end of the file:

  "segments": [
    {
      "classes": ["disconnected%host%", "%is_production%"]
    },
    {
      "text": " My",
      "bg": 254,
      "fg": 23
    },
    {
      "separator": "",
      "text": "SQL ",
      "bg": 254,
      "fg": 166
    },
    {
      "classes": ["disconnected%host%", "%ssl%host%session%"],
      "shrink": "truncate_on_dot",
      "bg": 237,
      "fg": 15,
      "weight": 10,
      "padding" : 1
    },
    {
      "classes": ["noschema%schema%", "schema"],
      "bg": 242,
      "fg": 15,
      "shrink": "ellipsize",
      "weight": -1,
      "padding" : 1
    },
    {
      "classes": ["%Mode%"],
      "text": "%Mode%",
      "padding" : 1
    }
  ]
}

This is where the elements of the prompt is defined. There are a few things that is interesting to note here.

First, notice that there is an object with the classes disconnected%host% and %is_production%. The names inside the %s are variables defined in the same file or that comes from MySQL Shell itself (it has variables such as the host and port). For example, is_production is defined as:

  "variables" : {
    "is_production": {
        "match" : {
            "pattern": "*;%host%;*",
            "value": ";%env:PRODUCTION_SERVERS%;"
        },
        "if_true" : "production",
        "if_false" : ""
    },

So, a host is considered to be a production instance if it is included in the environment variable PRODUCTION_SERVERS. When there is a match, and additional element is inserted at the beginning of the prompt to make it clear, you are working on with a production system:

Connected to a production system.
Connected to a production system.

The second thing to note about the list of elements is that there are some special functions such as shrink which can be used to define how the text is kept relatively short. For example, the host uses truncate_on_dot, so only the part before the first dot in the hostname is displayed if the full hostname is too long. Alternatively ellipsize can be used to add … after the truncated value.

Third, the background and foreground colours are defined using the bg and fg elements respectively. This allows you to completely customize the prompt to your liking with respect to colours. The colour can be specified in one of the following ways:

  • By Name: There are a few colours that are known by name: black, red, green, yellow, blue, magenta, cyan, white.
  • By Index: A value between 0 and 255 (both inclusive) where 0 is black, 63 light blue, 127 magenta, 193 yellow, and 255 is white.
  • By RGB: Use a value in the #rrggbb format. Requires the terminal supports TrueColor colours.

Tip

If you want to do more than make a few tweaks to an existing template, read the README.prompt file to see the full specification including a list of supported attributes and built-in variables. These may change in the future as more features are added.

One group of built-in variables that deserve an example are the ones that in some way depend on the environment or the MySQL instance you are connected to. These are:

  • %env:varname%: This uses an environment variable. The way that it is determined whether you are connected to a production server is an example of how an environment variable
  • %sysvar:varname%: This uses the value of a global system variable from MySQL. That is, the value returned by SELECT @@global.varname.
  • %sessvar:varname%: Similar to the previous but using a session system variable.
  • %status:varname%: This uses the value of a global status variable from MySQL. That is, the value returned by SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME = 'varname'.
  • %status:varname%: Similar to the previous, but using a session status variable.

If you for example want to include the MySQL version (of the instance you are connected to) in the prompt, you can add an element like:

    {
      "separator": "",
      "text": "%sysvar:version%",
      "bg": 250,
      "fg": 166
    },

The resulting prompt is:

Including the MySQL Server version in the prompt.
Including the MySQL Server version in the prompt.

What next? Now it is your turn to play with MySQL Shell. Enjoy.

MySQL 8.0.12: Instant ALTER TABLE

There are many nice changes included in the MySQL 8.0.12 release that were published a couple of days ago. One of the most exciting is the ability to make instant schema changes to tables. This blog will look into why I think that that is a stand-out change.

I will not go into details with the implementation other than noting that the new data dictionary in MySQL 8.0 has allowed for a nice implementation of the patch that was contributed by the Tencent Games DBA Team. If you are interested in learning more about the implementation, I will recommend you to read the blog by Bin Su (MySQL 8.0: InnoDB now supports Instant ADD COLUMN) and the worklog implementing the feature (WL#11250: Support Instant Add Column).

Contribution

Thanks to the Tencent Games DBA Team who contributed the patch for this feature.

Why Do We Need Instant ALTER TABLE?

As you may know, InnoDB has since version 5.6 supported online schema changes. So, a fair thought may be why that is no longer good enough. The thing is that while online schema changes are very nice and allows you to make changes to the data (DML statements) while the schema change is made, there are still some problems:

  • Online ALTER TABLE still requires a meta data lock for the duration of the operation. That is, it is not possible to make other schema changes until the ALTER TABLE has completed.
  • In replication setups, the SQL thread handling the online schema change will still block other changes coming through the replication channel from occurring. This means that an online schema change that takes an hour suddenly makes the replication lag an hour.
  • This is compounded in chained replication setups where the delay on the last instances in the topology is multiplied with the number of times the change has been replication. Suddenly this schema change that did not really affect the replication master becomes a major pain and delays the deployment of the application upgrade that relies on the schema changes.
  • Even though the change is online, it still is heavy on resources: disk for storing the extra copy of the table when the change cannot be made in-place, CPU and disk activity that may cause other queries to become slow, etc.
  • There is a limit to the amount of DML changes that can be made during the schema change. If you make too many changes, the schema change will fail.

So, in short, online schema changes are only the first stop. Instance schema changes is the future.

Which Changes Can Be Made Instantly?

While it would be great if all schema changes could be made instantly, unfortunately that is not the case. Just as not all schema changes can be made online, there are limitations to the new instant feature. In short, the changes that can be made instantly must only affect the metadata of the table. The metadata is stored in the data dictionary. The changes that can be made with the instant ALTER TABLE feature as per 8.0.12 are:

  • Adding a new column as the last column in the table.
  • Adding a generated virtual column.
  • Dropping a generated virtual column.
  • Setting a default value for an existing column.
  • Dropping the default value for an existing column.
  • Changing the list of values allowed for a column with the ENUM or SET data types. A requirement is that the storage size does not change for the column.
  • Change whether the index type is set explicitly for an existing index.

And who knows, maybe later the feature can be extended to cover more changes. There are also a few limitations that are good to be aware of:

  • The row format cannot be COMPRESSED.
  • The table cannot have a fulltext index.
  • Tables in the data dictionary cannot use the instant algorithm.
  • Temporary tables are not supported.

How to Ensure You are Using the Expected Algorithm?

One problem with schema changes is that here are different algorithms depending on the schema change. Currently there are three different algorithms:

  • INSTANT: the change completes very quickly (yes not quite instantly) as only the metadata in the data dictionary needs to be updated.
  • INPLACE: the changes are made within the existing table, i.e. a complete table copy is avoided.
  • COPY: the table is copied into a new copy with the new definition.

By default, MySQL chooses the algorithm doing the least work. That is, INSTANT if that is supported, otherwise INPLACE if that is supported, and finally COPY. Additionally, there is the concept of locking which can be set to either NONE, SHARED, or EXCLUSIVE.

So, how do you ensure you are not ending up with a copying ALTER TABLE taking exclusive locks when you thought the operation was going to be instant? The answer is to explicitly set the algorithm and lock type. That way, MySQL will throw an error if your schema change is not compatible with the requested algorithm. The same principle can be used to force a full rebuild of the table if you for example have a small table and don’t see a reason to worry about instantly added columns (see also more later).

I will give some examples of specifying the ALGORITHM and LOCK options to ALTER TABLE later. However, we first need an example table to play with.

Tip

Always specify the ALGORITHM and LOCK options explicitly to avoid unpleasant surprises.

Creating an Example Table

For the purpose of the upcoming examples of instant schema changes, it is necessary to have a table to play with. The table will be the testtbl table in the my_schema schema. The table can be created and populated with sample data by using MySQL Shell – this allows us to take advantage of the support for scripting. The table will have one million rows.

Note

To limit the amount of scrolling, the prompt in MySQL Shell has been changed to just show the mode (JS, PY, SQL).

JS> \connect root@localhost
Creating a session to 'root@localhost'
Please provide the password for 'root@localhost': ********
Save password for 'root@localhost'? [Y]es/[N]o/Ne[v]er (default No):
Fetching schema names for autocompletion... Press ^C to stop.
Your MySQL connection id is 17 (X protocol)
Server version: 8.0.12 MySQL Community Server - GPL
No default schema selected; type \use <schema> to set one.

JS> \sql
Switching to SQL mode... Commands end with ;

SQL> CREATE SCHEMA my_schema;
Query OK, 1 row affected (0.0509 sec)

SQL> \use my_schema
Default schema set to `my_schema`.
Fetching table and column names from `my_schema` for auto-completion... Press ^C to stop.

SQL> CREATE TABLE testtbl (
 ...   id int unsigned NOT NULL auto_increment,
 ...   val varchar(36) NOT NULL,
 ...   PRIMARY KEY (id)
 ... ) ENGINE=InnoDB;
Query OK, 0 rows affected (0.0497 sec)
SQL> \py
Switching to Python mode...

Py> \use my_schema
Default schema `my_schema` accessible through db.

Py> db.testtbl
<Table:testtbl>

Py> import uuid

Py> for i in range(1000):
...     session.start_transaction()
...     stmt = db.testtbl.insert("val")
...     for j in range(1000):
...         stmt = stmt.values(uuid.uuid1().hex)
...     stmt.execute()
...     session.commit()
...
Query OK, 0 rows affected (0.0029 sec)

Py> db.testtbl.select("COUNT(*)")
+----------+
| COUNT(*) |
+----------+
|  1000000 |
+----------+
1 row in set (0.2476 sec)

Note how this takes advantage of the feature in MySQL Shell where the db object in JavaScript and Python mode has the tables as properties. Thus it is possible to refer to the testtbl table in the my_schema schema as db.testtbl after using \use my_schema to read the database objects. The data is inserted in 1000 batches of 1000 rows using a double loop.

Now it is time to play with schema changes.

Schema Changes Examples

For comparison, let’s first add a column using the old algorithms. First, add the column val2 using the copying algorithm – note this is not supported online, so a shared lock is taken:

SQL> ALTER TABLE testtbl
 ...       ADD COLUMN val2 varchar(36) DEFAULT NULL,
 ...       ALGORITHM=COPY, LOCK=SHARED;
Query OK, 1000000 rows affected (5.3952 sec)

Note

The test is made on a laptop – the timings themselves are not relevant, but they can be used to compare the time taken for the three algorithms.

So that took around 5.4 seconds. That is not too bad, but we can do much better than that. Let’s try the in-place algorithm. Not only will that require less work (though still a fair bit), it is also possible to perform the change online.

SQL> ALTER TABLE testtbl
 ...       ADD COLUMN val3 varchar(36) DEFAULT NULL,
 ...       ALGORITHM=INPLACE, LOCK=NONE;
Query OK, 0 rows affected (1.7511 sec)

So, this is a factor three faster despite the table having an extra column compared to the original table. And it is possible to execute DML statements. That is a big win. But let’s move on to the finale: instant ALTER TABLE:

SQL> ALTER TABLE testtbl
 ...       ADD COLUMN val4 varchar(36) DEFAULT NULL,
 ...       ALGORITHM=INSTANT;
Query OK, 0 rows affected (0.0490 sec)

Wow – that’s less than 5/100 of a second – that’s instant enough for me. Note hos the LOCK option is not specified here. The instant algorithm is always as lock free as possible, so the LOCK option is not allowed.

There is a little more to it than this. As mentioned by Bin Su, there are some trade offs such as the row size if not checked upfront. So, in some cases it can be useful to know whether a table contains instantly added columns. This can be seen through the innodb_tables view in the Information Schema on the data dictionary:

Py> schema = session.get_schema("information_schema")
Query OK, 1000 items affected (0.0091 sec)

Py> inno_tables = schema.get_table("INNODB_TABLES")
Query OK, 0 rows affected (0.0002 sec)

Py> inno_tables.select("NAME", "INSTANT_COLS").where("INSTANT_COLS > 0")
+-------------------+--------------+
| NAME              | INSTANT_COLS |
+-------------------+--------------+
| my_schema/testtbl |            4 |
+-------------------+--------------+
1 row in set (0.0405 sec)

Now what is that? 4 columns? But we only added one of the columns using the instant algorithm. What the INSTANT_COLS column shows is how many columns existed before the first instant column was added. In the example the columns id, val, val2, and val3 existed, then val4 was added using the instant algorithm. For tables that have never had any columns added instantly, the value of INSTANT_COLS is 0.

Want to Read More?

This blog has just been an introduction to the new MySQL 8.0.12 feature of instant schema changes. I will recommend you to read Bin Xu’s blog as well as the documentation in the MySQL reference manual to fully understand the feature:

Conclusion

The new feature allowing instant schema changes is a great way to avoid time consuming operations. Particularly in replication setups where the replicated statement will block while it applies, the different between an online and an instant schema change is like night and day.

MySQL 8.0.12 can be downloaded from MySQL Community Downloads or you can install it through one of the “native installers” such as MySQL Installer for Microsoft Windows or one of our Linux repositories (Yum, APT, SUSE). If you are a customer and need the commercial version of MySQL Server, I will recommend you to download it from Patches & Updates in My Oracle Support (MOS); there is also a 30 days trial version available from Oracle Software Delivery Cloud.

Please do not hesitate to provide feedback. Bugs and feature requests can be logged in the MySQL Bugs database.

Which Character Set Should You Use in MySQL?

MySQL supports a long list of character sets, and one of the strong points are that you can use different character sets per column. This is very flexible, but which character set should you use?

If you do not want to read the whole blog the summary is: Use utf8mb4, particularly in MySQL 8.0 with one of the UCA 9.0.0 based collations.

Before I reach this conclusion, let’s start out looking at what a character set is.

What is a Character Set?

Characters are wonderful things that can take all kinds of shapes ranging from the familiar (for English speakers at least) Latin characters as those used to write this blog, over Arabic (مرحبا بالعالم) and Asian characters such as simplified Chinese (你好,世界) to emojis (❤️🐬). This is all cool, but computers do not deal in characters, they work in bits, so how are all of these wonderful characters represented?

The answer is that character sets are used to encode the characters into bits. Given the number of known characters (Unicode 11.0 contains 137374 characters). Just to cover the possibility of those characters, 18 bits are needed (so effectively 3 bytes). That is significant.

In the good old days when computers were new, it was mostly English that was used, and using 18 bits per character was just not feasible. A popular home computer in the 1980s was the Commodore 64 was an 8-bit computer with a total of 64kB of RAM – you just could not get away with using multiple bytes per character.

In the early days a popular character set was ASCII which uses 7 bits to encode each character. This allows for 128 unique characters which is ok for plain English but does not even allow for all of the Western European accented characters to be included. One answer to that are the Latin character sets, such as ISO 8859-1 (Latin 1) and Windows-1252. These use 8 bits and thus supports 256 characters for each character set. Together they allow complete coverage for the languages using the Latin letters, but you have to choose the right one for the language you are writing, and in all cases 8 bit does not help much if you need to write something in Chinese that has thousands of characters.

The current solution is to use Unicode characters. The Unicode Consortium defines code points for the characters included. As mentioned, in Unicode 11.0 there are just over 137000 characters. This covers most usages (yes, there are actually still characters missing). There is a bit more to it than that, however. One thing is the code points defined by the Unicode Consortium, but these still need bits representations for computers. There are several solutions to that with UTF-8 being the most commonly used nowadays.

UTF-8 is a variable width encoding that uses one to four bytes to represent a Unicode code point. Part of the success of UTF-8 is that the characters used in ASCII have the same encoding in UTF-8 – that is, if you have a document in ASCII, you can just say it is now in UTF-8 and all works well. That made it easy for the large number of ASCII based websites to migrate to use Unicode. The rest of the characters use one or more bytes. While the compatibility with ASCII has been great for adaptation and it helps keep the size of documents in English down, it also has some downsides. One is that some other languages uses more bytes that needed; another is that scanning a string is relatively expensive as you must decode each character to know where the next character begins.

Before moving on, let’s have a quick look at some actual encodings in different character sets, to see the difference (UCS2 is a two-byte fixed width Unicode character set):

CharacterASCIILatin-1UTF-8UCS-2
A4141410041
ÅC5C38500C5
E7958C754C
🐬F09F90AC

The table shows how there clearly are some relations between these character sets, but also how only UTF-8 can represent all of the four test characters. So, from this comparison the strengths of UTF-8 is starting to show, but also the weaknesses. The character Å requires two bytes in UTF-8 (and UCS-2) but only one in Latin-1. The Chinese character 界 requires three bytes in UTF-8 but only two in UCS-2.

So, what does this mean in the context of MySQL? Let’s take a look.

MySQL Character Sets

Until MySQL 8.0 the default character set in MySQL was Latin-1 (named latin1). This was a convenient character set in many ways, for example it was fixed width, so finding the Nth character in a string was fast and it could store text for most Western European languages. However as discussed, Latin-1 is not what is used in this day and age – the World has moved on to UTF-8. So, in MySQL 8.0 the change was made to make utf8mb4 the default character set.

Stop a minute – what is utf8mb4? How does that differ from UTF-8 that was discussed in the previous section? Well, it is the same thing. Unfortunately, when UTF-8 was first implemented in MySQL, it was limited to three bytes (in MySQL 8.0 called utf8mb3). This predates my involvement with MySQL, but a guess for this choice is that it is related to internal temporary tables which in MySQL 5.7 and earlier uses the MEMORY storage engine when possible. Internal temporary tables are for example used to store the result of subquery and for sorting. The MEMORY storage engine only supports fix width columns, so a varchar(10) column would be treated as a char(10) column in an in-memory internal temporary table. With utf8mb4 that would mean 40 bytes, with the choice of a 3-byte implementation it would mean 30 bytes. Furthermore, until the emergence of emojis, it was rarely required to use more than three bytes in UTF-8.

Anyway, MySQL has support for a wide range of character sets to suite your specific need. In MySQL 8 there are a total of 41 character sets to choose from. The details of the character sets can be found in the information_schema.CHARACTER_SETS table:

mysql-sql> SELECT *
             FROM information_schema.CHARACTER_SETS
            ORDER BY CHARACTER_SET_NAME;
+--------------------+----------------------+---------------------------------+--------+
| CHARACTER_SET_NAME | DEFAULT_COLLATE_NAME | DESCRIPTION                     | MAXLEN |
+--------------------+----------------------+---------------------------------+--------+
| armscii8           | armscii8_general_ci  | ARMSCII-8 Armenian              |      1 |
| ascii              | ascii_general_ci     | US ASCII                        |      1 |
| big5               | big5_chinese_ci      | Big5 Traditional Chinese        |      2 |
| binary             | binary               | Binary pseudo charset           |      1 |
| cp1250             | cp1250_general_ci    | Windows Central European        |      1 |
| cp1251             | cp1251_general_ci    | Windows Cyrillic                |      1 |
| cp1256             | cp1256_general_ci    | Windows Arabic                  |      1 |
| cp1257             | cp1257_general_ci    | Windows Baltic                  |      1 |
| cp850              | cp850_general_ci     | DOS West European               |      1 |
| cp852              | cp852_general_ci     | DOS Central European            |      1 |
| cp866              | cp866_general_ci     | DOS Russian                     |      1 |
| cp932              | cp932_japanese_ci    | SJIS for Windows Japanese       |      2 |
| dec8               | dec8_swedish_ci      | DEC West European               |      1 |
| eucjpms            | eucjpms_japanese_ci  | UJIS for Windows Japanese       |      3 |
| euckr              | euckr_korean_ci      | EUC-KR Korean                   |      2 |
| gb18030            | gb18030_chinese_ci   | China National Standard GB18030 |      4 |
| gb2312             | gb2312_chinese_ci    | GB2312 Simplified Chinese       |      2 |
| gbk                | gbk_chinese_ci       | GBK Simplified Chinese          |      2 |
| geostd8            | geostd8_general_ci   | GEOSTD8 Georgian                |      1 |
| greek              | greek_general_ci     | ISO 8859-7 Greek                |      1 |
| hebrew             | hebrew_general_ci    | ISO 8859-8 Hebrew               |      1 |
| hp8                | hp8_english_ci       | HP West European                |      1 |
| keybcs2            | keybcs2_general_ci   | DOS Kamenicky Czech-Slovak      |      1 |
| koi8r              | koi8r_general_ci     | KOI8-R Relcom Russian           |      1 |
| koi8u              | koi8u_general_ci     | KOI8-U Ukrainian                |      1 |
| latin1             | latin1_swedish_ci    | cp1252 West European            |      1 |
| latin2             | latin2_general_ci    | ISO 8859-2 Central European     |      1 |
| latin5             | latin5_turkish_ci    | ISO 8859-9 Turkish              |      1 |
| latin7             | latin7_general_ci    | ISO 8859-13 Baltic              |      1 |
| macce              | macce_general_ci     | Mac Central European            |      1 |
| macroman           | macroman_general_ci  | Mac West European               |      1 |
| sjis               | sjis_japanese_ci     | Shift-JIS Japanese              |      2 |
| swe7               | swe7_swedish_ci      | 7bit Swedish                    |      1 |
| tis620             | tis620_thai_ci       | TIS620 Thai                     |      1 |
| ucs2               | ucs2_general_ci      | UCS-2 Unicode                   |      2 |
| ujis               | ujis_japanese_ci     | EUC-JP Japanese                 |      3 |
| utf16              | utf16_general_ci     | UTF-16 Unicode                  |      4 |
| utf16le            | utf16le_general_ci   | UTF-16LE Unicode                |      4 |
| utf32              | utf32_general_ci     | UTF-32 Unicode                  |      4 |
| utf8               | utf8_general_ci      | UTF-8 Unicode                   |      3 |
| utf8mb4            | utf8mb4_0900_ai_ci   | UTF-8 Unicode                   |      4 |
+--------------------+----------------------+---------------------------------+--------+
41 rows in set (0.0010 sec)

Notice how there are several UTF and other Unicode character sets including utf8 and utf8m4. The utf8 character set is the 3-byte implementation. The name utf8 has now been deprecated in favour of utf8mb3 to make it specific that you are using an implementation that can at most use three bytes per character.

The table has four columns: The character set name, the default collation, a description, and the maximum number of bytes per character. The first and the two last of the columns are clear enough, but what is the collation? Let’s discuss that.

What is a Collation?

It is all well and good that you now know how to encode characters, but one of the main uses of databases is to compare data. Are two values the same or is one larger than the other? For numbers this is in general well defined (floating point comparisons and release numbers are examples where it is less clear), for example 10 is greater than 5.

However, for strings it is not that simple. Different languages have different rules for the order of the characters, and there are the questions whether an accented character should be considered the same as the base character and whether upper and lower case should be considered the same. An example is that in Danish and Norwegian, the character Ø comes before Å, but in Swedish Å comes before Ö (Ø and Ö are essentially the same letter).

The rules that define the order of the characters and whether to take accents and capitalization into account is called collations. There are many collations for the same character set. In MySQL 8.0 there are alone 73 collations to choose from for the utf8mb4 character set (more later about utf8mb4).

The collations that can be used with a given character set can be found in the information_schema.COLLATIONS table. For example, to see the 73 collations available for utf8mb4:

mysql-sql> > SELECT *
              FROM information_schema.COLLATIONS
             WHERE CHARACTER_SET_NAME = 'utf8mb4'
             ORDER BY COLLATION_NAME;
+----------------------------+--------------------+-----+------------+-------------+---------+---------------+
| COLLATION_NAME             | CHARACTER_SET_NAME | ID  | IS_DEFAULT | IS_COMPILED | SORTLEN | PAD_ATTRIBUTE |
+----------------------------+--------------------+-----+------------+-------------+---------+---------------+
| utf8mb4_0900_ai_ci         | utf8mb4            | 255 | Yes        | Yes         |       0 | NO PAD        |
| utf8mb4_0900_as_ci         | utf8mb4            | 305 |            | Yes         |       0 | NO PAD        |
| utf8mb4_0900_as_cs         | utf8mb4            | 278 |            | Yes         |       0 | NO PAD        |
| utf8mb4_bin                | utf8mb4            |  46 |            | Yes         |       1 | PAD SPACE     |
| utf8mb4_croatian_ci        | utf8mb4            | 245 |            | Yes         |       8 | PAD SPACE     |
| utf8mb4_cs_0900_ai_ci      | utf8mb4            | 266 |            | Yes         |       0 | NO PAD        |
| utf8mb4_cs_0900_as_cs      | utf8mb4            | 289 |            | Yes         |       0 | NO PAD        |
| utf8mb4_czech_ci           | utf8mb4            | 234 |            | Yes         |       8 | PAD SPACE     |
| utf8mb4_danish_ci          | utf8mb4            | 235 |            | Yes         |       8 | PAD SPACE     |
| utf8mb4_da_0900_ai_ci      | utf8mb4            | 267 |            | Yes         |       0 | NO PAD        |
| utf8mb4_da_0900_as_cs      | utf8mb4            | 290 |            | Yes         |       0 | NO PAD        |
| utf8mb4_de_pb_0900_ai_ci   | utf8mb4            | 256 |            | Yes         |       0 | NO PAD        |
| utf8mb4_de_pb_0900_as_cs   | utf8mb4            | 279 |            | Yes         |       0 | NO PAD        |
| utf8mb4_eo_0900_ai_ci      | utf8mb4            | 273 |            | Yes         |       0 | NO PAD        |
| utf8mb4_eo_0900_as_cs      | utf8mb4            | 296 |            | Yes         |       0 | NO PAD        |
| utf8mb4_esperanto_ci       | utf8mb4            | 241 |            | Yes         |       8 | PAD SPACE     |
| utf8mb4_estonian_ci        | utf8mb4            | 230 |            | Yes         |       8 | PAD SPACE     |
| utf8mb4_es_0900_ai_ci      | utf8mb4            | 263 |            | Yes         |       0 | NO PAD        |
| utf8mb4_es_0900_as_cs      | utf8mb4            | 286 |            | Yes         |       0 | NO PAD        |
| utf8mb4_es_trad_0900_ai_ci | utf8mb4            | 270 |            | Yes         |       0 | NO PAD        |
| utf8mb4_es_trad_0900_as_cs | utf8mb4            | 293 |            | Yes         |       0 | NO PAD        |
| utf8mb4_et_0900_ai_ci      | utf8mb4            | 262 |            | Yes         |       0 | NO PAD        |
| utf8mb4_et_0900_as_cs      | utf8mb4            | 285 |            | Yes         |       0 | NO PAD        |
| utf8mb4_general_ci         | utf8mb4            |  45 |            | Yes         |       1 | PAD SPACE     |
| utf8mb4_german2_ci         | utf8mb4            | 244 |            | Yes         |       8 | PAD SPACE     |
| utf8mb4_hr_0900_ai_ci      | utf8mb4            | 275 |            | Yes         |       0 | NO PAD        |
| utf8mb4_hr_0900_as_cs      | utf8mb4            | 298 |            | Yes         |       0 | NO PAD        |
| utf8mb4_hungarian_ci       | utf8mb4            | 242 |            | Yes         |       8 | PAD SPACE     |
| utf8mb4_hu_0900_ai_ci      | utf8mb4            | 274 |            | Yes         |       0 | NO PAD        |
| utf8mb4_hu_0900_as_cs      | utf8mb4            | 297 |            | Yes         |       0 | NO PAD        |
| utf8mb4_icelandic_ci       | utf8mb4            | 225 |            | Yes         |       8 | PAD SPACE     |
| utf8mb4_is_0900_ai_ci      | utf8mb4            | 257 |            | Yes         |       0 | NO PAD        |
| utf8mb4_is_0900_as_cs      | utf8mb4            | 280 |            | Yes         |       0 | NO PAD        |
| utf8mb4_ja_0900_as_cs      | utf8mb4            | 303 |            | Yes         |       0 | NO PAD        |
| utf8mb4_ja_0900_as_cs_ks   | utf8mb4            | 304 |            | Yes         |      24 | NO PAD        |
| utf8mb4_latvian_ci         | utf8mb4            | 226 |            | Yes         |       8 | PAD SPACE     |
| utf8mb4_la_0900_ai_ci      | utf8mb4            | 271 |            | Yes         |       0 | NO PAD        |
| utf8mb4_la_0900_as_cs      | utf8mb4            | 294 |            | Yes         |       0 | NO PAD        |
| utf8mb4_lithuanian_ci      | utf8mb4            | 236 |            | Yes         |       8 | PAD SPACE     |
| utf8mb4_lt_0900_ai_ci      | utf8mb4            | 268 |            | Yes         |       0 | NO PAD        |
| utf8mb4_lt_0900_as_cs      | utf8mb4            | 291 |            | Yes         |       0 | NO PAD        |
| utf8mb4_lv_0900_ai_ci      | utf8mb4            | 258 |            | Yes         |       0 | NO PAD        |
| utf8mb4_lv_0900_as_cs      | utf8mb4            | 281 |            | Yes         |       0 | NO PAD        |
| utf8mb4_persian_ci         | utf8mb4            | 240 |            | Yes         |       8 | PAD SPACE     |
| utf8mb4_pl_0900_ai_ci      | utf8mb4            | 261 |            | Yes         |       0 | NO PAD        |
| utf8mb4_pl_0900_as_cs      | utf8mb4            | 284 |            | Yes         |       0 | NO PAD        |
| utf8mb4_polish_ci          | utf8mb4            | 229 |            | Yes         |       8 | PAD SPACE     |
| utf8mb4_romanian_ci        | utf8mb4            | 227 |            | Yes         |       8 | PAD SPACE     |
| utf8mb4_roman_ci           | utf8mb4            | 239 |            | Yes         |       8 | PAD SPACE     |
| utf8mb4_ro_0900_ai_ci      | utf8mb4            | 259 |            | Yes         |       0 | NO PAD        |
| utf8mb4_ro_0900_as_cs      | utf8mb4            | 282 |            | Yes         |       0 | NO PAD        |
| utf8mb4_ru_0900_ai_ci      | utf8mb4            | 306 |            | Yes         |       0 | NO PAD        |
| utf8mb4_ru_0900_as_cs      | utf8mb4            | 307 |            | Yes         |       0 | NO PAD        |
| utf8mb4_sinhala_ci         | utf8mb4            | 243 |            | Yes         |       8 | PAD SPACE     |
| utf8mb4_sk_0900_ai_ci      | utf8mb4            | 269 |            | Yes         |       0 | NO PAD        |
| utf8mb4_sk_0900_as_cs      | utf8mb4            | 292 |            | Yes         |       0 | NO PAD        |
| utf8mb4_slovak_ci          | utf8mb4            | 237 |            | Yes         |       8 | PAD SPACE     |
| utf8mb4_slovenian_ci       | utf8mb4            | 228 |            | Yes         |       8 | PAD SPACE     |
| utf8mb4_sl_0900_ai_ci      | utf8mb4            | 260 |            | Yes         |       0 | NO PAD        |
| utf8mb4_sl_0900_as_cs      | utf8mb4            | 283 |            | Yes         |       0 | NO PAD        |
| utf8mb4_spanish2_ci        | utf8mb4            | 238 |            | Yes         |       8 | PAD SPACE     |
| utf8mb4_spanish_ci         | utf8mb4            | 231 |            | Yes         |       8 | PAD SPACE     |
| utf8mb4_sv_0900_ai_ci      | utf8mb4            | 264 |            | Yes         |       0 | NO PAD        |
| utf8mb4_sv_0900_as_cs      | utf8mb4            | 287 |            | Yes         |       0 | NO PAD        |
| utf8mb4_swedish_ci         | utf8mb4            | 232 |            | Yes         |       8 | PAD SPACE     |
| utf8mb4_tr_0900_ai_ci      | utf8mb4            | 265 |            | Yes         |       0 | NO PAD        |
| utf8mb4_tr_0900_as_cs      | utf8mb4            | 288 |            | Yes         |       0 | NO PAD        |
| utf8mb4_turkish_ci         | utf8mb4            | 233 |            | Yes         |       8 | PAD SPACE     |
| utf8mb4_unicode_520_ci     | utf8mb4            | 246 |            | Yes         |       8 | PAD SPACE     |
| utf8mb4_unicode_ci         | utf8mb4            | 224 |            | Yes         |       8 | PAD SPACE     |
| utf8mb4_vietnamese_ci      | utf8mb4            | 247 |            | Yes         |       8 | PAD SPACE     |
| utf8mb4_vi_0900_ai_ci      | utf8mb4            | 277 |            | Yes         |       0 | NO PAD        |
| utf8mb4_vi_0900_as_cs      | utf8mb4            | 300 |            | Yes         |       0 | NO PAD        |
+----------------------------+--------------------+-----+------------+-------------+---------+---------------+
73 rows in set (0.0023 sec)

The most interesting in this discussion is the name. As you can see there is a pattern to the names, and it is possible to identify up to four parts in the name

  • First the character set name.
  • Then which language it is for, for example ja for Japanese. A special “language” is binary which mean each byte is compared directly one by one.
  • Then whether it is a UCA 9.0.0 based collation. The UCA 9.0.0 based collations have 0900 in the name.
  • Then up to three modifiers depending on whether it is accent and/or case sensitive or insensitive.

There are a total of five accent and case modifiers:

  • ai: accent insensitive, ä is equal to a.
  • as: accent sensitive, ä is not equal to a.
  • ci: case insensitive, a is equal to A.
  • cs: case sensitive, a is not equal to A.
  • ks: kana sensitive (only for Japanese)

This brings us on to MySQL 8.0 and character sets and collations.

MySQL 8.0

Good job of making it this far. You are almost at the end now.

As part of the work to make utf8mb4 the default character set in MySQL 8.0, a large amount of work was put into making the MySQL work more efficiently with the UTF-8 character set and to make it more useful for the users. Some of the changes include:

  • The UCA 9.0.0 collations (with 0900 in the name) are new in MySQL 8.0. MySQL works faster with these new collations than the old collations.
  • In-memory internal temporary tables now use the new Temptable storage engine by default. It supports variable width columns making it much more efficient with UTF-8 than earlier versions of MySQL. From version 8.0.12 (just released) the Temptable storage engine also supports BLOBs.

Another thing to be aware of is that the new X DevAPI that for example allows NoSQL access to both the MySQL Document store with JSON documents and to SQL tables expect all query results to be UTF-8.

For these reasons it is recommended to use utf8mb4 with one of the new UCA 9.0.0 collations for most uses in MySQL 8.0. The default collation for utf8mb4 in MySQL 8.0 is utf8mb4_0900_ai_ci. This is a good collation for the general use cases, but feel free to use one of the more specific collations if that works better for your application. For example, the utf8mb4_ja_0900_as_cs or utf8mb4_ja_0900_as_cs_ks if you need correct Japanese comparisons (such as the fix for the infamous Sushi = Beer bug).

Character sets and collations are big topics. If you are interested to know more about MySQL 8.0 and character sets, there is a series of blogs on the MySQL Server Blog, for example:

Additionally, the MySQL reference manual has a chapter with 15 sections about character sets and collations: Chapter 10 Character Sets, Collations, Unicode.