Since MySQL 5.6, the digest feature of the MySQL Performance Schema has provided a convenient and effective way to obtain statistics of queries based on their normalized form. The feature works so well that it has almost completely (from my experience) replaced the connector extensions and proxy for collecting query statistics for the Query Analyzer (Quan) in MySQL Enterprise Monitor (MEM).
MySQL 8 adds further improvements to the digest feature in the Performance Schema including a sample query with statistics for each digest, percentile information, and a histogram summary. This blog will explore these new features.
Let’s start out looking at the the good old summary by digest table.
The base table for digest summary information is the events_statements_summary_by_digest table. This has been around since MySQL 5.6. In MySQL 8.0 it has been extended with six columns of which three have data related to a sample query will be examined in this section.
The three sample columns are:
QUERY_SAMPLE_TEXT: An actual example of a query.
QUERY_SAMPLE_SEEN: When the sample query was seen.
QUERY_SAMPLE_TIMER_WAIT: How long time the sample query took to execute (in picoseconds).
As an example consider the query SELECT * FROM world.city WHERE id = <value>. The sample information for that query as well as the digest and digest text (normalized query) may look like:
The digest in MySQL 8 is a sha256 hash whereas in 5.6 and 5.7 it was an md5 hash.
The digest text is similar to the normalized query that the mysqldumpslow script can generate for queries in the slow query log; just that the Performance Schema uses a question mark as a placeholder.
The QUERY_SAMPLE_SEEN value is in the system time zone.
The sys.format_time() function is in the query used to convert the picoseconds to a human readable value.
The maximum length of the sample text is set with the performance_schema_max_sql_text_length option. The default is 1024 bytes. It is the same option that is used for the SQL_TEXT columns in the statement events tables. It requires a restart of MySQL to change the value. Since the query texts are stored in several contexts and some of the Performance Schema tables can have thousands of rows, do take care not to increase it beyond what you have memory for.
How is the sample query chosen? The sample is the slowest example of a query with the given digest. If the performance_schema_max_digest_sample_age option is set to a non-zero value (the default is 60 seconds) and the existing sample is older than the specified value, it will always be replaced.
The events_statements_summary_by_digest also has another set of new columns: percentile information.
Since the beginning, the events_statements_summary_by_digest table has included some statistical information about the query times for a given digest: the minimum, average, maximum, and total query time. In MySQL 8 this has been extended to include information about the 95th, 99th, and 99.9th percentile. The information is available in the QUANTILE_95, QUANTILE_99, and QUANTILE_999 column respectively. All of the values are in picoseconds.
What does the new columns mean? Based on the histogram information of the query (see the next section), MySQL calculates a high estimate of the query time. For a given digest, 95% of the executed queries are expected to be faster than the query time given by QUANTILE_95. Similar for the two other columns.
Having the 95th, 99th, and 99.9th percentile helps predict the performance of a query and show the spread of the query times. Even more information about the spread can be found using the new family member: histograms.
Histograms is a way to put the query execution times into buckets, so it is possible to see how the query execution times spread. This can for example be useful to see how evenly the query time is. The average query time may be fine, but if that is based on some queries executing super fast and others very slow, it will still result in unhappy users and customers.
The MAX_TIMER_WAIT column of the events_statements_summary_by_digest table discussed this far shows the high watermark, but it does not say whether it is a single outlier or a result of general varying query times. The histograms give the answer to this.
In this example, 3694 times (the COUNT_BUCKET column) when the query were executed, the query time was between 63.10 microseconds and 66.07 microseconds, so the execution time matched the interval of bucket number 41. There has been at total of 7322 executions (the COUNT_BUCKET_AND_LOWER column) of the query with a query time of 66.07 microseconds or less. This means that 73.22% (the BUCKET_QUANTILE column) of the queries have a query time of 66.07 microseconds or less.
In addition to the shown columns, there is SCHEMA_NAME and DIGEST (which together with BUCKET_NUMBER form a unique key). For each digest there are 450 buckets with the width of the bucket (in terms of difference between the low and high timers) gradually becoming larger and larger. The first, middle, and last five buckets are:
The bucket thresholds are fixed and thus the same for all digests. There is also a global histogram in the events_statements_histogram_global.
This includes the introduction to the new Performance Schema digest features. As monitoring tools start to use this information, it will help create a better monitoring experience. Particularly the histograms will benefit from being shown as graphs.
The traditional way to monitor replication in MySQL is the SHOW SLAVE STATUS command. However as it will be shown, it has its limitations and in MySQL 5.7 and 8.0 the MySQL developers have started to implement the information as Performance Schema tables. This has several advantages including better monitoring of the replication delay in MySQL 8.0. This blog discusses why SHOW SLAVE STATUS should be replaced with the Performance Schema tables.
The replication setup that will be used for the examples in this blog can be seen in the following figure.
There are two source instances (replication masters). Source 1 replicates to the Relay instance (i.e. it acts both as a replica and source in the setup). The Relay instance replicates to the Replica instance which also replicates from Source 2. That is, the Replica instance uses multi-source replication to replicate from the Source 1 → Relay chain as well as directly from Source 2.
This blog will use the Replica instance to demonstrate SHOW SLAVE STATUS and Performance Schema replication tables.
SHOW SLAVE STATUS
The SHOW SLAVE STATUS command has been around since the addition of replication to MySQL. Thus it is familiar to all database administrators who have been working with replication. It is also very simple to use, and it is easy to remember the syntax. So far so good. However, it also has some limitations.
Let’s take a look at how the output of SHOW SLAVE STATUS looks in the setup described above:
The first thought: that’s a lot of lines of output. That is one of the issues – there is no way to limit the output. So summarize some of the issues with SHOW SLAVE STATUS:
There is no support for filter conditions or choosing which columns to include, so the output is very verbose. In this case with two replication channels, all available data for both channels are always included.
The order of the columns reflects the order they were added rather than how they logically belong together. Over the years many new columns have been added as new features have been added or the feature has been changed from an option configured in my.cnf to an option configured with CHANGE MASTER TO. For example the channel name is the fourth column from the end even if it would be more natural to have it as the first column (as it’s it the “primary key” of the output).
For multi-threaded appliers (replication_slave_workers > 1) there is no information for the individual workers.
Information related to the connection (I/O thread) and applier (SQL thread) as well configuration and status are mixed.
What does Seconds_behind_master mean? For the source_2 channel (the direct replication from Source 2 it is relatively easy to understand, but for the relay channel is it relative to Source 1 (yes) or to Replica (no)? More about this later.
To look at what can be done to solve these limitations, let’s look at the Performance Schema.
Performance Schema Replication Tables
To overcome these limitations, MySQL 5.7 introduced a series of replication tables in the Performance Schema. These have been extended in MySQL 8.0 to make them even more useful. One of the advantages of using the Performance Schema is that queries are executed as regular SELECT statements with the usual support for choosing columns and manipulating them and to apply a WHERE clause. First, let’s take a look at which replication related tables that are available.
Overview of Tables
As of MySQL 8.0.12 there are 11 replication related tables. The tables are:
log_status: This table is new in MySQL 8 and provides information about the binary log, relay log, and InnoDB redo log in a consistent manner (i.e. all values are for the same point in time).
replication_applier_configuration: This table shows the configuration of the applier threads for each replication channel. Currently the only setting is the configured replication delay.
replication_applier_status: This table shows the replication filters for each replication channel. The information includes the service state, remaining delay, and number of transaction retries.
replication_applier_status_by_coordinator: For multi-threaded replicas this table shows the status of the thread that manages the actual worker threads. In MySQL 8 this includes several timestamps to give detailed knowledge of the replication delay.
replication_applier_status_by_worker: The status of each worker thread (for single-threaded replication there is one per channel). In MySQL 8 this includes several timestamps to give detailed knowledge of the replication delay.
replication_connection_status: The status of the replication channels. In MySQL 8 this includes information about the timestamps showing when the currently queuing transaction was originally committed, when it was committed on the immediate source instance, and when it was written to the relay log. This makes it possible to describe much more accurately what the replication delay is.
The Group Replication tables will not be discussed further.
Since the information from SHOW SLAVE STATUS has been split up into several tables, it can be useful to take a look at how the information map.
Old Versus New
The following table shows how to get from a column in the SHOW SLAVE STATUS output to a table and column in the Performance Schema. The channel name is present in all of the Performance Schema replication tables (it’s the primary key or part of it). The replication filters and rewrite rules are split into two tables. The I/O and SQL thread states can be found in the performance_schema.threads by joining using the THREAD_ID column for the corresponding threads.
As it can be seen, there are a few columns from SHOW SLAVE STATUS that do not have any corresponding tables and columns in the Performance Schema yet. One that probably is familiar to many as the main mean of monitoring the replication lag is the the Seconds_Behind_Master column. This is no longer needed. It is now possible to get a better value using the timestamp columns in the replication_applier_status_by_coordinator, replication_applier_status_by_worker, and replication_connection_status tables. Talking about that, it is time to see the Performance Schema replication tables in action.
The rest of the blog shows example outputs each of the replication tables (except the ones related to Group Replication) in the Performance Schema. For some of the tables there is a short discussion following the output. The queries have been executed in rapid succession after the above SHOW SLAVE STATUS output was generated. As the outputs have been generated using separate queries, they do not correspond to the exact same point in time.
The log_status table shows the replication and engine log data so the data is consistent:
There is one filter specifically for the relay channel: the channel will ignore changes to tables in the world schema and the filter was set using the the replicate_wild_ignore_table option in the MySQL configuration file.
The replication_applier_global_filters table shows the replication filters that are shared for all channels:
This is an example of MySQL 8 has detailed information about the timings were for the various stages of the applied events. For example for the relay channel, it can be seen that for the last processed transaction, it took 18 seconds from the transaction was committed on Source 1 (original commit) until it was committed on Relay (immediate commit), but then it only took around half a second until the coordinate was done processing the transaction (i.e. sending it to a worker). Which brings us to the status by worker.
The replication_applier_status_by_worker table shows the status for each worker thread:
The timestamps for the relay channel’s workers (only one has been active as it can be seen) can be used to see that the last transaction took around 19 seconds to apply and it was committed also 19 seconds after it committed on the immediate source (the Relay instance).
You can compare this delay of 19 seconds with the 49 seconds claimed by Seconds_Behind_Master in the SHOW SLAVE STATUS output. Why the difference? Seconds_Behind_Master is really the time from the original source started to execute the current transaction until now. So that includes the time it took to execute the transaction not only on Source 1 but also on Relay and the time used until now on Replica.
The replication_connection_configuration table shows the configuration for each connection to the source of the replication:
Similar to the applier timestamps, the connection timestamps can be used to check the lag caused by the connection threads fetching the transactions from its source’s binary log and writing it to its own relay log. For the relay channel it took around 0.14 second from the transaction was committed on the immediate source (the Relay instance) until the connection thread started to write the transaction to the relay log (LAST_QUEUED_TRANSACTION_START_QUEUE_TIMESTAMP) and then further 0.007 second to complete the write.
The replication related tables in the Performance Schema, particularly the MySQL 8 version of them, provides a very useful way to get information about the replication configuration and status. As a support engineer myself, I look forward to have access to the new timestamp values, when I investigate replication delays.
There is still some work remaining such. For example, it could be useful to know when the transaction started on the original source. That way it is possible to compare the execution time between a source and a replica. However, the current information that is available is already a great improvement.
There are various ways to shutdown MySQL. The traditional cross platform method is to use the shutdown command in the mysqladmin client. One drawback is that it requires shell access; another is that it cannot start MySQL again automatically. There are platform specific options that can perform a restart such as using systemctl on Linux or install MySQL as a service on Microsoft Windows. What I will look at here though is the built in support for stopping and restarting MySQL using SQL statements.
MySQL 5.7 added the SHUTDOWN statement which allows you to shut down MySQL using the MySQL command-line client or MySQL Shell. The command is straight forward to use:
You will need the SHUTDOWN privilege to use the statement – this is the same as it required to use mysqladmin to shutdown MySQL. There is one gotcha to be aware of with the SHUTDOWN statement: it only works with the old (traditional) MySQL protocol. If you attempt to use it when connected to MySQL using the new X Protocol, you get the error: ERROR: 3130: Command not supported by pluggable protocols as shown in the next example:
The RESTART statement, on the other hand, works through both protocols and also requires the SHUTDOWN privilege:
For the restart to work, it is necessary that MySQL has been started in presence of a “monitoring service”. This is the default on Microsoft Windows (to disable the monitoring service start MySQL with --no-monitor). On Linux the monitoring service can for example be systemd or mysqld_safe.
As an example of where the RESTART statement comes in handy is for MySQL Shell’s AdminAPI for administrating a MySQL InnoDB Cluster cluster. MySQL Shell can when connected to MySQL Server 8.0 use the new SET PERSIST syntax to make the required configuration changes and then use the RESTART statement to restart the instance to make non-dynamic configuration changes take effect.
The SHUTDOWN and RESTART statements may not be the most important changes in MySQL 5.7 and 8.0, but they can be handy to know of in some cases.