Replication Monitoring with the Performance Schema

The traditional way to monitor replication in MySQL is the SHOW SLAVE STATUS command. However, as will be shown, it has its limitations, and in MySQL 5.7 and 8.0 the MySQL developers have started to expose the same information through Performance Schema tables. This has several advantages, including better monitoring of the replication delay in MySQL 8.0. This blog discusses why SHOW SLAVE STATUS should be replaced with the Performance Schema tables.

The Setup

The replication setup that will be used for the examples in this blog can be seen in the following figure.

Replication Setup with Multi-Source and Chained Replication

There are two source instances (replication masters). Source 1 replicates to the Relay instance (which acts as both a replica and a source in the setup). The Relay instance replicates to the Replica instance, which also replicates from Source 2. That is, the Replica instance uses multi-source replication to replicate from the Source 1 → Relay chain as well as directly from Source 2.

This blog will use the Replica instance to demonstrate SHOW SLAVE STATUS and Performance Schema replication tables.

SHOW SLAVE STATUS

The SHOW SLAVE STATUS command has been around since the addition of replication to MySQL. Thus it is familiar to all database administrators who have been working with replication. It is also very simple to use, and it is easy to remember the syntax. So far so good. However, it also has some limitations.

Let’s take a look at how the output of SHOW SLAVE STATUS looks in the setup described above:
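As a minimal sketch (the full output is not reproduced here), the status was queried from the mysql command-line client with the vertical output terminator so each of the many columns is printed on its own line, once per replication channel:

    SHOW SLAVE STATUS\G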

The first thought: that’s a lot of lines of output. That is one of the issues – there is no way to limit the output. To summarize some of the issues with SHOW SLAVE STATUS:

  • There is no support for filter conditions or choosing which columns to include, so the output is very verbose. In this case with two replication channels, all available data for both channels is always included.
  • The order of the columns reflects the order in which they were added rather than how they logically belong together. Over the years many new columns have been added as new features were introduced or a feature was changed from an option configured in my.cnf to an option configured with CHANGE MASTER TO. For example, the channel name is the fourth column from the end even though it would be more natural to have it as the first column (as it is the “primary key” of the output).
  • For multi-threaded appliers (slave_parallel_workers > 1) there is no information for the individual workers.
  • Information related to the connection (I/O thread) and the applier (SQL thread), as well as configuration and status, is mixed together.
  • What does Seconds_behind_master mean? For the source_2 channel (the direct replication from Source 2) it is relatively easy to understand, but for the relay channel, is it relative to Source 1 (yes) or to Relay (no)? More about this later.

To see what can be done to address these limitations, let’s turn to the Performance Schema.

Performance Schema Replication Tables

To overcome these limitations, MySQL 5.7 introduced a series of replication tables in the Performance Schema. These have been extended in MySQL 8.0 to make them even more useful. One of the advantages of using the Performance Schema is that queries are executed as regular SELECT statements with the usual support for choosing columns, manipulating them, and applying a WHERE clause. First, let’s take a look at which replication-related tables are available.
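For example (a hedged sketch; the channel name follows the setup above), the connection state of just the source_2 channel can be retrieved with an ordinary SELECT:

    SELECT CHANNEL_NAME, SERVICE_STATE, LAST_ERROR_NUMBER
      FROM performance_schema.replication_connection_status
     WHERE CHANNEL_NAME = 'source_2';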

Overview of Tables

As of MySQL 8.0.12 there are 11 replication related tables. The tables are:

  • log_status: This table is new in MySQL 8 and provides information about the binary log, relay log, and InnoDB redo log in a consistent manner (i.e. all values are for the same point in time).
  • Applier:
    • replication_applier_configuration: The configuration of the applier for each of the replication channels.
    • replication_applier_filters: The channel specific replication filters (new in MySQL 8).
    • replication_applier_global_filters: The replication filters shared by all channels (new in MySQL 8).
    • replication_applier_status: The overall status of the appliers.
    • replication_applier_status_by_coordinator: The status of the coordinator thread for channels using a multi-threaded applier.
    • replication_applier_status_by_worker: The status of each applier worker thread. In MySQL 8 this includes timestamps for the last applied and currently applying transactions.
  • Connection:
    • replication_connection_configuration: The configuration of each of the replication channels.
    • replication_connection_status: The status of the replication channels. In MySQL 8 this includes information about the timestamps showing when the currently queuing transaction was originally committed, when it was committed on the immediate source instance, and when it was written to the relay log. This makes it possible to describe much more accurately what the replication delay is.
  • Group Replication:
    • replication_group_members: Network and status information for the members of the group.
    • replication_group_member_stats: Statistics for the members of the group.

The Group Replication tables will not be discussed further.
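If you want to verify which replication tables are available in the version you are running, a simple sketch is to query the Information Schema (log_status is included explicitly as its name does not start with replication_):

    SELECT TABLE_NAME
      FROM information_schema.TABLES
     WHERE TABLE_SCHEMA = 'performance_schema'
           AND (TABLE_NAME LIKE 'replication\_%' OR TABLE_NAME = 'log_status')
     ORDER BY TABLE_NAME;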

Since the information from SHOW SLAVE STATUS has been split up into several tables, it can be useful to take a look at how the information maps.

Old Versus New

The following table shows how to get from a column in the SHOW SLAVE STATUS output to a table and column in the Performance Schema. The channel name is present in all of the Performance Schema replication tables (it is the primary key or part of it). The replication filters and rewrite rules are split into two tables. The I/O and SQL thread states can be found in the performance_schema.threads table by joining on the THREAD_ID column for the corresponding threads.
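As a sketch of such joins (the connection status and worker status tables both expose the THREAD_ID of their thread), the thread states can be retrieved as:

    -- I/O (connection) thread state per channel
    SELECT rcs.CHANNEL_NAME, t.PROCESSLIST_STATE
      FROM performance_schema.replication_connection_status rcs
           INNER JOIN performance_schema.threads t
                 ON t.THREAD_ID = rcs.THREAD_ID;

    -- Applier worker thread states per channel
    SELECT w.CHANNEL_NAME, w.WORKER_ID, t.PROCESSLIST_STATE
      FROM performance_schema.replication_applier_status_by_worker w
           INNER JOIN performance_schema.threads t
                 ON t.THREAD_ID = w.THREAD_ID;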

SHOW SLAVE STATUS Column      | Performance Schema Table                                                        | Performance Schema Column
Slave_IO_State                | threads                                                                         | PROCESSLIST_STATE
Master_Host                   | replication_connection_configuration                                            | HOST
Master_User                   | replication_connection_configuration                                            | USER
Master_Port                   | replication_connection_configuration                                            | PORT
Connect_Retry                 | replication_connection_configuration                                            | CONNECTION_RETRY_INTERVAL
Master_Log_File               | (none)                                                                          | (none)
Read_Master_Log_Pos           | (none)                                                                          | (none)
Relay_Log_File                | log_status                                                                      | REPLICATION->>'$.channels[*].relay_log_file'
Relay_Log_Pos                 | log_status                                                                      | REPLICATION->>'$.channels[*].relay_log_position'
Relay_Master_Log_File         | (none)                                                                          | (none)
Slave_IO_Running              | replication_connection_status                                                   | SERVICE_STATE
Slave_SQL_Running             | replication_applier_status_by_coordinator                                       | SERVICE_STATE
Replicate_Do_DB               | replication_applier_filters, replication_applier_global_filters                 |
Replicate_Ignore_DB           | replication_applier_filters, replication_applier_global_filters                 |
Replicate_Do_Table            | replication_applier_filters, replication_applier_global_filters                 |
Replicate_Ignore_Table        | replication_applier_filters, replication_applier_global_filters                 |
Replicate_Wild_Do_Table       | replication_applier_filters, replication_applier_global_filters                 |
Replicate_Wild_Ignore_Table   | replication_applier_filters, replication_applier_global_filters                 |
Last_Errno                    | (none)                                                                          | (none)
Last_Error                    | (none)                                                                          | (none)
Skip_Counter                  | (none)                                                                          | (none)
Exec_Master_Log_Pos           | (none)                                                                          | (none)
Relay_Log_Space               | (none)                                                                          | (none)
Until_Condition               | (none)                                                                          | (none)
Until_Log_File                | (none)                                                                          | (none)
Until_Log_Pos                 | (none)                                                                          | (none)
Master_SSL_Allowed            | replication_connection_configuration                                            | SSL_ALLOWED
Master_SSL_CA_File            | replication_connection_configuration                                            | SSL_CA_FILE
Master_SSL_CA_Path            | replication_connection_configuration                                            | SSL_CA_PATH
Master_SSL_Cert               | replication_connection_configuration                                            | SSL_CERTIFICATE
Master_SSL_Cipher             | replication_connection_configuration                                            | SSL_CIPHER
Master_SSL_Key                | replication_connection_configuration                                            | SSL_KEY
Seconds_Behind_Master         | (none)                                                                          | (none)
Master_SSL_Verify_Server_Cert | replication_connection_configuration                                            | SSL_VERIFY_SERVER_CERTIFICATE
Last_IO_Errno                 | replication_connection_status                                                   | LAST_ERROR_NUMBER
Last_IO_Error                 | replication_connection_status                                                   | LAST_ERROR_MESSAGE
Last_SQL_Errno                | replication_applier_status_by_worker, replication_applier_status_by_coordinator | LAST_ERROR_NUMBER
Last_SQL_Error                | replication_applier_status_by_worker, replication_applier_status_by_coordinator | LAST_ERROR_MESSAGE
Replicate_Ignore_Server_Ids   | (none)                                                                          | (none)
Master_Server_Id              | (none)                                                                          | (none)
Master_UUID                   | replication_connection_status                                                   | SOURCE_UUID
Master_Info_File              | (none)                                                                          | (none)
SQL_Delay                     | replication_applier_configuration                                               | DESIRED_DELAY
SQL_Remaining_Delay           | replication_applier_status                                                      | REMAINING_DELAY
Slave_SQL_Running_State       | threads                                                                         | PROCESSLIST_STATE
Master_Retry_Count            | replication_connection_configuration                                            | CONNECTION_RETRY_COUNT
Master_Bind                   | replication_connection_configuration                                            | NETWORK_INTERFACE
Last_IO_Error_Timestamp       | replication_connection_status                                                   | LAST_ERROR_TIMESTAMP
Last_SQL_Error_Timestamp      | replication_applier_status_by_worker, replication_applier_status_by_coordinator | LAST_ERROR_TIMESTAMP
Master_SSL_Crl                | replication_connection_configuration                                            | SSL_CRL_FILE
Master_SSL_Crlpath            | replication_connection_configuration                                            | SSL_CRL_PATH
Retrieved_Gtid_Set            | replication_connection_status                                                   | RECEIVED_TRANSACTION_SET
Executed_Gtid_Set             | (none)                                                                          | (none)
Auto_Position                 | replication_connection_configuration                                            | AUTO_POSITION
Replicate_Rewrite_DB          | replication_applier_filters, replication_applier_global_filters                 |
Channel_Name                  | all replication tables                                                          | CHANNEL_NAME
Master_TLS_Version            | replication_connection_configuration                                            | TLS_VERSION
Master_public_key_path        | replication_connection_configuration                                            | PUBLIC_KEY_PATH
Get_master_public_key         | replication_connection_configuration                                            | GET_PUBLIC_KEY

As can be seen, there are a few columns from SHOW SLAVE STATUS that do not yet have corresponding tables and columns in the Performance Schema. One that is probably familiar to many as the main means of monitoring the replication lag is the Seconds_Behind_Master column. This is no longer needed, however. It is now possible to get a better value using the timestamp columns in the replication_applier_status_by_coordinator, replication_applier_status_by_worker, and replication_connection_status tables. Talking about that, it is time to see the Performance Schema replication tables in action.
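As a sketch of what such a query can look like (using the MySQL 8.0 timestamp columns of the worker table; how you aggregate or format the lag is up to you), the lag of the last applied transaction can be calculated both relative to the original source and to the immediate source:

    SELECT CHANNEL_NAME,
           LAST_APPLIED_TRANSACTION,
           TIMESTAMPDIFF(MICROSECOND,
                         LAST_APPLIED_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP,
                         LAST_APPLIED_TRANSACTION_END_APPLY_TIMESTAMP) / 1000000 AS lag_vs_original,
           TIMESTAMPDIFF(MICROSECOND,
                         LAST_APPLIED_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP,
                         LAST_APPLIED_TRANSACTION_END_APPLY_TIMESTAMP) / 1000000 AS lag_vs_immediate
      FROM performance_schema.replication_applier_status_by_worker;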

Examples

The rest of the blog shows example outputs for each of the replication tables (except the ones related to Group Replication) in the Performance Schema. For some of the tables there is a short discussion following the output. The queries were executed in rapid succession after the above SHOW SLAVE STATUS output was generated. As the outputs were generated using separate queries, they do not correspond to the exact same point in time.

log_status

The log_status table shows the replication and storage engine log data in a consistent manner (i.e. all values correspond to the same point in time):
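For reference, the data can be retrieved with a plain SELECT (the REPLICATION and STORAGE_ENGINES columns are JSON documents, so vertical output is the easiest to read):

    SELECT SERVER_UUID, LOCAL, REPLICATION, STORAGE_ENGINES
      FROM performance_schema.log_status\G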

replication_applier_configuration

The replication_applier_configuration table shows the configuration of the applier threads:

replication_applier_filters

The replication_applier_filters table shows the channel specific replication filters:

There is one filter specifically for the relay channel: the channel will ignore changes to tables in the world schema. The filter was set using the replicate_wild_ignore_table option in the MySQL configuration file.
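A sketch of the query used to list the channel specific filters:

    SELECT CHANNEL_NAME, FILTER_NAME, FILTER_RULE, CONFIGURED_BY
      FROM performance_schema.replication_applier_filters;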

replication_applier_global_filters

The replication_applier_global_filters table shows the replication filters that are shared for all channels:

There is also one global replication filter. This has been set using the CHANGE REPLICATION FILTER statement.
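As a hedged example (the actual filter rule used in this setup is not reproduced here, and the database name below is purely illustrative), a global filter can be added like this; omitting the FOR CHANNEL clause makes the filter apply to all channels:

    CHANGE REPLICATION FILTER REPLICATE_IGNORE_DB = (test);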

replication_applier_status

The replication_applier_status table shows the overall status for the applier threads:

replication_applier_status_by_coordinator

The replication_applier_status_by_coordinator table shows the status of the coordinator when a multi-threaded applier has been configured (slave_parallel_workers > 1):

This is an example of how MySQL 8 has detailed information about the timings for the various stages of the applied events. For example, for the relay channel it can be seen that for the last processed transaction, it took 18 seconds from when the transaction was committed on Source 1 (the original commit) until it was committed on Relay (the immediate commit), but then it only took around half a second until the coordinator was done processing the transaction (i.e. sending it to a worker). Which brings us to the status by worker.
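A sketch of a query that pulls out the coordinator timestamps discussed above (MySQL 8.0 column names):

    SELECT CHANNEL_NAME,
           LAST_PROCESSED_TRANSACTION,
           LAST_PROCESSED_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP,
           LAST_PROCESSED_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP,
           LAST_PROCESSED_TRANSACTION_END_BUFFER_TIMESTAMP
      FROM performance_schema.replication_applier_status_by_coordinator\G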

replication_applier_status_by_worker

The replication_applier_status_by_worker table shows the status for each worker thread:

The timestamps for the relay channel’s workers (only one has been active, as can be seen) show that the last transaction took around 19 seconds to apply and that it was committed around 19 seconds after it was committed on the immediate source (the Relay instance).

You can compare this delay of 19 seconds with the 49 seconds claimed by Seconds_Behind_Master in the SHOW SLAVE STATUS output. Why the difference? Seconds_Behind_Master is really the time from when the original source started to execute the current transaction until now. So it includes not only the time it took to execute the transaction on Source 1 but also the time it took on Relay and the time used so far on Replica.

replication_connection_configuration

The replication_connection_configuration table shows the configuration for each connection to the source of the replication:

replication_connection_status

The replication_connection_status table shows the status of each connection:

Similar to the applier timestamps, the connection timestamps can be used to check the lag caused by the connection thread fetching the transactions from its source’s binary log and writing them to its own relay log. For the relay channel it took around 0.14 second from when the transaction was committed on the immediate source (the Relay instance) until the connection thread started to write the transaction to the relay log (LAST_QUEUED_TRANSACTION_START_QUEUE_TIMESTAMP), and then a further 0.007 second to complete the write.
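A sketch of the corresponding query for the connection timestamps (again using the MySQL 8.0 column names):

    SELECT CHANNEL_NAME,
           LAST_QUEUED_TRANSACTION,
           LAST_QUEUED_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP,
           LAST_QUEUED_TRANSACTION_START_QUEUE_TIMESTAMP,
           LAST_QUEUED_TRANSACTION_END_QUEUE_TIMESTAMP
      FROM performance_schema.replication_connection_status\G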

Conclusion

The replication related tables in the Performance Schema, particularly the MySQL 8 versions of them, provide a very useful way to get information about the replication configuration and status. As a support engineer myself, I look forward to having access to the new timestamp values when I investigate replication delays.

There is still some work remaining. For example, it could be useful to know when the transaction started on the original source. That way it would be possible to compare the execution time between a source and a replica. However, the information that is already available is a great improvement.

Shutdown and Restart Statements

There are various ways to shut down MySQL. The traditional cross-platform method is to use the shutdown command in the mysqladmin client. One drawback is that it requires shell access; another is that it cannot start MySQL again automatically. There are platform-specific options that can perform a restart, such as using systemctl on Linux or installing MySQL as a service on Microsoft Windows. What I will look at here, though, is the built-in support for stopping and restarting MySQL using SQL statements.


MySQL 5.7 added the SHUTDOWN statement, which allows you to shut down MySQL using the MySQL command-line client or MySQL Shell. The statement is straightforward to use:
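For reference (executed from the mysql client or MySQL Shell in SQL mode):

    SHUTDOWN;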

The SHUTDOWN command available in MySQL 5.7 and later.

You will need the SHUTDOWN privilege to use the statement – the same privilege required to use mysqladmin to shut down MySQL. There is one gotcha to be aware of with the SHUTDOWN statement: it only works with the old (traditional) MySQL protocol. If you attempt to use it when connected to MySQL using the new X Protocol, you get the error ERROR: 3130: Command not supported by pluggable protocols, as shown in the next example:

Executing SHUTDOWN when connected through the X Protocol causes error 3130.

The RESTART statement, on the other hand, works through both protocols and also requires the SHUTDOWN privilege:

The RESTART command available in MySQL 8.0.

For the restart to work, MySQL must have been started in the presence of a “monitoring service”. This is the default on Microsoft Windows (to disable the monitoring service, start MySQL with --no-monitor). On Linux the monitoring service can, for example, be systemd or mysqld_safe.

An example of where the RESTART statement comes in handy is MySQL Shell’s AdminAPI for administering a MySQL InnoDB Cluster. When connected to MySQL Server 8.0, MySQL Shell can use the new SET PERSIST syntax to make the required configuration changes and then use the RESTART statement to restart the instance so that non-dynamic configuration changes take effect.
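As a hedged sketch of doing the same manually (SET PERSIST_ONLY is the variant used for variables that cannot be changed dynamically; innodb_buffer_pool_instances is just an example of such a variable):

    -- Persist a setting that only takes effect at startup, then restart.
    SET PERSIST_ONLY innodb_buffer_pool_instances = 8;
    RESTART;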

The SHUTDOWN and RESTART statements may not be the most important changes in MySQL 5.7 and 8.0, but they can be handy to know of in some cases.

Overview of the MySQL Server Architecture

Sometimes it can be useful to take a step back and look at the world from a bit larger distance than usual. So in this blog, I will take a look at the high level architecture of MySQL Server.

Info
This is meant as a simplified overview and does not include all details.
Overview of the MySQL Server Architecture

For the discussion I will be referring to the following figure, which shows some of the features and plugins of MySQL. Orange boxes are available in both the community version and the commercial (enterprise) version, whereas red means the plugin is exclusive to the commercial version. Ellipsoid elements are plugins, whereas square elements (with rounded corners) indicate other features. The figure is based on MySQL 5.7.20, where the audit log can be used to block access to tables.

At the top there are the various connectors and APIs that can be used by the application to connect to MySQL. As the mysql command-line client uses the C API, it is included in this part of the figure as well. There are more connectors and APIs than I could fit into the figure; for the complete list see the documentation index.

The large darkish box represents the server. At the top there is the connection handling where there are two choices: use one thread per connection or use the commercial thread pool plugin. One thread per connection is the original way of handling new connections and, as the name implies, it simply creates a new thread for each new connection. The MySQL Enterprise Thread Pool will limit the overall number of threads by re-using the threads. This leads to better scalability, particularly in the case of many connections executing short queries. After the initial thread handling, it is possible to send the connection through the optional Connection-Control plugin, which throttles the connections for an account when there are multiple attempts to log in with the wrong password.

Once the connection has been cleared to move ahead, the query is checked against the Query Cache if it is a SELECT statement and the Query Cache is enabled. For most workloads the Query Cache is an overhead that does not justify the potential gain, so in practice it is recommended to disable it completely.

Info
Note: The Query Cache has been deprecated and is removed in MySQL 8.0.3 and later.

Next is the optional Query Rewrite plugin, which allows the DBA to define queries that should be rewritten based on normalized queries like those in the Performance Schema digest summary table. For example, to change the query SELECT 130 to SELECT * FROM world.City WHERE ID = 130 – and the same for all other IDs – the following rule can be used:
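A sketch of such a rule, assuming the Rewriter plugin's query_rewrite schema is in place (see the note below about installing the plugin):

    INSERT INTO query_rewrite.rewrite_rules (pattern, replacement)
    VALUES ('SELECT ?', 'SELECT * FROM world.City WHERE ID = ?');

    CALL query_rewrite.flush_rewrite_rules();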

This assumes the Query Rewrite plugin has already been installed.

It is not until now that the query actually reaches the parser (this will not happen if the query has a match in the Query Cache). After the parser has processed the query, it is known which schemas, tables, columns, etc. will be required to answer the query, which means it is also possible to verify whether the user is allowed to access these objects.

After the parser, the MySQL Enterprise Firewall is invoked if it is installed and enabled. The firewall compares the query’s digest to a whitelist of digests. If the query is not in the whitelist, it can be recorded or blocked (depending on the configuration).

The last plugin is the MySQL Enterprise Audit plugin. In MySQL 5.7.20 and later, the audit log can be used to block access to a schema or a table in addition to logging the access. This is when the plugin checks whether the access is allowed.

Finally the optimizer determines the query plan and the query is executed. The query execution will in almost all cases involve requesting data from, or sending data to, one of the storage engines. MySQL supports several storage engines through its pluggable storage engine architecture. Nowadays the main storage engine is InnoDB, but several others exist, including the NDBCluster storage engine that is used in MySQL NDB Cluster to execute queries through the SQL nodes (MySQL Server instances).

If You Want to Learn More

One nice way to investigate the order in which plugins and features are executed is to use the Performance Schema. For example, to investigate the above features, enable the following instruments and consumers in the MySQL configuration file (or online by executing UPDATE statements):
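A sketch of what that could look like in the configuration file, assuming the statement and stage events are the ones of interest (the exact list of instruments and consumers can be adjusted to taste):

    [mysqld]
    performance_schema_instrument = "statement/%=ON"
    performance_schema_instrument = "stage/%=ON"
    performance_schema_consumer_events_statements_history_long = ON
    performance_schema_consumer_events_stages_current = ON
    performance_schema_consumer_events_stages_history_long = ON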

Then use two connections – one for monitoring and one for executing the query. Disable instrumentation for the monitoring connection:
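One way to do that (executed in the monitoring connection itself):

    UPDATE performance_schema.threads
       SET INSTRUMENTED = 'NO'
     WHERE PROCESSLIST_ID = CONNECTION_ID();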

Determine the Performance Schema thread ID for the connection that will be executing queries:
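A sketch, assuming the query connection logs in as the hypothetical user myuser:

    SELECT THREAD_ID, PROCESSLIST_ID
      FROM performance_schema.threads
     WHERE PROCESSLIST_USER = 'myuser';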

This assumes the two connections use different usernames. If that is not the case, then you will need to adjust the above query.

Optionally, to make it easier to find the events just for the test query, truncate all tables in the Performance Schema (this will lose all historical data, so is most appropriate for a test system):
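The sys schema has a stored procedure for this (passing FALSE suppresses the per-table output):

    CALL sys.ps_truncate_all_tables(FALSE);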

Now execute the query – for example one that gets rewritten by the Query Rewriter plugin using the rule above:
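With the rule sketched earlier in place, the test query is simply:

    SELECT 130;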

Finally, the Performance Schema events can be retrieved:
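A sketch of such queries, using the thread ID found above in place of the placeholder and assuming the statement and stage history consumers were enabled:

    SELECT EVENT_ID, EVENT_NAME, SOURCE, SQL_TEXT
      FROM performance_schema.events_statements_history_long
     WHERE THREAD_ID = <thread_id>
     ORDER BY EVENT_ID;

    SELECT EVENT_ID, EVENT_NAME, SOURCE
      FROM performance_schema.events_stages_history_long
     WHERE THREAD_ID = <thread_id>
     ORDER BY EVENT_ID;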

Have fun.

InnoDB Locks Analysis: Why is Blocking Query NULL and How To Find More Information About the Blocking Transaction?

This post was originally published on the MySQL Support Team Blog at https://blogs.oracle.com/mysqlsupport/entry/innodb_locks_analysis_why_is on 14 April 2017.

Consider the scenario where you execute a query. You expect it to be fast – typically subsecond – but now it does not return until after 50 seconds (innodb_lock_wait_timeout seconds), and then it returns with an error:
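The exact statement differs, but the error is along these lines (error 1205, ER_LOCK_WAIT_TIMEOUT):

    ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction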

You continue to investigate the issue using the sys.innodb_lock_waits view or the underlying Information Schema tables (INNODB_TRX, INNODB_LOCKS and INNODB_LOCK_WAITS).

Note: The above Information Schema tables with lock and lock waits information have been moved to the Performance Schema in MySQL 8.0 as the data_locks and data_lock_waits tables. The sys schema view, however, works the same.

However, when you query the lock information, the blocking query is returned as NULL. What does that mean, and how do you proceed from there to get information about the blocking transaction?

Setting Up an Example

Before proceeding, let’s set up an example which will be investigated later in the blog. The example can be set up as follows (do not disconnect Connection 1 when the queries have been executed; a sketch of statements that reproduce the scenario is shown after the list):

  1. Connection 1:
  2. Connection 2 (blocks for innodb_lock_wait_timeout seconds):
  3. The following output from sys.innodb_lock_waits, captured while Connection 2 is still blocked, shows that the blocking query is NULL (slightly reformatted):

    The warnings only occur in 5.7.14 and later because the InnoDB lock tables are being moved to the Performance Schema in MySQL 8.0. It is recommended to use the sys.innodb_lock_waits view as that is updated accordingly in MySQL 8.0.
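As a sketch of statements that reproduce this kind of scenario (the exact statements from the original example are not reproduced; the world sample database is assumed):

    -- Connection 1: start a transaction and leave it open.
    START TRANSACTION;
    UPDATE world.City SET Population = Population + 1 WHERE ID = 130;
    UPDATE world.City SET Population = Population + 1 WHERE ID = 131;
    -- Do not commit and do not disconnect.

    -- Connection 2: this statement blocks on the row lock held by Connection 1
    -- and fails with ERROR 1205 after innodb_lock_wait_timeout seconds.
    UPDATE world.City SET Population = Population + 1 WHERE ID = 130;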

Investigating Idle Transactions

To investigate idle transactions, you need to use the Performance Schema. First, determine the Performance Schema thread ID of the blocking transaction. For this you need the blocking_pid from the above output:

and use this with the threads table like:
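A sketch, with the blocking_pid from the lock waits output inserted in place of the placeholder:

    SELECT THREAD_ID
      FROM performance_schema.threads
     WHERE PROCESSLIST_ID = <blocking_pid>;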

For the following queries, insert the thread ID found above in the THREAD_ID = … WHERE clauses.

To get the latest query executed, use the events_statements_current table or the sys.session (and sys.x$session) views:
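A sketch using events_statements_current (insert the thread ID found above):

    SELECT THREAD_ID, SQL_TEXT
      FROM performance_schema.events_statements_current
     WHERE THREAD_ID = <thread_id>;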

or:
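using the sys.session view instead (thd_id is the Performance Schema thread ID):

    SELECT thd_id, conn_id, current_statement, last_statement
      FROM sys.session
     WHERE thd_id = <thread_id>;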

In this case this does not explain why the lock is held, as the last query updated a different row than the one where the lock issue occurs. However, if the events_statements_history consumer is enabled (it is by default in MySQL 5.7 and later), the events_statements_history table will include the last 10 statements (by default) executed by the connection:
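A sketch of the history query, ordered so the most recent statement comes last:

    SELECT EVENT_ID, CURRENT_SCHEMA, SQL_TEXT
      FROM performance_schema.events_statements_history
     WHERE THREAD_ID = <thread_id>
     ORDER BY EVENT_ID;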

So now the history of the blocking transaction can be seen, and it is possible to determine why the locking issue occurs.

Note: The history also includes some queries executed before the transaction started. These are not related to the locking issue.

If transaction monitoring is also enabled (only available in MySQL 5.7 and later), it is possible to get more information about the transaction and automatically limit the query of the history to the current transaction. Transaction monitoring is not enabled by default. To enable it, use:
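A sketch that enables both the transaction instrument and the transaction event consumers:

    UPDATE performance_schema.setup_consumers
       SET ENABLED = 'YES'
     WHERE NAME LIKE 'events_transactions%';

    UPDATE performance_schema.setup_instruments
       SET ENABLED = 'YES', TIMED = 'YES'
     WHERE NAME = 'transaction';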

Note: This must be done before either of the transactions is started. Only transactions started after transaction monitoring is enabled will be instrumented.

If the above was enabled before the blocking transaction started, you can get more details about the blocking transaction as:
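A sketch using the events_transactions_current table:

    SELECT *
      FROM performance_schema.events_transactions_current
     WHERE THREAD_ID = <thread_id>\G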

And to get the statement history of the transaction:
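A sketch that joins the statement history to the blocking thread's current transaction, so only statements executed inside that transaction are returned (statements belonging to a transaction have the transaction's event as their nesting event):

    SELECT s.EVENT_ID, s.SQL_TEXT
      FROM performance_schema.events_transactions_current t
           INNER JOIN performance_schema.events_statements_history s
                 ON s.THREAD_ID = t.THREAD_ID
                AND s.NESTING_EVENT_ID = t.EVENT_ID
                AND s.NESTING_EVENT_TYPE = 'TRANSACTION'
     WHERE t.THREAD_ID = <thread_id>
     ORDER BY s.EVENT_ID;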

 

Who Holds the Metadata Lock? MySQL 5.7.3 Brings Help

In MySQL 5.5, metadata locks were introduced to ensure consistency for some operations that previously could cause problems. For example, if a transaction SELECTed from a table and another connection changed the structure of the table before the transaction was committed, the two operations would be written in the reverse order to the binary log, which could prevent replaying the binary log.

However, there was no way to see who held metadata locks. SHOW PROCESSLIST would show who was waiting for metadata locks, and often you could guess who held them, but that was not always the case. Consider for example the following:
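The processlist output itself is not reproduced here, but as a sketch of the kind of situation meant (assuming the world sample database): one connection holds a metadata lock because it has an open transaction that has read from the table, and the other connections pile up in the state "Waiting for table metadata lock":

    -- Connection with Id = 3: open transaction that has read from world.City
    -- and therefore holds a shared metadata lock until COMMIT/ROLLBACK.
    START TRANSACTION;
    SELECT COUNT(*) FROM world.City;

    -- Another connection: waits for an exclusive metadata lock.
    ALTER TABLE world.City ADD COLUMN Council varchar(50);

    -- Yet another connection: queues behind the pending ALTER and also ends up
    -- in "Waiting for table metadata lock".
    SELECT * FROM world.City WHERE ID = 130;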

In this case it happens to be the connection with Id = 3 that holds the metadata lock, but that is not obvious from the SHOW PROCESSLIST output.

In MySQL 5.7.3 this has changed. There is a new Performance Schema table, metadata_locks, which will tell you which metadata locks are held and which are pending. This can be used to answer the question for a case like the one above.

First, instrumentation of metadata locks must be enabled. To do that, enable the wait/lock/metadata/sql/mdl instrument in setup_instruments. Additionally, the global_instrumentation consumer must be enabled in setup_consumers.

Currently the wait/lock/metadata/sql/mdl instrument is not enabled by default. I have created a feature request to consider enabling it by default, but obviously whether that will happen also depends on the performance impact.

To enable metadata lock instrumentation:
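A sketch using an UPDATE against setup_instruments (add a corresponding UPDATE of setup_consumers if global_instrumentation has been disabled):

    UPDATE performance_schema.setup_instruments
       SET ENABLED = 'YES'
     WHERE NAME = 'wait/lock/metadata/sql/mdl';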

Note: The global_instrumentation consumer is enabled by default.

With the metadata lock instrumentation enabled, it is now easy to answer who holds the metadata lock (I’ve excluded the connection’s own metadata locks here as I’m only interested in the metadata lock contention going on between the other queries):
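A sketch of such a query, joining to the threads table to map the owner thread to a process list id and filtering out this connection's own locks:

    SELECT ml.OBJECT_TYPE, ml.OBJECT_SCHEMA, ml.OBJECT_NAME,
           ml.LOCK_TYPE, ml.LOCK_STATUS, t.PROCESSLIST_ID
      FROM performance_schema.metadata_locks ml
           INNER JOIN performance_schema.threads t
                 ON t.THREAD_ID = ml.OWNER_THREAD_ID
     WHERE ml.OBJECT_SCHEMA NOT IN ('performance_schema', 'mysql')
           AND t.PROCESSLIST_ID <> CONNECTION_ID();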

So in this case there is a metadata lock GRANTED to the connection with process list id 3, whereas the other connections have a PENDING lock status for the metadata lock on the City table.