MySQL 8: Performance Schema Digests Improvements

Since MySQL 5.6, the digest feature of the MySQL Performance Schema has provided a convenient and effective way to obtain query statistics based on the normalized form of the queries. The feature works so well that, in my experience, it has almost completely replaced the connector extensions and proxy for collecting query statistics for the Query Analyzer (Quan) in MySQL Enterprise Monitor (MEM).

MySQL 8 adds further improvements to the digest feature in the Performance Schema, including a sample query with statistics for each digest, percentile information, and a histogram summary. This blog explores these new features.

Figure: The MySQL Enterprise Monitor Query Analyzer. MySQL Enterprise Monitor is one of the main users of the Performance Schema digests for its Query Analyzer.

Let’s start out by looking at the good old summary by digest table.

Query Sample

The base table for digest summary information is the events_statements_summary_by_digest table. This has been around since MySQL 5.6. In MySQL 8.0 it has been extended with six columns, three of which hold data related to a sample query and are examined in this section.

The three sample columns are:

  • QUERY_SAMPLE_TEXT: An actual example of a query.
  • QUERY_SAMPLE_SEEN: When the sample query was seen.
  • QUERY_SAMPLE_TIMER_WAIT: How long the sample query took to execute (in picoseconds).

As an example, consider the query SELECT * FROM world.city WHERE id = <value>. The sample information for that query can be retrieved together with the digest and the digest text (the normalized query).
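A query along the following lines retrieves the sample columns for that statement. It is a sketch, not necessarily the exact query used for this post: the literal 130 is an arbitrary stand-in for <value>, and computing the hash with the STATEMENT_DIGEST() function (available in MySQL 8.0) instead of pasting the digest value is a convenience chosen here.

```sql
-- Sample information for the digest of SELECT * FROM world.city WHERE id = <value>.
-- The literal 130 is arbitrary: every literal normalizes to the same digest.
SELECT DIGEST,
       DIGEST_TEXT,
       QUERY_SAMPLE_TEXT,
       QUERY_SAMPLE_SEEN,
       sys.format_time(QUERY_SAMPLE_TIMER_WAIT) AS SampleTimerWait
  FROM performance_schema.events_statements_summary_by_digest
 WHERE DIGEST = STATEMENT_DIGEST('SELECT * FROM world.city WHERE id = 130');
```

If the statement has not been executed since the table was last truncated or the server was restarted, the query returns no rows.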

There are a few things to note here:

  • The digest in MySQL 8 is a SHA-256 hash, whereas in 5.6 and 5.7 it was an MD5 hash.
  • The digest text is similar to the normalized query that the mysqldumpslow script can generate for queries in the slow query log, except that the Performance Schema uses a question mark as the placeholder.
  • The QUERY_SAMPLE_SEEN value is in the system time zone.
  • The query uses the sys.format_time() function to convert the picoseconds to a human-readable value.

The maximum length of the sample text is set with the performance_schema_max_sql_text_length option. The default is 1024 bytes. It is the same option that is used for the SQL_TEXT columns in the statement events tables. It requires a restart of MySQL to change the value. Since the query texts are stored in several contexts and some of the Performance Schema tables can have thousands of rows, do take care not to increase it beyond what you have memory for.
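The current setting can be checked from any connection, and one possible way to stage a new value for the next restart is SET PERSIST_ONLY. The sketch below assumes the account has the privileges needed to persist read-only variables; the value 2048 is an arbitrary example, not a recommendation.

```sql
-- Current maximum length (in bytes) of the stored statement text.
SELECT @@global.performance_schema_max_sql_text_length;

-- Stage a new value in mysqld-auto.cnf without changing the running server.
-- It only takes effect after MySQL has been restarted.
SET PERSIST_ONLY performance_schema_max_sql_text_length = 2048;
```

Setting the option in the MySQL configuration file achieves the same result.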

How is the sample query chosen? The sample is the slowest example of a query with the given digest. However, if the performance_schema_max_digest_sample_age option is set to a non-zero value (the default is 60 seconds) and the existing sample is older than the specified value, the next query with that digest replaces the sample even if it executed faster.
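The age limit is a dynamic global variable, so it can be inspected and adjusted at runtime. The statements below are a sketch: the value 300 and the LIMIT of 10 are arbitrary illustrations.

```sql
-- Current maximum sample age in seconds (0 turns off age-based replacement).
SELECT @@global.performance_schema_max_digest_sample_age;

-- Let a sample survive for up to five minutes before any newer query
-- with the same digest may replace it.
SET GLOBAL performance_schema_max_digest_sample_age = 300;

-- List the oldest stored samples together with their age in seconds.
SELECT DIGEST_TEXT,
       QUERY_SAMPLE_SEEN,
       TIMESTAMPDIFF(SECOND, QUERY_SAMPLE_SEEN, NOW()) AS SampleAgeSeconds
  FROM performance_schema.events_statements_summary_by_digest
 ORDER BY QUERY_SAMPLE_SEEN
 LIMIT 10;
```

A longer age keeps slow outliers visible for longer, while a shorter age makes the sample reflect recent traffic.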

The events_statements_summary_by_digest table also has another set of new columns: percentile information.

Percentile Information

Since the beginning, the events_statements_summary_by_digest table has included some statistical information about the query times for a given digest: the minimum, average, maximum, and total query time. In MySQL 8 this has been extended to include information about the 95th, 99th, and 99.9th percentile. The information is available in the QUANTILE_95, QUANTILE_99, and QUANTILE_999 columns, respectively. All of the values are in picoseconds.

What do the new columns mean? Based on the histogram information of the query (see the next section), MySQL calculates a high estimate of the query time. For a given digest, 95% of the executed queries are expected to be faster than the query time given by QUANTILE_95. The same applies to the two other columns.
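One way to build intuition for the high estimate is to derive a comparable figure from the events_statements_histogram_by_digest table. The sketch below assumes that the estimate corresponds to the upper boundary of the first bucket whose cumulative share reaches 95%. That is an interpretation of the description above rather than a confirmed implementation detail, so treat any match with QUANTILE_95 as indicative only. The literal 130 is an arbitrary example value.

```sql
-- Upper boundary of the first bucket that covers at least 95% of executions.
-- Assumption: this mirrors how the high estimate in QUANTILE_95 is derived.
SELECT SCHEMA_NAME,
       sys.format_time(MIN(BUCKET_TIMER_HIGH)) AS EstimatedQuantile95
  FROM performance_schema.events_statements_histogram_by_digest
 WHERE DIGEST = STATEMENT_DIGEST('SELECT * FROM world.city WHERE id = 130')
   AND BUCKET_QUANTILE >= 0.95
 GROUP BY SCHEMA_NAME;
```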

As an example, the percentile columns can be examined for the same digest as before.
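A query of roughly this shape returns the percentile columns next to the older minimum, average, and maximum columns. It is a sketch: the literal 130 is an arbitrary value, and STATEMENT_DIGEST() is used here to avoid pasting the digest hash.

```sql
-- Classic statistics and the new percentile columns for one digest.
SELECT DIGEST_TEXT,
       COUNT_STAR,
       sys.format_time(MIN_TIMER_WAIT) AS MinTimerWait,
       sys.format_time(AVG_TIMER_WAIT) AS AvgTimerWait,
       sys.format_time(MAX_TIMER_WAIT) AS MaxTimerWait,
       sys.format_time(QUANTILE_95)    AS Quantile95,
       sys.format_time(QUANTILE_99)    AS Quantile99,
       sys.format_time(QUANTILE_999)   AS Quantile999
  FROM performance_schema.events_statements_summary_by_digest
 WHERE DIGEST = STATEMENT_DIGEST('SELECT * FROM world.city WHERE id = 130');
```

A large gap between the average and the 99.9th percentile suggests that a small share of the executions is much slower than the rest.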

Having the 95th, 99th, and 99.9th percentile helps predict the performance of a query and shows the spread of the query times. Even more information about the spread can be found using the new family member: histograms.

Histograms

Histograms are a way to put the query execution times into buckets, so it is possible to see how the execution times are spread. This can, for example, be useful to see how evenly distributed the query times are. The average query time may be fine, but if that is based on some queries executing super fast and others very slowly, it will still result in unhappy users and customers.

The MAX_TIMER_WAIT column of the events_statements_summary_by_digest table discussed so far shows the high watermark, but it does not say whether it is a single outlier or the result of generally varying query times. The histograms give the answer to this.

Using the query digest from earlier in the blog, the histogram information for the query can be found in the events_statements_histogram_by_digest table.
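A query like the following lists the buckets that have been hit at least once. It is a sketch with two choices made here: empty buckets are filtered out to keep the result short, and the literal 130 is an arbitrary value used to compute the digest.

```sql
-- Non-empty histogram buckets for one digest, ordered from fast to slow.
SELECT BUCKET_NUMBER,
       sys.format_time(BUCKET_TIMER_LOW)  AS TimerLow,
       sys.format_time(BUCKET_TIMER_HIGH) AS TimerHigh,
       COUNT_BUCKET,
       COUNT_BUCKET_AND_LOWER,
       BUCKET_QUANTILE
  FROM performance_schema.events_statements_histogram_by_digest
 WHERE DIGEST = STATEMENT_DIGEST('SELECT * FROM world.city WHERE id = 130')
   AND COUNT_BUCKET > 0
 ORDER BY BUCKET_NUMBER;
```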

In this example, the query time was between 63.10 microseconds and 66.07 microseconds for 3694 of the executions (the COUNT_BUCKET column), so these executions matched the interval of bucket number 41. There has been a total of 7322 executions (the COUNT_BUCKET_AND_LOWER column) of the query with a query time of 66.07 microseconds or less. This means that 73.22% (the BUCKET_QUANTILE column) of the queries have a query time of 66.07 microseconds or less.
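The relationship between these columns can be checked by recomputing the quantile from the counts. With the figures quoted above, 7322 executions at or below bucket 41 and a quantile of 0.7322 imply a total of 10,000 executions (7322 / 0.7322). The sketch below takes the largest cumulative count as the total number of executions, which is an assumption that holds as long as the histogram is internally consistent; the literal 130 is again arbitrary.

```sql
-- Recompute BUCKET_QUANTILE as the cumulative count divided by the total.
-- Assumption: the largest COUNT_BUCKET_AND_LOWER equals the total count.
SELECT BUCKET_NUMBER,
       COUNT_BUCKET,
       COUNT_BUCKET_AND_LOWER,
       BUCKET_QUANTILE,
       COUNT_BUCKET_AND_LOWER
         / MAX(COUNT_BUCKET_AND_LOWER) OVER (PARTITION BY SCHEMA_NAME)
         AS RecomputedQuantile
  FROM performance_schema.events_statements_histogram_by_digest
 WHERE DIGEST = STATEMENT_DIGEST('SELECT * FROM world.city WHERE id = 130')
   AND COUNT_BUCKET > 0
 ORDER BY BUCKET_NUMBER;
```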

In addition to the columns shown, there are the SCHEMA_NAME and DIGEST columns (which together with BUCKET_NUMBER form a unique key). For each digest there are 450 buckets, with the width of each bucket (the difference between its low and high timers) gradually becoming larger. The first, middle, and last five buckets are:
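The bucket boundaries and widths can be listed with a query such as the one below. It is a sketch that assumes the 450 buckets are numbered from 0 to 449, so that 223 to 227 are the middle five; adjust the numbers if the numbering differs. The literal 130 is an arbitrary value used to compute the digest.

```sql
-- Boundaries and width of the first, middle, and last five buckets.
-- Assumption: BUCKET_NUMBER runs from 0 to 449.
SELECT BUCKET_NUMBER,
       sys.format_time(BUCKET_TIMER_LOW)  AS TimerLow,
       sys.format_time(BUCKET_TIMER_HIGH) AS TimerHigh,
       sys.format_time(BUCKET_TIMER_HIGH - BUCKET_TIMER_LOW) AS BucketWidth
  FROM performance_schema.events_statements_histogram_by_digest
 WHERE DIGEST = STATEMENT_DIGEST('SELECT * FROM world.city WHERE id = 130')
   AND (BUCKET_NUMBER < 5
        OR BUCKET_NUMBER BETWEEN 223 AND 227
        OR BUCKET_NUMBER >= 445)
 ORDER BY BUCKET_NUMBER;
```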

The bucket thresholds are fixed and thus the same for all digests. There is also a global histogram in the events_statements_histogram_global table.
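The global histogram has the same bucket columns but no SCHEMA_NAME or DIGEST, so it can be queried without a filter on the digest. As a sketch, the following shows the overall distribution of statement times for the instance, leaving out empty buckets:

```sql
-- Distribution of execution times across all statements on the instance.
SELECT BUCKET_NUMBER,
       sys.format_time(BUCKET_TIMER_LOW)  AS TimerLow,
       sys.format_time(BUCKET_TIMER_HIGH) AS TimerHigh,
       COUNT_BUCKET,
       BUCKET_QUANTILE
  FROM performance_schema.events_statements_histogram_global
 WHERE COUNT_BUCKET > 0
 ORDER BY BUCKET_NUMBER;
```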

This concludes the introduction to the new Performance Schema digest features. As monitoring tools start to use this information, it will help create a better monitoring experience. The histograms in particular will benefit from being shown as graphs.
