MySQL 8: Performance Schema Digests Improvements
Since MySQL 5.6, the digest feature of the MySQL Performance Schema has provided a convenient and effective way to obtain statistics of queries based on their normalized form. The feature works so well that it has almost completely (from my experience) replaced the connector extensions and proxy for collecting query statistics for the Query Analyzer (Quan) in MySQL Enterprise Monitor (MEM).
MySQL 8 adds further improvements to the digest feature in the Performance Schema including a sample query with statistics for each digest, percentile information, and a histogram summary. This blog will explore these new features.
Let's start out looking at the the good old summary by digest table.
Query Sample
The base table for digest summary information is the events_statements_summary_by_digest
table. This has been around since MySQL 5.6. In MySQL 8.0 it has been extended with six columns of which three have data related to a sample query will be examined in this section.
The three sample columns are:
QUERY_SAMPLE_TEXT
: An actual example of a query.QUERY_SAMPLE_SEEN
: When the sample query was seen.QUERY_SAMPLE_TIMER_WAIT
: How long time the sample query took to execute (in picoseconds).
As an example consider the query SELECT * FROM world.city WHERE id = <value>
. The sample information for that query as well as the digest and digest text (normalized query) may look like:
mysql> SELECT DIGEST, DIGEST_TEXT, QUERY_SAMPLE_TEXT, QUERY_SAMPLE_SEEN,
sys.format_time(QUERY_SAMPLE_TIMER_WAIT) AS SampleTimerWait
FROM performance_schema.events_statements_summary_by_digest
WHERE DIGEST_TEXT LIKE '%`world` . `city`%'\G
*************************** 1. row ***************************
DIGEST: 9431aed9635923565d7bc92cc36d6411c3abb9f52d2c22715be21b5472e3c366
DIGEST_TEXT: SELECT * FROM `world` . `city` WHERE `ID` = ?
QUERY_SAMPLE_TEXT: SELECT * FROM world.city WHERE ID = 130
QUERY_SAMPLE_SEEN: 2018-10-09 17:19:20.500944
SampleTimerWait: 17.34 ms
1 row in set (0.00 sec)
There are a few things to note here:
- The digest in MySQL 8 is a sha256 hash whereas in 5.6 and 5.7 it was an md5 hash.
- The digest text is similar to the normalized query that the
mysqldumpslow
script can generate for queries in the slow query log; just that the Performance Schema uses a question mark as a placeholder. - The
QUERY_SAMPLE_SEEN
value is in the system time zone. - The
sys.format_time()
function is in the query used to convert the picoseconds to a human readable value.
The maximum length of the sample text is set with the performance_schema_max_sql_text_length option. The default is 1024 bytes. It is the same option that is used for the SQL_TEXT
columns in the statement events tables. It requires a restart of MySQL to change the value. Since the query texts are stored in several contexts and some of the Performance Schema tables can have thousands of rows, do take care not to increase it beyond what you have memory for.
How is the sample query chosen? The sample is the slowest example of a query with the given digest. If the performance_schema_max_digest_sample_age option is set to a non-zero value (the default is 60 seconds) and the existing sample is older than the specified value, it will always be replaced.
The events_statements_summary_by_digest
also has another set of new columns: percentile information.
Percentile Information
Since the beginning, the events_statements_summary_by_digest
table has included some statistical information about the query times for a given digest: the minimum, average, maximum, and total query time. In MySQL 8 this has been extended to include information about the 95th, 99th, and 99.9th percentile. The information is available in the QUANTILE_95
, QUANTILE_99
, and QUANTILE_999
column respectively. All of the values are in picoseconds.
What does the new columns mean? Based on the histogram information of the query (see the next section), MySQL calculates a high estimate of the query time. For a given digest, 95% of the executed queries are expected to be faster than the query time given by QUANTILE_95
. Similar for the two other columns.
As an example consider the same digest as before:
mysql> SELECT DIGEST, DIGEST_TEXT, QUERY_SAMPLE_TEXT,
sys.format_time(SUM_TIMER_WAIT) AS SumTimerWait,
sys.format_time(MIN_TIMER_WAIT) AS MinTimerWait,
sys.format_time(AVG_TIMER_WAIT) AS AvgTimerWait,
sys.format_time(MAX_TIMER_WAIT) AS MaxTimerWait,
sys.format_time(QUANTILE_95) AS Quantile95,
sys.format_time(QUANTILE_99) AS Quantile99,
sys.format_time(QUANTILE_999) AS Quantile999
FROM performance_schema.events_statements_summary_by_digest
WHERE DIGEST_TEXT LIKE '%`world` . `city`%'\G
*************************** 1. row ***************************
DIGEST: 9431aed9635923565d7bc92cc36d6411c3abb9f52d2c22715be21b5472e3c366
DIGEST_TEXT: SELECT * FROM `world` . `city` WHERE `ID` = ?
QUERY_SAMPLE_TEXT: SELECT * FROM world.city WHERE ID = 130
SumTimerWait: 692.77 ms
MinTimerWait: 50.32 us
AvgTimerWait: 68.92 us
MaxTimerWait: 17.34 ms
Quantile95: 104.71 us
Quantile99: 165.96 us
Quantile999: 363.08 us
1 row in set (0.00 sec)
Having the 95th, 99th, and 99.9th percentile helps predict the performance of a query and show the spread of the query times. Even more information about the spread can be found using the new family member: histograms.
Histograms
Histograms is a way to put the query execution times into buckets, so it is possible to see how the query execution times spread. This can for example be useful to see how evenly the query time is. The average query time may be fine, but if that is based on some queries executing super fast and others very slow, it will still result in unhappy users and customers.
The MAX_TIMER_WAIT
column of the events_statements_summary_by_digest
table discussed this far shows the high watermark, but it does not say whether it is a single outlier or a result of general varying query times. The histograms give the answer to this.
Using the query digest from earlier in the blog, the histogram information for the query can be found in the events_statements_histogram_by_digest
table like:
mysql> SELECT BUCKET_NUMBER,
sys.format_time(BUCKET_TIMER_LOW) AS TimerLow,
sys.format_time(BUCKET_TIMER_HIGH) AS TimerHigh,
COUNT_BUCKET, COUNT_BUCKET_AND_LOWER, BUCKET_QUANTILE
FROM performance_schema.events_statements_histogram_by_digest
WHERE DIGEST = '9431aed9635923565d7bc92cc36d6411c3abb9f52d2c22715be21b5472e3c366'
AND COUNT_BUCKET > 0
ORDER BY BUCKET_NUMBER;
+---------------+-----------+-----------+--------------+------------------------+-----------------+
| BUCKET_NUMBER | TimerLow | TimerHigh | COUNT_BUCKET | COUNT_BUCKET_AND_LOWER | BUCKET_QUANTILE |
+---------------+-----------+-----------+--------------+------------------------+-----------------+
| 36 | 50.12 us | 52.48 us | 524 | 524 | 0.052400 |
| 37 | 52.48 us | 54.95 us | 2641 | 3165 | 0.316500 |
| 38 | 54.95 us | 57.54 us | 310 | 3475 | 0.347500 |
| 39 | 57.54 us | 60.26 us | 105 | 3580 | 0.358000 |
| 40 | 60.26 us | 63.10 us | 48 | 3628 | 0.362800 |
| 41 | 63.10 us | 66.07 us | 3694 | 7322 | 0.732200 |
| 42 | 66.07 us | 69.18 us | 611 | 7933 | 0.793300 |
| 43 | 69.18 us | 72.44 us | 236 | 8169 | 0.816900 |
| 44 | 72.44 us | 75.86 us | 207 | 8376 | 0.837600 |
| 45 | 75.86 us | 79.43 us | 177 | 8553 | 0.855300 |
| 46 | 79.43 us | 83.18 us | 236 | 8789 | 0.878900 |
| 47 | 83.18 us | 87.10 us | 186 | 8975 | 0.897500 |
| 48 | 87.10 us | 91.20 us | 203 | 9178 | 0.917800 |
| 49 | 91.20 us | 95.50 us | 116 | 9294 | 0.929400 |
| 50 | 95.50 us | 100.00 us | 135 | 9429 | 0.942900 |
| 51 | 100.00 us | 104.71 us | 105 | 9534 | 0.953400 |
| 52 | 104.71 us | 109.65 us | 65 | 9599 | 0.959900 |
| 53 | 109.65 us | 114.82 us | 65 | 9664 | 0.966400 |
| 54 | 114.82 us | 120.23 us | 59 | 9723 | 0.972300 |
| 55 | 120.23 us | 125.89 us | 40 | 9763 | 0.976300 |
| 56 | 125.89 us | 131.83 us | 34 | 9797 | 0.979700 |
| 57 | 131.83 us | 138.04 us | 33 | 9830 | 0.983000 |
| 58 | 138.04 us | 144.54 us | 27 | 9857 | 0.985700 |
| 59 | 144.54 us | 151.36 us | 16 | 9873 | 0.987300 |
| 60 | 151.36 us | 158.49 us | 25 | 9898 | 0.989800 |
| 61 | 158.49 us | 165.96 us | 20 | 9918 | 0.991800 |
| 62 | 165.96 us | 173.78 us | 9 | 9927 | 0.992700 |
| 63 | 173.78 us | 181.97 us | 11 | 9938 | 0.993800 |
| 64 | 181.97 us | 190.55 us | 11 | 9949 | 0.994900 |
| 65 | 190.55 us | 199.53 us | 4 | 9953 | 0.995300 |
| 66 | 199.53 us | 208.93 us | 6 | 9959 | 0.995900 |
| 67 | 208.93 us | 218.78 us | 6 | 9965 | 0.996500 |
| 68 | 218.78 us | 229.09 us | 6 | 9971 | 0.997100 |
| 69 | 229.09 us | 239.88 us | 3 | 9974 | 0.997400 |
| 70 | 239.88 us | 251.19 us | 2 | 9976 | 0.997600 |
| 71 | 251.19 us | 263.03 us | 2 | 9978 | 0.997800 |
| 72 | 263.03 us | 275.42 us | 2 | 9980 | 0.998000 |
| 73 | 275.42 us | 288.40 us | 4 | 9984 | 0.998400 |
| 74 | 288.40 us | 302.00 us | 2 | 9986 | 0.998600 |
| 75 | 302.00 us | 316.23 us | 2 | 9988 | 0.998800 |
| 76 | 316.23 us | 331.13 us | 1 | 9989 | 0.998900 |
| 78 | 346.74 us | 363.08 us | 3 | 9992 | 0.999200 |
| 79 | 363.08 us | 380.19 us | 2 | 9994 | 0.999400 |
| 80 | 380.19 us | 398.11 us | 1 | 9995 | 0.999500 |
| 83 | 436.52 us | 457.09 us | 1 | 9996 | 0.999600 |
| 100 | 954.99 us | 1.00 ms | 1 | 9997 | 0.999700 |
| 101 | 1.00 ms | 1.05 ms | 1 | 9998 | 0.999800 |
| 121 | 2.51 ms | 2.63 ms | 1 | 9999 | 0.999900 |
| 162 | 16.60 ms | 17.38 ms | 1 | 10000 | 1.000000 |
+---------------+-----------+-----------+--------------+------------------------+-----------------+
49 rows in set (0.02 sec)
In this example, 3694 times (the COUNT_BUCKET
column) when the query were executed, the query time was between 63.10 microseconds and 66.07 microseconds, so the execution time matched the interval of bucket number 41. There has been at total of 7322 executions (the COUNT_BUCKET_AND_LOWER
column) of the query with a query time of 66.07 microseconds or less. This means that 73.22% (the BUCKET_QUANTILE
column) of the queries have a query time of 66.07 microseconds or less.
In addition to the shown columns, there is SCHEMA_NAME
and DIGEST
(which together with BUCKET_NUMBER
form a unique key). For each digest there are 450 buckets with the width of the bucket (in terms of difference between the low and high timers) gradually becoming larger and larger. The first, middle, and last five buckets are:
mysql> SELECT BUCKET_NUMBER,
sys.format_time(BUCKET_TIMER_LOW) AS TimerLow,
sys.format_time(BUCKET_TIMER_HIGH) AS TimerHigh
FROM performance_schema.events_statements_histogram_by_digest
WHERE DIGEST = '9431aed9635923565d7bc92cc36d6411c3abb9f52d2c22715be21b5472e3c366'
AND (BUCKET_NUMBER < 5 OR BUCKET_NUMBER > 444 OR BUCKET_NUMBER BETWEEN 223 AND 227);
+---------------+-----------+-----------+
| BUCKET_NUMBER | TimerLow | TimerHigh |
+---------------+-----------+-----------+
| 0 | 0 ps | 10.00 us |
| 1 | 10.00 us | 10.47 us |
| 2 | 10.47 us | 10.96 us |
| 3 | 10.96 us | 11.48 us |
| 4 | 11.48 us | 12.02 us |
| 223 | 275.42 ms | 288.40 ms |
| 224 | 288.40 ms | 302.00 ms |
| 225 | 302.00 ms | 316.23 ms |
| 226 | 316.23 ms | 331.13 ms |
| 227 | 331.13 ms | 346.74 ms |
| 445 | 2.11 h | 2.21 h |
| 446 | 2.21 h | 2.31 h |
| 447 | 2.31 h | 2.42 h |
| 448 | 2.42 h | 2.53 h |
| 449 | 2.53 h | 30.50 w |
+---------------+-----------+-----------+
15 rows in set (0.02 sec)
The bucket thresholds are fixed and thus the same for all digests. There is also a global histogram in the events_statements_histogram_global
.
This includes the introduction to the new Performance Schema digest features. As monitoring tools start to use this information, it will help create a better monitoring experience. Particularly the histograms will benefit from being shown as graphs.
0 Comments on “MySQL 8: Performance Schema Digests Improvements”