Tracking Foreign Keys

The other day, I was reading a blog by Magnus Hagander about tracking foreign keys throughout a schema in PostgreSQL. I thought it was a good idea, so I decided to look at how you can track foreign key in MySQL.

MySQL Workbench EER Diagram

The way I decided to do it was to start out with a table, then find all tables referencing the table by a foreign key. From this basic (and simple as it will be shown) query, it is possible to create a chain of relations. The key table for the queries is information_schema.KEY_COLUMN_USAGE which has information about all the foreign key relationships.

However, before getting that far, it is necessary to have some tables to work with.

Sample Schema

A small schema, but with relatively complex foreign keys relations, can be created with the following statements:

DROP SCHEMA IF EXISTS db1;
CREATE SCHEMA db1;

CREATE TABLE db1.a (
   a_id int,
   PRIMARY KEY (a_id)
);

CREATE TABLE db1.b (
   b_id1 int,
   b_id2 int,
   a_id int,
   PRIMARY KEY (b_id1, b_id2),
   FOREIGN KEY (a_id) REFERENCES a (a_id)
);

CREATE TABLE db1.c (
   c_id int,
   b_id1 int,
   b_id2 int,
   PRIMARY KEY (c_id),
   FOREIGN KEY (b_id1, b_id2) REFERENCES b (b_id1, b_id2)
);

CREATE TABLE db1.d (
   d_id int,
   a_id int,
   c_id int,
   PRIMARY KEY (d_id),
   FOREIGN KEY (a_id) REFERENCES a (a_id),
   FOREIGN KEY (c_id) REFERENCES c (c_id)
);

CREATE TABLE db1.e (
   e_id int,
   a_id int,
   PRIMARY KEY (e_id),
   FOREIGN KEY (a_id) REFERENCES a (a_id)
);

CREATE TABLE db1.f (
   f_id int,
   b_id1 int,
   b_id2 int,
   c_id int,
   PRIMARY KEY (f_id),
   FOREIGN KEY (b_id1, b_id2)
   REFERENCES b (b_id1, b_id2),
   FOREIGN KEY (c_id) REFERENCES c (c_id)
);

The base table is a. There are three tables, b, d, and e, with a direct foreign key to table a.  Tables c and f in turn references table b through a two column foreign key, and tables d and f references table c. So there are quite a few paths to get to table a from another table. Before looking at how the relationship can be found from the information_schema.KEY_COLUMN_USAGE, it is worth creating a visual representation of the schema.

MySQL Workbench EER Diagrams

A long standing feature of MySQL Workbench is its ability to create enhanced entity–relationship (EER) diagrams. This shows a box with information about the columns and indexes of each table in a schema. Additionally there are lines connecting tables related by foreign keys. So, an EER diagram includes what we are looking for – the chain of foreign keys.

You can create an ERR diagram by clicking on Database and then choose Reverse Engineer … from the menu in MySQL Workbench:

Choose Reverse Engineer in the MySQL Workbench menu.
Choose Reverse Engineer in the MySQL Workbench menu.

Alternatively use CTRL + R. You can do this from the homepage or from a database connection. Either way, you are taken to the connection options screen where you can choose an already defined connection or create a new one – this works the same as when you connect to a MySQL instance to execute queries:

Specify the connections options to create to the MySQL instance with the schema for the EER diagram.
Specify the connections options to create to the MySQL instance with the schema for the EER diagram.

When you continue, MySQL Workbench will connect to MySQL Server and get a list of the schemas available. Once you get to the Select Schemas page, you can choose the schema or schemas you want to create the EER diagram for. In this case choose the db1 schema (unless you created the tables in another schema):

Choose the schemas to import.
Choose the schemas to import.

For this example, you can use the defaults for the rest of the screens. On the Select Objects screen, you can optionally choose to select a subset of tables for the diagram. On the same screen, you choose whether you want to place the imported objects into a diagram (enabled by default); you want to do that for this example.

Tip

If MySQL Workbench crashes when creating the diagram, try open Edit → Configuration… → Modelling in the menu and check the Force use of software based rendering for EER diagrams option.

At the end, you have the diagram. You can move the tables around to place them as you like. One example of the diagram is:

MySQL Workbench EER Diagram
MySQL Workbench EER Diagram

This makes it easy to see the relations between the tables.

But what do you do, if you want to analyze the relationship in a program or for some other reason have the relationships in a text format? Let’s look at that.

Querying the Foreign Key Relationship

As mentioned, the base table for looking at foreign key relations is the information_schema.KEY_COLUMN_USAGE table. It has the following definition:

mysql> DESC information_schema.KEY_COLUMN_USAGE;
+-------------------------------+------------------+------+-----+---------+-------+
| Field                         | Type             | Null | Key | Default | Extra |
+-------------------------------+------------------+------+-----+---------+-------+
| CONSTRAINT_CATALOG            | varchar(64)      | YES  |     | NULL    |       |
| CONSTRAINT_SCHEMA             | varchar(64)      | YES  |     | NULL    |       |
| CONSTRAINT_NAME               | varchar(64)      | NO   |     | NULL    |       |
| TABLE_CATALOG                 | varchar(64)      | YES  |     | NULL    |       |
| TABLE_SCHEMA                  | varchar(64)      | YES  |     | NULL    |       |
| TABLE_NAME                    | varchar(64)      | YES  |     | NULL    |       |
| COLUMN_NAME                   | varchar(64)      | YES  |     | NULL    |       |
| ORDINAL_POSITION              | int(10) unsigned | NO   |     | NULL    |       |
| POSITION_IN_UNIQUE_CONSTRAINT | binary(0)        | YES  |     | NULL    |       |
| REFERENCED_TABLE_SCHEMA       | binary(0)        | YES  |     | NULL    |       |
| REFERENCED_TABLE_NAME         | binary(0)        | YES  |     | NULL    |       |
| REFERENCED_COLUMN_NAME        | binary(0)        | YES  |     | NULL    |       |
+-------------------------------+------------------+------+-----+---------+-------+
12 rows in set (0.01 sec)

In MySQL 8.0 this is a view on the new data dictionary, so effectively a plain InnoDB query and it is fast to query. In MySQL 5.7 and earlier, querying it requires opening the tables which can be slow and all tables must be opened. If you have many tables and they are not cached in the table caches yet, querying KEY_COLUMN_USAGE can be slow and impact the general performance.

Basic Query – Single Column per Foreign Key

The three columns prefixed with REFERENCED_ contains the information about a foreign key. For example, for the tables used in this blog, if you want to know which tables have a direct foreign key to table a in the db1 schema, you can query KEY_COLUMN_USAGE with a WHERE clause on REFERENCED_TABLE_SCHEMA and REFERENCED_TABLE_NAME like:

mysql> SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME,
              REFERENCED_TABLE_SCHEMA, REFERENCED_TABLE_NAME, REFERENCED_COLUMN_NAME
         FROM information_schema.KEY_COLUMN_USAGE
        WHERE REFERENCED_TABLE_SCHEMA = 'db1'
              AND REFERENCED_TABLE_NAME = 'a';
+--------------+------------+-------------+-------------------------+-----------------------+------------------------+
| TABLE_SCHEMA | TABLE_NAME | COLUMN_NAME | REFERENCED_TABLE_SCHEMA | REFERENCED_TABLE_NAME | REFERENCED_COLUMN_NAME |
+--------------+------------+-------------+-------------------------+-----------------------+------------------------+
| db1          | b          | a_id        | db1                     | a                     | a_id                   |
| db1          | d          | a_id        | db1                     | a                     | a_id                   |
| db1          | e          | a_id        | db1                     | a                     | a_id                   |
+--------------+------------+-------------+-------------------------+-----------------------+------------------------+
3 rows in set (0.01 sec)

So, the tables b, d, and e has a foreign key to a_id in the db1.a table, and the column name for each of the three tables is also called a_id. This is just as expected.

The query works great for finding the immediate relations where the foreign key only includes a single column. However, for cases where there are multiple columns in the foreign key, there will be two rows for each referencing table. So what to do?

Basis Query – Multiple Columns per Foreign Key

To avoid having one row per column in a multi-column foreign key, you need to perform an aggregation. You can for example use the GROUP_CONCAT() to generate a single value combining the column names. In MySQL 8.0, you can also consider creating a JSON array by using the JSON_ARRAYAGG() function:

mysql> SELECT TABLE_SCHEMA, TABLE_NAME,
              JSON_ARRAYAGG(COLUMN_NAME) ColumnNames
         FROM information_schema.KEY_COLUMN_USAGE
        WHERE REFERENCED_TABLE_SCHEMA = 'db1'
              AND REFERENCED_TABLE_NAME = 'b'
        GROUP BY TABLE_SCHEMA, TABLE_NAME;
+--------------+------------+--------------------+
| TABLE_SCHEMA | TABLE_NAME | ColumnNames        |
+--------------+------------+--------------------+
| db1          | c          | ["b_id1", "b_id2"] |
| db1          | f          | ["b_id1", "b_id2"] |
+--------------+------------+--------------------+
2 rows in set (0.01 sec)

This queries the foreign keys to the b tables. The c and f tables have a foreign key using the b_id1 and b_id2 columns.

This query result also means that the c and f tables are related to the a table through the b table. Would it not be great, if there was a single query that could provide the foreign key chains? Well, in MySQL 8 you can get this using a common table expression (CTE).

Querying Foreign Key Chains – Step by Step

Tip

If you are just interested in the final query, skip to the next subsection.

The query will use a recursive common table expression. This requires a seed query and a recursive query (that works on the rows generated in the previous iteration). A good seed query is similar to what we had for the basis query. However, to make it possible to aggregate all of the steps in the chain, the chain will be generated as a JSON array with each part of the chain being a JSON object. The seed query becomes:

mysql> SELECT 1 AS Level, TABLE_SCHEMA, TABLE_NAME,
              JSON_ARRAY(
                  JSON_OBJECT('Level', 1,
                              'TableSchema', TABLE_SCHEMA,
                              'TableName', TABLE_NAME,
                              'ColumnNames', JSON_ARRAYAGG(COLUMN_NAME)
                  )
              ) AS ReferenceChain
         FROM information_schema.KEY_COLUMN_USAGE
        WHERE REFERENCED_TABLE_SCHEMA = 'db1' AND REFERENCED_TABLE_NAME = 'a'
        GROUP BY TABLE_SCHEMA, TABLE_NAME;
+-------+--------------+------------+---------------------------------------------------------------------------------+
| Level | TABLE_SCHEMA | TABLE_NAME | ReferenceChain                                                                  |
+-------+--------------+------------+---------------------------------------------------------------------------------+
|     1 | db1          | b          | [{"Level": 1, "TableName": "b", "ColumnNames": ["a_id"], "TableSchema": "db1"}] |
|     1 | db1          | d          | [{"Level": 1, "TableName": "d", "ColumnNames": ["a_id"], "TableSchema": "db1"}] |
|     1 | db1          | e          | [{"Level": 1, "TableName": "e", "ColumnNames": ["a_id"], "TableSchema": "db1"}] |
+-------+--------------+------------+---------------------------------------------------------------------------------+
3 rows in set (0.01 sec)

Now, you can take each of these relations and look for tables having a foreign key to them, and so forth. That is the recursive part of the query. There is one complication though: GROUP BY is not allowed in the recursive part. The workaround is to use a subquery:

mysql> WITH RECURSIVE `reference` (Level, TABLE_SCHEMA, TABLE_NAME, ReferenceChain) AS (
            SELECT 1, TABLE_SCHEMA, TABLE_NAME,
                   JSON_ARRAY(
                        JSON_OBJECT('Level', 1,
                                    'TableSchema', TABLE_SCHEMA,
                                    'TableName', TABLE_NAME,
                                    'ColumnNames', JSON_ARRAYAGG(COLUMN_NAME)
                        )
                   )
              FROM information_schema.KEY_COLUMN_USAGE
             WHERE REFERENCED_TABLE_SCHEMA = 'db1' AND REFERENCED_TABLE_NAME = 'a'
             GROUP BY TABLE_SCHEMA, TABLE_NAME
             UNION
            SELECT r.Level+1, ko.TABLE_SCHEMA, ko.TABLE_NAME,
                   JSON_ARRAY_APPEND(
                       ReferenceChain, '$',
                       (SELECT JSON_OBJECT('Level', r.Level+1,
                                           'TableSchema', TABLE_SCHEMA,
                                           'TableName', TABLE_NAME,
                                           'ColumnNames', JSON_ARRAYAGG(COLUMN_NAME)
                               )
                          FROM information_schema.KEY_COLUMN_USAGE ki
                         WHERE ki.TABLE_SCHEMA = ko.TABLE_SCHEMA
                               AND TABLE_NAME = ko.TABLE_NAME
                               AND ki.REFERENCED_TABLE_SCHEMA = ko.REFERENCED_TABLE_SCHEMA
                               AND ki.REFERENCED_TABLE_NAME = ko.REFERENCED_TABLE_NAME)
                   )
              FROM `reference` r
                   INNER JOIN information_schema.KEY_COLUMN_USAGE ko ON
                         ko.REFERENCED_TABLE_SCHEMA = r.TABLE_SCHEMA
                         AND ko.REFERENCED_TABLE_NAME = r.TABLE_NAME
       )
       SELECT Level, TABLE_SCHEMA, TABLE_NAME,
              JSON_PRETTY(ReferenceChain) AS ReferenceChain
         FROM `reference`\G
*************************** 1. row ***************************
         Level: 1
  TABLE_SCHEMA: db1
    TABLE_NAME: b
ReferenceChain: [
  {
    "Level": 1,
    "TableName": "b",
    "ColumnNames": [
      "a_id"
    ],
    "TableSchema": "db1"
  }
]
*************************** 2. row ***************************
         Level: 1
  TABLE_SCHEMA: db1
    TABLE_NAME: d
ReferenceChain: [
  {
    "Level": 1,
    "TableName": "d",
    "ColumnNames": [
      "a_id"
    ],
    "TableSchema": "db1"
  }
]
*************************** 3. row ***************************
         Level: 1
  TABLE_SCHEMA: db1
    TABLE_NAME: e
ReferenceChain: [
  {
    "Level": 1,
    "TableName": "e",
    "ColumnNames": [
      "a_id"
    ],
    "TableSchema": "db1"
  }
]
*************************** 4. row ***************************
         Level: 2
  TABLE_SCHEMA: db1
    TABLE_NAME: c
ReferenceChain: [
  {
    "Level": 1,
    "TableName": "b",
    "ColumnNames": [
      "a_id"
    ],
    "TableSchema": "db1"
  },
  {
    "Level": 2,
    "TableName": "c",
    "ColumnNames": [
      "b_id1",
      "b_id2"
    ],
    "TableSchema": "db1"
  }
]
*************************** 5. row ***************************
         Level: 2
  TABLE_SCHEMA: db1
    TABLE_NAME: f
ReferenceChain: [
  {
    "Level": 1,
    "TableName": "b",
    "ColumnNames": [
      "a_id"
    ],
    "TableSchema": "db1"
  },
  {
    "Level": 2,
    "TableName": "f",
    "ColumnNames": [
      "b_id1",
      "b_id2"
    ],
    "TableSchema": "db1"
  }
]
*************************** 6. row ***************************
         Level: 3
  TABLE_SCHEMA: db1
    TABLE_NAME: d
ReferenceChain: [
  {
    "Level": 1,
    "TableName": "b",
    "ColumnNames": [
      "a_id"
    ],
    "TableSchema": "db1"
  },
  {
    "Level": 2,
    "TableName": "c",
    "ColumnNames": [
      "b_id1",
      "b_id2"
    ],
    "TableSchema": "db1"
  },
  {
    "Level": 3,
    "TableName": "d",
    "ColumnNames": [
      "c_id"
    ],
    "TableSchema": "db1"
  }
]
*************************** 7. row ***************************
         Level: 3
  TABLE_SCHEMA: db1
    TABLE_NAME: f
ReferenceChain: [
  {
    "Level": 1,
    "TableName": "b",
    "ColumnNames": [
      "a_id"
    ],
    "TableSchema": "db1"
  },
  {
    "Level": 2,
    "TableName": "c",
    "ColumnNames": [
      "b_id1",
      "b_id2"
    ],
    "TableSchema": "db1"
  },
  {
    "Level": 3,
    "TableName": "f",
    "ColumnNames": [
      "c_id"
    ],
    "TableSchema": "db1"
  }
]
7 rows in set (0.02 sec)

Here the ARRAY_APPEND() function is used to add the next part of the chain to ReferenceChain. The query relies on that the UNION is a UNION DISTINCT by default, so for the cases where there are two columns in the foreign key, the second (duplicate) row is automatically filtered out. For the main query, JSON_PRETTY() is used to make it easier to read the JSON document. If you are using the query in an application, this is not needed.

You can stop here. The result is correct. However, you may think there are more rows than you would expect. For example the chain a → b is there on its own (1st row) even though there are also tables with foreign keys to b. If you want to include subchains in the result, then you are all set. If you want to filter chains out that are part of another chain, a little more work is needed.

To filter out chains that are also included in subsequent rows, it is in one way or another necessary to keep track of whether a row has any child rows (i.e. that a subsequent row is generated based on the row). One way to do this is to have a serialized form of the chain, however the disadvantage is that you don’t know how long a string you need to store that (and the string length must be specified in the seed query). Another option is to generate an ID for each row – for example using the UUID() function. Then in rows generated from the row make a reference to the parent row. This is the option used here.

A disadvantage of this approach is that for tables with more then one column in the foreign key, the two rows generated are no longer identical. So, it is necessary to handle this in the main query. However, it is now easy to only include the end of the chains as these will not have another row with the parent ID set to the row’s ID. To find this, use a LEFT OUTER JOIN and look for rows where the optional row returns a NULL ID (that is, a row was not found).

Final Query

The final query thus becomes:

mysql> WITH RECURSIVE `reference` (ID, ParentID, Level,
                                   TABLE_SCHEMA, TABLE_NAME,
                                   ReferenceChain) AS (
            SELECT UUID(), CAST(NULL AS CHAR(36)), 1,
                   TABLE_SCHEMA, TABLE_NAME,
                   JSON_ARRAY(
                       JSON_OBJECT('Level', 1,
                                   'TableSchema', TABLE_SCHEMA,
                                   'TableName', TABLE_NAME,
                                   'ColumnNames', JSON_ARRAYAGG(COLUMN_NAME)
                       )
                   )
              FROM information_schema.KEY_COLUMN_USAGE
             WHERE REFERENCED_TABLE_SCHEMA = 'db1'
                   AND REFERENCED_TABLE_NAME = 'a'
             GROUP BY TABLE_SCHEMA, TABLE_NAME
             UNION
            SELECT UUID(), r.ID, r.Level+1,
                   ko.TABLE_SCHEMA, ko.TABLE_NAME,
                   JSON_ARRAY_APPEND(
                       ReferenceChain, '$',
                       (SELECT JSON_OBJECT('Level', r.Level+1,
                                           'TableSchema', TABLE_SCHEMA,
                                           'TableName', TABLE_NAME,
                                           'ColumnNames', JSON_ARRAYAGG(COLUMN_NAME)
                               )
                          FROM information_schema.KEY_COLUMN_USAGE ki
                         WHERE ki.TABLE_SCHEMA = ko.TABLE_SCHEMA
                               AND TABLE_NAME = ko.TABLE_NAME
                               AND ki.REFERENCED_TABLE_SCHEMA = ko.REFERENCED_TABLE_SCHEMA
                               AND ki.REFERENCED_TABLE_NAME = ko.REFERENCED_TABLE_NAME)
                   )
              FROM `reference` r
                   INNER JOIN information_schema.KEY_COLUMN_USAGE ko ON
                         ko.REFERENCED_TABLE_SCHEMA = r.TABLE_SCHEMA
                         AND ko.REFERENCED_TABLE_NAME = r.TABLE_NAME
       )
       SELECT DISTINCT r.Level AS NumLevels,
              JSON_PRETTY(r.ReferenceChain) AS ReferenceChain
         FROM `reference` r
              LEFT OUTER JOIN `reference` rc ON rc.ParentID = r.ID
        WHERE rc.ID IS NULL\G
*************************** 1. row ***************************
     NumLevels: 1
ReferenceChain: [
  {
    "Level": 1,
    "TableName": "d",
    "ColumnNames": [
      "a_id"
    ],
    "TableSchema": "db1"
  }
]
*************************** 2. row ***************************
     NumLevels: 1
ReferenceChain: [
  {
    "Level": 1,
    "TableName": "e",
    "ColumnNames": [
      "a_id"
    ],
    "TableSchema": "db1"
  }
]
*************************** 3. row ***************************
     NumLevels: 2
ReferenceChain: [
  {
    "Level": 1,
    "TableName": "b",
    "ColumnNames": [
      "a_id"
    ],
    "TableSchema": "db1"
  },
  {
    "Level": 2,
    "TableName": "f",
    "ColumnNames": [
      "b_id1",
      "b_id2"
    ],
    "TableSchema": "db1"
  }
]
*************************** 4. row ***************************
     NumLevels: 3
ReferenceChain: [
  {
    "Level": 1,
    "TableName": "b",
    "ColumnNames": [
      "a_id"
    ],
    "TableSchema": "db1"
  },
  {
    "Level": 2,
    "TableName": "c",
    "ColumnNames": [
      "b_id1",
      "b_id2"
    ],
    "TableSchema": "db1"
  },
  {
    "Level": 3,
    "TableName": "d",
    "ColumnNames": [
      "c_id"
    ],
    "TableSchema": "db1"
  }
]
*************************** 5. row ***************************
     NumLevels: 3
ReferenceChain: [
  {
    "Level": 1,
    "TableName": "b",
    "ColumnNames": [
      "a_id"
    ],
    "TableSchema": "db1"
  },
  {
    "Level": 2,
    "TableName": "c",
    "ColumnNames": [
      "b_id1",
      "b_id2"
    ],
    "TableSchema": "db1"
  },
  {
    "Level": 3,
    "TableName": "f",
    "ColumnNames": [
      "c_id"
    ],
    "TableSchema": "db1"
  }
]
5 rows in set (0.03 sec)

The DISTINCT in the main part of the query ensures that duplicates due to multiple columns in the foreign key are filtered out.

Note

One thing this version of the query does not handle is circular key relations. For example if you add the column c_id to a with a foreign key to the c table, then an infinite number of chains will be created. So, there need to be a condition that detects when a loop is getting created. That is an exercise for the reader – or for a later blog.

Thus, this schema has five unique chains leading to the a tables. You can also verify this from the EER diagram – for reference, here it is again:

MySQL Workbench EER Diagram
MySQL Workbench EER Diagram

What Does I/O Latencies and Bytes Mean in the Performance and sys Schemas?

The MEM InnoDB File I/O Graphs showing a peak in latency and bytes at 12:51.

The Performance Schema and sys schema are great for investigating what is going on in MySQL including investigating performance issues. In my work in MySQL Support, I have several times heard questions whether a peak in the InnoDB Data File I/O – Latency graph in MySQL Enterprise Monitor (MEM) or some values from the corresponding tables and view in the Performance Schema and sys schema are cause for concern. This blog will discuss what these observations means and how to use them.

The Tables and Views Involved

This blog will look into three sources in the Performance Schema for I/O latencies, so let’s first take a look at those tables. The three Performance Schema tables are:

  • events_waits_summary_global_by_event_name: with the event name set to wait/io/table/sql/handler or wait/io/file/%. This is the table used for the waits_global_by_latency and wait_classes_global_by_% views in the sys schema.
  • table_io_waits_summary_by_table: this is the table used for the schema_table_statistics% views in the sys schema.
  • file_summary_by_instance: this is the table used for the io_global_by_file_by% views in the sys schema.

These are also the sources used in MySQL Enterprise Monitor for the InnoDB Data File I/O graphs shown above. Let’s take a look at an example of each of the tables.

events_waits_summary_global_by_event_name

The events_waits_summary_global_by_event_name has aggregate data for wait events grouped by the event name. For the purpose of this discussion it is the table and file wait/io events that are of interested. An example of the data returned is:

mysql> SELECT *
         FROM performance_schema.events_waits_summary_global_by_event_name
        WHERE EVENT_NAME = 'wait/io/table/sql/handler'
              OR EVENT_NAME LIKE 'wait/io/file/%'
        ORDER BY SUM_TIMER_WAIT DESC
        LIMIT 5;
+--------------------------------------+------------+----------------+----------------+----------------+----------------+
| EVENT_NAME                           | COUNT_STAR | SUM_TIMER_WAIT | MIN_TIMER_WAIT | AVG_TIMER_WAIT | MAX_TIMER_WAIT |
+--------------------------------------+------------+----------------+----------------+----------------+----------------+
| wait/io/file/innodb/innodb_log_file  |    4003029 | 97371921796204 |        5371684 |       24324412 |   180878537324 |
| wait/io/table/sql/handler            |    4049559 | 43338753895440 |         494128 |       10702072 |   138756437188 |
| wait/io/file/innodb/innodb_data_file |      25850 | 11395061934996 |              0 |      440814508 |    43029060332 |
| wait/io/file/sql/binlog              |      20041 |  1316679917820 |              0 |       65699088 |    25816580304 |
| wait/io/file/sql/io_cache            |       2439 |    68212824648 |        1448920 |       27967360 |      628484180 |
+--------------------------------------+------------+----------------+----------------+----------------+----------------+
5 rows in set (0.00 sec)

mysql> SELECT EVENT_NAME, COUNT_STAR,
              sys.format_time(SUM_TIMER_WAIT) AS SumTimerWait,
              sys.format_time(MIN_TIMER_WAIT) AS MinTimeWait,
              sys.format_time(AVG_TIMER_WAIT) AS AvgTimerWait,
              sys.format_time(MAX_TIMER_WAIT) AS MaxTimerWait
         FROM performance_schema.events_waits_summary_global_by_event_name
        WHERE EVENT_NAME = 'wait/io/table/sql/handler'
              OR EVENT_NAME LIKE 'wait/io/file/
%' ORDER BY SUM_TIMER_WAIT DESC LIMIT 5;
+--------------------------------------+------------+--------------+-------------+--------------+--------------+
| EVENT_NAME                           | COUNT_STAR | SumTimerWait | MinTimeWait | AvgTimerWait | MaxTimerWait |
+--------------------------------------+------------+--------------+-------------+--------------+--------------+
| wait/io/file/innodb/innodb_log_file  |    4003029 | 1.62 m       | 5.37 us     | 24.32 us     | 180.88 ms    |
| wait/io/table/sql/handler            |    4049559 | 43.34 s      | 494.13 ns   | 10.70 us     | 138.76 ms    |
| wait/io/file/innodb/innodb_data_file |      25853 | 11.40 s      | 0 ps        | 440.78 us    | 43.03 ms     |
| wait/io/file/sql/binlog              |      20041 | 1.32 s       | 0 ps        | 65.70 us     | 25.82 ms     |
| wait/io/file/sql/io_cache            |       2439 | 68.21 ms     | 1.45 us     | 27.97 us     | 628.48 us    |
+--------------------------------------+------------+--------------+-------------+--------------+--------------+
5 rows in set (0.01 sec)

mysql> SELECT *
         FROM sys.waits_global_by_latency
        WHERE events = 'wait/io/table/sql/handler'
              OR events LIKE 'wait/io/file/%'
        LIMIT 5;
+--------------------------------------+---------+---------------+-------------+-------------+
| events                               | total   | total_latency | avg_latency | max_latency |
+--------------------------------------+---------+---------------+-------------+-------------+
| wait/io/file/innodb/innodb_log_file  | 4003029 | 1.62 m        | 24.32 us    | 180.88 ms   |
| wait/io/table/sql/handler            | 4049559 | 43.34 s       | 10.70 us    | 138.76 ms   |
| wait/io/file/innodb/innodb_data_file |   25874 | 11.43 s       | 441.88 us   | 43.03 ms    |
| wait/io/file/sql/binlog              |   20131 | 1.32 s        | 65.66 us    | 25.82 ms    |
| wait/io/file/sql/io_cache            |    2439 | 68.21 ms      | 27.97 us    | 628.48 us   |
+--------------------------------------+---------+---------------+-------------+-------------+
5 rows in set (0.01 sec)

These three queries show the same data, just obtained and displayed in different ways.

In the result there are two groups of events. The wait/io/table events (the wait/io/table/sql/handler is the only event of this group which is why it can be listed explicitly) and the wait/io/file group.

The table events are for accessing data in tables. It does not matter whether the data is cached in the buffer pool or is accessed on disk. In this table and view, there is no distinguishing between different types of access (read, write, etc.).

The file events are, as the name suggest, for actually accessing files. There is one file event per file type. For example, in he output there are the wait/io/file/innodb/innodb_log_file event for accessing the InnoDB redo log files, the wait/io/file/innodb/innodb_data_file event for accessing the InnoDB data files themselves, the wait/io/file/sql/binlog event for the binary log files, etc.

In the second query, all of the timings are wrapped in the sys.format_time() function. The timings returned by the Performance Schema are in picoseconds (10^-12 second) which are somewhat hard for us humans to read. The sys.format_time() function converts the picoseconds to human readable strings. When you sort, however, make sure you sort by the original non-converted values.

Tip

Use the sys.format_time() function to convert the picoseconds to a human readable value, but do only so for the returned row; sort and filter by the original values. The Performance Schema always returns timings in picoseconds irrespective of the timer used internally for the event.

The sys schema by default returns the timing as human readable values. If you need the values in picoseconds prefix the table name with x$, for example sys.x$waits_global_by_latency. The sys schema includes an ORDER BY clause in most views. For the waits_global_by_latency view, the default ordering is by the total latency, so there is no need to add an ORDER BY clause in this example.

table_io_waits_summary_by_table

The table_io_waits_summary_by_table Performance Schema table and schema_table_statistics% sys schema views are related to the wait/io/table/sql/handler event just discussed. These provide information about the amount of time spent per table. Unlike querying the wait/io/table/sql/handler in the wait_events_% tables, it split the time spent into whether it is used for reads, writes, fetch, insert, update, or delete. The read and write columns are the aggregates for the corresponding read and write operations, respectively. Since fetch is the only read operation, the read and fetch columns will have the same values.

The table and view show the table I/O, i.e. the access to table data irrespective of whether the data is accessed in-memory or on disk. This is similar to the wait/io/table/sql/handler event. An example of the result of querying the table and view for the employees.salaries table is:

mysql> SELECT *
         FROM performance_schema.table_io_waits_summary_by_table
        WHERE OBJECT_SCHEMA = 'employees'
              AND OBJECT_NAME = 'salaries'\
*************************** 1. row ***************************
     OBJECT_TYPE: TABLE
   OBJECT_SCHEMA: employees
     OBJECT_NAME: salaries
      COUNT_STAR: 2844047
  SUM_TIMER_WAIT: 31703741623644
  MIN_TIMER_WAIT: 4975456
  AVG_TIMER_WAIT: 11147072
  MAX_TIMER_WAIT: 138756437188
      COUNT_READ: 0
  SUM_TIMER_READ: 0
  MIN_TIMER_READ: 0
  AVG_TIMER_READ: 0
  MAX_TIMER_READ: 0
     COUNT_WRITE: 2844047
 SUM_TIMER_WRITE: 31703741623644
 MIN_TIMER_WRITE: 4975456
 AVG_TIMER_WRITE: 11147072
 MAX_TIMER_WRITE: 138756437188
     COUNT_FETCH: 0
 SUM_TIMER_FETCH: 0
 MIN_TIMER_FETCH: 0
 AVG_TIMER_FETCH: 0
 MAX_TIMER_FETCH: 0
    COUNT_INSERT: 2844047
SUM_TIMER_INSERT: 31703741623644
MIN_TIMER_INSERT: 4975456
AVG_TIMER_INSERT: 11147072
MAX_TIMER_INSERT: 138756437188
    COUNT_UPDATE: 0
SUM_TIMER_UPDATE: 0
MIN_TIMER_UPDATE: 0
AVG_TIMER_UPDATE: 0
MAX_TIMER_UPDATE: 0
    COUNT_DELETE: 0
SUM_TIMER_DELETE: 0
MIN_TIMER_DELETE: 0
AVG_TIMER_DELETE: 0
MAX_TIMER_DELETE: 0
1 row in set (0.00 sec)

mysql> SELECT *
         FROM sys.schema_table_statistics
        WHERE table_schema = 'employees'
              AND table_name = 'salaries'\G
*************************** 1. row ***************************
     table_schema: employees
       table_name: salaries
    total_latency: 31.70 s
     rows_fetched: 0
    fetch_latency: 0 ps
    rows_inserted: 2844047
   insert_latency: 31.70 s
     rows_updated: 0
   update_latency: 0 ps
     rows_deleted: 0
   delete_latency: 0 ps
 io_read_requests: 493
          io_read: 7.70 MiB
  io_read_latency: 611.04 ms
io_write_requests: 8628
         io_write: 134.91 MiB
 io_write_latency: 243.19 ms
 io_misc_requests: 244
  io_misc_latency: 2.50 s
1 row in set (0.05 sec)

In this case it shows that there has been mostly writes – inserts – for the table. The sys schema view effectively joins on the performance_schema.file_summary_by_instance for the read columns, so for the schema_table_statistics view fetch and read are not synonyms.

So, what is it the file_summary_by_instance table shows that is different table the “table I/O” that has been the topic of the first two tables? Let’s see.

file_summary_by_instance

Unlike the two previous tables, the file_summary_by_instance shows how much time is spent on actual file I/O and how much data is accessed. This makes the file_summary_by_instance table and the corresponding sys schema views very useful for determining where time is spent doing disk I/O and which files have the most data accesses on disk.

An example of using the Performance Schema and two of the sys schema views is:

mysql> SELECT *
         FROM performance_schema.file_summary_by_instance
        WHERE FILE_NAME LIKE '%\\\\employees\\\\salaries.ibd'\G
*************************** 1. row ***************************
                FILE_NAME: C:\ProgramData\MySQL\MySQL Server 8.0\Data\employees\salaries.ibd
               EVENT_NAME: wait/io/file/innodb/innodb_data_file
    OBJECT_INSTANCE_BEGIN: 2230271392896
               COUNT_STAR: 9365
           SUM_TIMER_WAIT: 3351594917896
           MIN_TIMER_WAIT: 11970500
           AVG_TIMER_WAIT: 357885020
           MAX_TIMER_WAIT: 40146113948
               COUNT_READ: 493
           SUM_TIMER_READ: 611040652056
           MIN_TIMER_READ: 65421052
           AVG_TIMER_READ: 1239433224
           MAX_TIMER_READ: 16340582272
 SUM_NUMBER_OF_BYTES_READ: 8077312
              COUNT_WRITE: 8628
          SUM_TIMER_WRITE: 243186542696
          MIN_TIMER_WRITE: 11970500
          AVG_TIMER_WRITE: 28185588
          MAX_TIMER_WRITE: 1984546920
SUM_NUMBER_OF_BYTES_WRITE: 141459456
               COUNT_MISC: 244
           SUM_TIMER_MISC: 2497367723144
           MIN_TIMER_MISC: 154910196
           AVG_TIMER_MISC: 10235113564
           MAX_TIMER_MISC: 40146113948
1 row in set (0.00 sec)

mysql> SELECT *
         FROM sys.io_global_by_file_by_latency
        WHERE file = '@@datadir\\employees\\salaries.ibd'\G
*************************** 1. row ***************************
         file: @@datadir\employees\salaries.ibd
        total: 9365
total_latency: 3.35 s
   count_read: 493
 read_latency: 611.04 ms
  count_write: 8628
write_latency: 243.19 ms
   count_misc: 244
 misc_latency: 2.50 s
1 row in set (0.09 sec)

mysql> SELECT *
         FROM sys.io_global_by_file_by_bytes
         WHERE file = '@@datadir\\employees\\salaries.ibd'\G
*************************** 1. row ***************************
         file: @@datadir\employees\salaries.ibd
   count_read: 493
   total_read: 7.70 MiB
     avg_read: 16.00 KiB
  count_write: 8628
total_written: 134.91 MiB
    avg_write: 16.01 KiB
        total: 142.61 MiB
    write_pct: 94.60
1 row in set (0.10 sec)

This example is from Microsoft Windows, and as always when backslashes are in play, it is fun to try to determine the appropriate number of backslashes to use. When specifying the file name with LIKE, you need four backslashes per backslash in the file name; when using = you need two backslashes.

Again, the values are split into reads and writes (though not as detailed as before with fetch, insert, update, and delete – that is not known at the level where the file I/O happens). The miscellaneous values include everything that is not considered reads or write; this includes opening and closing the file.

The sys schema queries not only have formatted the timings, but also the path and the bytes. This has been done using the sys.format_path() and sys.format_bytes() functions, respectively.

From the output, it can be seen that despite no rows were ever fetched (read) from the employees.salaries table (that was found in the previous output), there has still been some read file I/O. This was what the sys.schema_table_statistics view reflected.

So, what does all of this mean? The graph in MySQL Enterprise Monitor showed that there was six seconds file I/O latency for the InnoDB data files. Is that bad? As often with these kinds of questions, the answer is: “it depends”.

What to Make of the Values

In this case we have a case where the graphs in MySQL Enterprise Monitor show a peak for the latency and bytes used doing I/O on the InnoDB data files. This is actually disk I/O. But what exactly does that means and should the alarm sound?

Let’s first refresh our memory on how the graphs looks:

The MEM InnoDB File I/O Graphs showing a peak in latency and bytes at 12:31.

If you are not using MySQL Enterprise Monitor, you may have similar graphs from your monitoring solution, or you have obtained latency and bytes values from the tables and views discussed in this blog.

The latency graph shows that we have done six seconds if I/O. What does that mean? It is the aggregate I/O during the period the data was collected. In this case, the data is plotted for every minute, so in the one minute around 12:51, of the 60 seconds a total of six seconds was spent doing I/O. Now, the six seconds suddenly do no sound so bad. Similar for the bytes, around 4.6MiB of data was read or written.

In general, the values obtained either from the monitoring graphs or from the underlying tables cannot be used to conclude whether the is a problem or not. They just show how much I/O was done at different times.

Similar for the values from the Performance Schema. On their own they do not tell much. You can almost say they are too neutral – they just state how much work was done, not whether it was too much or too little.

A more useful way to use this data is in case a problem is reported. This can be that system administrator reports the disks are 100% utilized or that end users report the system is slow. Then, you can go and look at what happened. If the disk I/O was unusually high at that point in time, then that is likely related, and you can continue your investigation from there.

There are more reports in MySQL Enterprise Monitor both as time series graphs and as point-in-time snapshots. The point-in-time snapshots are often using the sys schema views but allows sorting. An example is the Database File I/O reports:

MEM's Database File I/O Report

MySQL Workbench also provides performance reports based on the sys scheme. The equivalent to the previous report is the Top File I/O Activity Report:

MySQL Workbench's Top File I/O Activity Report

The MySQL Workbench performance reports also allows you to export the report or copy the query used to generate the report, so you can execute it manually.

With respect to the wait/io/table/sql/handler events, then remember that I/O here does not mean disk I/O, but table I/O. So, all it means that time is accumulating for these events – including when looking at the per table is that the data in the table is used. There are also per index values in table_io_waits_summary_by_index_usage Performance Schema table and the schema_index_statistics sys view which have the same meaning. Whether a given usage is too high depends on many things. However, it can again be useful to investigate what is causing problems.

For the example data from this blog, it was triggered by loading the employees sample database. So, there was no problem. If you want to put data into your database, it must necessarily be written to the data file, otherwise it will not be persisted. However, if you think the data import took too long, you can use the data as part of your investigation on how to make the import faster.

The conclusion is that you should not panic if you see the I/O latencies peak. On their own they just mean that more work was done during that period that usual. Without context a peak is no worse than a dip (which could indicate something is preventing work from being done or that there is a natural slow period). Instead use the I/O latencies and bytes together with other observations to understand where MySQL is spending time and for which files data is being read and written.

References

I will recommend you look at the following references, if you want to understand more about the Performance Schema tables and sys schema views discussed in this blog:

The GitHub repository for the sys schema includes all the definitions of the views (and other objects). Since these are written in plain SQL, they are very useful to see in more depth where the data is coming from. The GitHub website allows you to browse through each of the files. Each sys schema object is defined in its own file.