InnoDB Locks Analysis: Why is Blocking Query NULL and How To Find More Information About the Blocking Transaction?

This post was originally published on the MySQL Support Team Blog at https://blogs.oracle.com/mysqlsupport/entry/innodb_locks_analysis_why_is on 14 April 2017.

Consider the scenario that you execute a query. You expect it to be fast – typically subsecond – but now it does not return until after 50 seconds (innodb_lock_wait_timeout seconds) and then it returns with an error:

mysql> UPDATE world.City SET Population = Population + 999 WHERE ID = 130;
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

You continue to investigate the issue using the sys.innodb_lock_waits view or the underlying Information Schema tables (INNODB_TRX, INNODB_LOCKS and INNODB_LOCK_WAITS).
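
For example, the transactions currently known to InnoDB can be inspected directly (a minimal sketch selecting only a few of the columns INNODB_TRX provides):

mysql> SELECT trx_id, trx_state, trx_started, trx_mysql_thread_id, trx_query
         FROM information_schema.INNODB_TRX;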

Note

The above Information Schema tables with lock and lock wait information have been moved to the Performance Schema in MySQL 8.0 as the data_locks and data_lock_waits tables. The sys schema view, however, works the same.
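
In MySQL 8.0 the equivalent raw lock information can, for example, be queried like this (a sketch selecting only a subset of the available columns):

mysql> SELECT ENGINE_TRANSACTION_ID, OBJECT_SCHEMA, OBJECT_NAME, INDEX_NAME,
              LOCK_TYPE, LOCK_MODE, LOCK_STATUS, LOCK_DATA
         FROM performance_schema.data_locks;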

However, when you query the locks information, the blocking query is returned as NULL. What does that mean, and how do you proceed from there to get information about the blocking transaction?

Setting Up an Example

Before proceeding, let's set up an example which will be investigated later in the blog. The example can be set up as follows (do not disconnect Connection 1 when the queries have been executed):

Step 1 – Connection 1:

Connection 1> START TRANSACTION;
Query OK, 0 rows affected (0.00 sec)

Connection 1> UPDATE world.City SET Population = Population + 1 WHERE ID = 130;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

Connection 1> UPDATE world.City SET Population = Population + 1 WHERE ID = 131;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

Connection 1> UPDATE world.City SET Population = Population + 1 WHERE ID = 132;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

Connection 1> UPDATE world.City SET Population = Population + 1 WHERE ID = 133;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

Step 2 – Connection 2:

Connection 2> UPDATE world.City SET Population = Population + 999 WHERE ID = 130;

Step 3 – Connection 3:

The following output from sys.innodb_lock_waits, taken while Connection 2 is still blocked, shows that the blocking query is NULL (slightly reformatted):

Connection 3> SELECT * FROM sys.innodb_lock_waits\G
*************************** 1. row ***************************
                wait_started: 2017-04-15 09:54:53
                    wait_age: 00:00:03
               wait_age_secs: 3
                locked_table: `world`.`City`
                locked_index: PRIMARY
                 locked_type: RECORD
              waiting_trx_id: 2827
         waiting_trx_started: 2017-04-15 09:54:53
             waiting_trx_age: 00:00:03
     waiting_trx_rows_locked: 1
   waiting_trx_rows_modified: 0
                 waiting_pid: 5
               waiting_query: UPDATE world.City SET Populati ... opulation + 999
                              WHERE ID = 130
             waiting_lock_id: 2827:24:6:41
           waiting_lock_mode: X
             blocking_trx_id: 2826
                blocking_pid: 3
              blocking_query: NULL
            blocking_lock_id: 2826:24:6:41
          blocking_lock_mode: X
        blocking_trx_started: 2017-04-15 09:54:46
            blocking_trx_age: 00:00:10
    blocking_trx_rows_locked: 4
  blocking_trx_rows_modified: 4
     sql_kill_blocking_query: KILL QUERY 3
sql_kill_blocking_connection: KILL 3
1 row in set, 3 warnings (0.00 sec)

Connection 3> SHOW WARNINGS\G
*************************** 1. row ***************************
  Level: Warning
   Code: 1681
Message: 'INFORMATION_SCHEMA.INNODB_LOCK_WAITS' is deprecated and will be
         removed in a future release.
*************************** 2. row ***************************
  Level: Warning
   Code: 1681
Message: 'INFORMATION_SCHEMA.INNODB_LOCKS' is deprecated and will be removed
         in a future release.
*************************** 3. row ***************************
  Level: Warning
   Code: 1681
Message: 'INFORMATION_SCHEMA.INNODB_LOCKS' is deprecated and will be removed
         in a future release.
3 rows in set (0.00 sec)

The warnings only occur in 5.7.14 and later, as the InnoDB lock tables have been moved to the Performance Schema in MySQL 8.0. It is recommended to use the sys.innodb_lock_waits view as that has been updated accordingly in MySQL 8.0.

Investigating Idle Transactions

To investigate idle transactions, you need to use the Performance Schema. First determine the Performance Schema thread ID for the blocking transaction. For this you need the blocking_pid; in the above example:

                blocking_pid: 3

and use this with the performance_schema.threads table:

Connection 3> SELECT THREAD_ID FROM performance_schema.threads WHERE PROCESSLIST_ID = 3;
+-----------+
| THREAD_ID |
+-----------+
|        28 |
+-----------+
1 row in set (0.00 sec)
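
Alternatively, the sys schema's ps_thread_id() function performs the same lookup (a small sketch; the function takes the connection/processlist ID as its argument):

Connection 3> SELECT sys.ps_thread_id(3) AS THREAD_ID;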

For the following queries, insert the thread ID found above in the THREAD_ID = … WHERE clauses.

To get the latest query executed, use the performance_schema.events_statements_current table or the sys.session and sys.x$session views:

Connection 3> SELECT THREAD_ID, SQL_TEXT
                FROM performance_schema.events_statements_current
               WHERE THREAD_ID = 28;
+-----------+------------------------------------------------------------------+
| THREAD_ID | SQL_TEXT                                                         |
+-----------+------------------------------------------------------------------+
|        28 | UPDATE world.City SET Population = Population + 1 WHERE ID = 133 |
+-----------+------------------------------------------------------------------+
1 row in set (0.00 sec)

or:

Connection 3> SELECT * FROM sys.session WHERE thd_id = 28\G
*************************** 1. row ***************************
                thd_id: 28
               conn_id: 3
                  user: root@localhost
                    db: NULL
               command: Sleep
                 state: NULL
                  time: 447
     current_statement: NULL
     statement_latency: NULL
              progress: NULL
          lock_latency: 117.00 us
         rows_examined: 1
             rows_sent: 0
         rows_affected: 1
            tmp_tables: 0
       tmp_disk_tables: 0
             full_scan: NO
        last_statement: UPDATE world.City SET Population = Population + 1 WHERE ID = 133
last_statement_latency: 321.06 us
        current_memory: 0 bytes
             last_wait: NULL
     last_wait_latency: NULL
                source: NULL
           trx_latency: NULL
             trx_state: ACTIVE
        trx_autocommit: NO
                   pid: 7717
          program_name: mysql
1 row in set (0.08 sec)

In this case this does not explain why the lock is held, as the last query updated a different row than the one where the lock issue occurs. However, if the events_statements_history consumer is enabled (it is by default in MySQL 5.7 and later), the performance_schema.events_statements_history table will include the last 10 statements (by default) executed for the connection:

Connection 3> SELECT THREAD_ID, SQL_TEXT
                FROM performance_schema.events_statements_history
               WHERE THREAD_ID = 28
               ORDER BY EVENT_ID;
+-----------+------------------------------------------------------------------+
| THREAD_ID | SQL_TEXT                                                         |
+-----------+------------------------------------------------------------------+
|        28 | SELECT DATABASE()                                                |
|        28 | NULL                                                             |
|        28 | show databases                                                   |
|        28 | show tables                                                      |
|        28 | START TRANSACTION                                                |
|        28 | UPDATE world.City SET Population = Population + 1 WHERE ID = 130 |
|        28 | UPDATE world.City SET Population = Population + 1 WHERE ID = 131 |
|        28 | UPDATE world.City SET Population = Population + 1 WHERE ID = 132 |
|        28 | UPDATE world.City SET Population = Population + 1 WHERE ID = 133 |
+-----------+------------------------------------------------------------------+
9 rows in set (0.00 sec)

So now the history of the blocking transaction can be seen, and it is possible to determine why the locking issue occurs.

Note

The history also includes some queries executed before the transaction started. These are not related to the locking issue.
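
If the history query returns no rows, the consumer may have been disabled; its status can be checked with something like the following against the standard setup_consumers table:

Connection 3> SELECT NAME, ENABLED
                FROM performance_schema.setup_consumers
               WHERE NAME LIKE 'events_statements_history%';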

If transaction monitoring is also enabled (only available in MySQL 5.7 and later), it is possible to get more information about the transaction and to automatically limit the history query to the current transaction. Transaction monitoring is not enabled by default. To enable it, use:

mysql> UPDATE performance_schema.setup_consumers
          SET ENABLED = 'YES'
        WHERE NAME = 'events_transactions_current';
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> UPDATE performance_schema.setup_instruments
          SET ENABLED = 'YES'
        WHERE NAME = 'transaction';
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

Information

This must be done before either of the transactions is started. Only transactions started after transaction monitoring has been enabled will be instrumented.
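
To verify that both settings have taken effect, you can, for example, check the consumer and the instrument directly (a small sketch):

mysql> SELECT NAME, ENABLED
         FROM performance_schema.setup_consumers
        WHERE NAME = 'events_transactions_current';

mysql> SELECT NAME, ENABLED, TIMED
         FROM performance_schema.setup_instruments
        WHERE NAME = 'transaction';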

If the above was enabled before the blocking transaction started, you can get more details about the blocking transaction as:

Connection 3> SELECT *
                FROM performance_schema.events_transactions_current
               WHERE THREAD_ID = 28\G
*************************** 1. row ***************************
                      THREAD_ID: 28
                       EVENT_ID: 12
                   END_EVENT_ID: NULL
                     EVENT_NAME: transaction
                          STATE: ACTIVE
                         TRX_ID: NULL
                           GTID: AUTOMATIC
                  XID_FORMAT_ID: NULL
                      XID_GTRID: NULL
                      XID_BQUAL: NULL
                       XA_STATE: NULL
                         SOURCE: transaction.cc:209
                    TIMER_START: NULL
                      TIMER_END: NULL
                     TIMER_WAIT: NULL
                    ACCESS_MODE: READ WRITE
                ISOLATION_LEVEL: REPEATABLE READ
                     AUTOCOMMIT: NO
           NUMBER_OF_SAVEPOINTS: 0
NUMBER_OF_ROLLBACK_TO_SAVEPOINT: 0
    NUMBER_OF_RELEASE_SAVEPOINT: 0
          OBJECT_INSTANCE_BEGIN: NULL
               NESTING_EVENT_ID: 11
             NESTING_EVENT_TYPE: STATEMENT
1 row in set (0.00 sec)

And to get the statement history of the transaction:

Connection 3> SELECT t.THREAD_ID, s.SQL_TEXT
                FROM performance_schema.events_transactions_current t
                     INNER JOIN performance_schema.events_statements_history s
                             ON s.THREAD_ID = t.THREAD_ID
                            AND s.NESTING_EVENT_ID = t.EVENT_ID
               WHERE t.THREAD_ID = 28
               ORDER BY s.EVENT_ID;
+-----------+------------------------------------------------------------------+
| THREAD_ID | SQL_TEXT                                                         |
+-----------+------------------------------------------------------------------+
|        28 | UPDATE world.City SET Population = Population + 1 WHERE ID = 130 |
|        28 | UPDATE world.City SET Population = Population + 1 WHERE ID = 131 |
|        28 | UPDATE world.City SET Population = Population + 1 WHERE ID = 132 |
|        28 | UPDATE world.City SET Population = Population + 1 WHERE ID = 133 |
+-----------+------------------------------------------------------------------+
4 rows in set (0.00 sec)

Working Around MySQL Cluster Push Down Limitations Using Subqueries

This post was originally published on the MySQL Support Team Blog at https://blogs.oracle.com/mysqlsupport/entry/working_around_mysql_cluster_push on 5 August 2016.

I worked on an issue recently where a query was too slow when executed in MySQL Cluster. The issue was that Cluster has some restrictions when it comes to pushing down conditions.

As an example of this, consider the following query using the employees sample database. The query takes a look at the average salary based on how many years the employee has been with the company. As the latest hire date in the database is in January 2000, the query uses 1 February 2000 as the reference date.

Initially the query performs like this (the performance is with two data nodes and all nodes in the same virtual machine on a laptop, so the timings are not necessarily representative of a production system, though the improvements should be repeatable):

mysql> SELECT FLOOR(DATEDIFF('2000-02-01', hire_date)/365) AS LengthOfService,
              COUNT(DISTINCT employees.emp_no) AS NoEmployees, AVG(salary) AS AvgSalary
         FROM salaries
              STRAIGHT_JOIN employees ON employees.emp_no = salaries.emp_no
        WHERE salaries.from_date <= '2000-02-01' AND salaries.to_date >= '2000-02-01'
        GROUP BY LengthOfService;
+-----------------+-------------+------------+
| LengthOfService | NoEmployees | AvgSalary  |
+-----------------+-------------+------------+
|               0 |        1230 | 53051.4211 |
|               1 |        3667 | 54488.7478 |
|               2 |        5893 | 55670.2121 |
|               3 |        8305 | 57162.7269 |
|               4 |       10647 | 58557.9498 |
|               5 |       12804 | 60132.0652 |
|               6 |       15267 | 61769.0817 |
|               7 |       17540 | 62992.4075 |
|               8 |       19290 | 64468.6666 |
|               9 |       21737 | 66047.8462 |
|              10 |       24056 | 67671.1557 |
|              11 |       26488 | 68748.2062 |
|              12 |       28294 | 70409.9716 |
|              13 |       30249 | 72006.3509 |
|              14 |       32572 | 73478.7101 |
|              15 |         297 | 74331.7798 |
+-----------------+-------------+------------+
16 rows in set (23.89 sec)

The straight join is needed as it gives better performance than leaving the join order up to the optimizer.

The schema for the two tables in use is:

mysql> SHOW CREATE TABLE employees\G
*************************** 1. row ***************************
       Table: employees
Create Table: CREATE TABLE `employees` (
  `emp_no` int(11) NOT NULL,
  `birth_date` date NOT NULL,
  `first_name` varchar(14) NOT NULL,
  `last_name` varchar(16) NOT NULL,
  `gender` enum('M','F') NOT NULL,
  `hire_date` date NOT NULL,
  PRIMARY KEY (`emp_no`)
) ENGINE=ndbcluster DEFAULT CHARSET=latin1
1 row in set (0.01 sec)

mysql> SHOW CREATE TABLE salaries\G
*************************** 1. row ***************************
       Table: salaries
Create Table: CREATE TABLE `salaries` (
  `emp_no` int(11) NOT NULL,
  `salary` int(11) NOT NULL,
  `from_date` date NOT NULL,
  `to_date` date NOT NULL,
  PRIMARY KEY (`emp_no`,`from_date`),
  KEY `emp_no` (`emp_no`),
  CONSTRAINT `FK_130_154` FOREIGN KEY (`emp_no`) REFERENCES `employees` (`emp_no`) ON DELETE CASCADE ON UPDATE NO ACTION
) ENGINE=ndbcluster DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

Why this poor performance? Looking at the EXPLAIN plan is the first step:

mysql> EXPLAIN EXTENDED SELECT FLOOR(DATEDIFF('2000-02-01', hire_date)/365) AS LengthOfService, COUNT(DISTINCT employees.emp_no) AS NoEmployees, AVG(salary) AS AvgSalary FROM salaries STRAIGHT_JOIN employees ON employees.emp_no = salaries.emp_no WHERE salaries.from_date <= '2000-02-01' AND salaries.to_date >= '2000-02-01' GROUP BY LengthOfService\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: salaries
         type: ALL
possible_keys: PRIMARY,emp_no
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 2844047
     filtered: 100.00
        Extra: Using where with pushed condition ((`employees`.`salaries`.`from_date` <= '2000-02-01') and (`employees`.`salaries`.`to_date` >= '2000-02-01')); Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: employees
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: employees.salaries.emp_no
         rows: 1
     filtered: 100.00
        Extra: NULL
2 rows in set, 3 warnings (0.00 sec)

The EXPLAIN plan itself does not look bad – the index usage is as expected. However, note the 3 warnings – one is the usual rewritten query after the application of rewriting and optimizer rules, but the other two give more information about why the performance is not what would be expected:

mysql> SHOW WARNINGS\G
*************************** 1. row ***************************
  Level: Note
   Code: 1003
Message: Table 'salaries' is not pushable: GROUP BY cannot be done using index on grouped columns.
*************************** 2. row ***************************
  Level: Note
   Code: 1003
Message: Table 'employees' is not pushable: GROUP BY cannot be done using index on grouped columns.
*************************** 3. row ***************************
...

Here it can be seen that the tables are not pushable, because they are involved in the GROUP BY.

This gave me an idea. A couple of years ago, I wrote a post on the MySQL Server Blog about Better Performance for JOINs Not Using Indexes. These materialized temporary tables can also include auto keys that can improve the join performance significantly. What if a similar tactic is used with the above query?

Changing the join on the employees table to join a subquery:

mysql> SELECT FLOOR(DATEDIFF('2000-02-01', hire_date)/365) AS LengthOfService,
              COUNT(DISTINCT employees.emp_no) AS NoEmployees, AVG(salary) AS AvgSalary
         FROM salaries
              STRAIGHT_JOIN (SELECT * FROM employees) AS employees ON employees.emp_no = salaries.emp_no
        WHERE salaries.from_date <= '2000-02-01' AND salaries.to_date >= '2000-02-01'
        GROUP BY LengthOfService;
...
16 rows in set (6.64 sec)

That is more than a factor of 3 improvement.

The new EXPLAIN plan is:

*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: salaries
         type: ALL
possible_keys: PRIMARY,emp_no
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 2844047
        Extra: Using where with pushed condition; Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: <derived2>
         type: ref
possible_keys: <auto_key0>
          key: <auto_key0>
      key_len: 4
          ref: employees.salaries.emp_no
         rows: 10
        Extra: NULL
*************************** 3. row ***************************
           id: 2
  select_type: DERIVED
        table: employees
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 300024
        Extra: NULL
3 rows in set (0.01 sec)

Note how <auto_key0> is used for the join on the derived table. This is what helps make the rewrite work.

Actually we can do a bit better. The above subquery selects all columns from the employees table even though only the emp_no and hire_date columns are used. So the next rewrite is:

mysql> SELECT FLOOR(DATEDIFF('2000-02-01', hire_date)/365) AS LengthOfService,
              COUNT(DISTINCT employees.emp_no) AS NoEmployees, AVG(salary) AS AvgSalary
         FROM salaries
              STRAIGHT_JOIN (SELECT emp_no, hire_date FROM employees) AS employees ON employees.emp_no = salaries.emp_no
        WHERE salaries.from_date <= '2000-02-01' AND salaries.to_date >= '2000-02-01'
        GROUP BY LengthOfService;
...
16 rows in set (5.74 sec)

The improvement from only including the columns needed will vary. For example, if the extra data causes the internal temporary table to be converted into an on-disk table, or prevents a covering index from being used, it becomes more important to choose only the columns needed.

A final slight improvement can be gained by also converting the salaries table into a subquery:

mysql> SELECT FLOOR(DATEDIFF('2000-02-01', hire_date)/365) AS LengthOfService,
              COUNT(DISTINCT employees.emp_no) AS NoEmployees, AVG(salary) AS AvgSalary
         FROM (SELECT emp_no, salary
                 FROM salaries
                WHERE salaries.from_date <= '2000-02-01' AND salaries.to_date >= '2000-02-01'
              ) AS salaries
              STRAIGHT_JOIN (SELECT emp_no, hire_date FROM employees) AS employees ON employees.emp_no = salaries.emp_no
        GROUP BY LengthOfService;
...
16 rows in set (5.36 sec)

The total speed up is from a little under 24 seconds to a little over 5 seconds, or more than a factor of 4.

Note: the above rewrite will not work out of the box in MySQL Cluster 7.5. The reason is that it is based on MySQL Server 5.7, where the optimizer can perform more advanced rewrites than was possible in MySQL 5.6. So by default the above subqueries will be rewritten to normal joins, and you end up with the original query again. To use the subquery tactic in Cluster 7.5, it is first necessary to disable the optimizer switch that allows the optimizer to rewrite the subqueries to normal joins:

mysql> SET optimizer_switch='derived_merge=off';
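
Note that SET optimizer_switch without the GLOBAL keyword only changes the flag for the current session. Once the rewritten query has been run, the flag can, for example, be restored to its default:

mysql> SET optimizer_switch='derived_merge=default';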