Locking SELECT with CREATE TABLE, INSERT INTO, and User Variables

Locking is an important concept in databases. They help regulate access to the data, so your SELECT queries return consistent results and DML and DDL statements leave the data and schema in a consistent state. For the data, there are four different transaction isolation levels that influence which locks are taken. The most two commonly used isolation levels are REPEATABLE READ (the default in InnoDB) and READ COMMITTED (the default in some other databases). Both of those are said to provide non-locking reads, but there is a little more to it than that.

Selecting into a user variable causing a lock wait timeout.
Selecting into a user variable causing a lock wait timeout.

One case where reads are always locking is when you explicitly requests locks by adding the FOR SHARE or FOR UPDATE modifiers. However there are also cases where SELECT statements becomes locking due to the way the result of the statement is used. Two such cases are CREATE TABLE ... SELECT (INSERT INTO ... SELECT behaves the same way) and assigning the result of the statement to a user variable. Those two cases are the topic of this blog.

Tip

To make it easier to investigate the locks taken, a SLEEP(0.01) is inserted into the WHERE clause of the statements. The SLEEP() function returns 0 on success, so WHERE SLEEP(0.01) = 0 adds a 0.01 delay per row without changing the result of the query. The examples use the world database that can be downloaded from MySQL's page for other documentation.

CREATE TABLE .. SELECT / INSERT INTO … SELECT

First, you can investigate the locks taken by a CREATE TABLE ... SELECT and INSERT INTO ... SELECT statements. Since these two behaves the same for this example, only the CREATE TABLE version will be shown. The default transaction isolation level of REPEATABLE READ will be used:

Connection 1> SELECT PS_CURRENT_THREAD_ID();
+------------------------+
| PS_CURRENT_THREAD_ID() |
+------------------------+
|                     56 |
+------------------------+
1 row in set (0.0004 sec)

Connection 1> CREATE TABLE world._tmp_city
              SELECT * FROM world.city WHERE SLEEP(0.01) = 0;

Because of the SLEEP(0.01) in the WHERE clause, the statement will take around one minute to execute. The easiest way in MySQL 8.0 to determine which data locks are requested by a statement is to use the performance_schema.data_locks table. In this case, the locks on the city table is those of interest, and you get get those like (notice that the thread id found above is used to only get the locks for that one connection):

Connection 2> SELECT INDEX_NAME, LOCK_TYPE, LOCK_MODE, LOCK_STATUS, COUNT(*)
                FROM performance_schema.data_locks
               WHERE THREAD_ID = 56
                     AND OBJECT_SCHEMA = 'world'
                     AND OBJECT_NAME = 'city'
               GROUP BY INDEX_NAME, LOCK_TYPE, LOCK_MODE, LOCK_STATUS;
+------------+-----------+-----------+-------------+----------+
| INDEX_NAME | LOCK_TYPE | LOCK_MODE | LOCK_STATUS | COUNT(*) |
+------------+-----------+-----------+-------------+----------+
| NULL       | TABLE     | IS        | GRANTED     |        1 |
| PRIMARY    | RECORD    | S         | GRANTED     |      525 |
+------------+-----------+-----------+-------------+----------+
2 rows in set (0.0042 sec)

The number of locks depend on how far into the execution the CREATE TABLE ... SELECT statement is. The outpuf rom the data_locks table shows there is an shared intention lock on the table and 525 shared record locks on the primary key (in InnoDB that is the row itself). At the end of the statement – just before the statement completes, all rows will be locked. What does that mean? If you try to update or delete one of the rows that are locked in a different connection, that statement will block and possibly encounter a lock wait timeout:

Connection 2> SET SESSION innodb_lock_wait_timeout = 2;
Query OK, 0 rows affected (0.0006 sec)

mysql> UPDATE world.city
          SET Population = Population + 1
        WHERE ID = 1;
ERROR: 1205: Lock wait timeout exceeded; try restarting transaction

In this case, the timeout is set to happen after two seconds to avoid having to wait for the error to occur.

So, this was with the REPEATABLE READ transaction isolation level. Is there a difference in READ COMMITTED? Yes, there is. In that case the CREATE TABLE ... SELECT and INSERT INTO ... SELECT statements do not take any locks on the city table.

Tip

The READ COMMITTED transaction isolation level takes fewer locks than REPEATABLE READ. This includes the case of CREATE TABLE ... SELECT and INSERT INTO ... SELECFT where the SELECT part is non-locking.

MySQL Concurrency

Book

If you want to learn more about locks and transactions in MySQL, then I have written MySQL Concurrency published by Apress. The book covers monitoring of locks and transactions, the various lock levels and types, how lock conflicts work, and six case studies investigating and reducing lock situations.

The book is available from Apress (print and DRM free ePub+PDF), Amazon (print and Kindle), Barnes & Noble (print), and others.

User Variables

The other use case, I want to discuss is storing the result of a SELECT statement in a user variable. This has an unexpected twist. As a start, consider finding the number of rows in the table and store the result in the @my_var user variable using the default REPEATABLE READ transaction isolation level:

Connection 1> SELECT PS_CURRENT_THREAD_ID();
+------------------------+
| PS_CURRENT_THREAD_ID() |
+------------------------+
|                     56 |
+------------------------+
1 row in set (0.0005 sec)

Connection 1> SET SESSION transaction_isolation = 'REPEATABLE-READ';
Query OK, 0 rows affected (0.0004 sec)

Connection 1> SET @my_var = (SELECT COUNT(*)
                               FROM world.city
                              WHERE SLEEP(0.01) = 0);

Selecting from the performance_schema.data_locks while the query is executing, gives a result similar to before:

Connection 2> SELECT INDEX_NAME, LOCK_TYPE, LOCK_MODE, LOCK_STATUS, COUNT(*)
                FROM performance_schema.data_locks
               WHERE THREAD_ID = 56
                     AND OBJECT_SCHEMA = 'world'
                     AND OBJECT_NAME = 'city'
               GROUP BY INDEX_NAME, LOCK_TYPE, LOCK_MODE, LOCK_STATUS;
+-------------+-----------+-----------+-------------+----------+
| INDEX_NAME  | LOCK_TYPE | LOCK_MODE | LOCK_STATUS | COUNT(*) |
+-------------+-----------+-----------+-------------+----------+
| NULL        | TABLE     | IS        | GRANTED     |        1 |
| CountryCode | RECORD    | S         | GRANTED     |      248 |
+-------------+-----------+-----------+-------------+----------+
2 rows in set (0.0027 sec)

The optimiser in this case chooses to use the CountryCode index for counting the number of rows, but otherwise the locking situation is the same as before. It does however mean that you can update the row provided you do not use the CountryCode index to access the row or update the CountryCode index, so in that sense it is an improvement. (The optimiser will choose the execution plan it thinks is most efficient – has the lowest cost – so you mileage may vary.)

However, the big surprise is when you change to the READ COMMITTED transaction isolation level:

Connection 1> SET SESSION transaction_isolation = 'READ-COMMITTED';
Query OK, 0 rows affected (0.0005 sec)

Connection 1> SET @my_var = (SELECT COUNT(*)
                               FROM world.city
                              WHERE SLEEP(0.01) = 0);

Unlike before, assigning the value to a user variable still takes locks in the READ COMMITTED transaction isolation level:

Connection 2> SELECT INDEX_NAME, LOCK_TYPE, LOCK_MODE, LOCK_STATUS, COUNT(*)
                FROM performance_schema.data_locks
               WHERE THREAD_ID = 56
                     AND OBJECT_SCHEMA = 'world'
                     AND OBJECT_NAME = 'city'
               GROUP BY INDEX_NAME, LOCK_TYPE, LOCK_MODE, LOCK_STATUS;
+-------------+-----------+---------------+-------------+----------+
| INDEX_NAME  | LOCK_TYPE | LOCK_MODE     | LOCK_STATUS | COUNT(*) |
+-------------+-----------+---------------+-------------+----------+
| NULL        | TABLE     | IS            | GRANTED     |        1 |
| CountryCode | RECORD    | S,REC_NOT_GAP | GRANTED     |      271 |
+-------------+-----------+---------------+-------------+----------+
2 rows in set (0.0023 sec)

The locking is slightly reduced compared to REPEATABLE READ (no gap locks) but this is still enough to cause serious locking problems in a production system.

Warning

Assigning the result of a SELECT statement to a user variable can even in the READ COMMITTED transaction isolation level cause severe locking issues. The locks are held till the end of the transaction.

Why does it takes locks even in READ COMMITTED when you assign the result to a user variable? I have not been able to find any documentation of it, but by assigning the value to a variable, you can re-use the value including in statements that modify the data, so for data consistency it does make sense that the locks last till the end of the transaction.

I have worked with MySQL databases since 2006 both as an SQL developer, a database administrator, and for more than eight years as part of the Oracle MySQL Support team. I have spoken at MySQL Connect and Oracle OpenWorld on several occasions. I have contributed to the sys schema and four Oracle Certified Professional (OCP) exams for MySQL 5.6 to 8.0. I have written four books, all published at Apress.

6 Comments on “Locking SELECT with CREATE TABLE, INSERT INTO, and User Variables

  1. Hello, thanks for bloging about locking in InnoDB 🙂

    I’d like to clarify some points from your article.

    > (notice that the thread id found above is used to only get the locks for that one connection)

    The `performance_schema.data_locks.THREAD_ID` column is not intended to be used for searching for locks *belonging* to a given transaction – it’s the thread which has *created* the lock, which could’ve happen on behalf of another transaction.
    To identify rows *belonging* to transaction x, one should rather use `performance_schema.data_locks.ENGINE_TRANSACTION_ID = x`.
    See https://dev.mysql.com/doc/mysql-perfschema-excerpt/8.0/en/performance-schema-data-locks-table.html for reference.

    It’s often the case that a transaction creates a lock for itself, and then the two columns agree with each other.

    But it’s also possible that a lock gets created due to implicit-to-explicit lock conversion, which happens from a thread which is interested in the state of the lock queue for a given resource and thus wants to make everything explicit, and then this observer’s id will be put in the `performance_schema.data_locks.THREAD_ID` column because technically it’s this particular event in this particular thread’s timeline which *created* the lock object.
    The THREAD_ID column is meant to be used in combination with EVENT_ID column, to locate relevant events which lead to the *creation* of the lock. See:
    https://dev.mysql.com/doc/mysql-perfschema-excerpt/8.0/en/performance-schema-obtaining-parent-events.html

    To obtain current transaction’s id, you first need to ensure it is started (which is tricky, because simply doing `BEGIN` is not enough – for performance reasons, InnoDB lazily begins the transaction when proccessing the first real statement, such as `SELECT`, `UPDATE`, `DELETE`,..) and then can query `information_schema.innodb_trx` for `trx_mysql_thread_id = connection_id()`.
    https://dev.mysql.com/doc/refman/8.0/en/information-schema-innodb-trx-table.html

    > The optimiser in this case chooses to use the CountryCode index for counting the number of rows, but otherwise the locking situation is the same as before. It does however mean that you can update the row provided you do not use the CountryCode index to access the row, so in that sense it is an improvement.

    In this particular scenario you’ve shown, in which the `SELECT` query scans all rows, and thus also locks all records in the `CountryCode` index, it may indeed be possible to update rows in some ways, but not in others. Basically, `UPDATE` operations which modify the columns which do not affect the columns in `CountryCode` and the `PRIMARY KEY`, (say: `UPDATE world.city SET Population = Population+1 WHERE ID=1`) should be fine. But, if you try any `UPDATE` operation which changes the mapping from `CountryCode`’s columns to `PRIMARY KEY` (say: `UPDATE world.city SET CountryCode = “Whatever” WHERE ID=1`), then even if the PRIMARY index was used to access the row, the `CountryCode` index still needs updating too, and this will be blocked by the `SELECT`’s lock. That is finding the record to modify can be handled by the PRIMARY index, but the change itself may require modifying several indexes and then the records in them must be properly locked. Obviously, `INSERT` and and `DELETE` operations also affect this mapping, thus might also block.

    > Why does it takes locks even in READ COMMITTED when you assign the result to a user variable? I have not been able to find any documentation of it, but by assigning the value to a variable, you can re-use the value including in statements that modify the data, so for data consistency it does make sense that the locks last till the end of the transaction.

    The `SET @var = (SELECT…)` syntax causes the locking.
    You can use a different syntax, which gives you more control over locking behaviour.
    When you use
    “`
    SELECT COUNT(*) FROM world.city WHERE SLEEP(0.01) = 0 INTO @my_var;
    “`
    in READ COMMITTED, then no locks are taken.
    If you use:
    “`
    SELECT COUNT(*) FROM world.city WHERE SLEEP(0.01) = 0 INTO @my_var FOR UPDATE;
    “`
    then the locks are taken.
    See https://dev.mysql.com/doc/refman/8.0/en/select-into.html for reference.
    Also, you might be intersted in discussion https://bugs.mysql.com/bug.php?id=67452 from 2012.

    Best regards!

    • Hi Jakub,

      Thanks for the detailed comment.

      The `performance_schema.data_locks.THREAD_ID` column is not intended to be used for searching for locks *belonging* to a given transaction – it’s the thread which has *created* the lock, which could’ve happen on behalf of another transaction.

      Yes the blog is a bit simplified in that respect.

      In this particular scenario you’ve shown, in which the `SELECT` query scans all rows, and thus also locks all records in the `CountryCode` index …

      Yes, I should have been explicit that it is not just using the CountryCode index to access it but also updating it.

      You can use a different syntax, which gives you more control over locking behaviour.

      Thanks. I didn’t test that. Good to know.

Leave a Reply

Your email address will not be published.

*

This site uses Akismet to reduce spam. Learn how your comment data is processed.