Locking SELECT with CREATE TABLE, INSERT INTO, and User Variables
Locking is an important concept in databases. Locks help regulate access to the data, so your SELECT queries return consistent results and DML and DDL statements leave the data and schema in a consistent state. For the data, there are four different transaction isolation levels that influence which locks are taken. The two most commonly used isolation levels are REPEATABLE READ (the default in InnoDB) and READ COMMITTED (the default in some other databases). Both of those are said to provide non-locking reads, but there is a little more to it than that.
One case where reads are always locking is when you explicitly request locks by adding the FOR SHARE or FOR UPDATE modifier. However, there are also cases where SELECT statements become locking due to the way the result of the statement is used. Two such cases are CREATE TABLE ... SELECT (INSERT INTO ... SELECT behaves the same way) and assigning the result of the statement to a user variable. Those two cases are the topic of this blog.
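To make the explicit locking reads concrete, here is a minimal sketch (the table and row id are just examples from the world sample database used below):

Connection 1> START TRANSACTION;
Connection 1> -- Shared lock: other transactions can still read the row
Connection 1> SELECT * FROM world.city WHERE ID = 130 FOR SHARE;
Connection 1> -- Exclusive lock: blocks concurrent FOR SHARE/FOR UPDATE readers and writers
Connection 1> SELECT * FROM world.city WHERE ID = 130 FOR UPDATE;
Connection 1> COMMIT;  -- releases the locks

Either modifier keeps its record locks until the transaction commits or rolls back.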
CREATE TABLE ... SELECT / INSERT INTO ... SELECT
First, you can investigate the locks taken by CREATE TABLE ... SELECT and INSERT INTO ... SELECT statements. Since these two behave the same for this example, only the CREATE TABLE version will be shown. The default transaction isolation level of REPEATABLE READ will be used:
Connection 1> SELECT PS_CURRENT_THREAD_ID();
+------------------------+
| PS_CURRENT_THREAD_ID() |
+------------------------+
| 56 |
+------------------------+
1 row in set (0.0004 sec)
Connection 1> CREATE TABLE world._tmp_city
SELECT * FROM world.city WHERE SLEEP(0.01) = 0;
Because of the SLEEP(0.01) in the WHERE clause, the statement will take around one minute to execute. The easiest way in MySQL 8.0 to determine which data locks are requested by a statement is to use the performance_schema.data_locks table. In this case, the locks on the city table are those of interest, and you can get those as follows (notice that the thread id found above is used to only get the locks for that one connection):
Connection 2> SELECT INDEX_NAME, LOCK_TYPE, LOCK_MODE, LOCK_STATUS, COUNT(*)
FROM performance_schema.data_locks
WHERE THREAD_ID = 56
AND OBJECT_SCHEMA = 'world'
AND OBJECT_NAME = 'city'
GROUP BY INDEX_NAME, LOCK_TYPE, LOCK_MODE, LOCK_STATUS;
+------------+-----------+-----------+-------------+----------+
| INDEX_NAME | LOCK_TYPE | LOCK_MODE | LOCK_STATUS | COUNT(*) |
+------------+-----------+-----------+-------------+----------+
| NULL | TABLE | IS | GRANTED | 1 |
| PRIMARY | RECORD | S | GRANTED | 525 |
+------------+-----------+-----------+-------------+----------+
2 rows in set (0.0042 sec)
The number of locks depends on how far into the execution the CREATE TABLE ... SELECT statement is. The output from the data_locks table shows there is a shared intention lock on the table and 525 shared record locks on the primary key (which in InnoDB is the row itself). Just before the statement completes, all rows will be locked. What does that mean? If you try to update or delete one of the locked rows from a different connection, that statement will block and possibly encounter a lock wait timeout:
Connection 2> SET SESSION innodb_lock_wait_timeout = 2;
Query OK, 0 rows affected (0.0006 sec)
Connection 2> UPDATE world.city
SET Population = Population + 1
WHERE ID = 1;
ERROR: 1205: Lock wait timeout exceeded; try restarting transaction
In this case, the timeout is set to happen after two seconds to avoid having to wait for the error to occur.
So, this was with the REPEATABLE READ transaction isolation level. Is there a difference in READ COMMITTED? Yes, there is. In that case, the CREATE TABLE ... SELECT and INSERT INTO ... SELECT statements do not take any locks on the city table.
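You can verify this yourself with the same data_locks query. The sketch below assumes the same thread id (56) as above and a fresh target table name (_tmp_city2 is hypothetical); the output is illustrative:

Connection 1> SET SESSION transaction_isolation = 'READ-COMMITTED';
Connection 1> CREATE TABLE world._tmp_city2
              SELECT * FROM world.city WHERE SLEEP(0.01) = 0;

Connection 2> SELECT INDEX_NAME, LOCK_TYPE, LOCK_MODE, LOCK_STATUS
              FROM performance_schema.data_locks
              WHERE THREAD_ID = 56
                AND OBJECT_SCHEMA = 'world'
                AND OBJECT_NAME = 'city';
Empty set

While the statement is executing, no record locks show up for the city table.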
Book
If you want to learn more about locks and transactions in MySQL, then I have written MySQL Concurrency published by Apress. The book covers monitoring of locks and transactions, the various lock levels and types, how lock conflicts work, and six case studies investigating and reducing lock situations.
The book is available from Apress (print and DRM free ePub+PDF), Amazon (print and Kindle), Barnes & Noble (print), and others.
User Variables
The other use case I want to discuss is storing the result of a SELECT statement in a user variable. This has an unexpected twist. As a start, consider finding the number of rows in the table and storing the result in the @my_var user variable using the default REPEATABLE READ transaction isolation level:
Connection 1> SELECT PS_CURRENT_THREAD_ID();
+------------------------+
| PS_CURRENT_THREAD_ID() |
+------------------------+
| 56 |
+------------------------+
1 row in set (0.0005 sec)
Connection 1> SET SESSION transaction_isolation = 'REPEATABLE-READ';
Query OK, 0 rows affected (0.0004 sec)
Connection 1> SET @my_var = (SELECT COUNT(*)
FROM world.city
WHERE SLEEP(0.01) = 0);
Selecting from the performance_schema.data_locks table while the query is executing gives a result similar to before:
Connection 2> SELECT INDEX_NAME, LOCK_TYPE, LOCK_MODE, LOCK_STATUS, COUNT(*)
FROM performance_schema.data_locks
WHERE THREAD_ID = 56
AND OBJECT_SCHEMA = 'world'
AND OBJECT_NAME = 'city'
GROUP BY INDEX_NAME, LOCK_TYPE, LOCK_MODE, LOCK_STATUS;
+-------------+-----------+-----------+-------------+----------+
| INDEX_NAME | LOCK_TYPE | LOCK_MODE | LOCK_STATUS | COUNT(*) |
+-------------+-----------+-----------+-------------+----------+
| NULL | TABLE | IS | GRANTED | 1 |
| CountryCode | RECORD | S | GRANTED | 248 |
+-------------+-----------+-----------+-------------+----------+
2 rows in set (0.0027 sec)
The optimiser in this case chooses to use the CountryCode index for counting the number of rows, but otherwise the locking situation is the same as before. It does, however, mean that you can update a row provided you neither use the CountryCode index to access the row nor update a column covered by the CountryCode index, so in that sense it is an improvement. (The optimiser will choose the execution plan it thinks is most efficient, that is, the one with the lowest cost, so your mileage may vary.)
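If you want to confirm which index the optimiser picks for the subquery, EXPLAIN can be used (a sketch; the exact plan depends on your MySQL version and index statistics):

Connection 1> EXPLAIN SELECT COUNT(*) FROM world.city\G

The key column of the output shows the chosen index, for example CountryCode.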
However, the big surprise is when you change to the READ COMMITTED
transaction isolation level:
Connection 1> SET SESSION transaction_isolation = 'READ-COMMITTED';
Query OK, 0 rows affected (0.0005 sec)
Connection 1> SET @my_var = (SELECT COUNT(*)
FROM world.city
WHERE SLEEP(0.01) = 0);
Unlike before, assigning the value to a user variable still takes locks in the READ COMMITTED
transaction isolation level:
Connection 2> SELECT INDEX_NAME, LOCK_TYPE, LOCK_MODE, LOCK_STATUS, COUNT(*)
FROM performance_schema.data_locks
WHERE THREAD_ID = 56
AND OBJECT_SCHEMA = 'world'
AND OBJECT_NAME = 'city'
GROUP BY INDEX_NAME, LOCK_TYPE, LOCK_MODE, LOCK_STATUS;
+-------------+-----------+---------------+-------------+----------+
| INDEX_NAME | LOCK_TYPE | LOCK_MODE | LOCK_STATUS | COUNT(*) |
+-------------+-----------+---------------+-------------+----------+
| NULL | TABLE | IS | GRANTED | 1 |
| CountryCode | RECORD | S,REC_NOT_GAP | GRANTED | 271 |
+-------------+-----------+---------------+-------------+----------+
2 rows in set (0.0023 sec)
The locking is slightly reduced compared to REPEATABLE READ (no gap locks), but this is still enough to cause serious locking problems in a production system.
Why does it take locks even in READ COMMITTED when you assign the result to a user variable? I have not been able to find any documentation of it, but by assigning the value to a variable, you can reuse the value, including in statements that modify the data, so for data consistency it does make sense that the locks last until the end of the transaction.
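One way to see that the locks are indeed transaction-scoped (a sketch, assuming the same READ COMMITTED setup as above): disable autocommit so the transaction stays open after the assignment, then watch the locks disappear on commit:

Connection 1> SET autocommit = 0;
Connection 1> SET @my_var = (SELECT COUNT(*) FROM world.city);
-- Connection 2 still sees the S,REC_NOT_GAP record locks in data_locks ...
Connection 1> COMMIT;
-- ... and after the COMMIT, the data_locks rows for world.city are gone.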
Thank you for this good article briefing Oracle MySQL DBAs and developers.
Hello, thanks for blogging about locking in InnoDB 🙂
I’d like to clarify some points from your article.
> (notice that the thread id found above is used to only get the locks for that one connection)
The `performance_schema.data_locks.THREAD_ID` column is not intended to be used for searching for locks *belonging* to a given transaction – it is the thread which has *created* the lock, which could have happened on behalf of another transaction.
To identify locks *belonging* to transaction x, one should rather use `performance_schema.data_locks.ENGINE_TRANSACTION_ID = x`.
See https://dev.mysql.com/doc/mysql-perfschema-excerpt/8.0/en/performance-schema-data-locks-table.html for reference.
It’s often the case that a transaction creates a lock for itself, and then the two columns agree with each other.
But it’s also possible that a lock gets created due to implicit-to-explicit lock conversion. That happens from a thread which is interested in the state of the lock queue for a given resource and thus wants to make everything explicit; this observer’s id will then be put in the `performance_schema.data_locks.THREAD_ID` column, because technically it is this particular event in this particular thread’s timeline which *created* the lock object.
The THREAD_ID column is meant to be used in combination with the EVENT_ID column, to locate the relevant events which led to the *creation* of the lock. See:
https://dev.mysql.com/doc/mysql-perfschema-excerpt/8.0/en/performance-schema-obtaining-parent-events.html
To obtain the current transaction’s id, you first need to ensure it is started (which is tricky, because simply doing `BEGIN` is not enough – for performance reasons, InnoDB lazily begins the transaction when processing the first real statement, such as `SELECT`, `UPDATE`, `DELETE`, ...) and then you can query `information_schema.innodb_trx` for `trx_mysql_thread_id = connection_id()`.
https://dev.mysql.com/doc/refman/8.0/en/information-schema-innodb-trx-table.html
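For example, the lookup could look like this (a sketch; the plain SELECT is only there to force InnoDB to actually start the transaction):

Connection 1> START TRANSACTION;
Connection 1> SELECT * FROM world.city WHERE ID = 1;
Connection 1> SELECT trx_id
              FROM information_schema.innodb_trx
              WHERE trx_mysql_thread_id = CONNECTION_ID();

The returned trx_id can then be matched against performance_schema.data_locks.ENGINE_TRANSACTION_ID.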
> The optimiser in this case chooses to use the CountryCode index for counting the number of rows, but otherwise the locking situation is the same as before. It does however mean that you can update the row provided you do not use the CountryCode index to access the row, so in that sense it is an improvement.
In this particular scenario you’ve shown, in which the `SELECT` query scans all rows, and thus also locks all records in the `CountryCode` index, it may indeed be possible to update rows in some ways, but not in others. Basically, `UPDATE` operations which do not affect the columns in `CountryCode` or the `PRIMARY KEY` (say: `UPDATE world.city SET Population = Population + 1 WHERE ID = 1`) should be fine. But if you try any `UPDATE` operation which changes the mapping from `CountryCode`’s columns to the `PRIMARY KEY` (say: `UPDATE world.city SET CountryCode = 'Whatever' WHERE ID = 1`), then even if the PRIMARY index was used to access the row, the `CountryCode` index still needs updating too, and this will be blocked by the `SELECT`’s lock. That is, finding the record to modify can be handled by the PRIMARY index, but the change itself may require modifying several indexes, and then the records in them must be properly locked. Obviously, `INSERT` and `DELETE` operations also affect this mapping, and thus might also block.
> Why does it takes locks even in READ COMMITTED when you assign the result to a user variable? I have not been able to find any documentation of it, but by assigning the value to a variable, you can re-use the value including in statements that modify the data, so for data consistency it does make sense that the locks last till the end of the transaction.
The `SET @var = (SELECT…)` syntax causes the locking.
You can use a different syntax, which gives you more control over locking behaviour.
When you use

```
SELECT COUNT(*) FROM world.city WHERE SLEEP(0.01) = 0 INTO @my_var;
```

in READ COMMITTED, then no locks are taken.
If you use:

```
SELECT COUNT(*) FROM world.city WHERE SLEEP(0.01) = 0 INTO @my_var FOR UPDATE;
```

then the locks are taken.
See https://dev.mysql.com/doc/refman/8.0/en/select-into.html for reference.
Also, you might be interested in the discussion at https://bugs.mysql.com/bug.php?id=67452 from 2012.
Best regards!
Hi Jakub,
Thanks for the detailed comment.
Yes, the blog is a bit simplified in that respect.
Yes, I should have been explicit that it is not just using the CountryCode index to access the row but also updating it.
Thanks. I didn’t test that. Good to know.
Hi! Thank you for the information, this article is very interesting.
good article
Thank you