MySQL 8.0.12: Instant ALTER TABLE

There are many nice changes included in the MySQL 8.0.12 release that were published a couple of days ago. One of the most exciting is the ability to make instant schema changes to tables. This blog will look into why I think that that is a stand-out change.

I will not go into details with the implementation other than noting that the new data dictionary in MySQL 8.0 has allowed for a nice implementation of the patch that was contributed by the Tencent Games DBA Team. If you are interested in learning more about the implementation, I will recommend you to read the blog by Bin Su (MySQL 8.0: InnoDB now supports Instant ADD COLUMN) and the worklog implementing the feature (WL#11250: Support Instant Add Column).

Contribution

Thanks to the Tencent Games DBA Team who contributed the patch for this feature.

Why Do We Need Instant ALTER TABLE?

As you may know, InnoDB has since version 5.6 supported online schema changes. So, a fair thought may be why that is no longer good enough. The thing is that while online schema changes are very nice and allows you to make changes to the data (DML statements) while the schema change is made, there are still some problems:

  • Online ALTER TABLE still requires a meta data lock for the duration of the operation. That is, it is not possible to make other schema changes until the ALTER TABLE has completed.
  • In replication setups, the SQL thread handling the online schema change will still block other changes coming through the replication channel from occurring. This means that an online schema change that takes an hour suddenly makes the replication lag an hour.
  • This is compounded in chained replication setups where the delay on the last instances in the topology is multiplied with the number of times the change has been replication. Suddenly this schema change that did not really affect the replication master becomes a major pain and delays the deployment of the application upgrade that relies on the schema changes.
  • Even though the change is online, it still is heavy on resources: disk for storing the extra copy of the table when the change cannot be made in-place, CPU and disk activity that may cause other queries to become slow, etc.
  • There is a limit to the amount of DML changes that can be made during the schema change. If you make too many changes, the schema change will fail.

So, in short, online schema changes are only the first stop. Instance schema changes is the future.

Which Changes Can Be Made Instantly?

While it would be great if all schema changes could be made instantly, unfortunately that is not the case. Just as not all schema changes can be made online, there are limitations to the new instant feature. In short, the changes that can be made instantly must only affect the metadata of the table. The metadata is stored in the data dictionary. The changes that can be made with the instant ALTER TABLE feature as per 8.0.12 are:

  • Adding a new column as the last column in the table.
  • Adding a generated virtual column.
  • Dropping a generated virtual column.
  • Setting a default value for an existing column.
  • Dropping the default value for an existing column.
  • Changing the list of values allowed for a column with the ENUM or SET data types. A requirement is that the storage size does not change for the column.
  • Change whether the index type is set explicitly for an existing index.

And who knows, maybe later the feature can be extended to cover more changes. There are also a few limitations that are good to be aware of:

  • The row format cannot be COMPRESSED.
  • The table cannot have a fulltext index.
  • Tables in the data dictionary cannot use the instant algorithm.
  • Temporary tables are not supported.

How to Ensure You are Using the Expected Algorithm?

One problem with schema changes is that here are different algorithms depending on the schema change. Currently there are three different algorithms:

  • INSTANT: the change completes very quickly (yes not quite instantly) as only the metadata in the data dictionary needs to be updated.
  • INPLACE: the changes are made within the existing table, i.e. a complete table copy is avoided.
  • COPY: the table is copied into a new copy with the new definition.

By default, MySQL chooses the algorithm doing the least work. That is, INSTANT if that is supported, otherwise INPLACE if that is supported, and finally COPY. Additionally, there is the concept of locking which can be set to either NONE, SHARED, or EXCLUSIVE.

So, how do you ensure you are not ending up with a copying ALTER TABLE taking exclusive locks when you thought the operation was going to be instant? The answer is to explicitly set the algorithm and lock type. That way, MySQL will throw an error if your schema change is not compatible with the requested algorithm. The same principle can be used to force a full rebuild of the table if you for example have a small table and don’t see a reason to worry about instantly added columns (see also more later).

I will give some examples of specifying the ALGORITHM and LOCK options to ALTER TABLE later. However, we first need an example table to play with.

Tip

Always specify the ALGORITHM and LOCK options explicitly to avoid unpleasant surprises.

Creating an Example Table

For the purpose of the upcoming examples of instant schema changes, it is necessary to have a table to play with. The table will be the testtbl table in the my_schema schema. The table can be created and populated with sample data by using MySQL Shell – this allows us to take advantage of the support for scripting. The table will have one million rows.

Note

To limit the amount of scrolling, the prompt in MySQL Shell has been changed to just show the mode (JS, PY, SQL).

JS> \connect root@localhost
Creating a session to 'root@localhost'
Please provide the password for 'root@localhost': ********
Save password for 'root@localhost'? [Y]es/[N]o/Ne[v]er (default No):
Fetching schema names for autocompletion... Press ^C to stop.
Your MySQL connection id is 17 (X protocol)
Server version: 8.0.12 MySQL Community Server - GPL
No default schema selected; type \use <schema> to set one.

JS> \sql
Switching to SQL mode... Commands end with ;

SQL> CREATE SCHEMA my_schema;
Query OK, 1 row affected (0.0509 sec)

SQL> \use my_schema
Default schema set to `my_schema`.
Fetching table and column names from `my_schema` for auto-completion... Press ^C to stop.

SQL> CREATE TABLE testtbl (
 ...   id int unsigned NOT NULL auto_increment,
 ...   val varchar(36) NOT NULL,
 ...   PRIMARY KEY (id)
 ... ) ENGINE=InnoDB;
Query OK, 0 rows affected (0.0497 sec)
SQL> \py
Switching to Python mode...

Py> \use my_schema
Default schema `my_schema` accessible through db.

Py> db.testtbl
<Table:testtbl>

Py> import uuid

Py> for i in range(1000):
...     session.start_transaction()
...     stmt = db.testtbl.insert("val")
...     for j in range(1000):
...         stmt = stmt.values(uuid.uuid1().hex)
...     stmt.execute()
...     session.commit()
...
Query OK, 0 rows affected (0.0029 sec)

Py> db.testtbl.select("COUNT(*)")
+----------+
| COUNT(*) |
+----------+
|  1000000 |
+----------+
1 row in set (0.2476 sec)

Note how this takes advantage of the feature in MySQL Shell where the db object in JavaScript and Python mode has the tables as properties. Thus it is possible to refer to the testtbl table in the my_schema schema as db.testtbl after using \use my_schema to read the database objects. The data is inserted in 1000 batches of 1000 rows using a double loop.

Now it is time to play with schema changes.

Schema Changes Examples

For comparison, let’s first add a column using the old algorithms. First, add the column val2 using the copying algorithm – note this is not supported online, so a shared lock is taken:

SQL> ALTER TABLE testtbl
 ...       ADD COLUMN val2 varchar(36) DEFAULT NULL,
 ...       ALGORITHM=COPY, LOCK=SHARED;
Query OK, 1000000 rows affected (5.3952 sec)

Note

The test is made on a laptop – the timings themselves are not relevant, but they can be used to compare the time taken for the three algorithms.

So that took around 5.4 seconds. That is not too bad, but we can do much better than that. Let’s try the in-place algorithm. Not only will that require less work (though still a fair bit), it is also possible to perform the change online.

SQL> ALTER TABLE testtbl
 ...       ADD COLUMN val3 varchar(36) DEFAULT NULL,
 ...       ALGORITHM=INPLACE, LOCK=NONE;
Query OK, 0 rows affected (1.7511 sec)

So, this is a factor three faster despite the table having an extra column compared to the original table. And it is possible to execute DML statements. That is a big win. But let’s move on to the finale: instant ALTER TABLE:

SQL> ALTER TABLE testtbl
 ...       ADD COLUMN val4 varchar(36) DEFAULT NULL,
 ...       ALGORITHM=INSTANT;
Query OK, 0 rows affected (0.0490 sec)

Wow – that’s less than 5/100 of a second – that’s instant enough for me. Note hos the LOCK option is not specified here. The instant algorithm is always as lock free as possible, so the LOCK option is not allowed.

There is a little more to it than this. As mentioned by Bin Su, there are some trade offs such as the row size if not checked upfront. So, in some cases it can be useful to know whether a table contains instantly added columns. This can be seen through the innodb_tables view in the Information Schema on the data dictionary:

Py> schema = session.get_schema("information_schema")
Query OK, 1000 items affected (0.0091 sec)

Py> inno_tables = schema.get_table("INNODB_TABLES")
Query OK, 0 rows affected (0.0002 sec)

Py> inno_tables.select("NAME", "INSTANT_COLS").where("INSTANT_COLS > 0")
+-------------------+--------------+
| NAME              | INSTANT_COLS |
+-------------------+--------------+
| my_schema/testtbl |            4 |
+-------------------+--------------+
1 row in set (0.0405 sec)

Now what is that? 4 columns? But we only added one of the columns using the instant algorithm. What the INSTANT_COLS column shows is how many columns existed before the first instant column was added. In the example the columns id, val, val2, and val3 existed, then val4 was added using the instant algorithm. For tables that have never had any columns added instantly, the value of INSTANT_COLS is 0.

Want to Read More?

This blog has just been an introduction to the new MySQL 8.0.12 feature of instant schema changes. I will recommend you to read Bin Xu’s blog as well as the documentation in the MySQL reference manual to fully understand the feature:

Conclusion

The new feature allowing instant schema changes is a great way to avoid time consuming operations. Particularly in replication setups where the replicated statement will block while it applies, the different between an online and an instant schema change is like night and day.

MySQL 8.0.12 can be downloaded from MySQL Community Downloads or you can install it through one of the “native installers” such as MySQL Installer for Microsoft Windows or one of our Linux repositories (Yum, APT, SUSE). If you are a customer and need the commercial version of MySQL Server, I will recommend you to download it from Patches & Updates in My Oracle Support (MOS); there is also a 30 days trial version available from Oracle Software Delivery Cloud.

Please do not hesitate to provide feedback. Bugs and feature requests can be logged in the MySQL Bugs database.

I have worked with MySQL databases since 2006 both as an SQL developer, a database administrator, and for more than eight years as part of the Oracle MySQL Support team. I have spoken at MySQL Connect and Oracle OpenWorld on several occasions. I have contributed to the sys schema and four Oracle Certified Professional (OCP) exams for MySQL 5.6 to 8.0. I have written four books, all published at Apress.