InnoDB Progress Information

MySQL has since version 5.7 had support for progress information for some queries. As promised in my previous post, I will here discuss how you can use that to get progress information for ALTER TABLE on InnoDB tables.

Background and Setup

Progress information is implemented through the Performance Schema using the stage events. In version 8.0.12 there are currently seven stages that can provide this information for  ALTER TABLE statements on InnoDB tables. In MySQL 8, it is easy to list the stages capable of reporting progress information by using the setup_instruments Performance Schema table:

This also shows how the setup_instruments table in MySQL 8 has some additional information about the instruments such as properties and documentation (not included in the output). Adding this information is still work in progress.

MySQL 5.7 does not provide as easy a way to obtain the instruments providing progress information. Instead you need to consult the reference manual. However, the principle in using the feature is the same.

As you can see, all of the instruments are enabled and timed by default. What is not enabled by default, however, is the consumer that can make the information available:

Since the consumers form a hierarchical system, the sys schema function ps_is_consumer_enabled() is used to show whether the consumer is enabled taking the whole hierarchy into consideration.

In order to use the progress information, you need to enable the events_stages_current consumer. This is the consumer that is responsible for keeping the performance_schema.events_stages_current table up to date, i.e. record the current (or latest if there is no current stage) for each thread. With the default Performance Schema settings, the rest of the hierarchy is enabled. To enable event_stages_current and verify it will be consuming instruments, you can use the following queries:

That is it. Now you can monitor the progress of the queries that uses the stages with progress information.

Note: The more parts of the Performance Schema that is enabled and the more fine grained monitoring, the more overhead. Stages are not the worst with respect to overhead; nevertheless it is recommended you keep an eye on the affect of enabling the events_stages_current consumer.

Monitoring Progress

The base for monitoring the progress information is the performance_schema.events_stages_current table. There are two columns of interest for this discussion:

  • WORK_COMPLETED: The amount of work that is reported to have been completed.
  • WORK_ESTIMATED: The estimated amount of work that needs to be done.

For InnoDB ALTER TABLE the estimated amount of work is for the entire operation. That said, the estimate may be revised during the process, so it may happen that the if you calculate the percentage it decreases as time goes. However, in general the percentage (100% * WORK_COMPLETED/WORK_ESTIMATED) will increase steadily until the operation completes at 100%.

To learn more about how the progress information works, the following pages in the manual are recommended:

For now, let’s look at an example.

Example

For the example, the salaries table in the employees sample database will be used. The table is sufficiently large that it will be possible to query the progress while adding a column using the INPLACE algorithm. As discussed in MySQL 8.0.12: Instant ALTER TABLE, it is possible to add a column instantly, but for the purpose of this example, the INPLACE algorithm illustrates the progress information feature better. The query that will be executed is:

The performance_schema.events_stages_current table can be joined with the performance_schema.events_statements_current to show the query and progress. For example:

There is another way though. Instead of using the performance_schema.events_stages_current table directly, an easier way is to use the sys.session view. This is an advanced process list that includes much more information than the usual SHOW PROCESSLIST statement including progress information. The performance of sys.session has been improved with more than an order of magnitude in MySQL 8 by the addition of indexes to the Performance Schema tables making it highly useful.

Querying the sys.session view for sessions showing progress information while the ALTER TABLE is in progress returns an output similar to the following example:

In the example, the @sys.statement_truncate_len user variable is set to 85. By default the sys schema ensures the current statement is at most 64 characters long. In order to avoid truncation in this case, the truncate length is increased to 85.

The example output shows that the progress is at around 49%. It is important to note that is an estimate and not an exact number. The ALTER TABLE is performing a merge sort at the time, and the query has been running for 4.22 seconds.

A couple of other interesting columns are included. It can be seen the connection is using 464KiB at the time. In MySQL 8 memory instrumentation is enabled by default (in MySQL 5.7 you need to enable it yourself). Additionally, the name of the program executing the query is MySQLWorkbench, that is the query originates from MySQL Workbench.

So, next time you plan a large ALTER TABLE operation, consider enabling the events_stages_current consumer, so you can follow the progress.

MySQL 8.0.12: Instant ALTER TABLE

There are many nice changes included in the MySQL 8.0.12 release that were published a couple of days ago. One of the most exciting is the ability to make instant schema changes to tables. This blog will look into why I think that that is a stand-out change.

I will not go into details with the implementation other than noting that the new data dictionary in MySQL 8.0 has allowed for a nice implementation of the patch that was contributed by the Tencent Games DBA Team. If you are interested in learning more about the implementation, I will recommend you to read the blog by Bin Su (MySQL 8.0: InnoDB now supports Instant ADD COLUMN) and the worklog implementing the feature (WL#11250: Support Instant Add Column).

Thanks to the Tencent Games DBA Team who contributed the patch for this feature.

Why Do We Need Instant ALTER TABLE?

As you may know, InnoDB has since version 5.6 supported online schema changes. So, a fair thought may be why that is no longer good enough. The thing is that while online schema changes are very nice and allows you to make changes to the data (DML statements) while the schema change is made, there are still some problems:

  • Online ALTER TABLE still requires a meta data lock for the duration of the operation. That is, it is not possible to make other schema changes until the ALTER TABLE has completed.
  • In replication setups, the SQL thread handling the online schema change will still block other changes coming through the replication channel from occurring. This means that an online schema change that takes an hour suddenly makes the replication lag an hour.
  • This is compounded in chained replication setups where the delay on the last instances in the topology is multiplied with the number of times the change has been replication. Suddenly this schema change that did not really affect the replication master becomes a major pain and delays the deployment of the application upgrade that relies on the schema changes.
  • Even though the change is online, it still is heavy on resources: disk for storing the extra copy of the table when the change cannot be made in-place, CPU and disk activity that may cause other queries to become slow, etc.
  • There is a limit to the amount of DML changes that can be made during the schema change. If you make too many changes, the schema change will fail.

So, in short, online schema changes are only the first stop. Instance schema changes is the future.

Which Changes Can Be Made Instantly?

While it would be great if all schema changes could be made instantly, unfortunately that is not the case. Just as not all schema changes can be made online, there are limitations to the new instant feature. In short, the changes that can be made instantly must only affect the metadata of the table. The metadata is stored in the data dictionary. The changes that can be made with the instant ALTER TABLE feature as per 8.0.12 are:

  • Adding a new column as the last column in the table.
  • Adding a generated virtual column.
  • Dropping a generated virtual column.
  • Setting a default value for an existing column.
  • Dropping the default value for an existing column.
  • Changing the list of values allowed for a column with the ENUM or SET data types. A requirement is that the storage size does not change for the column.
  • Change whether the index type is set explicitly for an existing index.

And who knows, maybe later the feature can be extended to cover more changes. There are also a few limitations that are good to be aware of:

  • The row format cannot be COMPRESSED.
  • The table cannot have a fulltext index.
  • Tables in the data dictionary cannot use the instant algorithm.
  • Temporary tables are not supported.

How to Ensure You are Using the Expected Algorithm?

One problem with schema changes is that here are different algorithms depending on the schema change. Currently there are three different algorithms:

  • INSTANT: the change completes very quickly (yes not quite instantly) as only the metadata in the data dictionary needs to be updated.
  • INPLACE: the changes are made within the existing table, i.e. a complete table copy is avoided.
  • COPY: the table is copied into a new copy with the new definition.

By default, MySQL chooses the algorithm doing the least work. That is, INSTANT if that is supported, otherwise INPLACE if that is supported, and finally COPY. Additionally, there is the concept of locking which can be set to either NONE, SHARED, or EXCLUSIVE.

So, how do you ensure you are not ending up with a copying ALTER TABLE taking exclusive locks when you thought the operation was going to be instant? The answer is to explicitly set the algorithm and lock type. That way, MySQL will throw an error if your schema change is not compatible with the requested algorithm. The same principle can be used to force a full rebuild of the table if you for example have a small table and don’t see a reason to worry about instantly added columns (see also more later).

I will give some examples of specifying the ALGORITHM and LOCK options to ALTER TABLE later. However, we first need an example table to play with.

Tip: Always specify the ALGORITHM and LOCK options explicitly to avoid unpleasant surprises.

Creating an Example Table

For the purpose of the upcoming examples of instant schema changes, it is necessary to have a table to play with. The table will be the testtbl table in the my_schema schema. The table can be created and populated with sample data by using MySQL Shell – this allows us to take advantage of the support for scripting. The table will have one million rows.

Note: To limit the amount of scrolling, the prompt in MySQL Shell has been changed to just show the mode (JS, PY, SQL).

Note how this takes advantage of the feature in MySQL Shell where the db object in JavaScript and Python mode has the tables as properties. Thus it is possible to refer to the testtbl table in the my_schema schema as db.testtbl after using \use my_schema to read the database objects. The data is inserted in 1000 batches of 1000 rows using a double loop.

Now it is time to play with schema changes.

Schema Changes Examples

For comparison, let’s first add a column using the old algorithms. First, add the column val2 using the copying algorithm – note this is not supported online, so a shared lock is taken:

Note: the test is made on a laptop – the timings themselves are not relevant, but they can be used to compare the time taken for the three algorithms.

So that took around 5.4 seconds. That is not too bad, but we can do much better than that. Let’s try the in-place algorithm. Not only will that require less work (though still a fair bit), it is also possible to perform the change online.

So, this is a factor three faster despite the table having an extra column compared to the original table. And it is possible to execute DML statements. That is a big win. But let’s move on to the finale: instant ALTER TABLE:

Wow – that’s less than 5/100 of a second – that’s instant enough for me. Note hos the LOCK option is not specified here. The instant algorithm is always as lock free as possible, so the LOCK option is not allowed.

There is a little more to it than this. As mentioned by Bin Su, there are some trade offs such as the row size if not checked upfront. So, in some cases it can be useful to know whether a table contains instantly added columns. This can be seen through the innodb_tables view in the Information Schema on the data dictionary:

Now what is that? 4 columns? But we only added one of the columns using the instant algorithm. What the INSTANT_COLS column shows is how many columns existed before the first instant column was added. In the example the columns id, val, val2, and val3 existed, then val4 was added using the instant algorithm. For tables that have never had any columns added instantly, the value of INSTANT_COLS is 0.

Want to Read More?

This blog has just been an introduction to the new MySQL 8.0.12 feature of instant schema changes. I will recommend you to read Bin Xu’s blog as well as the documentation in the MySQL reference manual to fully understand the feature:

Conclusion

The new feature allowing instant schema changes is a great way to avoid time consuming operations. Particularly in replication setups where the replicated statement will block while it applies, the different between an online and an instant schema change is like night and day.

MySQL 8.0.12 can be downloaded from MySQL Community Downloads or you can install it through one of the “native installers” such as MySQL Installer for Microsoft Windows or one of our Linux repositories (Yum, APT, SUSE). If you are a customer and need the commercial version of MySQL Server, I will recommend you to download it from Patches & Updates in My Oracle Support (MOS); there is also a 30 days trial version available from Oracle Software Delivery Cloud.

Please do not hesitate to provide feedback. Bugs and feature requests can be logged in the MySQL Bugs database.