Working Around MySQL Cluster Push Down Limitations Using Subqueries

This post was originally published on the MySQL Support Team Blog at on 5 August 2016.

I worked on an issue recently where a query was too slow when executed in MySQL Cluster. The issue was that Cluster has some restrictions when it comes to push down conditions.

As an example of this, consider the following query using the employees sample database. The query takes a look at the average salary based on how many years the employee has been with the company. As the latest hire date in the database is in January 2000, the query uses 1 February 2000 as the reference date.

Initially the query performs like (performance is with two data nodes and all nodes in the same virtual machine on a laptop, so the timings are not necessarily representative of a production system, though the improvements should be repeatable):

The straight join is needed as the performance is better than leaving the join order up to the optimizer.

The schema for the two tables in use is:

Why this poor performance? Looking at the EXPLAIN plan is the first step:

The EXPLAIN plan itself does not look bad – the index usage is as expected. However note the 3 warnings – one is the usual rewritten query after the application of rewriting and optimizer rules, but the two other gives more information why the performance is not what would be expected:

Here it can be seen that the tables are not pushable, because they are involved in the GROUP BY.

This gave me an idea. A couple of years ago, I wrote a post on the MySQL Server Blog about MySQL Server Blog: Better Performance for JOINs Not Using Indexes. These materialized temporary tables can also include auto keys that can improve the join performance significantly. What if a similar tactics is used with the above query?

Changing the join on the employees table to join a subquery:

That is more than a factor 3 improvement.

The new EXPLAIN plan is:

Note how <auto_key0> is used for the join on the derived table. This is what helps make the rewrite work.

Actually we can do a bit better. The above subquery selects all columns from the employees table even though only the emp_no and hire_date columns are used. So the next rewrite is:

The improvement of only including the columns needed will vary. For example if the internal temporary table ends up being converted into an on-disk table because of the extra data, or a covering index no longer can be used can increase the importance of only choosing the columns needed.

A final slight improvement can be gained by also converting the salaries table into a subquery:

The total speed up is from a little under 24 seconds to a little over 5 second or more than a factor 4.

Note: the above rewrite will not work out of the box in MySQL Cluster 7.5. The reason is that it is based on MySQL Server 5.7 where the optimizer can perform more advanced rewrites than it was possible on MySQL 5.6. So by default the above subqueries will be rewritten to normal joins and you end up with the original query again. To use the tactics of subqueries in Cluster 7.5, it is necessary first to disable the optimizer switch allowing the optimizer to rewrite the subqueries to normal joins: