Category Archives: MySQL Cluster

MySQL NDB Cluster Backups

Today – 31 March – is world backup day, so I thought I would write a little about backups in MySQL NDB Cluster.

Just because NDB Cluster offers built-in redundancy and high availability does not mean backups are not important. They are – as ever and as for everything in software. The redundancy does not protect against user errors (anyone ever executed DROP TABLE or DROP SCHEMA by accident?) neither does it protect against a natural disaster, fire, or another disaster hitting the data center. Similar with high availability.

In short, if the data is in any way remotely important for you, you ensure you have a backup. Furthermore, a backup is not worth any more than your ability to restore it. If a fire rages your data center, it does not help you have the best backup in the world hosted in that data center.

So, before actually creating and restoring a backup, let us look at two best practices when it comes to backups.

Best Practices

The best practices mentioned here are by no means unique to MySQL NDB Cluster nor even databases. They are not exhaustive either, but more meant as something guidelines to have in mind when designing your backups.

Use a Backup Method that Works with Your Product

It sounds pretty obvious – why would you ever use a backup solution that does not work? Obviously no one does that on purpose, but unfortunately it is too common that it has not been checked whether the backup solution is appropriate.

With respect to MySQL NDB Cluster, I can mention that rsync of the NDB file system will not work, neither will any other method of creating a binary backup from the file system (including MySQL Enterprise Backup). It does not work either to use mysqldump unless you keep the cluster read-only for example by putting the cluster into “single user mode” and locking all tables.

When you test your backups make sure that you make changes to the data while the backup is running. A backup method may work when the database is idle, but not when concurrent writes are occurring.

In a little bit, I will show what the recommended way to create an online backup in NDB Cluster is.

Ensure You Can Restore Your Backups

There are two parts to this: can you retrieve your backups even in the worst case scenario, and do you know how to restore your backups?

You cannot assume that a backup that is kept locally on the same host or even in the same data center will be available when you need it. Think in terms of a major disaster such as the entire data center gone. Is it likely to happen? Fortunately not, but from time to time really bad things happens: fires, earthquakes, flooding, etc. Even if it is a once a century event, do you want to run the risk?

So, ensure you are copying your backups off site. How far away you need to copy it depends on several factors, but at least ensure it is not in the same suburb.

The other aspect is that too often, the first time a restore is attempted is when there is a total outage and everyone is in panic mode. That is not the optimal time to learn about the restore requirements and gotchas. Make it routine to restore backups. It serves too purposes: it validates your backups – see also the previous best practice – and it validates your steps to restore a backup.

Creating a Backup

It is very easy to create an online backup of a cluster using MySQL NDB Cluster as it is built-in. In the simplest of cases, it is as trivial as to execute the START BACKUP command in the ndb_mgm client, for example:

Each backup has a backup ID. In the above example, the ID is 1 (“Backup 1 started from …”). When a backup is started without specifying a backup ID, MySQL NDB Cluster determines what the previously highest used ID is and adds one to that. However, while this is convenient, it does mean the backup ID does not carry any information other than the sequence the backups were made.

An alternative is to explicitly request a given ID. Supported IDs are 1 through 4294967294. One option is to choose the ID to be YYmmddHHMM where YY is the year, mm the month, dd the day, HH the hours in 24 hours format, and MM the minutes. Zero-padded the numbers if the value is less than 10. This makes the backup ID reflect when the backup was created.

To specify the backup ID explicitly specify the requested ID as the first argument after START BACKUP, for example (using the interactive mode of ndb_mgm this time):

Here the backup ID is 1803311603 meaning the backup was created on 31 March 2018 at 16:03.

There are other arguments that can be used, for example to specify whether the snapshot time (where the backup is consistent) should be at the start of the end (the default) of the backup. The HELP START BACKUP command can be used to get online help with the START BACKUP command.

Remember that START BACKUP only backs up NDBCluster tables. Use mysqldump, mysqlpump, or another backup program to backup the schema and/or non-NDBCluster tables.

Restoring a Backup

It is a little more complicated to restore a backup than to create it, but once you have tried it a few times, it should not provide any major issues.

The backups are restored using the ndb_restore program. It is an NDB API program that supports both restoring the schema and data. It is recommended to perform the restore in three steps:

  1. Restore the schema.
  2. Restore the data with indexes disabled.
  3. Rebuild the indexes.
In MySQL NDB Cluster 7.4 and later, restoring the schema with ndb_restore did not change the number of partitions to the default of the cluster you restore to. If you have not yet upgraded to MySQL NDB Cluster 7.5, it is recommended to restore the schema from a mysqldump or mysqlpump backup if the cluster does not have the same number of data nodes and LDM threads.

The restore examples assumes you are restoring into an empty cluster. There is also support for partial restores and renaming tables, but that will not be discussed here. Let us take a look at the three steps.

Step 1: Restore the Schema

The schema is restored using the –restore_meta option, for example:

The arguments used here are:

  • –ndb-connectstring=localhost:1186. The host and port number where to connect to the management node(s). This example is from a test cluster with all nodes on the same host. In general you will not be specifying localhost here (never ever have the management and data nodes on the same host or even the same physical server – a topic for another day).
  • –nodeid=1. This tells which node ID to restore from. This is based on the node ID from the cluster where the backup was created. Either data node can be used.
  • –backupid=18033311603. The backup ID to restore.
  • –backup_path=…. The location of the backup files.
  • –restore_meta. Restore the schema (called meta data).
  • –disable-indexes. Do not restore the indexes (we will rebuild them later).

You may wonder why we do not want to restore the indexes. I will get back to that after the restore has been completed.

You should only execute this command once and only for one node id. Before proceeding to the next step, ensure the step completed without errors. The next step is to restore the data.

Step 2: Restore the Data

The command to restore the data is very similar to restoring the schema. The main differences is that –restore_meta will be replaced by –restore_data and that ndb_restore should be used once for each data node that was in the cluster where the backup was created.

For example in case of two data nodes:

These steps can be run in parallel as long as it does not cause an overload of the data nodes. A rule of thumb is that you can execute one ndb_restore –restore_data per host you have data nodes one. I.e. if you have one data node per host, you can restore all parts in parallel. If you have two data nodes per host, it may be necessary to divide the restore into two parts.

The final step is to rebuild the indexes.

Step 3: Rebuild the Indexes

As we disabled the indexes while restoring the schema and data, it is necessary to recreate them. This is done in a similar way to restoring the data – i.e. it should only be done for one node ID, for example:

That’s it. You can use the data again. But why was it that the indexes where disabled? Let me return to that.

Why Disable Indexes During the Restore?

There are two reasons to disable the indexes while restoring the schema and data:

  • Performance
  • Constraints (unique indexes and foreign keys)

As such, it is only necessary to disable the indexes while restoring the data, but there is no reason to create the indexes during the schema restore just to remove them again in the next step.

By disabling the indexes, there is no need to maintain the indexes during the restore. This allows us to restore the data faster, but then we need to rebuild the indexes at the end. This is still faster though, and if BuildIndexThreads and the number of fragments per data node are greater than 1, the rebuild will happen in parallel like during a restart.

The second thing is that if you have unique keys or foreign keys, it is in general not possible to restore the backup with indexes enabled. The reason is that the backup happens in parallel across the data nodes with the changes happening during the backup recorded separately. When you restore the data, it is not possible to guarantee that data and log are restored in the same order as the changes occurred during the backup. So, to avoid unique key and foreign key errors, it is necessary to disable the indexes until after the data has been restored.

Do not worry – this does not mean that the restored data will be inconsistent. At the end of the backup – and rebuilding the indexes checks for this – the constraints are fulfilled again.

Want to Know More?

This blog really only scratches the surface of backups. If you want to read more, some references are:

What is MySQL NDB Cluster?

I have had the opportunity to write a blog for Apress with a brief introduction to MySQL NDB Cluster. The blog gives a brief overview of the history and why you should consider it. The architecture is described before some key characteristics are discussed.

If you are interested, the blog can be found at https://www.apress.com/us/blog/all-blog-posts/what-is-mysql-ndb-cluster/15454530.

Happy reading.

Over view of the MySQL NDB Cluster architecture.

New Book: Pro MySQL NDB Cluster

It is with great pleasure, I can announce that a new book dedicated to MySQL NDB Cluster has just been released. The book Pro MySQL NDB Cluster is written by my colleague Mikiya Okuno and myself and is a nearly 700 pages deep dive into the world of MySQL NDB Cluster. The book is published by Apress.

Tip: There are several ways to cluster MySQL. This book is about the product MySQL Cluster (often called MySQL NDB Cluster to clarify which cluster it is). There is also MySQL InnoDB Cluster, clustering using replication, and clustering through operating or hardware features. Pro MySQL NDB Cluster is only about the former.

We very much hope you will enjoy the book. Feedback and questions are most welcome, for example on Twitter (@nippondanji and @JWKrogh).

Note: At the time of writing, only the eBook is available for purchase. A softcover version will follow as soon as it has been possible to print it; this can also be pre-ordered now. – Update: The softcover version of the book is now also available.

The book is divided into five parts and 20 chapters.

Part I – The Basics

The first part provides some background information on the various parts in MySQL NDB Cluster and how it works. The chapters are:

  • Chapter 1: Architecture and Core Concepts
  • Chapter 2: The Data Nodes

Part II – Installation and Configuration

The second part focuses on the installation and configuration related topics, including replication between clusters. The chapter are:

  • Chapter 3: System Planning
  • Chapter 4: Configuration
  • Chapter 5: Installation
  • Chapter 6: Replication

Part III – Daily Tasks and Maintenance

In the third part, the topics include tasks that is part of the daily routine as a database administrator plus a tutorial where the tasks discussed in parts II and III are handled through MySQL Cluster Manager. The chapters are:

  • Chapter 7: The NDB Management Client and Other NDB Utilities
  • Chapter 8: Backups and Restores
  • Chapter 9: Table Maintenance
  • Chapter 10: Restarts
  • Chapter 11: Upgrades and Downgrades
  • Chapter 12: Security Considerations
  • Chapter 13: MySQL Cluster Manager

Chapter IV – Monitoring and Troubleshooting

The fourth part continues with two topics that are also part of the daily routine: monitoring and troubleshooting. The chapters are:

  • Chapter 14: Monitoring Solutions and the Operating System
  • Chapter 15: Sources for Monitoring Data
  • Chapter 16: Monitoring MySQL NDB Cluster
  • Chapter 17: Typical Troubles and Solutions

Chapter V – Development and Performance Tuning

The final part covers topics that are related to development and getting the tuning the cluster with respect to performance. The chapters are:

  • Chapter 18: Developing Applications Using SQL with MySQL NDB Cluster
  • Chapter 19: MySQL NDB Cluster as a NoSQL Database
  • Chapter 20: MySQL NDB Cluster and Application Performance Tuning

Working Around MySQL Cluster Push Down Limitations Using Subqueries

This post was originally published on the MySQL Support Team Blog at https://blogs.oracle.com/mysqlsupport/entry/working_around_mysql_cluster_push on 5 August 2016.

I worked on an issue recently where a query was too slow when executed in MySQL Cluster. The issue was that Cluster has some restrictions when it comes to push down conditions.

As an example of this, consider the following query using the employees sample database. The query takes a look at the average salary based on how many years the employee has been with the company. As the latest hire date in the database is in January 2000, the query uses 1 February 2000 as the reference date.

Initially the query performs like (performance is with two data nodes and all nodes in the same virtual machine on a laptop, so the timings are not necessarily representative of a production system, though the improvements should be repeatable):

The straight join is needed as the performance is better than leaving the join order up to the optimizer.

The schema for the two tables in use is:

Why this poor performance? Looking at the EXPLAIN plan is the first step:

The EXPLAIN plan itself does not look bad – the index usage is as expected. However note the 3 warnings – one is the usual rewritten query after the application of rewriting and optimizer rules, but the two other gives more information why the performance is not what would be expected:

Here it can be seen that the tables are not pushable, because they are involved in the GROUP BY.

This gave me an idea. A couple of years ago, I wrote a post on the MySQL Server Blog about MySQL Server Blog: Better Performance for JOINs Not Using Indexes. These materialized temporary tables can also include auto keys that can improve the join performance significantly. What if a similar tactics is used with the above query?

Changing the join on the employees table to join a subquery:

That is more than a factor 3 improvement.

The new EXPLAIN plan is:

Note how <auto_key0> is used for the join on the derived table. This is what helps make the rewrite work.

Actually we can do a bit better. The above subquery selects all columns from the employees table even though only the emp_no and hire_date columns are used. So the next rewrite is:

The improvement of only including the columns needed will vary. For example if the internal temporary table ends up being converted into an on-disk table because of the extra data, or a covering index no longer can be used can increase the importance of only choosing the columns needed.

A final slight improvement can be gained by also converting the salaries table into a subquery:

The total speed up is from a little under 24 seconds to a little over 5 second or more than a factor 4.

Note: the above rewrite will not work out of the box in MySQL Cluster 7.5. The reason is that it is based on MySQL Server 5.7 where the optimizer can perform more advanced rewrites than it was possible on MySQL 5.6. So by default the above subqueries will be rewritten to normal joins and you end up with the original query again. To use the tactics of subqueries in Cluster 7.5, it is necessary first to disable the optimizer switch allowing the optimizer to rewrite the subqueries to normal joins:

 

New Member of the Cluster API Family: The Native Node.js Connector

MySQL Cluster 7.3 went GA yesterday and with it came a new member of the MySQL Cluster API family: mysql-js – a native Node.js connector. mysql-js uses the NDB API to connect directly to the data nodes which improves performance compared to executing queries through the MySQL nodes.

For an introduction to mysql-js and installation instructions I will recommend taking a look at the official API documentation and Andrew Morgan’s blog; the latter also has an overview of the new features in MySQL Cluster 7.3 in general.

To get a feel for how the new API works, I went ahead and created a small test program that will take one or more files with delimited data (e.g. similar to what you get with SELECT … INTO OUTFILE and inserts the data into a table. I have tried to keep things simple. This means that no other external modules than mysql-js is used, not very much error handling has been included, the reading, parsing of the data files could be done much better, performance has not been considered, etc. – but I would rather focus on the usage of mysql-js.

The complete example can be found in the file nodejs_tabinsert.js. The following will go through the important bits.

Preparation

The first part of the script is not really specific to mysql-js, so I will go lightly over that. A few of the arguments deserve a couple of extra words:

  • –log-level: when set to debug or detail some output with information about what happens inside mysql-js is logged. This can be useful to learn more about the module or for debugging.
  • –basedir: this is the same as the basedir option for mysqld – it sets where MySQL has been installed. It is used for loading the mysql-js module. Default is /usr/local/mysql.
  • –database and –table: which table to insert the data into. The default database is test, but the table name must always be specified.
  • –connect-string: the script connects directly to the cluster nodes, so it needs the NDB connect-string similar to other NDB programs. The default is localhost:1186.
  • –delimiter: the delimiter used in the data files. The default is a tab (\t).

Setting Up mysql-js

With all the arguments parsed, it is not possible to load the mysql-js module:

The unified_debug class is part of mysql-js and allows to get debug information from inside mysql-js logged to the console.

The nosql.ConnectionProperties() method will return an object with the default settings for the chosen adapter – in this case ndb. After that we can change the settings where we do not want the defaults. It is also possible to use an object with the settings as the argument instead of ‘ndb’; that requires setting the name of the adapter using the “implementation” property. Currently the two supported adapters are ‘ndb’ (as in this example) and ‘mysql’ which connects to mysqld instead. ‘mysql’ required node-mysql version 2.0 and also support InnoDB.

As the ‘ndb’ adapter connects directly to the cluster nodes, no authentication is used. This is the same as for the NDB API.

Callbacks and Table Mapping Constructor

We will load each file inside a transaction. The trxCommit() callback will verify that the transaction was committed without error and then closes the session.

The onInsert callback checks whether each insert worked correctly. When all rows for the session (file) have been inserted, it commits the transaction.

The tableRow is the constructor later used for the table mapping. It is used to set up the object with the data to be inserted for that row. tableMeta is a TableMetaData object with information about the table we are inserting into.

The Session

This is were the bulk of the work is done. Each file will have it’s own session.

The onSession function is a callback that is used when creating (opening) the sessions.

The first step is to get the meta data for the table. As all data is inserted into the same table, in principle we could reuse the same meta data object for all sessions, but the getTableMetaData() method is a method of the session, so it cannot be fetched until this point.

Next a transaction is started. We get the transaction with the session.currentTransaction() method. This returns an idle transaction which can then be started using the begin() method. As such there is not need to store the transaction in a variable; as can be seen in the trxCommit() and onInsert() callbacks above, it is also possible to call session.currnetTransaction() repeatedly – it will keep returning the same transaction object.

The rest of the onSession function processes the actual data. The insert itself is performed with the session.persist() method.

Edit: using a session this way to insert the rows one by one is obviously not very efficient as it requires a round trip to the data nodes for each row. For bulk inserts the Batch class is a better choice, however I chose Session to demonstrate using multiple updates inside a transaction.

Creating the Sessions

First the table mapping is defined. Then a session is opened for each file. Opening a session means connecting to the cluster, so it can be a relatively expensive step.

Running the Script

To test the script, the table t1 in the test database will be used:

For the data files, I have been using:

t1a.txt:

t1b.txt:

Running the script:

One important observation is that even though the session for t1a.txt is created before the one for t1b, the t1b.txt file is ending up being inserted first. Actually if the inserts were using auto-increments, it would be possible to see that in fact, the actual assignment of auto-increment values will in general alternate between rows from t1b.txt and t1a.txt. The lesson: in node.js do not count on knowing the exact order of operations.

I hope this example will spark your interest in mysql-js. Feedback is most welcome – both bug reports and feature requests can be reports at bugs.mysql.com.