MySQL NDB Cluster Backups
Today – 31 March – is World Backup Day, so I thought I would write a little about backups in MySQL NDB Cluster.
Just because NDB Cluster offers built-in redundancy and high availability does not mean backups are not important. They are – as ever and as for everything in software. The redundancy does not protect against user errors (anyone ever executed DROP TABLE or DROP SCHEMA by accident?), nor does it protect against a natural disaster, fire, or another disaster hitting the data center. The same goes for high availability.
In short, if the data is in any way important to you, make sure you have a backup. Furthermore, a backup is worth no more than your ability to restore it. If a fire ravages your data center, it does not help to have the best backup in the world if it is hosted in that same data center.
So, before actually creating and restoring a backup, let us look at two best practices when it comes to backups.
Best Practices
The best practices mentioned here are by no means unique to MySQL NDB Cluster or even to databases. They are not exhaustive either, but are meant more as guidelines to keep in mind when designing your backups.
Use a Backup Method that Works with Your Product
It sounds pretty obvious – why would you ever use a backup solution that does not work? Obviously no one does that on purpose, but unfortunately it is all too common that nobody has checked whether the backup solution is appropriate.
With respect to MySQL NDB Cluster, I can mention that rsync of the NDB file system will not work, and neither will any other method of creating a binary backup from the file system (including MySQL Enterprise Backup). Using mysqldump does not work either unless you keep the cluster read-only, for example by putting the cluster into “single user mode” and locking all tables.
When you test your backups make sure that you make changes to the data while the backup is running. A backup method may work when the database is idle, but not when concurrent writes are occurring.
In a little bit, I will show the recommended way to create an online backup in NDB Cluster.
Ensure You Can Restore Your Backups
There are two parts to this: can you retrieve your backups even in the worst case scenario, and do you know how to restore your backups?
You cannot assume that a backup that is kept locally on the same host or even in the same data center will be available when you need it. Think in terms of a major disaster such as the entire data center being gone. Is it likely to happen? Fortunately not, but from time to time really bad things happen: fires, earthquakes, flooding, etc. Even if it is a once-a-century event, do you want to run the risk?
So, ensure you are copying your backups off site. How far away you need to copy them depends on several factors, but at least ensure they are not in the same suburb.
The other aspect is that too often, the first time a restore is attempted is when there is a total outage and everyone is in panic mode. That is not the optimal time to learn about the restore requirements and gotchas. Make it routine to restore backups. It serves two purposes: it validates your backups – see also the previous best practice – and it validates your steps to restore a backup.
Creating a Backup
It is very easy to create an online backup of a cluster using MySQL NDB Cluster as it is built-in. In the simplest of cases, it is as trivial as executing the START BACKUP command in the ndb_mgm client, for example:
shell$ ndb_mgm --ndb-connectstring=localhost:1186 \
--execute="START BACKUP"
Connected to Management Server at: localhost:1186
Waiting for completed, this may take several minutes
Node 1: Backup 1 started from node 49
Node 1: Backup 1 started from node 49 completed
StartGCP: 4970 StopGCP: 4973
#Records: 4025756 #LogRecords: 1251
Data: 120749052 bytes Log: 50072 bytes
Each backup has a backup ID. In the above example, the ID is 1 (“Backup 1 started from …”). When a backup is started without specifying a backup ID, MySQL NDB Cluster determines what the previously highest used ID is and adds one to that. However, while this is convenient, it does mean the backup ID does not carry any information other than the order in which the backups were made.
An alternative is to explicitly request a given ID. Supported IDs are 1 through 4294967294. One option is to choose the ID to be YYmmddHHMM where YY is the year, mm the month, dd the day, HH the hours in 24 hours format, and MM the minutes. Zero-pad the numbers if the value is less than 10. This makes the backup ID reflect when the backup was created.
To set the backup ID explicitly, specify the requested ID as the first argument after START BACKUP, for example (using the interactive mode of ndb_mgm this time):
ndb_mgm> START BACKUP 1803311603
Waiting for completed, this may take several minutes
Node 1: Backup 1803311603 started from node 49
Node 1: Backup 1803311603 started from node 49 completed
StartGCP: 5330 StopGCP: 5333
#Records: 4025756 #LogRecords: 1396
Data: 120749052 bytes Log: 55880 bytes
Here the backup ID is 1803311603 meaning the backup was created on 31 March 2018 at 16:03.
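If you want to script this, the ID can be generated from the clock. The following is just a small bash sketch of the idea: the function names make_backup_id and is_valid_backup_id are made up for this example, and it only prints the command rather than sending it to ndb_mgm. The range check uses the limits mentioned above.

```shell
#!/usr/bin/env bash
# Sketch: derive a backup ID of the form YYmmddHHMM from the current time.

# Largest supported backup ID, as stated in the text above.
MAX_BACKUP_ID=4294967294

# Print a backup ID for the current local time, e.g. 1803311603.
# (Hypothetical helper name, not part of NDB Cluster.)
make_backup_id() {
    date +%y%m%d%H%M
}

# Succeed only if the argument is a decimal number in the supported range.
is_valid_backup_id() {
    local id="$1"
    case "$id" in
        ''|*[!0-9]*) return 1 ;;   # empty or contains a non-digit
    esac
    # Force base 10 so an ID with a leading zero is not read as octal.
    [ "$((10#$id))" -ge 1 ] && [ "$((10#$id))" -le "$MAX_BACKUP_ID" ]
}

backup_id="$(make_backup_id)"
if is_valid_backup_id "$backup_id"; then
    # Only print the command; pass it to ndb_mgm --execute=... yourself.
    echo "START BACKUP $backup_id"
else
    echo "Backup ID $backup_id is outside 1..$MAX_BACKUP_ID" >&2
fi
```

One thing to be aware of with this scheme: since the largest supported ID is 4294967294, IDs of this form fit in the range for years up to and including 2042 only.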
There are other arguments that can be used, for example to specify whether the snapshot time (where the backup is consistent) should be at the start or the end (the default) of the backup. The HELP START BACKUP command can be used to get online help with the START BACKUP command.
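As an illustration of the snapshot argument, here is a small bash sketch that builds the command text. The keywords SNAPSHOTSTART and SNAPSHOTEND are the ones I know from the management client; the helper name start_backup_command is made up for this example, and you should confirm the exact syntax for your version with HELP START BACKUP.

```shell
# Sketch: build a START BACKUP command with an explicit snapshot option.
# SNAPSHOTSTART makes the backup consistent with the start of the backup,
# SNAPSHOTEND (the default) with the end of the backup.
# (Hypothetical helper name, not part of NDB Cluster.)
start_backup_command() {
    local backup_id="$1" snapshot="${2:-SNAPSHOTEND}"
    case "$snapshot" in
        SNAPSHOTSTART|SNAPSHOTEND) ;;
        *) echo "Unknown snapshot option: $snapshot" >&2; return 1 ;;
    esac
    echo "START BACKUP $backup_id $snapshot"
}

# Print the command for a backup that is consistent as of its start time.
start_backup_command 1803311603 SNAPSHOTSTART
```

The printed text can then be passed to ndb_mgm with the --execute option in the same way as in the first example.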
Restoring a Backup
It is a little more complicated to restore a backup than to create it, but once you have tried it a few times, it should not pose any major issues.
The backups are restored using the ndb_restore program. It is an NDB API program that supports restoring both the schema and the data. It is recommended to perform the restore in three steps:
- Restore the schema.
- Restore the data with indexes disabled.
- Rebuild the indexes.
The restore examples assume you are restoring into an empty cluster. There is also support for partial restores and renaming tables, but that will not be discussed here. Let us take a look at the three steps.
Step 1: Restore the Schema
The schema is restored using the --restore_meta option, for example:
shell$ ndb_restore --ndb-connectstring=localhost:1186 \
--nodeid=1 --backupid=1803311603 \
--backup_path=/backups/cluster/BACKUP/BACKUP-1803311603 \
--restore_meta --disable-indexes
Nodeid = 1
Backup Id = 1803311603
backup path = /backups/cluster/BACKUP/BACKUP-1803311603
2018-03-31 16:28:07 [restore_metadata] Read meta data file header
Opening file '/backups/cluster/BACKUP/BACKUP-1803311603/BACKUP-1803311603.1.ctl'
File size 47368 bytes
Backup version in files: ndb-6.3.11 ndb version: mysql-5.7.21 ndb-7.5.9
2018-03-31 16:28:07 [restore_metadata] Load content
Stop GCP of Backup: 5332
2018-03-31 16:28:07 [restore_metadata] Get number of Tables
2018-03-31 16:28:07 [restore_metadata] Validate Footer
Connected to ndb!!
2018-03-31 16:28:08 [restore_metadata] Restore objects (tablespaces, ..)
2018-03-31 16:28:08 [restore_metadata] Restoring tables
Successfully restored table `world/def/country`
...
2018-03-31 16:28:11 [restore_data] Start restoring table data
NDBT_ProgramExit: 0 - OK
The arguments used here are:
- --ndb-connectstring=localhost:1186. The host and port number where to connect to the management node(s). This example is from a test cluster with all nodes on the same host. In general you will not be specifying localhost here (never ever have the management and data nodes on the same host or even the same physical server – a topic for another day).
- --nodeid=1. This tells which node ID to restore from. This is based on the node ID from the cluster where the backup was created. Either data node can be used.
- --backupid=1803311603. The backup ID to restore.
- --backup_path=…. The location of the backup files.
- --restore_meta. Restore the schema (called meta data).
- --disable-indexes. Do not restore the indexes (we will rebuild them later).
You may wonder why we do not want to restore the indexes. I will get back to that after the restore has been completed.
You should only execute this command once and only for one node ID. Before proceeding to the next step, ensure the step completed without errors. The next step is to restore the data.
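If the restore is scripted, the check for errors can be automated as well. A minimal bash sketch, assuming the output of ndb_restore has been saved to a file, is to look for the final status line from the output shown above. The function name restore_succeeded and the temporary log file are made up for this example. In practice you would probably also want to check the exit status of ndb_restore itself.

```shell
# Sketch: decide whether an ndb_restore run succeeded by looking for the
# final status line "NDBT_ProgramExit: 0 - OK" in its saved output.
# (Hypothetical helper name, not part of NDB Cluster.)
restore_succeeded() {
    local log_file="$1"
    grep -q '^NDBT_ProgramExit: 0 - OK' "$log_file"
}

# Demonstration with a stand-in log file created just for this example.
log_file="$(mktemp)"
printf '%s\n' 'Connected to ndb!!' 'NDBT_ProgramExit: 0 - OK' > "$log_file"

if restore_succeeded "$log_file"; then
    echo "Schema restore OK, continue with the data restore"
else
    echo "Schema restore failed, do not continue" >&2
fi
rm -f "$log_file"
```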
Step 2: Restore the Data
The command to restore the data is very similar to restoring the schema. The main differences are that --restore_meta is replaced by --restore_data and that ndb_restore should be run once for each data node that was in the cluster where the backup was created.
For example, in the case of two data nodes:
shell$ ndb_restore --ndb-connectstring=localhost:1186 \
--nodeid=1 --backupid=1803311603 \
--backup_path=/dev/shm/backup/BACKUP/BACKUP-1803311603 \
--restore_data --disable-indexes
shell$ ndb_restore --ndb-connectstring=localhost:1186 \
--nodeid=2 --backupid=1803311603 \
--backup_path=/dev/shm/backup/BACKUP/BACKUP-1803311603 \
--restore_data --disable-indexes
These steps can be run in parallel as long as it does not cause an overload of the data nodes. A rule of thumb is that you can execute one ndb_restore --restore_data per host you have data nodes on. That is, if you have one data node per host, you can restore all parts in parallel. If you have two data nodes per host, it may be necessary to divide the restore into two parts.
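With more than a couple of data nodes, typing the commands by hand gets tedious. The following bash sketch prints one data restore command per node ID. The connect string, backup ID, and path are simply the values from the example above, and the function name data_restore_commands is made up for this example. It only prints the commands, so you can review them before running anything.

```shell
# Sketch: print one ndb_restore --restore_data command per data node.
# The values below are the ones used in the example above; adjust them
# to match your own cluster and backup.
CONNECTSTRING="localhost:1186"
BACKUP_ID=1803311603
BACKUP_PATH="/dev/shm/backup/BACKUP/BACKUP-${BACKUP_ID}"

# Arguments: the node IDs of the data nodes in the cluster where the
# backup was created. (Hypothetical helper name.)
data_restore_commands() {
    local node_id
    for node_id in "$@"; do
        echo "ndb_restore --ndb-connectstring=${CONNECTSTRING}" \
             "--nodeid=${node_id} --backupid=${BACKUP_ID}" \
             "--backup_path=${BACKUP_PATH}" \
             "--restore_data --disable-indexes"
    done
}

# Two data nodes, as in the example above.
data_restore_commands 1 2
```

To follow the rule of thumb about parallelism, you could start the printed commands for nodes on different hosts in the background and wait for all of them to finish before moving on to the next batch.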
The final step is to rebuild the indexes.
Step 3: Rebuild the Indexes
As we disabled the indexes while restoring the schema and data, it is necessary to recreate them. The command is similar to the ones used in the previous steps, but like the schema restore it should only be run for one node ID, for example:
shell$ ndb_restore --ndb-connectstring=localhost:1186 \
--nodeid=1 --backupid=1803311603 \
--backup_path=/dev/shm/backup/BACKUP/BACKUP-1803311603 \
--rebuild-indexes
That's it. You can use the data again. But why was it that the indexes were disabled? Let me return to that.
Why Disable Indexes During the Restore?
There are two reasons to disable the indexes while restoring the schema and data:
- Performance
- Constraints (unique indexes and foreign keys)
Strictly speaking, it is only necessary to disable the indexes while restoring the data, but there is no reason to create the indexes during the schema restore just to remove them again in the next step.
By disabling the indexes, there is no need to maintain the indexes during the restore. This allows us to restore the data faster, but then we need to rebuild the indexes at the end. This is still faster though, and if BuildIndexThreads and the number of fragments per data node are greater than 1, the rebuild will happen in parallel like during a restart.
The second thing is that if you have unique keys or foreign keys, it is in general not possible to restore the backup with indexes enabled. The reason is that the backup happens in parallel across the data nodes with the changes happening during the backup recorded separately. When you restore the data, it is not possible to guarantee that data and log are restored in the same order as the changes occurred during the backup. So, to avoid unique key and foreign key errors, it is necessary to disable the indexes until after the data has been restored.
Do not worry – this does not mean that the restored data will be inconsistent. Once all data and log entries have been restored, the data reflects the state at the end of the backup, where the constraints are fulfilled again, and rebuilding the indexes checks for this.
Want to Know More?
This blog really only scratches the surface of backups. If you want to read more, some references are:
- The MySQL Reference Manual:
- Pro MySQL NDB Cluster (Apress)
This book by Mikiya Okuno and myself has a chapter (31 pages) dedicated to discussing backups and restores.