Today is World Backup Day, so I thought I would use the opportunity to discuss some best practices and general considerations regarding backing up MySQL instances. While I focus on MySQL, several of these tips apply to backups in general.
Before heading into the gory details, let's first take a look at the best practices at a high level:
- Make sure you can restore your backups:
  - Document and script the restore procedures. Do you know the steps required to restore a full backup, or a single table?
  - Keep copies of the backups off-site. Do you have a copy of your backup if the data center becomes unavailable, for example due to a fire?
  - Validate your backups. Does your backup method work with the features you use? Are you writing to a disk which is failing?
  - Monitor the backups. Do you know when a backup failed? How long do the backups take?
- Use a backup method appropriate for your system and your requirements.
- Never stop considering your backup strategy. The world changes, and so do your backup requirements.
The rest of this blog will discuss all of these things in more detail.
Make Sure You Can Restore Your Backups
It may seem obvious, but one of the more common issues I see is that backups exist, but the steps to restore them are not known. Or even worse, the backups cannot be restored at all as they are broken.
There are several things to consider: do you know how to restore a backup, do you have access to the backup, and is the backup valid?
The Restore Procedure
If you do not know how to restore your backups, then on the day you need to restore one, a relatively standard operation can turn into a major crisis with your manager breathing down your neck.
So, make sure you practice the steps to restore your backups for all of the scenarios you can think of, for example:
- a plain full restore of the backup
- a point-in-time recovery, that is: restore the backup itself and apply binary logs up to a given point in time
- a partial restore, for example to restore a single table or schema from a full backup
There are more possible scenarios. Take some time to consider which are important for your databases and regularly practice those kinds of restores.
When you practice a restore, document every step in detail and keep the steps in a place where they can easily be found again, for example in a knowledge base. Even better, script the restore; the script documents how the restore should be done, automates the steps, and ensures each restore is done in the same way. If you need to restore a backup in the middle of a crisis, then having all the steps scripted and documented not only helps you remember what to do, but also reduces the chance that something goes wrong.
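As a minimal sketch of what scripting the point-in-time recovery scenario could look like, the function below only builds the shell commands for the two steps (restore the full logical backup, then replay binary logs up to a point in time), so the plan can be reviewed and stored in the knowledge base before anything is executed. The function name `build_pitr_plan`, the file names, and the position are hypothetical; it assumes a logical backup in a single SQL file and that the `mysql` client picks up its credentials from an option file. `--start-position` and `--stop-datetime` are standard `mysqlbinlog` options.

```python
import shlex
from datetime import datetime


def build_pitr_plan(dump_file, binlog_files, stop_at, start_position=None):
    """Return ordered (description, shell command) steps for a point-in-time recovery.

    Nothing is executed here; the caller decides how and when to run the steps.
    """
    if not binlog_files:
        raise ValueError("point-in-time recovery needs at least one binary log")

    stop = stop_at.strftime("%Y-%m-%d %H:%M:%S")

    # Step 1: load the full logical backup through the mysql client.
    restore = "mysql < " + shlex.quote(dump_file)

    # Step 2: replay the binary logs, stopping at the requested time.
    replay_args = ["mysqlbinlog"]
    if start_position is not None:
        # Position in the first binary log that matches the backup (assumed known).
        replay_args.append("--start-position=" + str(start_position))
    replay_args.append("--stop-datetime=" + stop)
    replay_args.extend(binlog_files)
    replay = shlex.join(replay_args) + " | mysql"

    return [
        ("Restore the full logical backup", restore),
        ("Replay binary logs up to " + stop, replay),
    ]


if __name__ == "__main__":
    plan = build_pitr_plan(
        "/backups/full.sql",                 # hypothetical backup file
        ["binlog.000012", "binlog.000013"],  # hypothetical binary logs
        datetime(2024, 3, 31, 9, 15, 0),
        start_position=154,                  # hypothetical position
    )
    for description, command in plan:
        print(description + ":")
        print("  " + command)
```

Keeping the plan as data rather than running it directly makes it easy to print the steps during a practice run and compare them with the documented procedure.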
Related to this discussion is that you should copy the backups to remote storage.
Copy the Backups Off Site
In the previous section, it was discussed how you need to consider all your restore scenarios. That should include the case where the whole server or even whole data center is gone. What do you do in that case?
Other than the need to provision a new MySQL instance somewhere else, you also need to still have access to your backups. This means that a backup that is only stored locally (either on the same host or in the same data center) is of no use for this case.
When you decide where to store your backups, you need to consider your requirements. How long is it acceptable to wait for the backup to download during a recovery, and what kind of disasters (power outage, fire, earthquake, meteor strike, etc.) must the backup be able to survive? You can choose to have backups available on the local host and/or in the local data center, so they are quickly available, for example in case a user deletes the wrong data. Then have another storage location either at the other end of the country or even on another continent to protect against a major disaster.
Of course, even having the best written instructions in the world and copies of the backups on all continents does not help you if the backup is corrupted or broken.
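A minimal sketch of copying a backup to a second location and confirming that the copy is intact is shown below. It assumes the remote storage is reachable as a mounted directory; the function names `sha256_of` and `copy_and_verify` and the paths in the usage note are hypothetical, and object storage or a transfer tool would need a different copy step.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(backup_file, destination_dir):
    """Copy a backup file into destination_dir and verify the copy by checksum.

    Returns the path of the copy; raises OSError if the checksums differ.
    """
    source = Path(backup_file)
    target_dir = Path(destination_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name

    shutil.copy2(source, target)

    if sha256_of(source) != sha256_of(target):
        # Do not leave a copy behind that could later be mistaken for a good backup.
        target.unlink()
        raise OSError("checksum mismatch after copying " + str(source))
    return target
```

For example, `copy_and_verify("/backups/full.sql", "/mnt/offsite/mysql")` would copy the file to the (hypothetical) off-site mount and return the path of the verified copy.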
Verify Your Backups
A backup is only as good as your ability to restore it and bring the restored instance online. This is why it is so important to test your restore procedures as discussed above. Ideally, you should restore every single backup. In the real world that is not always realistic, but it is still important that you practice a restore from time to time.
In practice it may not be possible to restore every single backup in all the restore combinations. So, you will need to add some other checks. The exact checks you should do depend on your backups, but some possibilities are:
- MySQL Enterprise Backup (MEB) has a validate command. This will verify the InnoDB checksums for each page. This checks whether the backup is truncated, corrupted, or damaged.
- MySQL Enterprise Backup can store the result of the backup in the mysql.backup_history table (enabled by default). This includes the overall backup status.
- Verify the backup is created and has a minimum size.
- If you have a logical backup, grep for some strings you know should be in the backup, such as
The validation of your backups is of course only useful if you realize when the validation fails, so you also need to monitor the backups.
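The two simplest checks from the list above, a minimum size and a search for known strings in a logical backup, could be sketched as follows. The function name `check_logical_backup`, the size threshold, and the strings in the usage note are hypothetical placeholders; it assumes the logical backup is an uncompressed text file.

```python
from pathlib import Path


def check_logical_backup(path, min_bytes, expected_strings):
    """Return a list of problems found; an empty list means all checks passed."""
    backup = Path(path)
    if not backup.is_file():
        return ["backup file is missing: " + str(backup)]

    problems = []

    size = backup.stat().st_size
    if size < min_bytes:
        problems.append(f"backup is {size} bytes, expected at least {min_bytes}")

    # Scan line by line so large dumps are not loaded into memory at once.
    missing = set(expected_strings)
    with open(backup, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            missing = {text for text in missing if text not in line}
            if not missing:
                break

    for text in sorted(missing):
        problems.append("expected string not found: " + text)
    return problems
```

A call such as `check_logical_backup("/backups/full.sql", 1024, ["CREATE TABLE `customers`"])` returns an empty list when the file exists, is at least 1 KiB, and contains the string. A non-empty result is what the monitoring should pick up.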
Monitor the Backups
Monitoring is one of the most important tasks for a database administrator. That also includes monitoring the backups. Once you have verification of the backups in place, you need to ensure the validation status is monitored.
How you do this depends on the monitoring solution you use. In MySQL Enterprise Monitor (MEM) there is a built-in backup dashboard with information about your MySQL Enterprise Backup (MEB) backups; this information is based on the data logged by MySQL Enterprise Backup to the mysql.backup_history table and includes the type of backup, the backup status, how long the backup took, how long locks were held, etc. MySQL Enterprise Monitor also creates events when backups fail.
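If your monitoring solution does not have built-in backup checks, the same questions (did the latest backup fail, how old is the last good backup, how long did it take) can be answered from whatever records your backup job keeps. The sketch below is one possible shape for such a check; the `BackupRun` record and the `backup_alerts` function are hypothetical and do not reflect the layout of the mysql.backup_history table.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class BackupRun:
    """One backup attempt as recorded by the backup job (hypothetical record)."""
    finished_at: datetime
    succeeded: bool
    duration: timedelta


def backup_alerts(runs, now, max_age, max_duration):
    """Return alert messages for the recorded runs; an empty list means all is well."""
    if not runs:
        return ["no backup runs recorded"]

    alerts = []
    latest = max(runs, key=lambda run: run.finished_at)
    if not latest.succeeded:
        alerts.append("latest backup failed")

    successful = [run for run in runs if run.succeeded]
    if not successful:
        alerts.append("no successful backup recorded")
    else:
        newest_ok = max(successful, key=lambda run: run.finished_at)
        if now - newest_ok.finished_at > max_age:
            alerts.append("last successful backup is older than " + str(max_age))

    if latest.duration > max_duration:
        alerts.append("latest backup took longer than " + str(max_duration))
    return alerts
```

The thresholds are deliberately parameters: a nightly backup might use a `max_age` of a little over 24 hours, while `max_duration` should follow from how long the backups normally take on your system.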
So far, all the advice has focused on what you should do with the backup after it has been created. What about creating the backups?
When you decide how you want to create the backup, there are many considerations to make. This section will cover some of those.
First of all you need to determine what you need for your backups and what interruption of your production system is allowed when creating the backups. Some of the things to consider are:
- How much data can you afford to lose in case of a catastrophic disaster?
- How long is it acceptable for a restore of the backup to take?
- What data must be included in the backup?
- Which other files (for example binary logs and configuration files) must be included?
- Do you need to be able to do a point-in-time recovery?
- Can the production system be taken offline during the backup or into read-only mode? If so, for how long?
Answering these questions helps you determine the backup method that is optimal for your system. Some of the backup methods available are:
- Logical Backups:
  - mysqlpump: This is available in MySQL 5.7 and later and allows for parallel backups. In most cases other than for MySQL NDB Cluster, it is preferred over mysqldump.
  - mysqldump: This is the classical program to create logical backups in MySQL.
- Native NDB Backups: This is a bit of a hybrid between a logical backup and a raw backup, specialized for the NDBCluster storage engine. It uses a native storage format but can be converted to CSV files.
- Binary (Raw) Backups:
  - MySQL Enterprise Backup (MEB): This is available for customers with a MySQL Enterprise Edition subscription. It is particularly useful for MySQL instances that mainly use the InnoDB storage engine.
  - File system snapshots: For example when using LVM or ZFS, you can tell the file system to create a snapshot, then copy the files.
Whichever method you choose, make sure you understand its limitations. As an example, file system snapshots can work great in many cases, but if MySQL uses more than one file system for the database files, then it may not be possible to create a consistent snapshot (FLUSH TABLES WITH READ LOCK does not stop background writes for InnoDB except for tables that have been explicitly listed).
You also need to take the overhead of the backup method into consideration. At the very least, it will impact MySQL by reading the data. There will also be some locking involved, even if in some cases it may be very limited. In all cases, creating the backup at the quietest time of the day can help reduce the impact. Another option is to use a replica for the backups, but even in that case the overhead must be considered, as the replica needs to be able to keep up or catch up before the next backup.
Now you have considered how to create the backups, validated them, copied them to secure off-site locations, and tested all the possible restore scenarios. So you are all set and can put backups on auto-pilot? Not so fast.
Backups Are a Never Ending Process
The world is not a static place. Neither are your MySQL instances. The configuration changes, the application adds new features, requirements change, the amount of data grows, new MySQL instances are installed on new hardware, on different cloud shapes, or with a different cloud provider, there are updates to MySQL and the backup program, and so on.
This means that the process of working with the backup and restore processes never ends. You need to regularly evaluate whether your backup strategy still works and fulfills all requirements. Look at the bright side: you keep learning and the experience you have gathered since the last evaluation may help you implement an even better backup solution.
Happy World Backup Day.