Automatic Decryption of MySQL Binary Logs Using Python
One of the new features in MySQL 8.0.14 is support for encrypting the binary logs. While encryption makes the data more secure (provided the key is kept secret, of course), it can make tasks such as point-in-time recoveries more difficult. This blog shows how you can use the binlog_decrypt.py Python script to decrypt the binary logs as long as you have the keyring that was used to encrypt them.
Introduction and Background
João Gramacho wrote a nice blog about how you can use standard Linux programs to decrypt the binary logs. This inspired me to implement the same steps in Python, which should make the script easier to use. Specifically, my aim was that the Python script should have the following features:
- It should work cross platform. I have tested the script on Oracle Linux 7 and Microsoft Windows 7.
- The key used to encrypt binary logs can be rotated, so different binary logs use different keys. The script should automatically determine which key a binary log uses and extract it from the keyring. For simplicity, I only implemented support for the keyring_file plugin.
- The script should be able to handle multiple binary logs and gracefully handle unencrypted binary logs.
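To make the second point concrete: the key ID is stored near the start of each encrypted binary log. The following is a minimal sketch of reading it, assuming the header layout that binlog_decrypt.py expects (four magic bytes, one version byte, then a field of type 1 with a one-byte length followed by the key ID). The function name read_keyring_key_id and the key ID in the synthetic header are made up for the illustration.

```python
import io
import struct

MAGIC_ENCRYPTED = bytes.fromhex('fd62696e')  # 0xFD followed by b'bin'


def read_keyring_key_id(stream):
    """Return the keyring key ID from an encrypted binary log header,
    or None if the stream does not start with the encrypted magic."""
    if stream.read(4) != MAGIC_ENCRYPTED:
        return None
    # One byte each: header version, field type, length of the key ID.
    version, field_type, id_length = struct.unpack('<BBB', stream.read(3))
    if version != 1 or field_type != 1:
        raise ValueError('Unexpected header version or field type')
    return stream.read(id_length).decode('utf-8')


# Synthetic header for illustration only (the key ID is hypothetical).
example_key_id = 'MySQLReplicationKey_example_1'
header = (MAGIC_ENCRYPTED
          + struct.pack('<BBB', 1, 1, len(example_key_id))
          + example_key_id.encode('utf-8'))
print(read_keyring_key_id(io.BytesIO(header)))
```

With a real file, you would pass the file object opened in 'rb' mode instead of the io.BytesIO wrapper.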
Introducing binlog_decrypt.py
As it turned out, once I understood how the keyring file works, the task was pretty straightforward, using João's blog to get the required steps. I have maintained the overall steps from that blog. The result can be downloaded from the following link:
Some important comments about the script are:
- The script only works with Python 3 (tested with Python 3.6).
- All work is done in-memory. While this gives good performance (a 1.1GiB binary log on my laptop decrypts in around three seconds when the encrypted log is in the operating system I/O cache), it does mean that the memory usage is quite high. The 1.1GiB file resulted in a 3.2GiB peak memory usage.
- Other than performing checks of the binary log content, I have added limited error checking. This is to keep focus on the actual work required to decrypt the binary log.
- The cryptography module is used for the decryption work. The easiest way to install the module is to use pip (see below).
- The keyring must be from the keyring_file plugin and use format version 2.0 (the format current as of MySQL 8.0.14). If you use a different keyring plugin, you can use the keyring migration feature to create a copy of the keyring using keyring_file. (But please note that keyring_file is not a secure keyring format.)
Installing Prerequisites
If you are using Oracle Linux 7, Red Hat Enterprise Linux (RHEL) 7, or CentOS 7, the included Python version is 2.7. This will not work with the binlog_decrypt.py script. You can install Python 3.6 in addition to Python 2.7 from the EPEL repository using the following steps (assuming you have already added the EPEL repository):
shell$ yum install python36
shell$ python3.6 -m ensurepip
shell$ python3.6 -m pip install --upgrade pip
This also installs and upgrades the pip command, which can be invoked using python3.6 -m pip.
On all platforms, you can install the cryptography module using pip, for example (from Microsoft Windows):
PS:> python -m pip install cryptography
Collecting cryptography
Downloading https://files.pythonhosted.org/packages/65/d6/48e8194ab0d0d643acb89042a853d029c7cd2daaaba52cf4ff83ff0060a9/cryptography-2.5-cp36-cp36m-win_amd64.whl (1.5MB)
100% |████████████████████████████████| 1.5MB 4.7MB/s
Collecting asn1crypto>=0.21.0 (from cryptography)
Downloading https://files.pythonhosted.org/packages/ea/cd/35485615f45f30a510576f1a56d1e0a7ad7bd8ab5ed7cdc600ef7cd06222/asn1crypto-0.24.0-py2.py3-none-any.whl (101kB)
100% |████████████████████████████████| 102kB 5.8MB/s
Requirement already satisfied: six>=1.4.1 in c:\users\jesper\appdata\local\programs\python\python36\lib\site-packages (from cryptography) (1.11.0)
Collecting cffi!=1.11.3,>=1.8 (from cryptography)
Downloading https://files.pythonhosted.org/packages/2f/85/a9184548ad4261916d08a50d9e272bf6f93c54f3735878fbfc9335efd94b/cffi-1.11.5-cp36-cp36m-win_amd64.whl (166kB)
100% |████████████████████████████████| 174kB 5.5MB/s
Collecting pycparser (from cffi!=1.11.3,>=1.8->cryptography)
Downloading https://files.pythonhosted.org/packages/68/9e/49196946aee219aead1290e00d1e7fdeab8567783e83e1b9ab5585e6206a/pycparser-2.19.tar.gz (158kB)
100% |████████████████████████████████| 163kB 5.2MB/s
Installing collected packages: asn1crypto, pycparser, cffi, cryptography
Running setup.py install for pycparser ... done
If you use Oracle Linux 7, RHEL 7, or CentOS 7, invoke pip using python3.6 -m pip instead.
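If you want to verify the prerequisites before running the script, a small check along the following lines can help. This is only a sketch: the function name check_prerequisites is made up, and the minimum version of 3.6 is an assumption based on the version the script was tested with.

```python
import importlib.util
import sys


def check_prerequisites(min_version=(3, 6), module='cryptography'):
    """Return a list of human-readable problems; empty if all is well."""
    problems = []
    # Assumption: 3.6 is the tested version, so it is used as the minimum.
    if sys.version_info < min_version:
        problems.append(
            'Python {0}.{1} or later is required'.format(*min_version))
    # find_spec() returns None when a top-level module cannot be found.
    if importlib.util.find_spec(module) is None:
        problems.append("The '{0}' module is not installed".format(module))
    return problems


for problem in check_prerequisites():
    print(problem, file=sys.stderr)
```

Run it with the same interpreter you intend to use for the script, for example python3.6 on Oracle Linux 7.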
Using binlog_decrypt.py
You can now test the script. Assume you have two binary logs, of which the first is not encrypted and the second is encrypted:
mysql> SHOW BINARY LOGS;
+---------------+-----------+-----------+
| Log_name      | File_size | Encrypted |
+---------------+-----------+-----------+
| binlog.000001 |    722755 | No        |
| binlog.000002 |    723022 | Yes       |
+---------------+-----------+-----------+
2 rows in set (0.01 sec)
You can now use the script as:
PS:> python binlog_decrypt.py --keyring_file_data="C:\ProgramData\MySQL\MySQL Server 8.0\keyring" "C:\ProgramData\MySQL\MySQL Server 8.0\data\binlog.000001" "C:\ProgramData\MySQL\MySQL Server 8.0\data\binlog.000002"
binlog.000001: Binary log is not encrypted. Skipping.
binlog.000002: Keyring key ID for is 'MySQLReplicationKey_59e3f95b-e0d6-11e8-94e8-ace2d35785be_1'
binlog.000002: Successfully decrypted as 'C:\tmp\plain-binlog.000002'
Notice how binlog.000001 is skipped because it is detected that the binary log is not encrypted.
This is just an example. Invoke the script with the --help argument to get a description of all of the options.
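The skipping of unencrypted binary logs comes down to the first four bytes of each file. As a minimal sketch (the function names classify_binlog and classify_binlog_file are made up for the illustration; the magic values are the ones the script compares against):

```python
MAGIC_PLAIN = bytes.fromhex('fe62696e')      # 0xFE followed by b'bin'
MAGIC_ENCRYPTED = bytes.fromhex('fd62696e')  # 0xFD followed by b'bin'


def classify_binlog(first_bytes):
    """Classify a binary log from its first four bytes."""
    magic = bytes(first_bytes[:4])
    if magic == MAGIC_PLAIN:
        return 'plain'
    if magic == MAGIC_ENCRYPTED:
        return 'encrypted'
    return 'unknown'


def classify_binlog_file(path):
    """Read only the magic bytes from the file at path and classify it."""
    with open(path, 'rb') as binlog_fs:
        return classify_binlog(binlog_fs.read(4))


print(classify_binlog(b'\xfebin'))
print(classify_binlog(b'\xfdbin'))
```

Only four bytes are read, so a check like this is cheap even for very large binary logs.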
The Full Source Code
For reference, here is the full source code for the script:
import sys
import os
import struct
import collections
import hashlib
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
from cryptography.hazmat.backends import default_backend


def key_and_iv_from_password(password):
    # Based on
    # https://stackoverflow.com/questions/13907841/implement-openssl-aes-encryption-in-python
    key_length = 32
    iv_length = 16
    required_length = key_length + iv_length

    # Keep hashing until enough bytes are available for the key and IV.
    # A single SHA-512 digest is 64 bytes, so the loop is normally skipped.
    key_iv = hashlib.sha512(password).digest()
    tmp = [key_iv]
    while len(key_iv) < required_length:
        tmp.append(hashlib.sha512(tmp[-1] + password).digest())
        key_iv += tmp[-1]

    key = key_iv[:key_length]
    iv = key_iv[key_length:required_length]
    return key, iv


class Key(
        collections.namedtuple(
            'Key', [
                'key_id',
                'key_type',
                'user_id',
                'key_data',
            ]
        )):
    __slots__ = ()


class Keyring(object):
    _keys = []
    _keyring_file_version = None
    _xor_str = '*305=Ljt0*!@$Hnm(*-9-w;:'.encode('utf-8')

    def __init__(self, keyring_filepath):
        self.read_keyring(keyring_filepath)

    def _read_key(self, data):
        # Each key record starts with five little-endian 64-bit lengths
        overall_length = struct.unpack('<Q', data[0:8])[0]
        key_id_length = struct.unpack('<Q', data[8:16])[0]
        key_type_length = struct.unpack('<Q', data[16:24])[0]
        user_id_length = struct.unpack('<Q', data[24:32])[0]
        key_length = struct.unpack('<Q', data[32:40])[0]

        key_id_start = 40
        key_type_start = key_id_start + key_id_length
        user_id_start = key_type_start + key_type_length
        key_start = user_id_start + user_id_length
        key_end = key_start + key_length

        key_id = data[key_id_start:key_type_start].decode('utf-8')
        key_type = data[key_type_start:user_id_start].decode('utf-8')
        # The User ID may be blank in which case the length is zero
        user_id = (data[user_id_start:key_start].decode('utf-8')
                   if user_id_length > 0 else None)
        key_raw = data[key_start:key_end]
        # The key data is obfuscated by XOR'ing it with a fixed string
        xor_str_len = len(self._xor_str)
        key_data = bytes([key_raw[i] ^ self._xor_str[i % xor_str_len]
                          for i in range(len(key_raw))])

        return Key(key_id, key_type, user_id, key_data)

    def read_keyring(self, filepath):
        keyring_data = bytearray()
        # The with statement closes the file, so no explicit close is needed
        with open(filepath, 'rb') as keyring_fs:
            chunk = keyring_fs.read()
            while len(chunk) > 0:
                keyring_data.extend(chunk)
                chunk = keyring_fs.read()

        # Verify the start of the file is "Keyring file version:"
        header = keyring_data[0:21]
        if header.decode('utf-8') != 'Keyring file version:':
            raise ValueError('Invalid header in the keyring file: {0}'
                             .format(header.hex()))

        # Get the keyring version - currently only 2.0 is supported
        version = keyring_data[21:24].decode('utf-8')
        if version != '2.0':
            raise ValueError('Unsupported keyring version: {0}'
                             .format(version))
        self._keyring_file_version = version

        keyring_length = len(keyring_data)
        offset = 24
        keys = []
        while (offset < keyring_length
               and keyring_data[offset:offset+3] != b'EOF'):
            key_length = struct.unpack(
                '<Q', keyring_data[offset:offset+8])[0]
            key_data = keyring_data[offset:offset+key_length]
            key = self._read_key(key_data)
            keys.append(key)
            offset += key_length

        self._keys = keys

    def get_key(self, key_id, user_id):
        for key in self._keys:
            if key.key_id == key_id and key.user_id == user_id:
                return key
        return None


def decrypt_binlog(binlog, keyring, out_dir, prefix):
    '''Decrypts a binary log and outputs it to out_dir with the prefix
    prepended. The arguments are:

       * binlog - the path to the encrypted binary log
       * keyring - a Keyring object
       * out_dir - the output directory
       * prefix - prefix to add to the binary log basename.
    '''
    magic_encrypted = 'fd62696e'
    magic_decrypted = 'fe62696e'
    binlog_basename = os.path.basename(binlog)
    decrypt_binlog_path = os.path.join(
        out_dir, '{0}{1}'.format(prefix, binlog_basename))
    if os.path.exists(decrypt_binlog_path):
        print("{0}: Decrypted binary log path, '{1}' already exists. Skipping"
              .format(binlog_basename, decrypt_binlog_path), file=sys.stderr)
        return False

    with open(binlog, 'rb') as binlog_fs:
        # Verify the magic bytes are correct
        magic = binlog_fs.read(4)
        if magic.hex() == magic_decrypted:
            print('{0}: Binary log is not encrypted. Skipping.'
                  .format(binlog_basename), file=sys.stderr)
            return False
        elif magic.hex() != magic_encrypted:
            print("{0}: Found invalid magic '0x{1}' for encrypted binlog file."
                  .format(binlog_basename, magic.hex()), file=sys.stderr)
            return False

        # Get the encrypted version (must currently be 1)
        version = struct.unpack('<B', binlog_fs.read(1))[0]
        if version != 1:
            print("{0}: Unsupported binary log encrypted version '{1}'"
                  .format(binlog_basename, version), file=sys.stderr)
            return False

        # First header field is a TLV: the keyring key ID
        field_type = struct.unpack('<B', binlog_fs.read(1))[0]
        if field_type != 1:
            print('{0}: Invalid field type ({1}). Keyring key ID (1) was '
                  'expected.'.format(binlog_basename, field_type),
                  file=sys.stderr)
            return False
        keyring_id_len = struct.unpack('<B', binlog_fs.read(1))[0]
        keyring_id = binlog_fs.read(keyring_id_len).decode('utf-8')
        print("{0}: Keyring key ID for is '{1}'"
              .format(binlog_basename, keyring_id), file=sys.stderr)

        # Get the key from the keyring file
        key = keyring.get_key(keyring_id, None)
        if key is None:
            print("{0}: Key '{1}' was not found in the keyring. Skipping."
                  .format(binlog_basename, keyring_id), file=sys.stderr)
            return False

        # Second header is a TV: the encrypted file password
        field_type = struct.unpack('<B', binlog_fs.read(1))[0]
        if field_type != 2:
            print('{0}: Invalid field type ({1}). Encrypted file password (2) '
                  'was expected.'.format(binlog_basename, field_type),
                  file=sys.stderr)
            return False
        encrypted_password = binlog_fs.read(32)

        # Third header field is a TV: the IV to decrypt the file password
        field_type = struct.unpack('<B', binlog_fs.read(1))[0]
        if field_type != 3:
            print('{0}: Invalid field type ({1}). IV to decrypt the file '
                  'password (3) was expected.'
                  .format(binlog_basename, field_type), file=sys.stderr)
            return False
        iv = binlog_fs.read(16)

        backend = default_backend()
        cipher = Cipher(algorithms.AES(key.key_data), modes.CBC(iv),
                        backend=backend)
        decryptor = cipher.decryptor()
        password = decryptor.update(encrypted_password) + decryptor.finalize()

        # Generate the file key and IV
        key, iv = key_and_iv_from_password(password)
        nonce = iv[0:8] + bytes(8)

        # Decrypt the file data (the binary log content)
        # The encrypted binary log headers are 512 bytes, so skip those
        binlog_fs.seek(512, os.SEEK_SET)
        binlog_encrypted_data = binlog_fs.read()

    cipher = Cipher(algorithms.AES(key), modes.CTR(nonce), backend=backend)
    decryptor = cipher.decryptor()
    binlog_decrypted_data = decryptor.update(binlog_encrypted_data)
    binlog_decrypted_data += decryptor.finalize()
    binlog_encrypted_data = None

    # Check decrypted binary log magic
    magic = binlog_decrypted_data[0:4]
    if magic.hex() != magic_decrypted:
        print("{0}: Found invalid magic '0x{1}' for decrypted binlog file."
              .format(binlog_basename, magic.hex()), file=sys.stderr)
        return False

    # Write the decrypted binary log to disk
    with open(decrypt_binlog_path, 'wb') as new_fs:
        new_fs.write(binlog_decrypted_data)

    print("{0}: Successfully decrypted as '{1}'"
          .format(binlog_basename, decrypt_binlog_path))
    return True


def decrypt_binlogs(args):
    '''Outer routine for decrypting one or more binary logs. The
    argument args is a namespace object (typically from the argparse
    parser) with the following members:

       * args.binlogs - a list or tuple of the binary logs to decrypt
       * args.keyring_file_data - the path to the file with the
             keyring data for the keyring_file plugin.
       * args.dir - the output directory for the decrypted binary logs
       * args.prefix - the prefix to prepend to the basename of the
             encrypted binary log filenames. This allows you to output
             the decrypted binary logs to the same directory as the
             encrypted binary logs without overwriting the original files.
    '''
    keyring = Keyring(args.keyring_file_data)
    for binlog in args.binlogs:
        decrypt_binlog(binlog, keyring, args.dir, args.prefix)


def main(argv):
    import argparse
    parser = argparse.ArgumentParser(
        prog='binlog_decrypt.py',
        description='Decrypt one or more binary log files from MySQL Server '
                    + '8.0.14+ created with binlog_encryption = ON. The '
                    + 'decrypted binary log files have the prefix given with '
                    + '--prefix prepended to their file names. '
                    + 'If an output file already exists, the file will be '
                    + 'skipped.',
        epilog='All work is performed in-memory. For this reason, the '
               + 'expected peak memory usage is around three times the '
               + 'size of the largest binary log. As max_binlog_size can '
               + 'at most be 1G, for instances exclusively executing small '
               + 'transactions, the memory usage can thus be up to around '
               + '3.5G. For instances executing large transactions, the '
               + 'binary log files can be much larger than 1G and thus the '
               + 'memory usage equally larger.')

    parser.add_argument(
        '-d', '--dir', default=os.getcwd(), dest='dir',
        help='The destination directory for the decrypted binary log files. '
             + 'The default is to use the current directory.')

    parser.add_argument(
        '-p', '--prefix', default='plain-', dest='prefix',
        help='The prefix to prepend to the basename of the binary log file. '
             + 'The default is plain-.')

    parser.add_argument(
        '-k', '--keyring_file_data', default=None, dest='keyring_file_data',
        help='The path to the keyring file. The same as keyring_file_data in '
             + 'the MySQL configuration. This option is mandatory.')

    parser.add_argument(
        'binlogs', nargs=argparse.REMAINDER,
        help='The binary log files to decrypt.')

    # Parse the arguments passed to main() rather than re-reading sys.argv
    args = parser.parse_args(argv)
    if not args.binlogs:
        print('ERROR: At least one binary log file must be specified.\n',
              file=sys.stderr)
        parser.print_help(file=sys.stderr)
        sys.exit(1)
    if not args.keyring_file_data:
        print('ERROR: The path to the keyring file must be specified.\n',
              file=sys.stderr)
        parser.print_help(file=sys.stderr)
        sys.exit(1)

    decrypt_binlogs(args)


if __name__ == '__main__':
    main(sys.argv[1:])
The start of the script is the handling of the keyring. Then follows the code for decrypting the binary logs, which consists of three functions (from the bottom up):
- main: The function for handling the command line arguments.
- decrypt_binlogs: Initializes the keyring and loops over the binary logs.
- decrypt_binlog: Decrypts a single binary log.
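To illustrate the keyring handling in isolation, here is a minimal sketch that builds a synthetic key record and parses it again. It assumes the record layout that the script reads: five little-endian 64-bit lengths, followed by the key ID, key type, user ID, and the XOR-obfuscated key data. The function names and the example values are made up; build_record is not part of the script and exists only to produce test data.

```python
import struct

# Obfuscation string copied from the script's Keyring class.
XOR_STR = b'*305=Ljt0*!@$Hnm(*-9-w;:'


def xor_obfuscate(data):
    """XOR data with the repeating obfuscation string (its own inverse)."""
    return bytes(b ^ XOR_STR[i % len(XOR_STR)] for i, b in enumerate(data))


def build_record(key_id, key_type, user_id, key_data):
    """Build a synthetic record (hypothetical helper, for test data only)."""
    parts = [key_id.encode('utf-8'), key_type.encode('utf-8'),
             user_id.encode('utf-8'), xor_obfuscate(key_data)]
    total = 40 + sum(len(part) for part in parts)
    header = struct.pack('<5Q', total, *(len(part) for part in parts))
    return header + b''.join(parts)


def parse_record(record):
    """Parse one record into (key_id, key_type, user_id, key_data)."""
    lengths = struct.unpack('<5Q', record[:40])[1:]
    offset = 40
    fields = []
    for length in lengths:
        fields.append(record[offset:offset + length])
        offset += length
    key_id, key_type, user_id, key_raw = fields
    return (key_id.decode('utf-8'), key_type.decode('utf-8'),
            user_id.decode('utf-8') or None, xor_obfuscate(key_raw))


record = build_record('example_key_1', 'AES', '', bytes(range(32)))
print(parse_record(record))
```

Because XOR is its own inverse, the same function both obfuscates and recovers the key data, which is also why keyring_file offers no real protection to anyone who can read the file.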
For a closer discussion of the individual steps to decrypt the binary log, I recommend reading João Gramacho's blog How to manually decrypt an encrypted binary log file.
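One of those steps, turning the decrypted file password into the AES key and the CTR nonce, can be looked at on its own. The sketch below is a simplified equivalent of what the script does, assuming a 32-byte key and a 16-byte IV: since a SHA-512 digest is 64 bytes, a single hash of the password already covers the 48 bytes needed. The function name derive_key_and_nonce and the all-zero password are placeholders for the illustration.

```python
import hashlib


def derive_key_and_nonce(password):
    """Derive the file key and CTR nonce from the file password (bytes)."""
    digest = hashlib.sha512(password).digest()
    key = digest[:32]          # first 32 bytes: the AES-256 key
    iv = digest[32:48]         # next 16 bytes: the IV
    nonce = iv[:8] + bytes(8)  # first half of the IV, counter part zeroed
    return key, nonce


# Placeholder password; the real one comes from decrypting the header field.
key, nonce = derive_key_and_nonce(bytes(32))
print(len(key), len(nonce))
```

The key and nonce are then what you pass to AES in CTR mode to decrypt everything after the 512-byte header.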