Automatic Decryption of MySQL Binary Logs Using Python

One of the new features in MySQL 8.0.14 is support for encrypting the binary logs. While encryption makes the data more secure (provided the key is kept secret, of course), it can make tasks such as point-in-time recovery more difficult. This blog shows how you can use the binlog_decrypt.py Python script to decrypt the binary logs as long as you have the keyring that was used to encrypt them.

Decrypting a binary log with binlog_decrypt.py

Introduction and Background

João Gramacho wrote a nice blog on how you can use standard Linux programs to decrypt the binary logs. This inspired me to consider implementing the same in Python, which should make the script easier to use. Specifically, my aim was that the Python script should have the following features:

  • It should work cross platform. I have tested the script on Oracle Linux 7 and Microsoft Windows 7.
  • The key used to encrypt binary logs can be rotated, so different binary logs use different keys. The script should automatically determine which key a binary log uses and extract it from the keyring. For simplicity, I only implemented support for the keyring_file plugin.
  • The script should be able to handle multiple binary logs and gracefully handle unencrypted binary logs.
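
Telling encrypted and unencrypted binary logs apart comes down to the first four bytes of the file. The following is a minimal sketch using the magic values from the script's source further down. The function name classify_binlog is made up for this illustration and is not part of binlog_decrypt.py.

```python
import os
import tempfile

# Magic values as used in binlog_decrypt.py
MAGIC_ENCRYPTED = bytes.fromhex('fd62696e')  # 0xFD followed by b'bin'
MAGIC_PLAIN = bytes.fromhex('fe62696e')      # 0xFE followed by b'bin'


def classify_binlog(path):
    """Return 'encrypted', 'plain', or 'unknown' based on the first 4 bytes.

    Hypothetical helper for illustration only.
    """
    with open(path, 'rb') as binlog_fs:
        magic = binlog_fs.read(4)
    if magic == MAGIC_ENCRYPTED:
        return 'encrypted'
    if magic == MAGIC_PLAIN:
        return 'plain'
    return 'unknown'


if __name__ == '__main__':
    # Demonstrate with a throwaway file rather than a real binary log
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(MAGIC_ENCRYPTED + b'\x01')
    print(classify_binlog(tmp.name))
    os.unlink(tmp.name)
```

Only the first four bytes are read, so the check is cheap even for very large binary logs.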

Introducing binlog_decrypt.py

As it turned out, once I understood how the keyring file works, the task was pretty straightforward using João’s blog to get the required steps. I have maintained the overall steps from that blog. The result can be downloaded from the following link:

Advice

I would like to say a big thank you to João Gramacho and Ivan Švaljek for tips on how to work with the keyring_file data format.

Some important comments about the script are:

  • The script only works with Python 3 (tested with Python 3.6).
  • All work is done in-memory. While this gives good performance (a 1.1GiB binary log on my laptop decrypts in around three seconds when the encrypted log is in the operating system I/O cache), it does mean that the memory usage is quite high. The 1.1GiB file resulted in a 3.2GiB peak memory usage.
  • Other than performing checks of the binary log content, I have added limited error checking. This is to keep focus on the actual work required to decrypt the binary log.
  • The cryptography module is used for the decryption work. The easiest way to install the module is to use pip (see below).
  • The keyring must be from the keyring_file plugin and using format version 2.0 (the format current as of MySQL 8.0.14). If you use a different keyring plugin, you can use the keyring migration feature to create a copy of the keyring using keyring_file. (But, please note that keyring_file is not a secure keyring format.)
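
Whether a keyring file meets the format requirement can be checked from its first 24 bytes. This is a rough sketch mirroring the check in the script's read_keyring method. The function name check_keyring_header is an assumption made for this example.

```python
KEYRING_HEADER = b'Keyring file version:'  # 21 bytes, as checked by the script
SUPPORTED_VERSION = b'2.0'


def check_keyring_header(keyring_data):
    """Return the keyring version as a string, or raise ValueError.

    Hypothetical helper; keyring_data is the raw content of the keyring file.
    """
    header_len = len(KEYRING_HEADER)
    if bytes(keyring_data[:header_len]) != KEYRING_HEADER:
        raise ValueError('Invalid header in the keyring file')
    version = bytes(keyring_data[header_len:header_len + len(SUPPORTED_VERSION)])
    if version != SUPPORTED_VERSION:
        raise ValueError('Unsupported keyring version: {0!r}'.format(version))
    return version.decode('utf-8')
```

You can call it with the result of open(path, 'rb').read(), where path points to the file configured in keyring_file_data.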

Warning

The script uses the low-level methods of the cryptography module (the hazmat submodule, named so for a good reason). This is OK in this case as it is pure decryption. However, for encryption please do not use the hazmat methods unless you really know what you are doing.

Installing Prerequisites

If you are using Oracle Linux 7, Red Hat Enterprise Linux (RHEL) 7, or CentOS 7, the included Python version is 2.7. This will not work with the binlog_decrypt.py script. You can install Python 3.6 in addition to Python 2.7 from the EPEL repository using the following steps (assuming you have already added the EPEL repository):

shell$ yum install python36
shell$ python3.6 -m ensurepip
shell$ python3.6 -m pip install --upgrade pip

This also installs and upgrades the pip command which can be invoked using python3.6 -m pip.

On all platforms, you can install the cryptography module using pip, for example (from Microsoft Windows):

PS:> python -m pip install cryptography
Collecting cryptography
  Downloading https://files.pythonhosted.org/packages/65/d6/48e8194ab0d0d643acb89042a853d029c7cd2daaaba52cf4ff83ff0060a9/cryptography-2.5-cp36-cp36m-win_amd64.whl (1.5MB)
    100% |████████████████████████████████| 1.5MB 4.7MB/s
Collecting asn1crypto>=0.21.0 (from cryptography)
  Downloading https://files.pythonhosted.org/packages/ea/cd/35485615f45f30a510576f1a56d1e0a7ad7bd8ab5ed7cdc600ef7cd06222/asn1crypto-0.24.0-py2.py3-none-any.whl (101kB)
    100% |████████████████████████████████| 102kB 5.8MB/s
Requirement already satisfied: six>=1.4.1 in c:\users\jesper\appdata\local\programs\python\python36\lib\site-packages (from cryptography) (1.11.0)
Collecting cffi!=1.11.3,>=1.8 (from cryptography)
  Downloading https://files.pythonhosted.org/packages/2f/85/a9184548ad4261916d08a50d9e272bf6f93c54f3735878fbfc9335efd94b/cffi-1.11.5-cp36-cp36m-win_amd64.whl (166kB)
    100% |████████████████████████████████| 174kB 5.5MB/s
Collecting pycparser (from cffi!=1.11.3,>=1.8->cryptography)
  Downloading https://files.pythonhosted.org/packages/68/9e/49196946aee219aead1290e00d1e7fdeab8567783e83e1b9ab5585e6206a/pycparser-2.19.tar.gz (158kB)
    100% |████████████████████████████████| 163kB 5.2MB/s
Installing collected packages: asn1crypto, pycparser, cffi, cryptography
  Running setup.py install for pycparser ... done

If you use Oracle Linux 7, RHEL 7, or CentOS 7, invoke pip using python3.6 -m pip instead.
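
To confirm that the prerequisites are in place before running the script, a quick check along these lines may help. It is only a sketch: the function name check_prerequisites is made up, and the minimum version of 3.6 is an assumption based on the version the script was tested with.

```python
import importlib.util
import sys


def check_prerequisites(min_version=(3, 6), module='cryptography'):
    """Return a list of problems found; the list is empty if all is well.

    Hypothetical helper; min_version reflects the tested version, not a
    documented minimum.
    """
    problems = []
    if sys.version_info < min_version:
        problems.append(
            'Python {0}.{1} or later is required'.format(*min_version))
    if importlib.util.find_spec(module) is None:
        problems.append("The '{0}' module is not installed".format(module))
    return problems


if __name__ == '__main__':
    for problem in check_prerequisites():
        print(problem, file=sys.stderr)
```

Run it with the same interpreter you intend to use for binlog_decrypt.py (for example python3.6 on Oracle Linux 7), as each Python installation has its own set of modules.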

Using binlog_decrypt.py

You can now test the script. Assuming you have two binary logs of which the first is not encrypted and the second is encrypted:

mysql> SHOW BINARY LOGS;
+---------------+-----------+-----------+
| Log_name      | File_size | Encrypted |
+---------------+-----------+-----------+
| binlog.000001 |    722755 | No        |
| binlog.000002 |    723022 | Yes       |
+---------------+-----------+-----------+
2 rows in set (0.01 sec)

You can now use the script as:

PS:> python binlog_decrypt.py --keyring_file_data="C:\ProgramData\MySQL\MySQL Server 8.0\keyring" "C:\ProgramData\MySQL\MySQL Server 8.0\data\binlog.000001" "C:\ProgramData\MySQL\MySQL Server 8.0\data\binlog.000002"
binlog.000001: Binary log is not encrypted. Skipping.
binlog.000002: Keyring key ID for is 'MySQLReplicationKey_59e3f95b-e0d6-11e8-94e8-ace2d35785be_1'
binlog.000002: Successfully decrypted as 'C:\tmp\plain-binlog.000002'

Notice how binlog.000001 is skipped because it is detected that the binary log is not encrypted.

This is just an example. Invoke the script with the --help argument to get a description of all of the options.

The Full Source Code

For reference, here is the full source code for the script:

import sys
import os
import struct
import collections
import hashlib
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
from cryptography.hazmat.backends import default_backend

def key_and_iv_from_password(password):
    # Based on
    # https://stackoverflow.com/questions/13907841/implement-openssl-aes-encryption-in-python

    key_length = 32
    iv_length = 16
    required_length = key_length + iv_length

    key_iv = hashlib.sha512(password).digest()
    tmp = [key_iv]
    while len(key_iv) < required_length:
        tmp.append(hashlib.sha512(tmp[-1] + password).digest())
        key_iv += tmp[-1]

    key = key_iv[:key_length]
    iv = key_iv[key_length:required_length]

    return key, iv


class Key(
    collections.namedtuple(
        'Key', [
            'key_id',
            'key_type',
            'user_id',
            'key_data',
        ]
    )):
    __slots__ = ()


class Keyring(object):
    _keys = []
    _keyring_file_version = None
    _xor_str = '*305=Ljt0*!@$Hnm(*-9-w;:'.encode('utf-8')

    def __init__(self, keyring_filepath):
        self.read_keyring(keyring_filepath)

    def _read_key(self, data):
        overall_length = struct.unpack('<Q', data[0:8])[0]
        key_id_length = struct.unpack('<Q', data[8:16])[0]
        key_type_length = struct.unpack('<Q', data[16:24])[0]
        user_id_length = struct.unpack('<Q', data[24:32])[0]
        key_length = struct.unpack('<Q', data[32:40])[0]

        key_id_start = 40
        key_type_start = key_id_start + key_id_length
        user_id_start = key_type_start + key_type_length
        key_start = user_id_start + user_id_length
        key_end = key_start + key_length

        key_id = data[key_id_start:key_type_start].decode('utf-8')
        key_type = data[key_type_start:user_id_start].decode('utf-8')
        # The User ID may be blank in which case the length is zero
        user_id = data[user_id_start:key_start].decode('utf-8') if user_id_length > 0 else None
        key_raw = data[key_start:key_end]
        xor_str_len = len(self._xor_str)
        key_data = bytes([key_raw[i] ^ self._xor_str[i%xor_str_len]
                          for i in range(len(key_raw))])

        return Key(key_id, key_type, user_id, key_data)

    def read_keyring(self, filepath):
        # Read the whole keyring file into memory
        with open(filepath, 'rb') as keyring_fs:
            keyring_data = keyring_fs.read()

        # Verify the start of the file is "Keyring file version:"
        header = keyring_data[0:21]
        if header.decode('utf-8') != 'Keyring file version:':
            raise ValueError('Invalid header in the keyring file: {0}'
                             .format(header.hex()))

        # Get the keyring version - currently only 2.0 is supported
        version = keyring_data[21:24].decode('utf-8')
        if version != '2.0':
            raise ValueError('Unsupported keyring version: {0}'
                             .format(version))

        self._keyring_file_version = version
        keyring_length = len(keyring_data)
        offset = 24
        keys = []
        while offset < keyring_length and keyring_data[offset:offset+3] != b'EOF':
            key_length = struct.unpack('<Q', keyring_data[offset:offset+8])[0]
            key_data = keyring_data[offset:offset+key_length]
            key = self._read_key(key_data)
            keys.append(key)
            offset += key_length

        self._keys = keys

    def get_key(self, key_id, user_id):
        for key in self._keys:
            if key.key_id == key_id and key.user_id == user_id:
                return key

        return None


def decrypt_binlog(binlog, keyring, out_dir, prefix):
    '''Decrypts a binary log and outputs it to out_dir with the prefix
    prepended. The arguments are:

        * binlog - the path to the encrypted binary log
        * keyring - a Keyring object
        * out_dir - the output directory
        * prefix - prefix to add to the binary log basename.
    '''
    magic_encrypted = 'fd62696e'
    magic_decrypted = 'fe62696e'

    binlog_basename = os.path.basename(binlog)
    decrypt_binlog_path = os.path.join(
        out_dir, '{0}{1}'.format(prefix, binlog_basename))
    if os.path.exists(decrypt_binlog_path):
        print("{0}: Decrypted binary log path, '{1}' already exists. Skipping"
              .format(binlog_basename, decrypt_binlog_path), file=sys.stderr)
        return False

    with open(binlog, 'rb') as binlog_fs:
        # Verify the magic bytes are correct
        magic = binlog_fs.read(4)
        if magic.hex() == magic_decrypted:
            print('{0}: Binary log is not encrypted. Skipping.'
                  .format(binlog_basename), file=sys.stderr)
            return False
        elif magic.hex() != magic_encrypted:
            print("{0}: Found invalid magic '0x{1}' for encrypted binlog file."
                  .format(binlog_basename, magic.hex()), file=sys.stderr)
            return False

        # Get the encrypted version (must currently be 1)
        version = struct.unpack('<B', binlog_fs.read(1))[0]
        if version != 1:
            print("{0}: Unsupported binary log encrypted version '{1}'"
                  .format(binlog_basename, version), file=sys.stderr)
            return False

        # First header field is a TLV: the keyring key ID
        field_type = struct.unpack('<B', binlog_fs.read(1))[0]
        if field_type != 1:
            # Adjacent literals are joined before .format() is applied
            print('{0}: Invalid field type ({1}). Keyring key ID (1) was '
                  'expected.'.format(binlog_basename, field_type),
                  file=sys.stderr)
            return False

        keyring_id_len = struct.unpack('<B', binlog_fs.read(1))[0]
        keyring_id = binlog_fs.read(keyring_id_len).decode('utf-8')
        print("{0}: Keyring key ID for is '{1}'"
              .format(binlog_basename, keyring_id), file=sys.stderr)

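        # Get the key from the keyring file
        key = keyring.get_key(keyring_id, None)
        if key is None:
            print("{0}: Key '{1}' was not found in the keyring."
                  .format(binlog_basename, keyring_id), file=sys.stderr)
            return False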

        # Second header field is a TV: the encrypted file password
        field_type = struct.unpack('<B', binlog_fs.read(1))[0]
        if field_type != 2:
            print('{0}: Invalid field type ({1}). Encrypted file password (2) '
                  'was expected.'.format(binlog_basename, field_type),
                  file=sys.stderr)
            return False
        encrypted_password = binlog_fs.read(32)

        # Third header field is a TV: the IV to decrypt the file password
        field_type = struct.unpack('<B', binlog_fs.read(1))[0]
        if field_type != 3:
            print('{0}: Invalid field type ({1}). IV to decrypt the file '
                  'password (3) was expected.'
                  .format(binlog_basename, field_type), file=sys.stderr)
            return False
        iv = binlog_fs.read(16)

        backend = default_backend()
        cipher = Cipher(algorithms.AES(key.key_data), modes.CBC(iv),
                        backend=backend)
        decryptor = cipher.decryptor()
        password = decryptor.update(encrypted_password) + decryptor.finalize()

        # Generate the file key and IV
        key, iv = key_and_iv_from_password(password)
        nonce = iv[0:8] + bytes(8)
     
        # Decrypt the file data (the binary log content)
        # The encrypted binary log header is 512 bytes, so skip it
        binlog_fs.seek(512, os.SEEK_SET)
        binlog_encrypted_data = binlog_fs.read()

    cipher = Cipher(algorithms.AES(key), modes.CTR(nonce), backend=backend)
    decryptor = cipher.decryptor()
    binlog_decrypted_data = decryptor.update(binlog_encrypted_data)
    binlog_decrypted_data += decryptor.finalize()
    binlog_encrypted_data = None

    # Check decrypted binary log magic
    magic = binlog_decrypted_data[0:4]
    if magic.hex() != magic_decrypted:
        print("{0}: Found invalid magic '0x{1}' for decrypted binlog file."
              .format(binlog_basename, magic.hex()), file=sys.stderr)
        return False

    # Write the decrypted binary log to disk
    with open(decrypt_binlog_path, 'wb') as new_fs:
        new_fs.write(binlog_decrypted_data)

    print("{0}: Successfully decrypted as '{1}'"
          .format(binlog_basename, decrypt_binlog_path))
    return True

def decrypt_binlogs(args):
    '''Outer routine for decrypting one or more binary logs. The
    argument args is a namespace object (typically from the argparse
    parser) with the following members:

       * args.binlogs - a list or tuple of the binary logs to decrypt
       * args.keyring_file_data - the path to the file with the
            keyring data for the keyring_file plugin.
       * args.dir - the output directory for the decrypted binary logs
       * args.prefix - the prefix to prepend to the basename of the
            encrypted binary log filenames. This allows you to output
            the decrypted files to the same directory as the encrypted
            binary logs without overwriting the original files.
    '''
    keyring = Keyring(args.keyring_file_data)
    for binlog in args.binlogs:
        decrypt_binlog(binlog, keyring, args.dir, args.prefix)

def main(argv):
    import argparse

    parser = argparse.ArgumentParser(
        prog='binlog_decrypt.py',
        description='Decrypt one or more binary log files from MySQL Server '
                   +'8.0.14+ created with binlog_encryption = ON. The '
                   +'decrypted binary log files have the prefix given with '
                   +'--prefix prepended to their file names. '
                   +'If an output file already exists, the file will be '
                   +'skipped.',
        epilog='All work is performed in-memory. For this reason, the '
               +'expected peak memory usage is around three times the '
               +'size of the largest binary log. As max_binlog_size can '
               +'at most be 1G, for instances exclusively executing small '
               +'transactions, the memory usage can thus be up to around '
               +'3.5G. For instances executing large transactions, the '
               +'binary log files can be much larger than 1G and thus the '
               +'memory usage equally larger.')

    parser.add_argument('-d', '--dir', default=os.getcwd(),
        dest='dir',
        help='The destination directory for the decrypted binary log files. '
             +'The default is to use the current directory.')

    parser.add_argument('-p', '--prefix', default='plain-',
        dest='prefix',
        help='The prefix to prepend to the basename of the binary log file. '
             +'The default is plain-.')

    parser.add_argument('-k', '--keyring_file_data', default=None,
        dest='keyring_file_data',
        help='The path to the keyring file. The same as keyring_file_data in '
             +'the MySQL configuration. This option is mandatory.')

    parser.add_argument('binlogs', nargs=argparse.REMAINDER,
                        help='The binary log files to decrypt.')

    args = parser.parse_args(argv)
    if not args.binlogs:
        print('ERROR: At least one binary log file must be specified.\n',
              file=sys.stderr)
        parser.print_help(file=sys.stderr)
        sys.exit(1)

    if not args.keyring_file_data:
        print('ERROR: The path to the keyring file must be specified.\n',
              file=sys.stderr)
        parser.print_help(file=sys.stderr)
        sys.exit(1)

    decrypt_binlogs(args)


if __name__ == '__main__':
    main(sys.argv[1:])

The first part of the script handles the key derivation and the keyring. The code for decrypting the binary logs follows and consists of three functions (listed from the bottom of the script and up):

  • main: The function for handling the command line arguments.
  • decrypt_binlogs: Initializes the keyring and loops over the binary logs.
  • decrypt_binlog: Decrypts a single binary log.
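
The keyring handling in front of these functions is the least obvious part of the script. As a sketch of the record layout that Keyring._read_key expects (five little-endian 64-bit lengths followed by the key ID, key type, user ID, and the XOR-obfuscated key), consider the following. The names xor_key, parse_record, and build_record are made up for this example, and build_record creates simplified test records rather than what MySQL itself writes.

```python
import struct

# Obfuscation string taken from the Keyring class in the script
XOR_STR = b'*305=Ljt0*!@$Hnm(*-9-w;:'


def xor_key(data):
    """XOR data with the repeating obfuscation string (its own inverse)."""
    return bytes(b ^ XOR_STR[i % len(XOR_STR)] for i, b in enumerate(data))


def parse_record(record):
    """Split one keyring record into (key_id, key_type, user_id, key_data)."""
    # The overall length is not needed here; the four part lengths suffice
    _total, id_len, type_len, user_len, key_len = struct.unpack_from(
        '<5Q', record, 0)
    pos = 40  # Five 8-byte length fields precede the data
    fields = []
    for length in (id_len, type_len, user_len, key_len):
        fields.append(bytes(record[pos:pos + length]))
        pos += length
    key_id, key_type, user_id, key_raw = fields
    return (key_id.decode('utf-8'), key_type.decode('utf-8'),
            user_id.decode('utf-8') or None, xor_key(key_raw))


def build_record(key_id, key_type, user_id, key_data):
    """Build a simplified record for testing (hypothetical, no padding)."""
    body = key_id + key_type + user_id + xor_key(key_data)
    lengths = struct.pack('<5Q', 40 + len(body), len(key_id), len(key_type),
                          len(user_id), len(key_data))
    return lengths + body
```

Because XOR with a fixed string is its own inverse, the same xor_key function both obfuscates and recovers the key. This is also why keyring_file should not be considered a secure keyring format.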

For a closer discussion of the individual steps to decrypt the binary log, I recommend reading João Gramacho’s blog How to manually decrypt an encrypted binary log file.
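
As a compact summary of the header fields that decrypt_binlog reads before any decryption takes place, here is a sketch that parses them from a bytes object. The layout follows what the script reads (magic, version, key ID, encrypted file password, IV). The names BinlogHeader and parse_encrypted_header are assumptions made for this example.

```python
import collections
import struct

BinlogHeader = collections.namedtuple(
    'BinlogHeader', ['version', 'key_id', 'encrypted_password', 'iv'])


def parse_encrypted_header(header):
    """Parse the start of an encrypted binary log (hypothetical helper)."""
    if bytes(header[0:4]) != bytes.fromhex('fd62696e'):
        raise ValueError('Not an encrypted binary log')
    version = header[4]
    pos = 5

    # Field 1 (type, length, value): the keyring key ID
    field_type, key_id_len = struct.unpack_from('<BB', header, pos)
    if field_type != 1:
        raise ValueError('Expected field type 1, got {0}'.format(field_type))
    pos += 2
    key_id = bytes(header[pos:pos + key_id_len]).decode('utf-8')
    pos += key_id_len

    # Field 2 (type, value): the 32-byte encrypted file password
    if header[pos] != 2:
        raise ValueError('Expected field type 2, got {0}'.format(header[pos]))
    encrypted_password = bytes(header[pos + 1:pos + 33])
    pos += 33

    # Field 3 (type, value): the 16-byte IV for decrypting the file password
    if header[pos] != 3:
        raise ValueError('Expected field type 3, got {0}'.format(header[pos]))
    iv = bytes(header[pos + 1:pos + 17])

    return BinlogHeader(version, key_id, encrypted_password, iv)
```

To try it on a real file, read the first 512 bytes of an encrypted binary log and pass them to parse_encrypted_header. The key_id in the result is the ID to look up in the keyring.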