More Statistics for Slow Queries: log_slow_extra

The slow query log is the trusted old method of recording slow queries, so the database administrator can determine which queries are most in need of optimization. Since MySQL 5.6, it has to some extent been overshadowed by the Performance Schema, which has lower overhead and thus allows collecting statistics about all queries. The slow query log has one major advantage though: the data is persisted. MySQL 8.0.14, which was recently released, includes an improvement for the slow query log: additional statistics about the recorded queries.

The slow query log with log_slow_extra enabled.

Contribution

Thanks to Facebook for contributing a patch for the new feature.

The additional information is not recorded by default. To include it, enable the log_slow_extra option:

mysql> SET PERSIST log_slow_extra = ON;
Query OK, 0 rows affected (0.05 sec)

Here, SET PERSIST is used, so the configuration change persists across restarts of MySQL. If you just want to try the feature, you can use SET GLOBAL instead, then decide later whether you want to keep it enabled.

That is all that is required. You can now execute a “slow” query and take a look at the recorded data. An easy way to execute a slow query is to execute DO SLEEP(...), where you replace ... with the number of seconds you want to sleep. However, that is not a very interesting query. Another option is to lower long_query_time to ensure your query is captured. If you set long_query_time to zero, all queries are recorded:

mysql> SET SESSION long_query_time = 0.0;
Query OK, 0 rows affected (0.01 sec)

Here SET SESSION is used so only the current connection is affected.

Be Aware

The slow query log does have overhead. For most systems, it is not recommended to set long_query_time to zero for all connections all the time. Doing so can cause performance problems.

Finally, execute a query:

mysql> SELECT * FROM world.city ORDER BY Population DESC LIMIT 10;
+------+------------------+-------------+------------------+------------+
| ID   | Name             | CountryCode | District         | Population |
+------+------------------+-------------+------------------+------------+
| 1024 | Mumbai (Bombay)  | IND         | Maharashtra      |   10500000 |
| 2331 | Seoul            | KOR         | Seoul            |    9981619 |
|  206 | São Paulo        | BRA         | São Paulo        |    9968485 |
| 1890 | Shanghai         | CHN         | Shanghai         |    9696300 |
|  939 | Jakarta          | IDN         | Jakarta Raya     |    9604900 |
| 2822 | Karachi          | PAK         | Sindh            |    9269265 |
| 3357 | Istanbul         | TUR         | Istanbul         |    8787958 |
| 2515 | Ciudad de México | MEX         | Distrito Federal |    8591309 |
| 3580 | Moscow           | RUS         | Moscow (City)    |    8389200 |
| 3793 | New York         | USA         | New York         |    8008278 |
+------+------------------+-------------+------------------+------------+
10 rows in set (0.04 sec)

The resulting slow query log record is:

# Time: 2019-01-31T07:24:20.518505Z
# User@Host: root[root] @ localhost [::1]  Id:    15
# Query_time: 0.001827  Lock_time: 0.000087 Rows_sent: 10  Rows_examined: 4089 Thread_id: 15 Errno: 0 Killed: 0 Bytes_received: 0 Bytes_sent: 694 Read_first: 1 Read_last: 0 Read_key: 1 Read_next: 0 Read_prev: 0 Read_rnd: 0 Read_rnd_next: 4080 Sort_merge_passes: 0 Sort_range_count: 0 Sort_rows: 10 Sort_scan_count: 1 Created_tmp_disk_tables: 0 Created_tmp_tables: 0 Start: 2019-01-31T07:24:20.516678Z End: 2019-01-31T07:24:20.518505Z
SET timestamp=1548919460;
SELECT * FROM world.city ORDER BY Population DESC LIMIT 10;

The extra data can be a little hard to see due to the formatting of the blog – the extra fields are:

  • Thread_id: 15
  • Errno: 0
  • Killed: 0
  • Bytes_received: 0
  • Bytes_sent: 694
  • Read_first: 1
  • Read_last: 0
  • Read_key: 1
  • Read_next: 0
  • Read_prev: 0
  • Read_rnd: 0
  • Read_rnd_next: 4080
  • Sort_merge_passes: 0
  • Sort_range_count: 0
  • Sort_rows: 10
  • Sort_scan_count: 1
  • Created_tmp_disk_tables: 0
  • Created_tmp_tables: 0
  • Start: 2019-01-31T07:24:20.516678Z
  • End: 2019-01-31T07:24:20.518505Z
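If you want to process the log programmatically, the statistics line is a simple sequence of “Name: value” pairs. As a small illustrative sketch (not part of MySQL – the field names are taken from the example record above), the line can be parsed into a Python dictionary like this:

def parse_stats_line(line):
    """Convert a '# Query_time: ...' slow log line into a dict."""
    # The line consists of alternating 'Name:' and 'value' tokens
    tokens = line.lstrip('# ').split()
    return {name.rstrip(':'): value
            for name, value in zip(tokens[0::2], tokens[1::2])}

stats_line = ("# Query_time: 0.001827  Lock_time: 0.000087 Rows_sent: 10  "
              "Rows_examined: 4089 Thread_id: 15 Errno: 0 Killed: 0 "
              "Bytes_received: 0 Bytes_sent: 694")
stats = parse_stats_line(stats_line)
print(stats['Bytes_sent'])  # 694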

For comparison, here is the information for the same query with log_slow_extra = OFF:

# Time: 2019-01-31T07:24:06.100447Z
# User@Host: root[root] @ localhost [::1]  Id:    15
# Query_time: 0.002286  Lock_time: 0.000133 Rows_sent: 10  Rows_examined: 4089
SET timestamp=1548919446;
SELECT * FROM world.city ORDER BY Population DESC LIMIT 10;

Enjoy.

Automatic Decryption of MySQL Binary Logs Using Python

One of the new features in MySQL 8.0.14 is support for encrypting the binary logs. While encryption makes the data more secure (provided the key is secret, of course), it can make life a bit more difficult for tasks such as point-in-time recoveries. This blog shows how you can use the binlog_decrypt.py Python script to decrypt binary logs as long as you have the keyring that was used to encrypt them.

Decrypting a binary log with binlog_decrypt.py

Introduction and Background

João Gramacho wrote a nice blog showing how you can use standard Linux programs to decrypt the binary logs. This inspired me to implement the same, but using Python, which should make the script easier to use. Specifically, my aim was for the Python script to have the following features:

  • It should work cross platform. I have tested the script on Oracle Linux 7 and Microsoft Windows 7.
  • The key used to encrypt binary logs can be rotated, so different binary logs use different keys. The script should automatically determine which key a binary log uses and extract it from the keyring. For simplicity, I only implemented support for the keyring_file plugin.
  • The script should be able to handle multiple binary logs and gracefully handle unencrypted binary logs.

Introducing binlog_decrypt.py

As it turned out, once I understood how the keyring file works, the task was pretty straightforward using João’s blog to get the required steps. I have maintained the overall steps from that blog. The result can be downloaded from the following link:

Acknowledgement

I would like to say a big thank you to João Gramacho and Ivan Švaljek for tips on how to work with the keyring_file data format.

Some important comments about the script are:

  • The script only works with Python 3 (tested with Python 3.6).
  • All work is done in-memory. While this gives good performance (a 1.1GiB binary log on my laptop decrypts in around three seconds when the encrypted log is in the operating system I/O cache), it does mean that the memory usage is quite high. The 1.1GiB file resulted in a 3.2GiB peak memory usage.
  • Other than performing checks of the binary log content, I have added limited error checking. This is to keep focus on the actual work required to decrypt the binary log.
  • The cryptography module is used for the decryption work. The easiest way to install the module is to use pip (see below).
  • The keyring must be from the keyring_file plugin and use format version 2.0 (the current format as of MySQL 8.0.14). If you use a different keyring plugin, you can use the keyring migration feature to create a copy of the keyring using keyring_file. (But, please note that keyring_file is not a secure keyring format.)

Warning

The script uses the low-level methods of the cryptography module (the hazmat sub module – named so for a good reason). This is OK in this case as it is pure decryption. However, for encryption please do not use the hazmat methods unless you really know what you are doing.
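If you do need to encrypt data in your own code, the high-level “recipes” layer of the cryptography module is a safer choice. A minimal sketch using Fernet:

from cryptography.fernet import Fernet

# Fernet chooses the algorithm and authenticates the data for you
key = Fernet.generate_key()
f = Fernet(key)
token = f.encrypt(b'my secret data')
print(f.decrypt(token))  # b'my secret data'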

Installing Prerequisites

If you are using Oracle Linux 7, Red Hat Enterprise Linux (RHEL) 7, or CentOS 7, the included Python version is 2.7. This will not work with the binlog_decrypt.py script. You can install Python 3.6 in addition to Python 2.7 from the EPEL repository using the following steps (assuming you have already added the EPEL repository):

shell$ yum install python36
shell$ python3.6 -m ensurepip
shell$ python3.6 -m pip install --upgrade pip

This also installs and upgrades the pip command which can be invoked using python3.6 -m pip.

On all platforms, you can install the cryptography module using pip, for example (from Microsoft Windows):

PS:> python -m pip install cryptography
Collecting cryptography
  Downloading https://files.pythonhosted.org/packages/65/d6/48e8194ab0d0d643acb89042a853d029c7cd2daaaba52cf4ff83ff0060a9/cryptography-2.5-cp36-cp36m-win_amd64.whl (1.5MB)
    100% |████████████████████████████████| 1.5MB 4.7MB/s
Collecting asn1crypto>=0.21.0 (from cryptography)
  Downloading https://files.pythonhosted.org/packages/ea/cd/35485615f45f30a510576f1a56d1e0a7ad7bd8ab5ed7cdc600ef7cd06222/asn1crypto-0.24.0-py2.py3-none-any.whl (101kB)
    100% |████████████████████████████████| 102kB 5.8MB/s
Requirement already satisfied: six>=1.4.1 in c:\users\jesper\appdata\local\programs\python\python36\lib\site-packages (from cryptography) (1.11.0)
Collecting cffi!=1.11.3,>=1.8 (from cryptography)
  Downloading https://files.pythonhosted.org/packages/2f/85/a9184548ad4261916d08a50d9e272bf6f93c54f3735878fbfc9335efd94b/cffi-1.11.5-cp36-cp36m-win_amd64.whl (166kB)
    100% |████████████████████████████████| 174kB 5.5MB/s
Collecting pycparser (from cffi!=1.11.3,>=1.8->cryptography)
  Downloading https://files.pythonhosted.org/packages/68/9e/49196946aee219aead1290e00d1e7fdeab8567783e83e1b9ab5585e6206a/pycparser-2.19.tar.gz (158kB)
    100% |████████████████████████████████| 163kB 5.2MB/s
Installing collected packages: asn1crypto, pycparser, cffi, cryptography
  Running setup.py install for pycparser ... done

If you use Oracle Linux 7, RHEL 7, or CentOS 7, invoke pip using python3.6 -m pip instead.

Using binlog_decrypt.py

You can now test the script. Assume you have two binary logs, of which the first is not encrypted and the second is encrypted:

mysql> SHOW BINARY LOGS;
+---------------+-----------+-----------+
| Log_name      | File_size | Encrypted |
+---------------+-----------+-----------+
| binlog.000001 |    722755 | No        |
| binlog.000002 |    723022 | Yes       |
+---------------+-----------+-----------+
2 rows in set (0.01 sec)

You can now use the script as:

PS:> python binlog_decrypt.py --keyring_file_data="C:\ProgramData\MySQL\MySQL Server 8.0\keyring" "C:\ProgramData\MySQL\MySQL Server 8.0\data\binlog.000001" "C:\ProgramData\MySQL\MySQL Server 8.0\data\binlog.000002"
binlog.000001: Binary log is not encrypted. Skipping.
binlog.000002: Keyring key ID is 'MySQLReplicationKey_59e3f95b-e0d6-11e8-94e8-ace2d35785be_1'
binlog.000002: Successfully decrypted as 'C:\tmp\plain-binlog.000002'

Notice how binlog.000001 is skipped because it is detected that the binary log is not encrypted.

This is just an example. Invoke the script with the --help argument to get a description of all of the options.

The Full Source Code

For reference, here is the full source code for the script:

import sys
import os
import struct
import collections
import hashlib
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
from cryptography.hazmat.backends import default_backend

def key_and_iv_from_password(password):
    # Based on
    # https://stackoverflow.com/questions/13907841/implement-openssl-aes-encryption-in-python

    key_length = 32
    iv_length = 16
    required_length = key_length + iv_length

    # Chain SHA-512 digests until there are enough bytes for the key
    # and the IV (the first digest already provides 64 bytes)
    key_iv = hashlib.sha512(password).digest()
    tmp = key_iv
    while len(key_iv) < required_length:
        tmp = hashlib.sha512(tmp + password).digest()
        key_iv += tmp

    key = key_iv[:key_length]
    iv = key_iv[key_length:required_length]

    return key, iv


class Key(
    collections.namedtuple(
        'Key', [
            'key_id',
            'key_type',
            'user_id',
            'key_data',
        ]
    )):
    __slots__ = ()


class Keyring(object):
    _keys = []
    _keyring_file_version = None
    _xor_str = '*305=Ljt0*!@$Hnm(*-9-w;:'.encode('utf-8')

    def __init__(self, keyring_filepath):
        self.read_keyring(keyring_filepath)

    def _read_key(self, data):
        overall_length = struct.unpack('<Q', data[0:8])[0]
        key_id_length = struct.unpack('<Q', data[8:16])[0]
        key_type_length = struct.unpack('<Q', data[16:24])[0]
        user_id_length = struct.unpack('<Q', data[24:32])[0]
        key_length = struct.unpack('<Q', data[32:40])[0]

        key_id_start = 40
        key_type_start = key_id_start + key_id_length
        user_id_start = key_type_start + key_type_length
        key_start = user_id_start + user_id_length
        key_end = key_start + key_length

        key_id = data[key_id_start:key_type_start].decode('utf-8')
        key_type = data[key_type_start:user_id_start].decode('utf-8')
        # The User ID may be blank in which case the length is zero
        user_id = data[user_id_start:key_start].decode('utf-8') if user_id_length > 0 else None
        key_raw = data[key_start:key_end]
        xor_str_len = len(self._xor_str)
        key_data = bytes([key_raw[i] ^ self._xor_str[i%xor_str_len]
                          for i in range(len(key_raw))])

        return Key(key_id, key_type, user_id, key_data)

    def read_keyring(self, filepath):
        # Read the whole keyring file into memory
        with open(filepath, 'rb') as keyring_fs:
            keyring_data = bytearray(keyring_fs.read())

        # Verify the start of the file is "Keyring file version:"
        header = keyring_data[0:21]
        if header.decode('utf-8') != 'Keyring file version:':
            raise ValueError('Invalid header in the keyring file: {0}'
                             .format(header.hex()))

        # Get the keyring version - currently only 2.0 is supported
        version = keyring_data[21:24].decode('utf-8')
        if version != '2.0':
            raise ValueError('Unsupported keyring version: {0}'
                             .format(version))

        self._keyring_file_version = version
        keyring_length = len(keyring_data)
        offset = 24
        keys = []
        while offset < keyring_length and keyring_data[offset:offset+3] != b'EOF':
            key_length = struct.unpack('<Q', keyring_data[offset:offset+8])[0]
            key_data = keyring_data[offset:offset+key_length]
            key = self._read_key(key_data)
            keys.append(key)
            offset += key_length

        self._keys = keys

    def get_key(self, key_id, user_id):
        for key in self._keys:
            if key.key_id == key_id and key.user_id == user_id:
                return key

        return None


def decrypt_binlog(binlog, keyring, out_dir, prefix):
    '''Decrypts a binary log and outputs it to out_dir with the prefix
    prepended. The arguments are:

        * binlog - the path to the encrypted binary log
        * keyring - a Keyring object
        * out_dir - the output directory
        * prefix - prefix to add to the binary log basename.
    '''
    magic_encrypted = 'fd62696e'
    magic_decrypted = 'fe62696e'

    binlog_basename = os.path.basename(binlog)
    decrypt_binlog_path = os.path.join(
        out_dir, '{0}{1}'.format(prefix, binlog_basename))
    if os.path.exists(decrypt_binlog_path):
        print("{0}: Decrypted binary log path, '{1}' already exists. Skipping"
              .format(binlog_basename, decrypt_binlog_path), file=sys.stderr)
        return False

    with open(binlog, 'rb') as binlog_fs:
        # Verify the magic bytes are correct
        magic = binlog_fs.read(4)
        if magic.hex() == magic_decrypted:
            print('{0}: Binary log is not encrypted. Skipping.'
                  .format(binlog_basename), file=sys.stderr)
            return False
        elif magic.hex() != magic_encrypted:
            print("{0}: Found invalid magic '0x{1}' for encrypted binlog file."
                  .format(binlog_basename, magic.hex(), file=sys.stderr))
            return False

        # Get the encrypted version (must currently be 1)
        version = struct.unpack('<B', binlog_fs.read(1))[0]
        if version != 1:
            print("{0}: Unsupported binary log encrypted version '{1}'"
                  .format(binlog_basename, version), file=sys.stderr)
            return False

        # First header field is a TLV: the keyring key ID
        field_type = struct.unpack('<B', binlog_fs.read(1))[0]
        if field_type != 1:
            print('{0}: Invalid field type ({1}). Keyring key ID (1) was '
                  'expected.'.format(binlog_basename, field_type),
                  file=sys.stderr)
            return False

        keyring_id_len = struct.unpack('<B', binlog_fs.read(1))[0]
        keyring_id = binlog_fs.read(keyring_id_len).decode('utf-8')
        print("{0}: Keyring key ID for is '{1}'"
              .format(binlog_basename, keyring_id), file=sys.stderr)

        # Get the key from the keyring file
        key = keyring.get_key(keyring_id, None)
        if key is None:
            print("{0}: The key was not found in the keyring file."
                  .format(binlog_basename), file=sys.stderr)
            return False

        # Second header field is a TV: the encrypted file password
        field_type = struct.unpack('<B', binlog_fs.read(1))[0]
        if field_type != 2:
            print('{0}: Invalid field type ({1}). Encrypted file password (2) '
                  'was expected.'.format(binlog_basename, field_type),
                  file=sys.stderr)
            return False
        encrypted_password = binlog_fs.read(32)

        # Third header field is a TV: the IV to decrypt the file password
        field_type = struct.unpack('<B', binlog_fs.read(1))[0]
        if field_type != 3:
            print('{0}: Invalid field type ({1}). IV to decrypt the file '
                  'password (3) was expected.'
                  .format(binlog_basename, field_type), file=sys.stderr)
            return False
        iv = binlog_fs.read(16)

        backend = default_backend()
        cipher = Cipher(algorithms.AES(key.key_data), modes.CBC(iv),
                        backend=backend)
        decryptor = cipher.decryptor()
        password = decryptor.update(encrypted_password) + decryptor.finalize()

        # Generate the file key and IV
        key, iv = key_and_iv_from_password(password)
        nonce = iv[0:8] + bytes(8)
     
        # Decrypt the file data (the binary log content)
        # The encrypted binary log header is 512 bytes, so skip it
        binlog_fs.seek(512, os.SEEK_SET)
        binlog_encrypted_data = binlog_fs.read()

    cipher = Cipher(algorithms.AES(key), modes.CTR(nonce), backend=backend)
    decryptor = cipher.decryptor()
    binlog_decrypted_data = decryptor.update(binlog_encrypted_data)
    binlog_decrypted_data += decryptor.finalize()
    binlog_encrypted_data = None

    # Check decrypted binary log magic
    magic = binlog_decrypted_data[0:4]
    if magic.hex() != magic_decrypted:
        print("{0}: Found invalid magic '0x{1}' for decrypted binlog file."
              .format(binlog_basename, magic.hex()), file=sys.stderr)
        return False

    # Write the decrypted binary log to disk
    with open(decrypt_binlog_path, 'wb') as new_fs:
        new_fs.write(binlog_decrypted_data)

    print("{0}: Successfully decrypted as '{1}'"
          .format(binlog_basename, decrypt_binlog_path))
    return True

def decrypt_binlogs(args):
    '''Outer routine for decrypting one or more binary logs. The
    argument args is a named tuple (typically from the argparse
    parser) with the following members:

       * args.binlogs - a list or tuple of the binary logs to decrypt
       * args.keyring_file_data - the path to the file with the
            keyring data for the keyring_file plugin.
       * args.dir - the output directory for the decrypted binary logs
       * args.prefix - the prefix to prepend to the basename of the
            encrypted binary log filenames. This allows you to output
            the decrypted binary logs to the same directory as the
            encrypted binary logs without overwriting the original
            files.
    '''
    keyring = Keyring(args.keyring_file_data)
    for binlog in args.binlogs:
        decrypt_binlog(binlog, keyring, args.dir, args.prefix)

def main(argv):
    import argparse

    parser = argparse.ArgumentParser(
        prog='binlog_decrypt.py',
        description='Decrypt one or more binary log files from MySQL Server '
                   +'8.0.14+ created with binlog_encryption = ON. The '
                   +'binary log files have the prefix given with --prefix '
                   +'prepended to their file names. '
                   +'If an output file already exists, the file will be '
                   +'skipped.',
        epilog='All work is performed in-memory. For this reason, the '
               +'expected peak memory usage is around three times the '
               +'size of the largest binary log. As max_binlog_size can '
               +'at most be 1G, for instances exclusively executing small '
               +'transactions, the memory usage can thus be up to around '
               +'3.5G. For instances executing large transactions, the '
               +'binary log files can be much larger than 1G and thus the '
               +'memory usage equally larger.')

    parser.add_argument('-d', '--dir', default=os.getcwd(),
        dest='dir',
        help='The destination directory for the decrypted binary log files. '
             +'The default is to use the current directory.')

    parser.add_argument('-p', '--prefix', default='plain-',
        dest='prefix',
        help='The prefix to prepend to the basename of the binary log file. '
             +'The default is plain-.')

    parser.add_argument('-k', '--keyring_file_data', default=None,
        dest='keyring_file_data',
        help='The path to the keyring file. The same as keyring_file_data in '
             +'the MySQL configuration. This option is mandatory.')

    parser.add_argument('binlogs', nargs=argparse.REMAINDER,
                        help='The binary log files to decrypt.')

    args = parser.parse_args()
    if not args.binlogs:
        print('ERROR: At least one binary log file must be specified.\n',
              file=sys.stderr)
        parser.print_help(file=sys.stderr)
        sys.exit(1)

    if not args.keyring_file_data:
        print('ERROR: The path to the keyring file must be specified.\n',
              file=sys.stderr)
        parser.print_help(file=sys.stderr)
        sys.exit(1)

    decrypt_binlogs(args)


if __name__ == '__main__':
    main(sys.argv[1:])

The start of the script is the handling of the keyring. Then follows the code for decrypting the binary logs, which consists of three functions (from the bottom up):

  • main: The function for handling the command line arguments.
  • decrypt_binlogs: Initializes the keyring and loops over the binary logs.
  • decrypt_binlog: Decrypts a single binary log.

For a closer discussion of the individual steps required to decrypt a binary log, I recommend reading João Gramacho’s blog How to manually decrypt an encrypted binary log file.

MySQL Server 8.0.14: Thanks for the Contributions

MySQL 8.0.14 was released earlier in the week, and again we (Oracle) received several contributions that either made it into the release or at least inspired a feature or bug fix. This blog will go through the changelog notes for these changes. Thank you for the contributions.

Two of the contributions are new features with patches submitted by Facebook, one is a new feature by Daniel Black, and one of Facebook’s patches from 8.0.13 has been updated:

  • A new system variable, log_slow_extra, if enabled, causes the server to write additional fields to slow query log lines that provide information about slow statements. In addition, SET lines written to the log to indicate statement timestamps now use the time from the beginning of statement execution, rather than the time at the end of execution. See The Slow Query Log. Thanks to Facebook for the contribution on which this feature is based. (Bug #27535580, Bug #89637)
  • InnoDB: To reduce the size of core files, the innodb_buffer_pool_in_core_file variable can be disabled to prevent InnoDB buffer pool pages from being written to core files. Thanks to Facebook for the contribution. (Bug #27724476, Bug #90144)
  • Previously, for command options that take a numeric value, the value could be given with a suffix of K, M, or G to indicate a multiplier of 1024, 1024^2 or 1024^3. Now a suffix can also be T, P, and E to indicate a multiplier of 1024^4, 1024^5 or 1024^6. Thanks to Daniel Black for the patch. (Bug #27306931, Bug #89017)
  • The code contributed by Facebook for the feature implemented by Bug#27855592 was updated. (Bug #28950397)

While new features are great, bug fixes are equally important. There are three bug fix contributions in 8.0.14:

  • Some typos in server source code were fixed. Thanks to Hyunwoo Park for the contribution. (Bug #26189673, Bug #86565)
  • On Ubuntu, the installed /etc/mysql/mysql.conf.d/default-auth-override.cnf file was mistakenly created with executable mode. Thanks to Evgeniy Patlan (from Percona) for the correction contribution. (Bug #28714840, Bug #92587)
  • Comparing log file names as strings using the memcmp() function resulted in uninitialized memory read errors. The comparison now uses the strncmp() function. Thanks to Zsolt Parragi and Laurynas Biveinis (both from Percona) for their contributions. (Bug #28178776, Bug #90238)

Sometimes there is a contribution with a good idea, but in the end – for one reason or another – a different patch is implemented. I would still like to say thanks. I am aware of two such cases in MySQL 8.0.14:

  • Thanks to Facebook for suggesting the idea of a separate admin port – the new admin_address and admin_port options. (Bug #27847672, Bug #90395)
  • Thanks to Zhenghu Wen for reporting and submitting a patch for the bug that a member might not be able to move from the RECOVERING to the ONLINE state, when it joins a group. (Bug #89582, Bug #27511404)

Thanks to all involved. Keep up the good work. Even if your patch was not used, don’t get discouraged.

MySQL Connector/Python 8.0.14 X DevAPI: Default Schema

The MySQL X DevAPI is the new API that provides a uniform interface across the supported programming languages. From the beginning, it has supported specifying a default schema when you connect. Originally, the default schema was not used for SQL statements. Starting with MySQL 8.0.14, the feature has been extended, so SQL statements take the default schema into consideration as well. This blog will explore how this works using MySQL Connector/Python. If you use a different programming language, the change works in a similar way.

In order to explore the feature, a sample program is needed. A simple program that prints the MySQL Connector/Python version, queries the city table in the default schema, and either catches the exception or prints the resulting rows will work:

import mysql.connector
import mysqlx
from mysqlx.errors import OperationalError
import pprint

printer = pprint.PrettyPrinter(indent=3)

connect_args = {
    "host": "127.0.0.1",
    "port": 33060,
    "user": "pyuser",
    "password": "Py@pp4Demo",
    "schema": "world",
}

print("MySQL Connector/Python {0}".format(mysql.connector.__version__))

db = mysqlx.get_session(connect_args)
try:
    result = db.sql("SELECT * FROM city WHERE ID = 130").execute()
except OperationalError as err:
    print("Error: {0}".format(err))
    result = None
else:
    columns = [column.column_name for column in result.get_columns()]
    for row in result.fetch_all():
        row_data = dict(zip(columns, row))
        printer.pprint(row_data)

db.close()

Notice how the default schema is set in line 13.

If this program is executed with MySQL Connector/Python 8.0.13 or earlier, the query in line 20 will cause an OperationalError exception:

MySQL Connector/Python 8.0.13
Error: No database selected

However, if you execute the code with MySQL Connector/Python 8.0.14, the query executes and queries the city table in the world schema:

MySQL Connector/Python 8.0.14
{  'CountryCode': 'AUS',
   'District': 'New South Wales',
   'ID': 130,
   'Name': 'Sydney',
   'Population': 3276207}

This new behaviour is the same as what you are used to from the traditional API.

Tip

If you want to use the default schema with CRUD statements, you can retrieve the default schema with Session.get_default_schema(). That also works in earlier releases of MySQL Connector/Python.
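As a small sketch (assuming the same connect_args and db session as in the example above), combining the default schema with the CRUD methods looks like this:

# Retrieve the default schema set when connecting
schema = db.get_default_schema()
city = schema.get_table("city")
result = city.select("Name", "Population").where("ID = 130").execute()
print(result.fetch_one())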

If you want to read more about using the X DevAPI and MySQL Connector/Python in general, then I have written the book MySQL Connector/Python Revealed.

MySQL 8: Drop Several Stored Events, Procedures, or Functions

Maybe the biggest new feature in MySQL 8 is the new transactional data dictionary that improves the consistency of schema objects among other things. To further protect the data in the data dictionary, the data dictionary tables are hidden and their content is only exposed through the Information Schema. (One exception is when you use the debug binary; then it is possible to get direct access to the data dictionary tables. This is not recommended at all on production systems!)

A side effect of the data dictionary tables being hidden is that those that have had a habit of manipulating the tables directly in MySQL 5.7 and earlier (I will not recommend doing that) will no longer be able to do so. Examples of manipulating the tables include dropping several stored events, procedures, and/or functions by deleting the corresponding rows in the mysql.event and mysql.proc tables.

Tip

I will use the word object to cover either an event, procedure, or function.

There are some discussions on Stack Overflow on how to do this task. See for example How to delete multiple events in mysql 8.0 database ? it was working in 5.7 but not in 8.0? and MySQL 8 – Remove All Stored Procedures in a Specific Database.

The obvious thought may be to write a stored procedure that can delete the objects; however, that requires using a prepared statement, which is not supported for DROP EVENT and similar statements:

mysql> SET @sql = 'DROP EVENT db1.event1';
Query OK, 0 rows affected (0.00 sec)

mysql> PREPARE stmt_drop FROM @sql;
ERROR 1295 (HY000): This command is not supported in the prepared statement protocol yet

There is a feature request to implement this in Bug 42337.

So what to do instead? My suggestion is to use MySQL Shell. There you can generate the queries and use session.sql() to execute them without using a prepared statement. However, first some objects are needed.

Deleting events, functions, and procedures in MySQL Shell

Setup

Before we start looking at the options to drop several events, procedures, or functions, let’s create some objects to work with. For the examples, there are three events, three procedures, and three functions that can be created as follows:

mysql> CREATE SCHEMA db1;
Query OK, 1 row affected (0.01 sec)

mysql> use db1;
Database changed
mysql> CREATE EVENT event1 ON SCHEDULE EVERY 1 DAY STARTS NOW() DO DO SLEEP(1);
Query OK, 0 rows affected (0.04 sec)

mysql> CREATE EVENT event2 ON SCHEDULE EVERY 1 DAY STARTS NOW() DO DO SLEEP(1);
Query OK, 0 rows affected (0.00 sec)

mysql> CREATE EVENT `event``3` ON SCHEDULE EVERY 1 DAY STARTS NOW() DO DO SLEEP(1);
Query OK, 0 rows affected (0.00 sec)

mysql> CREATE PROCEDURE proc1() DO SLEEP(1);
Query OK, 0 rows affected (0.01 sec)

mysql> CREATE PROCEDURE proc2() DO SLEEP(1);
Query OK, 0 rows affected (0.00 sec)

mysql> CREATE PROCEDURE `proc``3`() DO SLEEP(1);
Query OK, 0 rows affected (0.00 sec)

mysql> CREATE FUNCTION func1() RETURNS int DETERMINISTIC RETURN 1;
Query OK, 0 rows affected (0.40 sec)

mysql> CREATE FUNCTION func2() RETURNS int DETERMINISTIC RETURN 1;
Query OK, 0 rows affected (0.08 sec)

mysql> CREATE FUNCTION `func``3`() RETURNS int DETERMINISTIC RETURN 1;
Query OK, 0 rows affected (0.30 sec)

mysql> SELECT EVENT_SCHEMA, EVENT_NAME
         FROM information_schema.EVENTS
        WHERE EVENT_SCHEMA = 'db1';
+--------------+------------+
| EVENT_SCHEMA | EVENT_NAME |
+--------------+------------+
| db1          | event1     |
| db1          | event2     |
| db1          | event`3    |
+--------------+------------+
3 rows in set (0.01 sec)

mysql> SELECT ROUTINE_SCHEMA, ROUTINE_NAME, ROUTINE_TYPE
         FROM information_schema.ROUTINES
        WHERE ROUTINE_SCHEMA = 'db1';
+----------------+--------------+--------------+
| ROUTINE_SCHEMA | ROUTINE_NAME | ROUTINE_TYPE |
+----------------+--------------+--------------+
| db1            | func1        | FUNCTION     |
| db1            | func2        | FUNCTION     |
| db1            | func`3       | FUNCTION     |
| db1            | proc1        | PROCEDURE    |
| db1            | proc2        | PROCEDURE    |
| db1            | proc`3       | PROCEDURE    |
+----------------+--------------+--------------+
6 rows in set (0.01 sec)

This also shows how the events and routines can be found from the Information Schema.

The third event, procedure, and function are named with a backtick before the digit. This is to ensure our delete method quotes the identifier names correctly. If you are deleting events and routines from a production system, you are encouraged to test carefully that you quote the identifiers correctly.

Now on to delete the objects.

MySQL Shell

MySQL Shell is one of the great recent additions to MySQL. It supports performing DBA and devops tasks using SQL, Python, and JavaScript. Having a full-featured programming language available allows you to automate tasks that otherwise would have been cumbersome or impossible.

The Python and JavaScript modes support executing queries either using NoSQL CRUD methods or SQL statements. So, now the task of deleting events, procedures, or functions becomes as simple as executing a query against the Information Schema to find the names of the objects and then generate the queries in a loop. The following will use Python, but it can also be written in JavaScript if you prefer.

I will first go through dropping events, then functions and procedures.

Events – Interactive

It is possible to get the names using the select() CRUD method on the information_schema.EVENTS table:

mysql-py> i_s = session.get_schema("information_schema")

mysql-py> events = i_s.EVENTS \
       ->             .select("sys.quote_identifier(EVENT_SCHEMA) AS EventSchema", "sys.quote_identifier(EVENT_NAME) AS EventName") \
       ->             .where("EVENT_SCHEMA = 'db1'").execute().fetch_all();

mysql-py> events
[
    [
        "`db1`",
        "`event1`"
    ], 
    [
        "`db1`",
        "`event2`"
    ], 
    [
        "`db1`",
        "`event``3`"
    ]
]

First, an object for the information_schema schema is stored in the i_s variable. This allows you to access the Information Schema EVENTS table as i_s.EVENTS. From here, the select() CRUD method is used to find all events in the db1 schema.

Tip

If you have a different criteria for which events to delete, edit the argument to the where() method.

The sys.quote_identifier() function is used to ensure that the schema and event names are quoted correctly, so for example the backtick in event`3 is escaped correctly (by adding a second backtick).
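You can verify the quoting directly. A quick sketch in MySQL Shell’s Python mode (with an open session):

# The backtick in the identifier is doubled in the quoted result
result = session.sql(
    "SELECT sys.quote_identifier('event`3') AS Quoted").execute()
print(result.fetch_one()[0])  # `event``3`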

Important

The sys.quote_identifier() function is only guaranteed to work correctly when the ANSI_QUOTES SQL mode is not enabled. When ANSI quotes are used, double quotes (") are used to quote identifiers, though backticks are still supported.

Now events is a list of the three events that were created earlier. You can easily check which events were found by just “executing” events. With that, you can delete the events one by one in a loop. For this, a query template is first defined, then used to add the name of the schema and event. Usually it is not safe to do that using the Python format() function; however, here we have ensured the names are quoted appropriately by using sys.quote_identifier().

mysql-py> sql_fmt = "DROP EVENT {0}.{1}"
mysql-py> for event in events:
       ->     print(sql_fmt.format(*event))
       ->     session.sql(sql_fmt.format(*event)).execute()
       ->
DROP EVENT `db1`.`event1`
DROP EVENT `db1`.`event2`
DROP EVENT `db1`.`event``3`
Query OK, 0 rows affected (0.0038 sec)

That’s it. Not exactly a one liner, but not hard either.

This is fine if it is a one-off that you need to delete some events, but what if this is a more regular task? MySQL Shell can accommodate that as well.

Events – Function

I have previously discussed how you can use external libraries in MySQL Shell and even showed an example of using it to create auto-repeating reports. The same approach can be used to implement a function that takes the schema to delete events from as an argument.

Let’s look at how the code looks for that:

from mysqlsh import globals

def events_by_schema(schema, verbose=True):
    """Drops all events in the schema passed as an argument.

    Requires a connection to exist to the MySQL instance where the
    events are to be dropped.

    Optionally specify whether to be verbose (enabled by default) by
    printing each DROP EVENT statement executed.

    Returns the number of events dropped, or False if an issue is
    encountered, in which case no attempt is made to drop the events.
    """

    # Get the session and ensure it is connected
    if not globals.session.is_open():
        print("Please create a connection first.")
        return False

    # Define the query to get the events
    i_s = globals.session.get_schema("information_schema")
    stmt = i_s.EVENTS.select(
        "sys.quote_identifier(EVENT_SCHEMA) AS EventSchema",
        "sys.quote_identifier(EVENT_NAME) AS EventName")
    stmt = stmt.where("EVENT_SCHEMA = :schema")

    # Execute the query and check for warnings
    result = stmt.bind("schema", schema).execute()
    events = result.fetch_all()
    if (result.get_warnings_count() > 0):
        # Bail out and print the warnings
        print("Warnings occurred - bailing out:")
        print(result.get_warnings())
        return False

    # Drop the events and check for warnings after each event
    sql_fmt = "DROP EVENT {0}.{1}"
    for event in events:
        sql = sql_fmt.format(*event)
        if verbose:
            print(sql)

        drop_result = globals.session.sql(sql).execute()
        if (drop_result.get_warnings_count() > 0):
            print("Warnings occurred:")
            print(result.get_warnings())

    return len(events)

Yes, this code is a lot longer than the previous example, but in reality it is the same with some extra checks added and obtaining the session (the connection to the MySQL instance) from the mysqlsh.globals module. Since the schema is now provided as an argument to the function, a placeholder and bind() are used to ensure it is safe to use the value in a query.
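In isolation, the placeholder pattern looks as follows (a sketch that reuses the i_s object from the interactive example):

# The :schema placeholder is bound to a value before executing
stmt = i_s.EVENTS.select("EVENT_NAME").where("EVENT_SCHEMA = :schema")
result = stmt.bind("schema", "db1").execute()
print([row[0] for row in result.fetch_all()])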

If you save the code in a file in the Python search path, you can import it into MySQL Shell. Recall from my earlier blog that you can include extra directories in your search path by updating mysqlshrc.py in one of the places where MySQL Shell looks for it; for example in ${HOME}/.mysqlsh on Linux or %APPDATA%\MySQL\mysqlsh on Windows. An example of the mysqlshrc.py file is:

import sys
sys.path.append("D:\MySQL\Shell\Python")

In this example, the code has been saved in the file delete_objects.py, so you can delete all events in the db1 schema as in the following example:

mysql-py> import delete_objects
mysql-py> num_delete_events = delete_objects.events_by_schema('db1')
DROP EVENT `db1`.`event1`
DROP EVENT `db1`.`event2`
DROP EVENT `db1`.`event``3`

That completes the example for events. Now let’s turn to functions and procedures.

Procedures and Functions – Interactive

The example of deleting functions and/or procedures is very similar to what has just been seen for events, other than using the information_schema.ROUTINES table instead of information_schema.EVENTS to get the routine names, and that you need to set a filter on ROUTINE_TYPE to specify whether you want procedures or functions (or both if the filter is absent). So, I will jump straight into the example by first dropping all functions, then all procedures in the db1 schema:

mysql-py> i_s = session.get_schema("information_schema")

mysql-py> functions = i_s.ROUTINES \
       ->                .select("sys.quote_identifier(ROUTINE_SCHEMA) AS RoutineSchema", "sys.quote_identifier(ROUTINE_NAME) AS RoutineName") \
       ->                .where("ROUTINE_SCHEMA = 'db1' AND ROUTINE_TYPE = 'FUNCTION'").execute().fetch_all()

mysql-py> functions
[
    [
        "`db1`",
        "`func1`"
    ], 
    [
        "`db1`",
        "`func2`"
    ], 
    [
        "`db1`",
        "`func``3`"
    ]
]

mysql-py> sql_fmt = "DROP FUNCTION {0}.{1}"

mysql-py> for function in functions:
       ->     print(sql_fmt.format(*function))
       ->     session.sql(sql_fmt.format(*function)).execute()
       ->
DROP FUNCTION `db1`.`func1`
DROP FUNCTION `db1`.`func2`
DROP FUNCTION `db1`.`func``3`
Query OK, 0 rows affected (0.0684 sec)

Similar for the procedures:

mysql-py> i_s = session.get_schema("information_schema")

mysql-py> procedures = i_s.ROUTINES \
       ->             .select("sys.quote_identifier(ROUTINE_SCHEMA) AS RoutineSchema", "sys.quote_identifier(ROUTINE_NAME) AS RoutineName") \
       ->             .where("ROUTINE_SCHEMA = 'db1' AND ROUTINE_TYPE = 'PROCEDURE'").execute().fetch_all()

mysql-py> procedures
[
    [
        "`db1`",
        "`proc1`"
    ], 
    [
        "`db1`",
        "`proc2`"
    ], 
    [
        "`db1`",
        "`proc``3`"
    ]
]

mysql-py> sql_fmt = "DROP PROCEDURE {0}.{1}"

mysql-py> for procedure in procedures:
       ->     print(sql_fmt.format(*procedure))
       ->     session.sql(sql_fmt.format(*procedure)).execute()
DROP PROCEDURE `db1`.`proc1`
DROP PROCEDURE `db1`.`proc2`
DROP PROCEDURE `db1`.`proc``3`
Query OK, 0 rows affected (0.0976 sec)

Again, it is possible to create a function, so the code can be re-used.

Functions and Procedures – Function

Since the code for deleting functions and procedures is so similar – it is just the filter and DROP FUNCTION versus DROP PROCEDURE that is the difference – it is simple to use the same function to delete either. In fact, it would not take much to combine it with events_by_schema() from above, but that will be left as an exercise for the reader (there is a hint in the next section using MySQL Connector/Python).

The code for the routines_by_schema() function is:

from mysqlsh import globals

def routines_by_schema(schema, routine_type=None, verbose=True):
    """Drops all routines of a given type in the schema passed as an
    argument. If no type is given, both functions and procedures are
    dropped. The default is to drop both.

    Requires a connection to exist to the MySQL instance where the
    routines are to be dropped.

    Optionally specify whether to be verbose (enabled by default) by
    printing each DROP FUNCTION|PROCEDURE statement executed.

    Returns the number of routines dropped and False if an issue is
    encountered so no attempts are made to drop the routines.
    """

    # Get the session and ensure it is connected
    if not globals.session.is_open():
        print("Please create a connection first.")
        return False

    # Define the query to get the routines
    i_s = globals.session.get_schema("information_schema")
    filters = ["ROUTINE_SCHEMA = :schema"]
    if routine_type is not None:
        filters.append("ROUTINE_TYPE = :type")
    stmt = i_s.ROUTINES.select(
        "sys.quote_identifier(ROUTINE_SCHEMA) AS RoutineSchema",
        "sys.quote_identifier(ROUTINE_NAME) AS RoutineName",
        "ROUTINE_TYPE")
    stmt = stmt.where(" AND ".join(filters))

    # Execute the query and check for warnings
    stmt = stmt.bind("schema", schema)
    if routine_type is not None:
        stmt = stmt.bind("type", routine_type)
    result = stmt.execute()
    routines = result.fetch_all()
    if (result.get_warnings_count() > 0):
        # Bail out and print the warnings
        print("Warnings occurred - bailing out:")
        print(result.get_warnings())
        return False

    # Drop the routines and check for warnings after each routine
    sql_fmt = "DROP {2} {0}.{1}"
    for routine in routines:
        sql = sql_fmt.format(*routine)
        if verbose:
            print(sql)

        drop_result = globals.session.sql(sql).execute()

        if (drop_result.get_warnings_count() > 0):
            print("Warnings occurred:")
            print(result.get_warnings())

    return len(routines)

The function takes an extra argument compared to events_by_schema(), so it is possible to either delete both functions and procedures (the default) or just one type. The use of the function is also similar to what has been seen before:

mysql-py> num_delete_functions = delete_objects.routines_by_schema('db1', 'FUNCTION')
DROP FUNCTION `db1`.`func1`
DROP FUNCTION `db1`.`func2`
DROP FUNCTION `db1`.`func``3`

mysql-py> num_delete_procedure = delete_objects.routines_by_schema('db1', 'PROCEDURE')
DROP PROCEDURE `db1`.`proc1`
DROP PROCEDURE `db1`.`proc2`
DROP PROCEDURE `db1`.`proc``3`

# After recreating the functions and procedures
mysql-py> num_delete_routines = delete_objects.routines_by_schema('db1')
DROP FUNCTION `db1`.`func1`
DROP FUNCTION `db1`.`func2`
DROP FUNCTION `db1`.`func``3`
DROP PROCEDURE `db1`.`proc1`
DROP PROCEDURE `db1`.`proc2`
DROP PROCEDURE `db1`.`proc``3`

The last thing is to look at how the same actions can be done with MySQL Connector/Python.

MySQL Connector/Python

Given how similar using Python in MySQL Shell is to using MySQL Connector/Python, it is natural to make the functions available as a command-line tool or as a function for Connector/Python programs. The following shows an example of how that can be done – combining events_by_schema() and routines_by_schema() into one function. The source code is:

import mysqlx

def objects_by_schema(session, schema, object_type=None, verbose=True):
    """Drops all events, functions, and/or procedures in the schema
    passed as an argument.

    Requires a connection to exist to the MySQL instance where the
    events are to be dropped.

    The object_type can be None (drop all events, functions, and
    procedures - the default), "EVENT", "FUNCTION", or "PROCEDURE".

    Optionally specify whether to be verbose (enabled by default) by
    printing each DROP statement executed.

    Returns the number of events, functions, and procedures dropped
    as a dictionary or False if an issue is encountered so no attempts
    are made to drop the events.
    """

    # Get the session and ensure it is connected
    if not session.is_open():
        print("Please create a connection first.")
        return False

    # Get an object for the Information Schema
    # and whether ANSI quotes are used
    i_s = session.get_schema("information_schema")
    sql_mode = session.sql("SELECT @@session.sql_mode AS sql_mode").execute() \
                      .fetch_one()["sql_mode"]
    sql_modes = sql_mode.split(",")
    ansi_quotes = "ANSI_QUOTES" if "ANSI_QUOTES" in sql_modes else ""

    events = []
    # If dropping events, get a list of all events
    if object_type is None or object_type == "EVENT":
        events_table = i_s.get_table("EVENTS")
        stmt = events_table.select(
            "EVENT_SCHEMA AS ObjectSchema", "EVENT_NAME AS ObjectName",
            "'EVENT' AS ObjectType")
        stmt = stmt.where("EVENT_SCHEMA = :schema")

        # Execute the query and check for warnings
        result = stmt.bind("schema", schema).execute()
        events = result.fetch_all()
        if (result.get_warnings_count() > 0):
            # Bail out and print the warnings
            print("Warnings occurred - bailing out:")
            print(result.get_warnings())
            return False

    routines = []
    if object_type is None or object_type in ("FUNCTION", "PROCEDURE"):
        routines_table = i_s.get_table("ROUTINES")
        filters = ["ROUTINE_SCHEMA = :schema"]
        if object_type is not None:
            filters.append("ROUTINE_TYPE = :type")
        stmt = routines_table.select(
            "ROUTINE_SCHEMA AS ObjectSchema", "ROUTINE_NAME AS ObjectName",
            "ROUTINE_TYPE AS ObjectType")
        stmt = stmt.where(" AND ".join(filters))

        # Execute the query and check for warnings
        stmt = stmt.bind("schema", schema)
        if object_type is not None:
            stmt = stmt.bind("type", object_type)
        result = stmt.execute()
        routines = result.fetch_all()
        if (result.get_warnings_count() > 0):
            # Bail out and print the warnings
            print("Warnings occurred - bailing out:")
            print(result.get_warnings())
            return False

    # Drop the routines and check for warnings after each routine
    objects = events + routines
    sql_fmt = "DROP {0} {1}.{2}"
    objects_deleted = {
        "EVENT": 0,
        "FUNCTION": 0,
        "PROCEDURE": 0,
    }
    for obj in objects:
        objects_deleted[obj[2]] += 1
        sql = sql_fmt.format(obj[2],
                  mysqlx.helpers.quote_identifier(obj[0], ansi_quotes),
                  mysqlx.helpers.quote_identifier(obj[1], ansi_quotes))
        if verbose:
            print(sql)

        drop_result = session.sql(sql).execute()

        if (drop_result.get_warnings_count() > 0):
            print("Warnings occurred:")
            print(result.get_warnings())

    return objects_deleted

if __name__ == "__main__":
    import sys
    import argparse
    import getpass

    # Handle the command line arguments
    parser = argparse.ArgumentParser(
        prog="delete_objects.py",
        description="Delete all events, functions, and/or procedures from a "
                   + "single schema.")

    parser.add_argument("--type", default="ALL",
        choices=["ALL", "EVENT", "FUNCTION", "PROCEDURE"],
        help="The object type to drop.")
    parser.add_argument("--schema", default=None,
        help="The schema to drop objects from.")
    parser.add_argument("--verbose", type=int, default=1,
        help="Whether to produce verbose output.")
    parser.add_argument("--user", default=None,
        help="The user to connect to MySQL as.")
    parser.add_argument("--host", default="127.0.0.1",
        help="The host to connect to MySQL on.")
    parser.add_argument("--port", type=int, default=33060,
        help="The mysqlx port to connect to MySQL on.")

    # Parse the arguments and get the password used when connecting to MySQL
    args = parser.parse_args(sys.argv[1:])
    object_type = args.type
    if args.type == "ALL":
        object_type = None
    verbose = True if args.verbose > 0 else False
    password = getpass.getpass("MySQL Password: ")

    # Connect to MySQL
    session = mysqlx.get_session(user=args.user, password=password,
                                 host=args.host, port=args.port)

    # Drop the objects
    objects_deleted = objects_by_schema(session, args.schema, object_type,
                                        verbose)
    if verbose:
        print(objects_deleted)

While the example is pretty extensive, it is not quite complete. There is only support for TCP connections using the user, password, host, and port options, and the password is always provided interactively (the most secure approach). Support for a configuration file and other options can be added if needed.

Instead of using the sys.quote_identifier() function, the schema and object names are quoted using the mysqlx.helpers.quote_identifier() function. Otherwise the program should be self explanatory given the previous examples. You can use python delete_objects.py --help to get help with the supported arguments.

MySQL X DevAPI Connection Pool with Connector/Python

If you have an application that needs to use multiple connections to the MySQL database for short periods of time, it can be a good idea to use a connection pool to avoid creating a new connection and going through the whole authentication process every time a connection is needed. For the Python Database API (PEP 249), MySQL Connector/Python has had support for connection pools for a long time. With the release of MySQL 8.0.13, the new X DevAPI also gained support for connection pools.

MySQL Connector/Python X DevAPI connection pool code snippet.

This blog will first cover the background of the X DevAPI connection pool feature in MySQL Connector/Python and then provide an example.

Background

You create a connection pool using the mysqlx.get_client() function. You may wonder why you are creating a client and not a pool. As will be shown later, there is a little more to this feature than just a connection pool, so it makes sense to use a more generic term.

The get_client() function takes two arguments: The connection options and the client options. The connection options are the usual arguments defining which MySQL instance to connect to, authentication related options, how to connect, etc. The client options are the interesting ones in the discussion of a connection pool.

The client options are given as a dictionary or as a JSON document written as a string. Currently, the only supported client options are the ones defining the connection pool. These are specified under the pooling field (an example will be provided shortly). This leaves room for the possibility to expand get_client() later with other features than a connection pool.

There are currently four connection pool options:

  • enabled: Whether the connection pool is enabled. The default is True.
  • max_size: The maximum number of connections that can be in the pool. The default is 25.
  • max_idle_time: How long, in milliseconds, a connection can be idle before it is closed. The default is 0, which means “infinite” (in practice 2147483000 milliseconds).
  • queue_timeout: The maximum amount of time in milliseconds that an attempt to get a connection from the pool will block. If no connection has become available before the timeout, a mysqlx.errors.PoolError exception is raised. The default is 0, which means “infinite” (in practice 2147483000 milliseconds).
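As mentioned, the client options can also be written as a JSON document in a string. A sketch equivalent to passing a dictionary with the same settings (connection arguments as in the example below):

import mysqlx

# The second argument is a JSON document given as a string
pool = mysqlx.get_client(
    {"host": "127.0.0.1", "port": 33060,
     "user": "pyuser", "password": "Py@pp4Demo"},
    '{"pooling": {"max_size": 2, "queue_timeout": 3000}}')
session = pool.get_session()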

What happens if you disable the connection pool? In that case, the client that is returned simply works as a template for connections, and you can keep creating connections until MySQL Server runs out of connections. In that case, the session you end up with is a regular connection, so when you close it, it disconnects from MySQL.

Back to the case where the connection pool is enabled. Once you have the client object, you can start using the pool. You retrieve a connection from the pool with the get_session() method. No arguments are used. After this you can use the session just as a regular standalone connection. The only difference is that when you close the session, it is returned to the pool rather than disconnected.

Enough background. Let’s see an example.

Example

The following example creates a connection pool with at most two connections. Then two sessions are fetched from the pool and their connection IDs are printed. A third session will be requested before one of the original sessions is returned to the pool. Finally, a session is reused and its connection ID is printed.

import mysqlx
from datetime import datetime

cnxid_sql = "SELECT CONNECTION_ID() AS ConnectionID"
fmt_id = "Connection {0} ID ..........................: {1}"

connect_args = {
    "host": "127.0.0.1",
    "port": 33060,
    "user": "pyuser",
    "password": "Py@pp4Demo",
}

client_options = {
    "pooling": {
        "enabled": True,
        "max_size": 2,
        "max_idle_time": 60000,
        "queue_timeout": 3000,
    }
}

# Create the connection pool
pool = mysqlx.get_client(connect_args, client_options)

# Fetch two connections (exhausting the pool)
# and get the connection ID for each
connection1 = pool.get_session()
id1_row = connection1.sql(cnxid_sql).execute().fetch_one()
print(fmt_id.format(1, id1_row["ConnectionID"]))
connection2 = pool.get_session()
id2_row = connection2.sql(cnxid_sql).execute().fetch_one()
print(fmt_id.format(2, id2_row["ConnectionID"]))

# Attempt to get a third connection
time = datetime.now().strftime('%H:%M:%S')
print("Starting to request connection 3 .........: {0}".format(time))
try:
    connection3 = pool.get_session()
except mysqlx.errors.PoolError as err:
    print("Unable to fetch connection 3 .............: {0}".format(err))
time = datetime.now().strftime('%H:%M:%S')
print("Request for connection 3 completed .......: {0}".format(time))

# Return connection 1 to the pool
connection1.close()

# Try to get connection 3 again
connection3 = pool.get_session()
id3_row = connection3.sql(cnxid_sql).execute().fetch_one()
print(fmt_id.format(3, id3_row["ConnectionID"]))

# Close all connections
pool.close()

The first thing to notice is the client options defined in the client_options dictionary. In this case all four options are set, but you only need to set those where you do not want the default value. The settings allow for at most two connections in the pool, a request for a session may block for at most 3 seconds, and idle sessions are disconnected after 60 seconds.

Next, the connection pool (client) is created with get_client(), and subsequently two sessions are fetched from the pool. When a third session is requested, it triggers a PoolError exception as the pool is exhausted. The try/except block around the third get_session() call shows how to handle the exception.

Finally, the first connection is returned to the pool, which allows the third request to complete.

An example of the output is (the connection IDs and timestamps will differ from execution to execution):

Connection 1 ID ..........................: 239
Connection 2 ID ..........................: 240
Starting to request connection 3 .........: 18:23:14
Unable to fetch connection 3 .............: pool max size has been reached
Request for connection 3 completed .......: 18:23:17
Connection 3 ID ..........................: 241

From the output you can see that the first attempt to fetch connection 3 takes three seconds before it times out and raises the exception – just as specified by the queue_timeout setting.

What may surprise you (at least if you have studied Chapter 5 of MySQL Connector/Python Revealed) is that once connection 1 has been returned to the pool and connection 3 fetches the session again, it has a new connection ID. Does that mean the pool is not working? No, the pool is working alright. However, the X Plugin (the plugin in MySQL Server that handles connections using the X Protocol) works differently from the connection handling for the traditional MySQL protocol.

The X Plugin distinguishes between the connection to the application and the thread inside MySQL. So, when the session is returned to the pool and reset (to set the session variables back to the defaults and remove user variables), the thread inside MySQL is removed. As MySQL uses threads, it is cheap to create a new thread when it is needed, so this is not a performance problem. However, the connection to the application is maintained. This means you save the expensive steps of creating the connection and authenticating, while the thread only exists inside MySQL while the session is out of the pool.

If you are interested in learning more about MySQL Connector/Python 8, including how to use the X DevAPI, I am the author of MySQL Connector/Python Revealed (Apress). It is available from Apress, Amazon, and other bookstores.

Tracking Foreign Keys

The other day, I was reading a blog by Magnus Hagander about tracking foreign keys throughout a schema in PostgreSQL. I thought it was a good idea, so I decided to look at how you can track foreign keys in MySQL.

MySQL Workbench EER Diagram

The way I decided to do it was to start out with a table, then find all tables referencing it by a foreign key. From this basic (and, as will be shown, simple) query, it is possible to create a chain of relations. The key table for the queries is information_schema.KEY_COLUMN_USAGE, which has information about all the foreign key relationships.

However, before getting that far, it is necessary to have some tables to work with.

Sample Schema

A small schema, but with relatively complex foreign key relations, can be created with the following statements:

DROP SCHEMA IF EXISTS db1;
CREATE SCHEMA db1;

CREATE TABLE db1.a (
   a_id int,
   PRIMARY KEY (a_id)
);

CREATE TABLE db1.b (
   b_id1 int,
   b_id2 int,
   a_id int,
   PRIMARY KEY (b_id1, b_id2),
   FOREIGN KEY (a_id) REFERENCES a (a_id)
);

CREATE TABLE db1.c (
   c_id int,
   b_id1 int,
   b_id2 int,
   PRIMARY KEY (c_id),
   FOREIGN KEY (b_id1, b_id2) REFERENCES b (b_id1, b_id2)
);

CREATE TABLE db1.d (
   d_id int,
   a_id int,
   c_id int,
   PRIMARY KEY (d_id),
   FOREIGN KEY (a_id) REFERENCES a (a_id),
   FOREIGN KEY (c_id) REFERENCES c (c_id)
);

CREATE TABLE db1.e (
   e_id int,
   a_id int,
   PRIMARY KEY (e_id),
   FOREIGN KEY (a_id) REFERENCES a (a_id)
);

CREATE TABLE db1.f (
   f_id int,
   b_id1 int,
   b_id2 int,
   c_id int,
   PRIMARY KEY (f_id),
   FOREIGN KEY (b_id1, b_id2)
   REFERENCES b (b_id1, b_id2),
   FOREIGN KEY (c_id) REFERENCES c (c_id)
);

The base table is a. There are three tables, b, d, and e, with a direct foreign key to table a. Tables c and f in turn reference table b through a two-column foreign key, and tables d and f reference table c. So there are quite a few paths to get to table a from another table. Before looking at how the relationships can be found from information_schema.KEY_COLUMN_USAGE, it is worth creating a visual representation of the schema.

MySQL Workbench EER Diagrams

A long-standing feature of MySQL Workbench is its ability to create enhanced entity–relationship (EER) diagrams. An EER diagram shows a box with information about the columns and indexes of each table in a schema. Additionally, lines connect tables related by foreign keys. So, an EER diagram includes what we are looking for – the chain of foreign keys.

You can create an EER diagram by clicking on Database and then choosing Reverse Engineer… from the menu in MySQL Workbench:

Choose Reverse Engineer in the MySQL Workbench menu.

Alternatively use CTRL + R. You can do this from the homepage or from a database connection. Either way, you are taken to the connection options screen where you can choose an already defined connection or create a new one – this works the same as when you connect to a MySQL instance to execute queries:

Specify the connection options to connect to the MySQL instance with the schema for the EER diagram.

When you continue, MySQL Workbench will connect to MySQL Server and get a list of the schemas available. Once you get to the Select Schemas page, you can choose the schema or schemas you want to create the EER diagram for. In this case choose the db1 schema (unless you created the tables in another schema):

Choose the schemas to import.

For this example, you can use the defaults for the rest of the screens. On the Select Objects screen, you can optionally choose to select a subset of tables for the diagram. On the same screen, you choose whether you want to place the imported objects into a diagram (enabled by default); you want to do that for this example.

Tip

If MySQL Workbench crashes when creating the diagram, try opening Edit → Configuration… → Modelling in the menu and checking the Force use of software based rendering for EER diagrams option.

At the end, you have the diagram. You can move the tables around to place them as you like. One example of the diagram is:

MySQL Workbench EER Diagram

This makes it easy to see the relations between the tables.

But what do you do if you want to analyze the relationships in a program or otherwise need the relationships in a text format? Let’s look at that.

Querying the Foreign Key Relationship

As mentioned, the base table for looking at foreign key relations is the information_schema.KEY_COLUMN_USAGE table. It has the following definition:

mysql> DESC information_schema.KEY_COLUMN_USAGE;
+-------------------------------+------------------+------+-----+---------+-------+
| Field                         | Type             | Null | Key | Default | Extra |
+-------------------------------+------------------+------+-----+---------+-------+
| CONSTRAINT_CATALOG            | varchar(64)      | YES  |     | NULL    |       |
| CONSTRAINT_SCHEMA             | varchar(64)      | YES  |     | NULL    |       |
| CONSTRAINT_NAME               | varchar(64)      | NO   |     | NULL    |       |
| TABLE_CATALOG                 | varchar(64)      | YES  |     | NULL    |       |
| TABLE_SCHEMA                  | varchar(64)      | YES  |     | NULL    |       |
| TABLE_NAME                    | varchar(64)      | YES  |     | NULL    |       |
| COLUMN_NAME                   | varchar(64)      | YES  |     | NULL    |       |
| ORDINAL_POSITION              | int(10) unsigned | NO   |     | NULL    |       |
| POSITION_IN_UNIQUE_CONSTRAINT | binary(0)        | YES  |     | NULL    |       |
| REFERENCED_TABLE_SCHEMA       | binary(0)        | YES  |     | NULL    |       |
| REFERENCED_TABLE_NAME         | binary(0)        | YES  |     | NULL    |       |
| REFERENCED_COLUMN_NAME        | binary(0)        | YES  |     | NULL    |       |
+-------------------------------+------------------+------+-----+---------+-------+
12 rows in set (0.01 sec)

In MySQL 8.0 this is a view on the new data dictionary, so querying it is effectively a plain InnoDB query and is fast. In MySQL 5.7 and earlier, querying it requires opening every table, which can be slow. If you have many tables and they are not yet cached in the table caches, querying KEY_COLUMN_USAGE can be slow and impact the general performance.

Basic Query – Single Column per Foreign Key

The three columns prefixed with REFERENCED_ contain the information about a foreign key. For example, for the tables used in this blog, if you want to know which tables have a direct foreign key to table a in the db1 schema, you can query KEY_COLUMN_USAGE with a WHERE clause on REFERENCED_TABLE_SCHEMA and REFERENCED_TABLE_NAME like:

mysql> SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME,
              REFERENCED_TABLE_SCHEMA, REFERENCED_TABLE_NAME, REFERENCED_COLUMN_NAME
         FROM information_schema.KEY_COLUMN_USAGE
        WHERE REFERENCED_TABLE_SCHEMA = 'db1'
              AND REFERENCED_TABLE_NAME = 'a';
+--------------+------------+-------------+-------------------------+-----------------------+------------------------+
| TABLE_SCHEMA | TABLE_NAME | COLUMN_NAME | REFERENCED_TABLE_SCHEMA | REFERENCED_TABLE_NAME | REFERENCED_COLUMN_NAME |
+--------------+------------+-------------+-------------------------+-----------------------+------------------------+
| db1          | b          | a_id        | db1                     | a                     | a_id                   |
| db1          | d          | a_id        | db1                     | a                     | a_id                   |
| db1          | e          | a_id        | db1                     | a                     | a_id                   |
+--------------+------------+-------------+-------------------------+-----------------------+------------------------+
3 rows in set (0.01 sec)

So, the tables b, d, and e have a foreign key to a_id in the db1.a table, and in each of the three tables the referencing column is also called a_id. This is just as expected.

The query works great for finding the immediate relations where the foreign key only includes a single column. However, KEY_COLUMN_USAGE has one row per column in the foreign key, so for a two-column foreign key there will be two rows for each referencing table. So what to do?

Basic Query – Multiple Columns per Foreign Key

To avoid having one row per column in a multi-column foreign key, you need to perform an aggregation. You can for example use GROUP_CONCAT() to generate a single value combining the column names. In MySQL 8.0, you can also consider creating a JSON array by using the JSON_ARRAYAGG() function:

mysql> SELECT TABLE_SCHEMA, TABLE_NAME,
              JSON_ARRAYAGG(COLUMN_NAME) ColumnNames
         FROM information_schema.KEY_COLUMN_USAGE
        WHERE REFERENCED_TABLE_SCHEMA = 'db1'
              AND REFERENCED_TABLE_NAME = 'b'
        GROUP BY TABLE_SCHEMA, TABLE_NAME;
+--------------+------------+--------------------+
| TABLE_SCHEMA | TABLE_NAME | ColumnNames        |
+--------------+------------+--------------------+
| db1          | c          | ["b_id1", "b_id2"] |
| db1          | f          | ["b_id1", "b_id2"] |
+--------------+------------+--------------------+
2 rows in set (0.01 sec)

This queries the foreign keys to the b table. The c and f tables have a foreign key using the b_id1 and b_id2 columns.

This query result also means that the c and f tables are related to the a table through the b table. Would it not be great if there was a single query that could provide the foreign key chains? Well, in MySQL 8 you can get this using a common table expression (CTE).

Querying Foreign Key Chains – Step by Step

Tip

If you are just interested in the final query, skip to the next subsection.

The query will use a recursive common table expression. This requires a seed query and a recursive query (which works on the rows generated in the previous iteration). A good seed query is similar to what we had for the basic query. However, to make it possible to aggregate all of the steps in the chain, the chain will be generated as a JSON array with each part of the chain being a JSON object. The seed query becomes:

mysql> SELECT 1 AS Level, TABLE_SCHEMA, TABLE_NAME,
              JSON_ARRAY(
                  JSON_OBJECT('Level', 1,
                              'TableSchema', TABLE_SCHEMA,
                              'TableName', TABLE_NAME,
                              'ColumnNames', JSON_ARRAYAGG(COLUMN_NAME)
                  )
              ) AS ReferenceChain
         FROM information_schema.KEY_COLUMN_USAGE
        WHERE REFERENCED_TABLE_SCHEMA = 'db1' AND REFERENCED_TABLE_NAME = 'a'
        GROUP BY TABLE_SCHEMA, TABLE_NAME;
+-------+--------------+------------+---------------------------------------------------------------------------------+
| Level | TABLE_SCHEMA | TABLE_NAME | ReferenceChain                                                                  |
+-------+--------------+------------+---------------------------------------------------------------------------------+
|     1 | db1          | b          | [{"Level": 1, "TableName": "b", "ColumnNames": ["a_id"], "TableSchema": "db1"}] |
|     1 | db1          | d          | [{"Level": 1, "TableName": "d", "ColumnNames": ["a_id"], "TableSchema": "db1"}] |
|     1 | db1          | e          | [{"Level": 1, "TableName": "e", "ColumnNames": ["a_id"], "TableSchema": "db1"}] |
+-------+--------------+------------+---------------------------------------------------------------------------------+
3 rows in set (0.01 sec)

Now, you can take each of these relations and look for tables having a foreign key to them, and so forth. That is the recursive part of the query. There is one complication though: GROUP BY is not allowed in the recursive part. The workaround is to use a subquery:

mysql> WITH RECURSIVE `reference` (Level, TABLE_SCHEMA, TABLE_NAME, ReferenceChain) AS (
            SELECT 1, TABLE_SCHEMA, TABLE_NAME,
                   JSON_ARRAY(
                        JSON_OBJECT('Level', 1,
                                    'TableSchema', TABLE_SCHEMA,
                                    'TableName', TABLE_NAME,
                                    'ColumnNames', JSON_ARRAYAGG(COLUMN_NAME)
                        )
                   )
              FROM information_schema.KEY_COLUMN_USAGE
             WHERE REFERENCED_TABLE_SCHEMA = 'db1' AND REFERENCED_TABLE_NAME = 'a'
             GROUP BY TABLE_SCHEMA, TABLE_NAME
             UNION
            SELECT r.Level+1, ko.TABLE_SCHEMA, ko.TABLE_NAME,
                   JSON_ARRAY_APPEND(
                       ReferenceChain, '$',
                       (SELECT JSON_OBJECT('Level', r.Level+1,
                                           'TableSchema', TABLE_SCHEMA,
                                           'TableName', TABLE_NAME,
                                           'ColumnNames', JSON_ARRAYAGG(COLUMN_NAME)
                               )
                          FROM information_schema.KEY_COLUMN_USAGE ki
                         WHERE ki.TABLE_SCHEMA = ko.TABLE_SCHEMA
                               AND TABLE_NAME = ko.TABLE_NAME
                               AND ki.REFERENCED_TABLE_SCHEMA = ko.REFERENCED_TABLE_SCHEMA
                               AND ki.REFERENCED_TABLE_NAME = ko.REFERENCED_TABLE_NAME)
                   )
              FROM `reference` r
                   INNER JOIN information_schema.KEY_COLUMN_USAGE ko ON
                         ko.REFERENCED_TABLE_SCHEMA = r.TABLE_SCHEMA
                         AND ko.REFERENCED_TABLE_NAME = r.TABLE_NAME
       )
       SELECT Level, TABLE_SCHEMA, TABLE_NAME,
              JSON_PRETTY(ReferenceChain) AS ReferenceChain
         FROM `reference`\G
*************************** 1. row ***************************
         Level: 1
  TABLE_SCHEMA: db1
    TABLE_NAME: b
ReferenceChain: [
  {
    "Level": 1,
    "TableName": "b",
    "ColumnNames": [
      "a_id"
    ],
    "TableSchema": "db1"
  }
]
*************************** 2. row ***************************
         Level: 1
  TABLE_SCHEMA: db1
    TABLE_NAME: d
ReferenceChain: [
  {
    "Level": 1,
    "TableName": "d",
    "ColumnNames": [
      "a_id"
    ],
    "TableSchema": "db1"
  }
]
*************************** 3. row ***************************
         Level: 1
  TABLE_SCHEMA: db1
    TABLE_NAME: e
ReferenceChain: [
  {
    "Level": 1,
    "TableName": "e",
    "ColumnNames": [
      "a_id"
    ],
    "TableSchema": "db1"
  }
]
*************************** 4. row ***************************
         Level: 2
  TABLE_SCHEMA: db1
    TABLE_NAME: c
ReferenceChain: [
  {
    "Level": 1,
    "TableName": "b",
    "ColumnNames": [
      "a_id"
    ],
    "TableSchema": "db1"
  },
  {
    "Level": 2,
    "TableName": "c",
    "ColumnNames": [
      "b_id1",
      "b_id2"
    ],
    "TableSchema": "db1"
  }
]
*************************** 5. row ***************************
         Level: 2
  TABLE_SCHEMA: db1
    TABLE_NAME: f
ReferenceChain: [
  {
    "Level": 1,
    "TableName": "b",
    "ColumnNames": [
      "a_id"
    ],
    "TableSchema": "db1"
  },
  {
    "Level": 2,
    "TableName": "f",
    "ColumnNames": [
      "b_id1",
      "b_id2"
    ],
    "TableSchema": "db1"
  }
]
*************************** 6. row ***************************
         Level: 3
  TABLE_SCHEMA: db1
    TABLE_NAME: d
ReferenceChain: [
  {
    "Level": 1,
    "TableName": "b",
    "ColumnNames": [
      "a_id"
    ],
    "TableSchema": "db1"
  },
  {
    "Level": 2,
    "TableName": "c",
    "ColumnNames": [
      "b_id1",
      "b_id2"
    ],
    "TableSchema": "db1"
  },
  {
    "Level": 3,
    "TableName": "d",
    "ColumnNames": [
      "c_id"
    ],
    "TableSchema": "db1"
  }
]
*************************** 7. row ***************************
         Level: 3
  TABLE_SCHEMA: db1
    TABLE_NAME: f
ReferenceChain: [
  {
    "Level": 1,
    "TableName": "b",
    "ColumnNames": [
      "a_id"
    ],
    "TableSchema": "db1"
  },
  {
    "Level": 2,
    "TableName": "c",
    "ColumnNames": [
      "b_id1",
      "b_id2"
    ],
    "TableSchema": "db1"
  },
  {
    "Level": 3,
    "TableName": "f",
    "ColumnNames": [
      "c_id"
    ],
    "TableSchema": "db1"
  }
]
7 rows in set (0.02 sec)

Here the JSON_ARRAY_APPEND() function is used to add the next part of the chain to ReferenceChain. The query relies on the fact that UNION is a UNION DISTINCT by default, so for the cases where there are two columns in the foreign key, the second (duplicate) row is automatically filtered out. For the main query, JSON_PRETTY() is used to make it easier to read the JSON document. If you are using the query in an application, this is not needed.

You can stop here. The result is correct. However, there may be more rows than you would expect. For example, the chain a → b is there on its own (first row) even though there are also tables with foreign keys to b. If you want to include subchains in the result, you are all set. If you want to filter out chains that are part of another chain, a little more work is needed.

To filter out chains that are also included in subsequent rows, it is in one way or another necessary to keep track of whether a row has any child rows (i.e. whether a subsequent row is generated based on the row). One way to do this is to have a serialized form of the chain; however, the disadvantage is that you do not know how long a string is needed to store it (and the string length must be specified in the seed query). Another option is to generate an ID for each row – for example using the UUID() function – and have each row generated from it reference the parent row’s ID. This is the option used here.

A disadvantage of this approach is that for tables with more than one column in the foreign key, the two rows generated are no longer identical, so it is necessary to handle this in the main query. However, it is now easy to only include the ends of the chains, as these will not have another row with the parent ID set to the row’s ID. To find them, use a LEFT OUTER JOIN and look for rows where the joined row has a NULL ID (that is, no child row was found).

Final Query

The final query thus becomes:

mysql> WITH RECURSIVE `reference` (ID, ParentID, Level,
                                   TABLE_SCHEMA, TABLE_NAME,
                                   ReferenceChain) AS (
            SELECT UUID(), CAST(NULL AS CHAR(36)), 1,
                   TABLE_SCHEMA, TABLE_NAME,
                   JSON_ARRAY(
                       JSON_OBJECT('Level', 1,
                                   'TableSchema', TABLE_SCHEMA,
                                   'TableName', TABLE_NAME,
                                   'ColumnNames', JSON_ARRAYAGG(COLUMN_NAME)
                       )
                   )
              FROM information_schema.KEY_COLUMN_USAGE
             WHERE REFERENCED_TABLE_SCHEMA = 'db1'
                   AND REFERENCED_TABLE_NAME = 'a'
             GROUP BY TABLE_SCHEMA, TABLE_NAME
             UNION
            SELECT UUID(), r.ID, r.Level+1,
                   ko.TABLE_SCHEMA, ko.TABLE_NAME,
                   JSON_ARRAY_APPEND(
                       ReferenceChain, '$',
                       (SELECT JSON_OBJECT('Level', r.Level+1,
                                           'TableSchema', TABLE_SCHEMA,
                                           'TableName', TABLE_NAME,
                                           'ColumnNames', JSON_ARRAYAGG(COLUMN_NAME)
                               )
                          FROM information_schema.KEY_COLUMN_USAGE ki
                         WHERE ki.TABLE_SCHEMA = ko.TABLE_SCHEMA
                               AND TABLE_NAME = ko.TABLE_NAME
                               AND ki.REFERENCED_TABLE_SCHEMA = ko.REFERENCED_TABLE_SCHEMA
                               AND ki.REFERENCED_TABLE_NAME = ko.REFERENCED_TABLE_NAME)
                   )
              FROM `reference` r
                   INNER JOIN information_schema.KEY_COLUMN_USAGE ko ON
                         ko.REFERENCED_TABLE_SCHEMA = r.TABLE_SCHEMA
                         AND ko.REFERENCED_TABLE_NAME = r.TABLE_NAME
       )
       SELECT DISTINCT r.Level AS NumLevels,
              JSON_PRETTY(r.ReferenceChain) AS ReferenceChain
         FROM `reference` r
              LEFT OUTER JOIN `reference` rc ON rc.ParentID = r.ID
        WHERE rc.ID IS NULL\G
*************************** 1. row ***************************
     NumLevels: 1
ReferenceChain: [
  {
    "Level": 1,
    "TableName": "d",
    "ColumnNames": [
      "a_id"
    ],
    "TableSchema": "db1"
  }
]
*************************** 2. row ***************************
     NumLevels: 1
ReferenceChain: [
  {
    "Level": 1,
    "TableName": "e",
    "ColumnNames": [
      "a_id"
    ],
    "TableSchema": "db1"
  }
]
*************************** 3. row ***************************
     NumLevels: 2
ReferenceChain: [
  {
    "Level": 1,
    "TableName": "b",
    "ColumnNames": [
      "a_id"
    ],
    "TableSchema": "db1"
  },
  {
    "Level": 2,
    "TableName": "f",
    "ColumnNames": [
      "b_id1",
      "b_id2"
    ],
    "TableSchema": "db1"
  }
]
*************************** 4. row ***************************
     NumLevels: 3
ReferenceChain: [
  {
    "Level": 1,
    "TableName": "b",
    "ColumnNames": [
      "a_id"
    ],
    "TableSchema": "db1"
  },
  {
    "Level": 2,
    "TableName": "c",
    "ColumnNames": [
      "b_id1",
      "b_id2"
    ],
    "TableSchema": "db1"
  },
  {
    "Level": 3,
    "TableName": "d",
    "ColumnNames": [
      "c_id"
    ],
    "TableSchema": "db1"
  }
]
*************************** 5. row ***************************
     NumLevels: 3
ReferenceChain: [
  {
    "Level": 1,
    "TableName": "b",
    "ColumnNames": [
      "a_id"
    ],
    "TableSchema": "db1"
  },
  {
    "Level": 2,
    "TableName": "c",
    "ColumnNames": [
      "b_id1",
      "b_id2"
    ],
    "TableSchema": "db1"
  },
  {
    "Level": 3,
    "TableName": "f",
    "ColumnNames": [
      "c_id"
    ],
    "TableSchema": "db1"
  }
]
5 rows in set (0.03 sec)

The DISTINCT in the main part of the query ensures that duplicates due to multiple columns in the foreign key are filtered out.

Note

One thing this version of the query does not handle is circular key relations. For example, if you add a c_id column to a with a foreign key to the c table, an infinite number of chains will be created. So, there needs to be a condition that detects when a loop is being created. That is an exercise for the reader – or for a later blog.

Thus, this schema has five unique chains leading to the a table. You can also verify this from the EER diagram – for reference, here it is again:

MySQL Workbench EER Diagram

Slides and Workbooks From Oracle OpenWorld & CodeOne

First of all, thanks to everyone who attended my sessions at the recent Oracle OpenWorld and Code One in San Francisco. It was a great privilege to be allowed to make a number of presentations.

All of the workbooks and scripts from the hands-on labs (HOL) and the slides from the talks have been made available in OpenWorld’s and Code One’s session catalogues. You can download the files by searching the OpenWorld catalog for my name or the session number (see below). Click on the download icon for each of the presentations you are interested in.

Click on the download icon in the Oracle OpenWorld session catalog to download the presentations.

For the hands-on labs, the downloadable file includes the workbook as well as the scripts related to the exercises. The workbook contains the instructions for setting up the system used for the exercises as well as the exercises themselves and some additional background information. For the talks, the download consists of a PDF version of the slides.

The three hands-on labs and three talks I had were:

  • DEV5957: Develop Python Applications with MySQL Connector/Python
    This covered MySQL Connector/Python in general from installation to best practices. The talk focused on using the connector with SQL tables using both SQL statements and NoSQL CRUD methods. If you are interested in how I installed MySQL Connector/Python on an iPad (the screenshot on the right-hand side of the slide showing the pip command), see my previous blog about installing MySQL Connector/Python in Pythonista 3.
  • DEV5959: Python and the MySQL Document Store
    This covered how to use MySQL Connector/Python (and a little of MySQL Shell in Python mode) with the MySQL JSON Document Store using the NoSQL API (the X DevAPI).
  • HOL1703: A Practical Introduction to the MySQL Document Store
    This hands-on lab used MySQL Shell in Python mode to work with the MySQL Document Store, including an introduction to the CRUD methods. The lab also included a comparison of implementing the same X DevAPI program using Python, Node.js, and Java.
  • HOL1706: Developing Modern Applications with the MySQL Document Store and NodeJS
    This lab was similar to HOL1703 except it mainly used Node.js programs to work with the MySQL Document Store.
  • HOL2986: Using MySQL Common Table Expressions and Window Functions
    An introduction to the new MySQL 8.0 query features: common table expressions (CTEs) and the window functions that can be used for analytic queries.
  • THT6703: NoSQL + SQL = MySQL
    A brief introduction to MySQL, MySQL 8, and how you can use it both as a NoSQL document store and a traditional SQL database.

Enjoy.

MySQL Server 8.0.13: Thanks for the 10 Facebook and Community Contributions

MySQL 8.0.13 was released this week. There are several exciting changes including functional indexes and using general expressions as the default value for your columns. So, I recommend you get MySQL 8.0.13 installed and try out the new features. You can read about the changes in the release notes section of the MySQL documentation and in Geir’s release blog.

Thank you for the contributions to MySQL 8.0.13

However, what I would like to focus on in this blog is the external contributions that have been included in this release. There are five patches contributed by Facebook as well as five contributions from other MySQL users.

The patches contributed by Facebook are:

  • The MySQL client library now returns better error messages for OpenSSL errors. (Bug #27855668, Bug #90418)
  • The optimizer now supports a Skip Scan access method that enables range access to be used in previously inapplicable situations to improve query performance. For more information, see Skip Scan Range Access Method. (Bug #26976512, Bug #88103)
  • A new Performance Schema stage, waiting for handler commit, is available to detect threads going through transaction commit. (Bug #27855592, Bug #90417)
  • For mysqldump --tables output, file names now always include a .txt or .sql suffix, even for file names that already contain a dot. Thanks to Facebook for the contribution. (Bug #28380961, Bug #91745)
  • Failure to create a temporary table during a MyISAM query could cause a server exit. Thanks to Facebook for the patch. (Bug #27724519, Bug #90145)

Other contributions are:

  • Previously, file I/O performed in the I/O cache in the mysys library was not instrumented, affecting in particular file I/O statistics reported by the Performance Schema about the binary log index file. Now, this I/O is instrumented and Performance Schema statistics are accurate. Thanks to Yura Sorokin for the contribution. (Bug #27788907, Bug #90264)
  • Performance for locating user account entries in the in-memory privilege structures has been improved. Thanks to Eric Herman for the contribution. (Bug #27772506, Bug #90244)
  • InnoDB: A helper class was introduced to improve performance associated with reading from secondary keys when there are multiple versions of the same row. Thanks to Domas Mituzas for the contribution. (Bug #25540277, Bug #84958)
  • Replication: When the binlog_group_commit_sync_delay system variable is set to a wait time to delay synchronization of transactions to disk, and the binlog_group_commit_sync_no_delay_count system variable is also set to a number of transactions, the MySQL server exits the wait procedure if the specified number of transactions is reached before the specified wait time is reached. The server manages this process by checking on the transaction count after a delta of one tenth of the time specified by binlog_group_commit_sync_delay has elapsed, then subtracting that interval from the remaining wait time. If rounding during calculation of the delta meant that the wait time was not a multiple of the delta, the final subtraction of the delta from the remaining wait time would cause the value to be negative, and therefore to wrap to the maximum wait time, making the commit hang. The data type for the remaining wait time has now been changed so that the value does not wrap in this situation, and the commit can proceed when the original wait time has elapsed. Thanks to Yan Huang for the contribution. (Bug #28091735, Bug #91055)
  • Replication: In code for replication slave reporting, a rare error situation raised an assertion in debug builds, but in release builds, returned leaving a mutex locked. The mutex is now unlocked before returning in this situation. Thanks to Zsolt Parragi for the patch. (Bug #27448019, Bug #89421)

Thanks a lot for the contributions.

MySQL 8: Performance Schema Digests Improvements

Since MySQL 5.6, the digest feature of the MySQL Performance Schema has provided a convenient and effective way to obtain statistics of queries based on their normalized form. The feature works so well that it has almost completely (from my experience) replaced the connector extensions and proxy for collecting query statistics for the Query Analyzer (Quan) in MySQL Enterprise Monitor (MEM).

MySQL 8 adds further improvements to the digest feature in the Performance Schema including a sample query with statistics for each digest, percentile information, and a histogram summary. This blog will explore these new features.

The MySQL Enterprise Monitor Query Analyzer
MySQL Enterprise Monitor is one of the main users of the Performance Schema digests for its Query Analyzer.

Let’s start out looking at the good old summary by digest table.

Query Sample

The base table for digest summary information is the events_statements_summary_by_digest table. This has been around since MySQL 5.6. In MySQL 8.0 it has been extended with six columns, of which the three holding data related to a sample query will be examined in this section.

The three sample columns are:

  • QUERY_SAMPLE_TEXT: An actual example of a query.
  • QUERY_SAMPLE_SEEN: When the sample query was seen.
  • QUERY_SAMPLE_TIMER_WAIT: How long the sample query took to execute (in picoseconds).

As an example consider the query SELECT * FROM world.city WHERE id = <value>. The sample information for that query as well as the digest and digest text (normalized query) may look like:

mysql> SELECT DIGEST, DIGEST_TEXT, QUERY_SAMPLE_TEXT, QUERY_SAMPLE_SEEN,
              sys.format_time(QUERY_SAMPLE_TIMER_WAIT) AS SampleTimerWait
         FROM performance_schema.events_statements_summary_by_digest
        WHERE DIGEST_TEXT LIKE '%`world` . `city`%'\G
*************************** 1. row ***************************
           DIGEST: 9431aed9635923565d7bc92cc36d6411c3abb9f52d2c22715be21b5472e3c366
      DIGEST_TEXT: SELECT * FROM `world` . `city` WHERE `ID` = ?
QUERY_SAMPLE_TEXT: SELECT * FROM world.city WHERE ID = 130
QUERY_SAMPLE_SEEN: 2018-10-09 17:19:20.500944
  SampleTimerWait: 17.34 ms
1 row in set (0.00 sec)

There are a few things to note here:

  • The digest in MySQL 8 is an SHA-256 hash whereas in 5.6 and 5.7 it was an MD5 hash.
  • The digest text is similar to the normalized query that the mysqldumpslow script can generate for queries in the slow query log, except that the Performance Schema uses a question mark as the placeholder.
  • The QUERY_SAMPLE_SEEN value is in the system time zone.
  • The sys.format_time() function is used in the query to convert the picoseconds to a human readable value.

The maximum length of the sample text is set with the performance_schema_max_sql_text_length option. The default is 1024 bytes. It is the same option that is used for the SQL_TEXT columns in the statement events tables, and it requires a restart of MySQL to change the value. Since the query texts are stored in several contexts and some of the Performance Schema tables can have thousands of rows, do take care not to increase it beyond what you have memory for.

How is the sample query chosen? The sample is the slowest example of a query with the given digest. However, if the performance_schema_max_digest_sample_age option is set to a non-zero value (the default is 60 seconds) and the existing sample is older than the specified value, the sample will always be replaced.

The events_statements_summary_by_digest table also has another set of new columns: percentile information.

Percentile Information

Since the beginning, the events_statements_summary_by_digest table has included some statistical information about the query times for a given digest: the minimum, average, maximum, and total query time. In MySQL 8 this has been extended with information about the 95th, 99th, and 99.9th percentiles. The information is available in the QUANTILE_95, QUANTILE_99, and QUANTILE_999 columns, respectively. All of the values are in picoseconds.

What do the new columns mean? Based on the histogram information for the query (see the next section), MySQL calculates a high estimate of the query time. For a given digest, 95% of the executed queries are expected to be faster than the query time given by QUANTILE_95. The same applies to the other two columns.

As an example consider the same digest as before:

mysql> SELECT DIGEST, DIGEST_TEXT, QUERY_SAMPLE_TEXT,
              sys.format_time(SUM_TIMER_WAIT) AS SumTimerWait,
              sys.format_time(MIN_TIMER_WAIT) AS MinTimerWait,
              sys.format_time(AVG_TIMER_WAIT) AS AvgTimerWait,
              sys.format_time(MAX_TIMER_WAIT) AS MaxTimerWait,
              sys.format_time(QUANTILE_95) AS Quantile95,
              sys.format_time(QUANTILE_99) AS Quantile99,
              sys.format_time(QUANTILE_999) AS Quantile999
         FROM performance_schema.events_statements_summary_by_digest
        WHERE DIGEST_TEXT LIKE '%`world` . `city`%'\G
*************************** 1. row ***************************
           DIGEST: 9431aed9635923565d7bc92cc36d6411c3abb9f52d2c22715be21b5472e3c366
      DIGEST_TEXT: SELECT * FROM `world` . `city` WHERE `ID` = ?
QUERY_SAMPLE_TEXT: SELECT * FROM world.city WHERE ID = 130
     SumTimerWait: 692.77 ms
     MinTimerWait: 50.32 us
     AvgTimerWait: 68.92 us
     MaxTimerWait: 17.34 ms
       Quantile95: 104.71 us
       Quantile99: 165.96 us
      Quantile999: 363.08 us
1 row in set (0.00 sec)

Having the 95th, 99th, and 99.9th percentiles helps predict the performance of a query and shows the spread of the query times. Even more information about the spread can be found using the new family member: histograms.

Histograms

Histograms are a way to put the query execution times into buckets, so it is possible to see how the query execution times are spread. This can for example be useful to see how consistent the query times are. The average query time may be fine, but if that average is based on some queries executing super fast and others very slowly, it will still result in unhappy users and customers.

The MAX_TIMER_WAIT column of the events_statements_summary_by_digest table discussed so far shows the high watermark, but it does not say whether it is a single outlier or the result of generally varying query times. The histograms give the answer to this.

Using the query digest from earlier in the blog, the histogram information for the query can be found in the events_statements_histogram_by_digest table like:

mysql> SELECT BUCKET_NUMBER,
              sys.format_time(BUCKET_TIMER_LOW) AS TimerLow,
              sys.format_time(BUCKET_TIMER_HIGH) AS TimerHigh,
              COUNT_BUCKET, COUNT_BUCKET_AND_LOWER, BUCKET_QUANTILE
         FROM performance_schema.events_statements_histogram_by_digest
        WHERE DIGEST = '9431aed9635923565d7bc92cc36d6411c3abb9f52d2c22715be21b5472e3c366'
              AND COUNT_BUCKET > 0
        ORDER BY BUCKET_NUMBER;
+---------------+-----------+-----------+--------------+------------------------+-----------------+
| BUCKET_NUMBER | TimerLow  | TimerHigh | COUNT_BUCKET | COUNT_BUCKET_AND_LOWER | BUCKET_QUANTILE |
+---------------+-----------+-----------+--------------+------------------------+-----------------+
|            36 | 50.12 us  | 52.48 us  |          524 |                    524 |        0.052400 |
|            37 | 52.48 us  | 54.95 us  |         2641 |                   3165 |        0.316500 |
|            38 | 54.95 us  | 57.54 us  |          310 |                   3475 |        0.347500 |
|            39 | 57.54 us  | 60.26 us  |          105 |                   3580 |        0.358000 |
|            40 | 60.26 us  | 63.10 us  |           48 |                   3628 |        0.362800 |
|            41 | 63.10 us  | 66.07 us  |         3694 |                   7322 |        0.732200 |
|            42 | 66.07 us  | 69.18 us  |          611 |                   7933 |        0.793300 |
|            43 | 69.18 us  | 72.44 us  |          236 |                   8169 |        0.816900 |
|            44 | 72.44 us  | 75.86 us  |          207 |                   8376 |        0.837600 |
|            45 | 75.86 us  | 79.43 us  |          177 |                   8553 |        0.855300 |
|            46 | 79.43 us  | 83.18 us  |          236 |                   8789 |        0.878900 |
|            47 | 83.18 us  | 87.10 us  |          186 |                   8975 |        0.897500 |
|            48 | 87.10 us  | 91.20 us  |          203 |                   9178 |        0.917800 |
|            49 | 91.20 us  | 95.50 us  |          116 |                   9294 |        0.929400 |
|            50 | 95.50 us  | 100.00 us |          135 |                   9429 |        0.942900 |
|            51 | 100.00 us | 104.71 us |          105 |                   9534 |        0.953400 |
|            52 | 104.71 us | 109.65 us |           65 |                   9599 |        0.959900 |
|            53 | 109.65 us | 114.82 us |           65 |                   9664 |        0.966400 |
|            54 | 114.82 us | 120.23 us |           59 |                   9723 |        0.972300 |
|            55 | 120.23 us | 125.89 us |           40 |                   9763 |        0.976300 |
|            56 | 125.89 us | 131.83 us |           34 |                   9797 |        0.979700 |
|            57 | 131.83 us | 138.04 us |           33 |                   9830 |        0.983000 |
|            58 | 138.04 us | 144.54 us |           27 |                   9857 |        0.985700 |
|            59 | 144.54 us | 151.36 us |           16 |                   9873 |        0.987300 |
|            60 | 151.36 us | 158.49 us |           25 |                   9898 |        0.989800 |
|            61 | 158.49 us | 165.96 us |           20 |                   9918 |        0.991800 |
|            62 | 165.96 us | 173.78 us |            9 |                   9927 |        0.992700 |
|            63 | 173.78 us | 181.97 us |           11 |                   9938 |        0.993800 |
|            64 | 181.97 us | 190.55 us |           11 |                   9949 |        0.994900 |
|            65 | 190.55 us | 199.53 us |            4 |                   9953 |        0.995300 |
|            66 | 199.53 us | 208.93 us |            6 |                   9959 |        0.995900 |
|            67 | 208.93 us | 218.78 us |            6 |                   9965 |        0.996500 |
|            68 | 218.78 us | 229.09 us |            6 |                   9971 |        0.997100 |
|            69 | 229.09 us | 239.88 us |            3 |                   9974 |        0.997400 |
|            70 | 239.88 us | 251.19 us |            2 |                   9976 |        0.997600 |
|            71 | 251.19 us | 263.03 us |            2 |                   9978 |        0.997800 |
|            72 | 263.03 us | 275.42 us |            2 |                   9980 |        0.998000 |
|            73 | 275.42 us | 288.40 us |            4 |                   9984 |        0.998400 |
|            74 | 288.40 us | 302.00 us |            2 |                   9986 |        0.998600 |
|            75 | 302.00 us | 316.23 us |            2 |                   9988 |        0.998800 |
|            76 | 316.23 us | 331.13 us |            1 |                   9989 |        0.998900 |
|            78 | 346.74 us | 363.08 us |            3 |                   9992 |        0.999200 |
|            79 | 363.08 us | 380.19 us |            2 |                   9994 |        0.999400 |
|            80 | 380.19 us | 398.11 us |            1 |                   9995 |        0.999500 |
|            83 | 436.52 us | 457.09 us |            1 |                   9996 |        0.999600 |
|           100 | 954.99 us | 1.00 ms   |            1 |                   9997 |        0.999700 |
|           101 | 1.00 ms   | 1.05 ms   |            1 |                   9998 |        0.999800 |
|           121 | 2.51 ms   | 2.63 ms   |            1 |                   9999 |        0.999900 |
|           162 | 16.60 ms  | 17.38 ms  |            1 |                  10000 |        1.000000 |
+---------------+-----------+-----------+--------------+------------------------+-----------------+
49 rows in set (0.02 sec)

In this example, the query was executed 3694 times (the COUNT_BUCKET column) with a query time between 63.10 microseconds and 66.07 microseconds, so the execution time matched the interval of bucket number 41. There has been a total of 7322 executions (the COUNT_BUCKET_AND_LOWER column) of the query with a query time of 66.07 microseconds or less. This means that 73.22% (the BUCKET_QUANTILE column) of the queries have a query time of 66.07 microseconds or less.

In addition to the shown columns, there are the SCHEMA_NAME and DIGEST columns (which together with BUCKET_NUMBER form a unique key). For each digest there are 450 buckets, with the width of each bucket (in terms of the difference between the low and high timers) gradually becoming larger. The first, middle, and last five buckets are:

mysql> SELECT BUCKET_NUMBER,
              sys.format_time(BUCKET_TIMER_LOW) AS TimerLow,
              sys.format_time(BUCKET_TIMER_HIGH) AS TimerHigh
         FROM performance_schema.events_statements_histogram_by_digest
        WHERE DIGEST = '9431aed9635923565d7bc92cc36d6411c3abb9f52d2c22715be21b5472e3c366'
              AND (BUCKET_NUMBER < 5 OR BUCKET_NUMBER > 444 OR BUCKET_NUMBER BETWEEN 223 AND 227);
+---------------+-----------+-----------+
| BUCKET_NUMBER | TimerLow  | TimerHigh |
+---------------+-----------+-----------+
|             0 | 0 ps      | 10.00 us  |
|             1 | 10.00 us  | 10.47 us  |
|             2 | 10.47 us  | 10.96 us  |
|             3 | 10.96 us  | 11.48 us  |
|             4 | 11.48 us  | 12.02 us  |
|           223 | 275.42 ms | 288.40 ms |
|           224 | 288.40 ms | 302.00 ms |
|           225 | 302.00 ms | 316.23 ms |
|           226 | 316.23 ms | 331.13 ms |
|           227 | 331.13 ms | 346.74 ms |
|           445 | 2.11 h    | 2.21 h    |
|           446 | 2.21 h    | 2.31 h    |
|           447 | 2.31 h    | 2.42 h    |
|           448 | 2.42 h    | 2.53 h    |
|           449 | 2.53 h    | 30.50 w   |
+---------------+-----------+-----------+
15 rows in set (0.02 sec)

The bucket thresholds are fixed and thus the same for all digests. There is also a global histogram in the events_statements_histogram_global table.

This concludes the introduction to the new Performance Schema digest features. As monitoring tools start to use this information, it will help create a better monitoring experience. Particularly the histograms will benefit from being shown as graphs.