MySQL 8: Performance Schema Digests Improvements

Since MySQL 5.6, the digest feature of the MySQL Performance Schema has provided a convenient and effective way to obtain statistics of queries based on their normalized form. The feature works so well that it has almost completely (from my experience) replaced the connector extensions and proxy for collecting query statistics for the Query Analyzer (Quan) in MySQL Enterprise Monitor (MEM).

MySQL 8 adds further improvements to the digest feature in the Performance Schema including a sample query with statistics for each digest, percentile information, and a histogram summary. This blog will explore these new features.

The MySQL Enterprise Monitor Query Analyzer
MySQL Enterprise Monitor is one of the main users of the Performance Schema digests for its Query Analyzer.

Let’s start out looking at the good old summary by digest table.

Query Sample

The base table for digest summary information is the events_statements_summary_by_digest table, which has been around since MySQL 5.6. In MySQL 8.0 it has been extended with six columns, of which the three related to a sample query will be examined in this section.

The three sample columns are:

  • QUERY_SAMPLE_TEXT: An actual example of a query.
  • QUERY_SAMPLE_SEEN: When the sample query was seen.
  • QUERY_SAMPLE_TIMER_WAIT: How long the sample query took to execute (in picoseconds).

As an example consider the query SELECT * FROM world.city WHERE id = <value>. The sample information for that query as well as the digest and digest text (normalized query) may look like:
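A query along the following lines retrieves that information (the WHERE clause is just one way to single out the digest, and sys.format_time() is used to make the timer value human readable):

```sql
SELECT SCHEMA_NAME, DIGEST, DIGEST_TEXT,
       QUERY_SAMPLE_TEXT, QUERY_SAMPLE_SEEN,
       sys.format_time(QUERY_SAMPLE_TIMER_WAIT) AS sample_time
  FROM performance_schema.events_statements_summary_by_digest
 WHERE SCHEMA_NAME = 'world'
       AND DIGEST_TEXT LIKE '%`city`%`ID`%'\G
```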

There are a few things to note here:

  • The digest in MySQL 8 is a SHA-256 hash, whereas in 5.6 and 5.7 it was an MD5 hash.
  • The digest text is similar to the normalized query that the mysqldumpslow script can generate for queries in the slow query log; just that the Performance Schema uses a question mark as a placeholder.
  • The QUERY_SAMPLE_SEEN value is in the system time zone.
  • The sys.format_time() function is used in the query to convert the picoseconds to a human readable value.

The maximum length of the sample text is set with the performance_schema_max_sql_text_length option. The default is 1024 bytes. It is the same option that is used for the SQL_TEXT columns in the statement events tables. It requires a restart of MySQL to change the value. Since the query texts are stored in several contexts and some of the Performance Schema tables can have thousands of rows, do take care not to increase it beyond what you have memory for.

How is the sample query chosen? The sample is the slowest example of a query with the given digest. If the performance_schema_max_digest_sample_age option is set to a non-zero value (the default is 60 seconds) and the existing sample is older than the specified value, it will always be replaced.

The events_statements_summary_by_digest table also has another set of new columns: percentile information.

Percentile Information

Since the beginning, the events_statements_summary_by_digest table has included some statistical information about the query times for a given digest: the minimum, average, maximum, and total query time. In MySQL 8 this has been extended to include information about the 95th, 99th, and 99.9th percentile. The information is available in the QUANTILE_95, QUANTILE_99, and QUANTILE_999 columns, respectively. All of the values are in picoseconds.

What do the new columns mean? Based on the histogram information for the query (see the next section), MySQL calculates a high estimate of the query time. For a given digest, 95% of the executed queries are expected to be faster than the query time given by QUANTILE_95. Similarly for the other two columns.

As an example consider the same digest as before:
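For example, a query along these lines returns the percentile columns for the digest (again formatted with sys.format_time() for readability):

```sql
SELECT DIGEST_TEXT,
       sys.format_time(QUANTILE_95)  AS p95,
       sys.format_time(QUANTILE_99)  AS p99,
       sys.format_time(QUANTILE_999) AS p999
  FROM performance_schema.events_statements_summary_by_digest
 WHERE SCHEMA_NAME = 'world'
       AND DIGEST_TEXT LIKE '%`city`%`ID`%'\G
```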

Having the 95th, 99th, and 99.9th percentile helps predict the performance of a query and show the spread of the query times. Even more information about the spread can be found using the new family member: histograms.

Histograms

Histograms are a way to put the query execution times into buckets, so it is possible to see how the query execution times are spread. This can, for example, be useful to see how consistent the query times are. The average query time may be fine, but if that average is based on some queries executing super fast and others very slowly, it will still result in unhappy users and customers.

The MAX_TIMER_WAIT column of the events_statements_summary_by_digest table discussed so far shows the high watermark, but it does not say whether it is a single outlier or the result of generally varying query times. The histograms give the answer to this.

Using the query digest from earlier in the blog, the histogram information for the query can be found in the events_statements_histogram_by_digest table like:
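For example (with the digest stored in the @digest user variable and only the buckets that have been hit included):

```sql
SELECT BUCKET_NUMBER,
       sys.format_time(BUCKET_TIMER_LOW)  AS bucket_low,
       sys.format_time(BUCKET_TIMER_HIGH) AS bucket_high,
       COUNT_BUCKET, COUNT_BUCKET_AND_LOWER, BUCKET_QUANTILE
  FROM performance_schema.events_statements_histogram_by_digest
 WHERE SCHEMA_NAME = 'world'
       AND DIGEST = @digest
       AND COUNT_BUCKET > 0
 ORDER BY BUCKET_NUMBER;
```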

In this example, for 3694 executions (the COUNT_BUCKET column) the query time was between 63.10 microseconds and 66.07 microseconds, so the execution time matched the interval of bucket number 41. There has been a total of 7322 executions (the COUNT_BUCKET_AND_LOWER column) of the query with a query time of 66.07 microseconds or less. This means that 73.22% (the BUCKET_QUANTILE column) of the queries have a query time of 66.07 microseconds or less.

In addition to the columns shown, there are the SCHEMA_NAME and DIGEST columns (which together with BUCKET_NUMBER form a unique key). For each digest there are 450 buckets, with the width of the buckets (in terms of the difference between the low and high timers) gradually becoming larger and larger. The first, middle, and last five buckets can, for example, be listed with a query like the one below.
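One way to do that is to query the global histogram table for the bucket boundaries (the BUCKET_TIMER_LOW and BUCKET_TIMER_HIGH values are in picoseconds; adjust the bucket numbers as needed):

```sql
SELECT BUCKET_NUMBER, BUCKET_TIMER_LOW, BUCKET_TIMER_HIGH
  FROM performance_schema.events_statements_histogram_global
 WHERE BUCKET_NUMBER < 5
       OR BUCKET_NUMBER BETWEEN 223 AND 227
       OR BUCKET_NUMBER > 444
 ORDER BY BUCKET_NUMBER;
```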

The bucket thresholds are fixed and thus the same for all digests. There is also a global histogram in the events_statements_histogram_global table.

This concludes the introduction to the new Performance Schema digest features. As monitoring tools start to use this information, it will help create a better monitoring experience. Particularly the histograms will benefit from being shown as graphs.

Replication Monitoring with the Performance Schema

The traditional way to monitor replication in MySQL is the SHOW SLAVE STATUS command. However, as will be shown, it has its limitations, and in MySQL 5.7 and 8.0 the MySQL developers have started to expose the same information through Performance Schema tables. This has several advantages, including better monitoring of the replication delay in MySQL 8.0. This blog discusses why SHOW SLAVE STATUS should be replaced with the Performance Schema tables.

The Setup

The replication setup that will be used for the examples in this blog can be seen in the following figure.

Replication Setup with Multi-Source and Chained Replication

There are two source instances (replication masters). Source 1 replicates to the Relay instance (i.e. it acts both as a replica and source in the setup). The Relay instance replicates to the Replica instance which also replicates from Source 2. That is, the Replica instance uses multi-source replication to replicate from the Source 1 → Relay chain as well as directly from Source 2.

This blog will use the Replica instance to demonstrate SHOW SLAVE STATUS and Performance Schema replication tables.

SHOW SLAVE STATUS

The SHOW SLAVE STATUS command has been around since the addition of replication to MySQL. Thus it is familiar to all database administrators who have been working with replication. It is also very simple to use, and it is easy to remember the syntax. So far so good. However, it also has some limitations.

Let’s take a look at how the output of SHOW SLAVE STATUS looks in the setup described above:

The first thought: that’s a lot of lines of output. That is one of the issues – there is no way to limit the output. To summarize some of the issues with SHOW SLAVE STATUS:

  • There is no support for filter conditions or choosing which columns to include, so the output is very verbose. In this case with two replication channels, all available data for both channels are always included.
  • The order of the columns reflects the order they were added rather than how they logically belong together. Over the years many new columns have been added as new features have been added or a feature has been changed from an option configured in my.cnf to an option configured with CHANGE MASTER TO. For example, the channel name is the fourth column from the end even though it would be more natural to have it as the first column (as it’s the “primary key” of the output).
  • For multi-threaded appliers (slave_parallel_workers > 1) there is no information for the individual workers.
  • Information related to the connection (I/O thread) and applier (SQL thread) as well as configuration and status is mixed together.
  • What does Seconds_behind_master mean? For the source_2 channel (the direct replication from Source 2) it is relatively easy to understand, but for the relay channel is it relative to Source 1 (yes) or to Relay (no)? More about this later.

To look at what can be done to solve these limitations, let’s look at the Performance Schema.

Performance Schema Replication Tables

To overcome these limitations, MySQL 5.7 introduced a series of replication tables in the Performance Schema. These have been extended in MySQL 8.0 to make them even more useful. One of the advantages of using the Performance Schema is that queries are executed as regular SELECT statements with the usual support for choosing columns, manipulating them, and applying a WHERE clause. First, let’s take a look at which replication related tables are available.

Overview of Tables

As of MySQL 8.0.12 there are 11 replication related tables. The tables are:

  • log_status: This table is new in MySQL 8 and provides information about the binary log, relay log, and InnoDB redo log in a consistent manner (i.e. all values are for the same point in time).
  • Applier:
    • replication_applier_configuration: This table shows the configuration of the applier threads for each replication channel. Currently the only setting is the configured replication delay.
    • replication_applier_filters: The channel specific replication filters including where and when they were configured.
    • replication_applier_global_filters: The global replication filters including how and when they were configured.
    • replication_applier_status: This table shows the status of the applier for each replication channel. The information includes the service state, remaining delay, and number of transaction retries.
    • replication_applier_status_by_coordinator: For multi-threaded replicas this table shows the status of the thread that manages the actual worker threads. In MySQL 8 this includes several timestamps to give detailed knowledge of the replication delay.
    • replication_applier_status_by_worker: The status of each worker thread (for single-threaded replication there is one per channel). In MySQL 8 this includes several timestamps to give detailed knowledge of the replication delay.
  • Connection:
    • replication_connection_configuration: The configuration of each of the replication channels.
    • replication_connection_status: The status of the replication channels. In MySQL 8 this includes information about the timestamps showing when the currently queuing transaction was originally committed, when it was committed on the immediate source instance, and when it was written to the relay log. This makes it possible to describe much more accurately what the replication delay is.
  • Group Replication:
    • replication_group_members: Information about the members of the group.
    • replication_group_member_stats: Statistics for the members of the group.

The Group Replication tables will not be discussed further.

Since the information from SHOW SLAVE STATUS has been split up into several tables, it can be useful to take a look at how the information maps.

Old Versus New

The following list shows how to get from a column in the SHOW SLAVE STATUS output to a table and column in the Performance Schema. The channel name is present in all of the Performance Schema replication tables (it’s the primary key or part of it). The replication filters and rewrite rules are split into two tables. The I/O and SQL thread states can be found in the performance_schema.threads table by joining on the THREAD_ID column for the corresponding threads.

  • Slave_IO_State → threads.PROCESSLIST_STATE
  • Master_Host → replication_connection_configuration.HOST
  • Master_User → replication_connection_configuration.USER
  • Master_Port → replication_connection_configuration.PORT
  • Connect_Retry → replication_connection_configuration.CONNECTION_RETRY_INTERVAL
  • Master_Log_File → (no Performance Schema equivalent yet)
  • Read_Master_Log_Pos → (no Performance Schema equivalent yet)
  • Relay_Log_File → log_status.REPLICATION->>'$.channels[*].relay_log_file'
  • Relay_Log_Pos → log_status.REPLICATION->>'$.channels[*].relay_log_position'
  • Relay_Master_Log_File → (no Performance Schema equivalent yet)
  • Slave_IO_Running → replication_connection_status.SERVICE_STATE
  • Slave_SQL_Running → replication_applier_status_by_coordinator.SERVICE_STATE
  • Replicate_Do_DB → replication_applier_filters / replication_applier_global_filters
  • Replicate_Ignore_DB → replication_applier_filters / replication_applier_global_filters
  • Replicate_Do_Table → replication_applier_filters / replication_applier_global_filters
  • Replicate_Ignore_Table → replication_applier_filters / replication_applier_global_filters
  • Replicate_Wild_Do_Table → replication_applier_filters / replication_applier_global_filters
  • Replicate_Wild_Ignore_Table → replication_applier_filters / replication_applier_global_filters
  • Last_Errno → (no Performance Schema equivalent yet)
  • Last_Error → (no Performance Schema equivalent yet)
  • Skip_Counter → (no Performance Schema equivalent yet)
  • Exec_Master_Log_Pos → (no Performance Schema equivalent yet)
  • Relay_Log_Space → (no Performance Schema equivalent yet)
  • Until_Condition → (no Performance Schema equivalent yet)
  • Until_Log_File → (no Performance Schema equivalent yet)
  • Until_Log_Pos → (no Performance Schema equivalent yet)
  • Master_SSL_Allowed → replication_connection_configuration.SSL_ALLOWED
  • Master_SSL_CA_File → replication_connection_configuration.SSL_CA_FILE
  • Master_SSL_CA_Path → replication_connection_configuration.SSL_CA_PATH
  • Master_SSL_Cert → replication_connection_configuration.SSL_CERTIFICATE
  • Master_SSL_Cipher → replication_connection_configuration.SSL_CIPHER
  • Master_SSL_Key → replication_connection_configuration.SSL_KEY
  • Seconds_Behind_Master → (no Performance Schema equivalent yet)
  • Master_SSL_Verify_Server_Cert → replication_connection_configuration.SSL_VERIFY_SERVER_CERTIFICATE
  • Last_IO_Errno → replication_connection_status.LAST_ERROR_NUMBER
  • Last_IO_Error → replication_connection_status.LAST_ERROR_MESSAGE
  • Last_SQL_Errno → replication_applier_status_by_worker / replication_applier_status_by_coordinator, LAST_ERROR_NUMBER
  • Last_SQL_Error → replication_applier_status_by_worker / replication_applier_status_by_coordinator, LAST_ERROR_MESSAGE
  • Replicate_Ignore_Server_Ids → (no Performance Schema equivalent yet)
  • Master_Server_Id → (no Performance Schema equivalent yet)
  • Master_UUID → replication_connection_status.SOURCE_UUID
  • Master_Info_File → (no Performance Schema equivalent yet)
  • SQL_Delay → replication_applier_configuration.DESIRED_DELAY
  • SQL_Remaining_Delay → replication_applier_status.REMAINING_DELAY
  • Slave_SQL_Running_State → threads.PROCESSLIST_STATE
  • Master_Retry_Count → replication_connection_configuration.CONNECTION_RETRY_COUNT
  • Master_Bind → replication_connection_configuration.NETWORK_INTERFACE
  • Last_IO_Error_Timestamp → replication_connection_status.LAST_ERROR_TIMESTAMP
  • Last_SQL_Error_Timestamp → replication_applier_status_by_worker / replication_applier_status_by_coordinator, LAST_ERROR_TIMESTAMP
  • Master_SSL_Crl → replication_connection_configuration.SSL_CRL_FILE
  • Master_SSL_Crlpath → replication_connection_configuration.SSL_CRL_PATH
  • Retrieved_Gtid_Set → replication_connection_status.RECEIVED_TRANSACTION_SET
  • Executed_Gtid_Set → (no Performance Schema equivalent yet)
  • Auto_Position → replication_connection_configuration.AUTO_POSITION
  • Replicate_Rewrite_DB → replication_applier_filters / replication_applier_global_filters
  • Channel_Name → CHANNEL_NAME (present in all of the replication tables)
  • Master_TLS_Version → replication_connection_configuration.TLS_VERSION
  • Master_public_key_path → replication_connection_configuration.PUBLIC_KEY_PATH
  • Get_master_public_key → replication_connection_configuration.GET_PUBLIC_KEY

As can be seen, there are a few columns from SHOW SLAVE STATUS that do not have any corresponding tables and columns in the Performance Schema yet. One that is probably familiar to many as the main means of monitoring the replication lag is the Seconds_Behind_Master column. This is no longer needed. It is now possible to get a better value using the timestamp columns in the replication_applier_status_by_coordinator, replication_applier_status_by_worker, and replication_connection_status tables. Talking about that, it is time to see the Performance Schema replication tables in action.

Examples

The rest of the blog shows example output for each of the replication tables (except the ones related to Group Replication) in the Performance Schema. For some of the tables there is a short discussion following the output. The queries were executed in rapid succession after the above SHOW SLAVE STATUS output was generated, but as the outputs were generated using separate queries, they do not correspond to the exact same point in time.

log_status

The log_status table shows the replication and engine log data so the data is consistent:
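The table can be queried like any other Performance Schema table, for example:

```sql
SELECT * FROM performance_schema.log_status\G
```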

replication_applier_configuration

The replication_applier_configuration table shows the configuration of the applier threads:

replication_applier_filters

The replication_applier_filters table shows the channel specific replication filters:

There is one filter specifically for the relay channel: the channel will ignore changes to tables in the world schema, and the filter was set using the replicate_wild_ignore_table option in the MySQL configuration file.

replication_applier_global_filters

The replication_applier_global_filters table shows the replication filters that are shared for all channels:

There is also one global replication filter. This has been set using the CHANGE REPLICATION FILTER statement.

replication_applier_status

The replication_applier_status table shows the overall status for the applier threads:

replication_applier_status_by_coordinator

The replication_applier_status_by_coordinator table shows the status for the coordinator when multi-threaded appliers have been configured (slave_parallel_workers > 1):
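A query along the following lines shows the coordinator timestamps discussed below (only a subset of the columns is included):

```sql
SELECT CHANNEL_NAME, SERVICE_STATE,
       LAST_PROCESSED_TRANSACTION,
       LAST_PROCESSED_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP,
       LAST_PROCESSED_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP,
       LAST_PROCESSED_TRANSACTION_END_BUFFER_TIMESTAMP
  FROM performance_schema.replication_applier_status_by_coordinator\G
```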

This is an example of how MySQL 8 provides detailed information about the timings for the various stages of the applied events. For example, for the relay channel it can be seen that for the last processed transaction, it took 18 seconds from when the transaction was committed on Source 1 (original commit) until it was committed on Relay (immediate commit), but then it only took around half a second until the coordinator was done processing the transaction (i.e. sending it to a worker). Which brings us to the status by worker.

replication_applier_status_by_worker

The replication_applier_status_by_worker table shows the status for each worker thread:
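For example, a query along these lines uses the worker timestamps to calculate how far behind its immediate source each worker committed the last applied transaction (a sketch; there are other ways to slice the timestamps):

```sql
SELECT CHANNEL_NAME, WORKER_ID, LAST_APPLIED_TRANSACTION,
       TIMEDIFF(LAST_APPLIED_TRANSACTION_END_APPLY_TIMESTAMP,
                LAST_APPLIED_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP
       ) AS applied_behind_immediate_source
  FROM performance_schema.replication_applier_status_by_worker\G
```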

The timestamps for the relay channel’s workers (only one has been active, as can be seen) can be used to see that the last transaction took around 19 seconds to apply and that it was committed around 19 seconds after it was committed on the immediate source (the Relay instance).

You can compare this delay of 19 seconds with the 49 seconds claimed by Seconds_Behind_Master in the SHOW SLAVE STATUS output. Why the difference? Seconds_Behind_Master is really the time from when the original source started to execute the current transaction until now. So it includes the time it took to execute the transaction not only on Source 1 but also on Relay, plus the time used so far on Replica.

replication_connection_configuration

The replication_connection_configuration table shows the configuration for each connection to the source of the replication:

replication_connection_status

The replication_connection_status table shows the status of each connection:
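As an example of being able to choose columns and filter, a query like the following returns the connection status for a single channel only:

```sql
SELECT CHANNEL_NAME, SERVICE_STATE, LAST_ERROR_NUMBER, LAST_ERROR_MESSAGE
  FROM performance_schema.replication_connection_status
 WHERE CHANNEL_NAME = 'source_2'\G
```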

Similar to the applier timestamps, the connection timestamps can be used to check the lag caused by the connection thread fetching the transactions from its source’s binary log and writing them to its own relay log. For the relay channel it took around 0.14 second from when the transaction was committed on the immediate source (the Relay instance) until the connection thread started to write the transaction to the relay log (LAST_QUEUED_TRANSACTION_START_QUEUE_TIMESTAMP) and then a further 0.007 second to complete the write.

Conclusion

The replication related tables in the Performance Schema, particularly the MySQL 8 versions of them, provide a very useful way to get information about the replication configuration and status. As a support engineer myself, I look forward to having access to the new timestamp values when I investigate replication delays.

There is still some work remaining. For example, it could be useful to know when the transaction started on the original source. That way it would be possible to compare the execution time between a source and a replica. However, the information that is already available is a great improvement.

Shutdown and Restart Statements

There are various ways to shut down MySQL. The traditional cross-platform method is to use the shutdown command in the mysqladmin client. One drawback is that it requires shell access; another is that it cannot start MySQL again automatically. There are platform specific options that can perform a restart, such as using systemctl on Linux or installing MySQL as a service on Microsoft Windows. What I will look at here, though, is the built-in support for stopping and restarting MySQL using SQL statements.

Stop Sign
Photo by Michael Mroczek on Unsplash

MySQL 5.7 added the SHUTDOWN statement which allows you to shut down MySQL using the MySQL command-line client or MySQL Shell. The statement is straightforward to use:
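In its simplest form it is just the statement on its own:

```sql
SHUTDOWN;
```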

The SHUTDOWN command available in MySQL 5.7 and later.

You will need the SHUTDOWN privilege to use the statement – this is the same privilege required to use mysqladmin to shut down MySQL. There is one gotcha to be aware of with the SHUTDOWN statement: it only works with the old (traditional) MySQL protocol. If you attempt to use it when connected to MySQL using the new X Protocol, you get the error ERROR: 3130: Command not supported by pluggable protocols as shown in the next example:

Executing SHUTDOWN when connected through the X Protocol causes error 3130.

The RESTART statement, on the other hand, works through both protocols and also requires the SHUTDOWN privilege:

The RESTART command available in MySQL 8.0.

For the restart to work, it is necessary that MySQL has been started in the presence of a “monitoring service”. This is the default on Microsoft Windows (to disable the monitoring service, start MySQL with --no-monitor). On Linux the monitoring service can for example be systemd or mysqld_safe.

An example of where the RESTART statement comes in handy is MySQL Shell’s AdminAPI for administering a MySQL InnoDB Cluster. When connected to MySQL Server 8.0, MySQL Shell can use the new SET PERSIST syntax to make the required configuration changes and then use the RESTART statement to restart the instance so that non-dynamic configuration changes take effect.
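As a sketch of that pattern (innodb_log_file_size is just an arbitrary example of a non-dynamic option; SET PERSIST_ONLY is used since the value cannot be changed at runtime):

```sql
SET PERSIST_ONLY innodb_log_file_size = 1073741824;
RESTART;
```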

The SHUTDOWN and RESTART statements may not be the most important changes in MySQL 5.7 and 8.0, but they can be handy to know of in some cases.

Meet MySQL Support at Oracle OpenWorld and Code One 2018

Oracle MySQL Support will this year again take part in the Oracle OpenWorld conference in San Francisco. Additionally, we will present at the developer focused Code One that is held at the same time. Oracle OpenWorld and Code One 2018 take place at the Moscone Center and nearby hotels in San Francisco from Monday 22 October to Thursday 25 October.

Oracle OpenWorld and Code One 2018

MySQL Support will be represented by Lig Isler-turmelle and myself (Jesper Wisborg Krogh), and we will run three hands-on labs (HOL) at Oracle OpenWorld, two talks at Code One, and two mini-briefings at the Support Stars Bar. The hands-on labs and Code One talks are:

  • Monday 22 October, 12:15 pm - 1:15 pm: HOL1703 - A Practical Introduction to the MySQL Document Store (Lig Isler-turmelle and Jesper Wisborg Krogh)
  • Tuesday 23 October, 11:15 am - 12:15 pm: HOL1706 - Developing Modern Applications with the MySQL Document Store and NodeJS (Lig Isler-turmelle and Jesper Wisborg Krogh)
  • Wednesday 24 October, 9:00 am - 9:45 am: DEV5957 - Develop Python Application with MySQL Connector/Python (Jesper Wisborg Krogh)
  • Wednesday 24 October, 2:30 pm - 3:15 pm: DEV5959 - Python and the MySQL Document Store (Jesper Wisborg Krogh)
  • Thursday 25 October, 9:00 am - 10:00 am: HOL2986 - Using MySQL Common Table Expressions and Window Functions (Lig Isler-turmelle and Jesper Wisborg Krogh)

The HOL sessions are in Salon 9A at the Marriott Marquis on the Yerba Buena Level, and the DEV sessions are the Code One talks and will be in Room 2014 at Moscone West.

Additionally, we will be present at the Support Stars Bar. This is an area full of people from Oracle Support. Throughout the four days there will be 15-minute mini-briefings. Attendees to the mini-briefings can enter a drawing with a chance to win a prize. MySQL Support will have two mini-briefings (the schedule is still to be finalized). The Support Stars Bar will be located at level 3 in Moscone West. You can read more about the Support Stars Bar at the Oracle Support homepage.

There are of course many more talks by Oracle developers, product management, as well as MySQL users. Registration is open, and I encourage you to register and come and meet the people behind MySQL and hear about the new features and how MySQL is used. You can read more about MySQL’s general presence in LeFred’s blog and see a list of all the MySQL sessions at both OpenWorld and Code One in the Code One session catalogue.

Register for Oracle OpenWorld 2018

NoSQL/X DevAPI Tutorial with MySQL Connector/Python 8.0

The MySQL Document Store became generally available (GA) with MySQL 8. One of the nice features of the MySQL Document Store is the X DevAPI, which allows you to query the data from a multitude of programming languages using the same API (while retaining the conventions of each language). The programming languages with support for the X DevAPI include JavaScript (Node.js), PHP, Java, DotNet, and C++.

I will be using MySQL Connector/Python 8.0.12 for the example in this blog. The example is executed on Microsoft Windows with Python 3.6 installed, but it has also been tested on Oracle Linux 7 with Python 2.7. I do assume that MySQL Connector/Python has been installed. If that is not the case, you can read how to do it in the Installing Connector/Python from a Binary Distribution section in the manual or Chapter 1 of MySQL Connector/Python Revealed from Apress.

The output of the example program

The example will go through the following steps:

  • Getting Ready:
    1. Load the mysqlx module.
    2. Create a database connection.
  • Setup:
    1. Create a schema.
    2. Create a collection.
  • CRUD – Create:
    1. Insert some documents into the collection.
  • CRUD – Read:
    1. Query the documents.
  • Cleanup:
    1. Drop the schema.
You can download the complete example program here: Example Code for Tutorial

The program uses the pyuser@localhost user. The connection parameters can be changed as described in the “Getting Ready” section. A user that fulfills the requirement to the example program can be created using the following SQL statements:
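The statements below are a sketch; the password is a placeholder and the GRANT is scoped to the schema used by the example:

```sql
CREATE USER pyuser@localhost IDENTIFIED BY 'pyuser_password';
GRANT ALL ON my_collections.* TO pyuser@localhost;
```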

Warning: This program is not an example of using best practices. Do not store the password and preferably also the other connection options in the source code. There is also very limited handling of errors and warnings in order to keep the example simple. You should not skip those steps in a production program.

Getting Ready

The first thing is to get ready by importing MySQL Connector/Python’s mysqlx module and connect to MySQL. This is simple to do as shown in the below code snippet (the line numbers refer to the full example):
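A condensed sketch of those steps is shown here; the connection options are placeholders, and the full example program differs in the details:

```python
import sys

import mysqlx

# Map the numeric warning levels returned by the X DevAPI to names
warning_levels = {1: "Note", 2: "Warning"}

# Placeholder connection options - adjust to your environment
connect_args = {
    "host": "127.0.0.1",
    "port": 33060,           # the X Protocol (mysqlx) port
    "user": "pyuser",
    "password": "pyuser_password",
}

# Create the database connection
db = mysqlx.get_session(**connect_args)
```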

The mysqlx module is imported in line 38. This is where the MySQL Connector/Python implementation of the X DevAPI resides. The module includes support for CRUD statements both for documents and SQL tables, schema and collection manipulation, as well as executing SQL statements. In this example, only the CRUD implementation for documents and the schema and collection manipulation will be used.

The warning_levels variable is used to convert the numeric warning levels returned by the X DevAPI to names. There is an example of how to handle warnings after the first document has been added.

Finally, the connection is created in line 52 using the get_session() method in the mysqlx module.  With a connection object in place, let’s move on to set up the schema and collection.

Setup

The X DevAPI has support for creating and dropping schemas and collections (but currently not SQL tables). This is used in the example to set up the my_collections schema with a single collection called my_docs:
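A sketch of those two calls:

```python
# Create the schema; if it already exists, the existing schema is returned
schema = db.create_schema("my_collections")

# Create the collection; the second (reuse) argument avoids an error if it exists
my_docs = schema.create_collection("my_docs", True)
```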

The create_schema() method on the database (session) object is used to create the schema. It will succeed even if the schema already exists. In that case the existing schema will be returned.

The collection is similarly created with the create_collection() method from the schema object. This will by default fail if the collection already exists. That can be overridden with the reuse argument (the second argument).

That is it. A collection is always defined internally in the same way when it is created. You can add indexes – I hope to get back to that in a future blog – but there is no need to think of columns and data types at this point. That is the advantage of using a schemaless database (but also one of the dangers – now the whole responsibility of staying consistent is up to you as a developer).

Let’s continue to add some documents.

CRUD – Create

For this example, three documents will be added to the my_docs collection. The documents contain information about three persons including their name, birthday, and hobbies. The documents can be defined as Python dictionaries with JSON arrays represented as Python lists:
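The field names and values below are invented for this sketch; the documents in the original example differ, but they are built in the same way:

```python
adam = {
    "Name": "Adam",
    "Surname": "Smith",
    "Birthday": "1982-08-09",
    "Hobbies": ["Hiking", "Running"],
}

kate = {
    "Name": "Kate",
    "Surname": "Lee",
    "Birthday": "1985-02-14",
    "Hobbies": ["Hiking", "Photography"],
}

jane = {
    "Name": "Jane",
    "Surname": "Walker",
    "Birthday": "1978-11-23",
    "Hobbies": ["Reading", "Chess"],
}
```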

This is the beauty of working with JSON documents in Python. They just work.

The birthdays are written in the ISO 8601 format (the same as MySQL’s date data type uses – but not datetime!). As the MySQL Document Store is schemaless, you are free to choose whatever format you feel like; however, it is strongly recommended to use a standard format. The YYYY-mm-dd format has the advantage that it sorts correctly, so for that reason alone it is a strong candidate.

The documents will be inserted in two rounds. First Adam will be added, then Kate and Jane.

Adding a Single Document

There are a few ways to add documents (all working in the same basic way). This example will show two of them. First let’s look at how Adam is added:
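A sketch of the chained statement together with the warning handling described below (the method names are those of the mysqlx module):

```python
db.start_transaction()

# Chained statement: add the document and execute it in one go
result = my_docs.add(adam).execute()

if result.get_warnings_count() > 0:
    print("{0} warning(s) occurred!".format(result.get_warnings_count()))
    for warning in result.get_warnings():
        print("Level: {0} - Errno: {1} - Message: {2}".format(
            warning_levels[warning["level"]], warning["code"], warning["msg"]))
    db.rollback()
    sys.exit(1)

print("Number of documents added: {0}".format(result.get_affected_items_count()))
print("Generated document IDs:    {0}".format(result.get_generated_ids()))

db.commit()
```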

The document is added inside a transaction. The X DevAPI connection inherits the value of autocommit from the server-side (defaults to ON), so to be sure the create action can be tested for warnings before committing, an explicit transaction is used. (Errors cause an exception, so since that is not handled here, it would cause an automatic rollback.)

The document is added using a chained statement. When you build an X DevAPI statement, you can choose between calling the method one by one or chaining them together as it is done in this case. Or you can choose a combination with some parts chained and some not. When the documents for Kate and Jane are added, it will be done without chaining.

The statement is submitted to the database using the execute() method. If you are used to executing Python statements in MySQL Shell, you may not be familiar with execute() as MySQL Shell allows you to skip it for interactive statements where the result is not assigned to a variable. The result is stored in the result variable which will be used to examine whether any warnings were triggered by the statement.

Tip: In MySQL Connector/Python, you must always call execute() to execute an X DevAPI statement.

It is best practice to verify whether queries cause any warnings. A warning will still allow the statement to execute, but it is in general a sign that not everything is as it should be. So, take warnings seriously. The earlier you include tests for warnings, the easier it is to handle them.

In line 99, the get_warnings_count() method of the result object is used to check whether any warnings occurred. If so, the number of warnings is printed and each warning is retrieved using the get_warnings() method. A warning is a dictionary with three elements:

  • level: 1 for note and 2 for warning. This is what the warning_levels variable was created for at the start of the example.
  • code: The MySQL error number. The mysqlx.errorcode module contains string symbols for all the error numbers. This can be useful in order to check whether it is an expected error number that can be ignored.
  • msg: A string message describing the problem.

In this case, if any warnings occur, the transaction is rolled back, and the script exits.

Tip: Include handling of warnings from the beginning of coding your program. Handling warnings from the get go makes it much easier to handle them. They are usually a sign of something not working as expected and it is important that you know exactly why the warnings occur. All warnings include an error code that you can check against to verify whether it is an expected warning. If some warning is expected and you are confident, it is acceptable to ignore it.

If no error occurs, some information from the result is printed. An example output looks like (the ID will be different):

As expected one document has been added. The number of documents is printed using the get_affected_items_count() method. More interesting is the document ID. As the document did not include an element named _id, MySQL added one automatically and assigned a value to it. I will not go into how the ID is generated here, but just note that it includes three parts that together ensure global uniqueness even if you use multiple clients against multiple MySQL Server instances. At the same time, the IDs are still being generated in a way that is optimal for the InnoDB storage engine that is used for the underlying storage of the documents. The IDs are returned as a list; in this case there is only one element in the list, but if more than one document is inserted without an _id value, then there will be one generated ID per document.

The final step is to commit the transaction, so the document is persisted in the collection.

Adding Multiple Documents

When you want to add multiple documents using a single CRUD statement, you can essentially do it in two ways. You can add all of the documents in one go in the initial add() call similar to what was done for a single document with Adam. This can for example be done by having the documents in a tuple or list.

The other way, which will be used here, is to repeatedly call add() to add the documents. Let’s see how that works:
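A sketch of the non-chained variant:

```python
db.start_transaction()

# Create the statement without any documents, then add them one by one
stmt_add = my_docs.add()
stmt_add.add(kate)
stmt_add.add(jane)

result = stmt_add.execute()
print("Number of documents added: {0}".format(result.get_affected_items_count()))
print("Generated document IDs:    {0}".format(result.get_generated_ids()))

db.commit()
```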

To keep the example from getting too long, the check for warnings has been removed, and the example will just focus on adding the documents.

After the transaction has been started, the statement object is created by calling add() on the collection object. In this case, no arguments are given, so at that point in time, the statement will not insert any documents.

Then the two documents are added one by one by calling add() on the statement object, first with the kate document, then with the jane document. An advantage of this approach is that if you for example generate the documents inside a loop, then you can add them as they are ready.

When both documents have been added, the execute() method is called to submit the documents to the database and the transaction is committed. Again, some information from the result is printed (the IDs will be different):

So, two documents are inserted (again as expected) and two IDs are generated.

The way that the add statement was used to insert the two documents is an example of the opposite of chaining. Here, one action at a time is performed and the result is stored in the stmt_add variable.

Now that there are some documents to work with, it is time to query them.

CRUD – Read

When you want to query documents in a collection, you use the find() method of the collection object. The resulting find statement supports all of the usual refinements such as filtering, sorting, grouping, etc. In this example, three queries will be executed. The first will find the total number of documents in the collection. The second will find the persons born on 9 August 1982. The third will find the persons who have hiking as a hobby.

Total Number of Documents

The X DevAPI makes it easy to determine the number of documents in the collection – the count() method of the collection will return the value as an integer. In practice the count() method goes through the same steps as you will see in the two subsequent queries, but they are hidden inside the implementation. The code snippet is:
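A sketch of the snippet:

```python
print("Number of documents in the collection: {0}".format(my_docs.count()))
```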

It cannot get much easier than that. The output is:

Let’s move on and see some of the steps that were hidden in the first query.

Finding Documents Based on Simple Comparison

The persons (in this case just one person) born on 9 August 1982 can be found by creating a find statement and adding a simple filter. The example code is:
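A sketch, using the invented Birthday, Name, and Surname fields from earlier:

```python
result = (my_docs.find("Birthday = :birthday")
                 .fields("Name", "Surname AS Last_name")
                 .bind("birthday", "1982-08-09")
                 .execute())

person = result.fetch_one()
print(person)
```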

The filter clause is added in the call to find(). The syntax :birthday means that a parameter is used and the value will be added later. That has two advantages: it makes it easier to reuse the statement, and importantly it makes the statement safer as MySQL will ensure the value is escaped correctly – this is similar to the mysql_real_escape_string() function in the MySQL C API. The value of the parameter is given using the bind() method that has two arguments: the parameter name and value. If you use multiple parameters, call bind() once for each of them.

Otherwise the statement is simple to use. The filtering condition may seem too simple given that it applies to a JSON document. However, Birthday in the condition is interpreted as $.Birthday (the $. part is optional) – that is, the element named Birthday that is a child of the root of the document, which is just what is needed in this case. The next example includes a more complicated filter condition.

The fields to include are specified in a similar manner to the filter condition. You specify the path to the element you want to include. You can optionally rename the element using the AS keyword, for example: Surname AS Last_name. As for the condition, the $. part is optional.

The resulting row is retrieved using the fetch_one() method on the result object. This is fine here as we know there is only one resulting row. However, in the more general case you should call fetch_one() in a loop and continue until it returns None, at which point all rows have been fetched.

The output is:

Querying with Condition on Element in Array

A more complicated find statement is to look into the Hobbies array and see if any of the elements is Hiking. This query matches two of the persons in the collection, so a loop is required to handle them. The code is:
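A sketch of the statement, again using the invented field names:

```python
stmt_find = my_docs.find("JSON_CONTAINS($.Hobbies, :hobby)")
stmt_find.fields("Name")
stmt_find.bind("hobby", '"Hiking"')   # double quotes: the value is a JSON string
result = stmt_find.execute()

people = result.fetch_all()
print("Number of people who like hiking: {0}".format(result.count))
for person in people:
    print(person["Name"])
```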

There are two main differences between this example and the previous: the filter condition and how the result documents are handled.

The filter uses the JSON_CONTAINS() function to check whether the $.Hobbies elements contains the value specified by the :hobby parameter. In the call to bind(), the parameter value is set to "Hiking". Note that Hiking must be quoted with double quotes as it is a JSON string. In this case, $. is included in the document path. However, it is still optional.

After executing the query, the resulting documents are fetched using the fetch_all() method. This will return all of the documents as a list. This makes it simpler to loop over the resulting rows, however be aware that for large result sets, it can cause a high memory usage on the application server.

Warning: Be careful with the fetch_all() method if the query can return a large result set. It will require the remaining part of the result to be stored in-memory on the application-side.

One advantage of the fetch_all() method is that it allows you to get the total number of documents in the result using the count property of the result. The count property will show 0 until fetch_all() has completed. Once the documents have been fetched, it is possible to print the names of the persons who like to hike. The output is:

Other than a bit of cleanup, there is nothing more to do.

Cleanup

The final part of the example is to clean up. The my_collections schema is dropped so the database is left in the same state as at the start, and the connection is closed:
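A sketch of the cleanup:

```python
# Drop the schema again and close the connection
db.drop_schema("my_collections")
db.close()
```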

Dropping a schema is done in the same way as creating it, just that the drop_schema() method is used instead. The drop_schema() method will also work if the schema does not exist. In that case it is a null-operation.

It is important always to close the database connection. Have you ever seen the MySQL Server error log full of notes about aborted connections? If you do not explicitly close the database connection when you are done with it, one of those notes will be generated (provided the server is configured with error_log_verbosity = 3).

Additionally, not closing the connection will keep the connection alive until the program terminates. That is not a problem here, but in other cases, it may take a long time before the application shuts down. In the meantime, the connection count is higher than it needs to be, and if you happen to have an ongoing transaction (can very easily happen with autocommit = OFF), the connection may cause lock issues or slowness for the other connections.

Tip: Always close the database connection when you are done with it.

Want to Learn More?

I hope this has triggered your curiosity and you are ready to dive deeper into the world of MySQL Connector/Python, the X DevAPI, and the MySQL Document Store. If so, there are two recently released books that you may find useful.

Disclaimer: I am the author of one of these books.

One book is MySQL Connector/Python Revealed (Apress) written by me. It goes through MySQL Connector/Python both for the legacy PEP249 API (mainly the mysql.connector module) and the new X DevAPI (the mysqlx module). There are three chapters dedicated to the X DevAPI.

The other book is Introducing the MySQL 8 Document Store (Apress) written by Dr. Charles Bell (MySQL developer). This book goes through how JSON works in MySQL including information about the X DevAPI and its siblings the X Protocol and the X Plugin.

Both books are more than 500 pages and come with code examples that will help bring you up to speed with MySQL Connector/Python and the MySQL Document Store.

Awesome MySQL Shell Prompt

A month ago, I wrote a blog on how you can configure the MySQL Shell prompt to suit your needs. One thing I did not go into detail about is the prompt templates prompt_256pl.json and prompt_256pl+aw.json. Common to both of these templates is that they require custom fonts to work.

In the two file names, pl stands for the Powerline fonts and aw for the Awesome font. These fonts add symbols that are useful for creating rich prompts, not only for MySQL Shell, but also for Bash, zsh, vim, etc. The symbols include an angle separator, a lock (which in MySQL Shell is used to symbolize a TLS/SSL connection), and so on.

The MySQL Shell Using Both the Powerline Fonts and the Awesome Terminal Font
If you want a MySQL Shell prompt like this, then read on.

This blog will show you how to install the necessary fonts to get an awesome MySQL Shell prompt on Oracle Linux 7, Red Hat Enterprise Linux (RHEL) 7, and CentOS 7. For other Linux distributions, the steps are the same or similar.

Specifying the Prompt Template

Before diving into installing the necessary fonts, let’s recall from the previous blog, how you choose how the prompt should look. Two ways to specify the prompt are:

  • Set the MYSQLSH_PROMPT_THEME environment variable to the path to the file with the required template. For the prompt_256pl+aw.json template using the el7 MySQL Shell RPM, the full path is /usr/share/mysqlsh/prompt/prompt_256pl+aw.json. This method is useful while testing.
  • Copy the prompt_256pl+aw.json file to ~/.mysqlsh/prompt.json. This way MySQL Shell will automatically pick up the new prompt definition. This is very useful when you have decided on your preferred prompt.

Awesome Terminal Fonts

If you want to use the prompt_256pl+aw.json template, then you will need to install the awesome font. One option is to use the patching-strategy branch of gabrielelana’s awesome-terminal-fonts project on GitHub. The steps to install the required font for the user are:

This installs just the font required for MySQL Shell. You can however choose to install all of the fonts if you like.

It is necessary to restart the desktop environment for the fonts to become available. The resulting prompt (when logged in and a default schema has been set) looks like:

The MySQL Prompt with the Awesome Terminal Font Installed

Personally I am not entirely keen on how the separator looks, so I install another font to replace the separator.

Powerline Font

The Powerline fonts serve a similar purpose to the Awesome font, but they are missing some of the symbols used by the prompt_256pl+aw.json template. They do, however, include the separator used by the template, and I personally prefer it over the Awesome Terminal Font separator. If you choose the prompt_256pl.json template, you can just install the Powerline fonts and skip the Awesome Terminal Font.

The installation instructions in the manual are quite good. However, they assume you want to use the fonts for more than just MySQL Shell. For MySQL Shell the following steps are enough:

After restarting the desktop environment, the prompt looks like this:

The MySQL Shell Using Both the Powerline Fonts and the Awesome Terminal Font

Fantasque Awesome Powerline

A third option is the Fantasque Awesome Powerline fonts, which as the name suggests include all the fonts needed for the prompt_256pl+aw.json template.

Thanks to Lefred for this suggestion.

The steps to install the Fantasque Awesome Powerline fonts are familiar by now. The fonts can be downloaded from ztomer’s GitHub repository and installed using the fc-cache command:

Then restart the desktop environment and the fonts are ready. The resulting prompt is very similar to the previous one and is arguably simpler to set up as everything comes from a single source:

MySQL Shell Prompt with the Fantasque Powerline Awesome Fonts

You can of course also choose to play with the other symbols in the Powerline and Awesome Terminal fonts and make your own custom MySQL Shell prompt configuration file. Whichever is your preferred prompt, have fun playing with it.

MySQL Connector/Python on iOS Using Pythonista 3

One of the nice things about MySQL Connector/Python is that it is available in a pure Python implementation. This makes it very portable. Today I have been exploring the possibility to take advantage of that to make MySQL Connector/Python available on my iPad.

There are a few Python interpreters available for the iPad. The one I will be discussing today is Pythonista 3, which has support for both Python 2.7 and 3.6. One of the things that caught my interest is that it comes with libraries to work with iOS, such as accessing the contacts and photos, as well as UI tools. It is a commercial program (AUD 15), but so far it looks to be worth the money. There are other options, and I hope to write about those another day.

MySQL Connector/Python is available from PyPi. This makes it easy to install the modules in a uniform way across platforms. Unfortunately, Pythonista 3 does not support the pip command out of the box. Instead there is a community contribution called StaSh that can be used to get a shell-like environment that can be used to execute pip. So, our first step is to install StaSh.

Coding with MySQL Connector/Python on an iPad

Install StaSh

StaSh is a project maintained by ywangd and can be found on GitHub. It is self-installing by executing a small Python program that can be found in the README.md file on the project repository page. You copy the source code into a new file in Pythonista 3 and execute it (using the “Play” button):

StaSh is installed by executing a downloaded script.

Then you need to restart Pythonista 3. At this point StaSh is installed.

Installing PyPi Package Using StaSh pip

In order to be able to use the pip command through StaSh, you need to launch the launch_stash.py program which was installed in the previous step. The program can be found in the This iPad folder:

Open the launch_stash.py Script in This iPad

Open the program and use the “Play” button again to execute it. This creates the shell. You can do other things than use the pip command, but for the purpose of installing MySQL Connector/Python that is all that is required. The command is:
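Assuming the package name used on PyPI, the command is simply:

```
pip install mysql-connector-python
```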

The console output is:

Using the pip command in StaSh to install MySQL Connector/Python.

That’s it. Now you can use the MySQL Connector/Python modules on your iPad just as in any other environment.

Example

To verify it is working, let’s create a simple example. For this to work, you need MySQL installed allowing external access which likely requires enabling access to port 3306 in your firewall.

A simple example that queries the city table in the world database for the city with ID = 130 is:
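A sketch of such a program follows; the connection arguments are placeholders, as noted below:

```python
import mysql.connector

# Placeholder connection arguments - edit these for your MySQL instance
connect_args = {
    "host": "192.0.2.10",
    "user": "pyuser",
    "password": "pyuser_password",
    "database": "world",
}

db = mysql.connector.connect(**connect_args)
cursor = db.cursor(dictionary=True)
cursor.execute("SELECT * FROM city WHERE ID = %s", (130,))
print(cursor.fetchone())
cursor.close()
db.close()
```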

Edit the connect_args dictionary with the connection arguments required for your MySQL instance.

Warning: The connection arguments are included inside the source code here to keep the example simple. Do not do this in real programs. It is unsafe to store the password in the source code and it makes the program hard to maintain.

When you run it, the details of Sydney, Australia are printed to the console:

Example of querying the world.city table using MySQL Connector/Python in Pythonista 3.

This all looks great, but what about the X DevAPI? Unfortunately, there are some problems there.

X DevAPI

The X DevAPI is new in MySQL 8.0. It is included in MySQL Connector/Python in the mysqlx module. However, it does not work out of the box in Pythonista 3. When you try to execute the import mysqlx command, it fails:

So, the Protobuf module that comes with Pythonista 3 uses the old comma syntax for exception handling. This is not allowed with Python 3 (PEP 3110).

Update: While I had tried to install a newer version of Protobuf using the StaSh pip command, what I had not realised originally is that for the change to take effect, you need to restart Pythonista 3. Once that is done, the mysqlx module works as well. To install Protobuf in StaSh, launch StaSh in the same way as when MySQL Connector/Python was installed above and execute the pip command:
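For example (you may need to specify an explicit, newer version of the package):

```
pip install protobuf
```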

 

MySQL Shell: Built-In Help

It can be hard to recall all the details of how a program and API work. The usual way to handle that is to look at the manual or a book. Another – and in my opinion nicer – way is to have built-in help, so you can find the information without switching between the program and the browser. This blog discusses how to obtain help when you use MySQL Shell.

MySQL Shell is a client that allows you to execute queries and manage MySQL through SQL commands and JavaScript and Python code. It is a second generation command-line client with additional WebOps support. If you have not installed MySQL Shell yet, then you can download it from MySQL’s community downloads, Patches & Updates in My Oracle Support (MOS) (for customers), or Oracle Software Delivery Cloud (for trial downloads). You can also install it through MySQL Installer for Microsoft Windows.
MySQL Shell: Get help for a table object

MySQL Shell has a very nice and comprehensive built-in help. There is of course the help output produced using the --help option if you invoke the shell from the command line:

However, this help is not what makes MySQL Shell special. It is the help that you can see from within the shell when working in JavaScript or Python that is worth some extra attention. There is both support for general help and for obtaining help through objects.

General Help

The first layer of help is what is also known from the old mysql command-line client. A command consisting of a backslash and a ?, h, or help (\?, \h or \help) will show information about the general usage of MySQL Shell:

This shows which commands and global objects are available. But there is more: you can also get help about the usage of MySQL Shell, such as how to use the Admin API (for MySQL InnoDB Cluster), how to connect, or the SQL syntax. The search for relevant help topics is context sensitive; for example, searching for the word select returns different results depending on the mode and whether you are connected:

  • In Python or JavaScript mode without a connection, it is noted that information was found in the mysqlx.Table.select and mysqlx.TableSelect.select categories.
  • In Python or JavaScript mode with a connection, the SELECT SQL statement is included as a category.
  • In SQL mode the actual help text for the SELECT SQL statement is returned (requires a connection).

For example, to get help about the select method of a table object:
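One possible search term is (the exact term can vary):

```
\? mysqlx.Table.select
```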

To get help for the SELECT SQL statement:
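In SQL mode, for example:

```
\? SELECT
```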

Note here how it is possible to get the help for the SELECT statement both from the Python (and JavaScript) as well as SQL modes, but the search term is different.

Tip: To get information about SQL statements, you must be connected to a MySQL instance.

When you use the JavaScript or Python modes there is another way to get  help based on your object. Let’s look at that.

Object Based Help

If you are coding in MySQL Shell using JavaScript or Python, it may happen that you need a hint on how to use a given object, for example a table object. You can use the method described in the previous section to get help by searching for mysqlx.Table; however, you can also access the help directly from the object.

All of the X DevAPI objects in MySQL Shell have a help() method that you can invoke to get help for the object. For example, if you have an object named city for the city table in the world schema, then calling city.help() returns information about the table object:
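For example, in Python mode with a connection to a server that has the world schema (session is the built-in global session object):

```python
city = session.get_schema('world').get_table('city')
city.help()
```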

As you can see, the built-in help in MySQL Shell is a powerful resource. Make sure you use it.

MySQL NDB Cluster: Never Install a Management Node on the Same Host as a Data Node

In MySQL NDB Cluster, the management node (ndb_mgmd) is a lightweight process that among other things handles the configuration of the cluster. Since it is lightweight, it can be tempting to install it on the same host as one of the other nodes. However, if you want a high-availability setup, you should never install it on the same host as a data node (ndbd or ndbmtd). If you do that, it can cause a total cluster outage where the cluster could otherwise have survived.

The first sign of trouble occurs when you start the management nodes. The following warning is printed to standard output:

To understand why this setup can cause a complete cluster shutdown, it is necessary first to review how a node failure is handled in MySQL NDB Cluster.

Node Failure Handling

When a data node fails, the cluster automatically evaluates whether it is safe to continue. A node failure can in this respect be either that a data node crashes or there is a network outage meaning one or more nodes cannot be seen by the other nodes.

A clean node shutdown (such as when using the recommended STOP command in the management client) is not subject to the evaluation as the other nodes will be notified of the shutdown.

So, how does MySQL NDB Cluster decide whether the cluster can continue after a node failure or not? And if there are two halves, which one gets to continue?
Assuming two replicas (NoOfReplicas = 2), the rules are quite simple. To quote Pro MySQL NDB Cluster:

In short, data nodes are allowed to continue if the following conditions are true:

  • The group of data nodes holds all the data.
  • Either more than half the data nodes are in the group or the group has won the arbitration process.

For the group of data nodes to hold all the data, there must be one data node from each node group available. A node group is a group of NoOfReplicas nodes that share data. The arbitration process refers to the process of contacting the arbitrator (typically a management node) – the first half to make contact will remain online.

This is all a bit abstract, so let’s take a look at a couple of examples.

Examples

Consider a cluster with two data nodes and two management nodes. Most of the examples will have a management node installed on each of the hosts with the data nodes. The last example will, by contrast, have the management nodes on separate hosts.

The starting point is thus a cluster using two hosts each with one data node and one management node as shown in this figure:

A healthy cluster with two data nodes and the management nodes installed on the same hosts as the data nodes.

The green colour represents that the data node is online. The blue colour for the management node with Node Id 49 is the arbitrator, and the yellow management node is on standby.

This is where the problem with the setup starts to show up. The arbitrator is the node that is involved when there is exactly half the data nodes available after a node failure. In that case, the data node(s) will have to contact the arbitrator to confirm whether it is OK to continue. This avoids a split-brain scenario where there are two halves with all the data; in that case it is imperative that one half is shut down or the data can start to diverge. The half that is first to contact the arbitrator survives, the other is killed (STONITH – shoot the other node in the head).

So, let’s look at a potential split-brain scenario.

Split-Brain Scenario

A split-brain scenario can occur when the network between the two halves of the cluster is lost as shown in the next figure:

NDB Cluster with network failure but the cluster survives.

In this case the network connection between the two hosts is lost. Since both nodes have all the data, arbitration is necessary to decide who can continue. The data node with Id 1 can still contact the arbitrator as it is on the same host, but Node Id 2 cannot (it would need to use the network that is down). So, Node Id 1 gets to continue whereas Node Id 2 is shut down.

So far so good. This is what is expected. A single point of failure does not lead to a cluster outage. However, what happens if we instead of a network failure consider a complete host failure?

Host Failure

Consider now a case where there is a hardware failure on Host A or someone by accident pulls the power. This causes the whole host to shut down taking both the data and management node with it. What happens in this case?

The first thought is that it will not be an issue. Node Id 2 has all the data, so surely it will continue, right? No, that is not so. The result is a total cluster outage as shown in the following figure:

The failure of the host with the arbitrator causes complete cluster outage.

Why does this happen? When Host A crashes, so does the arbitrator management node. Since Node Id 2 does not on its own constitute a majority of the data nodes, it must contact the arbitrator to be allowed to remain online.

You may think it can use the management node with Node Id 50 as the arbitrator, but that will not happen: while handling a node failure, under no circumstances can the arbitrator be changed. The nodes on Host B cannot know whether they cannot see the nodes on Host A because of a network failure (as in the previous example) or because the nodes are dead. So, they have to assume the other nodes are still alive, or there would sooner or later be a split-brain cluster with both halves online.

Important: The arbitrator will never change while the cluster handles a node failure.

So, the data node with Id 2 has no other option than to shut itself down, and there is a total cluster outage. A single point of failure has caused a total failure. That is not the idea of a high availability cluster.

What could have been done to prevent the cluster outage? Let’s reconsider the case where the arbitrator is on a third independent host.

Arbitrator on Independent Host

The picture changes completely, if the management nodes are installed on Hosts C and D instead of Hosts A and B. For simplicity the management node with Node Id 50 is left out as it is anyway just a spectator while handling the node failure. In this case the scenario is:

The failure of the host with the arbitrator on a third host ensures the cluster remains online.

Here Node Id 2 can still contact the arbitrator. Node Id 1 is dead, so it will not compete to win the arbitration, and the end result becomes that Node Id 2 remains online. The situation is back where a single point of failure does not bring down the whole cluster.
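As a sketch, a config.ini along the following lines implements that layout (host names are placeholders; the node IDs match the figures):

```ini
[ndbd default]
NoOfReplicas = 2

# Management nodes (potential arbitrators) on their own hosts
[ndb_mgmd]
NodeId   = 49
HostName = hostC

[ndb_mgmd]
NodeId   = 50
HostName = hostD

# Data nodes
[ndbd]
NodeId   = 1
HostName = hostA

[ndbd]
NodeId   = 2
HostName = hostB

[mysqld]
```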

Conclusion

If you want your cluster to have the best chance of survival when there is a problem with one of the hosts, never install the management nodes on the same hosts as the data nodes. One of the management nodes will also act as the arbitrator. Since the arbitrator cannot change while the cluster is handling a node failure, if the host with the arbitrator crashes, it will cause a complete cluster shutdown if arbitration is required.

When you consider what is a host, you should look at physical hosts. Installing the management node in a different virtual machine on the same physical host offers only a little extra protection compared to installing it in the same virtual machine or on the same bare metal host.

So, to conclude: make sure your management nodes are on a completely different physical host compared to your data nodes.

Want to Know More?

The book Pro MySQL NDB Cluster (published by Apress) by Mikiya Okuno and me includes lots of information about designing your cluster and how MySQL NDB Cluster works.

Disclaimer: I am one of the authors of Pro MySQL NDB Cluster.