MySQL Shell: Using External Python Modules

MySQL Shell is a great tool for working with MySQL. One of the features that make it stand out compared to the traditional mysql command-line client is the support for JavaScript and Python in addition to SQL statements. This allows you to write code you otherwise would have had to write outside the client. I showed a simple example of this in my post about the instant ALTER TABLE feature in MySQL 8.0.12 where a Python loop was used to populate a table with 1 million rows This blog will look further into the use of Python and more specifically external modules.

Using a customer table_tools module in MySQL Shell.
Using a customer table_tools module in MySQL Shell.

Using Standard Modules

Aforementioned loop that was used to populate a test table also showed another feature of MySQL Shell: You can use the standard Python modules just as you would do in any other Python script. For example, if you need to create UUIDs you can use the uuid module:

This on its own is great, but what about your own modules? Sure, that is supported as well. Before showing how you can access your own modules, let’s create a simple module to use as an example.

Example Module

For the purpose of this blog, the following code should be saved in the file table_tools.py. You can save it in whatever directory you keep your Python libraries. The code is:

The describe function takes a Table object from which it works backwards to get the session object. It then queries the information_schema.COLUMNS view to get the same information about the table as the DESC SQL command. Both the table and schema name can be found through the table object. Finally, the information is printed.

The example is overly simplified for general usage as it does not change the width of the output based on the length of the data, and there is no error handling whatsoever. However, this is on purpose to focus on the usage of the code from within MySQL Shell rather than on the code.

Note: The same code works in a MySQL Connector/Python script except that the rows are returned as mysqlx.result.Row objects. So, the loop printing the rows look a little different:

With the function ready, it is time to look at how you can import it into MySQL Shell.

Importing Modules Into MySQL Shell

In order to be able to import a module into MySQL Shell, it must be in the path searched by Python. If you have saved table_tools.py into a location already searched, then that is it. However, a likely more common scenario is that you have saved the file in a custom location. In that case, you need to tell Python where to look for the files.

You modify the search path in MySQL Shell just as you would in a regular Python program. If you for example have saved the file to D:\MySQL\Shell\Python, then you can add that to the path using the following code:

If this is something you need as a one off, then it is fine just to modify the path directly in MySQL Shell. However, if you are working on some utilities that you want to reuse, it becomes tedious. MySQL Shell has support for configuration files where commands can be executed. The one for Python is named mysqlshrc.py (and mysqlshrc.js for JavaScript).

MySQL Shell searches for the mysqlshrc.py file in four locations including global locations as well as user specific locations. You can see the full list and the search order in the MySQL Shell User Guide. The user specific file is %APPDATA%\MySQL\mysqlsh\mysqlshrc.py on Microsoft Windows and $HOME/.mysqlsh/mysqlshrc.py on Linux and macOS.

You can do more than just changing the search path in the mysqlshrc.py file. However, for this example nothing else is needed.

Using the Module

Now that MySQL Shell has been set up to search in the path where your module is saved, you can use it in MySQL Shell. For example to get the description of the world.city table, you can use the following commands:

The \use world command sets the default schema to the world database. As a side effect, it also makes the tables in the world database available as properties of the db object. So, it possible to pass an object for the world.city table as db.city to table_tools.describe() function.

That is it. Now it is your turn to explore the possibilities that have been opened with MySQL Shell.

MySQL 8.0: Persisted Variables

MySQL 8.0 introduced a new feature that allows you to persist configuration changes from inside MySQL. Previously you could execute SET GLOBAL to change the configuration at runtime, but you needed to update your MySQL configuration file in order to persist the change. In MySQL 8.0 you can skip the second step. This blog discuss how this works and how to backup and restore the configuration.

Using SET PERSIST to set a variable and the persisted_variables table in the Performance Schema to get a list of persisted variables.
Using SET PERSIST to set a variable and the persisted_variables table in the Performance Schema to get a list of persisted variables.

Persisting Variables

You persist changes with either the SET PERSIST or SET PERSIST_ONLY statement. The different is that SET PERSIST_ONLY only updates the configuration whereas SET PERSIST essentially combines SET GLOBAL and SET PERSIST_ONLY.

Note: Some variables such as innodb_buffer_pool_instances can only use PERSIST_ONLY, i.e. it requires a restart to make the changes take effect. Still others, such as datadir can currently not be persisted.

mysqld-auto.cnf and variables_info

The persisted variables are stored in the file mysqld-auto.cnf located in the data directory using the JSON format. It includes more information than just the persisted value. It also includes information such as who made the change and when. An example file is:

Since it is JSON, it is easy to reformat to make easier to read, for example:

This information is also available from the performance_schema.variables_info table:

Notice that the source for join_buffer_size is DYNAMIC whereas the two other variables have the source set to PERSISTED. Why? After all they all three existed in the mysqld-auto.cnf file. DYNAMIC means that the variable was changes since the last restart either using SET GLOBAL or SET PERSIST. Another thing to be aware of is that variables changed with SET PERSIST_ONLY will not show up in variables_info until after the next restart. I will soon get back to show a way to get the variables that have been persisted in one way or another.

Backup and Restore

As a simple way to back up the configuration is simply copy the mysqld-auto.cnf file to a safe location. Similarly, you can restore the configuration by copying it back.

However, what if you want most of the configuration but not everything or you want to edit some of the values? In that case you need another way of exporting the configuration as you should not manually edit mysqld-auto.cnf.

Warning: Do not edit the mysqld-auto.cnf file manually. It should only be changed with SET PERSIST and SET PERSIST_ONLY. If there are any errors in the file, MySQL will refuse to start.

Fortunately as it turns out, it is easy to export all persisted variables. The table performance_schema.persisted_variables includes all variables that has either been read from mysqld-auto.cnf or has been changed with SET PERSIST or SET PERSIST_ONLY since the last restart. The table include the persisted values. For example:

This can be used to create SET statements that can be used to recreate the configuration on another instance. For example:

Using the mysql command-line client, you can avoid the column names and table format by using the --skip-column-names and --batch options:

Note: On Microsoft Windows ensure everything is on one line and the backslashes are removed.

Now the file config.sql contains an export of the persisted variables:

This example creates SET PERSIST_ONLY statement as those will work with all persistable variables. When you replay the SET statements, it will require a restart of MySQL for the changes to take effect. If you want to use SET PERSIST where possible, then you need to take into consideration whether the variable support SET PERSIST.  A list of variables that require SET PERSIST_ONLY are included at the end.

The configuration can now be restored as:

PERSIST_ONLY Variables

As promised, I will conclude with a list of persistable variables that only supports SET PERSIST_ONLY. As of MySQL 8.0.12 without any plugins installed, the variables are:

 

New Book: MySQL Connector/Python Revealed

When you write programs that uses a database backend, it is necessary to use a connector/API to submit the queries and retrieve the result. If you are writing Python programs that used MySQL, you can use MySQL Connector/Python – the connector developered by Oracle Corporation.

Now there is a new book dedicated to the usage of the connector: MySQL Connector/Python Revealed, which is published by Apress. It is available in a softcover edition as well as an eBook (PDF, ePub, Mobi).

MySQL Connector/Python Revealed

The book is divided into four parts spanning from the installation to error handling and troubleshooting. The four parts are:

  • Part I: Getting Ready
    This part consists of a single chapter that helps you to get up and running. The chapter includes an introduction to MySQL Connector/Python and getting the connector and MySQL Server installed.
  • Part II: The Legacy APIs
    The legacy APIs include the connector module that implements PEP249 (the Python Database API). The discussion of the mysql.connector module spans four chapters. In addition to query execution, the use of connection pools and the failover feature is covered. Finally, there is also a discussion about the C Extension.
  • Part III – The X DevAPI
    One of the big new features in MySQL 8 is the MySQL Document Store including the X DevAPI. It allows you to use MySQL through the NoSQL API as well as by executing SQL queries. The NoSQL API includes support both for working with MySQL as a document store where the data is stored in JSON documents and with SQL tables. Part III includes three chapters that are dedicated to the X DevAPI.
  • Part IV – Error Handling and Troubleshooting
    The final part of book goes through the two important topics of error handling and troubleshooting including several examples of how common errors and how to resolve them.

With the book comes 66 code examples that are available for download from Apress’ GitHub repository. See the book’s homepage for instructions.

MySQL Connector/Python is available from several sources including online bookshops. The following table shows some of the places, where you can buy the book. (The table if current as of 13 August 2018; changes to the available formats may happen in the future.)

ShopSoftcovereBook
ApressYesPDF and ePub (both DRM free)
AmazonYesMobi (Kindle)
Barnes & NoblesYes
SaxoYes

InnoDB Progress Information

MySQL has since version 5.7 had support for progress information for some queries. As promised in my previous post, I will here discuss how you can use that to get progress information for ALTER TABLE on InnoDB tables.

Background and Setup

Progress information is implemented through the Performance Schema using the stage events. In version 8.0.12 there are currently seven stages that can provide this information for  ALTER TABLE statements on InnoDB tables. In MySQL 8, it is easy to list the stages capable of reporting progress information by using the setup_instruments Performance Schema table:

This also shows how the setup_instruments table in MySQL 8 has some additional information about the instruments such as properties and documentation (not included in the output). Adding this information is still work in progress.

MySQL 5.7 does not provide as easy a way to obtain the instruments providing progress information. Instead you need to consult the reference manual. However, the principle in using the feature is the same.

As you can see, all of the instruments are enabled and timed by default. What is not enabled by default, however, is the consumer that can make the information available:

Since the consumers form a hierarchical system, the sys schema function ps_is_consumer_enabled() is used to show whether the consumer is enabled taking the whole hierarchy into consideration.

In order to use the progress information, you need to enable the events_stages_current consumer. This is the consumer that is responsible for keeping the performance_schema.events_stages_current table up to date, i.e. record the current (or latest if there is no current stage) for each thread. With the default Performance Schema settings, the rest of the hierarchy is enabled. To enable event_stages_current and verify it will be consuming instruments, you can use the following queries:

That is it. Now you can monitor the progress of the queries that uses the stages with progress information.

Note: The more parts of the Performance Schema that is enabled and the more fine grained monitoring, the more overhead. Stages are not the worst with respect to overhead; nevertheless it is recommended you keep an eye on the affect of enabling the events_stages_current consumer.

Monitoring Progress

The base for monitoring the progress information is the performance_schema.events_stages_current table. There are two columns of interest for this discussion:

  • WORK_COMPLETED: The amount of work that is reported to have been completed.
  • WORK_ESTIMATED: The estimated amount of work that needs to be done.

For InnoDB ALTER TABLE the estimated amount of work is for the entire operation. That said, the estimate may be revised during the process, so it may happen that the if you calculate the percentage it decreases as time goes. However, in general the percentage (100% * WORK_COMPLETED/WORK_ESTIMATED) will increase steadily until the operation completes at 100%.

To learn more about how the progress information works, the following pages in the manual are recommended:

For now, let’s look at an example.

Example

For the example, the salaries table in the employees sample database will be used. The table is sufficiently large that it will be possible to query the progress while adding a column using the INPLACE algorithm. As discussed in MySQL 8.0.12: Instant ALTER TABLE, it is possible to add a column instantly, but for the purpose of this example, the INPLACE algorithm illustrates the progress information feature better. The query that will be executed is:

The performance_schema.events_stages_current table can be joined with the performance_schema.events_statements_current to show the query and progress. For example:

There is another way though. Instead of using the performance_schema.events_stages_current table directly, an easier way is to use the sys.session view. This is an advanced process list that includes much more information than the usual SHOW PROCESSLIST statement including progress information. The performance of sys.session has been improved with more than an order of magnitude in MySQL 8 by the addition of indexes to the Performance Schema tables making it highly useful.

Querying the sys.session view for sessions showing progress information while the ALTER TABLE is in progress returns an output similar to the following example:

In the example, the @sys.statement_truncate_len user variable is set to 85. By default the sys schema ensures the current statement is at most 64 characters long. In order to avoid truncation in this case, the truncate length is increased to 85.

The example output shows that the progress is at around 49%. It is important to note that is an estimate and not an exact number. The ALTER TABLE is performing a merge sort at the time, and the query has been running for 4.22 seconds.

A couple of other interesting columns are included. It can be seen the connection is using 464KiB at the time. In MySQL 8 memory instrumentation is enabled by default (in MySQL 5.7 you need to enable it yourself). Additionally, the name of the program executing the query is MySQLWorkbench, that is the query originates from MySQL Workbench.

So, next time you plan a large ALTER TABLE operation, consider enabling the events_stages_current consumer, so you can follow the progress.

Monitoring NDBCluster Copying Alter Progress

MySQL NDB Cluster has great support for online (inplace) schema changes, but it is still sometimes necessary to perform an offline (copying) ALTER TABLE. These are relatively expensive to make as the entire table is copied into a new table which eventually replace the old table.

One example where a copying ALTER TABLE is required is when upgrading from MySQL NDB Cluster 7.2 or earlier to MySQL NDB Cluster 7.3 or later. The format used for temporal columns changed between these version (corresponding to MySQL Server 5.5 to 5.6). In order to take advantage of the new temporal format, a table rebuild is required.

Note: Support for the old temporal format has been removed in MySQL 8.0. So, you must upgrade your tables before an upgrade is possible. There is at the time of writing no MySQL NDB Cluster releases based on MySQL Server 8.0.
Schematic representation of a copying ALTER TABLE
Schematic representation of a copying ALTER TABLE

For long running operations, it can be useful to monitor the progress. There is no built-in way to do this like there is for InnoDB in MySQL 5.7 and later (see the blog InnoDB Progress Information), however the ndbinfo schema can give some information about the progress.

The ndbinfo schema is a virtual schema with views that show information from the data nodes. You can argue it is MySQL NDB Cluster’s answer to the Performance Schema. The ndbinfo schema was introduced in MySQL NDB Cluster 7.1 more than eight years ago and has steadily seen more and more information becoming available.

One of these changes arrived in MySQL NDB Cluster 7.4 where the memory_per_fragment view was added. This view shows detailed information about the memory used per fragment (in most cases the same as partitions). This can also be used to get an estimate of the progress of a copying ALTER TABLE.

As mentioned, a copying ALTER TABLE is similar to creating a new table with the new schema (which may potential be the same as the old schema), then inserting all of the data from the old table to the new. At the end, the two tables are swapped and the old table dropped.

Note: Remember that a copying ALTER TABLE is an offline operation. Any changes made to the table during the operation may be lost! Make sure the table is read-only while the ALTER TABLE is executing.

The temporary table (that later become the real table) is an NDBCluster table like other user created tables. This means the table will show up in ndbinfo.memory_per_fragment as a normal table, just with a special table name.

Temporary tables are named like #sql-7f4b_4 where the part after the – is generated based on the operating system process ID of the mysqld process and the connection id of the connection executing the ALTER TABLE. The schema name for the temporary table is the same as for the original table. In the example the process ID is 32587 or 7f4b in hexadecimal notation and the connection ID is 4.

As an example consider a rebuild of the db1.t1 table. In this case the fully qualified name (the name used by NDB Cluster instead of the normal table name) is db1/def/t1, i.e. the schema name and table name with /def/ between them. You can choose to create the fully qualified name for the temporary table as described above. An alternative, if you just have one concurrent table rebuild in the schema is to just look for the fully qualified name matching db1/def/#sql-%.

So, you can use the ndbinfo.memory_per_fragment table to see how much memory is allocated per fragment of the temporary table compared to the original table. For example:

The columns with information about the node ID, block instance, and fragment number have been left out. This is why it looks like there are duplicate rows. It is also worth noticing that there are several “child tables” for the indexes and a blob column.

There are three memory columns. The first is for the fixed size column format, the second for the variable width columns format, and the last for hash indexes.

MySQL NDB Cluster supports two storage formats for the columns. The fixed format uses less memory for columns that are fixed width in nature (such as integers), however variable (called DYNAMIC in CREATE TABLE and ALTER TABLE statements) is more flexible. The variable/dynamic column format is also the only one supported when adding a column inplace (online). See also the manual page for CREATE TABLE for more information about the column format.

The hash memory is the memory used by hash indexes (for the primary key and unique indexes).

For the fixed and variable element memory usages there is both allocated and free bytes. Here the free bytes is used as a measure of the amount of fragmentation. A copying ALTER TABLE defragments the table, so it is necessary to the the fragmentation into consideration when estimating the progress. In reality it is more complicated than the query suggest, so the memory values in the query result will not end up matching 100%, however in most cases it should be a reasonable estimate.

You can also choose to aggregate the memory, for example:

This aggregate query also uses the sys schema function format_bytes() to convert the number of bytes into human readable numbers. The sys schema is installed by default for MySQL NDB Cluster 7.5 and later and is available from MySQL’s repository on GitHub for MySQL NDB Cluster 7.3 and 7.4.

This way of estimating the progress of a copying ALTER TABLE is not perfect, but at least it can give an idea of how the operation progresses.

What Does I/O Latencies and Bytes Mean in the Performance and sys Schemas?

The Performance Schema and sys schema are great for investigating what is going on in MySQL including investigating performance issues. In my work in MySQL Support, I have a several times heard questions whether a peak in the InnoDB Data File I/O – Latency graph in MySQL Enterprise Monitor (MEM) or some values from the corresponding tables and view in the Performance Schema and sys schema are cause for concern. This blog will discuss what these observations means and how to use them.
The MEM InnoDB File I/O Graphs showing a peak in latency and bytes at 12:51.

The Tables and Views Involved

This blog will look into three sources in the Performance Schema for I/O latencies, so let’s first take a look at those tables. The three Performance Schema tables are:

  • events_waits_summary_global_by_event_name: with the event name set to wait/io/table/sql/handler or wait/io/file/%. This is the table used for the waits_global_by_latency and wait_classes_global_by_% views in the sys schema.
  • table_io_waits_summary_by_table: this is the table used for the schema_table_statistics% views in the sys schema.
  • file_summary_by_instance: this is the table used for the io_global_by_file_by% views in the sys schema.

These are also the sources used in MySQL Enterprise Monitor for the InnoDB Data File I/O graphs shown above. Let’s take a look at an example of each of the tables.

events_waits_summary_global_by_event_name

The events_waits_summary_global_by_event_name has aggregate data for wait events grouped by the event name. For the purpose of this discussion it is the table and file wait/io events that are of interested. An example of the data returned is:

These three queries show the same data, just obtained and displayed in different ways.

In the result there are two groups of events. The wait/io/table events (the wait/io/table/sql/handler is the only event of this group which is why it can be listed explicitly) and the wait/io/file group.

The table events are for accessing data in tables. It does not matter whether the data is cached in the buffer pool or is accessed on disk. In this table and view, there is no distinguishing between different types of access (read, write, etc.).

The file events are, as the name suggest, for actually accessing files. There is one file event per file type. For example, in he output there are the wait/io/file/innodb/innodb_log_file event for accessing the InnoDB redo log files, the wait/io/file/innodb/innodb_data_file event for accessing the InnoDB data files themselves, the wait/io/file/sql/binlog event for the binary log files, etc.

In the second query, all of the timings are wrapped in the sys.format_time() function. The timings returned by the Performance Schema are in picoseconds (10^-12 second) which are somewhat hard for us humans to read. The sys.format_time() function converts the picoseconds to human readable strings. When you sort, however, make sure you sort by the original non-converted values.

Tip: Use the sys.format_time() function to convert the picoseconds to a human readable value, but do only so for the returned row; sort and filter by the original values. The Performance Schema always returns timings in picoseconds irrespective of the timer used internally for the event.

The sys schema by default returns the timing as human readable values. If you need the values in picoseconds prefix the table name with x$, for example sys.x$waits_global_by_latency. The sys schema includes an ORDER BY clause in most views. For the waits_global_by_latency view, the default ordering is by the total latency, so there is no need to add an ORDER BY clause in this example.

table_io_waits_summary_by_table

The table_io_waits_summary_by_table Performance Schema table and schema_table_statistics% sys schema views are related to the wait/io/table/sql/handler event just discussed. These provide information about the amount of time spent per table. Unlike querying the wait/io/table/sql/handler in the wait_events_% tables, it split the time spent into whether it is used for reads, writes, fetch, insert, update, or delete. The read and write columns are the aggregates for the corresponding read and write operations, respectively. Since fetch is the only read operation, the read and fetch columns will have the same values.

The table and view show the table I/O, i.e. the access to table data irrespective of whether the data is accessed in-memory or on disk. This is similar to the wait/io/table/sql/handler event. An example of the result of querying the table and view for the employees.salaries table is:

In this case it shows that there has been mostly writes – inserts – for the table. The sys schema view effectively joins on the performance_schema.file_summary_by_instance for the read columns, so for the schema_table_statistics view fetch and read are not synonyms.

So, what is it the file_summary_by_instance table shows that is different table the “table I/O” that has been the topic of the first two tables? Let’s see.

file_summary_by_instance

Unlike the two previous tables, the file_summary_by_instance shows how much time is spent on actual file I/O and how much data is accessed. This makes the file_summary_by_instance table and the corresponding sys schema views very useful for determining where time is spent doing disk I/O and which files have the most data accesses on disk.

An example of using the Performance Schema and two of the sys schema views is:

This example is from Microsoft Windows, and as always when backslashes are in play, it is fun to try to determine the appropriate number of backslashes to use. When specifying the file name with LIKE, you need four backslashes per backslash in the file name; when using = you need two backslashes.

Again, the values are split into reads and writes (though not as detailed as before with fetch, insert, update, and delete – that is not known at the level where the file I/O happens). The miscellaneous values include everything that is not considered reads or write; this includes opening and closing the file.

The sys schema queries not only have formatted the timings, but also the path and the bytes. This has been done using the sys.format_path() and sys.format_bytes() functions, respectively.

From the output, it can be seen that despite no rows were ever fetched (read) from the employees.salaries table (that was found in the previous output), there has still been some read file I/O. This was what the sys.schema_table_statistics view reflected.

So, what does all of this mean? The graph in MySQL Enterprise Monitor showed that there was six seconds file I/O latency for the InnoDB data files. Is that bad? As often with these kinds of questions, the answer is: “it depends”.

What to Make of the Values

In this case we have a case where the graphs in MySQL Enterprise Monitor show a peak for the latency and bytes used doing I/O on the InnoDB data files. This is actually disk I/O. But what exactly does that means and should the alarm sound?

Let’s first refresh our memory on how the graphs looked:
The MEM InnoDB File I/O Graphs showing a peak in latency and bytes at 12:31.If you are not using MySQL Enterprise Monitor, you may have similar graphs from your monitoring solution, or you have obtained latency and bytes values from the tables and views discussed in this blog.

The latency graph shows that we have done six seconds if I/O. What does that mean? It is the aggregate I/O during the period the data was collected. In this case, the data is plotted for every minute, so in the one minute around 12:51, of the 60 seconds a total of six seconds was spent doing I/O. Now, the six seconds suddenly do no sound so bad. Similar for the bytes, around 4.6MiB of data was read or written.

In general, the values obtained either from the monitoring graphs or from the underlying tables cannot be used to conclude whether the is a problem or not. They just show how much I/O was done at different times.

Similar for the values from the Performance Schema. On their own they do not tell much. You can almost say they are too neutral – they just state how much work was done, not whether it was too much or too little.

A more useful way to use this data is in case a problem is reported. This can be that system administrator reports the disks are 100% utilized or that end users report the system is slow. Then, you can go and look at what happened. If the disk I/O was unusually high at that point in time, then that is likely related, and you can continue your investigation from there.

There are more reports in MySQL Enterprise Monitor both as time series graphs and as point-in-time snapshots. The point-in-time snapshots are often using the sys schema views but allows sorting. An example is the Database File I/O reports:
MEM's Database File I/O ReportMySQL Workbench also provides performance reports based on the sys scheme. The equivalent to the previous report is the Top File I/O Activity Report:
MySQL Workbench's Top File I/O Activity ReportThe MySQL Workbench performance reports also allows you to export the report or copy the query used to generate the report, so you can execute it manually.

With respect to the wait/io/table/sql/handler events, then remember that I/O here does not mean disk I/O, but table I/O. So, all it means that time is accumulating for these events – including when looking at the per table is that the data in the table is used. There are also per index values in table_io_waits_summary_by_index_usage Performance Schema table and the schema_index_statistics sys view which have the same meaning. Whether a given usage is too high depends on many things. However, it can again be useful to investigate what is causing problems.

For the example data from this blog, it was triggered by loading the employees sample database. So, there was no problem. If you want to put data into your database, it must necessarily be written to the data file, otherwise it will not be persisted. However, if you think the data import took too long, you can use the data as part of your investigation on how to make the import faster.

The conclusion is that you should not panic if you see the I/O latencies peak. On their own they just mean that more work was done during that period that usual. Without context a peak is no worse than a dip (which could indicate something is preventing work from being done or that there is a natural slow period). Instead use the I/O latencies and bytes together with other observations to understand where MySQL is spending time and for which files data is being read and written.

References

I will recommend you look at the following references, if you want to understand more about the Performance Schema tables and sys schema views discussed in this blog:

The GitHub repository for the sys schema includes all the definitions of the views (and other objects). Since these are written in plain SQL, they are very useful to see in more depth where the data is coming from. The GitHub website allows you to browse through each of the files. Each sys schema object is defined in its own file.

Configuring the MySQL Shell Prompt

With the introduction of MySQL Shell 8.0, the second major version of the new command-line tool for MySQL, a new and rich featured prompt was introduced. Unlike the prompt of the traditional mysql command-line client, it does not just say mysql> by default. Instead it comes in a colour coded spectacle.

The default prompt is great, but for one reason or another it may be that you want to change the prompt. Before getting to that, let’s take a look at the default prompt, so the starting point is clear.

The Default Prompt

An example of the default prompt can be seen in the screen shot below. As you can see, there are several parts to the prompt, each carrying its information.

MySQL Shell with the default font.
MySQL Shell with the default font.

There are six parts. From left to right, they are:

  • Status: Whether it is a production system or whether the connection is lost. This part is not included in the above screen shot.
  • MySQL: Just a reminder that you are working with a MySQL database.
  • Connection: Which host you are connected to (localhost), which port (33060 – to the X protocol port), and that SSL is being used.
  • Schema: The current default schema.
  • Mode: Whether you are using JavaScript (JS), Python (Py), or SQL (SQL) to enter commands.
  • End: As per tradition, the prompt ends with a >.

Depending on your current status one or more of the parts may be missing. For example, the configuration options will only be present, when you have an active connection to a MySQL Server instance.

The prompt works well on a black background and thus brightly coloured text as in the screen shot, but for some other background and text colours, it is not so – or you may simply want different colours to signify which whether you are connected to a development or production system. You may also find the prompt too verbose, if you are recording a video or writing training material. So, let’s move on and find out how the prompt is configured.

The Prompt Configuration

Since the prompt is not just a simple string, it is also somewhat more complex to configure it than just setting an option. The configuration is done in a JSON object stored in a file named prompt.json (by default – you can change this – more about that later).

The location of prompt.json depends on your operating system:

  • Linux and macOS: ~/.mysqlsh/prompt.json – that is in the .mysqlsh directory in the user’s home directory.
  • Microsoft Windows: %AppData%\MySQL\mysqlsh\prompt.json – that is in AppData\Roaming\MySQL\mysqlsh directory from the user’s home directory.

If the file does not exist, MySQL Shell falls back on a system default. For example, on Oracle Linux 7 installation, the file /usr/share/mysqlsh/prompt/prompt_256.json is used. This is also the template that is copied to %AppData%\MySQL\mysqlsh\prompt.json on Microsoft Windows 10 installation.

The MySQL Shell installation includes several templates that you can choose from. These are:

  • prompt_16.json: A coloured prompt limited to use 16/8 color ANSI colours and attributes.
  • prompt_256.json: The prompt uses 256 indexed colours. This is the one that are used by default both on Oracle Linux 7 and Microsoft Windows 10.
  • prompt_256inv.json: Similar to prompt_256.json, but with an “invisible” background colour (it just uses the same as for the terminal) and with different foreground colours.
  • prompt_256pl.json: Same as prompt_256.json but with extra symbols. This Powerline patched font such as the one that is installed with the Powerline project. This will add a padlock with the prompt when you use SSL to connect to MySQL and use “arrow” separators.
  • prompt_256pl+aw.json: Same as prompt_256pl.json but with “awesome symbols”. This additionally requires the awesome symbols to be included in the Powerline font.
  • prompt_classic.json: This is a very basic prompt that just shows mysql-js>, mysql-py>, or mysql-sql> based on the mode in use.
  • prompt_nocolor.json: Gives the full prompt information, but completely without colours. An example of a prompt is: MySQL [localhost+ ssl/world] JS>

These are templates that you can use as is or modify to suite yours needs and preferences. One way to pick a theme is to copy the template file into the location of your user’s prompt definition. The templates can be found in the prompt directory of the installation, for example:

  • Oracle Linux 7 RPM: /usr/share/mysqlsh/prompt/
  • Microsoft Windows: C:\Program Files\MySQL\MySQL Shell 8.0\share\mysqlsh\prompt

Another option is to define the MYSQLSH_PROMPT_THEME environment variable to point to the file you want to use. The value should be the full path to the file. This is particularly useful if you want to try the different template to see what works best for you. For example, to use the prompt_256inv.json template from the command prompt on Microsoft Windows:

Which gives the prompt:

The prompt when using the prompt_256inv.json template.
The prompt when using the prompt_256inv.json template.

If none of the templates work for you, you can also dive in at the deep end of the pool and create your own configuration.

Creating Your Own Configuration

It is not completely trivial to create your own configuration, but if you use the template that is closest to the configuration you want as a starting point, it is not difficult either.

A good source of help to create the perfect prompt is also the README.prompt file that is located in the same directory as the template files. The README.prompt file contains the specification for the configuration.

Instead of going through the specification in details, let’s take a look at the prompt_256.json template and discuss some parts of it. Let’s start at the end of the file:

This is where the elements of the prompt is defined. There are a few things that is interesting to note here.

First, notice that there is an object with the classes disconnected%host% and %is_production%. The names inside the %s are variables defined in the same file or that comes from MySQL Shell itself (it has variables such as the host and port). For example, is_production is defined as:

So, a host is considered to be a production instance if it is included in the environment variable PRODUCTION_SERVERS. When there is a match, and additional element is inserted at the beginning of the prompt to make it clear, you are working on with a production system:

Connected to a production system.
Connected to a production system.

The second thing to note about the list of elements is that there are some special functions such as shrink which can be used to define how the text is kept relatively short. For example, the host uses truncate_on_dot, so only the part before the first dot in the hostname is displayed if the full hostname is too long. Alternatively ellipsize can be used to add … after the truncated value.

Third, the background and foreground colours are defined using the bg and fg elements respectively. This allows you to completely customize the prompt to your liking with respect to colours. The colour can be specified in one of the following ways:

  • By Name: There are a few colours that are known by name: black, red, green, yellow, blue, magenta, cyan, white.
  • By Index: A value between 0 and 255 (both inclusive) where 0 is black, 63 light blue, 127 magenta, 193 yellow, and 255 is white.
  • By RGB: Use a value in the #rrggbb format. Requires the terminal supports TrueColor colours.
Tip: If you want to do more than make a few tweaks to an existing template, read the README.prompt file to see the full specification including a list of supported attributes and built-in variables. These may change in the future as more features are added.

One group of built-in variables that deserve an example are the ones that in some way depend on the environment or the MySQL instance you are connected to. These are:

  • %env:varname%: This uses an environment variable. The way that it is determined whether you are connected to a production server is an example of how an environment variable
  • %sysvar:varname%: This uses the value of a global system variable from MySQL. That is, the value returned by SELECT @@global.varname.
  • %sessvar:varname%: Similar to the previous but using a session system variable.
  • %status:varname%: This uses the value of a global status variable from MySQL. That is, the value returned by SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME = ‘varname’.
  • %status:varname%: Similar to the previous, but using a session status variable.

If you for example want to include the MySQL version (of the instance you are connected to) in the prompt, you can add an element like:

The resulting prompt is:

Including the MySQL Server version in the prompt.
Including the MySQL Server version in the prompt.

What next? Now it is your turn to play with MySQL Shell. Enjoy.

MySQL 8.0.12: Instant ALTER TABLE

There are many nice changes included in the MySQL 8.0.12 release that were published a couple of days ago. One of the most exciting is the ability to make instant schema changes to tables. This blog will look into why I think that that is a stand-out change.

I will not go into details with the implementation other than noting that the new data dictionary in MySQL 8.0 has allowed for a nice implementation of the patch that was contributed by the Tencent Games DBA Team. If you are interested in learning more about the implementation, I will recommend you to read the blog by Bin Su (MySQL 8.0: InnoDB now supports Instant ADD COLUMN) and the worklog implementing the feature (WL#11250: Support Instant Add Column).

Thanks to the Tencent Games DBA Team who contributed the patch for this feature.

Why Do We Need Instant ALTER TABLE?

As you may know, InnoDB has since version 5.6 supported online schema changes. So, a fair thought may be why that is no longer good enough. The thing is that while online schema changes are very nice and allows you to make changes to the data (DML statements) while the schema change is made, there are still some problems:

  • Online ALTER TABLE still requires a meta data lock for the duration of the operation. That is, it is not possible to make other schema changes until the ALTER TABLE has completed.
  • In replication setups, the SQL thread handling the online schema change will still block other changes coming through the replication channel from occurring. This means that an online schema change that takes an hour suddenly makes the replication lag an hour.
  • This is compounded in chained replication setups where the delay on the last instances in the topology is multiplied with the number of times the change has been replication. Suddenly this schema change that did not really affect the replication master becomes a major pain and delays the deployment of the application upgrade that relies on the schema changes.
  • Even though the change is online, it still is heavy on resources: disk for storing the extra copy of the table when the change cannot be made in-place, CPU and disk activity that may cause other queries to become slow, etc.
  • There is a limit to the amount of DML changes that can be made during the schema change. If you make too many changes, the schema change will fail.

So, in short, online schema changes are only the first stop. Instance schema changes is the future.

Which Changes Can Be Made Instantly?

While it would be great if all schema changes could be made instantly, unfortunately that is not the case. Just as not all schema changes can be made online, there are limitations to the new instant feature. In short, the changes that can be made instantly must only affect the metadata of the table. The metadata is stored in the data dictionary. The changes that can be made with the instant ALTER TABLE feature as per 8.0.12 are:

  • Adding a new column as the last column in the table.
  • Adding a generated virtual column.
  • Dropping a generated virtual column.
  • Setting a default value for an existing column.
  • Dropping the default value for an existing column.
  • Changing the list of values allowed for a column with the ENUM or SET data types. A requirement is that the storage size does not change for the column.
  • Change whether the index type is set explicitly for an existing index.

And who knows, maybe later the feature can be extended to cover more changes. There are also a few limitations that are good to be aware of:

  • The row format cannot be COMPRESSED.
  • The table cannot have a fulltext index.
  • Tables in the data dictionary cannot use the instant algorithm.
  • Temporary tables are not supported.

How to Ensure You are Using the Expected Algorithm?

One problem with schema changes is that here are different algorithms depending on the schema change. Currently there are three different algorithms:

  • INSTANT: the change completes very quickly (yes not quite instantly) as only the metadata in the data dictionary needs to be updated.
  • INPLACE: the changes are made within the existing table, i.e. a complete table copy is avoided.
  • COPY: the table is copied into a new copy with the new definition.

By default, MySQL chooses the algorithm doing the least work. That is, INSTANT if that is supported, otherwise INPLACE if that is supported, and finally COPY. Additionally, there is the concept of locking which can be set to either NONE, SHARED, or EXCLUSIVE.

So, how do you ensure you are not ending up with a copying ALTER TABLE taking exclusive locks when you thought the operation was going to be instant? The answer is to explicitly set the algorithm and lock type. That way, MySQL will throw an error if your schema change is not compatible with the requested algorithm. The same principle can be used to force a full rebuild of the table if you for example have a small table and don’t see a reason to worry about instantly added columns (see also more later).

I will give some examples of specifying the ALGORITHM and LOCK options to ALTER TABLE later. However, we first need an example table to play with.

Tip: Always specify the ALGORITHM and LOCK options explicitly to avoid unpleasant surprises.

Creating an Example Table

For the purpose of the upcoming examples of instant schema changes, it is necessary to have a table to play with. The table will be the testtbl table in the my_schema schema. The table can be created and populated with sample data by using MySQL Shell – this allows us to take advantage of the support for scripting. The table will have one million rows.

Note: To limit the amount of scrolling, the prompt in MySQL Shell has been changed to just show the mode (JS, PY, SQL).

Note how this takes advantage of the feature in MySQL Shell where the db object in JavaScript and Python mode has the tables as properties. Thus it is possible to refer to the testtbl table in the my_schema schema as db.testtbl after using \use my_schema to read the database objects. The data is inserted in 1000 batches of 1000 rows using a double loop.

Now it is time to play with schema changes.

Schema Changes Examples

For comparison, let’s first add a column using the old algorithms. First, add the column val2 using the copying algorithm – note this is not supported online, so a shared lock is taken:

Note: the test is made on a laptop – the timings themselves are not relevant, but they can be used to compare the time taken for the three algorithms.

So that took around 5.4 seconds. That is not too bad, but we can do much better than that. Let’s try the in-place algorithm. Not only will that require less work (though still a fair bit), it is also possible to perform the change online.

So, this is a factor three faster despite the table having an extra column compared to the original table. And it is possible to execute DML statements. That is a big win. But let’s move on to the finale: instant ALTER TABLE:

Wow – that’s less than 5/100 of a second – that’s instant enough for me. Note hos the LOCK option is not specified here. The instant algorithm is always as lock free as possible, so the LOCK option is not allowed.

There is a little more to it than this. As mentioned by Bin Su, there are some trade offs such as the row size if not checked upfront. So, in some cases it can be useful to know whether a table contains instantly added columns. This can be seen through the innodb_tables view in the Information Schema on the data dictionary:

Now what is that? 4 columns? But we only added one of the columns using the instant algorithm. What the INSTANT_COLS column shows is how many columns existed before the first instant column was added. In the example the columns id, val, val2, and val3 existed, then val4 was added using the instant algorithm. For tables that have never had any columns added instantly, the value of INSTANT_COLS is 0.

Want to Read More?

This blog has just been an introduction to the new MySQL 8.0.12 feature of instant schema changes. I will recommend you to read Bin Xu’s blog as well as the documentation in the MySQL reference manual to fully understand the feature:

Conclusion

The new feature allowing instant schema changes is a great way to avoid time consuming operations. Particularly in replication setups where the replicated statement will block while it applies, the different between an online and an instant schema change is like night and day.

MySQL 8.0.12 can be downloaded from MySQL Community Downloads or you can install it through one of the “native installers” such as MySQL Installer for Microsoft Windows or one of our Linux repositories (Yum, APT, SUSE). If you are a customer and need the commercial version of MySQL Server, I will recommend you to download it from Patches & Updates in My Oracle Support (MOS); there is also a 30 days trial version available from Oracle Software Delivery Cloud.

Please do not hesitate to provide feedback. Bugs and feature requests can be logged in the MySQL Bugs database.

Which Character Set Should You Use in MySQL?

MySQL supports a long list of character sets, and one of the strong points are that you can use different character sets per column. This is very flexible, but which character set should you use?

If you do not want to read the whole blog the summary is: Use utf8mb4, particularly in MySQL 8.0 with one of the UCA 9.0.0 based collations.

Before I reach this conclusion, let’s start out looking at what a character set is.

What is a Character Set?

Characters are wonderful things that can take all kinds of shapes ranging from the familiar (for English speakers at least) Latin characters as those used to write this blog, over Arabic (مرحبا بالعالم) and Asian characters such as simplified Chinese (你好,世界) to emojis (❤️🐬). This is all cool, but computers do not deal in characters, they work in bits, so how are all of these wonderful characters represented?

The answer is that character sets are used to encode the characters into bits. Given the number of known characters (Unicode 11.0 contains 137374 characters). Just to cover the possibility of those characters, 18 bits are needed (so effectively 3 bytes). That is significant.

In the good old days when computers were new, it was mostly English that was used, and using 18 bits per character was just not feasible. A popular home computer in the 1980s was the Commodore 64 was an 8-bit computer with a total of 64kB of RAM – you just could not get away with using multiple bytes per character.

In the early days a popular character set was ASCII which uses 7 bits to encode each character. This allows for 128 unique characters which is ok for plain English but does not even allow for all of the Western European accented characters to be included. One answer to that are the Latin character sets, such as ISO 8859-1 (Latin 1) and Windows-1252. These use 8 bits and thus supports 256 characters for each character set. Together they allow complete coverage for the languages using the Latin letters, but you have to choose the right one for the language you are writing, and in all cases 8 bit does not help much if you need to write something in Chinese that has thousands of characters.

The current solution is to use Unicode characters. The Unicode Consortium defines code points for the characters included. As mentioned, in Unicode 11.0 there are just over 137000 characters. This covers most usages (yes, there are actually still characters missing). There is a bit more to it than that, however. One thing is the code points defined by the Unicode Consortium, but these still need bits representations for computers. There are several solutions to that with UTF-8 being the most commonly used nowadays.

UTF-8 is a variable width encoding that uses one to four bytes to represent a Unicode code point. Part of the success of UTF-8 is that the characters used in ASCII have the same encoding in UTF-8 – that is, if you have a document in ASCII, you can just say it is now in UTF-8 and all works well. That made it easy for the large number of ASCII based websites to migrate to use Unicode. The rest of the characters use one or more bytes. While the compatibility with ASCII has been great for adaptation and it helps keep the size of documents in English down, it also has some downsides. One is that some other languages uses more bytes that needed; another is that scanning a string is relatively expensive as you must decode each character to know where the next character begins.

Before moving on, let’s have a quick look at some actual encodings in different character sets, to see the difference (UCS2 is a two-byte fixed width Unicode character set):

CharacterASCIILatin-1UTF-8UCS-2
A4141410041
ÅC5C38500C5
E7958C754C
🐬F09F90AC

The table shows how there clearly are some relations between these character sets, but also how only UTF-8 can represent all of the four test characters. So, from this comparison the strengths of UTF-8 is starting to show, but also the weaknesses. The character Å requires two bytes in UTF-8 (and UCS-2) but only one in Latin-1. The Chinese character 界 requires three bytes in UTF-8 but only two in UCS-2.

So, what does this mean in the context of MySQL? Let’s take a look.

MySQL Character Sets

Until MySQL 8.0 the default character set in MySQL was Latin-1 (named latin1). This was a convenient character set in many ways, for example it was fixed width, so finding the Nth character in a string was fast and it could store text for most Western European languages. However as discussed, Latin-1 is not what is used in this day and age – the World has moved on to UTF-8. So, in MySQL 8.0 the change was made to make utf8mb4 the default character set.

Stop a minute – what is utf8mb4? How does that differ from UTF-8 that was discussed in the previous section? Well, it is the same thing. Unfortunately, when UTF-8 was first implemented in MySQL, it was limited to three bytes (in MySQL 8.0 called utf8mb3). This predates my involvement with MySQL, but a guess for this choice is that it is related to internal temporary tables which in MySQL 5.7 and earlier uses the MEMORY storage engine when possible. Internal temporary tables are for example used to store the result of subquery and for sorting. The MEMORY storage engine only supports fix width columns, so a varchar(10) column would be treated as a char(10) column in an in-memory internal temporary table. With utf8mb4 that would mean 40 bytes, with the choice of a 3-byte implementation it would mean 30 bytes. Furthermore, until the emergence of emojis, it was rarely required to use more than three bytes in UTF-8.

Anyway, MySQL has support for a wide range of character sets to suite your specific need. In MySQL 8 there are a total of 41 character sets to choose from. The details of the character sets can be found in the information_schema.CHARACTER_SETS table:

Notice how there are several UTF and other Unicode character sets including utf8 and utf8m4. The utf8 character set is the 3-byte implementation. The name utf8 has now been deprecated in favour of utf8mb3 to make it specific that you are using an implementation that can at most use three bytes per character.

The table has four columns: The character set name, the default collation, a description, and the maximum number of bytes per character. The first and the two last of the columns are clear enough, but what is the collation? Let’s discuss that.

What is a Collation?

It is all well and good that you now know how to encode characters, but one of the main uses of databases is to compare data. Are two values the same or is one larger than the other? For numbers this is in general well defined (floating point comparisons and release numbers are examples where it is less clear), for example 10 is greater than 5.

However, for strings it is not that simple. Different languages have different rules for the order of the characters, and there are the questions whether an accented character should be considered the same as the base character and whether upper and lower case should be considered the same. An example is that in Danish and Norwegian, the character Ø comes before Å, but in Swedish Å comes before Ö (Ø and Ö are essentially the same letter).

The rules that define the order of the characters and whether to take accents and capitalization into account is called collations. There are many collations for the same character set. In MySQL 8.0 there are alone 73 collations to choose from for the utf8mb4 character set (more later about utf8mb4).

The collations that can be used with a given character set can be found in the information_schema.COLLATIONS table. For example, to see the 73 collations available for utf8mb4:

The most interesting in this discussion is the name. As you can see there is a pattern to the names, and it is possible to identify up to four parts in the name

  • First the character set name.
  • Then which language it is for, for example ja for Japanese. A special “language” is binary which mean each byte is compared directly one by one.
  • Then whether it is a UCA 9.0.0 based collation. The UCA 9.0.0 based collations have 0900 in the name.
  • Then up to three modifiers depending on whether it is accent and/or case sensitive or insensitive.

There are a total of five accent and case modifiers:

  • ai: accent insensitive, ä is equal to a.
  • as: accent sensitive, ä is not equal to a.
  • ci: case insensitive, a is equal to A.
  • cs: case sensitive, a is not equal to A.
  • ks: kana sensitive (only for Japanese)

This brings us on to MySQL 8.0 and character sets and collations.

MySQL 8.0

Good job of making it this far. You are almost at the end now.

As part of the work to make utf8mb4 the default character set in MySQL 8.0, a large amount of work was put into making the MySQL work more efficiently with the UTF-8 character set and to make it more useful for the users. Some of the changes include:

  • The UCA 9.0.0 collations (with 0900 in the name) are new in MySQL 8.0. MySQL works faster with these new collations than the old collations.
  • In-memory internal temporary tables now use the new Temptable storage engine by default. It supports variable width columns making it much more efficient with UTF-8 than earlier versions of MySQL. From version 8.0.12 (just released) the Temptable storage engine also supports BLOBs.

Another thing to be aware of is that the new X DevAPI that for example allows NoSQL access to both the MySQL Document store with JSON documents and to SQL tables expect all query results to be UTF-8.

For these reasons it is recommended to use utf8mb4 with one of the new UCA 9.0.0 collations for most uses in MySQL 8.0. The default collation for utf8mb4 in MySQL 8.0 is utf8mb4_0900_ai_ci. This is a good collation for the general use cases, but feel free to use one of the more specific collations if that works better for your application. For example, the utf8mb4_ja_0900_as_cs or utf8mb4_ja_0900_as_cs_ks if you need correct Japanese comparisons (such as the fix for the infamous Sushi = Beer bug).

Character sets and collations are big topics. If you are interested to know more about MySQL 8.0 and character sets, there is a series of blogs on the MySQL Server Blog, for example:

Additionally, the MySQL reference manual has a chapter with 15 sections about character sets and collations: Chapter 10 Character Sets, Collations, Unicode.

MySQL NDB Cluster 7.6: Fast Import with ndb_import

A common problem when you need to set up a new database instance is to import the initial data. This can particularly be an issue, if you are converting an existing database so there is a large amount of data to import. In MySQL NDB Cluster 7.6 which just was announced as general available (GA) this week, there is a new tool to help with such imports: ndb_import.

The ndb_import utility takes a CSV file and imports the data into a table. (While C stands for comma, the tool is not limited to comma-separated files, in fact tabs is the default delimiter.) But hey, does that not sound familiar? It sure does. MySQL Server has “for ever” included the mysqlimport utility and LOAD DATA statement. So why do we need another utility and what does ndb_import do that we cannot already do with the existing tools?

The big advantage of ndb_import is that it is a native NDB tool. It does not connect to any of the SQL nodes; instead it connects directly to the data nodes as an NDB API program. This alone allows you to bypass the overhead of connecting to the SQL node and parse SQL statements. However, it is even better than that. The ndb_import tool can also take advantage of the parallelism offered by MySQL NDB Cluster. This means that in the end, the overall performance is expected to be quite a bit better than loading the data using mysqlimport or LOAD DATA.

Note: The speed-up will depend on your setup. A cluster with all nodes in a VM on a laptop may not experience any significant speed-up compared to other methods of importing data.

So, how does this work? Let us look at a simple example. We will import the following data from the data.txt file:

The data includes the first name, surname, and birthday for four persons separated by \t, and it will be imported into the db1.t1 table:

Note: You still need to create the table through an SQL node.

With the data and table, the data can be imported using the following ndb_import command:

The arguments in the first line of the command tells how to connect to the management node, the schema name, and the name of the file to import data from. Like for mysqlimport, the table name is derived from the file name by using the basename, t1 in this case, as the table name.

The –connections and –db-workers options defines the parallelism of the job. The –connections option is equivalent to the ndb-cluster-connection-pool option that can be set for SQL nodes and defines how many connections the ndb_import process creates to the data nodes. There must be at least as many free API node slots available as the number of connections requested. Otherwise the command will hang while waiting for enough connections to become available and eventually fail. The –db-workers option specifies the number of threads ndb_import will use per data node.

The remaining options are equivalent to the options available for mysqlimport, LOAD DATA, and SELECT … INTO OUTFILE. They specify the format of the data inside the source file and how many rows to skip – one in this case due to the header.

If you need to load data into several tables in the same schema, you can specify a file name for each table, and they will all be processed.

There are many more options available. I will recommend you to check out the reference manual page for ndb_import for more information.

The output of the command is:

This includes some metadata information about the job such as the table and file names as well as the total number of rows imported, how long time it took, and the number of rows imported.

What happens, if something goes wrong? If we thought the data was comma-separated and had used –fields-terminated-by=”,”, an error would have occurred:

Notice here how the fifth line of the output says “job-1 rejected 1 rows (limit 0), see t1.rej”. Each input file is considered a job, so “job-1” refers to the first (and in this case only) input file. It rejected 1 row, and the limit refers to the –rejects option which specifies how many permanent errors must be encountered before failing the whole job. The default is to fail on the first error. Finally, “see t1.rej” tells us to look at the t1.rej file (created by ndb_import) for more information about the failed job:

The t1.rej file is a tab-delimited file with information for each failed row. In this case, the error is that there are too few values: one value was found (because there are no commas in the data and we specified the wrong delimiter) but four values were expected. If –rejects had been set to 3, all four rows would have been included in t1.rej.

The ndb_import binary is included with the NDB Cluster installation and can be found in the bin directory. The next step is for you to try it out. Good luck importing data into MySQL NDB Cluster 7.6.