MySQL NDB Cluster: Never Install a Management Node on the Same Host as a Data Node

In MySQL NDB Cluster, the management node (ndb_mgmd) is a lightweight process that, among other things, handles the configuration of the cluster. Because it is lightweight, it can be tempting to install it on the same host as one of the other nodes. However, if you want a high-availability setup, you should never install it on the same host as a data node (ndbd or ndbmtd). Doing so can cause a total cluster outage in a situation where the cluster could otherwise have survived.

The first sign of trouble occurs when you start the management nodes. The following warning is printed to standard output:

2018-08-22 18:04:14 [MgmtSrvr] WARNING  -- at line 46: Cluster configuration warning:
  arbitrator with id 49 and db node with id 1 on same host 192.0.2.1
  arbitrator with id 50 and db node with id 2 on same host 192.0.2.2
  Running arbitrator on the same host as a database node may
  cause complete cluster shutdown in case of host failure.

To understand why this setup can cause a complete cluster shutdown, it is necessary first to review how a node failure is handled in MySQL NDB Cluster.

Node Failure Handling

When a data node fails, the cluster automatically evaluates whether it is safe to continue. In this respect, a node failure can be either a data node crashing or a network outage that means one or more nodes cannot be seen by the other nodes.

A clean node shutdown (such as when using the recommended STOP command in the management client) is not subject to the evaluation as the other nodes will be notified of the shutdown.

So, how does MySQL NDB Cluster decide whether the cluster can continue after a node failure? And if there are two halves, which one gets to continue?
Assuming two replicas (NoOfReplicas = 2), the rules are quite simple. To quote Pro MySQL NDB Cluster:

Pro MySQL NDB Cluster

“In short, data nodes are allowed to continue if the following conditions are true:

  • The group of data nodes holds all the data
  • Either more than half the data nodes are in the group or the group has won the arbitration process.”

For the group of data nodes to hold all the data, there must be one data node from each node group available. A node group is a group of NoOfReplicas nodes that share data. The arbitration process refers to the process of contacting the arbitrator (typically a management node) – the first half to make contact will remain online.

This is all a bit abstract, so let’s take a look at a couple of examples.

Examples

Consider a cluster with two data nodes and two management nodes. Most of the examples have a management node installed on each of the hosts with the data nodes. As a contrast, the last example has the management nodes on separate hosts.

The starting point is thus a cluster using two hosts each with one data node and one management node as shown in this figure:

A healthy cluster with two data nodes and the management nodes installed on the same hosts as the data nodes.

The green colour indicates that a data node is online. The management node with Node Id 49 is shown in blue because it is the arbitrator, and the yellow management node is on standby.
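
In a running cluster you can see which node currently acts as the arbitrator and which node group each data node belongs to through the ndbinfo schema. A minimal sketch, assuming a MySQL NDB Cluster version that includes the ndbinfo.membership view:

mysql> SELECT node_id, group_id, arbitrator, arb_state
         FROM ndbinfo.membership;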

This is where the problem with the setup starts to show up. The arbitrator is the node that is involved when there is exactly half the data nodes available after a node failure. In that case, the data node(s) will have to contact the arbitrator to confirm whether it is OK to continue. This avoids a split-brain scenario where there are two halves with all the data; in that case it is imperative that one half is shut down or the data can start to diverge. The half that is first to contact the arbitrator survives; the other is killed (STONITH – shoot the other node in the head).

So, let’s look at a potential split-brain scenario.

Split-Brain Scenario

A split-brain scenario can occur when the network between the two halves of the cluster is lost as shown in the next figure:

NDB Cluster with network failure but the cluster survives.

In this case the network connection between the two hosts is lost. Since both data nodes have all the data, arbitration is needed to decide which one can continue. The data node with Id 1 can still contact the arbitrator as it is on the same host, but Node Id 2 cannot (it would need to use the network that is down). So, Node Id 1 gets to continue whereas Node Id 2 is shut down.

So far so good. This is what is expected: a single point of failure does not lead to a cluster outage. However, what happens if we consider a complete host failure instead of a network failure?

Host Failure

Consider now a case where there is a hardware failure on Host A or someone pulls the power by accident. This causes the whole host to shut down, taking both the data node and the management node with it. What happens in this case?

The first thought is that it will not be an issue. Node Id 2 has all the data, so surely it will continue, right? No, that is not so. The result is a total cluster outage as shown in the following figure:

The failure of the host with the arbitrator causes complete cluster outage.

Why does this happen? When Host A crashes, so does the arbitrator management node. Since Node Id 2 does not on its own constitute a majority of the data nodes, it must contact the arbitrator to be allowed to remain online.

You may think it can use the management node with Node Id 50 as the arbitrator, but that will not happen: while handling a node failure, under no circumstances can the arbitrator be changed. The nodes on Host B cannot know whether they cannot see the nodes on Host A because of a network failure (as in the previous example) or because the nodes are dead. So, they have to assume the other nodes are still alive, or there would sooner or later be a split-brain cluster with both halves online.

Important

The arbitrator will never change while the cluster handles a node failure.

So, the data node with Id 2 has no other option than to shut itself down, and there is a total cluster outage. A single point of failure has caused a total failure. That is not the idea of a high availability cluster.

What could have been done to prevent the cluster outage? Let’s reconsider the case where the arbitrator is on a third independent host.

Arbitrator on Independent Host

The picture changes completely if the management nodes are installed on Hosts C and D instead of Hosts A and B. For simplicity, the management node with Node Id 50 is left out as it is just a spectator while the node failure is handled. In this case the scenario is:

The failure of the host with the arbitrator on a third host ensures the cluster remains online.

Here Node Id 2 can still contact the arbitrator. Node Id 1 is dead, so it will not compete to win the arbitration, and the end result becomes that Node Id 2 remains online. The situation is back where a single point of failure does not bring down the whole cluster.

Conclusion

If you want your cluster to have the best chance of surviving a problem with one of the hosts, never install the management nodes on the same hosts as the data nodes. One of the management nodes will also act as the arbitrator. Since the arbitrator cannot change while the cluster is handling a node failure, a crash of the host with the arbitrator will cause a complete cluster shutdown if arbitration is required.

When you consider what counts as a host, you should look at physical hosts. Installing the management node in a different virtual machine on the same physical host offers only a little extra protection compared to installing it in the same virtual machine or on the same bare-metal host.

So, to conclude: make sure your management nodes are on a completely different physical host than your data nodes.

Want to Know More?

The book Pro MySQL NDB Cluster (published by Apress) by Mikiya Okuno and me includes lots of information about designing your cluster and how MySQL NDB Cluster works.

Disclaimer

I am one of the authors of Pro MySQL NDB Cluster.

Monitoring NDBCluster Copying Alter Progress

MySQL NDB Cluster has great support for online (in-place) schema changes, but it is still sometimes necessary to perform an offline (copying) ALTER TABLE. These are relatively expensive as the entire table is copied into a new table which eventually replaces the old table.

One example where a copying ALTER TABLE is required is when upgrading from MySQL NDB Cluster 7.2 or earlier to MySQL NDB Cluster 7.3 or later. The format used for temporal columns changed between these versions (corresponding to MySQL Server 5.5 to 5.6). In order to take advantage of the new temporal format, a table rebuild is required.
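
One way to force such a rebuild is a null ALTER TABLE that keeps the same storage engine. This is only a sketch, shown for the db1.t1 table used later in this post; ALGORITHM=COPY makes it explicit that a copying operation is requested:

ALTER TABLE db1.t1 ENGINE=NDBCluster, ALGORITHM=COPY;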

Note

Support for the old temporal format has been removed in MySQL 8.0, so you must rebuild your tables before an upgrade is possible. At the time of writing, there are no MySQL NDB Cluster releases based on MySQL Server 8.0.

Schematic representation of a copying ALTER TABLE

For long running operations, it can be useful to monitor the progress. There is no built-in way to do this like there is for InnoDB in MySQL 5.7 and later (see the blog InnoDB Progress Information); however, the ndbinfo schema can give some information about the progress.

The ndbinfo schema is a virtual schema with views that show information from the data nodes. You can argue it is MySQL NDB Cluster’s answer to the Performance Schema. The ndbinfo schema was introduced in MySQL NDB Cluster 7.1 more than eight years ago and has steadily seen more and more information becoming available.

One of these changes arrived in MySQL NDB Cluster 7.4 where the memory_per_fragment view was added. This view shows detailed information about the memory used per fragment (in most cases the same as partitions). This can also be used to get an estimate of the progress of a copying ALTER TABLE.

As mentioned, a copying ALTER TABLE is similar to creating a new table with the new schema (which may potentially be the same as the old schema), then inserting all of the data from the old table into the new one. At the end, the two tables are swapped and the old table is dropped.

Important

Remember that a copying ALTER TABLE is an offline operation. Any changes made to the table during the operation may be lost! Make sure the table is read-only while the ALTER TABLE is executing.

The temporary table (that later becomes the real table) is an NDBCluster table like other user-created tables. This means the table will show up in ndbinfo.memory_per_fragment as a normal table, just with a special table name.

Temporary tables are named like #sql-7f4b_4 where the part after the hyphen is generated from the operating system process ID of the mysqld process and the connection ID of the connection executing the ALTER TABLE. The schema name for the temporary table is the same as for the original table. In the example the process ID is 32587 or 7f4b in hexadecimal notation and the connection ID is 4.

As an example consider a rebuild of the db1.t1 table. In this case the fully qualified name (the name used by NDB Cluster instead of the normal table name) is db1/def/t1, i.e. the schema name and table name with /def/ between them. You can choose to construct the fully qualified name for the temporary table as described above. An alternative, if you have just one concurrent table rebuild in the schema, is to look for the fully qualified name matching db1/def/#sql-%.
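
If you want to construct the name rather than rely on the LIKE pattern, the hexadecimal part can be computed directly in SQL. A small sketch using the process ID and connection ID from this example (32587 and 4):

mysql> SELECT CONCAT('db1/def/#sql-', LOWER(HEX(32587)), '_', 4) AS FqName;
+---------------------+
| FqName              |
+---------------------+
| db1/def/#sql-7f4b_4 |
+---------------------+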

So, you can use the ndbinfo.memory_per_fragment table to see how much memory is allocated per fragment of the temporary table compared to the original table. For example:

mysql> SELECT fq_name, parent_fq_name, type, table_id,
              (fixed_elem_alloc_bytes-fixed_elem_free_bytes) AS FixedBytes,
              (var_elem_alloc_bytes-var_elem_free_bytes) AS VarBytes,
              hash_index_alloc_bytes
         FROM ndbinfo.memory_per_fragment
        WHERE fq_name = 'db1/def/t1' OR fq_name LIKE 'db1/def/#sql-%'
              OR parent_fq_name = 'db1/def/t1' OR parent_fq_name LIKE 'db1/def/#sql-%';
+------------------------+---------------------+-------------------+----------+------------+----------+------------------------+
| fq_name                | parent_fq_name      | type              | table_id | FixedBytes | VarBytes | hash_index_alloc_bytes |
+------------------------+---------------------+-------------------+----------+------------+----------+------------------------+
| db1/def/NDB$BLOB_45_3  | db1/def/t1          | User table        |       46 |     100580 |  1038088 |                  40960 |
| db1/def/NDB$BLOB_45_3  | db1/def/t1          | User table        |       46 |      99320 |  1056380 |                  40960 |
| db1/def/NDB$BLOB_45_3  | db1/def/t1          | User table        |       46 |     100580 |  1038088 |                  40960 |
| db1/def/NDB$BLOB_45_3  | db1/def/t1          | User table        |       46 |      99320 |  1056380 |                  40960 |
| sys/def/45/val1$unique | db1/def/t1          | Unique hash index |       49 |      77640 |        0 |                  40960 |
| sys/def/45/val1$unique | db1/def/t1          | Unique hash index |       49 |      76184 |        0 |                  40960 |
| sys/def/45/val1$unique | db1/def/t1          | Unique hash index |       49 |      77640 |        0 |                  40960 |
| sys/def/45/val1$unique | db1/def/t1          | Unique hash index |       49 |      76184 |        0 |                  40960 |
| sys/def/45/val1        | db1/def/t1          | Ordered index     |       48 |      39424 |        0 |                      0 |
| sys/def/45/val1        | db1/def/t1          | Ordered index     |       48 |      37792 |        0 |                      0 |
| sys/def/45/val1        | db1/def/t1          | Ordered index     |       48 |      39424 |        0 |                      0 |
| sys/def/45/val1        | db1/def/t1          | Ordered index     |       48 |      37792 |        0 |                      0 |
| sys/def/45/PRIMARY     | db1/def/t1          | Ordered index     |       47 |      39424 |        0 |                      0 |
| sys/def/45/PRIMARY     | db1/def/t1          | Ordered index     |       47 |      37792 |        0 |                      0 |
| sys/def/45/PRIMARY     | db1/def/t1          | Ordered index     |       47 |      39424 |        0 |                      0 |
| sys/def/45/PRIMARY     | db1/def/t1          | Ordered index     |       47 |      37792 |        0 |                      0 |
| db1/def/NDB$BLOB_14_3  | db1/def/#sql-7f4b_4 | User table        |       15 |      43180 |   446148 |                  24576 |
| db1/def/NDB$BLOB_14_3  | db1/def/#sql-7f4b_4 | User table        |       15 |      44404 |   471920 |                  24576 |
| db1/def/NDB$BLOB_14_3  | db1/def/#sql-7f4b_4 | User table        |       15 |      43360 |   450184 |                  24576 |
| db1/def/NDB$BLOB_14_3  | db1/def/#sql-7f4b_4 | User table        |       15 |      44404 |   471920 |                  24576 |
| sys/def/14/val1$unique | db1/def/#sql-7f4b_4 | Unique hash index |       44 |      33448 |        0 |                  24576 |
| sys/def/14/val1$unique | db1/def/#sql-7f4b_4 | Unique hash index |       44 |      34176 |        0 |                  24576 |
| sys/def/14/val1$unique | db1/def/#sql-7f4b_4 | Unique hash index |       44 |      33532 |        0 |                  24576 |
| sys/def/14/val1$unique | db1/def/#sql-7f4b_4 | Unique hash index |       44 |      34176 |        0 |                  24576 |
| sys/def/14/PRIMARY     | db1/def/#sql-7f4b_4 | Ordered index     |       42 |      15904 |        0 |                      0 |
| sys/def/14/PRIMARY     | db1/def/#sql-7f4b_4 | Ordered index     |       42 |      16992 |        0 |                      0 |
| sys/def/14/PRIMARY     | db1/def/#sql-7f4b_4 | Ordered index     |       42 |      15904 |        0 |                      0 |
| sys/def/14/PRIMARY     | db1/def/#sql-7f4b_4 | Ordered index     |       42 |      16992 |        0 |                      0 |
| sys/def/14/val1        | db1/def/#sql-7f4b_4 | Ordered index     |       43 |      15904 |        0 |                      0 |
| sys/def/14/val1        | db1/def/#sql-7f4b_4 | Ordered index     |       43 |      16992 |        0 |                      0 |
| sys/def/14/val1        | db1/def/#sql-7f4b_4 | Ordered index     |       43 |      15904 |        0 |                      0 |
| sys/def/14/val1        | db1/def/#sql-7f4b_4 | Ordered index     |       43 |      16992 |        0 |                      0 |
| db1/def/t1             | NULL                | User table        |       45 |     110792 |   775260 |                  40960 |
| db1/def/t1             | NULL                | User table        |       45 |     108712 |   760568 |                  40960 |
| db1/def/t1             | NULL                | User table        |       45 |     110792 |   775260 |                  40960 |
| db1/def/t1             | NULL                | User table        |       45 |     108712 |   760568 |                  40960 |
| db1/def/#sql-7f4b_4    | NULL                | User table        |       14 |      47536 |   332412 |                  24576 |
| db1/def/#sql-7f4b_4    | NULL                | User table        |       14 |      48656 |   340252 |                  24576 |
| db1/def/#sql-7f4b_4    | NULL                | User table        |       14 |      47696 |   333532 |                  24576 |
| db1/def/#sql-7f4b_4    | NULL                | User table        |       14 |      48656 |   340252 |                  24576 |
+------------------------+---------------------+-------------------+----------+------------+----------+------------------------+
40 rows in set (0.86 sec)

The columns with information about the node ID, block instance, and fragment number have been left out. This is why it looks like there are duplicate rows. It is also worth noticing that there are several “child tables” for the indexes and a blob column.

There are three memory columns. The first is for the fixed-size column format, the second for the variable-width column format, and the last for hash indexes.

MySQL NDB Cluster supports two storage formats for the columns. The fixed format uses less memory for columns that are fixed width in nature (such as integers); however, the variable format (called DYNAMIC in CREATE TABLE and ALTER TABLE statements) is more flexible. The variable/dynamic column format is also the only one supported when adding a column in-place (online). See also the manual page for CREATE TABLE for more information about the column format.
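
As an illustration, the column format can be set explicitly per column in CREATE TABLE and ALTER TABLE statements. A small sketch (the db1.t2 table is just an example):

CREATE TABLE db1.t2 (
  id int unsigned NOT NULL PRIMARY KEY,
  val1 int unsigned NOT NULL COLUMN_FORMAT FIXED,
  val2 int unsigned NOT NULL COLUMN_FORMAT DYNAMIC
) ENGINE=NDBCluster;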

The hash memory is the memory used by hash indexes (for the primary key and unique indexes).

For the fixed and variable element memory usage, there are both allocated and free bytes. Here the free bytes are used as a measure of the amount of fragmentation. A copying ALTER TABLE defragments the table, so it is necessary to take the fragmentation into consideration when estimating the progress. In reality it is more complicated than the query suggests, so the memory values in the query result will not end up matching 100%; however, in most cases it should be a reasonable estimate.
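
As a rough sketch of such an estimate, you can compare the total in-use bytes of the temporary table with those of the original table. Because of the defragmentation just described, the result will typically not reach exactly 100%:

mysql> SELECT 100 * (SELECT SUM(fixed_elem_alloc_bytes - fixed_elem_free_bytes
                               + var_elem_alloc_bytes - var_elem_free_bytes)
                       FROM ndbinfo.memory_per_fragment
                      WHERE fq_name LIKE 'db1/def/#sql-%'
                            OR parent_fq_name LIKE 'db1/def/#sql-%')
                 / (SELECT SUM(fixed_elem_alloc_bytes - fixed_elem_free_bytes
                               + var_elem_alloc_bytes - var_elem_free_bytes)
                       FROM ndbinfo.memory_per_fragment
                      WHERE fq_name = 'db1/def/t1'
                            OR parent_fq_name = 'db1/def/t1') AS PctEstimate;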

You can also choose to aggregate the memory, for example:

mysql> SELECT IF(fq_name LIKE 'db1/def/%'
                    AND fq_name NOT LIKE 'db1/def/NDB$BLOB%',
                 fq_name,
                 parent_fq_name
              ) AS FqName,
              sys.format_bytes(
                 SUM(fixed_elem_alloc_bytes-fixed_elem_free_bytes)
              ) AS FixedBytes,
              sys.format_bytes(
                 SUM(var_elem_alloc_bytes-var_elem_free_bytes)
              ) AS VarBytes,
              sys.format_bytes(
                 SUM(hash_index_alloc_bytes)
              ) AS HashBytes
         FROM ndbinfo.memory_per_fragment
        WHERE fq_name = 'db1/def/t1' OR fq_name LIKE 'db1/def/#sql-%'
              OR parent_fq_name = 'db1/def/t1' OR parent_fq_name LIKE 'db1/def/#sql-%'
        GROUP BY FqName;
+---------------------+------------+----------+------------+
| FqName              | FixedBytes | VarBytes | HashBytes  |
+---------------------+------------+----------+------------+
| db1/def/#sql-7f4b_4 | 629.20 KiB | 3.08 MiB | 288.00 KiB |
| db1/def/t1          | 1.39 MiB   | 6.92 MiB | 480.00 KiB |
+---------------------+------------+----------+------------+
2 rows in set (0.69 sec)

This aggregate query also uses the sys schema function format_bytes() to convert the number of bytes into human readable numbers. The sys schema is installed by default for MySQL NDB Cluster 7.5 and later and is available from MySQL’s repository on GitHub for MySQL NDB Cluster 7.3 and 7.4.

This way of estimating the progress of a copying ALTER TABLE is not perfect, but at least it can give an idea of how the operation progresses.

MySQL NDB Cluster 7.6: Fast Import with ndb_import

A common problem when you need to set up a new database instance is to import the initial data. This can particularly be an issue if you are converting an existing database, so there is a large amount of data to import. In MySQL NDB Cluster 7.6, which was just announced as generally available (GA) this week, there is a new tool to help with such imports: ndb_import.

The ndb_import utility takes a CSV file and imports the data into a table. (While C stands for comma, the tool is not limited to comma-separated files; in fact tab is the default delimiter.) But hey, does that not sound familiar? It sure does. MySQL Server has "forever" included the mysqlimport utility and the LOAD DATA statement. So why do we need another utility, and what does ndb_import do that we cannot already do with the existing tools?

The big advantage of ndb_import is that it is a native NDB tool. It does not connect to any of the SQL nodes; instead it connects directly to the data nodes as an NDB API program. This alone allows you to bypass the overhead of connecting to an SQL node and parsing SQL statements. However, it is even better than that. The ndb_import tool can also take advantage of the parallelism offered by MySQL NDB Cluster. This means that in the end, the overall performance is expected to be quite a bit better than loading the data using mysqlimport or LOAD DATA.

Note

The speed-up will depend on your setup. A cluster with all nodes in a VM on a laptop may not experience any significant speed-up compared to other methods of importing data.

So, how does this work? Let us look at a simple example. We will import the following data from the t1.txt file:

# ID    First   Surname Birthday
1	John	Doe	1945-10-15
2	Jane	Doe	1947-08-11
3	Mary	Smith	1974-12-04
4	Mike	Jones	1980-11-12

The data includes an ID, first name, surname, and birthday for four persons, separated by \t, and it will be imported into the db1.t1 table:

CREATE SCHEMA IF NOT EXISTS db1;

CREATE TABLE db1.t1 (
  id int unsigned NOT NULL auto_increment,
  FirstName varchar(20) NOT NULL,
  Surname varchar(20) NOT NULL,
  Birthday date NOT NULL,
  PRIMARY KEY (id)
) ENGINE=NDBCluster CHARACTER SET=utf8mb4;

Information

You still need to create the table through an SQL node.

With the data and table, the data can be imported using the following ndb_import command:

shell$ ndb_import --ndb-connectstring=localhost:1186 db1 t1.txt \
                  --connections=2 --db-workers=2 \
                  --fields-terminated-by="\t" --ignore-lines=1 \
                  --lines-terminated-by="\n"

The arguments in the first line of the command tell how to connect to the management node, the schema name, and the name of the file to import data from. Like for mysqlimport, the table name is derived from the file name by using the basename (t1 in this case) as the table name.

The --connections and --db-workers options define the parallelism of the job. The --connections option is equivalent to the ndb-cluster-connection-pool option that can be set for SQL nodes and defines how many connections the ndb_import process creates to the data nodes. There must be at least as many free API node slots available as the number of connections requested. Otherwise the command will hang while waiting for enough connections to become available and eventually fail. The --db-workers option specifies the number of threads ndb_import will use per data node.

The remaining options are equivalent to the options available for mysqlimport, LOAD DATA, and SELECT ... INTO OUTFILE. They specify the format of the data inside the source file and how many rows to skip – one in this case due to the header.
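
For comparison, a roughly equivalent import through an SQL node could be done with LOAD DATA. This is only a sketch; the file path is an example, and the file must be readable by the SQL node:

mysql> LOAD DATA INFILE '/path/to/t1.txt'
         INTO TABLE db1.t1
         FIELDS TERMINATED BY '\t'
         LINES TERMINATED BY '\n'
         IGNORE 1 LINES;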

If you need to load data into several tables in the same schema, you can specify a file name for each table, and they will all be processed.

There are many more options available. I recommend that you check out the reference manual page for ndb_import for more information.

The output of the command is:

job-1 import db1.t1 from t1.txt
job-1 [running] import db1.t1 from t1.txt
job-1 [success] import db1.t1 from t1.txt
job-1 imported 4 rows in 0h0m0s at 9 rows/s
jobs summary: defined: 1 run: 1 with success: 1 with failure: 0

This includes some metadata about the job such as the table and file names, as well as the number of rows imported, how long it took, and the import rate.
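
To verify the import, you can query the table through an SQL node. The result should contain the four rows from t1.txt:

mysql> SELECT * FROM db1.t1 ORDER BY id;
+----+-----------+---------+------------+
| id | FirstName | Surname | Birthday   |
+----+-----------+---------+------------+
|  1 | John      | Doe     | 1945-10-15 |
|  2 | Jane      | Doe     | 1947-08-11 |
|  3 | Mary      | Smith   | 1974-12-04 |
|  4 | Mike      | Jones   | 1980-11-12 |
+----+-----------+---------+------------+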

What happens if something goes wrong? If we thought the data was comma-separated and had used --fields-terminated-by=",", an error would have occurred:

shell$ ndb_import --ndb-connectstring=localhost:1186 db1 t1.txt \
                  --connections=2 --db-workers=2 \
                  --fields-terminated-by="," \
                  --ignore-lines=1 --lines-terminated-by="\n"
job-1 import db1.t1 from t1.txt
job-1 [running] import db1.t1 from t1.txt
job-1 [error] import db1.t1 from t1.txt
job-1 imported 0 rows in 0h0m0s at 0 rows/s
job-1 rejected 1 rows (limit 0), see t1.rej
job-1 error[gen-0]: error in teams: 1-csv-input (source:740)
job-1 1-csv-input error[data-0]: reject limit 0 exceeded (source:534)
jobs summary: defined: 1 run: 1 with success: 0 with failure: 1

Notice here how the fifth line of the output says “job-1 rejected 1 rows (limit 0), see t1.rej”. Each input file is considered a job, so “job-1” refers to the first (and in this case only) input file. It rejected 1 row, and the limit refers to the --rejects option which specifies how many permanent errors must be encountered before failing the whole job. The default is to fail on the first error. Finally, “see t1.rej” tells us to look at the t1.rej file (created by ndb_import) for more information about the failed job:

shell$ cat t1.rej
runno   rowid   linenr  startpos        endpos  bytes   errortype       errorcode       sourceline      errortext       reject
0       0       2       32      54      22      data    0       1130    line 2: too few fields (1 < 4)  1\tJohn\tDoe\t1945-10-15\n

The t1.rej file is a tab-delimited file with information for each failed row. In this case, the error is that there are too few values: one value was found (because there are no commas in the data and we specified the wrong delimiter) but four values were expected. If --rejects had been set to 3, all four rows would have been included in t1.rej.

The ndb_import binary is included with the NDB Cluster installation and can be found in the bin directory. The next step is for you to try it out. Good luck importing data into MySQL NDB Cluster 7.6.

MySQL NDB Cluster Backups

Today – 31 March – is world backup day, so I thought I would write a little about backups in MySQL NDB Cluster.

Just because NDB Cluster offers built-in redundancy and high availability does not mean backups are not important. They are – as always and as for everything in software. The redundancy does not protect against user errors (anyone ever executed DROP TABLE or DROP SCHEMA by accident?), nor does it protect against a natural disaster, a fire, or anything else hitting the data center. The same applies to high availability.

In short, if the data is in any way remotely important to you, ensure you have a backup. Furthermore, a backup is not worth any more than your ability to restore it. If a fire ravages your data center, it does not help to have the best backup in the world hosted in that data center.

So, before actually creating and restoring a backup, let us look at two best practices when it comes to backups.

Best Practices

The best practices mentioned here are by no means unique to MySQL NDB Cluster or even to databases. Nor are they exhaustive; they are meant as guidelines to keep in mind when designing your backups.

Use a Backup Method that Works with Your Product

It sounds pretty obvious – why would you ever use a backup solution that does not work? Obviously no one does that on purpose, but unfortunately it is all too common that no one has checked whether the backup solution is appropriate.

With respect to MySQL NDB Cluster, I can mention that rsync of the NDB file system will not work, and neither will any other method of creating a binary backup from the file system (including MySQL Enterprise Backup). Nor does it work to use mysqldump unless you keep the cluster read-only, for example by putting the cluster into “single user mode” and locking all tables.

When you test your backups, make sure that you make changes to the data while the backup is running. A backup method may work when the database is idle, but not when concurrent writes are occurring.

In a little bit, I will show the recommended way to create an online backup in NDB Cluster.

Ensure You Can Restore Your Backups

There are two parts to this: can you retrieve your backups even in the worst case scenario, and do you know how to restore your backups?

You cannot assume that a backup that is kept locally on the same host or even in the same data center will be available when you need it. Think in terms of a major disaster such as the entire data center being gone. Is it likely to happen? Fortunately not, but from time to time really bad things happen: fires, earthquakes, flooding, etc. Even if it is a once-a-century event, do you want to run the risk?

So, ensure you are copying your backups off site. How far away you need to copy them depends on several factors, but at least ensure they are not in the same suburb.

The other aspect is that too often, the first time a restore is attempted is when there is a total outage and everyone is in panic mode. That is not the optimal time to learn about the restore requirements and gotchas. Make it routine to restore backups. It serves two purposes: it validates your backups – see also the previous best practice – and it validates your steps to restore a backup.

Creating a Backup

It is very easy to create an online backup of a cluster in MySQL NDB Cluster as the functionality is built in. In the simplest of cases, it is as trivial as executing the START BACKUP command in the ndb_mgm client, for example:

shell$ ndb_mgm --ndb-connectstring=localhost:1186 \
               --execute="START BACKUP"
Connected to Management Server at: localhost:1186
Waiting for completed, this may take several minutes
Node 1: Backup 1 started from node 49
Node 1: Backup 1 started from node 49 completed
 StartGCP: 4970 StopGCP: 4973
 #Records: 4025756 #LogRecords: 1251
 Data: 120749052 bytes Log: 50072 bytes

Each backup has a backup ID. In the above example, the ID is 1 (“Backup 1 started from …”). When a backup is started without specifying a backup ID, MySQL NDB Cluster determines the previously highest used ID and adds one to that. However, while this is convenient, it does mean the backup ID does not carry any information other than the sequence in which the backups were made.

An alternative is to explicitly request a given ID. Supported IDs are 1 through 4294967294. One option is to choose the ID to be YYmmddHHMM where YY is the year, mm the month, dd the day, HH the hours in 24-hour format, and MM the minutes. Zero-pad the numbers if the value is less than 10. This makes the backup ID reflect when the backup was created.
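
If you script your backups, such an ID can be generated automatically. A small sketch using SQL (the same can be achieved with the shell date command):

mysql> SELECT DATE_FORMAT(NOW(), '%y%m%d%H%i') AS BackupId;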

To specify the backup ID explicitly, give the requested ID as the first argument after START BACKUP, for example (using the interactive mode of ndb_mgm this time):

ndb_mgm> START BACKUP 1803311603
Waiting for completed, this may take several minutes
Node 1: Backup 1803311603 started from node 49
Node 1: Backup 1803311603 started from node 49 completed
 StartGCP: 5330 StopGCP: 5333
 #Records: 4025756 #LogRecords: 1396
 Data: 120749052 bytes Log: 55880 bytes

Here the backup ID is 1803311603 meaning the backup was created on 31 March 2018 at 16:03.

There are other arguments that can be used, for example to specify whether the snapshot time (where the backup is consistent) should be at the start or the end (the default) of the backup. The HELP START BACKUP command can be used to get online help with the START BACKUP command.

Information

Remember that START BACKUP only backs up NDBCluster tables. Use mysqldump, mysqlpump, or another backup program to back up the schema and/or non-NDBCluster tables.

Restoring a Backup

It is a little more complicated to restore a backup than to create it, but once you have tried it a few times, it should not provide any major issues.

The backups are restored using the ndb_restore program. It is an NDB API program that supports both restoring the schema and data. It is recommended to perform the restore in three steps:

  1. Restore the schema.
  2. Restore the data with indexes disabled.
  3. Rebuild the indexes.

Information

In MySQL NDB Cluster 7.4 and earlier, restoring the schema with ndb_restore did not change the number of partitions to the default of the cluster you restore to. If you have not yet upgraded to MySQL NDB Cluster 7.5, it is recommended to restore the schema from a mysqldump or mysqlpump backup if the cluster does not have the same number of data nodes and LDM threads.

The restore examples assume you are restoring into an empty cluster. There is also support for partial restores and renaming tables, but that will not be discussed here. Let us take a look at the three steps.

Step 1: Restore the Schema

The schema is restored using the --restore_meta option, for example:

shell$ ndb_restore --ndb-connectstring=localhost:1186 \
                   --nodeid=1 --backupid=1803311603 \
                   --backup_path=/backups/cluster/BACKUP/BACKUP-1803311603 \
                   --restore_meta --disable-indexes
Nodeid = 1
Backup Id = 1803311603
backup path = /backups/cluster/BACKUP/BACKUP-1803311603
2018-03-31 16:28:07 [restore_metadata] Read meta data file header
Opening file '/backups/cluster/BACKUP/BACKUP-1803311603/BACKUP-1803311603.1.ctl'
File size 47368 bytes
Backup version in files: ndb-6.3.11 ndb version: mysql-5.7.21 ndb-7.5.9
2018-03-31 16:28:07 [restore_metadata] Load content
Stop GCP of Backup: 5332
2018-03-31 16:28:07 [restore_metadata] Get number of Tables
2018-03-31 16:28:07 [restore_metadata] Validate Footer
Connected to ndb!!
2018-03-31 16:28:08 [restore_metadata] Restore objects (tablespaces, ..)
2018-03-31 16:28:08 [restore_metadata] Restoring tables
Successfully restored table `world/def/country`
...
2018-03-31 16:28:11 [restore_data] Start restoring table data

NDBT_ProgramExit: 0 - OK

The arguments used here are:

  • --ndb-connectstring=localhost:1186. The host and port number to use when connecting to the management node(s). This example is from a test cluster with all nodes on the same host. In general you will not be specifying localhost here (never ever have the management and data nodes on the same host or even the same physical server – a topic for another day).
  • --nodeid=1. This specifies which node ID to restore from. It is based on the node ID from the cluster where the backup was created. Either data node can be used.
  • --backupid=1803311603. The backup ID to restore.
  • --backup_path=…. The location of the backup files.
  • --restore_meta. Restore the schema (called meta data).
  • --disable-indexes. Do not restore the indexes (we will rebuild them later).

You may wonder why we do not want to restore the indexes. I will get back to that after the restore has been completed.

You should only execute this command once and only for one node id. Before proceeding to the next step, ensure the step completed without errors. The next step is to restore the data.

Step 2: Restore the Data

The command to restore the data is very similar to the one used to restore the schema. The main difference is that --restore_meta is replaced by --restore_data and that ndb_restore should be used once for each data node that was in the cluster where the backup was created.

For example in case of two data nodes:

shell$ ndb_restore --ndb-connectstring=localhost:1186 \
                   --nodeid=1 --backupid=1803311603 \
                   --backup_path=/dev/shm/backup/BACKUP/BACKUP-1803311603 \
                   --restore_data --disable-indexes


shell$ ndb_restore --ndb-connectstring=localhost:1186 \
                   --nodeid=2 --backupid=1803311603 \
                   --backup_path=/dev/shm/backup/BACKUP/BACKUP-1803311603 \
                   --restore_data --disable-indexes

These steps can be run in parallel as long as it does not cause an overload of the data nodes. A rule of thumb is that you can execute one ndb_restore --restore_data per host that has data nodes. That is, if you have one data node per host, you can restore all parts in parallel. If you have two data nodes per host, it may be necessary to divide the restore into two parts.

The final step is to rebuild the indexes.

Step 3: Rebuild the Indexes

As we disabled the indexes while restoring the schema and data, it is necessary to recreate them. This is done with a command similar to the one used to restore the data, but like the schema restore it should only be executed for one node ID, for example:

shell$ ndb_restore --ndb-connectstring=localhost:1186 \
                   --nodeid=1 --backupid=1803311603 \
                   --backup_path=/dev/shm/backup/BACKUP/BACKUP-1803311603 \
                   --rebuild-indexes

That’s it. You can use the data again. But why was it that the indexes were disabled? Let me return to that.

Why Disable Indexes During the Restore?

There are two reasons to disable the indexes while restoring the schema and data:

  • Performance
  • Constraints (unique indexes and foreign keys)

As such, it is only necessary to disable the indexes while restoring the data, but there is no reason to create the indexes during the schema restore just to remove them again in the next step.

By disabling the indexes, there is no need to maintain the indexes during the restore. This allows us to restore the data faster, but then we need to rebuild the indexes at the end. This is still faster though, and if BuildIndexThreads and the number of fragments per data node are greater than 1, the rebuild will happen in parallel like during a restart.

The second thing is that if you have unique keys or foreign keys, it is in general not possible to restore the backup with indexes enabled. The reason is that the backup happens in parallel across the data nodes with the changes happening during the backup recorded separately. When you restore the data, it is not possible to guarantee that data and log are restored in the same order as the changes occurred during the backup. So, to avoid unique key and foreign key errors, it is necessary to disable the indexes until after the data has been restored.

Do not worry – this does not mean that the restored data will be inconsistent. At the end of the backup the constraints are fulfilled again, and rebuilding the indexes verifies this.

Want to Know More?

This blog really only scratches the surface of backups. If you want to read more, the MySQL Reference Manual and the book Pro MySQL NDB Cluster (which dedicates a whole chapter to backups and restores) are good places to start.

What is MySQL NDB Cluster?

I have had the opportunity to write a blog for Apress with a brief introduction to MySQL NDB Cluster. The blog gives an overview of the history and why you should consider it. The architecture is described before some key characteristics are discussed.

If you are interested, the blog can be found at https://www.apress.com/us/blog/all-blog-posts/what-is-mysql-ndb-cluster/15454530.

Happy reading.

Overview of the MySQL NDB Cluster architecture.

New Book: Pro MySQL NDB Cluster

It is with great pleasure that I can announce that a new book dedicated to MySQL NDB Cluster has just been released. The book Pro MySQL NDB Cluster is written by my colleague Mikiya Okuno and myself and is a nearly 700-page deep dive into the world of MySQL NDB Cluster. The book is published by Apress.

Tip: There are several ways to cluster MySQL. This book is about the product MySQL Cluster (often called MySQL NDB Cluster to clarify which cluster it is). There is also MySQL InnoDB Cluster, clustering using replication, and clustering through operating system or hardware features. Pro MySQL NDB Cluster is only about the former.

We very much hope you will enjoy the book. Feedback and questions are most welcome, for example on Twitter (@nippondanji and @JWKrogh).

Note: At the time of writing, only the eBook is available for purchase. A softcover version will follow as soon as it has been possible to print it; this can also be pre-ordered now. – Update: The softcover version of the book is now also available.

The book is divided into five parts and 20 chapters.

Part I – The Basics

The first part provides some background information on the various parts in MySQL NDB Cluster and how it works. The chapters are:

  • Chapter 1: Architecture and Core Concepts
  • Chapter 2: The Data Nodes

Part II – Installation and Configuration

The second part focuses on installation and configuration related topics, including replication between clusters. The chapters are:

  • Chapter 3: System Planning
  • Chapter 4: Configuration
  • Chapter 5: Installation
  • Chapter 6: Replication

Part III – Daily Tasks and Maintenance

In the third part, the topics include tasks that are part of the daily routine of a database administrator, plus a tutorial where the tasks discussed in Parts II and III are handled through MySQL Cluster Manager. The chapters are:

  • Chapter 7: The NDB Management Client and Other NDB Utilities
  • Chapter 8: Backups and Restores
  • Chapter 9: Table Maintenance
  • Chapter 10: Restarts
  • Chapter 11: Upgrades and Downgrades
  • Chapter 12: Security Considerations
  • Chapter 13: MySQL Cluster Manager

Part IV – Monitoring and Troubleshooting

The fourth part continues with two topics that are also part of the daily routine: monitoring and troubleshooting. The chapters are:

  • Chapter 14: Monitoring Solutions and the Operating System
  • Chapter 15: Sources for Monitoring Data
  • Chapter 16: Monitoring MySQL NDB Cluster
  • Chapter 17: Typical Troubles and Solutions

Part V – Development and Performance Tuning

The final part covers topics related to development and to tuning the cluster with respect to performance. The chapters are:

  • Chapter 18: Developing Applications Using SQL with MySQL NDB Cluster
  • Chapter 19: MySQL NDB Cluster as a NoSQL Database
  • Chapter 20: MySQL NDB Cluster and Application Performance Tuning

Working Around MySQL Cluster Push Down Limitations Using Subqueries

This post was originally published on the MySQL Support Team Blog at https://blogs.oracle.com/mysqlsupport/entry/working_around_mysql_cluster_push on 5 August 2016.

I worked on an issue recently where a query was too slow when executed in MySQL Cluster. The issue was that Cluster has some restrictions when it comes to pushing down conditions.

As an example of this, consider the following query using the employees sample database. The query takes a look at the average salary based on how many years the employee has been with the company. As the latest hire date in the database is in January 2000, the query uses 1 February 2000 as the reference date.

Initially the query performs as follows (the timings are with two data nodes and all nodes in the same virtual machine on a laptop, so they are not necessarily representative of a production system, though the improvements should be repeatable):

mysql> SELECT FLOOR(DATEDIFF('2000-02-01', hire_date)/365) AS LengthOfService,
              COUNT(DISTINCT employees.emp_no) AS NoEmployees, AVG(salary) AS AvgSalary
         FROM salaries
              STRAIGHT_JOIN employees ON employees.emp_no = salaries.emp_no
        WHERE salaries.from_date <= '2000-02-01' AND salaries.to_date >= '2000-02-01'
        GROUP BY LengthOfService;
+-----------------+-------------+------------+
| LengthOfService | NoEmployees | AvgSalary  |
+-----------------+-------------+------------+
|               0 |        1230 | 53051.4211 |
|               1 |        3667 | 54488.7478 |
|               2 |        5893 | 55670.2121 |
|               3 |        8305 | 57162.7269 |
|               4 |       10647 | 58557.9498 |
|               5 |       12804 | 60132.0652 |
|               6 |       15267 | 61769.0817 |
|               7 |       17540 | 62992.4075 |
|               8 |       19290 | 64468.6666 |
|               9 |       21737 | 66047.8462 |
|              10 |       24056 | 67671.1557 |
|              11 |       26488 | 68748.2062 |
|              12 |       28294 | 70409.9716 |
|              13 |       30249 | 72006.3509 |
|              14 |       32572 | 73478.7101 |
|              15 |         297 | 74331.7798 |
+-----------------+-------------+------------+
16 rows in set (23.89 sec)

The straight join is needed as the performance is better than leaving the join order up to the optimizer.

The schema for the two tables in use is:

mysql> SHOW CREATE TABLE employees\G
*************************** 1. row ***************************
       Table: employees
Create Table: CREATE TABLE `employees` (
  `emp_no` int(11) NOT NULL,
  `birth_date` date NOT NULL,
  `first_name` varchar(14) NOT NULL,
  `last_name` varchar(16) NOT NULL,
  `gender` enum('M','F') NOT NULL,
  `hire_date` date NOT NULL,
  PRIMARY KEY (`emp_no`)
) ENGINE=ndbcluster DEFAULT CHARSET=latin1
1 row in set (0.01 sec)

mysql> SHOW CREATE TABLE salaries\G
*************************** 1. row ***************************
       Table: salaries
Create Table: CREATE TABLE `salaries` (
  `emp_no` int(11) NOT NULL,
  `salary` int(11) NOT NULL,
  `from_date` date NOT NULL,
  `to_date` date NOT NULL,
  PRIMARY KEY (`emp_no`,`from_date`),
  KEY `emp_no` (`emp_no`),
  CONSTRAINT `FK_130_154` FOREIGN KEY (`emp_no`) REFERENCES `employees` (`emp_no`) ON DELETE CASCADE ON UPDATE NO ACTION
) ENGINE=ndbcluster DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

Why this poor performance? Looking at the EXPLAIN plan is the first step:

mysql> EXPLAIN EXTENDED SELECT FLOOR(DATEDIFF('2000-02-01', hire_date)/365) AS LengthOfService, COUNT(DISTINCT employees.emp_no) AS NoEmployees, AVG(salary) AS AvgSalary FROM salaries STRAIGHT_JOIN employees ON employees.emp_no = salaries.emp_no WHERE salaries.from_date <= '2000-02-01' AND salaries.to_date >= '2000-02-01' GROUP BY LengthOfService\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: salaries
         type: ALL
possible_keys: PRIMARY,emp_no
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 2844047
     filtered: 100.00
        Extra: Using where with pushed condition ((`employees`.`salaries`.`from_date` <= '2000-02-01') and (`employees`.`salaries`.`to_date` >= '2000-02-01')); Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: employees
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: employees.salaries.emp_no
         rows: 1
     filtered: 100.00
        Extra: NULL
2 rows in set, 3 warnings (0.00 sec)

The EXPLAIN plan itself does not look bad – the index usage is as expected. However, note the 3 warnings – one is the usual rewritten query after the application of rewriting and optimizer rules, but the two others give more information about why the performance is not what would be expected:

mysql> SHOW WARNINGS\G
*************************** 1. row ***************************
  Level: Note
   Code: 1003
Message: Table 'salaries' is not pushable: GROUP BY cannot be done using index on grouped columns.
*************************** 2. row ***************************
  Level: Note
   Code: 1003
Message: Table 'employees' is not pushable: GROUP BY cannot be done using index on grouped columns.
*************************** 3. row ***************************
...

Here it can be seen that the tables are not pushable, because they are involved in the GROUP BY.

This gave me an idea. A couple of years ago, I wrote a post on the MySQL Server Blog: Better Performance for JOINs Not Using Indexes. Such materialized temporary tables can also include auto keys that can improve the join performance significantly. What if a similar tactic is used with the above query?

Changing the join on the employees table to join a subquery:

mysql> SELECT FLOOR(DATEDIFF('2000-02-01', hire_date)/365) AS LengthOfService,
              COUNT(DISTINCT employees.emp_no) AS NoEmployees, AVG(salary) AS AvgSalary
         FROM salaries
              STRAIGHT_JOIN (SELECT * FROM employees) AS employees ON employees.emp_no = salaries.emp_no
        WHERE salaries.from_date <= '2000-02-01' AND salaries.to_date >= '2000-02-01'
        GROUP BY LengthOfService;
...
16 rows in set (6.64 sec)

That is more than a factor 3 improvement.

The new EXPLAIN plan is:

*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: salaries
         type: ALL
possible_keys: PRIMARY,emp_no
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 2844047
        Extra: Using where with pushed condition; Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: <derived2>
         type: ref
possible_keys: <auto_key0>
          key: <auto_key0>
      key_len: 4
          ref: employees.salaries.emp_no
         rows: 10
        Extra: NULL
*************************** 3. row ***************************
           id: 2
  select_type: DERIVED
        table: employees
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 300024
        Extra: NULL
3 rows in set (0.01 sec)

Note how <auto_key0> is used for the join on the derived table. This is what helps make the rewrite work.

Actually we can do a bit better. The above subquery selects all columns from the employees table even though only the emp_no and hire_date columns are used. So the next rewrite is:

mysql> SELECT FLOOR(DATEDIFF('2000-02-01', hire_date)/365) AS LengthOfService,
              COUNT(DISTINCT employees.emp_no) AS NoEmployees, AVG(salary) AS AvgSalary
         FROM salaries
              STRAIGHT_JOIN (SELECT emp_no, hire_date FROM employees) AS employees ON employees.emp_no = salaries.emp_no
        WHERE salaries.from_date <= '2000-02-01' AND salaries.to_date >= '2000-02-01'
        GROUP BY LengthOfService;
...
16 rows in set (5.74 sec)

The improvement from only including the columns needed will vary. For example, if the extra data causes the internal temporary table to be converted into an on-disk table, or if a covering index can no longer be used, choosing only the needed columns becomes more important.

A final slight improvement can be gained by also converting the salaries table into a subquery:

mysql> SELECT FLOOR(DATEDIFF('2000-02-01', hire_date)/365) AS LengthOfService,
              COUNT(DISTINCT employees.emp_no) AS NoEmployees, AVG(salary) AS AvgSalary
         FROM (SELECT emp_no, salary
                 FROM salaries
                WHERE salaries.from_date <= '2000-02-01' AND salaries.to_date >= '2000-02-01'
              ) AS salaries
              STRAIGHT_JOIN (SELECT emp_no, hire_date FROM employees) AS employees ON employees.emp_no = salaries.emp_no
        GROUP BY LengthOfService;
...
16 rows in set (5.36 sec)

The total speed up is from a little under 24 seconds to a little over 5 seconds, or more than a factor of 4.

Note: the above rewrite will not work out of the box in MySQL Cluster 7.5. The reason is that it is based on MySQL Server 5.7 where the optimizer can perform more advanced rewrites than was possible in MySQL 5.6. So by default the above subqueries will be rewritten to normal joins and you end up with the original query again. To use the subquery tactic in Cluster 7.5, it is necessary first to disable the optimizer switch that allows the optimizer to rewrite the subqueries to normal joins:

mysql> SET optimizer_switch='derived_merge=off';
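
The SET statement above changes the setting for the current session only. After running the rewritten query you can restore the default, for example:

mysql> SET SESSION optimizer_switch='derived_merge=on';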

New Member of the Cluster API Family: The Native Node.js Connector

MySQL Cluster 7.3 went GA yesterday and with it came a new member of the MySQL Cluster API family: mysql-js – a native Node.js connector. mysql-js uses the NDB API to connect directly to the data nodes which improves performance compared to executing queries through the MySQL nodes.

For an introduction to mysql-js and installation instructions, I recommend taking a look at the official API documentation and Andrew Morgan’s blog; the latter also has an overview of the new features in MySQL Cluster 7.3 in general.

To get a feel for how the new API works, I went ahead and created a small test program that takes one or more files with delimited data (similar to what you get with SELECT ... INTO OUTFILE) and inserts the data into a table. I have tried to keep things simple. This means that no external modules other than mysql-js are used, not very much error handling has been included, the reading and parsing of the data files could be done much better, performance has not been considered, etc. – but I would rather focus on the usage of mysql-js.

The complete example can be found in the file nodejs_tabinsert.js. The following will go through the important bits.

Preparation

The first part of the script is not really specific to mysql-js, so I will go lightly over that. A few of the arguments deserve a couple of extra words:

  • --log-level: when set to debug or detail, some output with information about what happens inside mysql-js is logged. This can be useful to learn more about the module or for debugging.
  • --basedir: this is the same as the basedir option for mysqld – it sets where MySQL has been installed. It is used for loading the mysql-js module. Default is /usr/local/mysql.
  • --database and --table: which table to insert the data into. The default database is test, but the table name must always be specified.
  • --connect-string: the script connects directly to the cluster nodes, so it needs the NDB connect-string similar to other NDB programs. The default is localhost:1186.
  • --delimiter: the delimiter used in the data files. The default is a tab (\t).

Setting Up mysql-js

With all the arguments parsed, it is now possible to load the mysql-js module:

// Create the mysql-js instance - look for it in $basedir/share/nodejs
var nosqlPath = path.join(basedir, 'share', 'nodejs');
var nosql = require(nosqlPath);

// unified_debug becomes available once nosql has been loaded. So set up
// the log level now.
if (logLevel != 'default') {
   unified_debug.on();
   switch (logLevel) {
      case 'debug':
         unified_debug.level_debug();
         break;

      case 'detail':
         unified_debug.level_detail();
         break;
   }
}

// Configure the connections - use all defaults except for the connect string and the database
var dbProperties = nosql.ConnectionProperties('ndb');
dbProperties.ndb_connectstring = ndbConnectstring;
dbProperties.database          = databaseName;

The unified_debug class is part of mysql-js and allows you to get debug information from inside mysql-js logged to the console.

The nosql.ConnectionProperties() method will return an object with the default settings for the chosen adapter – in this case ndb. After that we can change the settings where we do not want the defaults. It is also possible to use an object with the settings as the argument instead of ‘ndb’; that requires setting the name of the adapter using the “implementation” property. Currently the two supported adapters are ‘ndb’ (as in this example) and ‘mysql’ which connects to mysqld instead. The ‘mysql’ adapter requires node-mysql version 2.0 and also supports InnoDB.

As the ‘ndb’ adapter connects directly to the cluster nodes, no authentication is used. This is the same as for the NDB API.

Callbacks and Table Mapping Constructor

var trxCommit = function(err, session) {
   if (err) {
      failOnError(err, 'Failed to commit after inserting ' + session.insertedRows + ' rows from ' + session.file + '.');
   }
   session.close(function(err) {
      if (err) {
         failOnError(err, 'Failed to close session for ' + session.file + '.');
      }
   });
}

We will load each file inside a transaction. The trxCommit() callback will verify that the transaction was committed without error and then closes the session.

var onInsert = function(err, session) {
   session.insertedRows++;
   if (err && err.ndb_error !== null) {
      failOnError(err, 'Error onInsert after ' + session.insertedRows + ' rows.');
   }

   // Check whether this is the last row.
   if (session.insertedRows === session.totalRows) {
      session.currentTransaction().commit(trxCommit, session);
   }
};

The onInsert callback checks whether each insert worked correctly. When all rows for the session (file) have been inserted, it commits the transaction.

var tableRow = function(tableMeta, line) {
   // Skip empty lines and comments
   if (line.length > 0 && line.substr(0, 1) != '#') {
      var dataArray = line.split(delimiter);
      for (var j = 0; j < tableMeta.columns.length; j++) {
         this[tableMeta.columns[j].name] = dataArray[tableMeta.columns[j].columnNumber];
      }
   }
}

The tableRow is the constructor later used for the table mapping. It is used to set up the object with the data to be inserted for that row. tableMeta is a TableMetaData object with information about the table we are inserting into.

The Session

This is where the bulk of the work is done. Each file will have its own session.

var onSession = function(err, session, file) {
   if (err) {
      failOnError(err, 'Error onSession.');
   }

   // Get the metadata for the table we are going to insert into.
   // This is needed to map the lines read from the data files into row objects
   // (the mapping happens in tableRow() ).
   session.getTableMetadata(databaseName, tableName, function(err, tableMeta) {
      if (err) {
         failOnError(err, 'Error getTableMetadata.');
      }

      var trx = session.currentTransaction();
      trx.begin(function(err) {
         if (err) {
            failOnError(err, 'Failed to start transaction for "' + file + '".');
         }
      });

      session.insertedRows = 0;
      session.file         = file;
      console.log('Reading: ' + file);
      fs.readFile(file, { encoding: 'utf8', flag: 'r' }, function(err, data) {
         if (err) {
            failOnError(err, 'Error reading file "' + file + '"');
         }

         // First find the rows to inserted
         console.log('Analysing: ' + file);
         var rows  = [];
         session.totalRows = 0;
         data.split('\n').forEach(function(line) {
            var row = new tableRow(tableMeta, line);
            if (Object.keys(row).length > 0) {
               rows[session.totalRows++] = row;
            }
         });

         // Insert the rows
         console.log('Inserting: ' + file);
         rows.forEach(function(row) {
            session.persist(row, onInsert, session);
         });
      });
   });
};

The onSession function is a callback that is used when creating (opening) the sessions.

The first step is to get the meta data for the table. As all data is inserted into the same table, in principle we could reuse the same meta data object for all sessions, but the getTableMetadata() method is a method of the session, so it cannot be fetched until this point.

Next a transaction is started. We get the transaction with the session.currentTransaction() method. This returns an idle transaction which can then be started using the begin() method. As such there is no need to store the transaction in a variable; as can be seen in the trxCommit() and onInsert() callbacks above, it is also possible to call session.currentTransaction() repeatedly – it will keep returning the same transaction object.

The rest of the onSession function processes the actual data. The insert itself is performed with the session.persist() method.

Edit: using a session this way to insert the rows one by one is obviously not very efficient as it requires a round trip to the data nodes for each row. For bulk inserts the Batch class is a better choice, however I chose Session to demonstrate using multiple updates inside a transaction.

Creating the Sessions

var annotations = new nosql.TableMapping(tableName).applyToClass(tableRow);
files.forEach(function(file) {
   nosql.openSession(dbProperties, tableRow, onSession, file);
});

First the table mapping is defined. Then a session is opened for each file. Opening a session means connecting to the cluster, so it can be a relatively expensive step.

Running the Script

To test the script, the table t1 in the test database will be used:

CREATE TABLE `t1` (
  `id` int(10) unsigned NOT NULL PRIMARY KEY,
  `val` varchar(10) NOT NULL
) ENGINE=ndbcluster DEFAULT CHARSET=utf8;

For the data files, I have been using:

t1a.txt:

# id    val
1       a
2       b
3       c
4       d
5       e

t1b.txt:

# id    val
6       f
7       g
8       h
9       i
10      j

Running the script:

shell$ export LD_LIBRARY_PATH=/usr/local/mysql/lib
shell$ node nodejs_tabinsert.js --table=t1 t1a.txt t1b.txt
Connected to cluster as node id: 53
Reading: t1b.txt
Reading: t1a.txt
Analysing: t1b.txt
Inserting: t1b.txt
Analysing: t1a.txt
Inserting: t1a.txt

One important observation is that even though the session for t1a.txt is created before the one for t1b.txt, the t1b.txt file ends up being inserted first. In fact, if the inserts used auto-increments, you would see that the assignment of auto-increment values in general alternates between rows from t1b.txt and t1a.txt. The lesson: in Node.js, do not count on knowing the exact order of operations.
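
Regardless of the order of the inserts, all ten rows from the two files end up in the table. You can verify this through an SQL node, for example:

mysql> SELECT * FROM test.t1 ORDER BY id;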

I hope this example will spark your interest in mysql-js. Feedback is most welcome – both bug reports and feature requests can be reported at bugs.mysql.com.