Testing MySQL NDB Cluster with dbdeployer

A great way to install MySQL when you need to do quick tests is to use a sandbox tool. This allows you to perform all the installation steps with a single command, making the whole process very simple, and it allows the tests to be automated. Giuseppe Maxia (also known as the Data Charmer, @datacharmer on Twitter) has for many years maintained sandbox tools for MySQL, first with MySQL Sandbox and now with dbdeployer.

One of the most recent features of dbdeployer is the support for MySQL NDB Cluster. In this blog, I will take this feature and test it. First, I will briefly discuss what MySQL NDB Cluster is, then install dbdeployer, and finally set up a test cluster.

Deploying a MySQL NDB Cluster cluster with dbdeployer.

What is MySQL NDB Cluster?

MySQL NDB Cluster is primarily an in-memory database (though it also supports on-disk data) that has been designed from day one to be highly available and to provide consistent response times. A cluster consists of several nodes, each of which is one of three types:

  • Data Nodes: This is where the actual data is stored. Currently there is support for up to 48 data nodes in a cluster with up to 1TiB of data memory for each node.
  • API Nodes: These are the nodes where queries are executed. An API node can be a normal mysqld process (also known as an SQL node), or it can be a NoSQL node using the C++ NDB API (the native API), Java (ClusterJ), memcached, or Node.js.
  • Management Nodes: These nodes hold the configuration of the cluster, and one of the management nodes is the most common choice as the arbitrator in case it is necessary to decide between two halves of the data nodes to avoid a split-brain scenario.

Tip

If you are interested in learning more about the arbitration process, I have earlier written a blog about the importance of installing the management nodes on different hosts than the data nodes. That blog includes several examples of handling node failures and arbitration.

You will typically have at least two data nodes in a cluster with two copies (replicas) of the data. This allows one data node to be offline while the cluster stays online. More data nodes can be added to increase the capacity or to add more data partitions. The data partitioning (sharding) and the replicas are all handled automatically, including when querying the data.

Overview of the MySQL NDB Cluster architecture.
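
To give a concrete idea of how the node types and the number of replicas fit together, here is a minimal, hand-written sketch of a cluster configuration file (config.ini). This is only an illustration and not the configuration dbdeployer generates; the hostnames are placeholders:

[ndbd default]
NoOfReplicas=2         # two copies of the data

[ndb_mgmd]             # management node
HostName=mgmt_host

[ndbd]                 # first data node
HostName=data_host1

[ndbd]                 # second data node
HostName=data_host2

[mysqld]               # SQL node
HostName=sql_host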

All of this means that you will end up with quite a few nodes. In a production cluster, you need at least two of each node type to have high availability. Even though you may not need high availability for your testing, you will still need at least two data nodes, one management node, and one SQL node. Being able to automate the installation of the cluster is a great help when you need to do a quick test – which brings us to dbdeployer. The first step is to install it.

Want to Know More about MySQL NDB Cluster?

I am one of the authors of Pro MySQL NDB Cluster (Apress), an almost 700-page book dedicated to MySQL NDB Cluster. You can buy it from Apress (print or DRM-free ePub and PDF), Amazon (print and Kindle/Mobi), Barnes & Noble (print), and other book shops.

I have also written a brief introduction to MySQL NDB Cluster – but with a little more information than above – on Apress' blog.

Installing dbdeployer

It is simple to install dbdeployer. On dbdeployer's GitHub page, there are releases that can be downloaded and easily installed. For this blog, I am using release 1.24.0 on Linux, but I recommend that you use the latest release. In addition to Linux, dbdeployer is also available for macOS. Unfortunately, there is no Microsoft Windows support.

An example of downloading and installing dbdeployer is:

shell$ mkdir Downloads

shell$ cd Downloads/

shell$ wget https://github.com/datacharmer/dbdeployer/releases/download/v1.24.0/dbdeployer-1.24.0.linux.tar.gz
...
HTTP request sent, awaiting response... 200 OK
Length: 4888282 (4.7M) [application/octet-stream]
Saving to: ‘dbdeployer-1.24.0.linux.tar.gz’

100%[================================>] 4,888,282   1.70MB/s   in 2.8s   

2019-03-25 17:48:54 (1.70 MB/s) - ‘dbdeployer-1.24.0.linux.tar.gz’ saved [4888282/4888282]

shell$ tar -zxf dbdeployer-1.24.0.linux.tar.gz 

shell$ mkdir ~/bin

shell$ mv dbdeployer-1.24.0.linux ~/bin/dbdeployer

shell$ export PATH=${PATH}:~/bin

This downloads and unpacks the 1.24.0 release into the Downloads directory. Then the dbdeployer binary is moved to the ~/bin directory and renamed to dbdeployer. Finally, the ~/bin directory is added to the path searched when executing commands, so it is not necessary to specify the path each time dbdeployer is executed. There are other ways to perform these steps and other options for where to install it; see also the official documentation.
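
A quick way to confirm that the shell now picks up the binary is to ask dbdeployer for its version (assuming the --version flag, which recent releases support); it should report 1.24.0:

shell$ dbdeployer --version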

That is it. Now it is possible to install a test cluster.

Installing a Test Cluster

Since dbdeployer works on a single host, all of the nodes will be installed on the same host. While this is bad for a production cluster, it is perfectly fine for most test clusters.

Warning

While a single-host cluster is great for most tests, when testing your application before deploying to production, it is recommended to use a multi-host cluster that is as similar to your production cluster as possible.

The first step is to download MySQL NDB Cluster as a tar-ball. You can get the latest patch release of each version from https://dev.mysql.com/downloads/cluster/. If you need to test with an older release, you can get that from https://downloads.mysql.com/archives/cluster/. In this example, MySQL NDB Cluster 7.6.9 is downloaded from the latest releases and placed in the ~/Downloads directory:

shell$ cd ~/Downloads/

shell$ wget https://dev.mysql.com/get/Downloads/MySQL-Cluster-7.6/mysql-cluster-gpl-7.6.9-linux-glibc2.12-x86_64.tar.gz
...
HTTP request sent, awaiting response... 200 OK
Length: 914236523 (872M) [application/x-tar-gz]
Saving to: ‘mysql-cluster-gpl-7.6.9-linux-glibc2.12-x86_64.tar.gz’

100%[================================>] 914,236,523  699KB/s   in 23m 52s

2019-03-25 18:49:29 (624 KB/s) - ‘mysql-cluster-gpl-7.6.9-linux-glibc2.12-x86_64.tar.gz’ saved [914236523/914236523]

Once the download has completed, use the unpack command of dbdeployer to unpack the downloaded file:

shell$ dbdeployer unpack --prefix=ndb ~/Downloads/mysql-cluster-gpl-7.6.9-linux-glibc2.12-x86_64.tar.gz 
Unpacking tarball /home/dbdeployer/Downloads/mysql-cluster-gpl-7.6.9-linux-glibc2.12-x86_64.tar.gz to $HOME/opt/mysql/ndb7.6.9
.........100.........200.........300.........400.........500........
...
.........20300.........20400.........20500.........20600.........2070020704
Renaming directory /home/dbdeployer/opt/mysql/mysql-cluster-gpl-7.6.9-linux-glibc2.12-x86_64 to /home/dbdeployer/opt/mysql/ndb7.6.9
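
If you want to double-check that the binaries ended up where dbdeployer expects them, the versions command (assuming it is available in your dbdeployer release) lists the unpacked versions, and ndb7.6.9 should show up in that list:

shell$ dbdeployer versions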

You are now ready for the actual creation of the test cluster. This is done using the deploy command:

shell$ dbdeployer deploy replication ndb7.6.9 --topology=ndb --concurrent
$HOME/sandboxes/ndb_msb_ndb7_6_9/initialize_nodes
MySQL Cluster Management Server mysql-5.7.25 ndb-7.6.9
2019-03-27 17:22:16 [ndbd] INFO     -- Angel connected to 'localhost:20900'
2019-03-27 17:22:16 [ndbd] INFO     -- Angel allocated nodeid: 2
2019-03-27 17:22:17 [ndbd] INFO     -- Angel connected to 'localhost:20900'
2019-03-27 17:22:17 [ndbd] INFO     -- Angel allocated nodeid: 3
executing 'start' on node 1
............ sandbox server started
executing 'start' on node 2
.... sandbox server started
executing 'start' on node 3
.... sandbox server started
NDB cluster directory installed in $HOME/sandboxes/ndb_msb_ndb7_6_9
run 'dbdeployer usage multiple' for basic instructions'

This creates a cluster with two data nodes, one management node, and three SQL nodes. The nodes have been installed in the ${HOME}/sandboxes/ndb_msb_ndb7_6_9/ directory:

shell$ ls sandboxes/ndb_msb_ndb7_6_9/
check_nodes          ndb_conf  node3               test_replication
clear_all            ndb_mgm   restart_all         test_sb_all
cluster_initialized  ndbnode1  sbdescription.json  use_all
initialize_nodes     ndbnode2  send_kill_all       use_all_masters
n1                   ndbnode3  start_all           use_all_slaves
n2                   node1     status_all
n3                   node2     stop_all

Notice how there is, for example, an ndb_mgm script. This is a wrapper script around the ndb_mgm binary in the MySQL installation – the MySQL NDB Cluster management client. This makes it easy to connect to the management node, for example to check the status of the cluster:

shell$ ./sandboxes/ndb_msb_ndb7_6_9/ndb_mgm -e "SHOW"
Connected to Management Server at: localhost:20900
Cluster Configuration
---------------------
[ndbd(NDB)]     2 node(s)
id=2    @127.0.0.1  (mysql-5.7.25 ndb-7.6.9, Nodegroup: 0, *)
id=3    @127.0.0.1  (mysql-5.7.25 ndb-7.6.9, Nodegroup: 0)

[ndb_mgmd(MGM)] 1 node(s)
id=1    @127.0.0.1  (mysql-5.7.25 ndb-7.6.9)

[mysqld(API)]   4 node(s)
id=4    @127.0.0.1  (mysql-5.7.25 ndb-7.6.9)
id=5    @127.0.0.1  (mysql-5.7.25 ndb-7.6.9)
id=6    @127.0.0.1  (mysql-5.7.25 ndb-7.6.9)
id=7 (not connected, accepting connect from localhost)
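
dbdeployer also installs its own status_all script in the sandbox directory (it was visible in the directory listing above). I have not explored its output in detail for the NDB topology, but it gives a quick overview of the sandbox nodes:

shell$ ./sandboxes/ndb_msb_ndb7_6_9/status_all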

Before wrapping up, let's see how you can connect to the different SQL nodes and verify that they indeed query the same data.

Testing the Cluster

As a simple test, connect to the first SQL node and create a table. Then, connect to the second SQL node and insert a row. Finally, connect to the third SQL node and query the data.

The SQL nodes are in the node* directories in ${HOME}/sandboxes/ndb_msb_ndb7_6_9/. Each of those works in the same way as a standalone MySQL Server sandbox, so you can use the use wrapper script to connect using the mysql command-line client:

shell$ ./sandboxes/ndb_msb_ndb7_6_9/node1/use 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 7
Server version: 5.7.25-ndb-7.6.9-cluster-gpl-log MySQL Cluster Community Server (GPL)

Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

node1 [localhost:27510] {msandbox} ((none)) > 

Now, the table can be created (output has been reformatted):

node1 [localhost:27510] {msandbox} ((none)) > \R
Returning to default PROMPT of mysql> 

mysql> CREATE SCHEMA db1;
Query OK, 1 row affected (0.42 sec)

mysql> CREATE TABLE db1.t1 (
          id int unsigned NOT NULL auto_increment PRIMARY KEY,
          val varchar(36)
       ) ENGINE=NDBCluster;
Query OK, 0 rows affected (2.64 sec)

I changed the prompt back to the default mysql> prompt. This is not because I don't like the prompt created by dbdeployer, but simply to make the formatting of the queries nicer. In general, I do prefer my prompt to tell me where I am connected, so the normal dbdeployer prompt will otherwise work well for me.

The table creation is just like normal except that the engine is set to NDBCluster. This is the engine name that tells MySQL to create the table in the data nodes.
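
If you want to confirm which engine a table ended up with, one option is to query the Information Schema through one of the use wrapper scripts (they pass extra arguments on to the mysql client); the ENGINE column should report ndbcluster:

shell$ ./sandboxes/ndb_msb_ndb7_6_9/node1/use -e "SELECT TABLE_NAME, ENGINE FROM information_schema.TABLES WHERE TABLE_SCHEMA = 'db1'"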

The second step is to connect to the second instance and insert a row:

node2 [localhost:27511] {msandbox} ((none)) > INSERT INTO db1.t1 (val) VALUES (UUID());
Query OK, 1 row affected (0.11 sec)

node2 [localhost:27511] {msandbox} ((none)) > SELECT * FROM db1.t1;
+----+--------------------------------------+
| id | val                                  |
+----+--------------------------------------+
|  1 | 84f59369-5051-11e9-9078-08002709eea3 |
+----+--------------------------------------+
1 row in set (0.05 sec)

Notice how this worked without first creating the table on this node. Since the table was created in the data nodes, all SQL nodes that connect to these data nodes automatically know about the table.

Finally, confirm the data is also available in the third node:

node3 [localhost:27512] {msandbox} ((none)) > SELECT * FROM db1.t1;
+----+--------------------------------------+
| id | val                                  |
+----+--------------------------------------+
|  1 | 84f59369-5051-11e9-9078-08002709eea3 |
+----+--------------------------------------+
1 row in set (0.12 sec)

Verdict

It is fantastic that dbdeployer now supports MySQL NDB Cluster as well. It will be a great help when performing tests. I do have some comments based on my testing. It is very likely that some of them are simply due to the fact that this is my initial use of dbdeployer, and I will not claim that I understand all the details of how it works yet, so do not take the following comments as the final word – nor are they meant as negative criticism:

  • I find it a little confusing that a cluster is considered a replication topology. Yes, there is synchronous replication between the data nodes, but it is not related to the replication you have between two MySQL Server instances (which is also supported between two clusters). Personally, I would have made a single cluster a single sandbox, and then allowed for a (future) feature setting up two clusters with replication between them.
  • The restart_all sandbox command literally shuts down the whole cluster, then starts it again (but see also two items later). For MySQL NDB Cluster there are essentially two different types of restarts (each of which can be either a normal or an initial restart):
    • System Restart: All data nodes at least are shut down together, then started together. This is what restart_all implements.
    • Rolling Restart: The cluster as a whole remains online throughout the restart phase. This is done by always leaving one data node in each node group online while restarting the data nodes. SQL nodes are restarted such that at least one SQL node is online at all times. This is the normal way to do most configuration changes as it avoids downtime. I miss this restart type (a manual sketch follows after this list).
  • There does not seem to be any way to choose between normal and initial restarts.
  • The start_all script does not start the management and data nodes (only the SQL nodes are started). This may be on purpose, but it seems inconsistent with stop_all, which does shut down the management and data nodes. Actually, I have not been able to find a way to start the cluster cleanly. There is initialize_nodes, which will start the management and data nodes, but the script will also try to start the SQL nodes and load grants into them.
  • The stop_all script first shuts down the management and data nodes, then the SQL nodes. It is better to do it in the opposite order, as that avoids errors on the SQL nodes if queries are executed during the shutdown. In older versions of MySQL NDB Cluster, it could also take a long time to shut down an SQL node that had lost the connection to the data nodes.
  • The management node is given NodeId = 1 and the data nodes the subsequent ids. Data nodes can only have ids 1-48, so I always recommend reserving these ids for data nodes, giving the first management node NodeId = 49, and giving the SQL nodes later ids.
  • There does not seem to be any way to change the number of management nodes. The --ndb-nodes option appears to be interpreted as one management node plus the remaining nodes as data nodes. Maybe a better way would be to have two options such as:
    • --ndb-nodegroups: The number of node groups in the cluster. The number of data nodes can then be calculated as <# Node Groups> * NoOfReplicas.
    • --ndb-mgmnodes: The number of management nodes.
  • There is no check whether the number of NDB nodes is valid. For example, with --ndb-nodes=4, dbdeployer tries to create a cluster with three data nodes, which is not valid with NoOfReplicas = 2.
  • I did not find any way to specify my preferred configuration of the cluster as part of the sandbox deployment.
  • Consider adding the --reload option when starting ndb_mgmd (the management node). This will make the management node check whether there are any changes to the cluster configuration (stored in <path to sandbox>/ndb_conf/config.ini) and, if so, apply those changes.
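
As an illustration of the rolling restart mentioned in the list above, here is a minimal sketch of how it can be done by hand with the sandbox's ndb_mgm wrapper. It assumes the node ids from the deployment in this blog (data nodes 2 and 3) and that the usual per-node restart scripts are present in the node directories; wait for each data node to report "started" again before restarting the next one:

shell$ cd ~/sandboxes/ndb_msb_ndb7_6_9
shell$ ./ndb_mgm -e "2 RESTART"   # restart data node 2 (add -i for an initial restart)
shell$ ./ndb_mgm -e "2 STATUS"    # repeat until the node reports "started"
shell$ ./ndb_mgm -e "3 RESTART"   # then restart data node 3
shell$ ./ndb_mgm -e "3 STATUS"
shell$ ./node1/restart            # finally restart the SQL nodes one at a time
shell$ ./node2/restart
shell$ ./node3/restart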

This may seem like a long list of comments, but I am also very well aware that support for MySQL NDB Cluster has only just been added, and that it takes time to implement all the details. Rome was not built in a day.

So, I would very much like to conclude with a big thank you to the Data Charmer. This is a great initial implementation.

I have worked with MySQL databases since 2006, both as an SQL developer and a database administrator, and for more than eight years as part of the Oracle MySQL Support team. I have spoken at MySQL Connect and Oracle OpenWorld on several occasions. I have contributed to the sys schema and four Oracle Certified Professional (OCP) exams for MySQL 5.6 to 8.0. I have written four books, all published by Apress.

4 Comments on “Testing MySQL NDB Cluster with dbdeployer”

  1. Jesper,
    Many thanks for the thorough review!

    Please consider the following three points to understand possible improvements:

    1. I am not an expert on NDB cluster. I am not even a regular user. I barely know how to set it up. What I did is based on the requests and limited feedback I received from willing NDB users.

    2. The naming and the nodes used are constraints that come from dbdeployer infrastructure, and the need to use the same commands to deal with very different entities, such as master/slave replication, group of independent MySQL servers, multi-source replication (fan-in, all-masters), group replication, Percona Xtradb cluster. Thus, NDB support should as much as possible fit into the existing paradigms. For example, the current script organization allows you to run “test_replication” in every topology, and get the expected result. For NDB, every SQL node creates a table with engine ndbcluster, and each node retrieves the tables from all nodes.

    3. dbdeployer is built for configurability. There is a publicly viewable template for every script that you find in the sandbox. You can export the templates, edit them, re-import them, and eventually create your own improvements. (https://github.com/datacharmer/dbdeployer#Sandbox-customization)

    The number 3 above is the key. By understanding how the system works, interested users can propose enhancements. Using this framework, users of TiDB (of which I know even less than NDB clusters) were able to propose features and submit code enhancements. I expect a similar interaction with NDB users to improve the tool.

    Looking forward to seeing the improvement wanted converted into dbdeployer issues and eventually pull requests!

    • Hi Giuseppe,

      Thanks for the comments.

      Yes, I am fully aware that it is not easy to extend a framework to cover a completely new topology, so I understand it is necessary to make compromises.

      I will take a look at the link to the customization. Thanks for providing that. As mentioned, I have to admit that I am very new to dbdeployer.

      Thanks,
      Jesper

  2. Hi,
    I just had a doubt. I am a complete beginner btw.
    But I am not getting all the nodes to connect simultaneously.
    Can you help me out and tell me how to make them work?

    • Hi Antonio,

      Can I get you to clarify, what you mean when you say the nodes do not connect simultaneously? And which nodes (SQL, data, or management)?

      You can start taking a look at the output of the ndb_mgm -e “SHOW” command. Note that in that output, it is common to have unconnected nodes, so you have some spare nodes defined, if you need them later.

      You can also look at the error logs and see if anything stands out.

      Best regards,
      Jesper
