In MySQL NDB Cluster, the management node (ndb_mgmd) is a lightweight process that, among other things, handles the configuration of the cluster. Since it is lightweight, it can be tempting to install it on the same host as one of the other nodes. However, if you want a high-availability setup, you should never install it on the same host as a data node (ndbmtd). If you do, a failure that the cluster could otherwise have survived can cause a total cluster outage.
The first sign of trouble occurs when you start the management nodes. The following warning is printed to standard output:
2018-08-22 18:04:14 [MgmtSrvr] WARNING -- at line 46: Cluster configuration warning:
  arbitrator with id 49 and db node with id 1 on same host 192.0.2.1
  arbitrator with id 50 and db node with id 2 on same host 192.0.2.2
  Running arbitrator on the same host as a database node may
  cause complete cluster shutdown in case of host failure.
To understand why this setup can cause a complete cluster shutdown, it is necessary first to review how a node failure is handled in MySQL NDB Cluster.
Node Failure Handling
When a data node fails, the cluster automatically evaluates whether it is safe to continue. In this respect, a node failure is either a data node crashing or a network outage that leaves one or more nodes invisible to the other nodes.
A clean node shutdown (such as when using the recommended STOP command in the management client) is not subject to the evaluation, as the other nodes are notified of the shutdown.
So, how does MySQL NDB Cluster decide whether the cluster can continue after a node failure? And if there are two halves, which one gets to continue?
Assuming two replicas (NoOfReplicas = 2), the rules are quite simple. To quote Pro MySQL NDB Cluster:
“In short, data nodes are allowed to continue if the following conditions are true:
- The group of data nodes holds all the data
- Either more than half the data nodes are in the group or the group has won the arbitration process.”
For the group of data nodes to hold all the data, at least one data node from each node group must be available. A node group is a group of NoOfReplicas nodes that share data. The arbitration process refers to contacting the arbitrator (typically a management node): the first half to make contact remains online.
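The two quoted conditions can be written down as a small predicate. The following Python sketch only illustrates the rule as quoted above; it is not how the data nodes implement it, and the function name, parameters and node group layout are made up for the example.

```python
from typing import Mapping, Set


def can_continue(
    visible_nodes: Set[int],
    node_groups: Mapping[int, Set[int]],
    won_arbitration: bool,
) -> bool:
    """Return True if the group of visible data nodes may stay online.

    visible_nodes: ids of the data nodes that can still see each other.
    node_groups: node group id -> ids of the data nodes in that group.
    won_arbitration: whether this group was first to reach the arbitrator.
    """
    all_nodes = {n for members in node_groups.values() for n in members}
    group = visible_nodes & all_nodes

    # Condition 1: the group holds all the data, that is, it contains
    # at least one node from every node group.
    holds_all_data = all(members & group for members in node_groups.values())
    if not holds_all_data:
        return False

    # Condition 2: more than half the data nodes, or a won arbitration.
    has_majority = 2 * len(group) > len(all_nodes)
    return has_majority or won_arbitration


# Two data nodes in one node group (NoOfReplicas = 2).
two_nodes = {0: {1, 2}}
print(can_continue({2}, two_nodes, won_arbitration=False))  # False
print(can_continue({2}, two_nodes, won_arbitration=True))   # True
```

With only two data nodes, a single surviving node is never a majority, so its fate depends entirely on whether it can reach the arbitrator.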
This is all a bit abstract, so let’s take a look at a couple of examples.
Consider a cluster with two data nodes and two management nodes. Most of the examples have a management node installed on each of the hosts with the data nodes. The last example, by contrast, has the management nodes on separate hosts.
The starting point is thus a cluster using two hosts each with one data node and one management node as shown in this figure:
The green colour shows that a data node is online. The management node with Node Id 49, shown in blue, is the arbitrator, and the management node shown in yellow is on standby.
This is where the problem with the setup starts to show up. The arbitrator is the node that gets involved when exactly half the data nodes are available after a node failure. In that case, the data node(s) must contact the arbitrator to confirm whether it is OK to continue. This avoids a split-brain scenario where there are two halves with all the data; in that case it is imperative that one half is shut down, or the data can start to diverge. The half that is first to contact the arbitrator survives; the other is killed (STONITH: shoot the other node in the head).
So, let’s look at a potential split-brain scenario.
A split-brain scenario can occur when the network between the two halves of the cluster is lost as shown in the next figure:
In this case the network connection between the two hosts is lost. Since both nodes have all the data, arbitration is needed to decide which one can continue. The data node with Id 1 can still contact the arbitrator as it is on the same host, but Node Id 2 cannot (it would need to use the network that is down). So, Node Id 1 gets to continue whereas Node Id 2 is shut down.
So far so good. This is what is expected. A single point of failure does not lead to a cluster outage. However, what happens if, instead of a network failure, we consider a complete host failure?
Consider now a case where there is a hardware failure on Host A or someone accidentally pulls the power. This causes the whole host to shut down, taking both the data node and the management node with it. What happens in this case?
The first thought is that it will not be an issue. Node Id 2 has all the data, so surely it will continue, right? No, that is not so. The result is a total cluster outage as shown in the following figure:
Why does this happen? When Host A crashes, so does the arbitrator management node. Since Node Id 2 does not on its own constitute a majority of the data nodes, it must contact the arbitrator to be allowed to remain online.
You may think it can use the management node with Node Id 50 as the arbitrator, but that will not happen: while handling a node failure, under no circumstances can the arbitrator be changed. The nodes on Host B cannot know whether they cannot see the nodes on Host A because of a network failure (as in the previous example) or because the nodes are dead. So, they have to assume the other nodes are still alive, or there would sooner or later be a split-brain cluster with both halves online.
So, the data node with Id 2 has no other option than to shut itself down, and there is a total cluster outage. A single point of failure has caused a total failure. That is not the idea of a high-availability cluster.
What could have been done to prevent the cluster outage? Let’s reconsider the case where the arbitrator is on a third independent host.
Arbitrator on Independent Host
The picture changes completely if the management nodes are installed on Hosts C and D instead of Hosts A and B. For simplicity, the management node with Node Id 50 is left out, as it is just a spectator while the node failure is handled. In this case the scenario is:
Here Node Id 2 can still contact the arbitrator. Node Id 1 is dead, so it will not compete to win the arbitration, and the end result is that Node Id 2 remains online. We are back to a situation where a single point of failure does not bring down the whole cluster.
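The difference between the two layouts can be reproduced with a toy model. The sketch below is only an illustration under simplifying assumptions: a single node group (so any surviving data node holds all the data), only complete host failures, and an arbitrator that cannot be replaced while the failure is being handled. The class and function names are invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass
class Cluster:
    data_nodes: Dict[int, str]  # data node id -> host name
    arbitrator_host: str        # host running the current arbitrator


def survivors_after_host_failure(cluster: Cluster, failed_host: str) -> Set[int]:
    """Return the data node ids still online after failed_host goes down."""
    alive = {n for n, host in cluster.data_nodes.items() if host != failed_host}
    if not alive:
        return set()

    # More than half the data nodes survived: no arbitration needed.
    if 2 * len(alive) > len(cluster.data_nodes):
        return alive

    # Otherwise the survivors need the arbitrator, and the standby
    # management node cannot take over during failure handling.
    arbitrator_reachable = cluster.arbitrator_host != failed_host
    return alive if arbitrator_reachable else set()


colocated = Cluster(data_nodes={1: "A", 2: "B"}, arbitrator_host="A")
separate = Cluster(data_nodes={1: "A", 2: "B"}, arbitrator_host="C")

print(survivors_after_host_failure(colocated, "A"))  # set(): total outage
print(survivors_after_host_failure(separate, "A"))   # {2}: cluster survives
```

Note that losing Host B in the co-located layout is harmless, because Node Id 1 can still reach the arbitrator on its own host. The outage only happens when the failing host is the one carrying the arbitrator.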
If you want your cluster to have the best chance of surviving a problem with one of the hosts, never install the management nodes on the same hosts as data nodes. One of the management nodes also acts as the arbitrator. Since the arbitrator cannot change while the cluster is handling a node failure, a crash of the host with the arbitrator causes a complete cluster shutdown whenever arbitration is required.
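A quick way to catch this before starting the cluster is to scan the cluster configuration file for hosts that carry both a management node and a data node. The Python sketch below is a rough, hypothetical helper rather than an official tool. It assumes a config.ini that uses [ndb_mgmd] and [ndbd] sections with "Name = value" lines and "#" comments, and it compares the HostName values as plain strings. The sample configuration is a reconstruction that matches the node ids and addresses in the warning shown earlier, not the actual file.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def parse_sections(text: str) -> List[Tuple[str, Dict[str, str]]]:
    """Split the config into (section name, parameters) pairs.

    Section names repeat in config.ini, so a list keeps every occurrence.
    """
    sections: List[Tuple[str, Dict[str, str]]] = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            sections.append((line[1:-1].strip().lower(), {}))
        elif "=" in line and sections:
            key, value = line.split("=", 1)
            sections[-1][1][key.strip().lower()] = value.strip()
    return sections


def colocated_nodes(text: str) -> List[Tuple[str, str, str]]:
    """Return (host, management node id, data node id) for shared hosts."""
    mgm_by_host: Dict[str, List[str]] = defaultdict(list)
    data_by_host: Dict[str, List[str]] = defaultdict(list)
    for name, params in parse_sections(text):
        host = params.get("hostname")
        if host is None:
            continue  # for example the [ndbd default] section
        node_id = params.get("nodeid", "?")
        if name == "ndb_mgmd":
            mgm_by_host[host].append(node_id)
        elif name == "ndbd":
            data_by_host[host].append(node_id)
    return [
        (host, mgm_id, data_id)
        for host in sorted(mgm_by_host)
        for mgm_id in mgm_by_host[host]
        for data_id in data_by_host.get(host, [])
    ]


SAMPLE = """
[ndbd default]
NoOfReplicas = 2

[ndb_mgmd]
NodeId = 49
HostName = 192.0.2.1

[ndb_mgmd]
NodeId = 50
HostName = 192.0.2.2

[ndbd]
NodeId = 1
HostName = 192.0.2.1

[ndbd]
NodeId = 2
HostName = 192.0.2.2
"""

for host, mgm_id, data_id in colocated_nodes(SAMPLE):
    print(f"management node {mgm_id} and data node {data_id} share host {host}")
```

A string comparison of host names cannot tell whether two different names or virtual machines end up on the same physical machine, so an empty result is not proof that the layout is safe.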
When you consider what counts as a host, you should look at physical hosts. Installing the management node in a different virtual machine on the same physical host offers little extra protection compared to installing both nodes in the same virtual machine or on the same bare-metal host.
So, to conclude: make sure your management nodes are on completely different physical hosts than your data nodes.
Want to Know More?
The book Pro MySQL NDB Cluster (published by Apress) by Mikiya Okuno and me includes lots of information about designing your cluster and how MySQL NDB Cluster works.