MySQL NDB Cluster: Never Install a Management Node on the Same Host as a Data Node

In MySQL NDB Cluster, the management node (ndb_mgmd) is a lightweight process that, among other things, handles the configuration of the cluster. Because it is lightweight, it can be tempting to install it on the same host as one of the other nodes. However, if you want a high-availability setup, you should never install it on the same host as a data node (ndbd or ndbmtd). Doing so can cause a total cluster outage in situations the cluster could otherwise have survived.

The first sign of trouble occurs when you start the management nodes. The following warning is printed to standard output:

2018-08-22 18:04:14 [MgmtSrvr] WARNING  -- at line 46: Cluster configuration warning:
  arbitrator with id 49 and db node with id 1 on same host 192.0.2.1
  arbitrator with id 50 and db node with id 2 on same host 192.0.2.2
  Running arbitrator on the same host as a database node may
  cause complete cluster shutdown in case of host failure.

To understand why this setup can cause a complete cluster shutdown, it is necessary first to review how a node failure is handled in MySQL NDB Cluster.

Node Failure Handling

When a data node fails, the cluster automatically evaluates whether it is safe to continue. In this respect, a node failure can mean either that a data node crashes or that a network outage prevents one or more nodes from being seen by the other nodes.

A clean node shutdown (such as one triggered with the recommended STOP command in the management client) is not subject to this evaluation, as the other nodes are notified of the shutdown.

So, how does MySQL NDB Cluster decide whether the cluster can continue after a node failure? And if there are two halves, which one gets to continue?
Assuming two replicas (NoOfReplicas = 2), the rules are quite simple. To quote Pro MySQL NDB Cluster:


“In short, data nodes are allowed to continue if the following conditions are true:

  • The group of data nodes holds all the data
  • Either more than half the data nodes are in the group or the group has won the arbitration process.”

For the group of data nodes to hold all the data, there must be one data node from each node group available. A node group is a group of NoOfReplicas nodes that share data. The arbitration process refers to the process of contacting the arbitrator (typically a management node) – the first half to make contact will remain online.
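
If you want to see these details for a running cluster, the ndbinfo schema (queried through an SQL node) exposes them. As a minimal sketch (it assumes the ndbinfo.membership table available in recent MySQL NDB Cluster releases), the following query shows which node group each data node belongs to and which node it currently regards as the arbitrator:

mysql> SELECT node_id, group_id, arbitrator, arb_state
       FROM ndbinfo.membership;

For the two data node cluster used in the examples below, you would expect both data nodes to report the same node group and to agree on the arbitrator.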

This is all a bit abstract, so let’s take a look at a couple of examples.

Examples

Consider a cluster with two data nodes and two management nodes. In most of the examples, a management node is installed on each of the hosts with the data nodes. The last example will, by contrast, have the management nodes on separate hosts.

The starting point is thus a cluster using two hosts each with one data node and one management node as shown in this figure:

A healthy cluster with two data nodes and the management nodes installed on the same hosts as the data nodes.

The green colour indicates that the data node is online. The blue colour indicates that the management node with Node Id 49 is the arbitrator, and the yellow management node is on standby.

This is where the problem with the setup starts to show up. The arbitrator is the node that becomes involved when exactly half of the data nodes are available after a node failure. In that case, the data node(s) must contact the arbitrator to confirm whether it is OK to continue. This avoids a split-brain scenario where two halves each hold all the data; in that case it is imperative that one half shuts down, or the data can start to diverge. The half that is first to contact the arbitrator survives; the other is killed (STONITH – shoot the other node in the head).
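
If you are unsure which management node currently holds the arbitrator role, you can ask the cluster itself. A minimal check (using the ndbinfo.arbitrator_validity_summary view that also appears in the comments at the end of this post) looks like this:

mysql> SELECT arbitrator, arb_connected
       FROM ndbinfo.arbitrator_validity_summary;

The arbitrator column contains the node id of the current arbitrator; the comments below show the full output and how it changes when the arbitrator is lost outside of node failure handling.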

So, let’s look at a potential split-brain scenario.

Split-Brain Scenario

A split-brain scenario can occur when the network between the two halves of the cluster is lost as shown in the next figure:

NDB Cluster with network failure but the cluster survives.

In this case the network connection between the two hosts is lost. Since both data nodes hold all the data, arbitration is needed to decide which one can continue. The data node with Node Id 1 can still contact the arbitrator as it is on the same host, but Node Id 2 cannot (it would need to use the network that is down). So, Node Id 1 gets to continue whereas Node Id 2 is shut down.

So far so good. This is what is expected: a single point of failure does not lead to a cluster outage. However, what happens if we consider a complete host failure instead of a network failure?

Host Failure

Consider now a case where there is a hardware failure on Host A, or someone pulls the power by accident. This causes the whole host to shut down, taking both the data node and the management node with it. What happens in this case?

The first thought is that it will not be an issue. Node Id 2 has all the data, so surely it will continue, right? No, that is not so. The result is a total cluster outage as shown in the following figure:

The failure of the host with the arbitrator causes complete cluster outage.

Why does this happen? When Host A crashes, so does the arbitrator management node. Since Node Id 2 does not on its own constitute a majority of the data nodes, it must contact the arbitrator to be allowed to remain online.

You may think it can use the management node with Node Id 50 as the arbitrator, but that will not happen: while handling a node failure, under no circumstances can the arbitrator be changed. The nodes on Host B cannot know whether they cannot see the nodes on Host A because of a network failure (as in the previous example) or because the nodes are dead. So, they have to assume the other nodes are still alive, or sooner or later there would be a split-brain cluster with both halves online.

Important

The arbitrator will never change while the cluster handles a node failure.

So, the data node with Id 2 has no other option than to shut itself down, and there is a total cluster outage. A single point of failure has caused a total failure. That is not the idea of a high availability cluster.

What could have been done to prevent the cluster outage? Let’s reconsider the case where the arbitrator is on a third independent host.

Arbitrator on Independent Host

The picture changes completely if the management nodes are installed on Hosts C and D instead of Hosts A and B. For simplicity, the management node with Node Id 50 is left out of the figure, as it is just a spectator while the node failure is handled. In this case the scenario is:

The failure of the host with the arbitrator on a third host ensures the cluster remains online.

Here Node Id 2 can still contact the arbitrator. Node Id 1 is dead, so it will not compete to win the arbitration, and the end result is that Node Id 2 remains online. The situation is back to one where a single point of failure does not bring down the whole cluster.

Conclusion

If you want your cluster to have the best chance of surviving a problem with one of the hosts, never install the management nodes on the same hosts as the data nodes. One of the management nodes also acts as the arbitrator. Since the arbitrator cannot change while the cluster is handling a node failure, a crash of the host with the arbitrator will cause a complete cluster shutdown whenever arbitration is required.

When you consider what counts as a host, think in terms of physical hosts. Installing the management node in a different virtual machine on the same physical host offers little extra protection compared to installing it in the same virtual machine or on the same bare-metal host.

So, to conclude: make sure your management nodes are on a completely different physical host compared to your data nodes.
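
If you want to audit an existing cluster for this problem, one option is to check where each node process reports that it runs. This is only a sketch; it assumes MySQL NDB Cluster 7.5.7 or later, where the ndbinfo.processes table is available:

mysql> SELECT node_id, node_type, process_name, service_URI
       FROM ndbinfo.processes
       ORDER BY node_id;

If a data node and a management node report the same host in their service URIs, you have the setup this post warns against.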

Want to Know More?

The book Pro MySQL NDB Cluster (published by Apress) by Mikiya Okuno and me includes lots of information about designing your cluster and how MySQL NDB Cluster works.

Disclaimer

I am one of the authors of Pro MySQL NDB Cluster.

I have worked with MySQL databases since 2006 as an SQL developer, as a database administrator, and for more than eight years as part of the Oracle MySQL Support team. I have spoken at MySQL Connect and Oracle OpenWorld on several occasions. I have contributed to the sys schema and to four Oracle Certified Professional (OCP) exams for MySQL 5.6 to 8.0. I have written four books, all published by Apress.

4 Comments on “MySQL NDB Cluster: Never Install a Management Node on the Same Host as a Data Node”

  1. A good post, I have a question now. As you mentioned, let me consider that I have deployed data nodes on HostA & HostB and management nodes on HostC & HostD. While HostD is the active management server, this host crashes. In such a case, how would HostC take over and provide arbitration if any failure scenario occurs later?

    • Hi Subash,

      In that case the data nodes will do a “vote” and agree to promote HostC to be the arbitrator. That is, it is the data nodes that elect the arbitrator. There will also be a note about it in the logs.

      For example, consider a cluster where there are two management nodes with NodeIds 49 and 50. NodeId = 49 is the current arbitrator:
      mysql> SELECT * FROM ndbinfo.arbitrator_validity_summary;
      +------------+------------------+---------------+-----------------+
      | arbitrator | arb_ticket       | arb_connected | consensus_count |
      +------------+------------------+---------------+-----------------+
      |         49 | 5262000100a28204 | Yes           |               2 |
      +------------+------------------+---------------+-----------------+
      1 row in set (0.01 sec)

      mysql> SELECT * FROM ndbinfo.arbitrator_validity_detail;
      +---------+------------+------------------+---------------+-----------+
      | node_id | arbitrator | arb_ticket       | arb_connected | arb_state |
      +---------+------------+------------------+---------------+-----------+
      |       1 |         49 | 5262000100a28204 | Yes           | ARBIT_RUN |
      |       2 |         49 | 5262000100a28204 | Yes           | ARBIT_RUN |
      +---------+------------+------------------+---------------+-----------+
      2 rows in set (0.01 sec)

      The output shows that both data nodes (ids 1 and 2) agree that 49 is the arbitrator. Now kill node 49 (or unplug the host or similar). I used a kill -9 to ensure the management node shut down without communicating it to the rest of the nodes.

      In the cluster log for the surviving management node (NodeId = 50), this triggers a series of log messages:
      2019-04-20 10:24:14 [MgmtSrvr] ALERT -- Node 50: Node 49 Disconnected
      2019-04-20 10:24:14 [MgmtSrvr] ALERT -- Node 2: Node 49 Disconnected
      2019-04-20 10:24:14 [MgmtSrvr] INFO -- Node 1: Communication to Node 49 closed
      2019-04-20 10:24:14 [MgmtSrvr] INFO -- Node 1: Lost arbitrator node 49 - process failure [state=6]
      2019-04-20 10:24:14 [MgmtSrvr] INFO -- Node 1: President restarts arbitration thread [state=1]
      2019-04-20 10:24:14 [MgmtSrvr] INFO -- Node 2: Communication to Node 49 closed
      2019-04-20 10:24:14 [MgmtSrvr] ALERT -- Node 1: Node 49 Disconnected
      2019-04-20 10:24:14 [MgmtSrvr] INFO -- Node 2: Prepare arbitrator node 50 [ticket=5262000200a4cc9c]
      2019-04-20 10:24:15 [MgmtSrvr] INFO -- Node 1: Started arbitrator node 50 [ticket=5262000200a4cc9c]
      2019-04-20 10:24:15 [MgmtSrvr] INFO -- Node 1: Communication to Node 49 opened
      2019-04-20 10:24:15 [MgmtSrvr] INFO -- Node 2: Communication to Node 49 opened

      This shows how node 49 disconnects, then a message that the arbitrator was lost and later how node 50 is elected (started) as the new arbitrator.

      Similarly, on the "president" data node:
      2019-04-20 10:24:14 [ndbd] INFO -- Lost arbitrator node 49 - process failure [state=6]
      2019-04-20 10:24:14 [ndbd] INFO -- President restarts arbitration thread [state=1]
      2019-04-20 10:24:15 [ndbd] INFO -- Started arbitrator node 50 [ticket=5262000200a4cc9c]

      On the non-president data nodes, only a “Prepare arbitrator node 50 …” message is logged.

      If you query the ndbinfo tables, you can see node 50 is also showing up there as the new arbitrator:
      mysql> SELECT * FROM ndbinfo.arbitrator_validity_summary;
      +------------+------------------+---------------+-----------------+
      | arbitrator | arb_ticket       | arb_connected | consensus_count |
      +------------+------------------+---------------+-----------------+
      |         50 | 5262000200a4cc9c | Yes           |               2 |
      +------------+------------------+---------------+-----------------+
      1 row in set (0.01 sec)

      mysql> SELECT * FROM ndbinfo.arbitrator_validity_detail;
      +---------+------------+------------------+---------------+-----------+
      | node_id | arbitrator | arb_ticket       | arb_connected | arb_state |
      +---------+------------+------------------+---------------+-----------+
      |       1 |         50 | 5262000200a4cc9c | Yes           | ARBIT_RUN |
      |       2 |         50 | 5262000200a4cc9c | Yes           | ARBIT_RUN |
      +---------+------------+------------------+---------------+-----------+
      2 rows in set (0.01 sec)

      Cheers,
      Jesper

  2. I have a cluster with two data nodes, two management nodes and two sql nodes. Cluster is using two hosts each with one data node, one management node and one sql node.

    To test the mentioned issue, I performed a forced restart on the host with the arbitrator, but doing so does not affect the cluster. The other management node becomes the arbitrator and the cluster functions perfectly.

    Moreover, when the other host starts, it rejoins perfectly.

    I am using mysql ndb cluster 8.0.2.

    • Hi Arbaz,

      Exactly how did you perform the "forced restart"? By design, the arbitrator cannot change while a node failure is being handled, as that would mean a network outage could leave both halves online. So if you really performed a shutdown that could not be communicated to the data node on the other host first, then it is a serious bug that you should report on https://bugs.mysql.com. My guess is that either a SIGTERM (signal 15) was first sent to the nodes, allowing a clean node shutdown to be performed, or that the arbitrator was shut down before the data node, so the data nodes could elect the other management node as arbitrator first.

      To test it, I suggest you first try to pull the network between the hosts. That should allow the half with the active arbitrator to continue but shut down the other half. If not, you have a split brain and that is a bug. If that test works as it should, try (with the full cluster restored) to pull the power on the host with the active arbitrator.

      Note: there is no such version as 8.0.2 for MySQL NDB Cluster. The first DMR was 8.0.13 and the first GA release was 8.0.19, so I assume you mean 8.0.20.

      Cheers,
      Jesper
