Master Node in a Cluster


Hi!

I am an independent contractor working for a company in the UK and, since I have had a smattering of VMS experience, I have been asked to decommission a VAX system. My experience does not include clustering, however.

The node is a member of a cluster and I know I have to remove the node from that, using @CLUSTER_CONFIG, but some of the documentation (hah!) which the preceding system manager left indicates that they have had problems in the past when the master was rebooted after one or more of the members. However, the specific node he mentions in the doc no longer exists.

How do I find out which node is master? Your page on configuring a single master disk didn't help me, as their SYLOGICALS.COM is all commented out. I expected SHOW CLUSTER to tell me, but according to that, all nodes are members.

Is that possible?

Also, the Compaq docs suggest editing MODPARAMS.DAT and adjusting the EXPECTED_VOTES parameter. It helpfully suggests incrementing the value by 1 when adding a node, so I presume I subtract 1 when deleting? Also, it confusingly states that VAXVMSSYS.PAR should be updated, whereas elsewhere it states that CLUSTER_CONFIG takes care of that.

According to my observations, the nodes boot independently and whichever completes first takes control of the cluster, as it were. I came across an old document here which states that one node in particular needs to be booted before the others, "due to licensing problems".

Asking around, the consensus is that this node holds the license database (their previous problems have related to the C compilers in use complaining that they weren't licensed at all, or more usually running out of licenses after just a couple of users were logged in.)

Anyway, I noticed that the nodes would say something like "Request to first_node_to_boot to join cluster from booting_node" and, on the two occasions I've had the cluster boot (once not of my doing, but seemingly after a network broadcast storm), if the first node to boot wasn't this "licensing" node, the other machines would fail in various ways, mostly related to DECnet, whereby a SET HOST node would fail with a "node unreachable" error.

It seems to me that what we have here is a pseudo-cluster, used to serve the different machines' disks collectively, rather than for processing power. Feasible?

Last question (a new one): One of the reasons this company clings to VMS is the batching/queuing facilities. Any recommendations or pointers for reproducing these capabilities on NT or Unix? I had a VERY quick look at VX/JSP and VX/DCL (both NT platform) from Sector 7, which look promising.

Any help you could send my way would be greatly appreciated.

Best regards,
Ian Northwood

Ian ...

There is no such thing as a master node. No one node ever takes control of the cluster. I think you are looking for the "boot server node", that is, the node from which a satellite node boots.

There are server nodes and satellite nodes. A cluster can have several server nodes and several boot server nodes. What you need to do is find the boot server for the satellite node you want to remove.

When booting nodes in a cluster, first boot your boot servers. Then your other servers, then your satellites. Yes, you could experience problems if you boot them in the wrong order.

Just about everything is distributed among all nodes in a cluster. You may see a node name in the message "Request to first_node_to_boot to join cluster from booting_node", which might suggest that first_node_to_boot is the master. It only means that this node is the one currently doing most of the work of the distributed connection manager. Depending on system load, that role can move dynamically from one node to another.

There are many ways to find out which node is the boot server for another node. One way is to log on to your satellite node and examine the file SYS$SYSDEVICE:[SYS0.SYSEXE]MODPARAMS.DAT. In most cases the [SYS0. root of the system disk is the root for the boot node, but not always. The sure-fire way is to boot the satellite node in question and wait for the message that says "SYSTEM LOADED BY NODE nodename". That node is your boot node.
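As a rough sketch of the MODPARAMS.DAT check, the cluster-related lines can be listed without opening the file in an editor. This assumes the node keeps MODPARAMS.DAT in the usual SYS$SYSTEM location and that the parameters of interest are the common ones (SCSNODE, VOTES, EXPECTED_VOTES, NISCS_*); your file may use different or additional entries.

```dcl
$! Show only the cluster-related lines of this node's MODPARAMS.DAT.
$! "VOTES" also matches EXPECTED_VOTES; "NISCS" matches NISCS_LOAD_PEA0 etc.
$ SEARCH SYS$SYSTEM:MODPARAMS.DAT "SCSNODE","VOTES","NISCS"
```

SEARCH only reads the file, so this is safe to run on a production node.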

The licensing problem you see when you don't boot the licensing node first is due to errors made by the system manager when the system was generated. Basically it means they did not read the documentation called "Guide to VMS Clusters". This problem can be fixed, and I'll be glad to write out the method if you wish, but for the time being, it's a good idea to boot the "licensing node" first. The fact that you get the error "node unreachable" may mean that there are bigger licensing errors involved as well.

Go to my web page at http://www.jcameron.com/vms/ and select the sub page called "DCL Tricks". There, download the file called SYSINFO.COM, and place this file in your root directory. For explanation purposes I'm going to assume your login default device and directory are DISK$USER1:[NORTH]. After you have placed the file in DISK$USER1:[NORTH], enter these commands:

$MCR SYSMAN
SYSMAN>SET ENV/CLUSTER/USER=SYSTEM
_Password:<system password>
SYSMAN>DO @DISK$USER1:[NORTH]SYSINFO.COM

This will print out a bunch of useful information about each node, including which disk it boots from and which node is its boot node. This should help you a lot.
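If SYSINFO.COM is not to hand, a much smaller procedure can report a few of the same facts. The following is only a minimal sketch and is not the SYSINFO.COM from my page; the file name NODEINFO.COM is made up for illustration, and it does not report the boot node.

```dcl
$! NODEINFO.COM (hypothetical name): print a few cluster facts for the
$! node this procedure runs on. Not a replacement for SYSINFO.COM.
$ node    = F$GETSYI("NODENAME")
$ member  = F$GETSYI("CLUSTER_MEMBER")      ! TRUE or FALSE
$ nodes   = F$GETSYI("CLUSTER_NODES")       ! number of nodes in the cluster
$ votes   = F$GETSYI("CLUSTER_VOTES")       ! total votes in the cluster
$ quorum  = F$GETSYI("CLUSTER_QUORUM")      ! quorum currently in force
$ sysdisk = F$TRNLNM("SYS$SYSDEVICE")       ! system disk this node is using
$ WRITE SYS$OUTPUT "Node ''node': cluster member = ''member'"
$ WRITE SYS$OUTPUT "  Nodes in cluster : ''nodes'"
$ WRITE SYS$OUTPUT "  Cluster votes    : ''votes'"
$ WRITE SYS$OUTPUT "  Cluster quorum   : ''quorum'"
$ WRITE SYS$OUTPUT "  System disk      : ''sysdisk'"
$ EXIT
```

Placed in DISK$USER1:[NORTH], it can be run on every node with the same SYSMAN DO command shown above, substituting NODEINFO.COM for SYSINFO.COM.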

Concerning batching, there are third-party software products that simulate batching under UNIX, but to me they fall short. I believe it is not possible under NT, but I'm no expert there. VMS not only offers batching and queuing, it is very good at them, and if you implement them correctly you can have a 100% fault-tolerant batch processing system with job checkpointing and failover. That is why most large banking companies still use VMS. Never fall for the doomsayers who claim that VMS will go away. Currently there is nothing to fill its shoes for its specialized capabilities and its ability to remain in operation.

Good Luck
Jeff Cameron

