Hi! I am an independent contractor working for a company in the UK and, since I have had a smattering of VMS experience, I have been asked to decommission a VAX system. My experience does not include clustering, however. The node is a member of a cluster and I know I have to remove it from the cluster using @CLUSTER_CONFIG, but some of the documentation (hah!) that the previous system manager left indicates that they have had problems in the past when the master was rebooted after one or more of the members. However, the specific node he mentions in the doc no longer exists. How do I find out which node is the master? Your page on configuring a single master disk didn't help me, as their SYLOGICALS.COM is all commented out. I expected SHOW CLUSTER to tell me, but according to that, all nodes are members. Is that possible?

Also, the Compaq docs suggest editing MODPARAMS.DAT and adjusting the EXPECTED_VOTES parameter. They helpfully suggest incrementing the value by 1 when adding a node, so I presume I subtract 1 when deleting one? They also confusingly state that VAXVMSSYS.PAR should be updated, whereas elsewhere they state that CLUSTER_CONFIG takes care of that.

According to my observations, the nodes boot independently and whichever completes first takes control of the cluster, as it were. I came across an old document here which states that one particular node needs to be booted before the others, "due to licensing problems". Asking around, the consensus is that this node holds the license database. (Their previous problems have related to the C compilers in use complaining that they weren't licensed at all, or more usually running out of licenses after just a couple of users were logged in.)
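On the EXPECTED_VOTES question: in an OpenVMS cluster, EXPECTED_VOTES should equal the total of the VOTES parameter over all voting members (plus any quorum disk votes), and quorum is computed as (EXPECTED_VOTES + 2) / 2, rounded down. So when a node is removed for good, you subtract that node's VOTES value rather than automatically subtracting 1; satellites are commonly configured with VOTES = 0, in which case EXPECTED_VOTES does not change at all. Changes made in MODPARAMS.DAT reach VAXVMSSYS.PAR when AUTOGEN is run. The minimal Python sketch below only illustrates that arithmetic; the node names and vote counts are hypothetical, not taken from this cluster.

```python
# Minimal sketch of the OpenVMS quorum arithmetic: quorum = (EXPECTED_VOTES + 2) // 2.
# Node names and VOTES values are hypothetical examples, not a real configuration.

def quorum(expected_votes: int) -> int:
    """Votes the cluster needs to keep running (integer division rounds down)."""
    return (expected_votes + 2) // 2


def expected_votes_after_removal(expected_votes: int, removed_node_votes: int) -> int:
    """New EXPECTED_VOTES once a node holding `removed_node_votes` leaves permanently."""
    if removed_node_votes < 0 or removed_node_votes > expected_votes:
        raise ValueError("removed node's VOTES must be between 0 and EXPECTED_VOTES")
    return expected_votes - removed_node_votes


# Hypothetical cluster: three voting servers (VOTES=1 each) and one satellite (VOTES=0).
votes = {"SERVR1": 1, "SERVR2": 1, "SERVR3": 1, "SATL1": 0}
ev = sum(votes.values())
print("EXPECTED_VOTES:", ev, "quorum:", quorum(ev))

# Removing the satellite leaves EXPECTED_VOTES unchanged;
# removing a voting server lowers it by that server's VOTES (here 1).
for node in ("SATL1", "SERVR3"):
    new_ev = expected_votes_after_removal(ev, votes[node])
    print(f"without {node}: EXPECTED_VOTES={new_ev}, quorum={quorum(new_ev)}")
```

Note that with two voting nodes left, quorum is still 2, so both must be running for the cluster to continue; that is worth knowing before taking a voting member away.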
Anyway, I noticed that the nodes would say something like "Request to first_node_to_boot to join cluster from booting_node". On the two occasions I've had the cluster boot (once not of my doing, but seemingly after a network broadcast storm), if the first node to boot wasn't this "licensing" node, the other machines would fail in various ways, mostly related to DECnet: a SET HOST to a node would fail with a "node unreachable" error. It seems to me that what we have here is a pseudo-cluster, used to serve the different machines' disks collectively rather than for processing power. Is that feasible?

Last question (a new one): one of the reasons this company clings to VMS is its batch/queuing facilities. Any recommendations or pointers for reproducing these capabilities on NT or Unix? I had a VERY quick look at VX/JSP and VX/DCL (both for the NT platform) from Sector 7, which look promising.

Any help you could send my way would be greatly appreciated.

Best regards,

Ian,

There is no such thing as a master node.
No one node ever takes control of the cluster. I think you are looking for the "boot server node", that is, the node from which a satellite node boots. There are server nodes and satellite nodes, and a cluster can have several server nodes and several boot server nodes. What you need to do is find the boot server for the satellite node you want to remove.

When booting nodes in a cluster, boot your boot servers first, then your other servers, then your satellites. Yes, you could experience problems if you boot them in the wrong order.

Just about everything is distributed among all nodes in a cluster. However, you may see a node name in the message "Request to first_node_to_boot to join cluster from booting_node", which would imply that first_node_to_boot is the master. It only means that this node is the one currently doing most of the work for the distributed connection manager.
Depending on system load, this role can move dynamically from one node to another.

There are many ways to find out which node is the boot server for another node. One way is to log on to your satellite node and examine the file SYS$SYSDEVICE:[SYS0.SYSEXE]MODPARAMS.DAT. In most cases the [SYS0.] root of the system disk is the root for the boot node, but not always. The sure-fire way is to boot the satellite node in question and wait for the message that says "SYSTEM LOADED BY NODE nodename". That node is your boot node.

Concerning your licensing problem: the trouble you have when you don't boot the licensing node first is due to errors made by the system manager when the system was generated. Basically, it means they did not read the documentation called "Guide to VMS Clusters". This problem can be fixed, and I'll be glad to write out the method if you wish, but for the time being it's a good idea to boot the "licensing node" first. The fact that you get the error "node unreachable" may mean that there are bigger licensing errors involved as well.

Go to my web page at http://www.jcameron.com/vms/ and select the sub-page called "DCL Tricks". There, download the file called SYSINFO.COM and place it in your login directory. For explanation purposes I'm going to assume your login default device and directory are DISK$USER1:[NORTH]. After you have placed the file in DISK$USER1:[NORTH],
then enter these commands:

$ MCR SYSMAN
SYSMAN> SET ENV/CLUSTER/USER=SYSTEM
_Password: <system password>
SYSMAN> DO @DISK$USER1:[NORTH]SYSINFO.COM

This will print out a bunch of useful information about each node, including which disk it boots from and which node is its boot node. This should help you a lot.

Concerning batching, there are third-party software products that simulate batching under UNIX, but to me they fall short. I'm not sure, but I believe it is not possible under NT; then again, I'm no expert there. In any case, VMS not only offers batching and queuing, it is very good at them, and if you implement them correctly you can have a 100% fault-tolerant batch processing system with job checkpointing and failover. That is why most large banking companies still use VMS. Never fall for the doomsayers who claim that VMS will go away. Currently there is nothing to fill its shoes, given its specialized capabilities and its ability to remain in operation.

Good Luck
Jeff Cameron