We've got several DSSI two-node clusters. All the hardware is Compaq/DEC. We use VMS 6.2 with host-based shadowing (phase II). Yes, we use a quorum disk. All other volumes are two-member shadow sets.

Now the actual problem... Sometimes, but not always, we encounter disk merges after a reboot of one or both nodes. When this happens, of course, the perceived response time for our users slows to a snail's pace. If the system disk does a merge, the system is almost unusable. (To deal with this, we sometimes opt to dismount one member of a shadow set and then remount it, forcing a copy instead of a merge.) This works, but it is not something we enjoy because of the RISK of a disk failure during that time.

Other notes: When shutting down we use the shutdown options "remove,reboot". Like I wrote, sometimes the system will reboot without any merges; other times it's a mess. Now of course if a system doesn't shut down cleanly and an operator hits the HALT button or does something impatient like power cycling, then I can understand the merge operation, but not with a normal system shutdown/reboot.

Got any ideas of how to eliminate the disk merges?
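A minimal sketch of the dismount-and-remount workaround mentioned above, for readers who have not done it. The virtual unit DSA1:, the member names $1$DIA1: and $1$DIA2:, and the volume label DATA1 are made-up placeholders, and the sketch assumes a release like the VMS 6.2 described here, where a member that rejoins a shadow set gets a full copy.

```dcl
$ ! Hypothetical names: DSA1: (virtual unit), $1$DIA1: and $1$DIA2: (members), DATA1 (label)
$ SHOW DEVICE DSA1:                            ! check the state of the shadow set and its members
$ DISMOUNT $1$DIA2:                            ! remove one member; the set keeps running on $1$DIA1:
$ MOUNT/SYSTEM DSA1: /SHADOW=($1$DIA2:) DATA1  ! add it back, which should start a full copy onto it
$ SHOW DEVICE DSA1:                            ! repeat until the copy has finished
```

While the copy runs, the shadow set has only one fully valid member, which is exactly the risk described above.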
Even though I have never used host-based shadowing, I'm familiar enough with the Files-11 ODS-2 file system and what is needed to perform shadowing. I used to run a site with controller (HSC) based shadowing. I'm 95% certain I know what the problem is, but for the benefit of the web page a background explanation is needed.

In any kind of shadowing system, the idea
is to have a block-by-block copy of a disk on another disk,
and each time one shadow set member is changed, it's changed
on the other. Of course, the first time you add a second disk to a shadow set, the master disk has to be completely copied to the new member. This is a pretty straightforward thing to do; it just takes time. This is the shadow state called "copy", as you know. The operations in the copy state are pretty simple too: the source disk and the new shadow set member are addressed sequentially, which means that the copy operation has only a moderate impact on system performance (the system is also accessing the master system drive). This explains the situation when you mount and force a copy operation.

However, if a shadow set member that once
was a member is added again, it would be stupid to copy the whole disk again when most of the data is already on the new shadow set member. Hence the "merge" state, which brings in a new shadow set member and updates only those blocks that have changed since the disk was last a member of the shadow set. To do the merge operation, the shadow software has to look at the file system structure, identify where new files have been created and existing files deleted or modified, and then update the appropriate blocks on the new shadow set member. This operation impacts the source, or master, disk tremendously because the I/O operations are in random order rather than sequential.

Now here is the crux of your problem (or so I believe): this operation can be extremely I/O intensive if your disk is even slightly fragmented. If you have high fragmentation, it can almost bring your normal disk requests to a halt. So if the disk in question is your system disk, bingo, your system slows down to the pace of a Pentium in August in New Mexico.

So my bet is that your system disk is
fragmented. Even if you have defrag software like Diskeeper running on your system disk, you have to remember that this software cannot move files (the defragmentation operation) while they are open, and a running system disk has hundreds of files open when the system is up. Some files are always open and appended to each time the system is booted, like the accounting file, error log, or security audit log. Other files are slowly appended to while the system is up, like OPERATOR.LOG or many other log files. And if you have gone through many installs and upgrades since you first generated the disk or since the last time you manually defragmented it, your system disk may very well be heavily fragmented.

Well, you are probably asking... "If my
defrag software doesn't defrag my system disk very well, what do I do?" The answer involves work, but will pay off in the long run. And if you don't do this operation at least once a year and once after each OS upgrade, you should seriously consider it. I call it "Giving your system disk a DNC". No! Not like the gynecologist does! DNC means "Defrag-N-Clean". First clean up your system disk, then defrag it by performing a standalone image backup and an immediate standalone image restore. It's good to have good standalone backups of your system disk on tape, but if you are in a hurry, doing this standalone backup/restore from disk to disk is OK. It is, however, very important that you use the standalone backup utility and not back up while the system is running.

You will notice that I did say to clean your system disk before performing the standalone backup/restore operation. Here are ways to clean your system disk. Of course, for each of the operations below you will have to decide if you need the files that will be deleted, and back them up or copy them elsewhere before you do any of the cleanup
operations.

Cleanup chores:

  $ PURGE SYS$SYSDEVICE:[*...]*.*
  $ DELETE SYS$SYSDEVICE:[*...]*.LOG;*
  $ SET ACCOUNTING/NEW_FILE
  $ DELETE SYS$MANAGER:ACCOUNTNG.DAT;n   (n = version number of the old file; DELETE needs an explicit version)
  $ SET AUDIT/SERVER=NEW_LOG
  $ DELETE (wherever your audit file is)
  $ DELETE SYS$ERRORLOG:ERRLOG.*;*
  $ @TRIM_FILE.COM SYS$SYSDEVICE:[*...]*.*;*

Now you can do the standalone backup/restore. When you're done and rebooting, force a copy operation the first time you form the shadow set; on subsequent reboots, your merges should go rather rapidly and with minimal impact on system performance.

I hope this helps.

Jeff Cameron
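
For the standalone backup/restore step described above, the commands might look roughly like the sketch below. The device names DIA0: (system disk), DIA1: (scratch disk) and MUA0: (tape drive) and the save set name SYSDISK.BCK are made-up placeholders, so substitute your own. The lines starting with "!" are annotations for the reader and may not be accepted at the standalone BACKUP prompt, so type only the BACKUP lines.

```dcl
$ ! Entered at the standalone BACKUP prompt, not on the running system.
$ ! Hypothetical devices: DIA0: (system disk), DIA1: (scratch disk), MUA0: (tape).
$ !
$ ! Disk-to-disk: image-copy the system disk to the scratch disk, then copy it back.
$ BACKUP/IMAGE/VERIFY DIA0: DIA1:
$ BACKUP/IMAGE/VERIFY DIA1: DIA0:
$ !
$ ! Tape variant: save the disk to a save set, then restore from that save set.
$ BACKUP/IMAGE/VERIFY DIA0: MUA0:SYSDISK.BCK/SAVE_SET/REWIND
$ BACKUP/IMAGE/VERIFY MUA0:SYSDISK.BCK/SAVE_SET/REWIND DIA0:
```

The image restore initializes the target disk and writes the files back in order, and that rewrite is what removes the fragmentation. Use one variant or the other, and check that the first command completed without errors before running the restore, since the restore overwrites the system disk.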