Shadow Set Merge Operation on the System Disk


We've got several DSSI two-node clusters. All the hardware is Compaq/DEC. We use VMS 6.2 with host-based shadowing (phase II). Yes, we use a quorum disk. All other volumes are two-member shadow sets.

Now the actual problem...

Sometimes, but not always, we encounter disk merges after a reboot of one or both nodes. When this happens, of course, the perceived response time for our users slows to a snail's pace. If the system disk does a merge, the system is almost unusable. (To deal with this, we sometimes opt to dismount one member of a shadow set and then remount it, forcing a copy instead of a merge.) This works, but it is not something we enjoy because of the RISK of a disk failure during that time.

Other notes: When shutting down we use shutdown options "remove,reboot". As I wrote, sometimes the system will reboot without any merges; other times it's a mess.

Now of course, if a system doesn't shut down cleanly and an operator hits the HALT button or does something impatient like power cycling, then I can understand the merge operation, but not with a normal system shutdown/reboot.

Got any ideas of how to eliminate the disk merges?

Even though I have never used host-based shadowing, I'm familiar enough with the Files-11 ODS-2 file system and with what is needed to perform shadowing. I used to run a site with controller-based (HSC) shadowing. I'm 95% certain I know what the problem is, but for the benefit of the web page a background explanation is needed.

In any kind of shadowing system, the idea is to have a block-by-block copy of a disk on another disk; each time one shadow set member is changed, the same change is made on the other.

Of course, the first time you add a second disk to a shadow set, the master disk has to be completely copied to the new member. This is a pretty straightforward thing to do; it just takes time. This is the shadow state called "copy", as you know. In the copy state the operations are pretty simple too: the source disk and the new shadow set member are addressed sequentially, which means that the copy operation has only a moderate impact on system performance (the system is also accessing the master system drive). This explains what you see when you mount and force a copy operation.
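For readers following along, the dismount-and-remount that forces a copy might look roughly like the sketch below. The virtual unit DSA1:, the member $1$DIA1:, and the volume label DATA1 are made-up names for illustration; substitute your own, and remember that the set runs on a single member until the copy finishes.

```dcl
$! Hypothetical names: DSA1: = virtual unit, $1$DIA1: = one member, DATA1 = volume label
$ DISMOUNT $1$DIA1:                             ! remove one member from the shadow set
$ MOUNT/SYSTEM DSA1: /SHADOW=($1$DIA1:) DATA1   ! add it back; shadowing starts a copy to it
```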

However, if a disk that was once a member is added again, it would be wasteful to copy the whole disk when most of the data is already on it. Hence the "merge" state, which brings in the returning shadow set member and updates only those blocks that have changed since the disk was last a member of the shadow set. To do the merge operation, the shadow software has to look at the file system structure, identify where new files have been created and existing files deleted or modified, and then update the appropriate blocks on the returning member. This operation impacts the source, or master, disk tremendously because the I/O operations are in random order rather than sequential.

Now here is the crux of your problem (or so I believe):

This operation can be extremely I/O-intensive if your disk is even slightly fragmented. If you have high fragmentation, it can almost bring your normal disk requests to a halt. So if the disk in question is your system disk, bingo, your system slows down to the pace of a Pentium in August in New Mexico.
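If you want to confirm that a slowdown really is a merge and not a copy, something along these lines should tell you. This is only a sketch: DSA0: and $1$DIA0: are assumed names for the system disk virtual unit and one of its members, and I am assuming the shadowing item codes of F$GETDVI are available on your version of VMS.

```dcl
$! Assumed names: DSA0: = system disk virtual unit, $1$DIA0: = one of its members
$ SHOW DEVICE/FULL DSA0:       ! the display notes any merge or copy in progress
$ WRITE SYS$OUTPUT F$GETDVI("$1$DIA0:","SHDW_MERGE_COPYING")     ! TRUE while the member is merging
$ WRITE SYS$OUTPUT F$GETDVI("$1$DIA0:","SHDW_CATCHUP_COPYING")   ! TRUE while it is the target of a full copy
```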

So my bet is that your system disk is fragmented. Even if you have defrag software like Diskeeper running on your system disk, you have to remember that this software cannot move files (the defragmentation operation) while they are open, and a running system disk has hundreds of files open when the system is up. Some files are always open and appended to each time the system is booted, like the accounting file, error log, or security audit log. Other files are slowly appended to while the system is up, like OPERATOR.LOG or many other log files. And if you have gone through many installs and upgrades since you first generated the disk, or since the last time you manually defragmented it, your system disk may very well be heavily fragmented.
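A quick way to check both points for yourself is sketched below. The first command lists the files currently open on the system disk; the second dumps only the header of one file so you can count the retrieval pointers in its map area (many small pointers means many fragments). OPERATOR.LOG is just a convenient example; point it at any file you suspect.

```dcl
$ SHOW DEVICE/FILES SYS$SYSDEVICE:                       ! files open right now, which a defragmenter cannot move
$ DUMP/HEADER/BLOCKS=(COUNT:0) SYS$MANAGER:OPERATOR.LOG  ! header only; see the retrieval pointers in the map area
```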

Well, you are probably asking ... "If my defrag software doesn't defrag my system disk very well, what do I do?" The answer involves work, but it will pay off in the long run. And if you don't do this operation at least once a year and once after each OS upgrade, you should seriously consider it.

I call it "Giving your system disk a DNC". No! Not like the gynecologist does! DNC means "Defrag-N-Clean". First clean up your system disk, then defrag it by performing a standalone image backup and an immediate standalone image restore. It's good to have good standalone backups of your system disk on tape, but if you are in a hurry, doing this standalone backup/restore from disk to disk is OK. It is, however, very important that you use the standalone backup utility, and not back up while the system is running.
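As a sketch of the backup/restore pair, entered at the standalone backup prompt: the device names DIA0: (system disk) and MIA0: (tape drive) and the save set name SYSDISK.BCK are assumptions, so use whatever your configuration calls them. The lines starting with $! are my annotations, not something to type.

```dcl
$! Step 1: image backup of the system disk to tape (assumed devices DIA0: and MIA0:)
$ BACKUP/IMAGE/VERIFY DIA0: MIA0:SYSDISK.BCK/REWIND
$! Step 2: image restore from the same save set; this reinitializes DIA0: and writes the files back contiguously
$ BACKUP/IMAGE/VERIFY MIA0:SYSDISK.BCK/REWIND DIA0:
```

For the disk-to-disk variant, the same idea applies with a scratch disk in place of the tape. Either way, make sure the backup verified cleanly before you start the restore, because the restore overwrites the original disk.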

You will notice that I said to clean your system disk before performing the standalone backup/restore operation. Here are ways to clean your system disk. Of course, for each of the operations below you will have to decide whether you need the files that will be deleted, and back them up or copy them elsewhere before you do any of the cleanup operations.

 

Cleanup chores :

  • Purge your entire system disk.
    $PURGE SYS$SYSDEVICE:[*...]*.*
  • Delete all unnecessary log files (including OPERATOR.LOG).
    $DELETE SYS$SYSDEVICE:[*...]*.LOG;*
  • Make a new accounting file and delete the old. (On each node of your cluster if necessary)
    $SET ACCOUNT/NEW_FILE
    $PURGE SYS$MANAGER:ACCOUNTNG.DAT
  • Make a new security audit file and delete the old.
    $SET AUDIT/SERVER=NEW_LOG
    $DELETE (wherever your audit file is)
  • Make a new Error log file.
    $DELETE SYS$ERRORLOG:ERRLOG.*;*
  • Run my TRIM_FILE.COM procedure on the entire disk.
    $@TRIM_FILE.COM SYS$SYSDEVICE:[*...]*.*;*
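Before running any of the chores above, it may be worth previewing what they will remove. The commands below are only a sketch of how I would look first; they list files and sizes and delete nothing.

```dcl
$ SHOW DEVICE SYS$SYSDEVICE:                                  ! note the free block count before cleanup
$ DIRECTORY/SIZE=ALL/GRAND_TOTAL SYS$SYSDEVICE:[*...]*.LOG;*  ! how much space the log files occupy
$ DIRECTORY/SIZE=ALL/DATE SYS$MANAGER:ACCOUNTNG.DAT;*         ! versions of the accounting file on hand
```

Run SHOW DEVICE again afterward to see how many blocks the cleanup recovered.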

Now you can do the standalone backup/restore. When you're done and rebooting, force a copy operation the first time you form the shadow set; on subsequent reboots, your merges should go rather rapidly and with minimal impact on system performance.

I hope this helps.

Jeff Cameron

 

