You R WASTed

Updated April 28, 2002


Hi JEFF.

I started a backup on a VAX server from my PC, and my LAT session dropped. The process running that job aborted while backing up to tape. When I got back into the VAX, the process was still there but frozen in the RWAST state. I tried to stop the process, but that didn't work. It's still there, and the tape device is in a "Mounted/Dismount" state. Further attempts to stop or clear out the process give the message "No such process", even though the SHOW SYSTEM command still shows the process in the RWAST state.

What do I do?

Yves

Yves ...

Concerning your problem with the tape drive and its process in an RWAST state: I've seen it too many times, and the only solution I know of is to reboot the system. When this happens, I call it "You R WASTed". It comes about because of an unusual relationship between the physical operation of a tape drive and the magnetic tape driver.

Because of the history of tape media, Digital deliberately wrote the tape driver so that users can keep working while certain tape operations are still in progress. In other words, if a user queues an I/O operation to the tape drive with the wait-until-complete option, the user's process does not have to wait for the physical operation to actually finish. A rewind, for example, can take several minutes.

So even if a user performs a QIO/Wait on a tape drive, the driver starts the operation and resumes the process as if the I/O had completed when, in fact, it has not. When this happens, the driver does tell the process, through the I/O status block, that the I/O has not completed. It then queues up an AST (Asynchronous System Trap) to signal the process when the operation actually does complete. While one of these ASTs is pending, the tape driver will not execute any more I/O commands until the previous operation completes.
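To make that gating behaviour concrete, here is a toy model in Python. It is only a sketch of the idea as described above, not VMS driver code: the class `ToyTapeDriver`, its methods, and the status strings are all invented for illustration.

```python
from collections import deque


class ToyTapeDriver:
    """Hypothetical model: returns early, then holds later I/O requests."""

    def __init__(self):
        self.pending = None    # operation the hardware is still working on
        self.queue = deque()   # requests held back until 'pending' finishes
        self.log = []          # operations actually started, in order

    def qio_wait(self, op):
        """Accept a request and return a status without blocking the caller."""
        if self.pending is not None:
            # An earlier operation has not physically completed yet,
            # so this one is queued and not executed.
            self.queue.append(op)
            return "queued"
        self.pending = op
        self.log.append(op)
        # Control returns to the caller although the tape is still moving.
        return "started, not complete"

    def hardware_done(self):
        """The hardware reports completion (for a rewind, BOT was detected)."""
        finished = self.pending
        self.pending = None
        if self.queue:
            # The next held request may now start.
            self.pending = self.queue.popleft()
            self.log.append(self.pending)
        return finished


drv = ToyTapeDriver()
print(drv.qio_wait("rewind"))   # started, not complete
print(drv.qio_wait("unload"))   # queued
print(drv.log)                  # ['rewind']
```

In this model, if `hardware_done()` is never called, every later request stays in the queue forever, which is the situation the rest of this answer describes.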

Because of this design, things can get messed up if the hardware has a problem while one of these ASTs is pending, for example if the tape drive goes offline. When something unexpected happens, your process goes into the RWAST state, i.e. a resource wait in which the process is waiting for an AST to be delivered. Now, while the example given below may not be the exact reason your process went into the RWAST state, it is the most common reason and a good example.

When a tape drive is issued a rewind command, the actual I/O operation does not complete until the hardware detects the BOT (beginning of tape) marker. So for a rewind, the I/O returns with an incomplete status even though the operation was queued with the wait option. The tape driver returns control to the user process and sets up an AST to signal actual completion. This prevents further I/O operations from being executed until the rewind has physically completed. The problem lies in the fact that device drivers do not execute in the context of a process, while ASTs do.

Now, imagine the following course of events.

A. You are running BACKUP, and the end of volume is reached. BACKUP issues a rewind command followed by an unload command, in preparation for continuing on another volume. The tape begins to rewind, and the AST is scheduled to allow more I/Os once the tape reaches BOT. So the unload I/O operation is queued, but not executed, pending completion of the rewind AST.

B. You see the tape rewinding, and press the online/offline button to take the tape offline. This prevents the BOT signal from ever getting to the driver. If you are lucky, when you mount the next tape and press the online button, the drive will detect the BOT and continue on with no problem. However, sometimes the new BOT condition does not satisfy the rewind completion, because it resulted from your loading the new tape rather than from a rewind command.

C. So, your process is still waiting on the AST, which never seems to arrive. Everything is hung up, and you can't do anything. So now you try to stop the process. OpenVMS begins to delete the process by doing what is called a process rundown, which essentially means closing all files and links to devices. It attempts to terminate the connection to the tape drive, but the driver is not accepting any more I/O operations, pending the AST completion.

D. The process has been removed from the active process table, but it still appears in the system master process table in a rundown state, still waiting in RWAST for an AST that never arrives.
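Steps C and D explain the odd symptom of a process that STOP cannot find but SHOW SYSTEM still lists. The Python sketch below models just that bookkeeping. It is a simplification built on the description above, not actual OpenVMS data structures: `ToySystem`, its two dictionaries, and the return strings are hypothetical names chosen for the example.

```python
class ToyProcess:
    """Hypothetical process record."""

    def __init__(self, pid):
        self.pid = pid
        self.outstanding_io = 0   # I/O requests not yet completed
        self.state = "RUNNING"


class ToySystem:
    """Hypothetical model of the two process tables described in step D."""

    def __init__(self):
        self.active = {}   # what a stop request can find
        self.master = {}   # what a system-wide listing would show

    def create(self, pid):
        proc = ToyProcess(pid)
        self.active[pid] = proc
        self.master[pid] = proc
        return proc

    def stop(self, pid):
        proc = self.active.pop(pid, None)
        if proc is None:
            return "No such process"
        if proc.outstanding_io > 0:
            # Rundown cannot finish while I/O is outstanding.
            proc.state = "RWAST"
            return "rundown stalled"
        del self.master[pid]
        return "deleted"

    def io_completed(self, pid):
        proc = self.master[pid]
        proc.outstanding_io -= 1
        if proc.state == "RWAST" and proc.outstanding_io == 0:
            # The awaited completion arrived, so rundown can finish.
            del self.master[pid]


system = ToySystem()
backup = system.create(42)
backup.outstanding_io = 1          # the queued unload that never runs
print(system.stop(42))             # rundown stalled
print(system.stop(42))             # No such process
print(system.master[42].state)     # RWAST
```

In this model only `io_completed()` can clear the entry from the master table, which mirrors why repeated stop attempts do nothing once the process is stuck.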

If you have enough buttons on the tape drive to manually advance the tape forward while offline, then manually begin a rewind while offline and press the online/offline button to bring the tape back online before it reaches BOT, you might be able to satisfy the AST condition, and the process will finish running down. But since most tape devices these days do not have this offline capability, you are SOL (Software Operationally Locked), or, as I like to say, "You R WASTed".

If "You R WASTed", then the only thing that I know you can do to get your tape drive back is to reboot the system.

Now I'm not saying the above situation is exactly what happened to you, but evidently something happened to the tape drive, and/or the process was stopped prior to the rewind operation completing normally.

The moral of this story: when a magtape unit is rewinding due to a software rewind I/O operation, always let the rewind complete before you mess with any magtape buttons or controls. Also, trying to stop a process in the RWAST state only makes things worse. Try powering the tape drive off and resetting it. But if that doesn't help ...

You R WASTed

and you must reboot.

Sorry for the bad news.

Jeff

 

BUT WAIT!
What would you pay to get out of
that pesky RWAST state
without the cost of rebooting?
Before you resign yourself to death at the hands of your users, or worse, your boss, because you have to reboot your production system just to get a tape drive back: all hope is not lost. Visit Phil Ottewell's Web Page on the RWAST state. You may yet redeem yourself and elevate yourself to the level of "The Man Behind The Curtain" Wizard of OpenVMS!

