Replacing Disks

The procedure for replacing a faulty disk mechanism depends on the type of disk configuration you are using. Separate descriptions are provided for replacing an array mechanism and a disk in a high availability enclosure.

For more information, see the section Replacing a Bad Disk in the Logical Volume Management volume of the HP-UX System Administrator’s Guide, posted at http://docs.hp.com.

Replacing a Faulty Array Mechanism

If you are using an HA disk array configured in RAID 1 or RAID 5, refer to the array’s documentation for instructions on how to replace a faulty mechanism. After the replacement, the array itself automatically rebuilds the missing data on the new disk; no LVM or VxVM activity is needed. This process is known as hot swapping the disk.

Replacing a Faulty Mechanism in an HA Enclosure

If you are using software mirroring with MirrorDisk/UX and the mirrored disks are mounted in a high availability disk enclosure, you can use the following steps to hot plug a disk mechanism:

  1. Identify the physical volume name of the failed disk and the name of the volume group in which it was configured. In the following example, the volume group name is shown as /dev/vg_sg01 and the physical volume name is shown as /dev/dsk/c2t3d0. Substitute the volume group and physical volume names that are correct for your system.

    NOTE: This example assumes you are using legacy DSF naming. Under agile addressing, the physical volume would have a name such as /dev/disk/disk1. See “About Device File Names (Device Special Files)”.

    If you need to replace a disk under the 11i v3 agile addressing scheme, you may be able to reduce downtime by using the io_redirect_dsf(1M) command to reassign the existing DSF to the new device. See the section Replacing a Bad Disk in the Logical Volume Management volume of the HP-UX System Administrator’s Guide, posted at http://docs.hp.com.

  2. Identify the names of any logical volumes that have extents defined on the failed physical volume; one way to do this is shown in the example following this procedure.

  3. On the node on which the volume group is currently activated, use the following command for each logical volume that has extents on the failed physical volume:

    lvreduce -m 0 /dev/vg_sg01/lvolname /dev/dsk/c2t3d0 
  4. At this point, remove the failed disk and insert a new one. The new disk will have the same HP-UX device name as the old one.

  5. On the node from which you issued the lvreduce command, issue the following command to restore the volume group configuration data to the newly inserted disk:

    vgcfgrestore -n /dev/vg_sg01 /dev/dsk/c2t3d0 
  6. Issue the following command for each logical volume that you reduced in step 3, to extend the logical volume back onto the newly inserted disk:

    lvextend -m 1 /dev/vg_sg01/lvolname /dev/dsk/c2t3d0 
  7. Finally, use the lvsync command for each logical volume that has extents on the failed physical volume. This synchronizes the extents of the new disk with the extents of the other mirror.

    lvsync /dev/vg_sg01/lvolname  
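
For step 2 of this procedure, one way to list the logical volumes that have extents on the failed physical volume is to display the disk’s extent distribution with pvdisplay. The command below is an illustration only, using the same placeholder device name as the procedure above; the “Distribution of physical volume” section of its output names each logical volume that uses the disk:

    pvdisplay -v /dev/dsk/c2t3d0 | more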

Replacing a Lock Disk

You can replace an unusable lock disk while the cluster is running, provided you do not change the device special file (DSF) name.

CAUTION: Before you start, make sure that all nodes have logged a message in syslog saying that the lock disk is corrupt or unusable.
IMPORTANT: If you need to replace a disk under the HP-UX 11i v3 agile addressing scheme (see “About Device File Names (Device Special Files)”), you may need to use the io_redirect_dsf(1M) command to reassign the existing DSF to the new device, depending on whether the operation changes the WWID of the device. See the section Replacing a Bad Disk in the Logical Volume Management volume of the HP-UX System Administrator’s Guide, posted at http://docs.hp.com -> 11i v3 -> System Administration. See also the section on io_redirect_dsf in the white paper The Next Generation Mass Storage Stack under Network and Systems Management -> Storage Area Management on docs.hp.com.
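
For illustration only, a redirection of this kind generally points an existing persistent DSF at the DSF created for the new device. The names /dev/disk/disk14 and /dev/disk/disk28 below are placeholders, not names from your system; confirm the exact options and behavior in the io_redirect_dsf(1M) manpage before running the command:

    io_redirect_dsf -d /dev/disk/disk14 -n /dev/disk/disk28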

If, for any reason, you are not able to use the existing DSF for the new device, you must halt the cluster and change the name of the DSF in the cluster configuration file; see “Updating the Cluster Lock Disk Configuration Offline”.

Replace a failed LVM lock disk in the same way as you replace a data disk. If you are using a dedicated lock disk (one with no user data on it), then you need to use only one LVM command, for example:

vgcfgrestore -n /dev/vg_lock /dev/dsk/c2t3d0

Serviceguard checks the lock disk every 75 seconds. After using the vgcfgrestore command, monitor the syslog file on an active cluster node; within 75 seconds you should see a message showing that the lock disk is healthy again.
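
For example, on an active node you can watch for that message as the next check occurs (the path below assumes the default HP-UX syslog location):

    tail -f /var/adm/syslog/syslog.log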

NOTE: If you restore or recreate the volume group for the lock disk and you need to re-create the cluster lock (for example if no vgcfgbackup is available), you can run cmdisklock to re-create the lock. See the cmdisklock (1m) manpage for more information.

Replacing a Lock LUN

You can replace an unusable lock LUN while the cluster is running, provided you do not change the device special file (DSF) name.

CAUTION: Before you start, make sure that all nodes have logged a message such as the following in syslog:

WARNING: Cluster lock LUN /dev/dsk/c0t1d1 is corrupt: bad label. Until this situation is corrected, a single failure could cause all nodes in the cluster to crash.

IMPORTANT: If you need to replace a LUN under the HP-UX 11i v3 agile addressing scheme (see “About Device File Names (Device Special Files)”), you may need to use the io_redirect_dsf(1M) command to reassign the existing DSF to the new device, depending on whether the operation changes the WWID of the LUN; see the section on io_redirect_dsf in the white paper The Next Generation Mass Storage Stack under Network and Systems Management -> Storage Area Management on docs.hp.com.

If, for any reason, you are not able to use the existing DSF for the new device, you must halt the cluster and change the name of the DSF in the cluster configuration file; see “Updating the Cluster Lock Disk Configuration Offline”.

Once all nodes have logged this message, use a command such as the following to specify the new cluster lock LUN:

cmdisklock reset /dev/dsk/c0t1d1

cmdisklock checks that the specified device is not in use by LVM, VxVM, ASM, or the file system, and will fail if the device has a label marking it as in use by any of those subsystems. cmdisklock -f overrides this check.
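
For example, if you have confirmed that the device is not in use by any subsystem on any node (see the CAUTION below), the override takes a form such as the following; the placement of the -f option shown here is an assumption, so verify the exact syntax in the cmdisklock (1m) manpage:

    cmdisklock -f reset /dev/dsk/c0t1d1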

CAUTION: You are responsible for determining that the device is not being used by any subsystem on any node connected to the device before using cmdisklock -f. If you use cmdisklock -f without taking this precaution, you could lose data.
NOTE: cmdisklock is needed only when you are repairing or replacing a lock LUN or lock disk; see the cmdisklock (1m) manpage for more information.

Serviceguard checks the lock LUN every 75 seconds. After using the cmdisklock command, monitor the syslog file on an active cluster node; within 75 seconds you should see a message showing that the lock LUN is healthy again.

On-line Hardware Maintenance with In-line SCSI Terminator

In some shared SCSI bus configurations, on-line SCSI disk controller hardware repairs can be made if HP in-line terminator (ILT) cables are used. In-line terminator cables are supported with most SCSI-2 Fast-Wide configurations.

In-line terminator cables are supported with Ultra2 SCSI host bus adapters only when used with the SC10 disk enclosure. This is because the SC10 operates at slower SCSI bus speeds, which are safe for the use of ILT cables. In-line terminator cables are not supported for use in any Ultra160 or Ultra3 SCSI configuration, since the higher SCSI bus speeds can cause silent data corruption when the ILT cables are used.
