IBM AIX HACMP SG24-5131-00 Hardware User Manual

Certification study guide

IBM Certification Study Guide
AIX HACMP
David Thiessen, Achim Rehor, Reinhard Zettler
International Technical Support Organization
http://www.redbooks.ibm.com
SG24-5131-00

Summary of Contents for IBM AIX HACMP SG24-5131-00

  • Page 1 IBM Certification Study Guide AIX HACMP David Thiessen, Achim Rehor, Reinhard Zettler International Technical Support Organization http://www.redbooks.ibm.com SG24-5131-00...
  • Page 3 SG24-5131-00 International Technical Support Organization IBM Certification Study Guide AIX HACMP May 1999...
  • Page 4 11400 Burnet Road Austin, Texas 78758-3493 When you send information to IBM, you grant IBM a non-exclusive right to use or distribute the information in any way it believes appropriate without incurring any obligation to you. © Copyright International Business Machines Corporation 1999. All rights reserved.
  • Page 5: Table Of Contents

    Chapter 1. Certification Overview (page 1); 1.1 IBM Certified Specialist - AIX HACMP (page 1); 1.2 Certification Exam Objectives ...
  • Page 6 5.1.1 Predefined Cluster Events (page 117); 5.1.2 Pre- and Post-Event Processing (page 122) ...
  • Page 7 5.1.3 Event Notification (page 122); 5.1.4 Event Recovery and Retry (page 122); 5.1.5 Notes on Customizing Event Processing ...
  • Page 8 9.2 Kerberos Security (page 187); 9.2.1 Configuring Kerberos Security with HACMP Version 4.3 (page 189) ...
  • Page 9 IBM Redbook Order Form ...
  • Page 11: Figures

    19. RVSD Function (page 193); 20. RVSD Subsystem and HA Infrastructure (page 194). © Copyright IBM Corp. 1999
  • Page 13: Tables

    21. HACMP Log Files (page 143)
  • Page 15: Preface

    Preface The AIX and RS/6000 Certifications offered through the Professional Certification Program from IBM are designed to validate the skills required of technical professionals who work in the powerful and often complex environments of AIX and RS/6000. A complete set of professional certifications is available.
  • Page 16: The Team That Wrote This Redbook

    HACMP skills, this book is for you. For additional information about certification and instructions on how to register for an exam, call IBM at 1-800-426-8322 or visit our Web site at http://www.ibm.com/certify. The Team That Wrote This Redbook: This redbook was produced by a team of specialists from around the world working at the International Technical Support Organization Austin Center.
  • Page 17: Comments Welcome

    University of Frankfurt in Germany. This is his first redbook. Reinhard Zettler is an AIX Software Engineer in Munich, Germany. He has two years of experience working with AIX and HACMP. He has worked at IBM for two years. He holds a degree in Telecommunication Technology. This is his first redbook.
  • Page 19: Chapter 1. Certification Overview

    Highly Available Clusters. Certification Requirement (two Tests): To attain the IBM Certified Specialist - AIX HACMP certification, candidates must first obtain the AIX System Administration or the AIX System Support certification. In order to obtain one of these prerequisite certifications, the candidate must pass one of the following two exams: Test 181: AIX V4.3 System Administration...
  • Page 20: Certification Exam Objectives

    • Configure an IP Address Takeover (IPAT). • Configure non-IP heartbeat paths. • Configure a network adapter. • Customize/tailor AIX. • Set up a shared disk (SSA). • Set up a shared disk (SCSI). • Verify a cluster configuration.
  • Page 21 • Create an application server. • Set up Event Notification. • Set up event notification and pre/post event scripts. • Set up error notification. • Post Configuration Activities. • Configure a client notification and ARP update. • Implement a test plan. •...
  • Page 22: Certification Education Courses

    Table 1. AIX Version 4 HACMP Installation and Implementation. Course Number: Q1054 (USA), AU54 (Worldwide). Course Duration: Five days. Course Abstract: This course provides a detailed understanding of the High Availability Clustered Multi-Processing for AIX.
  • Page 23: Aix Version 4 Hacmp System Administration

    The following table outlines information about the next course. Table 2. AIX Version 4 HACMP System Administration. Course Number: Q1150 (USA), AU50 (Worldwide). Course Duration: Five days. Course Abstract: This course teaches the student the skills required to administer an HACMP cluster on an ongoing basis after it is installed.
  • Page 25: Chapter 2. Cluster Planning

    Almost any model of the RISC System/6000 POWERserver family can be included in an HACMP environment and new models continue to be added to the list. The following table gives you an overview of the currently supported ...
  • Page 26: Cluster Node Considerations

    In this section, we will offer some guidelines to assist you in choosing and sizing appropriate machine models to build your clusters.
  • Page 27 Much of the decision centers around the following areas: • Processor capacity • Application requirements • Anticipated growth requirements • I/O slot requirements. These considerations are not new; they are just as important when choosing a processor for a single-system environment. However, when designing a cluster, you must carefully consider the requirements of the cluster as a total entity.
  • Page 28 (Flattened table of SP node slot counts; the mapping of values to rows is not recoverable.) Nodes listed: 9076 high node, 9076 thin node (silver), 9076 wide node (silver). Columns: Number of Slots, Integrated Ethernet Port. Slot values shown include 4 x MCA and 4 x PCI. The switch adapter is onboard and does not need an extra slot.
  • Page 29: Cluster Networks

    2.2 Cluster Networks HACMP differentiates between two major types of networks: TCP/IP networks and non-TCP/IP networks. HACMP utilizes both of them for exchanging heartbeats. HACMP uses these heartbeats to diagnose failures in the cluster. Non-TCP/IP networks are used to distinguish an actual hardware failure from the failure of the TCP/IP software.
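    The reasoning behind the two network types can be illustrated with a minimal Python sketch. It is not HACMP code: the function `classify_failure` and its two boolean inputs are hypothetical, and the real Cluster Manager logic is more involved. It encodes only the rule stated above. A node that is silent on every path is treated as failed. A node that is still heard on a non-TCP/IP path points to a TCP/IP failure rather than a hardware failure.

```python
def classify_failure(tcpip_heartbeats_ok: bool, serial_heartbeats_ok: bool) -> str:
    """Hypothetical classification of a peer node from its heartbeat status.

    tcpip_heartbeats_ok:  heartbeats still arrive over at least one TCP/IP network
    serial_heartbeats_ok: heartbeats still arrive over a non-TCP/IP path
                          (RS232, target-mode SCSI, or target-mode SSA)
    """
    if tcpip_heartbeats_ok:
        return "node up"
    if serial_heartbeats_ok:
        # The node is alive but unreachable over TCP/IP: network or TCP/IP software problem.
        return "tcpip failure"
    # Silent on every path: treat it as a node (hardware) failure.
    return "node failure"


print(classify_failure(False, True))   # tcpip failure
print(classify_failure(False, False))  # node failure
```

    Without a non-TCP/IP path, the first case would be indistinguishable from the second. This is why the sketch needs both inputs.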
  • Page 30 (et*), where * reflects the interface number. HACMP for AIX also has been tested with Token-Ring and Fiber Distributed Data Interchange (FDDI) networks, with IBM Serial Optical Channel Converter (SOCC), Serial Line Internet Protocol (SLIP), and Asynchronous Transfer Mode (ATM) point-to-point connections.
  • Page 31 Network types also differ in the maximum distance they allow between adapters and in the maximum number of adapters allowed on a physical network. • Ethernet currently supports 10 and 100 Mbps, and supports hardware address swapping. Alternate hardware addresses should be in the form xxxxxxxxxxyy, where xxxxxxxxxx...
  • Page 32: Non-Tcpip Networks

    Currently HACMP supports the following types of networks for non-TCP/IP heartbeat exchange between cluster nodes: • Serial (RS232) • Target-mode SCSI • Target-mode SSA All of them must be configured as Network Type: serial in the HACMP definitions.
  • Page 33 2.2.2.2 Special Considerations As for TCP/IP networks, there are a number of restrictions on non-TCP/IP networks. These are explained for the three different types in more detail below. Serial (RS232) A serial (RS232) network needs at least one available serial port per cluster node.
  • Page 34: Cluster Disks

    The following is a brief description of SSA and the basic rules to follow when designing SSA networks. For a full description of SSA and its functionality, please read Monitoring and Managing IBM SSA Disk Subsystems, SG24-5251. SSA is a high-performance, serial interconnect technology used to connect disk devices and host adapters.
  • Page 35 SSA subsystems are built up from loops of adapters and disks. A simple example is shown in Figure 1 (caption truncated), which lists these characteristics: high-performance 80 MB/s interface; loop architecture with up to 127 nodes per loop; up to 25 m (82 ft) between SSA devices with copper cables; up to 2.4 km (1.5 mi) between SSA devices with optical extender; spatial reuse (multiple simultaneous transmissions).
  • Page 36 020, 600, D40 and T40. The 7133 models 010 and 500 were the first SSA products announced in 1995 with the revolutionary new Serial Storage Architecture. Some IBM customers still use the Models 010 and 500, but these have been replaced by 7133 Model 020, and 7133 Model 600 respectively.
  • Page 37 (Residue of a preceding table with columns: Item, Supported RAID level, Supported adapters, Hot-swappable disk.) 2.3.1.1 Disk Capacities: Table 8 lists the different SSA disks and provides an overview of their characteristics. Table 8. SSA Disks (columns: Name, Capacities (GB)); drives listed: Starfire 1100, Starfire 2200, Starfire 4320, Scorpion 4500, Scorpion 9100, Sailfin 9100, Thresher 9100...
  • Page 38 • Each SSA loop must be connected to a valid pair of connectors on the SSA adapter (that is, either Connectors A1 and A2, or Connectors B1 and B2).
  • Page 39: Ssa Adapters

    For the IBM 7190-100 SCSI to SSA converter, the following rules apply: • There can be up to 48 disk drives per loop. • There can be up to four IBM 7190-100 attached to any one SSA loop.
  • Page 40 RAID Level 1 has data redundancy, but data should be regularly backed up on the array. This is the only way to recover data in the event that a file or directory is accidentally deleted.
  • Page 41 RAID Levels 2 and 3 RAID 2 and RAID 3 are parallel process array mechanisms, where all drives in the array operate in unison. Similar to data striping, information to be written to disk is split into chunks (a fixed amount of data), and each chunk is written out to the same physical position on separate disks (in parallel).
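    The chunk layout described above can be sketched in Python. This is an illustration only. It assumes a fixed chunk size and a round-robin assignment of chunks to disks, and the names `stripe` and `unstripe` are hypothetical. The redundancy information that real RAID 2 (error-correcting code) and RAID 3 (parity) implementations add is deliberately left out.

```python
from typing import List


def stripe(data: bytes, n_disks: int, chunk_size: int) -> List[List[bytes]]:
    """Lay data out across n_disks in fixed-size chunks.

    Chunk i goes to disk i % n_disks in stripe row i // n_disks, so the chunks of one
    row sit at the same physical position on every disk and can be written in parallel.
    """
    if n_disks < 1 or chunk_size < 1:
        raise ValueError("n_disks and chunk_size must be positive")
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    disks: List[List[bytes]] = [[] for _ in range(n_disks)]
    for i, chunk in enumerate(chunks):
        disks[i % n_disks].append(chunk)  # list index on each disk == stripe row
    return disks


def unstripe(disks: List[List[bytes]]) -> bytes:
    """Read the stripe rows back in order to reassemble the original data."""
    rows = max((len(d) for d in disks), default=0)
    return b"".join(d[row] for row in range(rows) for d in disks if row < len(d))


layout = stripe(b"ABCDEFGH", n_disks=3, chunk_size=2)
print(layout)            # [[b'AB', b'GH'], [b'CD'], [b'EF']]
print(unstripe(layout))  # b'ABCDEFGH'
```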
  • Page 42: The Advantages And Disadvantages Of The Different Raid Levels

    RAID. So, if you want to connect more than two nodes into the loop, mirroring is the way to go. • A RAID array can consist of three to 16 disks.
  • Page 43 • Hot-pluggable cables and disks. • Very high capacity per adapter: up to 127 devices per loop, although most adapter implementations limit this. For example, current IBM SSA adapters provide 96 disks per Micro Channel or PCI slot. • Distance between devices of up to 25 meters with copper cables, 10 km with optical links.
  • Page 44: Scsi Disks

    The SCSI adapters that can be used to connect RAID subsystems on a shared SCSI bus in an HACMP cluster are: • SCSI-2 Differential Controller (MCA, FC: 2420, Adapter Label: 4-2) • SCSI-2 Differential Fast/Wide Adapter/A (MCA, FC: 2416, Adapter Label: 4-6) ...
  • Page 45 In the event of failure of either controller, all I/O activity is switched to the remaining active controller. In the last few years, the 7133 SSA Subsystems have become more popular than 7135 RAIDiant Systems due to better technology. IBM decided to ...
  • Page 46: Resource Planning

    A resource group also includes the list of nodes that can acquire those resources and serve them to clients. A resource group is defined as one of three types:
  • Page 47 • Cascading • Rotating • Concurrent Each of these types describes a different set of relationships between nodes in the cluster, and a different set of behaviors upon nodes entering and leaving the cluster. Cascading Resource Groups: All nodes in a cascading resource group are assigned priorities for that resource group.
  • Page 48: Shared Lvm Components

    • Third-Party Takeover. Hot-Standby Configuration: Figure 2 illustrates a two-node cluster in a hot-standby configuration. The active node with the highest priority controls the resource group. All active nodes have access to the resource group.
  • Page 49 Figure 2. Hot-Standby Configuration In this configuration, there is one cascading resource group consisting of the four disks, hdisk1 to hdisk4, and their constituent volume groups and file systems. Node 1 has a priority of 1 for this resource group while node 2 has a priority of 2.
  • Page 50 1 or node 2 fails, or has to leave the cluster for a scheduled outage, the surviving node acquires the failed node’s resource groups and continues to provide the failed node’s critical services.
  • Page 51 When a failed node reintegrates into the cluster, it takes back the resource group for which it has the highest priority. Therefore, even in this configuration, there is a break in service during reintegration. Of course, if you look at it from the point of view of performance, this is the best thing to do, since you have one node doing the work of two when any one of the nodes is down.
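    The takeover and reintegration behavior of a cascading resource group follows one rule: the active node with the highest priority holds the group. The Python below is a hypothetical model of that rule, not an HACMP interface. It assumes that priority 1 is the highest, as in the hot-standby example where node 1 has priority 1 and node 2 has priority 2.

```python
from typing import Dict, Optional, Set


def cascading_owner(priorities: Dict[str, int], active: Set[str]) -> Optional[str]:
    """Return the node that should hold a cascading resource group.

    priorities: node name -> priority for this resource group (1 is highest)
    active:     names of the nodes currently up in the cluster
    Returns None when no participating node is active.
    """
    candidates = [node for node in priorities if node in active]
    if not candidates:
        return None
    return min(candidates, key=lambda node: priorities[node])


priorities = {"node1": 1, "node2": 2}
print(cascading_owner(priorities, {"node1", "node2"}))  # node1 (normal operation)
print(cascading_owner(priorities, {"node2"}))           # node2 (node1 has failed)
print(cascading_owner(priorities, {"node1", "node2"}))  # node1 (node1 reintegrates and takes the group back)
```

    The third call shows why reintegration causes a second break in service. Ownership moves back to the higher-priority node as soon as that node rejoins.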
  • Page 52: Ip Address Takeover

    IP address. In order to achieve this, you must do the following: • Decide which types of networks and point-to-point connections to use in the cluster (see 2.2, “Cluster Networks” on page 11 for supported network types)
  • Page 53 • Design the network topology • Define a network mask for your site • Define IP addresses (adapter identifiers) for each node’s service and standby adapters. • Define a boot address for each service adapter that can be taken over, if you are using IP address takeover or rotating resources.
  • Page 54 SOCC, SLIP, and ATM are point-to-point connection types. In HACMP clusters of four or more nodes, however, use an SOCC line only as a private network between neighboring nodes because it cannot guarantee cluster communications with nodes other than its neighbors.
  • Page 55 The following diagram shows a cluster consisting of two nodes and a client. A single public network connects the nodes and the client, and the nodes are linked point-to-point by a private high-speed SOCC connection that provides an alternate path for cluster and lock traffic should the public network fail. Figure 7.
  • Page 56 Service Adapter: The service adapter is the primary connection between the node and the network. A node has one service adapter for each physical network to which it connects. The...
  • Page 57 until it assumes the shared IP address. Consequently, Clinfo makes known the boot address for this adapter. In an HACMP for AIX environment on the RS/6000 SP, the SP Ethernet adapters can be configured as service adapters but should not be configured for IP address takeover.
  • Page 58 Cluster Manager to detect a failure and take action. ... service label (address) instead of the boot label. If the node should fail, a takeover node acquires the failed node’s service address on its standby adapter, thus...
  • Page 59: Nfs Exports And Nfs Mounts

    If you do not use Hardware Address Takeover, the ARP cache of clients can be updated by adding the clients’ IP addresses to the ... variable in the /usr/sbin/cluster/etc/clinfo.rc file. 2.4.4 NFS Exports and NFS Mounts: There are two items concerning NFS when configuring a resource group: Filesystems to Export, and Filesystems to NFS mount...
  • Page 60: Performance Requirements

    The startup script especially must be able to recover the application from an abnormal termination, such as a power failure. You should verify that it runs properly in a uniprocessor environment before including the HACMP for AIX software.
  • Page 61: Licensing Methods

    Note: Application start and stop scripts have to be available on the primary as well as the takeover node. They are not transferred during synchronization, so the cluster administrator has to ensure that they are found in the same path location, with the same permissions and in the same state, i.e.
  • Page 62: Customization Planning

    ... command that sends mail to indicate that an event is ... (or has just occurred), and that an event script succeeded or failed. For example, a site may want to use a network_down notification
  • Page 63: Error Notification

    event to inform system administrators that traffic may have to be rerouted. Afterwards, you can use a network_up notification event to inform system administrators that traffic can again be serviced through the restored network. 2.6.1.3 Predictive Event Error Correction You can specify a command that attempts to recover from an event script failure.
  • Page 64 Figure 8. Sample Screen for Add a Notification Method (screen residue: * Notify Method; error labels HPS_FAULT9_ER and HPS_FAULT3_ER). Under smit hacmp > RAS Support > Error ..., you will find the menu allowing you to Add a Notify Method...
  • Page 65 The example screen above adds a notification method to the ODM so that, when an HPS_FAULT9_ER entry appears in the error log, the error notification daemon runs /usr/sbin/cluster/utilities/clstop -grsy, which stops cluster services on the node gracefully with takeover. In this way, the switch failure is acted upon as a node failure.
  • Page 66: User Id Planning

    • Adding groups to all cluster nodes • Changing characteristics of a group on all cluster nodes • Removing a group from all cluster nodes ... is used for that. On RS/6000 SP systems, rdist...
  • Page 67: Cluster Passwords

    2.7.2 Cluster Passwords: While C-SPOC greatly simplifies user and group management, the password information still has to be distributed by some other means. If the system is not configured to use NIS or DCE, the system administrator has to distribute the password information, that is, the contents of the /etc/security/passwd file, to all cluster nodes.
  • Page 68 NFS files, if there are any. Next, it must unmount the NFS directory, acquire the shared volume (varyon the shared volume group) and mount the shared file system. Only after that can users access the application on the takeover node again.
  • Page 69: Chapter 3. Cluster Hardware And Software Preparation

    HACMP can cover for the loss of any of the rootvg physical volumes. However, it is possible that a customer with business-critical applications will justify ...
  • Page 70 Procedure steps (2), (3), and (4). The mirrorvg command takes dump devices and paging devices into account. If the dump devices are also the paging device, the logical volume will be
  • Page 71 mirrored. If the dump devices are NOT the paging device, that dump logical volume will not be mirrored. 3.1.2.1 Procedure The following steps assume the user has rootvg contained on hdisk0 and is attempting to mirror the rootvg to a new disk: hdisk1. 1.
  • Page 72 7. Shut down and reboot the system by executing the following command: shutdown -Fr (this is so that the “Quorum OFF” functionality takes effect). ... is the first hdisk listed under the “PV” heading after the ... has executed.
  • Page 73: Aix Prerequisite Lpps

    3.1.2.2 Necessary APAR Fixes: Table 11. Necessary APAR Fixes (listed by AIX Version; table contents not recoverable). To determine whether either fix is installed on a machine, execute the following: instfix -i -k <apar number> 3.1.3 AIX Prerequisite LPPs: In order to install HACMP and HACMP/ES, the AIX setup must be in a proper state.
  • Page 74: Aix Parameter Settings

    I/O-bound node with a RESET packet. You can use I/O pacing to tune the system so that system resources are distributed more equitably during high disk I/O. You do this by setting high- and low-water marks.
  • Page 75 If a process tries to write to a file at the high-water mark, it must wait until enough I/O operations have finished to reach the low-water mark. Use the smit chgsys fastpath to set high- and low-water marks on the Change/Show Characteristics of the Operating System screen.
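    The waiting rule for I/O pacing can be modeled in a few lines of Python. The class below is a hypothetical illustration of the high- and low-water mark behavior for a single file, not the AIX implementation. The mark values 4 and 2 in the usage example are arbitrary and are not recommended settings.

```python
class IOPacer:
    """Hypothetical model of I/O pacing for one file."""

    def __init__(self, high: int, low: int) -> None:
        if not 0 <= low < high:
            raise ValueError("require 0 <= low < high")
        self.high = high      # high-water mark: the writer blocks when this many I/Os are pending
        self.low = low        # low-water mark: a blocked writer resumes at or below this
        self.pending = 0      # outstanding write I/Os for the file
        self.blocked = False  # True while the writing process is waiting

    def try_write(self) -> bool:
        """Try to queue one write I/O; return False if the writer has to wait."""
        if self.blocked or self.pending >= self.high:
            self.blocked = True
            return False
        self.pending += 1
        return True

    def complete(self, n: int = 1) -> None:
        """Record that n outstanding I/Os finished; wake the writer at the low-water mark."""
        self.pending = max(0, self.pending - n)
        if self.blocked and self.pending <= self.low:
            self.blocked = False


pacer = IOPacer(high=4, low=2)
print([pacer.try_write() for _ in range(5)])  # [True, True, True, True, False]
pacer.complete()                               # 3 pending: still above the low-water mark
print(pacer.try_write())                       # False
pacer.complete()                               # 2 pending: low-water mark reached
print(pacer.try_write())                       # True
```

    Once blocked, the writer stays blocked even though fewer than `high` I/Os are pending. It resumes only at the low-water mark. This hysteresis is what having two marks provides.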
  • Page 76 NIS-managed users. Whether or not you log the fact that cron has been refreshed is optional. ... entry in rcnfs...
  • Page 77 #! /bin/sh # This script checks for a ypbind and a cron process. If both # exist and cron was started before ypbind, cron is killed so # it will respawn and know about any new users that are found # in the passwd file managed as an NIS map.
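    The decision made by the script described above can be isolated in a small Python function. This sketch covers the logic only. The function name `cron_needs_restart` is hypothetical, and the real script handles how the process start times are obtained and how cron is actually killed.

```python
from typing import Optional


def cron_needs_restart(cron_start: Optional[float], ypbind_start: Optional[float]) -> bool:
    """Decide whether cron should be killed so that it respawns.

    Both arguments are process start times (for example, seconds since the epoch),
    or None when the process is not running.
    """
    if cron_start is None or ypbind_start is None:
        return False  # the script acts only when both processes exist
    # A cron started before ypbind cannot know about users from the NIS passwd map.
    return cron_start < ypbind_start


print(cron_needs_restart(100.0, 200.0))  # True
print(cron_needs_restart(300.0, 200.0))  # False
```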
  • Page 78: Network Connection And Testing

    Configuring a notify method for these events can alert the network administrator to check and fix the broken hub.
  • Page 79 Figure 9. Connecting Networks to a Hub 3.2.1.2 IP Addresses and Subnets The design of the HACMP for AIX software specifies that: • All client traffic be carried over the service adapter • Standby adapters be hidden from client applications and carry only internal Cluster Manager traffic
  • Page 80 • Use the ifconfig command on all adapters to detect bad IP addresses, incorrect subnet masks, and improper broadcast addresses. • Use the ... command to make sure that the adapters are initialized ... • Use the ... command to check the point-to-point connectivity between ...
  • Page 81: Non Tcp/Ip Networks

    • Scan the /tmp/hacmp.out file to confirm that the /etc/rc.net script has run successfully. Look for a zero exit status. • If IP address takeover is enabled, confirm that the /etc/rc.net script has run and that the service adapter is on its service address and not on its boot address.
  • Page 82 This procedure has to be done for all the cluster nodes that are going to use a serial network of type tmscsi as defined in your planning sheets. ... fastpath to create a tty device on the nodes. On the resulting...
  • Page 83 3.2.2.4 Configuring Target Mode SSA: The node number on each system needs to be changed from the default of zero to a non-zero number. All systems on the SSA loop must have a unique node number. To change the node number, use the following command: chdev -l ssar -a node_number=# To show the system’s node number, use the following command: lsattr -El ssar...
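    A quick consistency check for the node-number rule above can be sketched in Python. The input is assumed to be the node_number values collected from each system on the loop (for example, read from the `lsattr -El ssar` output on each node). The function name and the host names are hypothetical.

```python
from typing import Dict, List


def check_ssa_node_numbers(node_numbers: Dict[str, int]) -> List[str]:
    """Report node_number problems for the systems on one SSA loop.

    node_numbers maps a host name to its ssar node_number value.
    """
    problems: List[str] = []
    seen: Dict[int, str] = {}  # node_number -> first host that uses it
    for host, number in sorted(node_numbers.items()):
        if number == 0:
            problems.append(f"{host}: node_number is still the default 0")
        elif number in seen:
            problems.append(f"{host}: node_number {number} is also used by {seen[number]}")
        else:
            seen[number] = host
    return problems


print(check_ssa_node_numbers({"nodeA": 1, "nodeB": 2}))  # []
print(check_ssa_node_numbers({"nodeA": 1, "nodeB": 1, "nodeC": 0}))
```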
  • Page 84: Cluster Disk Setup

    • A maximum of three dummy disk drive modules can be connected next to each other. • The maximum length of an SSA cable is 25 m. With Fiber-Optic Extenders, the connection length can be up to 2.4 km.
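    The cabling limits listed above lend themselves to a simple planning check. The Python sketch below is hypothetical. It describes a loop as an ordered list of drive slots ("disk" or "dummy") plus a list of cable links, and it checks dummy-module adjacency only along that list, without wrap-around. It uses only the three limits quoted in this section: three adjacent dummy modules, 25 m for copper cables, and 2.4 km with fiber-optic extenders.

```python
from dataclasses import dataclass
from typing import List

MAX_ADJACENT_DUMMIES = 3
MAX_COPPER_M = 25.0
MAX_OPTICAL_M = 2400.0


@dataclass
class Link:
    length_m: float
    optical_extender: bool = False


def check_ssa_loop(slots: List[str], links: List[Link]) -> List[str]:
    """Return a list of rule violations for one planned SSA loop."""
    problems: List[str] = []
    run = 0  # length of the current run of adjacent dummy modules
    for pos, slot in enumerate(slots):
        run = run + 1 if slot == "dummy" else 0
        if run == MAX_ADJACENT_DUMMIES + 1:  # report each too-long run once
            problems.append(f"more than {MAX_ADJACENT_DUMMIES} adjacent dummy modules at slot {pos}")
    for i, link in enumerate(links):
        limit = MAX_OPTICAL_M if link.optical_extender else MAX_COPPER_M
        if link.length_m > limit:
            problems.append(f"link {i}: {link.length_m} m exceeds {limit} m")
    return problems


print(check_ssa_loop(["disk", "dummy", "dummy", "dummy", "disk"], [Link(10.0)]))  # []
print(check_ssa_loop(["dummy"] * 4, [Link(30.0), Link(1000.0, optical_extender=True)]))
```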
  • Page 85: Ssa Disks

    For more information regarding adapters and cabling rules see 2.3.1, “SSA Disks” on page 16 or the following documents: • 7133 SSA Disk Subsystems: Service Guide, SY33-0185-02 • 7133 SSA Disk Subsystem: Operator Guide, GA33-3259-01 • 7133 Models 010 and 020 SSA Disk Subsystems: Installation Guide, GA33-3260-02 •...
  • Page 86 • Accept the read and write subroutine calls to the special files. • Can be members of volume groups and have file systems mounted on them. In order to list the logical disk definitions, use the following command:
  • Page 87 # lsdev -Cc disk | grep SSA
      hdisk3 Available 00-07-L
      hdisk4 Available 00-07-L
      hdisk5 Available 00-07-L
      hdisk6 Available 00-07-L
      hdisk7 Available 00-07-L
      hdisk8 Available 00-07-L
    SSA physical disks: • Are configured as pdisk0, pdisk1,...,pdiskN. • Have errors logged against them in the system error log. •...
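    When you check many nodes, a short parser can reduce the listing above to device names. This Python sketch assumes only the column layout shown (device name, then status, then location). Real `lsdev` output also carries a description column, which the parser ignores. The function name is hypothetical.

```python
from typing import List


def available_hdisks(lsdev_output: str) -> List[str]:
    """Return the names of hdisks in the Available state from lsdev output text."""
    names: List[str] = []
    for line in lsdev_output.splitlines():
        fields = line.split()
        # Expected layout: <name> <status> <location> [description...]
        if len(fields) >= 2 and fields[0].startswith("hdisk") and fields[1] == "Available":
            names.append(fields[0])
    return names


sample = """\
hdisk3 Available 00-07-L
hdisk4 Available 00-07-L
hdisk5 Available 00-07-L
"""
print(available_hdisks(sample))  # ['hdisk3', 'hdisk4', 'hdisk5']
```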
  • Page 88 8. Use the directory /usr/sys/inst.images as the install device. 9. Select all filesets in this directory for install. 10. Execute the command. 11. Exit SMIT. ... relationships between physical (pdisk) and logical (hdisk) disks. This option enables you to format SSA disk drives.
  • Page 89 If, after repeating the procedure, the code levels do not match the latest ones, place a call with your local IBM Service Center. 16. If the adapters are in SSA loops which contain other adapters in other systems, please repeat this procedure on all systems as soon as possible.
  • Page 90: Scsi

    Select Add an SSA RAID Array to do the definitions. 3.3.2 SCSI The following sections contain important information about SCSI: cabling, connecting RAID subsystems, and adapter SCSI ID and termination change.
  • Page 91 3.3.2.1 Cabling The following sections describe important information about cabling. SCSI Adapters: An overview of SCSI adapters that can be used on a shared SCSI bus is given in 2.3.2.3, “Supported SCSI Adapters” on page 26. For the necessary adapter changes, see 3.3.2.3, “Adapter SCSI ID and Termination change”...
  • Page 92 Figure 10. 7135-110 RAIDiant Arrays Connected on Two Shared 8-Bit SCSI Buses To connect a set of 7135s to SCSI-2 Differential Fast/Wide Adapter/As or Enhanced SCSI-2 Differential Fast/Wide Adapter/As on a shared 16-bit SCSI bus, you need the following: • 16-Bit SCSI-2 Differential Y-Cable
  • Page 93 FC: 2426 (0.94m), PN: 52G4234 • 16-Bit SCSI-2 Differential System-to-System Cable FC: 2424 (0.6m), PN: 52G4291 - OR - FC: 2425 (2.5m), PN: 52G4233 This cable is used only if there are more than two nodes attached to the same shared bus. •...
  • Page 94 (Cabling diagram residue: adapters #2416 (16-bit); cables #2424 and #2426. Maximum total cable length: 25 m.)
  • Page 95 Figure 11. 7135-110 RAIDiant Arrays Connected on Two Shared 16-Bit SCSI Buses (diagram labels: #2902, #2426, #2416 (16-bit)). 3.3.2.3 Adapter SCSI ID and Termination change: The SCSI-2 Differential Controller is used to connect to 8-bit disk devices on a shared bus.
  • Page 96 Fast/Wide Adapter/A) are shown in Figure 12 and Figure 13 respectively. Figure 12. Termination on the SCSI-2 Differential Controller (diagram label: P/N 56G7315). Figure 13. Termination on the SCSI-2 Differential Fast/Wide Adapters (diagram labels: P/N 43G0176, Internal 16-bit SE, Internal 8-bit SE).
  • Page 97 The ID of an SCSI adapter, by default, is 7. Since each device on an SCSI bus must have a unique ID, the ID of at least one of the adapters on a shared SCSI bus has to be changed. The procedure to change the ID of an SCSI-2 Differential Controller is: 1.
  • Page 98 Here, the adapter that you choose from the list you get after executing the ... device. Also, as shown below, you need to change the external SCSI ID only.
  • Page 99: Shared Lvm Component Configuration

    SMIT screen: Change/Show Characteristics of a SCSI Adapter (fields: SCSI adapter, Description, Status, Location, Internal SCSI ID, External SCSI ID, WIDE bus enabled, Apply change to DATABASE only). The command-line version of this is # chdev -l ascsi1 -a id=6 -P, where -P records the change in the database only. As in the case of the SCSI-2 Differential Controller, a system reboot is required to bring the change into effect.
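    Choosing unique IDs for the adapters on a shared bus can be sketched in Python. The assumptions are hypothetical. The first adapter keeps the default ID 7, and each further adapter takes the next lower ID that no other device on the bus uses, mirroring the example above where ascsi1 is set to 6. The node-qualified adapter labels and the function name are illustrative only. You would still apply the resulting IDs with the SMIT screen or chdev command shown above and then reboot.

```python
from typing import Dict, Iterable, List


def assign_scsi_ids(adapters: List[str], device_ids: Iterable[int], default_id: int = 7) -> Dict[str, int]:
    """Give every adapter on a shared SCSI bus an ID that is unique on that bus.

    adapters:   adapter labels, one per node sharing the bus
    device_ids: SCSI IDs already taken by disks or controllers on the bus
    """
    used = set(device_ids)
    assigned: Dict[str, int] = {}
    candidate = default_id
    for name in adapters:
        while candidate in used:
            candidate -= 1  # walk down from the default ID to the next free one
        if candidate < 0:
            raise ValueError("no free SCSI ID left on the bus")
        assigned[name] = candidate
        used.add(candidate)
    return assigned


print(assign_scsi_ids(["nodeA/ascsi1", "nodeB/ascsi1"], device_ids=[0, 1, 2]))
# {'nodeA/ascsi1': 7, 'nodeB/ascsi1': 6}
```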
  • Page 100: Creating Shared Vgs

    If you do not specify SSA disk fencing, assign node numbers using the following command: ... where x is the number to assign to that node. You must reboot the system to effect the change. Use the smit mkvg fastpath to create a shared volume group. The name of the shared volume group should be unique within the cluster.
  • Page 101: Smit Mkvg Options (Concurrent, Non-Raid)

    Subsystems To create a concurrent access volume group on a RAID disk subsystem, such as an IBM 7135 disk subsystem, follow the same procedure as you would to create a non-concurrent access volume group. A concurrent access volume group can be activated (varied on) in either non-concurrent mode or concurrent access mode.
  • Page 102: Creating Shared Lvs And File Systems

    AIX assigns a logical volume name to each logical volume it creates. Examples of logical volume names are ... Within an HACMP cluster, the name of any shared logical volume must be unique. Also,
  • Page 103 the journaled file system log (jfslog) is a logical volume that requires a unique name in the cluster. To make sure that logical volumes have unique names, rename the logical volume associated with the file system and the corresponding jfslog logical volume.
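    Before you create or import shared logical volumes, you can check the uniqueness rule with a short Python sketch. It assumes you have already collected the names of each node's local (non-shared) logical volumes. The function and all volume names in the example are hypothetical.

```python
from typing import Dict, List, Set


def shared_name_conflicts(shared_lvs: Set[str], local_lvs_by_node: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Map each node to the shared logical volume names that clash with its local ones.

    Include the jfslog logical volume in shared_lvs, since it also needs a unique name.
    """
    conflicts: Dict[str, List[str]] = {}
    for node, local in sorted(local_lvs_by_node.items()):
        clash = sorted(shared_lvs & local)
        if clash:
            conflicts[node] = clash
    return conflicts


local = {"nodeA": {"applv", "applog"}, "nodeB": {"toolslv"}}
print(shared_name_conflicts({"applv", "applog"}, local))              # {'nodeA': ['applog', 'applv']}
print(shared_name_conflicts({"sharedapplv", "sharedapplog"}, local))  # {}
```

    The second call shows the effect of renaming the file system's logical volume and its jfslog to cluster-unique names, as the text recommends.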
  • Page 104: Mirroring Strategies

    3.4.4.2 Importing a Volume Group onto the Destination Node: This section covers how to import a volume group onto destination nodes using the SMIT interface. You can also use the TaskGuide utility for this task. Use the varyoffvg command to ...
  • Page 105: Smit Crjfs Options

    The TaskGuide uses a graphical interface to guide you through the steps of adding nodes to an existing volume group. For more information on the TaskGuide, see 3.4.6, “Alternate Method - TaskGuide” on page 90. Importing the volume group onto the destination nodes synchronizes the ODM definition of the volume group on each node on which it is imported.
  • Page 106: Quorum

    Enter: ... 3.4.5 Quorum. Note: This section does not apply to the IBM 7135-110 or 7135-210 RAIDiant Disk Array, which provides its own data redundancy. Quorum is a feature of the AIX LVM that determines whether or not a volume...
  • Page 107 ... varyonvg command succeeds. If exactly half the copies are available, as with two of four, quorum is not achieved and the command fails. 3.4.5.2 Quorum after Vary On: If a write to a physical volume fails, the VGSAs on the other physical volumes within the volume group are updated to indicate that one physical volume has failed.
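    The quorum rule reduces to a one-line comparison. In this Python sketch (the function name is hypothetical), the counts are numbers of VGDA copies, the on-disk volume group descriptor areas on which the AIX LVM bases quorum. Quorum holds only when strictly more than half of them are available, so two of four fails, as stated above.

```python
def has_quorum(available: int, total: int) -> bool:
    """Return True if strictly more than half of the VGDA copies are available."""
    if total <= 0 or not 0 <= available <= total:
        raise ValueError("invalid VGDA counts")
    return 2 * available > total  # integer comparison avoids rounding issues


print(has_quorum(3, 4))  # True
print(has_quorum(2, 4))  # False: exactly half is not a quorum
```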
  • Page 108: Alternate Method - Taskguide

    ... varyonvg flag, which could cause unpredictable results. For ... The TaskGuide can reduce errors, as it does not allow a user to proceed with steps that
  • Page 109 conflict with the cluster’s configuration. Online help panels give additional information to aid in each step. 3.4.6.1 TaskGuide Requirements Before starting the TaskGuide, make sure: • You have a configured HACMP cluster in place. • You are on a graphics capable terminal. 3.4.6.2 Starting the TaskGuide You can start the TaskGuide from the command line by typing: /usr/sbin/cluster/tguides/bin/cl_ccvg...
  • Page 111: Chapter 4. Hacmp Installation And Cluster Definition

    ... must be confirmed. For parts /usr ... Filesets: HACMP Base Client Libraries, HACMP Base Client Runtime, HACMP Base Client Utilities, HACMP Base Server Diags...
  • Page 112 You might add your language’s messages if you want: cluster.msg.en_US.cspoc, cluster.msg.en_US.client, cluster.man.en_US.haview.data. Fileset descriptions: HACMP Base Server Utilities, HACMP CSPOC Runtime Commands, HACMP CSPOC commands, HACMP CSPOC dsh and perl...
  • Page 113 • cluster.vsm: The Visual Systems Management fileset contains icons and bitmaps for the graphical management of HACMP resources, as well as the ... command. • cluster.haview: This fileset contains the files for including HACMP cluster views in a TME 10 NetView environment. It is installed on a NetView network management machine, not on a cluster node. •...
  • Page 114: Upgrading From A Previous Version

    ... Application Heart Beat Daemon ... AIX Run-time Executable ... If your site is currently running an earlier version of the HACMP for AIX software in its cluster environment (except for Version 4.2.2 already running on AIX 4.3), the following procedures describe how to upgrade your existing
  • Page 115 HACMP software to HACMP for AIX, Version 4.3. Instructions for upgrading the operating system are not included. If you are already running AIX 4.3, see the special note at the end of this section.
    Note: Although your objective in performing a migration installation is to keep the cluster operational and to preserve essential configuration information, do not run your cluster with mixed versions of the HACMP for AIX software for an extended period of time.
  • Page 116 10. If using tty devices, check that the tty device is configured as a serial network using the smit chgtty fastpath.
    11. To verify and synchronize the configuration (if desired), you must have /.rhosts files on the cluster nodes. If they do not exist, create the /.rhosts
  • Page 117 file on Node A using the following command:
    /usr/sbin/cluster/utilities/cllsif -x >> /.rhosts
    This command appends information to the /.rhosts file instead of overwriting it. You can then ftp this file to the other nodes as necessary.
    12. Verify the cluster topology on all nodes using the ...
    13. Check that custom event scripts are properly installed.
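Because `>>` appends unconditionally, repeating the `cllsif -x >> /.rhosts` step leaves duplicate lines in /.rhosts. A hedged alternative, assuming you want the step to be safe to repeat, is to append only lines that are not already present. `append_missing` is a hypothetical helper, not an HACMP utility; it reads candidate lines from standard input, so the `cllsif -x` output could be piped into it. The demonstration works on a scratch file rather than the real /.rhosts, and the host names are placeholders.

```sh
#!/bin/sh
# append_missing TARGET
# Hypothetical helper: reads lines from standard input and appends each
# one to TARGET only if TARGET does not already contain that exact line.
append_missing() {
    target=$1
    [ -f "$target" ] || : > "$target"    # create an empty file if needed
    while IFS= read -r line; do
        [ -n "$line" ] || continue       # skip empty lines
        if ! grep -qxF -- "$line" "$target"; then
            printf '%s\n' "$line" >> "$target"
        fi
    done
}

# Demonstration on a scratch file instead of the real /.rhosts.
# On a cluster node the input could come from, for example:
#   /usr/sbin/cluster/utilities/cllsif -x | append_missing /.rhosts
demo=$(mktemp)
printf 'nodea_boot\n' > "$demo"
printf 'nodea_boot\nnodeb_boot\n' | append_missing "$demo"
printf 'nodeb_boot\n' | append_missing "$demo"
cat "$demo"
```

Matching whole lines with `grep -x -F` avoids treating dots in host names as regular-expression wildcards.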
  • Page 118: Defining Cluster Topology

    ... utility automatically updates the HACMP ODM object ... /etc/inittab ... /etc/rc.net ... smit clstop...
    4.2 Defining Cluster Topology
    The cluster topology comprises the following components:
    • The cluster definition
    • The cluster nodes
    • The network adapters
  • Page 119: Defining The Cluster

    • The network modules
    You define the cluster topology by entering information about each component into HACMP-specific ODM classes. You enter the HACMP ODM data by using the HACMP SMIT interface or the VSM utility, xhacmpm. The xhacmpm utility is an X Windows tool for creating cluster configurations using icons to represent cluster components.
  • Page 120: Defining Adapters

    Adapter IP Label: Enter the IP label (the name) of the adapter you have chosen as the service address for this adapter. Adapter labels can be any ASCII text string consisting of alphabetic and numeric characters, underscores, and hyphens, up to 31 characters.
    Network Type: ...
  • Page 121 Network Name: Enter an ASCII text string that identifies the network. The network name can include alphabetic and numeric characters and underscores. Use no more than 31 characters. The network name is arbitrary, but must be used consistently for adapters on the same physical network.
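The naming rules above (adapter IP labels: letters, digits, underscores and hyphens; network names: letters, digits and underscores; both at most 31 characters) can be checked before the values are typed into SMIT. The two functions below are a hypothetical POSIX shell sketch of those rules only; SMIT and HACMP perform their own validation, which may differ in detail. `[:alnum:]` matches ASCII letters and digits in the C locale.

```sh
#!/bin/sh
# Hypothetical pre-checks for names entered in the HACMP SMIT screens,
# based only on the rules stated in the text.

# Adapter IP label: letters, digits, '_' and '-', 1 to 31 characters
valid_adapter_label() {
    name=$1
    [ -n "$name" ] && [ ${#name} -le 31 ] || return 1
    case $name in
        *[![:alnum:]_-]*) return 1 ;;   # a character outside the allowed set
    esac
    return 0
}

# Network name: letters, digits and '_', 1 to 31 characters
valid_network_name() {
    name=$1
    [ -n "$name" ] && [ ${#name} -le 31 ] || return 1
    case $name in
        *[![:alnum:]_]*) return 1 ;;
    esac
    return 0
}

for label in nodea_svc node-a_stby "node a"; do
    if valid_adapter_label "$label"; then
        echo "ok:  $label"
    else
        echo "bad: $label"
    fi
done
```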
  • Page 122 ...HACMP configuration. ... Enter the IP address in dotted decimal format or a device file name. IP address information is required for non-serial network adapters only if the node’s address...
  • Page 123: Configuring Network Modules

    Note: When IPAT is configured, the run level of the IP-related entries (e.g., rctcpip, rcnfs, ...) in /etc/inittab is changed so that these services are not started at boot time, but by HACMP.
    Adding or Changing Adapters after the Initial Configuration
    If you want to change the information about an adapter after the initial configuration, use the Change/Show an Adapter screen.
  • Page 124: Synchronizing The Cluster Definition Across Nodes

    ...DCDs (default configuration directories) on all cluster nodes are synchronized. In addition, the configuration data stored in the active configuration directory (ACD) on each cluster node is overwritten with the new configuration data, which becomes the new active
  • Page 125 configuration. If the cluster manager is active on some other cluster nodes but not on the local node, the synchronization operation is aborted. Before attempting to synchronize a cluster configuration, ensure that all nodes are powered on, that the HACMP software is installed, and that the /etc/hosts and /.rhosts files on all nodes include all HACMP boot and service IP labels.
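Before synchronizing, the text above asks that /etc/hosts and /.rhosts on every node include all boot and service IP labels. The sketch below is a hypothetical helper, `missing_labels`, that reports which labels from a list do not occur as a whitespace-separated word in a given file (ignoring `#` comments); you would run it on each node against both files. The addresses and labels in the example are placeholders.

```sh
#!/bin/sh
# missing_labels FILE LABEL...
# Hypothetical pre-synchronization check: prints every LABEL that does not
# appear as a word in FILE (text after '#' is ignored).
# Returns 1 if anything is missing, 2 if FILE cannot be read, 0 otherwise.
missing_labels() {
    file=$1; shift
    [ -r "$file" ] || { echo "cannot read $file" >&2; return 2; }
    status=0
    for label in "$@"; do
        # strip comments, put one word per line, then look for an exact match
        if ! sed 's/#.*//' "$file" | tr -s ' \t' '\n\n' | grep -qxF -- "$label"; then
            echo "$label"
            status=1
        fi
    done
    return $status
}

hosts=$(mktemp)
cat > "$hosts" <<'EOF'
10.1.1.1   nodea_boot
10.1.1.2   nodea_svc    # service address
10.1.2.1   nodeb_boot
EOF
missing_labels "$hosts" nodea_boot nodea_svc nodeb_boot nodeb_svc ||
    echo "add the labels above before synchronizing"
```

Here `nodeb_svc` is reported, because it appears nowhere in the sample file.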
  • Page 126: Defining Resources

    The relationship can be one of cascading, rotating, or concurrent. See 2.4.1, “Resource Group Options” on page 28 for details.
  • Page 127 4.3.1.1 Configuring Resources for Resource Groups Once you have defined resource groups, you further configure them by assigning cluster resources to one resource group or another. You can configure resource groups even if a node is powered down. However, SMIT cannot list possible shared resources for the node (making configuration errors likely).
  • Page 128 ...Resource Group. ... If IP address takeover is being used, list the IP label to be moved when this resource group is taken over. ... Application servers consist of a (preferably meaningful) name, which enables the Cluster Manager to identify the application server uniquely, as well
  • Page 129: Initial Testing

    as the path locations of the start and stop scripts for the application. These scripts have to be in the same location on every service node. Just as for pre- and post-event scripts, these scripts can be adapted to specific nodes; they do not need to be identical in content. The system administrator has to ensure, however, that they are in the same location, use the same name, and are executable by the root user.
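The requirement that the start and stop scripts exist at the same path on every node and be executable can be spot-checked locally. `check_app_scripts` is a hypothetical POSIX shell helper; run it as root on each node with the same two paths, since `[ -x ]` tests executability for the user running the check. It deliberately does not compare script contents, which the text says may differ between nodes. The script names in the example are placeholders.

```sh
#!/bin/sh
# check_app_scripts START_SCRIPT STOP_SCRIPT
# Hypothetical local check: both scripts must exist as regular files and be
# executable by the current user. Prints one line per problem; returns 1 if
# any problem was found.
check_app_scripts() {
    rc=0
    for script in "$1" "$2"; do
        if [ ! -f "$script" ]; then
            echo "missing: $script"
            rc=1
        elif [ ! -x "$script" ]; then
            echo "not executable: $script"
            rc=1
        fi
    done
    return $rc
}

# Demonstration in a scratch directory: the stop script lacks execute permission
dir=$(mktemp -d)
printf '#!/bin/sh\nexit 0\n' > "$dir/start_app"
printf '#!/bin/sh\nexit 0\n' > "$dir/stop_app"
chmod 755 "$dir/start_app"
chmod 644 "$dir/stop_app"
check_app_scripts "$dir/start_app" "$dir/stop_app" || echo "fix before defining the application server"
```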
  • Page 130: Initial Startup

    ... -f /tmp/hacmp.out ... After the cluster has become stable, you might check the output of the netstat -i command again to verify that the takeover node has acquired the IP address of the “failed” node. ... In the panel that appears, choose ... smit clstart.
  • Page 131: Cluster Snapshot

    For cascading resource groups, the failed node reacquires its resources once it is up and running again. So you have to restart HACMP on it through smitty clstart ... the cluster status. More intensive debugging is covered in Chapter 7, “Cluster Troubleshooting”...
  • Page 132: Applying A Cluster Snapshot

    More detailed information about the cluster snapshot can be found in HACMP for AIX, Version 4.3: Administration Guide, SC23-4279, Chapter 11, as well as in HACMP for AIX, Version 4.3: Troubleshooting Guide, SC23-4280.
  • Page 135: Chapter 5. Cluster Customization

    5.1.1.1 Node Events
    This is the sequence of node_up events:
    node_up: This event occurs when a node joins the cluster. Depending on whether the node is local or remote, this event initiates either a node_up_local or node_up_remote event.
    node_up_local: ...
  • Page 136 ...(If configured for IP address takeover.) Configures boot addresses to the corresponding service address, and starts TCP/IP servers and network daemons by running the ... command. ... Calls the start_server script to start application ...
    node_up_remote_complete: Allows the local node to do an NFS mount only ...
  • Page 137 ...event occurs only after a node_up_remote event has successfully completed.
    Sequence of node_down Events
    node_down: This event occurs when a node intentionally leaves the cluster or fails. Depending on whether the exiting node is local or remote, this event initiates either the node_down_local or node_down_remote event, which in turn initiates a series of subevents.
  • Page 138 This event occurs only after a ... network_down ... network_up ... network_up_complete ... local node has left the cluster. This event occurs only after a node_down_local event has successfully completed. The ... event runs only after a node_down_remote event has successfully completed.
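The local/remote split in the node event sequences above can be summarized as a dispatch on whether the node that joins or leaves is the node evaluating the event. The functions below are a hypothetical illustration of that rule only, not HACMP's event scripts, and they ignore the *_complete events.

```sh
#!/bin/sh
# node_up_subevent EVENT_NODE LOCAL_NODE
# node_down_subevent EVENT_NODE LOCAL_NODE
# Hypothetical illustration: the node that joins or leaves runs the *_local
# variant; every other node runs the *_remote variant.
node_up_subevent() {
    if [ "$1" = "$2" ]; then echo node_up_local; else echo node_up_remote; fi
}

node_down_subevent() {
    if [ "$1" = "$2" ]; then echo node_down_local; else echo node_down_remote; fi
}

# As seen from nodeb when nodea joins, and when nodeb itself leaves:
node_up_subevent nodea nodeb      # prints node_up_remote
node_down_subevent nodeb nodeb    # prints node_down_local
```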
  • Page 139 ...no actions, since appropriate actions depend on the local network configuration.
    5.1.1.3 Network Adapter Events
    swap_adapter: This event occurs when the service adapter on a node fails. The swap_adapter event exchanges or swaps the IP addresses of the service and a standby adapter on the same HACMP network and then reconstructs the routing table.
  • Page 140: Pre- And Post-Event Processing

    If the retry count is greater than zero and the recovery command succeeds, the event script command is rerun. You can also specify the number of times to attempt to execute the recovery command. ... dynamic reconfiguration has completed. ... Command field.
  • Page 141: Notes On Customizing Event Processing

    For example, a file system cannot be unmounted because a process is still running on it. In that case, you might want your recovery command to kill that process first, so that the file system can be unmounted and the event script can complete. Because the event script did not succeed on its first run, the retry feature lets HACMP for AIX rerun it until it succeeds or the retry count is reached.
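The recovery and retry behavior described above can be sketched as a loop. This is a minimal POSIX shell model under stated assumptions, not HACMP's implementation: `run_with_recovery` is hypothetical, it treats the retry count as the number of times the recovery command may be run, and it reruns the event command only after a successful recovery. In the usage example, a marker file stands in for "the blocking process has been killed, so the file system can now be unmounted".

```sh
#!/bin/sh
# run_with_recovery EVENT_CMD RECOVERY_CMD RETRY_COUNT
# Hypothetical model of event recovery and retry:
#   1. run EVENT_CMD; stop if it succeeds
#   2. while retries remain: run RECOVERY_CMD (give up if it fails),
#      then rerun EVENT_CMD (stop if it succeeds)
run_with_recovery() {
    event_cmd=$1
    recovery_cmd=$2
    retries=$3
    sh -c "$event_cmd" && return 0
    while [ "$retries" -gt 0 ]; do
        retries=$((retries - 1))
        sh -c "$recovery_cmd" || return 1   # recovery failed: give up
        sh -c "$event_cmd" && return 0      # event succeeded on retry
    done
    return 1                                # retry count reached
}

# The "event" succeeds only once the "recovery" has created the marker file
work=$(mktemp -d)
if run_with_recovery "test -e '$work/unblocked'" "touch '$work/unblocked'" 2; then
    echo "event succeeded after recovery"
fi
```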
  • Page 142: Network Modules/Topology Services And Group Services

    Faster heartbeat rates also place a greater load on networks.
    • If your networks are very busy and you experience false failure detections, you can try changing the failure detection rate on the network modules to Slow to avoid this problem.
  • Page 143: NFS Considerations

    The failure detection rate of networks varies, depending on their characteristics. For example, for an Ethernet, the normal failure detection rate is two keepalives per second; fast is about four per second; slow is about one per second. For an HPS network, because no network traffic is allowed when a node joins the cluster, normal failure detection is 30 seconds;...
  • Page 144: Exporting NFS File Systems

    5.4.4.1 Server-to-Server NFS Cross Mounting
    HACMP allows you to configure a cluster so that servers can NFS-mount each other’s file systems. Figure 14 shows an example. ... the cl_export_fs utility, which uses the ... flag and specifies the file system names stored...
  • Page 145 Figure 14. NFS Cross Mounts
    When Node A fails, Node B uses the cl_nfskill utility to close open files in Node A:/afs, unmounts it, mounts it locally, and re-exports it to waiting clients. After takeover, Node B has:
    • /bfs locally mounted
    • /bfs NFS-exported
    • /afs locally mounted
    • /afs NFS-exported...
  • Page 146: Cross Mounted NFS File Systems And The Network Lock Manager

    If you have non-cluster-related NFS file systems where losing locks would be unacceptable, you may need to take appropriate steps before using this addition to the cl_deactivate_nfs script. Add the code below between the following two lines (three places): ... flock() ...
  • Page 147
        ######## Add for NFS Lock Removal (start) ########
        ######## Add for NFS Lock Removal (finish) ########
        ###############################################################################
        # Name: cl_deactivate_nfs
        # Given a list of nfs-mounted filesystems, we try and unmount -f
        # any that are currently mounted.
        # Arguments: list of filesystems.
        ###############################################################################
        PROGNAME="$0"...
  • Page 148
        ... 2
        done
        done
        ######## Add for NFS Lock Removal (start) ########
        if [ "$STOPPED" = "true" ]
        then
            startsrc -s rpc.statd
            startsrc -s rpc.lockd
        fi
        ######## Add for NFS Lock Removal (finish) ########
        exit 0
  • Page 149: Chapter 6. Cluster Testing

    Note that cluster services must be stopped on both nodes to perform this test. ... in order to clean up the VPD. ... on the first node and ... (enter twice!) on the second node where...
  • Page 150: System Parameters

    • Type netstat -r ... other cluster node interfaces and to clients.
    • Run no -a | more ... ipsendredirects ... to ensure that the dump space is ... sysdumpdev -e ... more /etc/inittab) ... crontab -l ... lsps -a ... ps -ef | more...
  • Page 151: LVM State

    • Check that all interfaces communicate (... <ip-address>).
    • List the ARP table entries with ....
    • Check the status of the TCP/IP daemons (...).
    • Ensure that there are no bad entries in the /etc/hosts file, especially at the bottom of the file.
    •...
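For the /etc/hosts item in the checklist above, a quick scan for malformed lines can help. `check_hosts_file` is a hypothetical sketch using awk; it only knows about dotted-decimal IPv4 entries (appropriate for the AIX levels this guide covers) and flags lines whose first field is not such an address or that lack a host name. It does not detect duplicate or wrong addresses, and the sample entries are placeholders.

```sh
#!/bin/sh
# check_hosts_file FILE
# Hypothetical sanity check: prints "FILE:LINE: text" for every non-comment
# line whose first field is not a dotted-decimal IPv4 address (each octet
# 0-255) or that has no host name after the address. Exits 1 if any line
# was printed.
check_hosts_file() {
    awk '
        { sub(/#.*/, "") }                  # drop comments (fields are re-split)
        NF == 0 { next }                    # skip blank lines
        {
            ok = ($1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/) && NF >= 2
            if (ok) {
                n = split($1, octet, ".")
                for (i = 1; i <= n; i++) if (octet[i] + 0 > 255) ok = 0
            }
            if (!ok) { print FILENAME ":" FNR ": " $0; bad = 1 }
        }
        END { exit bad }
    ' "$1"
}

hostsfile=$(mktemp)
printf '%s\n' '127.0.0.1 loopback localhost' '10.1.1.1 nodea_boot' \
    '10.1.1.300 nodeb_boot' 'nodeb_svc' > "$hostsfile"
check_hosts_file "$hostsfile" || echo "fix the lines listed above"
```

With this sample, lines 3 (octet out of range) and 4 (no address) are reported.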
  • Page 152: Simulate Errors

    ...NodeF.
    • Verify that the swap_adapter event has occurred (including MAC address failover) and that HACMP has turned the original service interface back on as the standby interface. ... /usr/sbin/cluster/diag/clconfig ... /usr/sbin/cluster/utilities/cllscf ... snmpinfo -m dump -o ... flag and periodic...
  • Page 153 • Use ifconfig to swap the service address back to the original service interface (ifconfig en1 down) to fail back to the service adapter on NodeF.
    6.2.1.2 Ethernet or Token Ring Adapter or Cable Failure
    Perform the following steps in the event of an Ethernet or Token Ring adapter or cable failure:
    •...
  • Page 154 ... -l NodeFvg ...
    • Verify that all sharedvg file systems and paging spaces are accessible (... lsps -a).
    • Re-attach the cables. ... on NodeF and cause a node ... errpt -a | more ... netstat -i ... ps -U <appuid>...
  • Page 155: Node Failure / Reintegration

    • Verify that all sharedvg file systems and paging spaces are accessible (... lsps -a).
    6.2.2 Node Failure / Reintegration
    The following sections deal with issues of node failure and reintegration.
    6.2.2.1 AIX Crash
    Perform the following steps in the event of an AIX crash:
    •...
  • Page 156: Network Failure

    • Check, using the verification commands, that all the nodes in the cluster are up and running.
    • Optional: Prune the error log on NodeF (errclear 0). ... netstat -i ... smit clstart ... netstat -i ... of a test file for volume groups, and sh /etc/tcp.clean...
  • Page 157: Disk Failure

    • Monitor the cluster log files on NodeT. • Disconnect the network cable from the appropriate service and all the standby interfaces at the same time (but not the Administrative SP Ethernet) on NodeF. This will cause HACMP to detect a network_down event.
  • Page 158 • Check, using the verification commands, that all the nodes in the cluster are up and running.
    • Optional: Prune the error log on NodeF (errclear 0). ... smit raidiant; RAIDiant Disk Array Manager -> List all SCSI...
  • Page 159: Application Failure

    • Monitor the cluster log files on NodeT if HACMP has been customized to monitor 7133 disk failures.
    • Since the 7133 disk is hot-pluggable, remove a disk from drawer 1 associated with NodeF's shared volume group.
    • The failure of the 7133 disk will be detected in the error log (... more) on NodeF, and the logical volumes with copies on that disk will be marked stale (...
  • Page 161: Chapter 7. Cluster Troubleshooting

    Table 21. HACMP Log Files
    Log File Name          Description
    /var/adm/cluster.log   Contains time-stamped, formatted messages generated by HACMP for AIX scripts and daemons. In this log file, one line is written for the start of each event and one line for its completion.
    /tmp/hacmp.out         ...
  • Page 162: config_too_long

    Contains time-stamped, formatted messages generated by HACMP for AIX clstrmgr activity. Information in this file is used by IBM Support personnel when the clstrmgr is in debug mode. Note that this file is overwritten every time cluster services are started, so be careful to make a copy of it before restarting cluster services on a failed node.
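Since this log is overwritten each time cluster services start, copying it aside first is easy to script. `save_log` is a hypothetical helper that copies a file into a directory with a timestamp suffix; the log file's name is not shown in this excerpt, so it is passed as a parameter. Two saves within the same second would overwrite each other.

```sh
#!/bin/sh
# save_log LOGFILE DESTDIR
# Hypothetical helper: copies LOGFILE into DESTDIR as
# <basename>.<YYYYmmddHHMMSS>, preserving timestamps, and prints the
# path of the copy.
save_log() {
    log=$1
    destdir=$2
    [ -f "$log" ] || { echo "no such log: $log" >&2; return 1; }
    mkdir -p "$destdir" || return 1
    copy="$destdir/$(basename "$log").$(date +%Y%m%d%H%M%S)"
    cp -p "$log" "$copy" && echo "$copy"
}

# Demonstration with a scratch file standing in for the real log
demo_log=$(mktemp)
echo "example line" > "$demo_log"
saved_dir=$(mktemp -d)
save_log "$demo_log" "$saved_dir"
```

Run it before restarting cluster services on the failed node.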
  • Page 163: Deadman Switch

    ...hang. After a certain amount of time (by default, 360 seconds), the Cluster Manager writes a config_too_long message to the /tmp/hacmp.out file. The message looks like this:
    The cluster has been in reconfiguration too long;Something may be wrong.
    In most cases, this is because an event script has failed.
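To check whether a config_too_long condition has been logged, you can count matching lines in /tmp/hacmp.out. `count_config_too_long` is a hypothetical helper that searches for the event name and for the message text quoted above; apart from that quoted message, the exact layout of hacmp.out lines is not shown here, so the other sample line is a placeholder.

```sh
#!/bin/sh
# count_config_too_long [LOGFILE]
# Hypothetical helper: prints the number of lines in LOGFILE (default
# /tmp/hacmp.out) that mention config_too_long or contain the
# "reconfiguration too long" message. Returns 2 if the file is unreadable.
count_config_too_long() {
    log=${1:-/tmp/hacmp.out}
    [ -r "$log" ] || { echo "cannot read $log" >&2; return 2; }
    # grep -c exits 1 when the count is 0; that is not an error here
    grep -c -e 'config_too_long' -e 'reconfiguration too long' "$log" || true
}

sample_log=$(mktemp)
printf '%s\n' 'placeholder line' \
    'The cluster has been in reconfiguration too long;Something may be wrong.' \
    > "$sample_log"
count_config_too_long "$sample_log"    # prints 1
```

A non-zero count is a cue to look for the event script that failed or hung.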
  • Page 164: Tuning The System Using I/O Pacing

    ... no -o thewall=xxxxx, where xxxxx is the value you want to be available for use by the communications subsystem. For example: no -o thewall=65536 ... smit chgsys ... syncd ... reports that requests for mbufs are being denied,...
  • Page 165: Changing The Failure Detection Rate

    7.3.4 Changing the Failure Detection Rate
    Use the SMIT Change/Show a Cluster Network Module screen to change the failure detection rate for your network module only if enabling I/O pacing or extending the syncd frequency did not resolve deadman problems in your cluster. By changing the failure detection rate to “Slow”, you can extend the time required before the deadman switch is invoked on a hung node and before a takeover node detects a node failure and acquires a hung node’s resources.
  • Page 166: The DGSP Message

    For example, in a cluster where one partition has NodeA and the other has NodeB, NodeB will be shut down. ... is sent...
  • Page 167: User ID Problems

    7.6 User ID Problems
    Within an HACMP cluster, you always have more than one node potentially offering the same service to a specific user or user ID. Because the node providing the service can change, the system administrator has to ensure that the same users and groups are known to all nodes potentially running an application.
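One way to spot the user ID mismatches described above is to compare /etc/passwd copies from two nodes (collected, for example, with rcp). `uid_mismatches` is a hypothetical awk-based helper that reports users defined on both nodes with different numeric UIDs; the same idea applies to /etc/group, where the third field is the GID. The account names and IDs in the example are made up.

```sh
#!/bin/sh
# uid_mismatches PASSWD_A PASSWD_B
# Hypothetical check: prints "user uid_a uid_b" for every user that exists
# in both passwd files with different user IDs (third ':'-separated field).
# Users present in only one file are not reported.
uid_mismatches() {
    awk -F: '
        FILENAME == ARGV[1] { uid[$1] = $3; next }   # first file: name -> UID
        ($1 in uid) && uid[$1] != $3 { print $1, uid[$1], $3 }
    ' "$1" "$2"
}

pw_a=$(mktemp)
pw_b=$(mktemp)
printf '%s\n' 'root:!:0:0::/:/usr/bin/ksh' \
    'dbadmin:!:200:200::/home/dbadmin:/usr/bin/ksh' > "$pw_a"
printf '%s\n' 'root:!:0:0::/:/usr/bin/ksh' \
    'dbadmin:!:201:200::/home/dbadmin:/usr/bin/ksh' > "$pw_b"
uid_mismatches "$pw_a" "$pw_b"    # prints: dbadmin 200 201
```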
  • Page 168 • Do not neglect the obvious. Small things can cause big problems. Check plugs, connectors, cables, and so on.
    • Keep a record of the tests you have completed. Record your tests and results, and keep a historical record of the problem, in case it reappears.
  • Page 169: Chapter 8. Cluster Management And Administration

    ...AIX daemons are running on each node. Finally, if necessary, examine the other cluster log files to get a more in-depth view of the cluster status. ... utility, which reports the status of key ... screen, which shows the status of the...
  • Page 170: The clstat Command

    ...Protocol (SNMP). It combines periodic polling and event notification through traps to retrieve cluster topology and state changes from the HACMP management agent, that is, the Cluster SMUX peer daemon (...). ... /usr/sbin/cluster/clstat ... The clstat utility runs on both ASCII and X Window...
  • Page 171: Cluster Log Files

    More details on how to configure HAView and on how to monitor your cluster with HAView can be found in Chapter 3, “Monitoring an HACMP cluster” in HACMP for AIX, Version 4.3: Administration Guide , SC23-4279. 8.1.3 Cluster Log Files HACMP for AIX writes the messages it generates to the system console and to several log files.
  • Page 172: Starting And Stopping HACMP On A Node Or A Client

    8.1.3.8 /var/ha/log/grpsvcs.<filename> Contains timestamped messages in ASCII format. These track the execution of internal activities of the grpsvcs daemon. IBM support personnel use this information for troubleshooting. The file gets trimmed regularly. Therefore, please save it promptly if there is a chance you may need it.
  • Page 173: HACMP Daemons

    (C-SPOC) utility can be used to start and stop cluster services on all nodes in cluster environments. Starting cluster services refers to the process of starting the HACMP for AIX daemons that enable the coordination required between nodes in a cluster. Starting cluster services on a node also triggers the execution of certain HACMP for AIX scripts that initiate the cluster.
  • Page 174: Starting Cluster Services On A Node

    ... C-SPOC ... /usr/sbin/cluster/utilities/cl_rc.cluster ... node. ... /usr/sbin/cluster/etc/clinfo.rc ... The clinfo daemon is optional on cluster nodes ... topsvcsd ... client; ... The C-SPOC ... starts cluster services on the nodes specified from the one node. The nodes
  • Page 175: Stopping Cluster Services On A Node

    are started in sequential order - not in parallel. The output of the command run on the remote node is returned to the originating node. Because the command is executed remotely, there can be a delay before the command output is returned. 8.2.2.1 Automatically Restarting Cluster Services You can optionally have cluster services start whenever the system is rebooted.
  • Page 176 If the SRC detects that any HACMP daemon has exited abnormally (without being shut down using the ...), ... /usr/sbin/cluster/utilities/clexit.rc ...
    In a graceful stop, the HACMP software shuts down its applications and releases its resources. The other nodes do not take over the resources of the stopped node.
  • Page 177: Starting And Stopping Cluster Services On Clients

    ...prevents unpredictable behavior from corrupting the data on the shared disks. See the ...
    Important Note: Never use the ... command. ... The ... command causes the SRC to run the /usr/sbin/cluster/utilities/clexit.rc script, which halts the system immediately, causing the surviving nodes to initiate failover.
    8.2.4 Starting and Stopping Cluster Services on Clients
    Use the /usr/sbin/cluster/etc/rc.cluster start...
  • Page 178: Replacing Failed Components

    Most network and SSA cables can be changed online. Do some testing; for example, exchange the cables or try connecting to another port on your hub to see whether the hub is the problem.
  • Page 179: Disks

    • The new adapter must be of the same type as, or a type compatible with, the replaced adapter.
    • When replacing or adding a SCSI adapter, remove the terminating resistors for shared buses. Also, set the SCSI ID of the adapter to a value other than 7.
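The SCSI ID rule above lends itself to a simple pre-check when planning a replacement adapter on a shared bus. `scsi_id_ok` is a hypothetical helper: it rejects ID 7 (the usual adapter default, which the text says to avoid) and any ID already used by other adapters on the same bus. Which IDs are in use must be determined separately; they are passed in as arguments.

```sh
#!/bin/sh
# scsi_id_ok NEW_ID [IN_USE_ID...]
# Hypothetical check for a replacement SCSI adapter on a shared bus:
# NEW_ID must be numeric, must not be 7, and must not already be in use.
# Prints the reason and returns 1 on failure.
scsi_id_ok() {
    new=$1; shift
    case $new in
        ''|*[!0-9]*) echo "not a number: '$new'"; return 1 ;;
    esac
    if [ "$new" -eq 7 ]; then
        echo "ID 7 is the adapter default; choose another ID"
        return 1
    fi
    for used in "$@"; do
        if [ "$new" -eq "$used" ]; then
            echo "ID $new is already used on this bus"
            return 1
        fi
    done
    return 0
}

scsi_id_ok 6 7 5 && echo "ID 6 can be used"
scsi_id_ok 5 7 5 || echo "choose a different ID"
```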
  • Page 180 7. Add the new disk to the sharedvg (...).
    8. Increase the number of LV copies to span across the new disk (... cl_lvsc ...).
    9. Sync the volume group (...).
    ... rmdev -l hdiskX -d; rmdev -l ... on all nodes. ... mkdev...
  • Page 181: Changing Shared Lvm Components

    8.4 Changing Shared LVM Components
    Changes to volume group constructs are probably the most frequent kind of change performed in a cluster. As a system administrator of an HACMP for AIX cluster, you may be called upon to perform any of the following LVM-related tasks:
    •...
  • Page 182: Lazy Update

    Takeover takes a few minutes longer if a Lazy Update occurs. A Lazy Update is always performed the first time a takeover occurs, in order to create the timestamp file on the takeover node.
  • Page 183: C-SPOC

    Lazy Update has some limitations, which you need to consider if you rely on it:
    • If the first disk in a sharedvg has been replaced, the ... will fail, because Lazy Update expects to be able to match the hdisk number for the first disk to a valid PVID in the ODM.
  • Page 184 ...AIX commands or SMIT menus to create file systems, and use C-SPOC to update the VG information on the other nodes.
    • C-SPOC cannot be used for concurrent shared LVM components prior to HACMP for AIX Version 4.3.
  • Page 185: TaskGuide

    To use the SMIT shortcuts to C-SPOC, type ... concurrent volume groups. Concurrent volume groups must be varied on in concurrent mode to perform tasks.
    8.4.4 TaskGuide
    The TaskGuide is a graphical interface that simplifies the task of creating a shared volume group within an HACMP cluster configuration. The TaskGuide presents a series of panels that guide the user through the steps of specifying initial and sharing nodes, disks, concurrent or non-concurrent access, volume group name and physical partition size, and cluster settings.
  • Page 186: Add/Change/Remove Cluster Resources

    ...Chapter 3 in HACMP for AIX, Version 4.3: Concepts and Facilities, SC23-4276) on the local node is copied to the ODMs stored in the DCDs on all cluster nodes. ... smit cm_add_grp ... smit cm_add_res...
  • Page 187: DARE Resource Migration Utility

    • If the Cluster Manager is active on the local node, synchronization triggers a cluster-wide, dynamic reconfiguration event. In dynamic reconfiguration, the configuration data stored in the DCD is updated on each cluster node, and, in addition, the new ODM data replaces the ODM data stored in the ACD (Active Configuration Directory) on each cluster node.
  • Page 188 The one instance in which a non-sticky migration of a cascading resource might make sense is if this resource has the
  • Page 189 INACTIVE_TAKEOVER flag set to false and has not yet started because its primary node is down. In general, however, only rotating resource groups should be migrated in a non-sticky manner. Such migrations are one-time events and occur similarly to normal rotating resource group flavors. After migration, the resource group immediately resumes a normal rotating resource group failover policy, but from the new location.
  • Page 190 You can specify a migration type after the second colon. Repeat this syntax on the command line for each resource group you want to migrate. Do not include spaces between arguments. ... causes a resource group to be ... stop ... default, stop, ... command.
  • Page 191 Note that you cannot add nodes to the resource group list with the DARE Resource Migration utility. This task is performed through SMIT.
    Stopping Resource Groups
    If the location field of a migration contains the keyword stop instead of an actual nodename, the DARE Resource Migration utility attempts to stop the resource group, which includes taking down any service label, unmounting file systems, and so on.
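The migration argument format described in this section (resource group, then location, then an optional migration type after the second colon, with no spaces) can be parsed field by field. `parse_migration` is a hypothetical sketch that only splits and sanity-checks one argument; it does not validate keywords, and the example type `sticky` is assumed from the sticky/non-sticky migrations discussed here rather than taken from the utility's documentation.

```sh
#!/bin/sh
# parse_migration SPEC
# Hypothetical parser for one argument of the form
#   resource_group:location[:migration_type]
# Prints "group location type", using "-" when no type is given.
# Rejects arguments containing spaces or missing a group or location.
parse_migration() {
    spec=$1
    case $spec in
        *[[:space:]]*) echo "spaces are not allowed: '$spec'" >&2; return 1 ;;
    esac
    group=${spec%%:*}
    rest=${spec#*:}
    [ "$rest" != "$spec" ] || { echo "missing location: '$spec'" >&2; return 1; }
    location=${rest%%:*}
    mtype=
    case $rest in
        *:*) mtype=${rest#*:} ;;        # everything after the second colon
    esac
    [ -n "$mtype" ] || mtype=-
    if [ -z "$group" ] || [ -z "$location" ]; then
        echo "empty field: '$spec'" >&2
        return 1
    fi
    echo "$group $location $mtype"
}

for spec in rg1:nodeb:sticky rg2:stop; do
    parse_migration "$spec"
done
```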
  • Page 192: Applying Software Maintenance To An HACMP Cluster

    If an update to the cluster.base.client.lib fileset has been applied and you are using Cluster Lock Manager or Clinfo API functions, you may need to relink your applications. ... the clfindres command to find out if sticky ...
  • Page 193 5. Restart the HACMP for AIX software on the node using the ... fastpath and verify that the node successfully joined the cluster.
    6. Repeat Steps 1 through 5 on the remaining cluster nodes.
    Figure 15 below shows the procedure:
    [Figure 15: diagram of the procedure, beginning with “Fallover of System A”]
  • Page 194: Backup Strategies

    The mirroring capability of the AIX Logical Volume Manager (LVM) can be used to address this issue. ... refer to the AIX Commands ... tar, cpio, dd...
  • Page 195 8.7.1.1 How to Do a Split-Mirror Backup
    This same procedure can be used with just one mirrored copy of a logical volume. If you remove a mirrored copy of a logical volume (and file system), and then create a new logical volume (and file system) using the allocation map from that mirrored copy, your new logical volume and file system will contain the same data as was in the original logical volume.
  • Page 196: Using Events To Schedule A Backup

    ... command to add back the logical volume copy you...
    As described in 2.7, “User ID Planning” on page 48, on an HACMP cluster the administrator has to take care of user and group IDs throughout the cluster. If
  • Page 197: Listing Users On All Cluster Nodes

    they do not match, users will not be able to get any work done after a failover. So the administrator has to keep the definitions identical throughout the cluster. Fortunately, the C-SPOC utility in HACMP Version 4.3 and later does this for you. When you create a cluster group or user using C-SPOC, it makes sure that it has the same group ID or user ID throughout the cluster.
  • Page 198: Changing Attributes Of Users In A Cluster

    .../etc/security/passwd file. ... to one cluster node after the other, or use the ... command or the Add a User to the Cluster SMIT screen.
  • Page 199: Managing Group Accounts

    To remove a user account from one or more cluster nodes, you can use either the C-SPOC cl_rmuser command or the ... Cluster SMIT screen. The cl_rmuser command runs the AIX rmuser command on all cluster nodes.
    Note: The system removes the user account but does not remove the home directory or any files owned by the user.
  • Page 201: Chapter 9. Special RS/6000 SP Topics

    ...Availability Workstation”, in the IBM Parallel System Support Programs for AIX Installation and Migration Guide, GA22-7347, or to Chapter 4, “Planning for a High Availability Workstation”, in the IBM RS/6000 SP Planning Volume 2, Control Workstation and Software Environment, GA22-7281.
  • Page 202: Software Requirements

    Contact your IBM representative for the necessary hardware (see Figure 16 on page 184). Both the tty network and the RS/6000 SP internal Ethernet are extended to the backup cws.
  • Page 203: Install High Availability Software

    The backup cws has to be installed with the same level of AIX and PSSP as the primary cws. Depending on the Kerberos configuration of the primary cws, the backup cws has to be configured either as a secondary authentication server for the authentication realm of your RS/6000 SP, when the primary cws is itself an authentication server, or as an authentication client, when the primary cws is an authentication client of some other server.
  • Page 204: Setup And Test HACWS

    Run the command /usr/sbin/hacws/install_hacws -p primary_hostname -b backup_hostname -s on the primary cws to set up HACWS with the two node names. ... [hacws_group1] ... [rotating] ... backup cws”] ... at least the hostname of the primary cws...
  • Page 205: Kerberos Security

    After that, identify the HACWS event scripts to HACMP by executing the /usr/sbin/hacws/spcw_addevents command, and verify the configuration with the /usr/sbin/hacws/hacws_verify command. You should also check the cabling from the backup cws with the /usr/sbin/hacws/spcw_verify_cabling command. Then reboot the primary and the backup cws, one after the other, and start cluster services on the primary cws with .... When cluster services are up and running, check that control workstation services, such as ..., are working as expected.
  • Page 206 The ticket-granting ticket service gives the client systems a ticket with a limited lifetime, whose purpose is to ...
  • Page 207: Configuring Kerberos Security With HACMP Version 4.3

    ...IP label, such as hadave1_stby, and a realm, so that the full principal name looks like godm.hadave1_stby@ITSO.AUSTIN.IBM.COM. After adding all the needed principals to the Kerberos database, you must also add them to the /etc/krb-srvtab file on the nodes.
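The full principal name shown above follows a service.instance@REALM pattern, with the IP label as the instance. `principals` is a hypothetical helper that prints one such name per IP label, which can help when listing the principals to add; only the `godm` service, the `hadave1_stby` label and the realm come from the text, while `hadave1_boot` and `hadave1_svc` are placeholder labels.

```sh
#!/bin/sh
# principals SERVICE REALM LABEL...
# Hypothetical helper: prints SERVICE.LABEL@REALM for each IP label,
# e.g. godm.hadave1_stby@ITSO.AUSTIN.IBM.COM
principals() {
    service=$1
    realm=$2
    shift 2
    for label in "$@"; do
        printf '%s.%s@%s\n' "$service" "$label" "$realm"
    done
}

principals godm ITSO.AUSTIN.IBM.COM hadave1_boot hadave1_svc hadave1_stby
```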
  • Page 208: VSDs - RVSDs

    Importantly, VSD supports only raw logical volumes, not file systems. The VSD facility is included in the ssp.csd.vsd fileset of PSSP. IBM developed VSD to enable Oracle’s parallel database on the SP. Oracle’s database architecture is strongly centralized. Any processing element, or node, must be able to “see”...
  • Page 209 ...I/O transaction sizes to minimize the packetizing workload of the VSD protocol. Buddy buffers are discussed in detail in IBM Parallel System Support Programs for AIX Managing Shared Disks, SA22-7279.
  • Page 210 The distributed data access aspect of VSD scales well. The SP Switch itself provides a very high-bandwidth, scalable interconnect between VSD clients and servers, while the VSD layers of code are efficient. The performance ...
  • Page 211: Recoverable Virtual Shared Disk

    I/O request through VSD relative to the normal VMM/LVM pathway is very small. IBM supports any IP network for VSD, but we recommend the switch for performance. VSD provides distributed data access, but not a locking mechanism to preserve data integrity.
  • Page 212 RVSD recovery activities begin and complete before the recovery of hc client applications takes place. This serialization helps ensure data integrity. Figure 20. RVSD Subsystem and HA Infrastructure (diagram labels: RVSD daemons, rvsd, Group Services, …)
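The serialization described above, where RVSD recovery begins and completes before any client recovery runs, can be sketched with a threading event. This is an illustration under assumptions: threads stand in for the recovery work, and the function and its parameters are hypothetical, not the rvsd or hc interfaces.

```python
import threading
from typing import Callable, List


def serialized_recovery(rvsd_recovery: Callable[[], None],
                        client_recoveries: List[Callable[[], None]]) -> None:
    """Let client recoveries run only after RVSD recovery has fully completed."""
    rvsd_done = threading.Event()

    def client_worker(recover: Callable[[], None]) -> None:
        rvsd_done.wait()  # block until RVSD recovery has finished
        recover()

    workers = [threading.Thread(target=client_worker, args=(fn,), daemon=True)
               for fn in client_recoveries]
    for w in workers:
        w.start()       # clients are started early but cannot proceed yet
    rvsd_recovery()     # RVSD recovery begins and completes first ...
    rvsd_done.set()     # ... and only then are the clients released
    for w in workers:
        w.join()
```

If `rvsd_recovery` raises, the event is never set, so no client recovery runs; the client threads are daemons so they do not keep the process alive.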
  • Page 213: SP Switch as an HACMP Network

    9.4 SP Switch as an HACMP Network
    One of the fascinating things about an RS/6000 SP is the switch network. It has evolved over time, so there are currently two types of switches at customer sites: the “older” HPS or HiPS switch (High Performance Switch), also known as the TB2 switch, and the “newer”...
  • Page 214: Eprimary Management

    Recovery” on page 46. HACMP would be able to recognize the network failure if you configure the switch network as an HACMP network, and would then react with a network_down event, which in turn would shut down the node from HACMP, causing a takeover.
  • Page 215 If this node was the Eprimary node on the switch network, and it is an SP switch, the RS/6000 SP software would also have chosen a new Eprimary, independently of the HACMP software.
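The two reactions described above are independent of each other: HACMP reacts only if the switch is configured as an HACMP network, and the RS/6000 SP software selects a new Eprimary only if the failing node held that role on an SP switch. A minimal sketch of that logic follows. The class, field names and action strings are hypothetical, and the sketch makes no calls into HACMP or PSSP.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SwitchFailure:
    node: str
    switch_is_hacmp_network: bool  # is the switch configured as an HACMP network?
    node_was_eprimary: bool        # was the failing node the Eprimary?
    switch_type: str               # "SP" (SP Switch) or "HPS" (older switch); labels assumed


def reactions(f: SwitchFailure) -> List[str]:
    """List the two independent reactions the text describes for a switch failure."""
    actions: List[str] = []
    if f.switch_is_hacmp_network:
        # HACMP recognizes the failure, runs network_down, and the node
        # is shut down from HACMP, which causes a takeover.
        actions += ["HACMP: network_down", f"HACMP: shut down {f.node}", "HACMP: takeover"]
    if f.node_was_eprimary and f.switch_type == "SP":
        # The SP software picks a new Eprimary on its own, independently of HACMP.
        actions.append("PSSP: choose new Eprimary")
    return actions
```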
  • Page 217: Chapter 10. HACMP Classic vs. HACMP/ES vs. HANFS

    RS/6000 SPs. Since PSSP Version 2.2, RS/6000 SP systems come with the Phoenix technology for managing the availability of the nodes. This technology was already designed as a basic instrument for … interfaces associated with a cluster. Each network module monitors one cluster network using one kind of communication protocol (for example, Ethernet or FDDI).
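The statement that each network module watches exactly one network over one protocol can be illustrated with a module that counts missed heartbeats per peer. This is a sketch under assumptions: heartbeat-based detection, the `max_missed` threshold of 3 and all names are illustrative choices, not values or interfaces taken from Phoenix or HACMP/ES.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class NetworkModule:
    """Hypothetical model: one module per cluster network, one protocol each."""
    network: str
    protocol: str                  # e.g. "ethernet" or "fddi"
    max_missed: int = 3            # assumed threshold, not a documented default
    missed: Dict[str, int] = field(default_factory=dict)

    def tick(self, heard_from: Set[str], peers: Set[str]) -> Set[str]:
        """Record one heartbeat interval and return the peers now considered failed."""
        failed: Set[str] = set()
        for peer in peers:
            if peer in heard_from:
                self.missed[peer] = 0  # a heartbeat arrived, reset the count
            else:
                self.missed[peer] = self.missed.get(peer, 0) + 1
                if self.missed[peer] >= self.max_missed:
                    failed.add(peer)
        return failed
```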
  • Page 218: IBM RISC System Cluster Technology (RSCT)

    RS/6000s as of Version 4.3.
    10.2.1 IBM RISC System Cluster Technology (RSCT)
    The High Availability services previously packaged with the IBM PSSP for AIX Availability Services, also known as the ssp.ha fileset, are now an integral part of the HACMP/ES software.
  • Page 219: Enhanced Cluster Security

    See Part 4 of HACMP for AIX, Version 4.3: Enhanced Scalability Installation and Administration Guide, SC23-4284, for more information on these services.
    10.2.2 Enhanced Cluster Security
    HACMP Version 4.3 comes with an option to switch the security mode between Standard and Enhanced. In Standard mode, synchronization is done through the … facilities.
  • Page 220: Similarities and Differences

    If you still use the “old” HPS Switch and don’t want to lose its functionality, you are restricted to PSSP 2.4 or lower. Therefore, HACMP Classic or HACMP/ES up to Version 4.2.2 is your only choice.
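The constraint for sites that keep the old HPS switch can be written as a version check. The sketch assumes that "up to Version 4.2.2" applies to both HACMP Classic and HACMP/ES, compares only major.minor for PSSP so that 2.4.x counts as 2.4, and uses hypothetical product labels.

```python
from typing import Tuple


def allowed_with_hps(pssp: Tuple[int, ...], product: str, hacmp: Tuple[int, ...]) -> bool:
    """Return True if a PSSP/HACMP combination keeps the old HPS switch usable."""
    if product not in ("classic", "es"):  # hypothetical labels for HACMP Classic / HACMP/ES
        raise ValueError(f"unknown product: {product!r}")
    # PSSP 2.4 or lower; compare major.minor only so that 2.4.x counts as 2.4.
    pssp_ok = pssp[:2] <= (2, 4)
    # HACMP Classic or HACMP/ES up to Version 4.2.2.
    hacmp_ok = hacmp <= (4, 2, 2)
    return pssp_ok and hacmp_ok
```

Python compares tuples element by element, so `(4, 3)` is greater than `(4, 2, 2)` and is rejected.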
  • Page 221 For switchless RS/6000 SP systems or SPs with the newer SP Switch, the decision will be based more on functionality. Event Management is much more flexible in HACMP/ES, since you can define custom events. These events can act on anything that haemd can detect, which is virtually anything measurable on an AIX system.
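As an illustration of defining a custom event on something measurable, the sketch below fires when a monitored value satisfies a predicate and then stays quiet until a second, re-arm condition is met. Everything here is hypothetical: it does not use the haemd interface, and the re-arm condition is a design choice made so that a value that stays high does not fire on every sample.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class CustomEvent:
    """Hypothetical custom event: fire when a monitored value meets a predicate."""
    name: str
    resource_variable: str              # name of the monitored quantity (illustrative)
    predicate: Callable[[float], bool]  # condition that fires the event
    rearm: Callable[[float], bool]      # condition that re-enables it after firing
    armed: bool = field(default=True, init=False)

    def observe(self, value: float) -> bool:
        """Feed one observed value; return True only when the event fires."""
        if self.armed and self.predicate(value):
            self.armed = False  # do not fire again on every following sample
            return True
        if not self.armed and self.rearm(value):
            self.armed = True
        return False
```

For example, an event with predicate `v > 90` and re-arm `v < 80`, fed the values 50, 95, 97, 85, 75, 92, fires only at 95 and at 92.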
  • Page 223: Appendix A. Special Notices

    References in this publication to IBM products, programs or services do not imply that IBM intends to make these available in all countries in which IBM operates. Any reference to an IBM product, program, or service is not intended to state or imply that only IBM’s product, program, or service may be...
  • Page 224 IBM for accuracy in a specific situation, there is no guarantee that the same or similar results will be obtained elsewhere. Customers attempting to adapt these techniques to their own environments do so at their own risk.
  • Page 225 Microsoft Corporation. PC Direct is a trademark of Ziff Communications Company and is used by IBM Corporation under license. Pentium, MMX, ProShare, LANDesk, and ActionMedia are trademarks or registered trademarks of Intel Corporation in the U.S. and other countries.
  • Page 227: Appendix B. Related Publications

    • HACMP/6000 Customization Examples, SG24-4498
    • High Availability on the RISC System/6000 Family, SG24-4551
    • Inside the RS/6000 SP, SG24-5145
    • Monitoring and Managing IBM SSA Disk Subsystems, SG24-5251
    • AIX Version 4.3 Migration Guide, SG24-5116
    B.2 Redbooks on CD-ROMs
    Redbooks are also available on CD-ROMs.
  • Page 228: Other Publications

    B.3 Other Publications
    These publications are also relevant as additional sources of information:
    • IBM RS/6000 SP: Planning, Volume 2, Control Workstation and Software Environment, GA22-7281
    • IBM PSSP for AIX: Installation and Migration Guide, GA22-7347
    • IBM PSSP for AIX: Managing Shared Disks, SA22-7279
    • ...
  • Page 229: How to Get ITSO Redbooks

    How to Get ITSO Redbooks
    This section explains how both customers and IBM employees can find out about ITSO redbooks, CD-ROMs, workshops, and residencies. A form for ordering books and CD-ROMs is also provided. This information was current at the time of publication, but is continually subject to change. The latest information may be found at http://www.redbooks.ibm.com/.
  • Page 230: How Customers Can Get ITSO Redbooks

    United States (toll free), Canada, Outside North America
    • 1-800-IBM-4FAX (United States) or (+1) 408 256 5422 (Outside USA) – ask for:
      Index # 4421 Abstracts of new redbooks
      Index # 4422 IBM redbooks
      Index # 4420 Redbooks for last six months
    • ...
  • Page 231: IBM Redbook Order Form

    IBM Redbook Order Form Please send me the following: Title First name Company Address City Telephone number Invoice to customer number Credit card number Credit card expiration date We accept American Express, Diners, Eurocard, Master Card, and Visa. Payment by credit card not available in all countries.
  • Page 233: List of Abbreviations

    Domain Name Service DSMIT Distributed System Management Interface Tool FDDI Fiber Distributed Data Interface Fast and Wide (SCSI) Gigabyte GODM Global Object Data Manager Graphical User Interface HACMP High Availability Cluster Multi-Processing HANFS High Availability...
  • Page 234 SNMP (see below) Multiplexor Systems Network Architecture SNMP Simple Network Management Protocol SOCC Serial Optical Channel Converter SPOF Single Point of Failure SPX/IPX Sequenced Package Exchange/Internetwork Packet Exchange System Resource Controller Serial Storage Architecture...
  • Page 235: Index

    Cascading Resource Group 29 cascading resource groups NFS crossmounting issues 126 changing user accounts 180 cl_lsuser command using 179 cl_mkuser command using 179 cldare command 172 clfindres 173 clinfo 156 cllockd 155 clsmuxpd 155 clstat 152...
  • Page 236 HACWS 183 HANFS for AIX 201 Hardware Address Swapping 12 hardware address swapping 40 planning 40 HAView 151 heartbeats 11 home directories 49 Hot Standby Configuration 30 hot standby configuration 30 I/O Pacing 56...
  • Page 237 Rootvg Mirroring 51 Rotating Resource Group 29 Rotating Standby Configuration 31 rotating standby configuration 31 RS232 15 RSCT 200 Rules IBM 7190 21 Rules for SSA Loops 20 Run-Time Parameters 110 RVSD 193 SCSI target mode 38 SCSI Disks 26...
  • Page 238 Token-Ring 13 Topology Service 200 topsvcsd 156 Upgrading 96 user accounts adding 179 changing 180 creating 179 removing 180 User and Group IDs 48 VGDA 88 VGSA 88 Virtual Shared Disk (VSDs) 190 xhacmpm 101
  • Page 239: ITSO Redbook Evaluation

    • Use the online evaluation form found at http://www.redbooks.ibm.com
    • Fax this form to: USA International Access Code + 1 914 432 8264
    • Send your comments in an Internet note to redbook@us.ibm.com
    Which of the following best describes you?
