Tuesday, 28 September 2010

RAC Clusterware Startup Sequence





Clusterware Startup Sequence

The following is the Clusterware startup sequence (image from the "Oracle Clusterware Administration and Deployment Guide"):



Don't let this picture scare you too much. You aren't responsible for managing all of these processes; that is the Clusterware's job!

Short summary of the startup sequence: INIT spawns init.ohasd (with respawn) which in turn starts the OHASD process (Oracle High Availability Services Daemon). This daemon spawns 4 processes.

Level 1: OHASD Spawns:

  • cssdagent - Agent responsible for spawning CSSD.
  • orarootagent - Agent responsible for managing all root owned ohasd resources.
  • oraagent - Agent responsible for managing all oracle owned ohasd resources.
  • cssdmonitor - Monitors CSSD and node health (along with the cssdagent).

Level 2: OHASD rootagent spawns:

  • CRSD - Primary daemon responsible for managing cluster resources.
  • CTSSD - Cluster Time Synchronization Services Daemon
  • Diskmon
  • ACFS (ASM Cluster File System) Drivers

Level 2: OHASD oraagent spawns:

  • MDNSD - Used for DNS lookup
  • GIPCD - Used for inter-process and inter-node communication
  • GPNPD - Grid Plug & Play Profile Daemon
  • EVMD - Event Monitor Daemon
  • ASM - Resource for monitoring ASM instances

Level 3: CRSD spawns:

  • orarootagent - Agent responsible for managing all root owned crsd resources.
  • oraagent - Agent responsible for managing all oracle owned crsd resources.

Level 4: CRSD rootagent spawns:

  • Network resource - To monitor the public network
  • SCAN VIP(s) - Single Client Access Name Virtual IPs
  • Node VIPs - One per node
  • ACFS Registry - For mounting ASM Cluster File System
  • GNS VIP (optional) - VIP for GNS

Level 4: CRSD oraagent spawns:

  • ASM Resource - ASM Instance(s) resource
  • Diskgroup - Used for managing/monitoring ASM diskgroups.
  • DB Resource - Used for monitoring and managing the DB and instances
  • SCAN Listener - Listener for single client access name, listening on SCAN VIP
  • Listener - Node listener listening on the Node VIP
  • Services - Used for monitoring and managing services
  • ONS - Oracle Notification Service
  • eONS - Enhanced Oracle Notification Service
  • GSD - For 9i backward compatibility
  • GNS (optional) - Grid Naming Service - Performs name resolution

Top image shows the various levels more clearly:


Important Log Locations

Clusterware daemon logs are all under <GRID_HOME>/log/<nodename>. Structure under <GRID_HOME>/log/<nodename>:

alert<NODENAME>.log - look here first for most clusterware issues
./admin:
./agent:
./agent/crsd:
./agent/crsd/oraagent_oracle:
./agent/crsd/ora_oc4j_type_oracle:
./agent/crsd/orarootagent_root:
./agent/ohasd:
./agent/ohasd/oraagent_oracle:
./agent/ohasd/oracssdagent_root:
./agent/ohasd/oracssdmonitor_root:
./agent/ohasd/orarootagent_root:
./client:
./crsd:
./cssd:
./ctssd:
./diskmon:
./evmd:
./gipcd:
./gnsd:
./gpnpd:
./mdnsd:
./ohasd:
./racg:
./racg/racgeut:
./racg/racgevtf:
./racg/racgmain:
./srvm:
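
If you open the alert log often, its path can be assembled from the layout above. This is only a minimal sketch: the function name alert_log_path and the example Grid home are made up for illustration, and the file name simply follows the alert<NODENAME>.log pattern listed above.

```shell
#!/bin/bash
# Hypothetical helper (not an Oracle tool): build the clusterware alert log
# path from the layout <GRID_HOME>/log/<nodename>/alert<nodename>.log
alert_log_path() {
  local grid_home=$1 node=$2
  printf '%s/log/%s/alert%s.log\n' "$grid_home" "$node" "$node"
}

# Example with an assumed Grid home and node name; substitute your own.
alert_log_path /u01/app/11.2.0/grid node1-pub
```

For example, tail -f "$(alert_log_path "$GRID_HOME" "$(hostname -s)")" would follow the local alert log, assuming GRID_HOME is set and the node name matches the short hostname.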

The cfgtoollogs directory under <GRID_HOME> and $ORACLE_BASE contains other important log files, specifically those for rootcrs.pl and configuration assistants such as ASMCA.

ASM logs live under $ORACLE_BASE/diag/asm/+asm/<ASM Instance Name>/trace

The diagcollection.pl script under <GRID_HOME>/bin can be used to automatically collect important files for support. Run this as the root user.


11gR2 Clusterware and Grid Home - What You Need to Know [ID 1053147.1]

Friday, 24 September 2010

How to map ASMLIB disk to device name


I have been asked this question a few times now, so here it goes.

When using ASMLIB to manage ASM disks, the device path info is not in gv$asm_disk.path.

If you are using ASMLIB Support Tools 2.1 or later (package oracleasm-support-2.1* or later), you can get that info by running 'oracleasm querydisk -p' as root:

# ls -l /dev/oracleasm/disks
total 0
brw-rw---- 1 grid asmadmin 8,  5 May  2 12:00 DISK1
brw-rw---- 1 grid asmadmin 8,  6 May  2 12:00 DISK2
brw-rw---- 1 grid asmadmin 8,  7 May  2 12:00 DISK3
...

# oracleasm querydisk -p DISK1
Disk "DISK1" is a valid ASM disk
/dev/sda5: LABEL="DISK1" TYPE="oracleasm"


Otherwise, that info can be obtained with a shell script like this:

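#!/bin/bash
# Map each ASMLIB disk to its ASM disk name and underlying block device.
for asmlibdisk in /dev/oracleasm/disks/*
do
  echo "ASMLIB disk name: $asmlibdisk"
  asmdisk=$(kfed read "$asmlibdisk" | grep dskname | tr -s ' ' | cut -f2 -d' ')
  echo "ASM disk name: $asmdisk"
  # "major, minor" pair as printed by ls -l, e.g. "8, 5"
  majorminor=$(ls -l "$asmlibdisk" | tr -s ' ' | cut -f5,6 -d' ')
  # Surrounding spaces keep "8, 5" from also matching "8, 50"
  device=$(ls -l /dev | tr -s ' ' | grep " $majorminor " | cut -f10 -d' ')
  echo "Device path: /dev/$device"
done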


The script can be run as the OS user that owns the ASM or Grid Infrastructure home, i.e. it does not need to be run as a privileged user. The only requirement is that the kfed binary exists and is in the PATH.

If an ASMLIB disk was already deleted, it will not show up in /dev/oracleasm/disks. I can check for devices that are (or were) associated with ASM with a script like this:

#!/bin/bash
for device in `ls /dev/sd*`
do
  asmdisk=`kfed read $device | grep ORCLDISK | tr -s ' '| cut -f2 -d' ' | cut -c1-8`
  if [ "$asmdisk" = "ORCLDISK" ]
  then
    echo "Disk device $device may be an ASM disk"
  fi
done


This script takes a peek at sd devices in /dev, so in addition to having kfed in the PATH, it needs to be run as a privileged user. Of course you can look at /dev/dm*, /dev/mapper, etc., or all devices in /dev, although that may not be a good idea.

Monday, 6 September 2010

ASMLIB vs EMCPowerPath

I got curious about this after reading a question on an Oracle forum. When you check a multipath device with ASMLib, the output can be confusing:

Example
(Assume EMC Storage):

# /etc/init.d/oracleasm querydisk -d /dev/emcpowera1
Device "/dev/emcpowera1" is marked an ASM disk with the label "ARC1"

# /etc/init.d/oracleasm querydisk -d ARC1
Disk "ARC1" is a valid ASM disk on device [8, 38]

That shows the major and minor numbers: 8, 38.
So, check the multipath device:

# ls -la /dev/emcpowera1
brw------- 1 root root 120, 1 Jun 11 17:01 /dev/emcpowera1

That is confusing ;) because /dev/emcpowera1 shows major and minor numbers 120, 1.

What does that mean? ASMLib scanned a single path and reported that device instead of the multipath device.
Really?

# ls -la /dev/sd* | grep 38
brw-rw---- 1 root disk 8, 38 Jun 11 17:01 /dev/sdc1

# powermt display dev=emcpowera
Pseudo name=emcpowera
Owner: default=SP A, current=SP A Array failover mode: 1
=================================================================
---------------- Host --------------- - Stor - -- I/O Path - -- Stats ---
### HW Path I/O Paths Interf. Mode State Q-IOs Errors
=================================================================
1 qla2xxx sdc SP A0 active alive 0 0
1 qla2xxx sde SP A1 active alive 0 0
1 qla2xxx sdg SP B0 active alive 0 0
1 qla2xxx sdi SP B1 active alive 0 0

That shows the single paths (sdc, sde, sdg, sdi) behind /dev/emcpowera ;)

How do we exclude the single paths from the scan? That's a good question; the Oracle page linked at the end of this post explains:

Excluding Single Path Disks

The system administrator configures ASMLib to ignore the single path disks. In the ASMLib configuration, he edits the ORACLEASM_SCANEXCLUDE variable to look like so:

ORACLEASM_SCANEXCLUDE="sdb sdc"


Here, the system administrator has been more specific. ASMLib should ignore exactly the disks /dev/sdb and /dev/sdc. It should not ignore other SCSI disks. While scanning, ASMLib will ignore those paths, only seeing the /dev/multipath disk. Once again, Oracle will use the multipath disk.

So, let's apply it to the example:

Edit the /etc/sysconfig/oracleasm file.

# vi /etc/sysconfig/oracleasm
ORACLEASM_SCANEXCLUDE=""

change to =>
ORACLEASM_SCANEXCLUDE="sd"


And then restart:
# /etc/init.d/oracleasm restart
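
To see why "sd" excludes every single path while "sdb sdc" excludes only two disks, here is a small illustration of prefix matching against a space-separated list. It assumes the exclude entries are matched as device-name prefixes, which is what both settings above imply; the function name is_excluded is hypothetical and this is not ASMLib's own code.

```shell
#!/bin/bash
# Illustration only: mimic prefix matching of a device name against a
# space-separated exclude list. This is NOT how ASMLib is implemented.
is_excluded() {
  local dev=$1 prefix
  # $2 is deliberately unquoted so it splits into individual prefixes
  for prefix in $2; do
    case $dev in
      "$prefix"*) return 0 ;;
    esac
  done
  return 1
}

# With the exclude list "sd", every sd* path is skipped, but the
# PowerPath pseudo device is still scanned.
for dev in sdc sde sdg sdi emcpowera; do
  if is_excluded "$dev" "sd"; then
    echo "$dev: skipped"
  else
    echo "$dev: scanned"
  fi
done
```

With the list "sdb sdc" instead, sde, sdg and sdi would still be scanned, which is why the broader "sd" prefix is used in this example.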


So, check again:

# /etc/init.d/oracleasm querydisk /dev/emcpowera1
Device "/dev/emcpowera1" is marked an ASM disk with the label "ARC1"

# /etc/init.d/oracleasm querydisk -d ARC1
Disk "ARC1" is a valid ASM disk on device [120, 1]

# ls -al /dev/emcpowera1
brw------- 1 root root 120, 1 Jun 12 18:33 /dev/emcpowera1

ASMLib now shows major and minor numbers 120, 1, the same as the multipath device (ASMLib no longer scans the single paths) ;)

Metalink Note 309815.1


http://www.oracle.com/technetwork/topics/linux/multipath-097959.html

ASMLib Troubleshooting

I've noticed a few forum questions regarding ASM or indeed the OUI not being able to see devices that are managed via ASMLib. This prompted me to "upgrade" my knowledge of ASMLib and this blog is just a few extra tools for checking on your ASMLib devices.

By the way, anyone out there thinking ASMLib is not getting a whole lot of love from Oracle of late? The latest updates on the ASMLib page seem to be from early 2007.

Anyway, the first troubleshooting tip is a simple one: make sure you have all three ASMLib rpms:


# rpm -qa |grep asm
oracleasm-support-2.0.3-1
oracleasmlib-2.0.2-1
oracleasm-2.6.9-22.ELsmp-2.0.3-1

You get odd behaviour without all of 'em. So what does each of these provide?


# rpm -ql oracleasm-support
/etc/init.d/oracleasm
/etc/sysconfig/oracleasm
/usr/lib/oracleasm/oracleasm_debug_link
/usr/sbin/asmscan
/usr/sbin/asmtool

So the init.d oracleasm script is really where you configure disks and includes various options, like listing disks and querying. This is actually just a shell script that calls the executables asmscan and asmtool. There is a configuration file in /etc/sysconfig where you can change things like the pattern to scan for devices and you also have the ability to exclude devices using this configuration file. Excluding devices and explicitly setting the scanorder can be useful for multipath devices.

Once you have run /etc/init.d/oracleasm configure you should see a new device:


# df -ha |grep asm
oracleasmfs 0 0 0 - /dev/oracleasm


# rpm -ql oracleasmlib
/opt/oracle/extapi
/opt/oracle/extapi/64
/opt/oracle/extapi/64/asm
/opt/oracle/extapi/64/asm/orcl
/opt/oracle/extapi/64/asm/orcl/1
/opt/oracle/extapi/64/asm/orcl/1/libasm.so
/usr/sbin/oracleasm-discover

So this rpm provides you with a library and an executable. Running the executable once you have configured devices is kinda nice:


# /usr/sbin/oracleasm-discover
Using ASMLib from /opt/oracle/extapi/64/asm/orcl/1/libasm.so
[ASM Library - Generic Linux, version 2.0.2 (KABI_V2)]
Discovered disk: ORCL:VOL1 [121634784 blocks (62277009408 bytes), maxio 512]
Discovered disk: ORCL:VOL2 [20971488 blocks (10737401856 bytes), maxio 512]
Discovered disk: ORCL:VOL3 [20971488 blocks (10737401856 bytes), maxio 512]
Discovered disk: ORCL:VOL4 [419424957 blocks (214745577984 bytes), maxio 512]

The final rpm is the kernel module:


# rpm -ql oracleasm-2.6.9-22.ELsmp
/lib/modules/2.6.9-22.ELsmp/kernel/drivers/addon/oracleasm
/lib/modules/2.6.9-22.ELsmp/kernel/drivers/addon/oracleasm/oracleasm.ko

You want to ensure that the oracleasm module has been loaded by the kernel:


# /sbin/lsmod |grep oracleasm
oracleasm 55176 1

You can find information about the module with modinfo:


# /sbin/modinfo oracleasm
filename: /lib/modules/2.6.9-22.ELsmp/kernel/drivers/addon/oracleasm/oracleasm.ko
description: Kernel driver backing the Generic Linux ASM Library.
author: Joel Becker
version: 2.0.3
license: GPL
depends:
vermagic: 2.6.9-22.ELsmp SMP gcc-3.4

Make sure the devices you are trying to use are known to the kernel; you can check in /dev/ or look in /proc/partitions. ASMLib likes to work on partitions, which you can create on a device using fdisk.

A list of devices marked by ASMLib is generated with:


# /etc/init.d/oracleasm listdisks
VOL1
VOL2
VOL3
VOL4

You can cross-reference this with what is in the /dev/oracleasm/disks directory:


# ls -l /dev/oracleasm/disks/
total 0
brw-rw---- 1 oracle oinstall 8, 17 Jun 24 09:13 VOL1
brw-rw---- 1 oracle oinstall 8, 49 Jun 24 09:13 VOL2
brw-rw---- 1 oracle oinstall 8, 65 Jun 24 09:13 VOL3
brw-rw---- 1 oracle oinstall 8, 97 Jun 24 09:13 VOL4

You can use querydisk to determine which device a particular ASMLib Volume corresponds to:


# /etc/init.d/oracleasm querydisk -d VOL1
Disk "VOL1" is a valid ASM disk on device [8, 17]

You can find out which devices this represents with the following:


# grep "8 17" /proc/partitions
8 17 60817392 sdb1
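
A plain grep for "8 17" would also match a line such as "8 170", so an exact comparison on the first two columns is a little safer. Here is a sketch using awk; the function name dev_for_majmin and the second sample row are my own additions for illustration.

```shell
#!/bin/bash
# Hypothetical helper: print the device whose major and minor numbers match
# exactly. Input is in /proc/partitions format: major minor #blocks name.
# An optional third argument names a file to read instead of /proc/partitions.
dev_for_majmin() {
  awk -v maj="$1" -v min="$2" '$1 == maj && $2 == min { print $4 }' \
    "${3:-/proc/partitions}"
}

# Demo against sample data so the sketch runs on any machine.
sample=$(mktemp)
cat > "$sample" <<'EOF'
major minor  #blocks  name

   8    17   60817392 sdb1
   8   170   10485760 sdk10
EOF
dev_for_majmin 8 17 "$sample"   # prints sdb1 only
rm -f "$sample"
```

On a real system you would call dev_for_majmin 8 17 with no third argument, using the numbers reported by querydisk -d.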

Still paranoid that this might not be your device? Check the contents of the disk header:


# od -c /dev/sdb1 |head -10
0000000 001 202 001 001 200 036 - W 310
0000020
0000040 O R C L D I S K V O L 1
0000060
0000100 020 \n 001 003 V O L 1
0000120
0000140 D A T A 1
0000160
0000200 V O L 1
0000220

There is also a neat trick with blkid which shows the disk headers:


#./blkid|grep asm
/dev/sdb1: LABEL="VOL1" TYPE="oracleasm"
/dev/sdd1: LABEL="VOL2" TYPE="oracleasm"
/dev/sde1: LABEL="VOL3" TYPE="oracleasm"
/dev/sdg1: LABEL="VOL4" TYPE="oracleasm"
/dev/sdo1: LABEL="VOL1" TYPE="oracleasm"
/dev/sdq1: LABEL="VOL2" TYPE="oracleasm"
/dev/sdr1: LABEL="VOL3" TYPE="oracleasm"
/dev/sdt1: LABEL="VOL4" TYPE="oracleasm"
/dev/emcpowerf1: LABEL="VOL4" TYPE="oracleasm"
/dev/emcpowerp1: LABEL="VOL3" TYPE="oracleasm"
/dev/emcpowero1: LABEL="VOL2" TYPE="oracleasm"
/dev/emcpowern1: LABEL="VOL1" TYPE="oracleasm"

You can see from the above, that I have multiple devices corresponding to the same physical device and I am using EMC Powerpath as the multipathing software.

Note that not all versions of blkid (well, it's actually the e2fsprogs version) pick up oracleasm as a type.

As you can see, there are various techniques to check which devices you have configured via ASMLib for use with your ASM instance!

Wednesday, 1 September 2010

11g RAC Administration and Maintenance Tasks and Utilities:


Checking CRS Status:


The two commands below are generally used to check the status of CRS. The first command shows the status of CRS
on the local node, whereas the second shows the CRS status across all the nodes in the cluster.


crsctl check crs <<-- for the local node
crsctl check cluster <<-- for remote nodes in the cluster

[root@node1-pub ~]# crsctl check crs
Cluster Synchronization Services appears healthy
Cluster Ready Services appears healthy
Event Manager appears healthy
[root@node1-pub ~]# 

Checking Viability of CSS across nodes:

crsctl check cluster

For this command to run, CSS needs to be running on the local node. The "ONLINE" status for remote node says that CSS is running on that node.
When CSS is down on the remote node, the status of "OFFLINE" is displayed for that node.


[root@node1-pub ~]# crsctl check cluster
node1-pub    ONLINE
node2-pub    ONLINE
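
If you want to script this check, the output can be filtered for nodes that are not ONLINE. This is only a sketch that assumes the two-column output format shown above (other Clusterware versions print different text); the function name offline_nodes is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read "crsctl check cluster" output in the format shown
# above (node name, then ONLINE or OFFLINE) and print the OFFLINE nodes.
offline_nodes() {
  awk '$2 == "OFFLINE" { print $1 }'
}

# Demo with sample text; on a cluster node, pipe the real command instead:
#   crsctl check cluster | offline_nodes
offline_nodes <<'EOF'
node1-pub    ONLINE
node2-pub    OFFLINE
EOF
```

An empty result means CSS is reported as running on every node.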
 

Viewing Cluster name:


I use the command below to get the name of the cluster. You can also dump the OCR and view the name in the dump file.


ocrdump -stdout -keyname SYSTEM | grep -A 1 clustername | grep ORATEXT | awk '{print $3}'

[root@node1-pub ~]# ocrdump -stdout -keyname SYSTEM | grep -A 1 clustername | grep ORATEXT | awk '{print $3}'
test-crs
[root@node1-pub ~]# 

OR

ocrconfig -export /tmp/ocr_exp.dat -s online
for i in `strings /tmp/ocr_exp.dat | grep -A 1 clustername` ; do if [ $i != 'SYSTEM.css.clustername' ]; then echo $i; fi; done



[root@node1-pub ~]# ocrconfig -export /tmp/ocr_exp.dat -s online
[root@node1-pub ~]# for i in `strings /tmp/ocr_exp.dat | grep -A 1 clustername` ; do if [ $i != 'SYSTEM.css.clustername' ]; then echo $i; fi; done
test-crs
[root@node1-pub ~]# 


OR

Oracle creates a directory with the same name as the cluster under $ORA_CRS_HOME/cdata. You can get the cluster name from this directory as well.

[root@node1-pub ~]# ls /u01/app/crs/cdata
localhost  
test-crs

Viewing No. Of Nodes configured in Cluster:


The below command can be used to find out the number of nodes registered into the cluster.
It also displays the node's Public name, Private name and Virtual name along with their numbers.


olsnodes -n -p -i

[root@node1-pub ~]# olsnodes -n -p -i 
node1-pub       1       node1-prv       node1-vip
node2-pub       2       node2-prv       node2-vip

Viewing Votedisk Information:


The command below shows the number of votedisks configured in the cluster.


crsctl query css votedisk

[root@node1-pub ~]# crsctl query css votedisk
 0.     0    /u02/ocfs2/vote/VDFile_0
 1.     0    /u02/ocfs2/vote/VDFile_1
 2.     0    /u02/ocfs2/vote/VDFile_2
Located 3 voting disk(s).
[root@node1-pub ~]# 


Viewing OCR Disk Information:


The command below shows the number of OCR files configured in the cluster. It also displays the OCR version
and storage space information. You can have at most 2 OCR files.


ocrcheck


[root@node1-pub ~]# ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       3848
         Available space (kbytes) :     258272
         ID                       :  744414276
         Device/File Name         : /u02/ocfs2/ocr/OCRfile_0
                                    Device/File integrity check succeeded
         Device/File Name         : /u02/ocfs2/ocr/OCRfile_1
                                    Device/File integrity check succeeded
 
         Cluster registry integrity check succeeded
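
The space figures can be turned into a usage percentage if you want to keep an eye on OCR growth. This is a sketch only; it assumes the "Total space" and "Used space" lines look like the output above, and the function name ocr_used_pct is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: compute OCR space usage in percent from ocrcheck
# output in the format shown above.
ocr_used_pct() {
  awk -F: '
    /Total space/ { total = $2 + 0 }
    /Used space/  { used  = $2 + 0 }
    END { if (total > 0) printf "%.2f\n", used * 100 / total }
  '
}

# Demo with the figures from the output above; on a cluster node run:
#   ocrcheck | ocr_used_pct
ocr_used_pct <<'EOF'
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       3848
         Available space (kbytes) :     258272
EOF
```

For the sample above this prints 1.47, i.e. the OCR is about 1.5% full.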
 

Various Timeout Settings in Cluster:


Disktimeout: 
    Maximum disk latency, in seconds, from node to votedisk. Default value is 200. (Disk IO)
Misscount: 
    Maximum network latency, in seconds, from node to node (interconnect). Default value is 60 seconds (Linux, 10g R1/R2) and 30 seconds (Linux, Unix, VMS, Windows; 11g). (Network IO)
    Misscount < Disktimeout

NOTE: Do not change them without contacting Oracle Support. This may cause logical corruption to the Data.


IF
  (Disk IO Time > Disktimeout) OR (Network IO time > Misscount)

THEN
   REBOOT NODE
ELSE
   DO NOT REBOOT
END IF;
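
The rule above translates directly into a few lines of shell. This is purely an illustration of the decision logic, not how CSS implements it; the function name would_reboot and the sample latencies are made up.

```shell
#!/bin/bash
# Illustration of the rule above. All four arguments are whole seconds:
#   disk IO time, network IO time, disktimeout, misscount
# Returns 0 (true) when the rule says the node would be rebooted.
would_reboot() {
  local disk_io=$1 net_io=$2 disktimeout=$3 misscount=$4
  [ "$disk_io" -gt "$disktimeout" ] || [ "$net_io" -gt "$misscount" ]
}

# Disk IO is fine (5 <= 200) but the network heartbeat is late (75 > 60).
if would_reboot 5 75 200 60; then
  echo "REBOOT NODE"
else
  echo "DO NOT REBOOT"
fi
```

Either condition on its own is enough to trigger the reboot, which is why a slow interconnect can evict a node even when the votedisks respond quickly.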


crsctl get css disktimeout
crsctl get css misscount
crsctl get css  reboottime

[root@node1-pub ~]# crsctl get css disktimeout
200

[root@node1-pub ~]# crsctl get css misscount
Configuration parameter misscount is not defined.
 <<<<< This message indicates that misscount has not been set manually and is at its default value. On Linux, the default is 60 seconds. If you want to change it, you can do so as shown below (not recommended).


[root@node1-pub ~]# crsctl set css misscount 100
Configuration parameter misscount is now set to 100.
[root@node1-pub ~]# crsctl get css misscount
100


The command below sets misscount back to its default value:


 crsctl unset css misscount 

[root@node1-pub ~]# crsctl unset css misscount

[root@node1-pub ~]# crsctl get css  reboottime
3

Add/Remove OCR file in Cluster:


Removing OCR File

(1) Get the Existing OCR file information by running ocrcheck utility.

[root@node1-pub ~]# ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       3852
         Available space (kbytes) :     258268
         ID                       :  744414276
         
         Device/File Name         : /u02/ocfs2/ocr/OCRfile_0 <-- OCR
                                    Device/File integrity check succeeded
         Device/File Name         : /u02/ocfs2/ocr/OCRfile_1 <-- OCR Mirror
                                    Device/File integrity check succeeded
 
         Cluster registry integrity check succeeded
 
(2) The first command below removes the OCR mirror (/u02/ocfs2/ocr/OCRfile_1). If you want to remove the OCR
      file (/u02/ocfs2/ocr/OCRfile_0), run the second command.

ocrconfig -replace ocrmirror
ocrconfig -replace ocr

[root@node1-pub ~]# ocrconfig -replace ocrmirror 
[root@node1-pub ~]# ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       3852
         Available space (kbytes) :     258268
         ID                       :  744414276
         Device/File Name         : /u02/ocfs2/ocr/OCRfile_0 <<-- OCR File
                                    Device/File integrity check succeeded
 
                                    Device/File not configured  <-- OCR Mirror no longer exists
 
         Cluster registry integrity check succeeded

Adding OCR

You need to add an OCR or OCR mirror file when you want to move the existing OCR file to different devices.
The command below adds the OCR mirror file if the OCR file already exists.

(1) Get the Current status of OCR:

 
[root@node1-pub ~]# ocrconfig -replace ocrmirror 
[root@node1-pub ~]# ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       3852
         Available space (kbytes) :     258268
         ID                       :  744414276
         Device/File Name         : /u02/ocfs2/ocr/OCRfile_0 <<-- OCR File
                                    Device/File integrity check succeeded
 
                                    Device/File not configured  <-- OCR Mirror does not exist
 
         Cluster registry integrity check succeeded

As you can see, I have only one OCR file and no second file (the OCR mirror).
So, I can add a second OCR (the OCR mirror) with the command below.


ocrconfig -replace ocrmirror <File name>

[root@node1-pub ~]# ocrconfig -replace ocrmirror /u02/ocfs2/ocr/OCRfile_1
[root@node1-pub ~]# ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       3852
         Available space (kbytes) :     258268
         ID                       :  744414276
         Device/File Name         : /u02/ocfs2/ocr/OCRfile_0
                                    Device/File integrity check succeeded
         Device/File Name         : /u02/ocfs2/ocr/OCRfile_1
                                    Device/File integrity check succeeded
 
         Cluster registry integrity check succeeded

You can have at most 2 OCR devices (the OCR itself and its single mirror) in a cluster. Adding an extra mirror gives you the error message below:
 
[root@node1-pub ~]# ocrconfig -replace ocrmirror /u02/ocfs2/ocr/OCRfile_2
PROT-21: Invalid parameter
[root@node1-pub ~]# 

Add/Remove Votedisk file in Cluster:


Adding Votedisk:

Get the existing votedisks associated with the cluster. To be safe, bring the CRS cluster stack down on all the nodes
except the one from which you are going to add the votedisk.

(1) Stop CRS on all the nodes in cluster but one.


[root@node2-pub ~]# crsctl stop crs

(2) Get the list of Existing Vote Disks


crsctl query css votedisk

[root@node1-pub ~]# crsctl query css votedisk
 0.     0    /u02/ocfs2/vote/VDFile_0
 1.     0    /u02/ocfs2/vote/VDFile_1
 2.     0    /u02/ocfs2/vote/VDFile_2
Located 3 voting disk(s).

(3) Back up the votedisk file

Back up the existing votedisks as the oracle user, as shown below:

dd if=/u02/ocfs2/vote/VDFile_0 of=$ORACLE_BASE/bkp/vd/VDFile_0

[root@node1-pub ~]# su - oracle
[oracle@node1-pub ~]$ dd if=/u02/ocfs2/vote/VDFile_0 of=$ORACLE_BASE/bkp/vd/VDFile_0
41024+0 records in
41024+0 records out
[oracle@node1-pub ~]$ 

(4) Add an Extra Votedisk into the Cluster: 

    If it is OCFS, then touch the file as oracle. On raw devices, initialize the raw devices using the "dd" command.

touch /u02/ocfs2/vote/VDFile_3 <<-- as oracle
crsctl add css votedisk /u02/ocfs2/vote/VDFile_3 <<-- as oracle
crsctl query css votedisk


[root@node1-pub ~]# su - oracle
[oracle@node1-pub ~]$ touch /u02/ocfs2/vote/VDFile_3
[oracle@node1-pub ~]$ crsctl add css votedisk /u02/ocfs2/vote/VDFile_3
Now formatting voting disk: /u02/ocfs2/vote/VDFile_3.
Successful addition of voting disk /u02/ocfs2/vote/VDFile_3.

(5) Confirm that the file has been added successfully:

[root@node1-pub ~]# ls -l /u02/ocfs2/vote/VDFile_3
-rw-r-----  1 oracle oinstall 21004288 Oct  6 16:31 /u02/ocfs2/vote/VDFile_3
[root@node1-pub ~]# crsctl query css votedisks
Unknown parameter: votedisks
[root@node1-pub ~]# crsctl query css votedisk
 0.     0    /u02/ocfs2/vote/VDFile_0
 1.     0    /u02/ocfs2/vote/VDFile_1
 2.     0    /u02/ocfs2/vote/VDFile_2
 3.     0    /u02/ocfs2/vote/VDFile_3
Located 4 voting disk(s).
[root@node1-pub ~]# 
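
Step (3) above backs up a single votedisk; with several of them it is handy to generate one dd command per file from the query output. This is a sketch that assumes the three-column output format shown above. The function name votedisk_paths and the backup directory are hypothetical, and the demo only prints the dd commands rather than running them.

```shell
#!/bin/bash
# Hypothetical helper: extract votedisk paths from "crsctl query css votedisk"
# output in the format shown above (index, status, path).
votedisk_paths() {
  awk '$1 ~ /^[0-9]+\.$/ { print $3 }'
}

# Assumed example location for the backups; adjust to your environment.
backup_dir=/u01/app/oracle/bkp/vd

# Sample output copied from the listing above.
sample=' 0.     0    /u02/ocfs2/vote/VDFile_0
 1.     0    /u02/ocfs2/vote/VDFile_1
 2.     0    /u02/ocfs2/vote/VDFile_2
Located 3 voting disk(s).'

# Print (do not execute) one backup command per votedisk.
printf '%s\n' "$sample" | votedisk_paths | while read -r vd; do
  echo "dd if=$vd of=$backup_dir/$(basename "$vd")"
done
```

On a cluster node, replace the sample with the live output of crsctl query css votedisk and, once the printed commands look right, drop the echo.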

Removing Votedisk:

Removing a votedisk from the cluster is very simple. The command below removes the given votedisk from the cluster configuration.

crsctl delete css votedisk /u02/ocfs2/vote/VDFile_3

[root@node1-pub ~]# crsctl delete css votedisk /u02/ocfs2/vote/VDFile_3
Successful deletion of voting disk /u02/ocfs2/vote/VDFile_3.
[root@node1-pub ~]# 

[root@node1-pub ~]# crsctl query css votedisk
 0.     0    /u02/ocfs2/vote/VDFile_0
 1.     0    /u02/ocfs2/vote/VDFile_1
 2.     0    /u02/ocfs2/vote/VDFile_2
Located 3 voting disk(s).
[root@node1-pub ~]# 

Backing Up OCR


Oracle performs a physical backup of the OCR every 4 hours into the default backup directory $ORA_CRS_HOME/cdata/<CLUSTER_NAME> 
and then rolls it forward into daily and weekly backups. You can get the backup information by executing the command below. 

ocrconfig -showbackup

[root@node1-pub ~]# ocrconfig -showbackup 

node2-pub     2007/09/03 17:46:47     /u01/app/crs/cdata/test-crs/backup00.ocr
 
node2-pub     2007/09/03 13:46:45     /u01/app/crs/cdata/test-crs/backup01.ocr
 
node2-pub     2007/09/03 09:46:44     /u01/app/crs/cdata/test-crs/backup02.ocr
 
node2-pub     2007/09/03 01:46:39     /u01/app/crs/cdata/test-crs/day.ocr
 
node2-pub     2007/09/03 01:46:39     /u01/app/crs/cdata/test-crs/week.ocr
[root@node1-pub ~]# 
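
To pick the newest backup without reading the list by eye, the output can be sorted on its date and time columns. This is a sketch that assumes the node/date/time/path layout shown above and paths without spaces; the function name latest_ocr_backup is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the path of the most recent backup listed by
# "ocrconfig -showbackup" (format: node, yyyy/mm/dd, hh:mm:ss, path).
latest_ocr_backup() {
  awk '$2 ~ /^[0-9]+\/[0-9]+\/[0-9]+$/ { print $2, $3, $4 }' |
    LC_ALL=C sort | tail -n 1 | cut -d' ' -f3
}

# Demo with the listing above; on a cluster node run:
#   ocrconfig -showbackup | latest_ocr_backup
latest_ocr_backup <<'EOF'
node2-pub     2007/09/03 17:46:47     /u01/app/crs/cdata/test-crs/backup00.ocr
node2-pub     2007/09/03 13:46:45     /u01/app/crs/cdata/test-crs/backup01.ocr
node2-pub     2007/09/03 09:46:44     /u01/app/crs/cdata/test-crs/backup02.ocr
node2-pub     2007/09/03 01:46:39     /u01/app/crs/cdata/test-crs/day.ocr
node2-pub     2007/09/03 01:46:39     /u01/app/crs/cdata/test-crs/week.ocr
EOF
```

Because the timestamps are written year first, a plain text sort puts them in chronological order, so no date parsing is needed.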

 
Manually backing up the OCR

ocrconfig -manualbackup <<--Physical Backup of OCR

The above command backs up the OCR under the default backup directory. You can export the contents of the OCR using the command below (logical backup).

ocrconfig -export /tmp/ocr_exp.dat -s online <<-- Logical Backup of OCR

Restoring OCR


The command below restores the OCR from a physical backup. Shut down CRS on all nodes first.

ocrconfig -restore <file name>

Locate the available backups

[root@node1-pub ~]# ocrconfig -showbackup
 
node2-pub     2007/09/03 17:46:47     /u01/app/crs/cdata/test-crs/backup00.ocr
 
node2-pub     2007/09/03 13:46:45     /u01/app/crs/cdata/test-crs/backup01.ocr
 
node2-pub     2007/09/03 09:46:44     /u01/app/crs/cdata/test-crs/backup02.ocr
 
node2-pub     2007/09/03 01:46:39     /u01/app/crs/cdata/test-crs/day.ocr
 
node2-pub     2007/09/03 01:46:39     /u01/app/crs/cdata/test-crs/week.ocr
 
node1-pub     2007/10/07 13:50:41     /u01/app/crs/cdata/test-crs/backup_20071007_135041.ocr


Perform Restore from previous Backup

[root@node2-pub ~]# ocrconfig -restore /u01/app/crs/cdata/test-crs/week.ocr

The above command restores the OCR from the week-old backup.
If you have a logical backup of the OCR (taken using the export option), then you can import it with the command below.

ocrconfig -import /tmp/ocr_exp.dat

Restoring Votedisks


 

crsctl stop crs
crsctl query css votedisk
dd if=<backup of Votedisk> of=<Votedisk file> <<-- do this for all the votedisks
crsctl start crs

Changing Public and Virtual IP Address:



Current Config                                               Changed to

Node 1:

Public IP:       216.160.37.154                              192.168.10.11
VIP:             216.160.37.153                              192.168.10.111
subnet:          216.160.37.159                              192.168.10.0
Netmask:         255.255.255.248                             255.255.255.0
Interface used:  eth0                                        eth0
Hostname:        node1-pub.hingu.net                         node1-pub.hingu.net


Node 2:

Public IP:       216.160.37.156                              192.168.10.22
VIP:             216.160.37.157                              192.168.10.222
subnet:          216.160.37.159                              192.168.10.0
Netmask:         255.255.255.248                             255.255.255.0
Interface used:  eth0                                        eth0
Hostname:        node2-pub.hingu.net                         node2-pub.hingu.net

=======================================================================
(A)

Take the services, database, ASM instances and nodeapps down on both nodes in the cluster. Also disable the nodeapps, ASM and database instances to prevent them from restarting if a node gets rebooted during this process.

srvctl stop service -d test
srvctl stop database -d test
srvctl stop asm -n node1-pub
srvctl stop asm -n node2-pub

srvctl stop nodeapps -n node1-pub
srvctl stop nodeapps -n node2-pub
srvctl disable instance -d test -i test1,test2
srvctl disable asm -n node1-pub
srvctl disable asm -n node2-pub

srvctl disable nodeapps -n node1-pub
srvctl disable nodeapps -n node2-pub


(B)
Modify /etc/hosts and/or DNS, and ifcfg-eth0 (local node), with the new IP values
on all the nodes.

(C)
Restart the specific network interface in order to use the new IP.

ifconfig eth0 down
ifconfig eth0 up


Or, you can restart the network.
CAUTION: on NAS, restarting the entire network may cause the node to be rebooted.

(D)
Update the OCR with the new public IP.
For the public IP, you have to delete the interface first and then add it back with the new subnet.

As the oracle user, issue the commands below:

oifcfg delif -global eth0
oifcfg setif -global eth0/192.168.10.0:public


(E)
Update the OCR with the New Virtual IP.
Virtual IP is part of the nodeapps and so you can modify the nodeapps to update the Virtual IP information.

As a privileged user (root), issue the commands below:

srvctl modify nodeapps -n node1-pub -A 192.168.10.111/255.255.255.0/eth0 <-- for Node 1
srvctl modify nodeapps -n node2-pub -A 192.168.10.222/255.255.255.0/eth0 <-- for Node 2


(F)
Enable the nodeapps, ASM, database Instances for all the Nodes.

srvctl enable instance -d test -i test1,test2
srvctl enable asm -n node1-pub
srvctl enable asm -n node2-pub

srvctl enable nodeapps -n node1-pub
srvctl enable nodeapps -n node2-pub


(G)
Update the listener.ora file on each node with the correct IP addresses if it uses IP addresses instead of hostnames.

(H)
Restart the Nodeapps, ASM and Database instance

srvctl start nodeapps -n node1-pub
srvctl start nodeapps -n node2-pub

srvctl start asm -n node1-pub
srvctl start asm -n node2-pub

srvctl start database -d test

=======================================================================
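
Step (D) passes the subnet (192.168.10.0) to oifcfg rather than the host address. If you are unsure what the subnet is for a given IP and netmask, it can be computed by AND-ing the two octet by octet. This is a minimal sketch; the function name network_addr is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: compute the IPv4 network (subnet) address by AND-ing
# each octet of the address with the matching octet of the netmask.
network_addr() {
  # Split both dotted quads on "." into positional parameters $1..$8:
  # $1-$4 are the address octets, $5-$8 are the netmask octets.
  local IFS=.
  set -- $1 $2
  echo "$(($1 & $5)).$(($2 & $6)).$(($3 & $7)).$(($4 & $8))"
}

# New public IP of node 1 with its netmask, from the table above.
network_addr 192.168.10.11 255.255.255.0   # prints 192.168.10.0
```

The result is the value that goes between the interface name and ":public" in the oifcfg setif command.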