Hemesh RAC Edition

Wednesday, 29 May 2013

How To Setup ASM & ASMLIB On Native Linux Multipath Mapper disks

This note apply to Oracle Database using ASM and linux S.O.

After installed and configured the Multipath, follow bellow:

1. Download and Install ASMLIB :
http://www.oracle.com/technetwork/topics/linux/asmlib/index-101839.html

2. Check the disks :

ls -l /dev/mapper
brw-rw---- 1 root disk 253, 46 Jul 27 17:19 Prod_Orcl01
brw-rw---- 1 root disk 253, 67 Jul 27 17:59 Prod_Orcl02
brw-rw---- 1 root disk 253, 36 Jul 27 17:19 Prod_Orcl03
brw-rw---- 1 root disk 253, 58 Jul 27 18:00 Prod_Orcl04

3. Create the ASMLIB disks on mapper partitions as follow:

/etc/init.d/oracleasm createdisk DSKORA1  /dev/mapper/Prod_Orcl01
/etc/init.d/oracleasm createdisk DSKORA2  /dev/mapper/Prod_Orcl02

After create the disks, the ASM put a mark on the disks to know which are your own.

4. If this is a RAC configuration, then from each node execute:

/etc/init.d/oracleasm scandisks
/etc/init.d/oracleasm listdisks

5. Configure ASMLIB to use multipath (from each node on RAC environments):

By any path the ASMLIB can found the disks, but, the best path is using the multipath :

Modify in /etc/sysconfig/oracleasm :
ORACLEASM_SCANORDER=”dm”
ORACLEASM_SCANEXCLUDE=”sd”

note: The Oracle ASMLib configuration file is located at /etc/sysconfig/oracleasm. It is a link to file /etc/sysconfig/oracleasm-_dev_oracleasm.

Restart ASMLIB (from each node on RAC environments):
/etc/init.d/oracleasm stop
/etc/init.d/oracleasm start

6. Verify if the configuration is correct:

6.1 During the disk discovery, ASMLIB uses file /proc/partitions, see :

# cat /proc/partitions

   8     0  877264896 sda
   8     1     104391 sda1
   8     2  877157032 sda2
   8    16  209715200 sdb
   8    32  382730240 sdc
   8    48  379596800 sdd
   8    64    2097152 sde
   8    80    2097152 sdf
   8    96 1169776640 sdg
 253     6  209715200 dm-6
 253     7  382730240 dm-7
 253     8  379596800 dm-8
 253     9    2097152 dm-9
 253    10    2097152 dm-10

6.2 The ASMLIB mount disks at /dev/oracleasm/disks, see:

# ls -la /dev/oracleasm/disks

brw-rw---- 1 grid asmadmin 253,  6 Ago 16 16:33 DSKORA1
brw-rw---- 1 grid asmadmin 253,  7 Ago 16 16:33 DSKORA2

6.3 Check if major and minor of disks “dm” not “sd”, is the same in /proc/partitions and /dev/oracleasm/disks , see “253″ and “6″ bellow:

/proc/partitions 253     6  209715200 dm-6

/dev/oracleasm/disks
brw-rw---- 1 grid asmadmin 253,  6 Ago 16 16:33 DSKORA1

You can check using querydisk too :

# /etc/init.d/oracleasm querydisk -d DSKORA1
Disk "DSKORA1" is valid ASM disk on device [253, 6]

Good Luck !
======================================================================
Checking asm group with asm disk that have been created by ASMLib

bash> /etc/init.d/oracleasm listdisks
DISK1
DISK2
DISK3
DISK4
DISK5

SQL> conn / as sysasm
Connected.
SQL> col path format a15

SQL> select a.name,a.state,b.name,b.path from v$asm_diskgroup a, v$asm_disk b where a.group_number=b.group_number order by b.name;

NAME                           STATE       NAME                           PATH
------------------------------ ----------- ------------------------------ ---------------
DGROUP1                        MOUNTED     DISK1                          ORCL:DISK1
DGROUP1                        MOUNTED     DISK2                          ORCL:DISK2
DGROUP2                        MOUNTED     DISK3                          ORCL:DISK3
DGROUP2                        MOUNTED     DISK4                          ORCL:DISK4
DGROUP2                        MOUNTED     DISK5                          ORCL:DISK5

Thursday, 28 March 2013

Trace File Analyzer (TFA) for Oracle cluster databases

TFA Collector- The Preferred Tool for Automatic or ADHOC Diagnostic Gathering Across All Cluster Nodes [ID 1513912.1]

Trace File Analyzer (TFA) - new smart clusterwide diagnostic collection utility

Oracle RAC Assurance Team has released Trace File Analyzer (TFA)

TFA Collector - The Preferred Tool for Automatic or ADHOC Diagnostic Gathering Across All Cluster Nodes (Doc ID 1513912.1)

TFA is "smart" in the sense that it simplifies, automates and reduces the size of diagnostic collections clusterwide for Oracle Real Application Clusters. TFA only collects files that might be relevant based on the time of events. Collections can be triggered automatically or on demand. Files that are larger than a configurable size are "trimmed" to remove irrelevant data thereby making uploads to Support only as large as they need to be. TFA automatically collects data from all cluster nodes and packages them into a single archive for upload to Support. Collections can be limited to specific components such as RDBMS, ASM, Clusterware depending upon the problem.

TFA's design goal is to provide all the necessary standard diagnostics across the cluster that Support will need in order to quickly assist customers with resolution of problems and minimize delays due the kind of "back and forth" that is typical when necessary files are not provided for whatever reason when an SR is first opened. Installation is easy and configuration is minimal.

Wednesday, 7 November 2012

Troubleshoot Grid Infrastructure Startup Issue

Troubleshoot Grid Infrastructure Startup Issues [ID 1050908.1]

11gR2 Clusterware and Grid Home - What You Need to Know [ID 1053147.1]

Wednesday, 12 September 2012

RACcheck - RAC Configuration Audit Tool [ID 1268927.1]

Thursday, 23 June 2011

Troubleshooting RAC Public Network Failure

Here are some steps I used to troubleshoot the failure of a public network used for SCAN in a 2-node RAC cluster.

Note: I used an aliased crsstat for the command: crsctl stat res –t

Check the status of the Clusterware resources. You can see that there are several resources that are offline below. The ones highlighted in red are the ones we are interested in. The resource for the local listener is now offline while the VIP has been failed over. You will also notice that all of the SCAN Listeners have been failed over to the surviving node.

grid@tibora30[+ASM1]-/u01/11.2.0/grid/log/tibora30 >crsstat
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DG_DATA.dg
ONLINE ONLINE tibora30
ONLINE ONLINE tibora31
ora.DG_FLASH.dg
ONLINE ONLINE tibora30
ONLINE ONLINE tibora31
ora.LISTENER.lsnr
ONLINE OFFLINE tibora30
ONLINE ONLINE tibora31
ora.asm
ONLINE ONLINE tibora30 Started
ONLINE ONLINE tibora31 Started
ora.gsd
OFFLINE OFFLINE tibora30
OFFLINE OFFLINE tibora31
ora.net1.network
ONLINE ONLINE tibora30
ONLINE ONLINE tibora31
ora.ons
ONLINE ONLINE tibora30
ONLINE OFFLINE tibora31
ora.registry.acfs
ONLINE ONLINE tibora30
ONLINE ONLINE tibora31
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE tibora31
ora.LISTENER_SCAN2.lsnr
1 ONLINE ONLINE tibora31
ora.LISTENER_SCAN3.lsnr
1 ONLINE ONLINE tibora31
ora.cvu
1 ONLINE OFFLINE
ora.oc4j
1 ONLINE ONLINE tibora31
ora.scan1.vip
1 ONLINE ONLINE tibora30
ora.scan2.vip
1 ONLINE ONLINE tibora31
ora.scan3.vip
1 ONLINE ONLINE tibora31
ora.tibora30.vip
1 ONLINE INTERMEDIATE tibora31 FAILED OVER
ora.tibora31.vip
1 ONLINE ONLINE tibora31
ora.tibprd.db
1 ONLINE ONLINE tibora30 Open
2 ONLINE ONLINE tibora31 Open
ora.tibprd.tibprd_applog.svc
1 ONLINE ONLINE tibora31
ora.tibprd.tibprd_basic.svc
1 ONLINE ONLINE tibora31
ora.tibprd.tibprd_smap.svc
1 ONLINE ONLINE tibora31

Then look at the CRS logs under $GI_HOME/log/hostname/alerthostname.log for entries similar to the ones below:

2011-06-21 09:43:57.844

[/u01/11.2.0/grid/bin/orarootagent.bin(21168162)]CRS-5818:Aborted command 'check for resource: ora.net1.network tibora30 1' for resource 'ora.net1.network'. Details at (:CRSAGF00113:) {0:9:2} in /u01/11.2.0/grid/log/tibora30/agent/crsd/orarootagent_root/orarootagent_root.log.

2011-06-21 09:44:00.459

[/u01/11.2.0/grid/bin/oraagent.bin(22413372)]CRS-5016:Process "/u01/11.2.0/grid/opmn/bin/onsctli" spawned by agent "/u01/11.2.0/grid/bin/oraagent.bin" for action "check" failed: details at "(:CLSN00010:)" in "/u01/11.2.0/grid/log/tibora30/agent/crsd/oraagent_grid/oraagent_grid.log"

2011-06-21 09:44:01.112

[/u01/11.2.0/grid/bin/oraagent.bin(22413372)]CRS-5016:Process "/u01/11.2.0/grid/bin/lsnrctl" spawned by agent "/u01/11.2.0/grid/bin/oraagent.bin" for action "

check" failed: details at "(:CLSN00010:)" in "/u01/11.2.0/grid/log/tibora30/agent/crsd/oraagent_grid/oraagent_grid.log"

2011-06-21 09:44:01.180

[/u01/11.2.0/grid/bin/oraagent.bin(22413372)]CRS-5016:Process "/u01/11.2.0/grid/bin/lsnrctl" spawned by agent "/u01/11.2.0/grid/bin/oraagent.bin" for action "

check" failed: details at "(:CLSN00010:)" in "/u01/11.2.0/grid/log/tibora30/agent/crsd/oraagent_grid/oraagent_grid.log"

Check the status of the VIP on the node

grid@tibora30[+ASM1]-/u01/11.2.0/grid/log/tibora30 >srvctl status vip -n tibora30
VIP tibora30-vip is enabled
VIP tibora30-vip is not running

Also check the status of the SCAN resources.

grid@tibora30[+ASM1]-/u01/11.2.0/grid/log/tibora30 >srvctl status scan
SCAN VIP scan1 is enabled
SCAN VIP scan1 is running on node tibora30
SCAN VIP scan2 is enabled
SCAN VIP scan2 is running on node tibora31
SCAN VIP scan3 is enabled
SCAN VIP scan3 is running on node tibora31

In this particular case the SCAN VIPs were running on both nodes. It turns out that on another cluster which also experienced a network failure the SCAN VIPs were all running on node.

First we need to start the local listener:

grid@tibora30[+ASM1]-/u01/11.2.0/grid/log/tibora30 >srvctl start listener
Now check the status of the resources.
grid@tibora30[+ASM1]-/u01/11.2.0/grid/log/tibora30 >crsstat
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DG_DATA.dg
ONLINE ONLINE tibora30
ONLINE ONLINE tibora31
ora.DG_FLASH.dg
ONLINE ONLINE tibora30
ONLINE ONLINE tibora31
ora.LISTENER.lsnr
ONLINE ONLINE tibora30
ONLINE ONLINE tibora31
ora.asm
ONLINE ONLINE tibora30 Started
ONLINE ONLINE tibora31 Started
ora.gsd
OFFLINE OFFLINE tibora30
OFFLINE OFFLINE tibora31
ora.net1.network
ONLINE ONLINE tibora30
ONLINE ONLINE tibora31
ora.ons
ONLINE ONLINE tibora30
ONLINE OFFLINE tibora31
ora.registry.acfs
ONLINE ONLINE tibora30
ONLINE ONLINE tibora31
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE tibora31
ora.LISTENER_SCAN2.lsnr
1 ONLINE ONLINE tibora31
ora.LISTENER_SCAN3.lsnr
1 ONLINE ONLINE tibora31
ora.cvu
1 ONLINE OFFLINE
ora.oc4j
1 ONLINE ONLINE tibora31
ora.scan1.vip
1 ONLINE ONLINE tibora30
ora.scan2.vip
1 ONLINE ONLINE tibora31
ora.scan3.vip
1 ONLINE ONLINE tibora31
ora.tibora30.vip
1 ONLINE ONLINE tibora30
ora.tibora31.vip
1 ONLINE ONLINE tibora31
ora.tibprd.db
1 ONLINE ONLINE tibora30 Open
2 ONLINE ONLINE tibora31 Open
ora.tibprd.tibprd_applog.svc
1 ONLINE ONLINE tibora31
ora.tibprd.tibprd_basic.svc
1 ONLINE ONLINE tibora31
ora.tibprd.tibprd_smap.svc
1 ONLINE ONLINE tibora31

Starting the local listener also caused the VIP to relocate to the previous node.

In another situation I had to manually relocate the VIP to the original node.

Next we need to check our nodeapps including ONS.

grid@tibora30[+ASM1]-/u01/11.2.0/grid/log/tibora30 >srvctl status nodeapps

VIP tibora30-vip is enabled

VIP tibora30-vip is running on node: tibora30

VIP tibora31-vip is enabled

VIP tibora31-vip is running on node: tibora31

Network is enabled

Network is running on node: tibora30

Network is running on node: tibora31

GSD is disabled

GSD is not running on node: tibora30

GSD is not running on node: tibora31

ONS is enabled

ONS daemon is running on node: tibora30

ONS daemon is not running on node: tibora31

Here you can see that the ONS daemon is not running on the tibora31.
To start the ONS daemon issue the following command:

grid@tibora30[+ASM1]-/u01/11.2.0/grid/log/tibora30 >srvctl start nodeapps -n tibora31

PRKO-2421 : Network resource is already started on node(s): tibora31

PRKO-2420 : VIP is already started on node(s): tibora31

Check the status of the nodeapps again

grid@tibora30[+ASM1]-/u01/11.2.0/grid/log/tibora30 >srvctl status nodeapps
VIP tibora30-vip is enabled
VIP tibora30-vip is running on node: tibora30
VIP tibora31-vip is enabled
VIP tibora31-vip is running on node: tibora31
Network is enabled
Network is running on node: tibora30
Network is running on node: tibora31
GSD is disabled
GSD is not running on node: tibora30
GSD is not running on node: tibora31
ONS is enabled
ONS daemon is running on node: tibora30
ONS daemon is running on node: tibora31

The CVU resource can be started as follows:

grid@tibora30[+ASM1]-/u01/11.2.0/grid/log/tibora30 >srvctl start cvu -n tibora30

You can now verify connectivity to the database/services. I prefer to use SQL Developer to test connectivity to my databases with one connection for each service name.

All the SCAN listeners were running on a single node. At least one needed to be relocated to service the requests coming from the SCAN on that node.

grid@tibora31[+ASM2]-/home/grid >srvctl relocate scan_listener -i 1 -n tibora30
grid@tibora31[+ASM2]-/home/grid >srvctl status scan_listener
SCAN Listener LISTENER_SCAN1 is enabled
SCAN listener LISTENER_SCAN1 is running on node tibora30
SCAN Listener LISTENER_SCAN2 is enabled
SCAN listener LISTENER_SCAN2 is running on node tibora31
SCAN Listener LISTENER_SCAN3 is enabled
SCAN listener LISTENER_SCAN3 is running on node tibora31

Once the SCAN was relocated the application connected successfully.

Thursday, 17 March 2011

Stopping the Oracle RAC 10g Environment

The first step is to stop the Oracle instance. When the instance (and related services) is down, then bring down the ASM instance. Finally, shut down the node applications (Virtual IP, GSD, TNS Listener, and ONS).

$ export ORACLE_SID=orcl1 $ emctl stop dbconsole $ srvctl stop instance -d orcl -i orcl1 $ srvctl stop asm -n linux1 $ srvctl stop nodeapps -n linux1

Starting the Oracle RAC 10g Environment

The first step is to start the node applications (Virtual IP, GSD, TNS Listener, and ONS). When the node applications are successfully started, then bring up the ASM instance. Finally, bring up the Oracle instance (and related services) and the Enterprise Manager Database console.

$ export ORACLE_SID=orcl1 $ srvctl start nodeapps -n linux1 $ srvctl start asm -n linux1 $ srvctl start instance -d orcl -i orcl1 $ emctl start dbconsole

Start/Stop All Instances with SRVCTL

Start/stop all the instances and their enabled services. I have included this step just for fun as a way to bring down all instances!

$ srvctl start database -d orcl  $ srvctl stop database -d orcl

**************************************************************************************

A brief Summary of Steps to Start / Shutdown RAC/ASM Setup (Oracle 11g):

Shutdown

1. shutdown database
srvctl stop database -d  "that will shutdown ALL running instances of "
srvctl stop instance -d  -i  "that will shutdown only specified instance of , other running instances will continue to run"

2. shutdown ASM
srvctl stop asm -n  "run it for every rac node one-by-one"

3. shutdown nodeapps
srvctl stop nodeapps -n  "run it for every rac node one-by-one"

4. Shutdown crs processes
crsctl stop crs
Startup

1. crsctl start crs -> that will start crs processes (HA-Engine), NodeApps, then ASM then Database's Instances then HA-Services (if any)

Monday, 14 March 2011

Adding a Node to 11gR2 RAC

http://asanga-pradeep.blogspot.com/2011/02/adding-node-to-11gr2-rac.html