Wednesday, 1 September 2010

CSS Timeout Computation in Oracle Clusterware

Applies to:

Oracle Server - Enterprise Edition - Version: 10.1.0.2 to 11.1.0.6
Oracle Server - Standard Edition - Version: 10.1.0.2 to 11.1.0.6
Information in this document applies to any platform.
Oracle Clusterware

Purpose

The purpose of this note is to document the default CSS misscount timeout calculations in 10g Release 1, 10g Release 2, 11g, and higher versions.

Scope and Application

  • Define misscount parameter
  • Define the default calculations for the misscount parameter
  • Describe Cluster Synchronization Service (CSS) heartbeats and their interrelationship
  • Describe the cases where the default calculation may be too sensitive

CSS Timeout Computation in Oracle Clusterware

MISSCOUNT DEFINITION AND DEFAULT VALUES
The CSS misscount parameter represents the maximum time, in seconds, that a network heartbeat can be missed before the cluster enters a reconfiguration to evict the node. The following are the default values of the misscount parameter, in seconds, by version when using Oracle Clusterware*:

OS          10g (R1 & R2)    11g
Linux       60               30
Unix        30               30
VMS         30               30
Windows     30               30

*CSS misscount default value when using vendor (non-Oracle) clusterware is 600 seconds. This is to allow the vendor clusterware ample time to resolve any possible split brain scenarios.

On AIX platforms with HACMP, starting with 10.2.0.3 BP#1, the misscount is 30. This is documented in Note 551658.1.
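As a quick sanity check, on 10.2 and later the configured value can be read with crsctl; a minimal sketch, assuming ORA_CRS_HOME points to the Oracle Clusterware home:

# report the configured CSS misscount, in seconds (10.2 and later)
$ORA_CRS_HOME/bin/crsctl get css misscount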

CSS HEARTBEAT MECHANISMS AND THEIR INTERRELATIONSHIP
The Cluster Synchronization Services component (CSS) of Oracle Clusterware maintains two heartbeat mechanisms: 1) the disk heartbeat to the voting device and 2) the network heartbeat across the interconnect, which together establish and confirm valid node membership in the cluster. Both of these heartbeat mechanisms have an associated timeout value. The disk heartbeat has an internal I/O timeout interval (DTO, Disk TimeOut), in seconds, within which an I/O to the voting disk must complete. The misscount parameter (MC), as stated above, is the maximum time, in seconds, that a network heartbeat can be missed. The disk heartbeat I/O timeout interval is directly related to the misscount parameter setting. There has been some variation in this relationship between versions, as described below:

9.x.x.x: Note, misscount was a different entity in this release.

10.1.0.2: No one should be on this version.

10.1.0.3: DTO = MC - 15 seconds

10.1.0.4: DTO = MC - 15 seconds

10.1.0.4 + unpublished Bug 3306964: DTO = MC - 3 seconds

10.1.0.4 with CRS II Merge patch: DTO = Disktimeout (defaults to 200 seconds) normally, or misscount seconds only during initial cluster formation or slightly before reconfiguration

10.1.0.5: IOT = MC - 3 seconds

10.2.0.1 + fix for unpublished Bug 4896338: IOT = Disktimeout (defaults to 200 seconds) normally, or misscount seconds only during initial cluster formation or slightly before reconfiguration

10.2.0.2: Same as above (10.2.0.1 with the fix for Bug 4896338)

10.1 - 11.1: During node join and leave (reconfiguration) the cluster must reconfigure; in that particular case the Short Disk TimeOut (SDTO) is used, which in all versions is SDTO = MC - reboottime (reboottime is usually 3 seconds)
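As a worked example using the 11g Linux defaults above (MC = 30, reboottime = 3), SDTO = 30 - 3 = 27 seconds, while the normal disk timeout is the 200-second Disktimeout. On 10.2 and later the individual values can be read with crsctl; a minimal sketch, again assuming ORA_CRS_HOME points to the Clusterware home:

# report the CSS timeouts that drive the calculations above (seconds)
$ORA_CRS_HOME/bin/crsctl get css misscount
$ORA_CRS_HOME/bin/crsctl get css reboottime
$ORA_CRS_HOME/bin/crsctl get css disktimeout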

Misscount drives cluster membership reconfiguration and directly affects the availability of the cluster. In most cases, the default settings for MC should be acceptable. Modifying the default value of misscount not only influences the timeout interval for the I/O to the voting disk, but also influences the tolerance for missed network heartbeats across the interconnect.

LONG LATENCIES TO THE VOTING DISKS
If I/O latencies to the voting disk are greater than the default DTO calculations noted above, the cluster may experience CSS node evictions depending on (a) the Oracle Clusterware (CRS) version, (b) whether the merge patch has been applied, and (c) the state of the cluster. More details are covered in the section "Change in Behavior with Bug:4896338 applied on top of 10.2.0.1" below.

These latencies can be attributed to any number of problems in the I/O subsystem or problems with any component in the I/O path. The following is a non-exhaustive list of reported problems which resulted in CSS node evictions because latencies to the voting disk exceeded the default Oracle Clusterware I/O timeout value (DTO):

  1. QLogic HBA cards with a Link Down Timeout greater than the default misscount.
  2. Bad cables to the SAN/storage array that affect I/O latencies
  3. SAN switch (such as Brocade) failover latency greater than the default misscount
  4. EMC CLARiiON array trespassing the SP to the backup SP taking longer than the default misscount
  5. EMC PowerPath path error detection and I/O repost and redirect taking longer than the default misscount
  6. NetApp cluster (CFO) failover latency greater than the default misscount
  7. Sustained high CPU load which affects the CSSD disk ping monitoring thread
  8. Poor SAN network configuration that creates latencies in the I/O path.

The most common problems relate to multipath I/O software drivers and the reconfiguration times resulting from a failure in the I/O path. Hardware and (re)configuration issues that introduce these latencies should be corrected. Failover times that are incompatible with the underlying OS, network, or storage hardware or software may be addressed given a complete understanding of the considerations listed below.

Misscount should NOT be modified to work around the above-mentioned issues. Oracle Support recommends that you apply the latest patchset, which changes the CSS behaviour. More details are covered in the next section.


Change in Behavior with Bug:4896338 applied on top of 10.2.0.1
Starting with 10.2.0.1 + Bug:4896338, CSS will not evict a node from the cluster because an I/O to the voting disk (DTO) takes more than misscount seconds, unless it happens during initial cluster formation or slightly before a reconfiguration.
So if we have N nodes in a cluster and one of them takes more than misscount seconds to access the voting disk, that node will not be evicted as long as the access to the voting disk completes within disktimeout seconds. Consequently, with this patch there is no need to increase misscount at all.

Additionally, this merge patch introduces Disktimeout, which is the amount of time that a lack of disk pings to the voting disk(s) will be tolerated.

Note: applying the patch will not change your value for Misscount.

The table below explains the conditions under which eviction will occur:

Network Ping                          Disk Ping                                                        Reboot
Completes within misscount seconds    Completes within misscount seconds                               N
Completes within misscount seconds    Takes more than misscount but less than Disktimeout seconds      N
Completes within misscount seconds    Takes more than Disktimeout seconds                              Y
Takes more than misscount seconds     Completes within misscount seconds                               Y

* By default, Misscount is less than Disktimeout seconds.
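To confirm that this default relationship holds on a given cluster, the two values can be compared directly. A minimal sketch, assuming ORA_CRS_HOME is set, GNU grep is available, and the last number in crsctl's output is the value (the output format varies by version):

#!/bin/sh
# sketch: verify that misscount is smaller than disktimeout
MC=`$ORA_CRS_HOME/bin/crsctl get css misscount | grep -o '[0-9][0-9]*' | tail -1`
DTO=`$ORA_CRS_HOME/bin/crsctl get css disktimeout | grep -o '[0-9][0-9]*' | tail -1`
echo "misscount=$MC disktimeout=$DTO"
if [ "$MC" -lt "$DTO" ]; then
    echo "OK: misscount is less than disktimeout"
else
    echo "WARNING: misscount is not less than disktimeout"
fi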

CONSIDERATIONS WHEN CHANGING MISSCOUNT FROM THE DEFAULT VALUE

  1. Customers drive SLAs and cluster availability. The customer ultimately defines the service levels and availability requirements for the cluster. Before recommending any change to misscount, the full impact of that change should be described and its impact on cluster availability measured.
  2. Customers may have timeout and retry logic in their applications. Delaying reconfiguration may cause 'artificial' timeouts in the application, reconnect failures, and subsequent logon storms.
  3. Misscount timeout values are version dependent and are subject to change. As we have seen, misscount calculations vary between releases and between versions within a release. Creating a false dependency on the misscount calculation in one version may not be appropriate for later versions.
  4. Internal I/O timeout interval (DTO) algorithms may change in later releases. As stated above, there is a direct relationship between the internal I/O timeout interval and misscount; this relationship is subject to change in later releases.
  5. An increase in misscount to compensate for I/O latencies directly affects reconfiguration times for network failures. The network heartbeat is the primary indicator of connectivity within the cluster. Misscount is the tolerance level of missed 'check-ins' that triggers cluster reconfiguration. Increasing misscount will prolong the time to take corrective action in the event of a network failure or other anomaly affecting the availability of a node in the cluster. This directly affects cluster availability.
  6. Changing misscount to work around voting disk latencies must be reverted when the underlying disk latency is corrected: misscount needs to be set back to the default. The customer needs to document the change and set the parameter back to the default when the underlying storage I/O latency is resolved.
  7. Do not change the default misscount values if you are running vendor clusterware along with Oracle Clusterware. Modifying misscount in this environment may cause clusterwide outages and potential corruptions.
  8. Changing the misscount parameter incurs a clusterwide outage. As noted below, the customer will need to schedule a clusterwide outage to make this change.
  9. Changing misscount should not be used to compensate for poor configurations or faulty hardware.
  10. Cluster and RDBMS availability are directly affected by high misscount settings.
  11. In the case of stretched clusters with stretched storage systems, a site failure in which we lose one storage array and N nodes triggers a reconfiguration, and we revert to the ShortDiskTimeOut (SDTO) value as the internal I/O timeout for the voting disks. Several cases are known with stretched clusters where, when a site failure happens, the storage failover cannot complete within SDTO. If I/O to the voting disks is blocked for more than SDTO, the result is node evictions on the surviving side.


To change misscount back to the default, please refer to Note 284752.1.
THIS IS THE ONLY SUPPORTED METHOD. NOT FOLLOWING THIS METHOD RISKS EVICTIONS AND/OR CORRUPTING THE OCR.

10g Release 2 MIRRORED VOTING DISKS AND VENDOR MULTIPATHING SOLUTIONS
Oracle RAC 10g Release 2 allows for multiple voting disks so that the customer does not have to rely on a multipathing solution from a storage vendor. You can have n voting disks (up to 31), where n = 2*m + 1 and m is the number of disk failures you want to survive. Oracle recommends that each voting disk be on a separate physical disk.
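For example, to survive m = 1 voting disk failure you need n = 2*1 + 1 = 3 voting disks, and to survive m = 2 failures you need 5. The voting disks currently configured can be listed with crsctl; a minimal sketch, assuming ORA_CRS_HOME points to the Clusterware home:

# list the currently configured voting disks (10.2 and later)
$ORA_CRS_HOME/bin/crsctl query css votedisk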

References

NOTE:284752.1 - 10g RAC: Steps To Increase CSS Misscount, Reboottime and Disktimeout
NOTE:559365.1 - Using Diagwait as a diagnostic to get more information for diagnosing Oracle Clusterware Node evictions




Tuesday, 31 August 2010

Cifs "mount error 11 = Resource temporarily unavailable"

When I try mounting a Windows 2000/XP share with cifs I get an error about the resource not being available:

Code:

mount -t cifs //hostname/share /mnt/temp -o username=someuser,password=somepassword

mount error 11 = Resource temporarily unavailable

Refer to the mount.cifs(8) manual page (e.g.man mount.cifs)

Other times this works normally and mounts the Samba/Windows share.
But when the cifs mount returns "Resource temporarily unavailable", I immediately repeat the command with smbfs instead of cifs:

Code:

mount -t smbfs //hostname/share /mnt/temp -o username=someuser,password=somepassword

and it works as normal and I can browse around the filesystem!

I then immediately try the cifs line again and it gives me the same "Resource temporarily unavailable" error!

What the hell is wrong with cifs? I thought it was supposed to be better than smbfs...


 

****************************** OR ******************************

Greetings,

I had a similar problem with error 11 on a Linux-to-Linux share. I ran a yum update on both machines and that resolved the problem.

Hope this helps.
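A minimal troubleshooting sketch for this kind of intermittent failure (hostname, share and mount point are the same placeholders as above): error 11 from mount.cifs is often accompanied by a more detailed "CIFS VFS" message in the kernel log, so checking dmesg right after the failed mount usually narrows the problem down:

Code:

mount -t cifs //hostname/share /mnt/temp -o username=someuser,password=somepassword
dmesg | grep -i cifs | tail -20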

Thursday, 12 August 2010

LOG Files In RAC Environment

The Cluster Ready Services Daemon (crsd) Log Files

Log files for the CRSD process (crsd) can be found in the following directory:

CRS home/log/hostname/crsd

The Oracle HA Services Daemon (ohasd) Log Files(11g)

Log files for the ohasd process (ohasd) can be found in the following directory:

CRS home/log/hostname/ohasd

Oracle Cluster Registry (OCR) Log Files

The Oracle Cluster Registry (OCR) records log information in the following location:

CRS Home/log/hostname/client

Cluster Synchronization Services (CSS) Log Files

You can find CSS information that the OCSSD generates in log files in the following location:

CRS Home/log/hostname/cssd

Event Manager (EVM) Log Files

Event Manager (EVM) information generated by evmd is recorded in log files in the following location:

CRS Home/log/hostname/evmd

RACG Log Files

The Oracle RAC high availability trace files are located in the following two locations:

CRS home/log/hostname/racg
$ORACLE_HOME/log/hostname/racg

Core files are in the subdirectories of the log directories. Each RACG executable has a subdirectory assigned exclusively to that executable. The name of the RACG executable subdirectory is the same as the name of the executable.


 

VIP Log Files

You can find VIP-related log files in the following location:

ORA_CRS_HOME/log/nodename/racg
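Putting the locations above together, a minimal sketch for following the two most frequently needed logs (assuming ORA_CRS_HOME is set; the file names crsd.log and ocssd.log are the usual defaults, and the node name may need to be the short host name):

#!/bin/sh
# follow the CRSD and CSSD logs for this node
NODENAME=`hostname`
tail -f $ORA_CRS_HOME/log/$NODENAME/crsd/crsd.log \
        $ORA_CRS_HOME/log/$NODENAME/cssd/ocssd.log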

Monday, 9 August 2010

Disable/Enable Automatic startup Oracle HAS

On 11gR2, Oracle Clusterware consists of two separate stacks: an upper stack anchored by the Cluster Ready Services (CRS) daemon (crsd) and a lower stack anchored by the Oracle High Availability Services daemon (ohasd).

So, how do we disable and enable Oracle HAS?
Use the crsctl disable has command to disable automatic startup of the Oracle High Availability Services stack when the server boots up.

# crsctl config has
CRS-4622: Oracle High Availability Services autostart is enabled.

How do we know Oracle HAS is enabled without using "crsctl config has"? Check the ohasdstr file:

# cat /etc/oracle/scls_scr/rhel5-test/root/ohasdstr
enable

# crsctl disable has
CRS-4621: Oracle High Availability Services autostart is disabled.

# crsctl config has
CRS-4621: Oracle High Availability Services autostart is disabled.

# cat /etc/oracle/scls_scr/rhel5-test/root/ohasdstr
disable

Use the crsctl enable has command to enable automatic startup of the Oracle High Availability Services stack when the server boots up.

# crsctl enable has
CRS-4622: Oracle High Availability Services autostart is enabled.

# cat /etc/oracle/scls_scr/rhel5-test/root/ohasdstr
enable

If we just want to check the HAS enable/disable status, using the "crsctl config has" command is easier than checking the "ohasdstr" file.
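For completeness, both checks can be combined into one small sketch (rhel5-test is the node name from the examples above; substitute your own):

#!/bin/sh
# report OHAS autostart status from crsctl and from the ohasdstr file
NODE=rhel5-test
crsctl config has
echo "ohasdstr says: `cat /etc/oracle/scls_scr/$NODE/root/ohasdstr`"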

How about "crsctl disable/enable crs" on 11gR2?
These commands also disable/enable automatic startup of Oracle HAS.

I posted "check enable/disable the startup of CRS" earlier; it showed that on Oracle Clusterware versions <= 11gR1 we can check the "crsstart" file. On 11gR2, is the crsstart file no longer used?

Use the crsctl disable crs command to prevent the automatic startup of Oracle High Availability Services when the server boots.

Use the crsctl enable crs command to enable automatic startup of Oracle High Availability Services when the server boots.

# crsctl config has
CRS-4622: Oracle High Availability Services autostart is enabled.

# crsctl config crs
CRS-4622: Oracle High Availability Services autostart is enabled.

# ls -ltr /etc/oracle/scls_scr/rhel5-test/root/
-rw-r--r-- 1 root root 7 Sep 7 00:56 crsstart
-rw-r--r-- 1 root oinstall 5 Nov 22 17:04 ohasdrun
-rw-r--r-- 1 root oinstall 7 Nov 22 17:10 ohasdstr

# cat /etc/oracle/scls_scr/rhel5-test/root/crsstart
enable

# cat /etc/oracle/scls_scr/rhel5-test/root/ohasdstr
enable

# crsctl disable crs
CRS-4621: Oracle High Availability Services autostart is disabled.

# crsctl config crs
CRS-4621: Oracle High Availability Services autostart is disabled.

# crsctl config has
CRS-4621: Oracle High Availability Services autostart is disabled.

# ls -ltr /etc/oracle/scls_scr/rhel5-test/root/
-rw-r--r-- 1 root root 7 Sep 7 00:56 crsstart
-rw-r--r-- 1 root oinstall 5 Nov 22 17:04 ohasdrun
-rw-r--r-- 1 root oinstall 8 Nov 22 17:12 ohasdstr

# cat /etc/oracle/scls_scr/rhel5-test/root/crsstart
enable

# cat /etc/oracle/scls_scr/rhel5-test/root/ohasdstr
disable

However, check the CRSCTL Utility Reference:

Check enable/disable the startup of Oracle Clusterware Daemons

We can use crsctl commands as follows to enable and disable the startup of the Oracle Clusterware daemons. Run the following command to enable startup for all of the Oracle Clusterware daemons:
crsctl enable crs
Run the following command to disable the startup of all of the Oracle Clusterware daemons:
crsctl disable crs
Indeed, the Oracle documentation says that. But how can I check the enable/disable startup status of all of the Oracle Clusterware daemons now?

I know, some people don't need to know, because they can just run "crsctl enable/disable crs" again and again.

Oracle has a scls_scr directory under the /etc/oracle path. We can check the enable/disable startup status of the Oracle Clusterware daemons in the crsstart file under the /etc/oracle/scls_scr/hostname/root/ path.

Really? Here is an example:

root@rac1# cat /etc/oracle/scls_scr/rac1/root/crsstart
enable

root@rac1# cd CRS_HOME/bin

root@rac1# ./crsctl disable crs
root@rac1# cat /etc/oracle/scls_scr/rac1/root/crsstart
disable

After being disabled by "crsctl disable crs", the crsstart file was changed to "disable".

root@rac1# ./crsctl enable crs
root@rac1# cat /etc/oracle/scls_scr/rac1/root/crsstart
enable

After being enabled by "crsctl enable crs", the crsstart file was changed to "enable".

That's just an idea ;)