Wednesday, August 17, 2011

Oracle 11g Release 2: ASM Intelligent Data Placement (IDP)

Intelligent Data Placement (IDP) is a new feature introduced in Oracle 11g Release 2.

Data on a hard disk (HDD) is stored in circular tracks on the platter, and there can be roughly a 50% performance difference between the inner and outer tracks. The hot-file area (outer tracks) therefore provides about 50% better throughput than the cold-file area (inner tracks), as shown in the figure below.

The IDP feature provides its best results when the ASM disks are whole physical disks, because ASM then knows the actual disk geometry.

We can mark ASM files as HOT or COLD by using statements of the following form:
  • ALTER DISKGROUP DG_NAME MODIFY FILE 'FILE_NAME' ATTRIBUTE (HOT);
  • ALTER DISKGROUP DG_NAME MODIFY FILE 'FILE_NAME' ATTRIBUTE (COLD);

A rebalance then moves the file's extents so that the primary copy is placed in the hot region and the mirror copy in the cold region (or vice versa), as shown in the figure below.



We can also assign files to a disk region through templates, using statements such as:
  • ALTER DISKGROUP DG_NAME ADD TEMPLATE DATAFILE_HOT ATTRIBUTE (HOT MIRRORHOT);

  • ALTER DISKGROUP DG_NAME ADD TEMPLATE DATAFILE_COLD ATTRIBUTE (COLD MIRRORCOLD);
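Putting the pieces together, here is a hedged end-to-end sketch. The diskgroup name DATA and the file path are hypothetical, and MODIFY FILE only rebalances that file's extents:

```sql
-- Hypothetical names and paths, for illustration only.
-- Move an existing datafile (and its mirror) to the hot region.
ALTER DISKGROUP data MODIFY FILE '+DATA/orcl/datafile/users.259.679156903'
  ATTRIBUTE (HOT MIRRORHOT);

-- Watch the resulting rebalance until it completes.
SELECT group_number, operation, state, est_minutes
FROM   v$asm_operation;
```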

To view the placement and performance statistics of any candidate files, we can use the views below:
  • V$ASM_FILE

  • V$ASM_DISK

  • V$ASM_DISK_STAT
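For example, assuming the 11.2 columns PRIMARY_REGION and MIRROR_REGION in V$ASM_FILE (a sketch; group and file numbers vary per system):

```sql
-- Which disk region each ASM file's primary and mirror copies occupy.
SELECT f.group_number, f.file_number, f.bytes,
       f.primary_region, f.mirror_region
FROM   v$asm_file f
ORDER  BY f.group_number, f.file_number;
```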

To measure the performance benefit, compare an AWR report generated before the changes with one generated after making them.

Oracle 11g Release 2: CTSS and NTP Service

The CTSS daemon is a new service introduced in Oracle 11g Release 2 for time synchronization between cluster nodes. As I mentioned in Oracle 11g Rel 2 New Features, it acts as a replacement for the NTP service.

CTSS runs in two modes:
  • Observer mode: when NTP is installed on the system, CTSS only observes

  • Active mode: when no NTP service is found, the time in the cluster is synchronized against the CTSS master node

The CTSS daemon designates the first node started in the cluster as the master time manager. If the NTP service is not available, the CTSS daemons on the other nodes communicate with this master and validate their time against it. If a time difference between cluster nodes is detected, CTSS adjusts the time in the same way the NTP daemon would. Minor time differences are reported in the alert.log file, but if the difference between nodes is greater than 1000 msec, Oracle Clusterware will not start on the non-master node and an alert is written to the alert.log file located under the Clusterware home.

In such a case we need to set the time manually and then start Oracle Clusterware on that node.
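A hedged sketch of that manual recovery, assuming a typical Grid Infrastructure layout ($GRID_HOME and the date value are placeholders):

```shell
# Run as root on the node that failed to start; values are illustrative.
$GRID_HOME/bin/crsctl check ctss     # confirm CTSS state on a healthy node first
date -s "17 AUG 2011 10:00:00"       # set the clock to match the master node
$GRID_HOME/bin/crsctl start crs      # then start Clusterware on this node
```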

You can use the CLUVFY utility as shown below to view the details of how CTSS and NTP interact.

cluvfy comp clocksync

Verifying Clock Synchronization across the cluster nodes
Checking if Clusterware is installed on all nodes...
Check of Clusterware install passed
Checking if CTSS Resource is running on all nodes...
CTSS resource check passed
Querying CTSS for time offset on all nodes...
Query of CTSS for time offset passed
Check CTSS state started...
CTSS is in Observer state. Switching over to clock synchronization checks using NTP
Starting Clock synchronization checks using Network Time Protocol(NTP)...
NTP Configuration file check started...
NTP Configuration file check passed
Checking daemon liveness...
Liveness check passed for "ntpd"
NTP daemon slewing option check passed
NTP daemon's boot time configuration check for slewing option passed
NTP common Time Server Check started...
Check of common NTP Time Server passed
Clock time offset check from NTP Time Server started...
Clock time offset check passed
Clock synchronization check using Network Time Protocol(NTP) passed
Oracle Cluster Time Synchronization Services check passed
Verification of Clock Synchronization across the cluster nodes was successful.

 
Please share your comments.

Oracle 11g Release 2: Cluster Verify Components (CLUVFY)

CVU is used for component verification and can be run at any stage. It performs multiple kinds of checks, such as basic sanity checks, free disk space, and the state of the Oracle Clusterware stack. It can also examine the specific behavior of a cluster component, such as its availability and integrity.

Below are the options that are provided in CLUVFY utility in 11g Release 2:

Usage: cluvfy comp <component name from the list below>
  • nodereach : checks reachability between nodes

  • nodecon : checks node connectivity

  • cfs : checks CFS integrity

  • ssa : checks shared storage accessibility

  • space : checks space availability

  • sys : checks minimum system requirements

  • clu : checks cluster integrity

  • clumgr : checks cluster manager integrity

  • ocr : checks OCR integrity

  • olr : checks OLR integrity

  • ha : checks HA integrity

  • crs : checks CRS integrity

  • nodeapp : checks node applications existence

  • admprv : checks administrative privileges

  • peer : compares properties with peers

  • software : checks software distribution

  • asm : checks ASM integrity

  • acfs : checks ACFS integrity

  • gpnp : checks GPnP integrity

  • gns : checks GNS integrity

  • scan : checks SCAN configuration

  • ohasd : checks OHASD integrity

  • clocksync : checks Clock Synchronization

  • vdisk : checks Voting Disk Udev settings
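For example, a component check can be targeted at specific nodes and given verbose output (node names are placeholders):

```shell
# Check node connectivity between two cluster nodes, with detailed output.
cluvfy comp nodecon -n node1,node2 -verbose

# Check clock synchronization on all nodes (the output shown earlier).
cluvfy comp clocksync -n all -verbose
```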

Friday, August 12, 2011

Oracle 11g Release 2: New Features

Here are some of the new features of Oracle 11g Release 2 that I have come across.
  • Oracle ASM & Oracle Clusterware Installation

    • With Oracle Grid Infrastructure 11g Release 2, Oracle ASM and Oracle Clusterware are installed into a single home directory, referred to as the Grid Infrastructure home.

  • Oracle ASM and Oracle Clusterware Files

    • OCR and voting disks can be placed on Oracle ASM storage, which enables a unified storage solution: all clusterware and database data is stored without the need for any third-party volume manager or cluster file system.

  • Cluster Time Synchronization Service

    • This service, newly introduced in Oracle 11g Rel 2, ensures that time is synchronized across the cluster. If no NTP daemon is found during cluster configuration, CTSS is configured in active mode to ensure time synchronization.

  • Fixup Scripts and Grid Infrastructure Checks

    • When Oracle Universal Installer (OUI) detects that the minimum requirements for installation are not met, it creates shell scripts called fixup scripts. If OUI detects an incomplete task that is marked as fixable, we can easily fix the issue by generating a fixup script with a single click on "FIX & CHECK AGAIN".

    • This script is always executed as root because it may need to change system parameters that cannot be modified by other users.

    • We can also generate fixup scripts with the Cluster Verification Utility (CVU) before moving ahead with the installation.

  • Improved Input/Output Fencing Processes

    • Oracle 11g Rel 2 replaced the oprocd and hangcheck processes with the Cluster Synchronization Service daemon agent and monitor, providing more accurate recognition of hangs and avoiding false terminations.

  • Oracle Clusterware Out-Of-Place upgrade

  • SCAN for Simplified Client Access

  • Voting Disk Backup Procedure

    • The voting disk was backed up using the dd command in prior releases, but from Oracle 11g Rel 2 onward this is no longer supported.

    • Backing up the voting disk manually is no longer required, as it is backed up automatically in the OCR as part of any configuration change. Voting disk data is also automatically restored to any newly added voting disk.
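As noted above, CVU can generate fixup scripts before installation begins. A hedged example (node names are placeholders; the -fixup flag is part of 11g Release 2 cluvfy):

```shell
# Verify prerequisites before installing Grid Infrastructure and
# generate fixup scripts for any failures that are marked fixable.
cluvfy stage -pre crsinst -n node1,node2 -fixup -verbose
```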

This article was put together with the help of OTN documentation. I hope it helps you in your career.




Monday, August 1, 2011

Oracle 11g Release 2: Viewing Logical Content of OCR (OCRDUMP Utility)

The ocrdump utility is used to view the logical content of the OCR, mostly for troubleshooting. It lets you view this logical information by writing all, or a limited amount, of the content to a file. If the ocrdump command is executed without any parameters, a default file named OCRDUMPFILE is created in the current directory.

As stated above, it writes all or only a limited amount of the content, because the information contained in the OCR is organized by keys that are associated with privileges. Below are the statements to dump the OCR content in a readable format:
  • [root]$ ocrdump filename_full_result.txt
  • [grid]$ ocrdump filename_limited_result.txt
So if ocrdump is executed as root we get the full output, and if it is executed as the grid user we get limited output.

To check for changes in the OCR you can use the method below.
  1. Take a logical dump of the OCR: ocrdump week.ocr
  2. After a week, see the difference: ocrdump -stdout | diff - week.ocr
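A hedged shell sketch of automating this weekly comparison (run as root for a complete dump; the directory and file names are hypothetical):

```shell
# Hypothetical weekly OCR drift check; assumes ocrdump is on PATH.
dumpdir=/var/ocr_dumps
baseline=$dumpdir/week.ocr

mkdir -p "$dumpdir"
if [ -f "$baseline" ]; then
    # Compare the current OCR content with last week's dump.
    ocrdump -stdout | diff - "$baseline" > "$dumpdir/ocr_changes.txt"
    rm -f "$baseline"
fi
ocrdump "$baseline"    # refresh the baseline for next week's run
```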
Please let me know if you have any issues or need any more help.

Oracle 11g: Process that cause Node Reboot / Avoiding False Reboot

There are a few processes that can evict nodes from the cluster or cause a node reboot:
  • hangcheck-timer: monitors the machine for hangs and pauses
  • oclskd: used by CSS to reboot a node based on requests from other nodes in the cluster
  • ocssd: monitors internode health
From Oracle 11g Release 2 onward, the hangcheck-timer is no longer needed.

To identify which of the above processes is causing a node reboot, we need to go through the following log files:
  • hangcheck-timer
    • /var/log/messages

  • oclskd
    • GRID_HOME/log/hostname/client/oclskd.log
  • ocssd
    • /var/log/messages
    • GRID_HOME/log/hostname/cssd/ocssd.log 
Below are a few of the lines these processes write to the logs at the time of a reboot.
  • hangcheck-timer
    • "Hangcheck: hangcheck is restarting the machine."
  • ocssd
    • "Oracle CSSD failure. Rebooting for cluster integrity"
    • There may be more information, similar to "Begin Dump" and "End Dump", just before the reboot.
    • If you don't find anything that identifies the cause of the reboot, you may need to enable tracing and additional debugging.
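To search those logs for the messages above (a sketch; GRID_HOME is a placeholder for your Grid Infrastructure home):

```shell
# Look for hangcheck-timer and CSSD reboot evidence in the usual logs.
grep -i "hangcheck" /var/log/messages
grep -i "Oracle CSSD failure" "$GRID_HOME/log/$(hostname)/cssd/ocssd.log"
```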
Sometimes a false reboot occurs due to low MARGIN settings combined with heavy CPU load, or due to a scheduler bug.

Wide variations in scheduling latency have been observed across operating systems and OS versions, and these can also result in false reboots.

If diagwait is set too low and false reboots occur, increase its value using the command below (Oracle commonly recommends a value of 13 seconds):
  • crsctl set css diagwait 13 -force
If the hangcheck-timer is in use and is found to be the cause, increase the hangcheck_margin parameter of the hangcheck-timer module. To validate the values of diagwait and hangcheck_margin, the following relationships should hold:
  • CSS misscount > (TIMEOUT + MARGIN)
    • To get the current css misscount please use crsctl get css misscount
  • CSS misscount > diagwait
  • CSS misscount > hangcheck_tick + hangcheck_margin
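The relationships above can be checked with simple shell arithmetic. The values below are illustrative only (misscount is commonly 30 on Linux without vendor clusterware, diagwait is commonly set to 13, and hangcheck_tick/hangcheck_margin often default to 1/10); replace them with the real values from your system:

```shell
# Illustrative values -- replace with the real ones, e.g. from:
#   crsctl get css misscount
#   /sbin/modinfo hangcheck-timer
misscount=30
diagwait=13
hangcheck_tick=1
hangcheck_margin=10

ok=1
# misscount must exceed diagwait.
[ "$misscount" -gt "$diagwait" ] || ok=0
# misscount must exceed hangcheck_tick + hangcheck_margin.
[ "$misscount" -gt $((hangcheck_tick + hangcheck_margin)) ] || ok=0

if [ "$ok" -eq 1 ]; then
    echo "settings consistent"
else
    echo "settings inconsistent"
fi
```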
Note: It is recommended not to change the values of misscount and disktimeout unless advised to do so by Oracle Support.