Wednesday, August 17, 2011

Oracle 11g Release 2: ASM Intelligent Data Placement (IDP)

Intelligent Data Placement (IDP) is a new feature introduced in Oracle 11g Release 2.

Data on a hard disk (HDD) is stored in circular tracks on the platter, and there can be roughly a 50% performance difference between the inner and outer tracks. The hot-file area (outer tracks) therefore provides about 50% better throughput than the cold-file area (inner tracks), as shown in the figure below.

The IDP feature provides its best results when the ASM disks are whole physical disks, because ASM then knows the actual disk geometry.

We can mark ASM files as HOT or COLD by using statements of the following form:
  • ALTER DISKGROUP DG_NAME MODIFY FILE 'FILE_NAME' ATTRIBUTE (HOT);
  • ALTER DISKGROUP DG_NAME MODIFY FILE 'FILE_NAME' ATTRIBUTE (COLD);

A rebalance then moves the file's extents so that the primary copy is placed in the hot region and the mirror copy in the cold region (or vice versa), as shown in the figure below.



We can also assign files to a disk region through templates, using statements such as:
  • ALTER DISKGROUP DG_NAME ADD TEMPLATE DATAFILE_HOT ATTRIBUTE (HOT MIRRORHOT);

  • ALTER DISKGROUP DG_NAME ADD TEMPLATE DATAFILE_COLD ATTRIBUTE (COLD MIRRORCOLD);
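Putting the pieces together, here is a hedged end-to-end sketch. The diskgroup name DATA and the file path are hypothetical, and MODIFY FILE only rebalances that file's extents:

```sql
-- Hypothetical names and paths, for illustration only.
-- Move an existing datafile (and its mirror) to the hot region.
ALTER DISKGROUP data MODIFY FILE '+DATA/orcl/datafile/users.259.679156903'
  ATTRIBUTE (HOT MIRRORHOT);

-- Watch the resulting rebalance until it completes.
SELECT group_number, operation, state, est_minutes
FROM   v$asm_operation;
```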

To view the placement and performance statistics of any candidate files, we can use the views below:
  • V$ASM_FILE

  • V$ASM_DISK

  • V$ASM_DISK_STAT
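For example, assuming the 11.2 columns PRIMARY_REGION and MIRROR_REGION in V$ASM_FILE (a sketch; group and file numbers vary per system):

```sql
-- Which disk region each ASM file's primary and mirror copies occupy.
SELECT f.group_number, f.file_number, f.bytes,
       f.primary_region, f.mirror_region
FROM   v$asm_file f
ORDER  BY f.group_number, f.file_number;
```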

To measure the performance benefit, compare an AWR report generated before the changes with one generated after making them.

Oracle 11g Release 2: CTSS and NTP Service

The CTSS daemon is a new service introduced in Oracle 11g Release 2 for time synchronization between cluster nodes. As I mentioned in Oracle 11g Rel 2 New Features, it acts as a replacement for the NTP service.

CTSS runs in two modes:
  • Observer mode: when NTP is installed on the system, CTSS only observes

  • Active mode: when no NTP service is found, the time in the cluster is synchronized against the CTSS master node

The CTSS daemon designates the first node started in the cluster as the master time manager. If the NTP service is not available, the CTSS daemons on the other nodes communicate with this master and validate their time against it. If a time difference between cluster nodes is detected, CTSS adjusts the time in the same way the NTP daemon would. Minor time differences are reported in the alert.log file, but if the difference between nodes is greater than 1000 msec, Oracle Clusterware will not start on the non-master node and an alert is written to the alert.log file located under the Clusterware home.

In such a case we need to set the time manually and then start Oracle Clusterware on that node.
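A hedged sketch of that manual recovery, assuming a typical Grid Infrastructure layout ($GRID_HOME and the date value are placeholders):

```shell
# Run as root on the node that failed to start; values are illustrative.
$GRID_HOME/bin/crsctl check ctss     # confirm CTSS state on a healthy node first
date -s "17 AUG 2011 10:00:00"       # set the clock to match the master node
$GRID_HOME/bin/crsctl start crs      # then start Clusterware on this node
```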

You can use the CLUVFY utility as shown below to view the details of how CTSS and NTP interact.

cluvfy comp clocksync

Verifying Clock Synchronization across the cluster nodes
Checking if Clusterware is installed on all nodes...
Check of Clusterware install passed
Checking if CTSS Resource is running on all nodes...
CTSS resource check passed
Querying CTSS for time offset on all nodes...
Query of CTSS for time offset passed
Check CTSS state started...
CTSS is in Observer state. Switching over to clock synchronization checks using NTP
Starting Clock synchronization checks using Network Time Protocol(NTP)...
NTP Configuration file check started...
NTP Configuration file check passed
Checking daemon liveness...
Liveness check passed for "ntpd"
NTP daemon slewing option check passed
NTP daemon's boot time configuration check for slewing option passed
NTP common Time Server Check started...
Check of common NTP Time Server passed
Clock time offset check from NTP Time Server started...
Clock time offset check passed
Clock synchronization check using Network Time Protocol(NTP) passed
Oracle Cluster Time Synchronization Services check passed
Verification of Clock Synchronization across the cluster nodes was successful.

 
Please share your comments.

Oracle 11g Release 2: Cluster Verify Components (CLUVFY)

CVU is used for component verification and can be run at any stage. It performs multiple kinds of checks, such as basic sanity checks, free disk space, and the state of the Oracle Clusterware stack. It can also examine the specific behavior of a cluster component, such as its availability and integrity.

Below are the options that are provided in CLUVFY utility in 11g Release 2:

Usage: cluvfy comp <component name from the list below>
  • nodereach : checks reachability between nodes

  • nodecon : checks node connectivity

  • cfs : checks CFS integrity

  • ssa : checks shared storage accessibility

  • space : checks space availability

  • sys : checks minimum system requirements

  • clu : checks cluster integrity

  • clumgr : checks cluster manager integrity

  • ocr : checks OCR integrity

  • olr : checks OLR integrity

  • ha : checks HA integrity

  • crs : checks CRS integrity

  • nodeapp : checks node applications existence

  • admprv : checks administrative privileges

  • peer : compares properties with peers

  • software : checks software distribution

  • asm : checks ASM integrity

  • acfs : checks ACFS integrity

  • gpnp : checks GPnP integrity

  • gns : checks GNS integrity

  • scan : checks SCAN configuration

  • ohasd : checks OHASD integrity

  • clocksync : checks Clock Synchronization

  • vdisk : checks Voting Disk Udev settings
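For example, a component check can be targeted at specific nodes and given verbose output (node names are placeholders):

```shell
# Check node connectivity between two cluster nodes, with detailed output.
cluvfy comp nodecon -n node1,node2 -verbose

# Check clock synchronization on all nodes (the output shown earlier).
cluvfy comp clocksync -n all -verbose
```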

Friday, August 12, 2011

Oracle 11g Release 2: New Features

Here are some of the new features of Oracle 11g Release 2 that I have come across.
  • Oracle ASM & Oracle Clusterware Installation

    • With Oracle Grid Infrastructure 11g Release 2, Oracle ASM and Oracle Clusterware are installed into a single home directory, referred to as the Grid Infrastructure home.

  • Oracle ASM and Oracle Clusterware Files

    • OCR and voting disks can be placed on Oracle ASM storage, which enables a unified storage solution: all clusterware and database data is stored without the need for any third-party volume manager or cluster file system.

  • Cluster Time Synchronization Service

    • This service, newly introduced in Oracle 11g Rel 2, ensures that time is synchronized across the cluster. If no NTP daemon is found during cluster configuration, CTSS is configured in active mode to ensure time synchronization.

  • Fixup Scripts and Grid Infrastructure Checks

    • When Oracle Universal Installer (OUI) detects that the minimum requirements for installation are not met, it creates shell scripts called fixup scripts. If OUI detects an incomplete task that is marked as fixable, we can easily fix the issue by generating a fixup script with a single click on "FIX & CHECK AGAIN".

    • This script is always executed as root because it may need to change system parameters that cannot be modified by other users.

    • We can also generate fixup scripts with the Cluster Verification Utility (CVU) before moving ahead with the installation.

  • Improved Input/Output Fencing Processes

    • Oracle 11g Rel 2 replaced the oprocd and hangcheck processes with the Cluster Synchronization Service daemon agent and monitor, providing more accurate recognition of hangs and avoiding false terminations.

  • Oracle Clusterware Out-Of-Place upgrade

  • SCAN for Simplified Client Access

  • Voting Disk Backup Procedure

    • The voting disk was backed up using the dd command in prior releases, but from Oracle 11g Rel 2 onward this is no longer supported.

    • Backing up the voting disk manually is no longer required, as it is backed up automatically in the OCR as part of any configuration change. Voting disk data is also automatically restored to any newly added voting disk.
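As noted above, CVU can generate fixup scripts before installation begins. A hedged example (node names are placeholders; the -fixup flag is part of 11g Release 2 cluvfy):

```shell
# Verify prerequisites before installing Grid Infrastructure and
# generate fixup scripts for any failures that are marked fixable.
cluvfy stage -pre crsinst -n node1,node2 -fixup -verbose
```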

This article was put together with the help of OTN documentation. I hope it helps you in your career.




Monday, August 1, 2011

Oracle 11g Release 2: Viewing Logical Content of OCR (OCRDUMP Utility)

The ocrdump utility is used to view the logical content of the OCR, mostly for troubleshooting. It lets you view this logical information by writing all, or a limited amount, of the content to a file. If the ocrdump command is executed without any parameters, a default file named OCRDUMPFILE is created in the current directory.

As stated above, it writes all or only a limited amount of the content, because the information contained in the OCR is organized by keys that are associated with privileges. Below are the statements to dump the OCR content in a readable format:
  • [root]$ ocrdump filename_full_result.txt
  • [grid]$ ocrdump filename_limited_result.txt
So if ocrdump is executed as root we get the full output, and if it is executed as the grid user we get limited output.

To check for changes in the OCR you can use the method below.
  1. Take a logical dump of the OCR: ocrdump week.ocr
  2. After a week, see the difference: ocrdump -stdout | diff - week.ocr
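A hedged shell sketch of automating this weekly comparison (run as root for a complete dump; the directory and file names are hypothetical):

```shell
# Hypothetical weekly OCR drift check; assumes ocrdump is on PATH.
dumpdir=/var/ocr_dumps
baseline=$dumpdir/week.ocr

mkdir -p "$dumpdir"
if [ -f "$baseline" ]; then
    # Compare the current OCR content with last week's dump.
    ocrdump -stdout | diff - "$baseline" > "$dumpdir/ocr_changes.txt"
    rm -f "$baseline"
fi
ocrdump "$baseline"    # refresh the baseline for next week's run
```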
Please let me know if you have any issues or need any more help.

Oracle 11g: Process that cause Node Reboot / Avoiding False Reboot

There are a few processes that can evict nodes from the cluster or cause a node reboot:
  • hangcheck-timer: monitors the machine for hangs and pauses
  • oclskd: used by CSS to reboot a node based on requests from other nodes in the cluster
  • ocssd: monitors internode health
From Oracle 11g Release 2 onward, the hangcheck-timer is no longer needed.

To identify which of the above processes is causing a node reboot, we need to go through the following log files:
  • hangcheck-timer
    • /var/log/messages

  • oclskd
    • GRID_HOME/log/hostname/client/oclskd.log
  • ocssd
    • /var/log/messages
    • GRID_HOME/log/hostname/cssd/ocssd.log 
Below are a few of the lines these processes write to the logs at the time of a reboot.
  • hangcheck-timer
    • "Hangcheck: hangcheck is restarting the machine."
  • ocssd
    • "Oracle CSSD failure. Rebooting for cluster integrity"
    • There may be more information, similar to "Begin Dump" and "End Dump", just before the reboot.
    • If you don't find anything that identifies the cause of the reboot, you may need to enable tracing and additional debugging.
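To search those logs for the messages above (a sketch; GRID_HOME is a placeholder for your Grid Infrastructure home):

```shell
# Look for hangcheck-timer and CSSD reboot evidence in the usual logs.
grep -i "hangcheck" /var/log/messages
grep -i "Oracle CSSD failure" "$GRID_HOME/log/$(hostname)/cssd/ocssd.log"
```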
Sometimes a false reboot occurs due to low MARGIN settings combined with heavy CPU load, or due to a scheduler bug.

Wide variations in scheduling latency have been observed across operating systems and OS versions, and these can also result in false reboots.

If diagwait is set too low and false reboots occur, increase its value using the command below (Oracle commonly recommends a value of 13 seconds):
  • crsctl set css diagwait 13 -force
If the hangcheck-timer is in use and is found to be the cause, increase the hangcheck_margin parameter of the hangcheck-timer module. To validate the values of diagwait and hangcheck_margin, the following relationships should hold:
  • CSS misscount > (TIMEOUT + MARGIN)
    • To get the current css misscount please use crsctl get css misscount
  • CSS misscount > diagwait
  • CSS misscount > hangcheck_tick + hangcheck_margin
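The relationships above can be checked with simple shell arithmetic. The values below are illustrative only (misscount is commonly 30 on Linux without vendor clusterware, diagwait is commonly set to 13, and hangcheck_tick/hangcheck_margin often default to 1/10); replace them with the real values from your system:

```shell
# Illustrative values -- replace with the real ones, e.g. from:
#   crsctl get css misscount
#   /sbin/modinfo hangcheck-timer
misscount=30
diagwait=13
hangcheck_tick=1
hangcheck_margin=10

ok=1
# misscount must exceed diagwait.
[ "$misscount" -gt "$diagwait" ] || ok=0
# misscount must exceed hangcheck_tick + hangcheck_margin.
[ "$misscount" -gt $((hangcheck_tick + hangcheck_margin)) ] || ok=0

if [ "$ok" -eq 1 ]; then
    echo "settings consistent"
else
    echo "settings inconsistent"
fi
```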
Note: It is recommended not to change the values of misscount and disktimeout unless advised to do so by Oracle Support.