Wednesday, April 11, 2007

SOLVING ORACLE DATA BLOCK CORRUPTION

0. Abstract

Whenever you encounter the following Oracle error messages,

ORA-00600: Internal message code, arguments: [01578] [...] [...] [] [] [].
ORA-01578: Oracle data block corrupted (file ..., block ...).

you are faced with, what at first sight looks like, a very serious problem. Somewhere in your database a block of data has been corrupted (at the byte-level) in such a way that the Oracle server causes the above internal exception to be raised each time a SQL-statement accesses (reads or writes) that block.
This paper will describe the options that are available to you for solving the ORA-00600[01578] error. We will see that in most cases the above problem can be easily and quickly solved, and that in much fewer cases more effort and time will be neccessary to solve the problem.


1. Introduction

Having encountered the Oracle data block corruption, we must firstly investigate which database segment (name and type) the corrupted block is allocated to. Chances are that the block belongs either to an index or to a table segment, since these two type of segments fill the major part of our databases. The following query will reveil the segment that holds the corrupted block identified by and (which were given to you in the error message):

select ds.*
from dba_segments ds, sys.uet$ e
where ds.header_file=e.segfile#
and ds.header_block=e.segblock#
and e.file#=
and between e.block# and e.block#+e.length-1;

If the segment turns out to be an index segment, then the problem can be very quickly solved. Since all the table data required for recreating the index is still accessable, we can drop and recreate the index (since the block will reformatted, when taken from the free-space list and reused for the index).
If the segment turns out to be a table segment a number of options for solving the problem are available:

- Using datafile backups and archived redo logfiles we could do a recovery of the datafile that holds the corrupt data block. We must be sure though that the data block was not already corrupted in the backup datafile.
This option involves a lot of effort: shutting down the database, restoring an old enough datafile, then recover that datafile by applying multiple archived logfiles and restarting the database. It will however restore our corrupted table segment completely up to the time when we discovered the corrupt data block.
- If we regularly made dumpfiles using the Export-utility, another option could be to restore the table segment using the Import-utility. Of course this can only be done if the version of the table as recorded in the dumpfile is acceptable (ie. not too old) compared to other related tables in our database. Before doing the table import we would have to truncate the current table which deallocates the bad block.
- The last option involves using SQL to select as much data as possible from the current corrupted table segment and save the selected rows into a new table.
Selecting data that is stored in segment blocks that preceede the corrupted block can be easily done using a full table scan (via a cursor). Rows stored in blocks after the corrupted block cause a problem. A full table scan will never reach these. However these rows can still be fetched using rowids (single row lookups).

The next section will investigate the third option more closely. Part of the trick is to find out what the rowid's are of the rows that lie behind the corrupted block. The presence of an index on the corrupted table will aid us a lot in this area.


2. Table data recovery using SQL

2.1 Table was indexed

Suppose the table was indexed, which is a very likely assumption since every table should have a primary key and primary keys are implemented using unique indexes. The index contains pointers (rowid's) to each row in our corrupted table which can be used to directly access rows that are stored after the corrupt data block in the table segment.
Using an optimizer hint we can write a query that selects the rows from the table via an index scan (using rowid's), instead of via a full table scan. Let's assume our table is named X with columns a, b and c. And table X is indexed uniquely on columns a and b by index X_I, the query would look like:

select /*+index(X X_I) */ a, b, c
from X;

The execution plan for this query is:

Select statement
>table access (by rowid) X
>index range scan X_I

We must now exclude the corrupt block from being accessed to avoid the internal exception ORA-00600[01578]. Since the blocknumber is a substring of the rowid ([1]) this can very easily be achieved:

select /*+index(X X_I) */ a, b, c
from X
where rowid not like '.%.';

The execution plan for this query stays the same:

Select statement
>table access (by rowid) X
>index range scan X_I

But it is important to realize that the where-clause gets evaluated right after the index is accessed and before the table is accessed. Otherwise we would still get the ORA-00600[01578] exception. Using the above query as a subquery in an insert statement we can restore all rows of still valid blocks to a new table.

Since the index holds the actual column values of the indexed columns we could also use the index to restore all indexed columns of rows that reside in the corrupt block. The following query,

select /*+index(X X_I) */ a, b
from X
where rowid like '.%.';

retreives only indexed columns a and b from rows inside the corrupt block. The optimizer will not access the table for this query. It can retreive the column values using the index segment only. The execution plan for the above query is:

Select statement
>index range scan X_I

Using this technique we are able to restore all indexed column values of the rows inside the corrupt block, without accessing the corrupt block at all. Suppose in our example that column c of table X was also indexed by index X_I2. This enables us to completely restore rows inside the corrupt block.

First restore columns a and b using index X_I:

create table X_a_b(rowkey,a,b) as
select /*+index(X X_I) */ rowid, a, b
from X
where rowid like '.%.';

Then restore column c using index X_I2:

create table X_c(rowkey,c) as
select /*+index(X X_I2) */ rowid, c
from X
where rowid like '.%.';

And finally join the columns together using the restored rowid:

select x1.a, x1.b, x2.c
from X_a_b x1, X_c x2
where x1.rowkey=x2.rowkey;


In summary:
Indexes on the corrupted table segment can be used to restore all columns of all rows that are stored outside the corrupted data blocks. Of rows inside the corrupted data blocks, only the columns that were indexed can be restored. We might even be able to use an old version of the table (via Import) to further restore non-indexed columns of these records.


2.2 Table has no indexes

This situation should rarely occur since every table should have a primary key and therefore a unique index. However when no index is present, all rows of corrupted blocks should be considered lost. All other rows can be retrieved using rowid's. Since there is no index we must build a rowid generator ourselves. The SYS.UET$ table shows us exactly which extents (file#, startblock, endblock) we need to inspect for possible rows of our table X. If we make an estimate of the maximum number of rows per block for table X, we can build a PL/SQL-loop that generates possible rowid's of records inside table X. By handling the 'invalid rowid' exception, and skipping the corrupted data block, we can restore all rows except those inside the corrupted block.

declare
v_rowid varchar2(18);
v_xrec X%rowtype;
e_invalid_rowid exception;
pragma exception_init(e_invalid_rowid,-1410);
begin for v_uetrec in (select file# file, block# start_block, block#+length#-1 end_block
from uet$
where segfile#=6 and segblock#=64) -- Identifies our segment X.
loop for v_blk in v_uetrec.start_block..v_uetrec.end_block
loop if v_uetrec.file<>6 and v_blk<>886 -- 886 in file 6 is our corrupted block.
then for v_row in 0..200 -- 200 is maximum number of rows per block for segment X.
loop begin select a,b,c into v_rec
from x
where rowid=chartorowid('0000'hex(v_blk)'.'
hex(v_row)'.'hex(v_uetrec.file);
insert into x_saved(a,b,c) values(v_rec.a,v_rec.b,v_rec.c);
commit;
exception when e_invalid_rowid then null;
end;
end loop; /*row-loop*/
end if;
end loop; /*blk-loop*/
end loop; /*uet-loop*/
end;
/

The above code assumes that block id's never exceed 4 hexadecimal places. A definition of the hex-function which is used in the above code can be found in the appendix.


3. Conclusion

Solving Oracle data block corruption is hopefully not an every day task for you. As we have shown in this short paper the solution is very straightforward. In many situations almost all data can be restored from corrupted data segments. The presence of an index on the corrupted table segment greatly aids in restoring the data, since it not only supplies us with all the rowid's, but also holds column values of rows that cannot be accessed inside the table segment.


Appendix: function HEX.

Function hex accepts a decimal number and returns the hexadecimal representation of this number.

create or replace function hex(v_num in binary_integer) return varchar2 as
-- Assumption: 'v_num' is less than power(16,4).
v_tmp3 binary_integer; v_tmp2 binary_integer;
v_tmp1 binary_integer; v_tmp0 binary_integer;
v_buf binary_integer;
function hexchr(v_c in binary_integer) return varchar2 as
begin if v_c between 0 and 9 then return to_char(v_c);
elsif v_c=10 then return 'A'; elsif v_c=11 then return 'B';
elsif v_c=12 then return 'C'; elsif v_c=13 then return 'D';
elsif v_c=14 then return 'E'; elsif v_c=15 then return 'F';
end if;
end;
function div(i in binary_integer, j in binary_integer) return binary_integer as
v_buf binary_integer:=i; a binary_integer:=0;
begin while v_buf>=j
loop a:=a+1; v_buf:=v_buf‑j; end loop;
return a;
end;
begin v_buf:=v_num;
v_tmp3:=div(v_buf,4096); v_buf:=v_buf‑4096*v_tmp3;
v_tmp2:=div(v_buf,256); v_buf:=v_buf‑256*v_tmp2;
v_tmp1:=div(v_buf,16); v_buf:=v_buf‑16*v_tmp1;
v_tmp0:=v_buf;
return hexchr(v_tmp3)hexchr(v_tmp2)hexchr(v_tmp1)hexchr(v_tmp0);
end;
/
[1] A rowid holds the physical address of a row. Its value is formatted as follows: 'xxxxxxxx.yyyy.zzz'. The first part (x) equals the block offset within the datafile. The second part (y) equals the row offset within the data block. The last part (z) identifies the datafile. All values are in hexadecimal representation.

1 comment:

  1. I came across the same situation in which a block of data in database has been corrupted.

    Petter Smith
    http://www.hdrconline.com

    ReplyDelete