ora-08104 online index (re)build cleanup-CSS-86后生记录生活

ora-08104 online index (re)build cleanup

2023-09-09 阅读 37 评论 0

摘要：问题产生原因，对一张业务操作频繁（DML）的表进行索引的rebuildonline，执行10分钟后，未执行完，且发现被其中一个会话锁住，处于等待状态，遂取消此操作，后发现应用后台报错，无法连接（实际是无法对

问题产生原因，对一张业务操作频繁（DML）的表进行索引的rebuild online，执行10分钟后，未执行完，且发现被其中一个会话锁住，处于等待状态，遂取消此操作，后发现应用后台报错，无法连接（实际是无法对此表做DML操作），查找资料如下

online index (re)build cleanup: objn=96528 maxretry=2000 forever=0

ora-01843 not a valid month， tomcat日志报错信息如下：

nested exception is java.sql.BatchUpdateException: ORA-00600: 内部错误代码, 参数: [25027], [7], [1], [], [], [], [], [], [], [], [], []

alert 日志如下：

ORA-00600: 内部错误代码, 参数: [25027], [7], [1], [], [], [], [], [], [], [], [], []

--执行此sql，由于该表一直处于DML（不断进行update）操作，遂重启数据库（重启需慎重），执行如下，完成（不重启，该函数无法执行完成，原因是有DML不停操作此表，如果有update正在锁表，会导致函数执行不成功，loop循环，可查看vlocked_object.object_id找到需要操作的表，kill掉相关session，发生此问题时可以对表进行update，但是无法insert或delete，执行报以上错误）

10gR2中我们可以使用dbms_repair.online_index_clean来手动清理online index rebuild的遗留问题

declare

isClean boolean;

begin

isClean := FALSE;

while isClean=FALSE loop

isClean := dbms_repair.online_index_clean(dbms_repair.all_index_id,

dbms_repair.lock_wait);

dbms_lock.sleep(2);

end loop;

exception

when others then

RAISE;

end;

或

SQL> declare

2 isClean boolean;

3 begin

4 isClean := dbms_repair.online_index_clean();

5 end;

6 /

解释：

在线重建索引过程出现错误，如网络中断、用户终止等，当再次建立索引时，有可能会产生该错误，ora-08104，这是由于之前的操作痕迹没有被清除而造成的。

create /rebuild index online时会修改数据库字典表obj$,并在该索引用户下创建表sys_journal_obj#（具体的对象号）和在ind$、ind_online$表里(ind_online$字典基表记录了索引在线创建/重建的历史)标记256或512,（11g里rebuild online是514？？）。

如果服务器进程在语句执行过程中意外终止的话，可能会导致相关在ind$标记位信息及在线日志中间表不能及时处理及清除（清除动作一般有smon进程处理,如果重建过程异常中断,smon会清理重建痕迹,但是如果系统非常繁忙导致smon无法清除,或dml操作频繁,导致smon无法获取相关表上的锁,从而造成无法清理痕迹,当再次重建索引或对表进行dml操作会报本篇提示错误），这将导致对该索引的后续操作因ora-08104错误而无法继续，如果是分区表，索引是global，在添加分区也无法继续

Connected to:

Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production

With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> drop index i_c_lqq;

drop index i_c_lqq

ERROR at line 1:

ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired

SQL> alter index i_c_lqq rebuild online;

alter index i_c_lqq rebuild online

ERROR at line 1:

ORA-08104: this index object 90675 is being online built or rebuilt

SQL> create index i_c_lqq on C_RETAIL_CO_LINE_lqq(co_num) online;

create index i_c_lqq on C_RETAIL_CO_LINE_lqq(co_num) online

ERROR at line 1:

ORA-00955: name is already used by an existing object

查询相关视图信息发现并没有被smon清除从而报错：

SQL> select * from dba_objects where object_name='I_C_LQQ';

OWNER OBJECT_NAM SUBOBJECT_NAME OBJECT_ID DATA_OBJECT_ID OBJECT_TYPE CREATED LAST_DDL_ TIMESTAMP STATUS T G S NAMESPACE EDITION_NAME

---------- ---------- --------------- ---------- -------------- --------------- --------- --------- ------------------- ------- - - - ---------- ---------------

RTMS I_C_LQQ 90675 90675 INDEX 18-MAY-15 18-MAY-15 2015-05-18:11:32:32 VALID N N N 4

SQL> select owner#,obj#,name from obj$ where name='I_C_LQQ';

OWNER# OBJ# NAME

---------- ---------- ------------------------------

85 90675 I_C_LQQ

SQL> select obj#,ts#,file#,flags from ind$ where obj#=90675;

OBJ# TS# FILE# FLAGS

---------- ---------- ---------- ----------

90675 6 0 67109379

SQL> select * from ind_online$;

OBJ# TYPE# FLAGS

---------- ---------- ----------

90675 1 67109379

如果调用如下函数去处理时，有dml操作此表，可能会导致函数一直loop去处理，但是没有处理成功，在alert里会报出如下信息，针对这种情况需要找出相关的session会话（kill掉），然后在进行处理

declare

isClean boolean;

begin

isClean := FALSE;

while isClean=FALSE loop

isClean := dbms_repair.online_index_clean(dbms_repair.all_index_id,

dbms_repair.lock_wait);

dbms_lock.sleep(2);

end loop;

exception

when others then

RAISE;

end;

online index (re)build cleanup: objn=90675 maxretry=2000 forever=0

这个函数也不是万能的，在11.2.0.3.6的版本上，就遇到一个利用此函数也不能清除的案例，最后，查询发现索引对象已经不存在，并且在ind$和obj$表里查不到相关对象了，但是在ind_online$表里发现还有记录，可能原因是smon在做clean动作时，需要去关联相关表，发现某些记录不存在，不能识别然后做clean动作导致吧，这里纯属推测，针对这种情形，手工drop table xxxx purge，保留online_ind$里的信息；如果在10gR1之前遇到这种情况，可能是遇到了bug。

通过函数处理完之后此表就可以进行dml等操作了。

函数介绍:

online_index_clean function

function online_index_clean(

object_id IN binary_integer DEFAULT ALL_INDEX_ID,

wait_for_lock IN binary_integer DEFAULT LOCK_WAIT)

return boolean;

-- Example Usage of online_index_clean:

-- DECLARE

-- isClean BOOLEAN;

-- BEGIN

-- isClean := FALSE;

-- WHILE isClean=FALSE

-- LOOP

-- isClean := DBMS_REPAIR.ONLINE_INDEX_CLEAN(DBMS_REPAIR.ALL_INDEX_ID,

-- DBMS_REPAIR.LOCK_WAIT);

-- DBMS_LOCK.SLEEP(10);

-- END LOOP;

-- EXCEPTION

-- WHEN OTHERS THEN

-- RAISE;

-- END;

-- /

END dbms_repair;

This function performs a manual cleanup of failed or interrupted online index builds or rebuilds. This action is also performed periodically by SMON, regardless of user-initiated cleanup.

This function returns TRUE if all indexes specified were cleaned up and FALSE if one or more indexes could not be cleaned up.

注意事项:在执行过程中,需要在索引所在表获取锁,因此应尽可能保证索引表不被其他事务锁定,以尽快清理临时数据,如果长时间不能清除数据,后台会报错:

online index (re)build cleanup: objn=96528 maxretry=2000 forever=0

手工修改数据字典,清除中间表:

update ind$ set flags=flags-512 where obj#=<object id>; /* 首先要确认flags>512如果不是，说明这个标志是正常的*/

update ind$ set flags=flags-512 where obj#=90675;

drop table .sys_journal_; /*这个步骤可能会报资源忙，因为有大量的日志正在插入，可以反复重试一下 */

drop table rtms.sys_journal_90675;

注意操作顺序,修改数据字典慎重使用

SMON(system monitor process)系统监控后台进程，有时候也被叫做system cleanup process，这么叫的原因是它负责完成很多清理(cleanup)任务

SMON的主要作用包括:

1.清理临时段(SMON cleanup temporary segments) 触发场景很多人错误地理解了这里所说的临时段temporary segments，认为temporary segments是指temporary tablespace临时表空间上的排序临时段(sort segment)。事实上这里的临时段主要指的是永久表空间(permanent tablespace)上的临时段，当然临时表空间上的temporary segments也是由SMON来清理(cleanup)的，但这种清理仅发生在数据库实例启动时(instance startup)。永久表空间上同样存在临时段，譬如当我们在某个永久表空间上使用create table/index等DDL命令创建某个表/索引时，服务进程一开始会在指定的永久表空间上分配足够多的区间(Extents)，这些区间在命令结束之前都是临时的(Temporary Extents)，直到表/索引完全建成才将该temporary segment转换为permanent segment。另外当使用drop命令删除某个段时，也会先将该段率先转换为temporary segment，之后再来清理该temporary segment(DROP object converts the segment to temporary and then cleans up the temporary segment)。常规情况下清理工作遵循谁创建temporary segment，谁负责清理的原则。换句话说，因服务进程rebuild index所产生的temporary segment在rebuild完成后应由服务进程自行负责清理。一旦服务进程在成功清理temporary segment之前就意外终止了，亦或者服务进程在工作过程中遇到了某些ORA-错误导致语句失败，那么SMON都会被要求(posted)负责完成temporary segment的清理工作。对于永久表空间上的temporary segment，SMON会三分钟清理一次(前提是接到post)，如果SMON过于繁忙那么可能temporary segment长期不被清理。temporary segment长期不被清理可能造成一个典型的问题是:在rebuild index online失败后，后续执行的rebuild index命令要求之前产生的temporary segment已被cleanup，如果cleanup没有完成那么就需要一直等下去。

在10gR2中我们可以使用dbms_repair.online_index_clean来手动清理online index rebuild的遗留问题。

与永久表空间上的临时段不同，出于性能的考虑临时表空间上的Extents并不在操作(operations)完成后立即被释放和归还。相反，这些Temporary Extents会被标记为可用，以便用于下一次的排序操作。SMON仍会清理这些Temporary segments，但这种清理仅发生在实例启动时(instance startup):

SMON会在实例启动(after startup of DB is started cleanup function again)时以及启动后的每12个小时执行一次清理任务(the cleanup is scheduled to run after startup and then every 12 hours)。

SMON负责在启动后(startup)的每小时执行一次对IND$基表中因在线创建/重建索引失败所留下记录的清理，这种清理工作由kdicclean函数驱动(kdicclean is run by smon every 1 hour，called from SMON to find if there is any online builder death and cleanup our ind$ and obj$ and drop the journal table, stop journaling)。这种清理工作典型的调用堆栈stack call如下:

ksbrdp -> ktmSmonMain ktmmon -> kdicclean -> kdic_cleanup -> ktssdrp_segment

注意因为SMON进程的清理工作每小时才执行一次，而且在工作负载很高的情况下可能实际很久都不会得到清理。

我们可以利用以下语句找出系统中可能需要恢复的IND$记录，注意不要看到查询有结果就认为这是操作失败的征兆，很可能是有人在线创建或重建索引:

select i.obj#, i.flags, u.name, o.name, o.type#

from sys.obj$ o, sys.user$ u, sys.ind_online$ i

where (bitand(i.flags, 256) = 256 or bitand(i.flags, 512) = 512)