原 视图DBA_GOLDENGATE_SUPPORT_MODE引起的集成模式OGG不同步问题排查过程
Tags: Oracle原创故障处理OGGbug慢11.2.0.4故障排查不同步
现象
使用OGG 21.3可以远程同步Oracle 11.2.0.4的数据库,这个我之前已经测试过,参考:https://www.dbaup.com/shiyongogg-21-3yuanchengshishihuxiangtongbuoracle-11-2-0-4shuangzhu.html
然后,客户这边有个需求,需要同步Oracle 11.2.0.4的rac到单机环境,我也是按照这个过程配置的,但是配置完成后,extract进程不能抽取数据,很是奇怪,也不报错,只是“Lag at Chkpt”延迟一直在增大,执行stats exta报错“No active extraction maps.”,如下:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 | GGSCI (ogg) 1> info all Program Status Group Lag at Chkpt Time Since Chkpt MANAGER RUNNING JAGENT STOPPED PMSRVR STOPPED EXTRACT RUNNING EXTA 16:22:31 00:00:06 EXTRACT RUNNING EXTB 16:09:25 00:00:05 REPLICAT RUNNING REPA 00:00:00 00:00:02 REPLICAT RUNNING REPB 00:00:00 00:00:04 GGSCI (ogg) 86> stats exta Sending STATS request to Extract group EXTA ... No active extraction maps. DDL replication statistics (for all trails): *** Total statistics since extract started *** Operations 0.00 Mapped operations 0.00 Unmapped operations 0.00 Other operations 0.00 Excluded operations 0.00 GGSCI (ogg) 48> info exta Extract EXTA Last Started 2022-11-03 09:15 Status RUNNING Checkpoint Lag 16:39:15 (updated 00:00:00 ago) Process ID 15884 Log Read Checkpoint Oracle Integrated Redo Logs 2022-11-02 16:37:03 SCN 7.3280852740 (33345623812) GGSCI (ogg) 49> info exta,showch Extract EXTA Last Started 2022-11-03 09:15 Status RUNNING Checkpoint Lag 16:39:15 (updated 00:00:08 ago) Process ID 15884 Log Read Checkpoint Oracle Integrated Redo Logs 2022-11-02 16:37:03 SCN 7.3280852740 (33345623812) Current Checkpoint Detail: Read Checkpoint #1 Oracle Integrated Redo Log Startup Checkpoint (starting position in the data source): Timestamp: 2022-11-02 16:36:25.000000 SCN: 0.0 (0) Recovery Checkpoint (position of oldest unprocessed transaction in the data source): Timestamp: 2022-11-02 16:37:03.000000 SCN: 7.3280852739 (33345623811) Current Checkpoint (position of last record read in the data source): Timestamp: 2022-11-02 16:37:03.000000 SCN: 7.3280852740 (33345623812) Write Checkpoint #1 GGS Log Trail Current Checkpoint (current write position): Sequence #: 2 RBA: 1478 Timestamp: 2022-11-03 09:16:18.549673 Extract Trail: ./dirdat/ea Seqno Length: 9 Flip Seqno Length: No Trail Type: EXTTRAIL Header: Version = 2 Record Source = A Type = 13 # Input Checkpoints = 1 # Output Checkpoints = 1 Configuration: Data Source = 3 Transaction Integrity = 1 Task Type = 0 Status: Start Time = 2022-11-03 09:15:18 Last Update Time = 2022-11-03 09:16:18 Stop Status = A Last Result = 520 GGSCI (ogg) 51> send exta showtrans Sending SHOWTRANS request to Extract group EXTA ... Extract is currently in recovery mode (reading transactions from trail file). Please try again in a few minutes. GGSCI (ogg) 66> send extract exta status Sending STATUS request to Extract group EXTA ... EXTRACT EXTA (PID 16108) Current status: In recovery[1]: Processing data Current read position: Redo thread #: 1 Sequence #: 257 RBA: 922376 Timestamp: 2022-11-02 16:37:03.000000 SCN: 7.3280852740 (33345623812) Current write position: Sequence #: 3 RBA: 1478 Timestamp: 2022-11-03 09:23:39.027185 Extract Trail: ./dirdat/ea |
另外,自己新建的环境都没有问题,可以实时同步,就客户的环境不能同步!!!
分析过程
1、用以下这些命令查询分析并没找到错误的原因
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 | kill exta START EXTRACT exta BRRESET START EXTRACT exta BRINTERVAL 20M info exta,showch info exta,detail send exta showtrans send extract exta status GGSCI (ogg) 3> send exta showtrans Sending SHOWTRANS request to Extract group exta ... ------------------------------------------------------------ XID: 0.5.29.7834 Items: 0 Extract: ext8 Redo Thread: 1 Start Time: 2022-11-03:10:08:09 SCN: 7.3281664446 (33346435518) Redo Seq: 1707 Redo RBA: 20690864 Status: Running |
这里其实有个很奇怪的现象,就是send exta showtrans查询出来的是数据库未提交的事务,但是gv$transaction视图却查不出来结果,找不到该事务XID:
1 | SELECT * FROM gv$transaction; |
2、数据库的告警日志有如下输出:
1 2 3 | setting IGNORE_UNSUPPORTED_TABLE for table (*) GoldenGate Capture:OGG$CAP_EXT1 setting _FILTER_PARTIAL_ROLLBACK: Setting XOUT_CLIENT_EXISTS to Y for Capture: OGG$CAP_EXT1 |
怀疑是否是这个问题,于是配置:
1 2 | exec DBMS_CAPTURE_ADM.SET_PARAMETER('OGG$CAP_EXTA','IGNORE_UNSUPPORTED_TABLE','-'); exec DBMS_CAPTURE_ADM.SET_PARAMETER('OGG$CAP_EXT1','IGNORE_UNSUPPORTED_TABLE','-'); |
仍然不行。参考:https://docs.oracle.com/database/121/ARPLS/d_cap_a.htm#ARPLS306
3、考虑到之前用的docker环境可以,于是对比一下2个库的差异,发现字符集和redo大小不一样。我之前用的环境是AL32UTF8字符集,而客户环境是ZHS16GBK环境,
1 2 3 4 | export NLS_LANG=AMERICAN_AMERICA.ZHS16GBK export NLS_LANG=AMERICAN_AMERICA.AL32UTF8 setenv(NLS_LANG="AMERICAN_AMERICA.ZHS16GBK") |
仍然不行。
一般来说,字符集不一样,只会导致数据传输过来后是乱码,不会导致进程hang住。
另外,由于好的环境用的是50M的redo日志,而客户环境是300M的redo日志,于是修改redo日志为50M
1 2 3 4 5 6 | SELECT * FROM v$log; alter database add logfile size 52428800; alter database drop logfile group 13; alter system checkpoint; alter system switch logfile; |
还是不行。
4、怀疑是否和rac有关,另外,客户的环境也打了最新的PSU,为了验证这个问题,所以特意新建了2个库,redo为500m,字符集为ZHS16GBK,但是测试都没问题:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 | dbca -silent -createDatabase -templateName General_Purpose.dbc -responseFile NO_VALUE \ -gdbname gbk -sid gbk \ -sysPassword oracle -systemPassword oracle \ -datafileDestination '/u01/app/oracle/oradata' \ -recoveryAreaDestination '/u01/app/oracle/flash_recovery_area' \ -redoLogFileSize 500 \ -storageType FS \ -characterset ZHS16GBK -nationalCharacterSet AL16UTF16 \ -sampleSchema true \ -memoryPercentage 10 \ -databaseType OLTP \ -emConfiguration NONE dbca -silent -createDatabase -templateName General_Purpose.dbc -responseFile NO_VALUE \ -gdbname gbk -sid gbk \ -sysPassword oracle -systemPassword oracle \ -datafileDestination '+DATA' -recoveryAreaDestination 'DATA/' \ -storageType ASM -asmsnmpPassword oracle -diskGroupName 'DATA' \ -characterset ZHS16GBK -nationalCharacterSet AL16UTF16 \ -redoLogFileSize 300 \ -sampleSchema true \ -memoryPercentage 10 \ -databaseType OLTP \ -emConfiguration NONE \ -nodeinfo rac1,rac2 |
那说明是客户数据库本身的问题。难道是客户的表太多导致的吗,客户端表才2万张表左右,其实不多!!!!
5、怀疑是否有特殊的触发器导致。经过查询,没有特殊的触发器!!!
6、使用OGG巡检报告
参考:使用OGG 21.3远程实时互相同步Oracle 11.2.0.4(双主) – 小麦苗DBA宝典 (dbaup.com)
https://www.dbaup.com/ogg-for-oraclejiankangxunjianguanfangjiaoben.html
这里其实有个特殊现象,就是跑脚本的时候,总是卡住,巡检结果根本出不来,于是分析脚本,找到卡住的位置如下:
1 2 3 4 5 6 | prompt prompt ++ TABLES SUPPORT BY GOLDENGATE Integrated Capture ++ prompt Lists tables that can not be supported by OGG (NONE) prompt Lists table that are supported via OGG FETCH (ID KEY) select * from DBA_GOLDENGATE_SUPPORT_MODE where support_mode in ('ID KEY', 'NONE') order by owner,object_name; |
于是拿到数据库中查询视图DBA_GOLDENGATE_SUPPORT_MODE,发现根本不能出结果。
DBA_GOLDENGATE_SUPPORT_MODE displays information about the level of Oracle GoldenGate capture process support for the tables in the database. 捕获进程对数据库中表的支持级别的信息
| Column | Datatype | NULL | Description |
|---|---|---|---|
OWNER | VARCHAR2(128) | Table owner | |
OBJECT_NAME | VARCHAR2(128) | Table name | |
SUPPORT_MODE | VARCHAR2(6) | Capture process support level for the table:FULL - A capture process can capture changes made to all of the columns in the tableID KEY - A capture process can capture changes made to the key columns and any other columns in the table supported by the capture process, except for LOB, LONG, LONG RAW, and XMLType columns.INTERNAL - A capture process cannot capture changes made to any columns in the table because the table is secondary to a user-created table and is updated implicitly when changes are made to the user-created table. Such tables include mapping tables for index-organized tables, storage tables for nested tables, materialized view logs, secondary objects associated with domain indexes, and temporary tables.NONE - A capture process cannot capture changes made to any columns in the table because the table is not supported for replication. |
参考:https://docs.oracle.com/en/database/oracle/oracle-database/19/refrn/DBA_GOLDENGATE_SUPPORT_MODE.html
找到这个视图的定义,发现视图dba_goldengate_support_mode基于视图DBA_XSTREAM_OUT_SUPPORT_MODE:


