The customer is testing a 3 pipeline design pattern of Oracle CDC → Snowflake use-case on SDC 5.10, Oracle RAC DB with PDB. They are testing the jobs in a test environment with production like volumes. During the test, they noticed in the logs that the same mining window gets mined multiple times in case the sequence for the online redo logs change. This adds to the Oracle CDC stage lag.
The below logs are from the customer’s SDC where we can see that 10 min window was mined multiple times and it took almost 45 minutes for it to finish.
2024-07-25 11:59:02,292 [user:*9a1e18d0-6fda-11ec-9545-67b2952936e2@b4ab9513-6fda-11ec-9545-fd39a2f1461a] [pipeline:EBS-SF-P1-CVUT1/EBSSFP1__f70c1b40-dcaf-4ef8-8eb3-da61e2316bfb__b4ab9513-6fda-11ec-9545-fd39a2f1461a] [runner:] [thread:ProductionPipelineRunnable-EBSSFP1__f70c1b40-dcaf-4ef8-8eb3-da61e2316bfb__b4ab9513-6fda-11ec-9545-fd39a2f1461a-EBS-SF-P1-CVUT1 - <Oracle CDC Origin Main>::<0769e150-1114-4819-af49-5bfa50b00060> - <Oracle CDC Origin Delver>::<0769e150-1114-4819-af49-5bfa50b00060>::Mine::<1>] [stage:] INFO OracleCDCOrigin - Oracle CDC Origin fetch tracking: starting records fetching...: from scn 6426657095797 to scn 6426662650129
2024-07-25 12:02:32,977 [user:*9a1e18d0-6fda-11ec-9545-67b2952936e2@b4ab9513-6fda-11ec-9545-fd39a2f1461a] [pipeline:EBS-SF-P1-CVUT1/EBSSFP1__f70c1b40-dcaf-4ef8-8eb3-da61e2316bfb__b4ab9513-6fda-11ec-9545-fd39a2f1461a] [runner:0] [thread:Idle Close Thread] [stage:LocalFS_1] INFO RecordWriterManager - Writer for /sset/data/bi/bi-cdc/cdc/_tmp_sdc_0.sdc.gz was idle closed, renaming..
2024-07-25 12:04:57,175 [user:*9a1e18d0-6fda-11ec-9545-67b2952936e2@b4ab9513-6fda-11ec-9545-fd39a2f1461a] [pipeline:EBS-SF-P1-CVUT1/EBSSFP1__f70c1b40-dcaf-4ef8-8eb3-da61e2316bfb__b4ab9513-6fda-11ec-9545-fd39a2f1461a] [runner:] [thread:ProductionPipelineRunnable-EBSSFP1__f70c1b40-dcaf-4ef8-8eb3-da61e2316bfb__b4ab9513-6fda-11ec-9545-fd39a2f1461a-EBS-SF-P1-CVUT1 - <Oracle CDC Origin Main>::<0769e150-1114-4819-af49-5bfa50b00060> - <Oracle CDC Origin Delver>::<0769e150-1114-4819-af49-5bfa50b00060>::Mine::<1>] [stage:] INFO DiskSpaceMonitor - Took 0.544 ms to pollDiskSpace for /sset/data/bi/diskspill/sdc-oracle-cdc-DML-2024.07.25-12.4.57.168-ca865e31-abb9-44b8-91a2-fdff3c264ea9-10899180769530092652
2024-07-25 12:20:47,336 [user:*9a1e18d0-6fda-11ec-9545-67b2952936e2@b4ab9513-6fda-11ec-9545-fd39a2f1461a] [pipeline:EBS-SF-P1-CVUT1/EBSSFP1__f70c1b40-dcaf-4ef8-8eb3-da61e2316bfb__b4ab9513-6fda-11ec-9545-fd39a2f1461a] [runner:] [thread:ProductionPipelineRunnable-EBSSFP1__f70c1b40-dcaf-4ef8-8eb3-da61e2316bfb__b4ab9513-6fda-11ec-9545-fd39a2f1461a-EBS-SF-P1-CVUT1 - <Oracle CDC Origin Main>::<0769e150-1114-4819-af49-5bfa50b00060> - <Oracle CDC Origin Delver>::<0769e150-1114-4819-af49-5bfa50b00060>::Mine::<1>] [stage:] INFO OracleCDCOrigin - Oracle CDC Origin fetch tracking: finished records fetching from scn 6426657095797 to scn 6426662650129: Time: 1305044 ms - Read: 669823 rows - Read DDL: 669823 rows - Read DML: 0 rows - Skipped: 1 rows - Skipped DDL: 0 rows - Skipped DML: 1 rows - Out of Order: 0 rows - Out of Order DDL: 0 rows - Out of Order DML: 0 rows -
2024-07-25 12:20:47,587 [user:*9a1e18d0-6fda-11ec-9545-67b2952936e2@b4ab9513-6fda-11ec-9545-fd39a2f1461a] [pipeline:EBS-SF-P1-CVUT1/EBSSFP1__f70c1b40-dcaf-4ef8-8eb3-da61e2316bfb__b4ab9513-6fda-11ec-9545-fd39a2f1461a] [runner:] [thread:ProductionPipelineRunnable-EBSSFP1__f70c1b40-dcaf-4ef8-8eb3-da61e2316bfb__b4ab9513-6fda-11ec-9545-fd39a2f1461a-EBS-SF-P1-CVUT1 - <Oracle CDC Origin Main>::<0769e150-1114-4819-af49-5bfa50b00060> - <Oracle CDC Origin Delver>::<0769e150-1114-4819-af49-5bfa50b00060>::Mine::<1>] [stage:] WARN OracleCDCOrigin - Sequence number for online log {"logGroup":8,"logThread":2,"logSequence":6773,"logBytes":4294967296,"logBlockSize":512,"logMembers":1,"logArchived":true,"logStatus":"INACTIVE","logFirstChange":6426660590828,"logFirstTime":"2024-07-25T10:21:07","logNextChange":6426662692032,"logNextTime":"2024-07-25T10:31:01","logContainer":0,"fileGroup":8,"fileStatus":null,"fileType":"ONLINE","fileMember":"+CVUT1\/CDBUT1\/ONLINELOG\/group_8.1082.1166074789","fileRecovery":false,"fileContainer":"0"} has changed from 6773 to 6778. Online log replaced by Optional[{"logGroup":8,"logThread":2,"logSequence":6778,"logBytes":4294967296,"logBlockSize":512,"logMembers":1,"logArchived":false,"logStatus":"CURRENT","logFirstChange":6426679318129,"logFirstTime":"2024-07-25T12:09:20","logNextChange":9295429630892703743,"logNextTime":null,"logContainer":0,"fileGroup":8,"fileStatus":null,"fileType":"ONLINE","fileMember":"+CVUT1\/CDBUT1\/ONLINELOG\/group_8.1082.1166074789","fileRecovery":false,"fileContainer":"0"}]
2024-07-25 12:20:47,587 [user:*9a1e18d0-6fda-11ec-9545-67b2952936e2@b4ab9513-6fda-11ec-9545-fd39a2f1461a] [pipeline:EBS-SF-P1-CVUT1/EBSSFP1__f70c1b40-dcaf-4ef8-8eb3-da61e2316bfb__b4ab9513-6fda-11ec-9545-fd39a2f1461a] [runner:] [thread:ProductionPipelineRunnable-EBSSFP1__f70c1b40-dcaf-4ef8-8eb3-da61e2316bfb__b4ab9513-6fda-11ec-9545-fd39a2f1461a-EBS-SF-P1-CVUT1 - <Oracle CDC Origin Main>::<0769e150-1114-4819-af49-5bfa50b00060> - <Oracle CDC Origin Delver>::<0769e150-1114-4819-af49-5bfa50b00060>::Mine::<1>] [stage:] WARN OracleCDCOrigin - Mining again last LogMiner window to ensure no data loss: 6426657095797 to 6426662650129
2024-07-25 12:20:55,820 [user:*9a1e18d0-6fda-11ec-9545-67b2952936e2@b4ab9513-6fda-11ec-9545-fd39a2f1461a] [pipeline:EBS-SF-P1-CVUT1/EBSSFP1__f70c1b40-dcaf-4ef8-8eb3-da61e2316bfb__b4ab9513-6fda-11ec-9545-fd39a2f1461a] [runner:] [thread:ProductionPipelineRunnable-EBSSFP1__f70c1b40-dcaf-4ef8-8eb3-da61e2316bfb__b4ab9513-6fda-11ec-9545-fd39a2f1461a-EBS-SF-P1-CVUT1 - <Oracle CDC Origin Main>::<0769e150-1114-4819-af49-5bfa50b00060> - <Oracle CDC Origin Delver>::<0769e150-1114-4819-af49-5bfa50b00060>::Mine::<1>] [stage:] INFO OracleCDCOrigin - Oracle CDC Origin fetch tracking: starting records fetching...: from scn 6426657095797 to scn 6426662650129
2024-07-25 12:26:49,354 [user:*9a1e18d0-6fda-11ec-9545-67b2952936e2@b4ab9513-6fda-11ec-9545-fd39a2f1461a] [pipeline:EBS-SF-P1-CVUT1/EBSSFP1__f70c1b40-dcaf-4ef8-8eb3-da61e2316bfb__b4ab9513-6fda-11ec-9545-fd39a2f1461a] [runner:] [thread:ProductionPipelineRunnable-EBSSFP1__f70c1b40-dcaf-4ef8-8eb3-da61e2316bfb__b4ab9513-6fda-11ec-9545-fd39a2f1461a-EBS-SF-P1-CVUT1 - <Oracle CDC Origin Main>::<0769e150-1114-4819-af49-5bfa50b00060> - <Oracle CDC Origin Delver>::<0769e150-1114-4819-af49-5bfa50b00060>::Mine::<1>] [stage:] INFO DiskSpaceMonitor - Took 0.743 ms to pollDiskSpace for /sset/data/bi/diskspill/sdc-oracle-cdc-DML-2024.07.25-12.26.49.343-291f94ff-4e7b-4a20-b213-690121f30e74-13524047838174331153
2024-07-25 12:43:00,698 [user:*9a1e18d0-6fda-11ec-9545-67b2952936e2@b4ab9513-6fda-11ec-9545-fd39a2f1461a] [pipeline:EBS-SF-P1-CVUT1/EBSSFP1__f70c1b40-dcaf-4ef8-8eb3-da61e2316bfb__b4ab9513-6fda-11ec-9545-fd39a2f1461a] [runner:] [thread:ProductionPipelineRunnable-EBSSFP1__f70c1b40-dcaf-4ef8-8eb3-da61e2316bfb__b4ab9513-6fda-11ec-9545-fd39a2f1461a-EBS-SF-P1-CVUT1 - <Oracle CDC Origin Main>::<0769e150-1114-4819-af49-5bfa50b00060> - <Oracle CDC Origin Delver>::<0769e150-1114-4819-af49-5bfa50b00060>::Mine::<1>] [stage:] INFO OracleCDCOrigin - Oracle CDC Origin fetch tracking: finished records fetching from scn 6426657095797 to scn 6426662650129: Time: 1324878 ms - Read: 724025 rows - Read DDL: 724025 rows - Read DML: 0 rows - Skipped: 1 rows - Skipped DDL: 0 rows - Skipped DML: 1 rows - Out of Order: 0 rows - Out of Order DDL: 0 rows - Out of Order DML: 0 rows -
Given this, the customer has requested that the Oracle CDC stage be updated to add the ability to mine only the archived redo logs with primary database.
The customer does not prefer using the standby DB option as the latency would be much higher than compared to the primary database.
Please Note: The customer already has the same use-case running successfully in production against a DB which sees 2-3 GB of redo logs being generated per hour (which is one-third of the load compared to their latest use-case). These jobs are running on SDC 5.8.1.
Updates from the customer:
August 29, 2024: The customer mentioned that they will be using standby instance for Oracle CDC in the production. The standby instance may not be always available or in sync with primary instance. One of the examples that the customer shared is when the standby has to be patched. Given this, they would like to have this feature available as a backup when standby instance is not available.
The customer has a new production go-live date as third week of September. By the go-live date, the customer would like to either have the feature ready or an ETA by when this feature can be made available.
August 22, 2024: The customer wanted to share the qlik cdc documentation, section: Limitations when using Oracle LogMiner to access the redo logs. The customer says that this seems true for their setup as they are on Oracle RAC with PDB.
July 26, 2024: CI opened in StreamSets JIRA (https://streamsets.atlassian.net/browse/CI-2137)