Skip to Main Content
IBM Data and AI Ideas Portal for Customers


This portal is to open public enhancement requests against products and services offered by the IBM Data & AI organization. To view all of your ideas submitted to IBM, create and manage groups of Ideas, or create an idea explicitly set to be either visible by all (public) or visible only to you and IBM (private), use the IBM Unified Ideas Portal (https://ideas.ibm.com).


Shape the future of IBM!

We invite you to shape the future of IBM, including product roadmaps, by submitting ideas that matter to you the most. Here's how it works:


Search existing ideas

Start by searching and reviewing ideas and requests to enhance a product or service. Take a look at ideas others have posted, and add a comment, vote, or subscribe to updates on them if they matter to you. If you can't find what you are looking for,


Post your ideas

Post ideas and requests to enhance a product or service. Take a look at ideas others have posted and upvote them if they matter to you,

  1. Post an idea

  2. Upvote ideas that matter most to you

  3. Get feedback from the IBM team to refine your idea


Specific links you will want to bookmark for future use

Welcome to the IBM Ideas Portal (https://www.ibm.com/ideas) - Use this site to find out additional information and details about the IBM Ideas process and statuses.

IBM Unified Ideas Portal (https://ideas.ibm.com) - Use this site to view all of your ideas, create new ideas for any IBM product, or search for ideas across all of IBM.

ideasibm@us.ibm.com - Use this email to suggest enhancements to the Ideas process or request help from IBM for submitting your Ideas.

IBM Employees should enter Ideas at https://ideas.ibm.com


Status Under review
Workspace StreamSets
Created by Guest
Created on Aug 30, 2024

Oracle CDC: Add the ability to mine only the archived redo logs with Primary DB

The customer is testing a 3 pipeline design pattern of Oracle CDC → Snowflake use-case on SDC 5.10, Oracle RAC DB with PDB. They are testing the jobs in a test environment with production like volumes. During the test, they noticed in the logs that the same mining window gets mined multiple times in case the sequence for the online redo logs change. This adds to the Oracle CDC stage lag.

The below logs are from the customer’s SDC where we can see that 10 min window was mined multiple times and it took almost 45 minutes for it to finish.

2024-07-25 11:59:02,292 [user:*9a1e18d0-6fda-11ec-9545-67b2952936e2@b4ab9513-6fda-11ec-9545-fd39a2f1461a] [pipeline:EBS-SF-P1-CVUT1/EBSSFP1__f70c1b40-dcaf-4ef8-8eb3-da61e2316bfb__b4ab9513-6fda-11ec-9545-fd39a2f1461a] [runner:] [thread:ProductionPipelineRunnable-EBSSFP1__f70c1b40-dcaf-4ef8-8eb3-da61e2316bfb__b4ab9513-6fda-11ec-9545-fd39a2f1461a-EBS-SF-P1-CVUT1 - <Oracle CDC Origin Main>::<0769e150-1114-4819-af49-5bfa50b00060> - <Oracle CDC Origin Delver>::<0769e150-1114-4819-af49-5bfa50b00060>::Mine::<1>] [stage:] INFO  OracleCDCOrigin - Oracle CDC Origin fetch tracking: starting records fetching...: from scn 6426657095797 to scn 6426662650129
2024-07-25 12:02:32,977 [user:*9a1e18d0-6fda-11ec-9545-67b2952936e2@b4ab9513-6fda-11ec-9545-fd39a2f1461a] [pipeline:EBS-SF-P1-CVUT1/EBSSFP1__f70c1b40-dcaf-4ef8-8eb3-da61e2316bfb__b4ab9513-6fda-11ec-9545-fd39a2f1461a] [runner:0] [thread:Idle Close Thread] [stage:LocalFS_1] INFO  RecordWriterManager - Writer for /sset/data/bi/bi-cdc/cdc/_tmp_sdc_0.sdc.gz was idle closed, renaming..
2024-07-25 12:04:57,175 [user:*9a1e18d0-6fda-11ec-9545-67b2952936e2@b4ab9513-6fda-11ec-9545-fd39a2f1461a] [pipeline:EBS-SF-P1-CVUT1/EBSSFP1__f70c1b40-dcaf-4ef8-8eb3-da61e2316bfb__b4ab9513-6fda-11ec-9545-fd39a2f1461a] [runner:] [thread:ProductionPipelineRunnable-EBSSFP1__f70c1b40-dcaf-4ef8-8eb3-da61e2316bfb__b4ab9513-6fda-11ec-9545-fd39a2f1461a-EBS-SF-P1-CVUT1 - <Oracle CDC Origin Main>::<0769e150-1114-4819-af49-5bfa50b00060> - <Oracle CDC Origin Delver>::<0769e150-1114-4819-af49-5bfa50b00060>::Mine::<1>] [stage:] INFO  DiskSpaceMonitor - Took 0.544 ms to pollDiskSpace for /sset/data/bi/diskspill/sdc-oracle-cdc-DML-2024.07.25-12.4.57.168-ca865e31-abb9-44b8-91a2-fdff3c264ea9-10899180769530092652
2024-07-25 12:20:47,336 [user:*9a1e18d0-6fda-11ec-9545-67b2952936e2@b4ab9513-6fda-11ec-9545-fd39a2f1461a] [pipeline:EBS-SF-P1-CVUT1/EBSSFP1__f70c1b40-dcaf-4ef8-8eb3-da61e2316bfb__b4ab9513-6fda-11ec-9545-fd39a2f1461a] [runner:] [thread:ProductionPipelineRunnable-EBSSFP1__f70c1b40-dcaf-4ef8-8eb3-da61e2316bfb__b4ab9513-6fda-11ec-9545-fd39a2f1461a-EBS-SF-P1-CVUT1 - <Oracle CDC Origin Main>::<0769e150-1114-4819-af49-5bfa50b00060> - <Oracle CDC Origin Delver>::<0769e150-1114-4819-af49-5bfa50b00060>::Mine::<1>] [stage:] INFO  OracleCDCOrigin - Oracle CDC Origin fetch tracking: finished records fetching from scn 6426657095797 to scn 6426662650129: Time: 1305044 ms - Read: 669823 rows - Read DDL: 669823 rows - Read DML: 0 rows - Skipped: 1 rows - Skipped DDL: 0 rows - Skipped DML: 1 rows - Out of Order: 0 rows - Out of Order DDL: 0 rows - Out of Order DML: 0 rows -
2024-07-25 12:20:47,587 [user:*9a1e18d0-6fda-11ec-9545-67b2952936e2@b4ab9513-6fda-11ec-9545-fd39a2f1461a] [pipeline:EBS-SF-P1-CVUT1/EBSSFP1__f70c1b40-dcaf-4ef8-8eb3-da61e2316bfb__b4ab9513-6fda-11ec-9545-fd39a2f1461a] [runner:] [thread:ProductionPipelineRunnable-EBSSFP1__f70c1b40-dcaf-4ef8-8eb3-da61e2316bfb__b4ab9513-6fda-11ec-9545-fd39a2f1461a-EBS-SF-P1-CVUT1 - <Oracle CDC Origin Main>::<0769e150-1114-4819-af49-5bfa50b00060> - <Oracle CDC Origin Delver>::<0769e150-1114-4819-af49-5bfa50b00060>::Mine::<1>] [stage:] WARN  OracleCDCOrigin - Sequence number for online log {"logGroup":8,"logThread":2,"logSequence":6773,"logBytes":4294967296,"logBlockSize":512,"logMembers":1,"logArchived":true,"logStatus":"INACTIVE","logFirstChange":6426660590828,"logFirstTime":"2024-07-25T10:21:07","logNextChange":6426662692032,"logNextTime":"2024-07-25T10:31:01","logContainer":0,"fileGroup":8,"fileStatus":null,"fileType":"ONLINE","fileMember":"+CVUT1\/CDBUT1\/ONLINELOG\/group_8.1082.1166074789","fileRecovery":false,"fileContainer":"0"} has changed from 6773 to 6778. Online log replaced by Optional[{"logGroup":8,"logThread":2,"logSequence":6778,"logBytes":4294967296,"logBlockSize":512,"logMembers":1,"logArchived":false,"logStatus":"CURRENT","logFirstChange":6426679318129,"logFirstTime":"2024-07-25T12:09:20","logNextChange":9295429630892703743,"logNextTime":null,"logContainer":0,"fileGroup":8,"fileStatus":null,"fileType":"ONLINE","fileMember":"+CVUT1\/CDBUT1\/ONLINELOG\/group_8.1082.1166074789","fileRecovery":false,"fileContainer":"0"}]
2024-07-25 12:20:47,587 [user:*9a1e18d0-6fda-11ec-9545-67b2952936e2@b4ab9513-6fda-11ec-9545-fd39a2f1461a] [pipeline:EBS-SF-P1-CVUT1/EBSSFP1__f70c1b40-dcaf-4ef8-8eb3-da61e2316bfb__b4ab9513-6fda-11ec-9545-fd39a2f1461a] [runner:] [thread:ProductionPipelineRunnable-EBSSFP1__f70c1b40-dcaf-4ef8-8eb3-da61e2316bfb__b4ab9513-6fda-11ec-9545-fd39a2f1461a-EBS-SF-P1-CVUT1 - <Oracle CDC Origin Main>::<0769e150-1114-4819-af49-5bfa50b00060> - <Oracle CDC Origin Delver>::<0769e150-1114-4819-af49-5bfa50b00060>::Mine::<1>] [stage:] WARN  OracleCDCOrigin - Mining again last LogMiner window to ensure no data loss: 6426657095797 to 6426662650129
2024-07-25 12:20:55,820 [user:*9a1e18d0-6fda-11ec-9545-67b2952936e2@b4ab9513-6fda-11ec-9545-fd39a2f1461a] [pipeline:EBS-SF-P1-CVUT1/EBSSFP1__f70c1b40-dcaf-4ef8-8eb3-da61e2316bfb__b4ab9513-6fda-11ec-9545-fd39a2f1461a] [runner:] [thread:ProductionPipelineRunnable-EBSSFP1__f70c1b40-dcaf-4ef8-8eb3-da61e2316bfb__b4ab9513-6fda-11ec-9545-fd39a2f1461a-EBS-SF-P1-CVUT1 - <Oracle CDC Origin Main>::<0769e150-1114-4819-af49-5bfa50b00060> - <Oracle CDC Origin Delver>::<0769e150-1114-4819-af49-5bfa50b00060>::Mine::<1>] [stage:] INFO  OracleCDCOrigin - Oracle CDC Origin fetch tracking: starting records fetching...: from scn 6426657095797 to scn 6426662650129
2024-07-25 12:26:49,354 [user:*9a1e18d0-6fda-11ec-9545-67b2952936e2@b4ab9513-6fda-11ec-9545-fd39a2f1461a] [pipeline:EBS-SF-P1-CVUT1/EBSSFP1__f70c1b40-dcaf-4ef8-8eb3-da61e2316bfb__b4ab9513-6fda-11ec-9545-fd39a2f1461a] [runner:] [thread:ProductionPipelineRunnable-EBSSFP1__f70c1b40-dcaf-4ef8-8eb3-da61e2316bfb__b4ab9513-6fda-11ec-9545-fd39a2f1461a-EBS-SF-P1-CVUT1 - <Oracle CDC Origin Main>::<0769e150-1114-4819-af49-5bfa50b00060> - <Oracle CDC Origin Delver>::<0769e150-1114-4819-af49-5bfa50b00060>::Mine::<1>] [stage:] INFO  DiskSpaceMonitor - Took 0.743 ms to pollDiskSpace for /sset/data/bi/diskspill/sdc-oracle-cdc-DML-2024.07.25-12.26.49.343-291f94ff-4e7b-4a20-b213-690121f30e74-13524047838174331153
2024-07-25 12:43:00,698 [user:*9a1e18d0-6fda-11ec-9545-67b2952936e2@b4ab9513-6fda-11ec-9545-fd39a2f1461a] [pipeline:EBS-SF-P1-CVUT1/EBSSFP1__f70c1b40-dcaf-4ef8-8eb3-da61e2316bfb__b4ab9513-6fda-11ec-9545-fd39a2f1461a] [runner:] [thread:ProductionPipelineRunnable-EBSSFP1__f70c1b40-dcaf-4ef8-8eb3-da61e2316bfb__b4ab9513-6fda-11ec-9545-fd39a2f1461a-EBS-SF-P1-CVUT1 - <Oracle CDC Origin Main>::<0769e150-1114-4819-af49-5bfa50b00060> - <Oracle CDC Origin Delver>::<0769e150-1114-4819-af49-5bfa50b00060>::Mine::<1>] [stage:] INFO  OracleCDCOrigin - Oracle CDC Origin fetch tracking: finished records fetching from scn 6426657095797 to scn 6426662650129: Time: 1324878 ms - Read: 724025 rows - Read DDL: 724025 rows - Read DML: 0 rows - Skipped: 1 rows - Skipped DDL: 0 rows - Skipped DML: 1 rows - Out of Order: 0 rows - Out of Order DDL: 0 rows - Out of Order DML: 0 rows -

Given this, the customer has requested that the Oracle CDC stage be updated to add the ability to mine only the archived redo logs with primary database.

The customer does not prefer using the standby DB option as the latency would be much higher than compared to the primary database.

Please Note: The customer already has the same use-case running successfully in production against a DB which sees 2-3 GB of redo logs being generated per hour (which is one-third of the load compared to their latest use-case). These jobs are running on SDC 5.8.1.

Updates from the customer:

August 29, 2024: The customer mentioned that they will be using standby instance for Oracle CDC in the production. The standby instance may not be always available or in sync with primary instance. One of the examples that the customer shared is when the standby has to be patched. Given this, they would like to have this feature available as a backup when standby instance is not available.

The customer has a new production go-live date as third week of September. By the go-live date, the customer would like to either have the feature ready or an ETA by when this feature can be made available.

August 22, 2024: The customer wanted to share the qlik cdc documentation, section: Limitations when using Oracle LogMiner to access the redo logs. The customer says that this seems true for their setup as they are on Oracle RAC with PDB.

July 26, 2024: CI opened in StreamSets JIRA (https://streamsets.atlassian.net/browse/CI-2137)

Needed By Month