Automated Reference Data Synchronization with Datalake HIVE Tables

Problem Statement: Currently, synchronizing Reference Data with the Datalake, so that it is accessible from Hive tables, is a manual process: data is exported from IKC (IBM Knowledge Catalog) and then manually transferred to the Azure cloud platform. This approach is inefficient and error-prone.

Proposed Solution: Implement an automated data synchronization function that handles both incremental and full data syncs between Reference Data and the Datalake, with the synced data accessible from Hive tables. The function can be invoked via the UI or via an API.

Function Requirements:

Incremental Data Sync:
- CDC Tracking: Utilize Change Data Capture (CDC) mechanisms to track changes in Reference Data.
- Delta Load: Identify delta changes and efficiently load them into the Datalake so that they are accessible from Hive tables.
- Checkpoint Management: Maintain a checkpoint to ensure that only new or modified data is synchronized.
Full Data Sync:
- Data Extraction: Extract the entire Reference Data dataset.
- Data Loading: Load the extracted data into the Datalake so that it is accessible from Hive tables, optimizing the loading process for large datasets where possible.
Error Handling: Implement robust error handling mechanisms to prevent data loss or corruption during the synchronization process.
Security: Ensure data security and access controls during the synchronization process.
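
To make the incremental and full sync requirements above more concrete, here is a minimal in-memory sketch in Python. It is an illustration under stated assumptions, not a proposed implementation: `RefRecord`, `SyncState`, `select_delta` and `sync` are hypothetical names, the change feed is assumed to carry a monotonically increasing `modified_at` marker, and a plain dictionary stands in for the Hive-accessible target table. No IKC or Hive API is used.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class RefRecord:
    """Hypothetical reference data change; all field names are assumptions."""
    key: str
    value: str
    modified_at: int      # assumed monotonically increasing change marker
    deleted: bool = False


@dataclass
class SyncState:
    """Stand-in for the target table plus the persisted checkpoint."""
    table: Dict[str, str] = field(default_factory=dict)
    checkpoint: int = 0


def select_delta(source: Iterable[RefRecord], checkpoint: int) -> List[RefRecord]:
    """Return the changes recorded after the checkpoint, oldest first."""
    changed = (r for r in source if r.modified_at > checkpoint)
    return sorted(changed, key=lambda r: r.modified_at)


def sync(source: List[RefRecord], state: SyncState, full: bool = False) -> int:
    """Run a full or incremental sync and return the number of changes applied."""
    if full:
        batch = sorted(source, key=lambda r: r.modified_at)
        staged: Dict[str, str] = {}          # full sync rebuilds the target
    else:
        batch = select_delta(source, state.checkpoint)
        staged = dict(state.table)           # incremental sync patches a copy

    for rec in batch:
        if rec.deleted:
            staged.pop(rec.key, None)
        else:
            staged[rec.key] = rec.value

    # Commit table and checkpoint together, only after the whole batch applied,
    # so a failure part-way through leaves the previous state intact.
    state.table = staged
    if batch:
        state.checkpoint = max(state.checkpoint, batch[-1].modified_at)
    return len(batch)


state = SyncState()
feed = [RefRecord("US", "United States", 1), RefRecord("DE", "Germany", 2)]
sync(feed, state, full=True)                       # initial full load
feed.append(RefRecord("DE", "Deutschland", 3))     # later update
feed.append(RefRecord("US", "", 4, deleted=True))  # later delete
sync(feed, state)                                  # picks up only the two new changes
```

Staging the changes on a copy and advancing the checkpoint last is one simple way to meet the error handling requirement: a failed run can be retried from the old checkpoint without losing or duplicating changes.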

Benefits of Automation:

Efficiency: Reduce manual effort and time spent on data synchronization.
Accuracy: Minimize the risk of errors during data transfer and updates.
Consistency: Ensure consistent data synchronization, reducing the likelihood of inconsistencies between Reference Data and the Datalake data accessible from Hive tables.
Scalability: Handle synchronization for large datasets and complex scenarios.
Real-time Updates: Enable near real-time updates to the Datalake data accessible from Hive tables, based on changes in Reference Data.

Needed By: Quarter

Michal Szylar (Admin), May 25, 2025

Hi, before we can prioritise this, we need more clarity. For now, we’ll mark it as not under consideration, but we can revisit and update the status once we have more details.

Michal Szylar (Admin), Dec 4, 2024

Hello, could you please confirm that this request is about unidirectional synchronisation from IBM Knowledge Catalog to HIVE? Best regards
