IBM Data and AI Ideas Portal for Customers



Status: Needs more information
Workspace: Knowledge Catalog
Created by: Guest
Created on: Sep 9, 2024

Automated Reference Data Synchronization with Datalake HIVE Tables

Problem Statement: Currently, synchronizing Reference Data with the Datalake, so that it is accessible from Hive tables, is a manual process: the data is exported from IBM Knowledge Catalog (IKC) and then transferred by hand to the Azure cloud platform. This approach is inefficient and error-prone.

Proposed Solution: Implement an automated synchronization function that handles both incremental and full data syncs between Reference Data and the Datalake, with the synchronized data accessible from Hive tables. The function can be invoked from the UI or through an API.

Function Requirements:

  • Incremental Data Sync:

    • CDC Tracking: Utilize Change Data Capture (CDC) mechanisms to track changes in Reference Data.

    • Delta Load: Identify delta changes and efficiently load them into the Datalake so that they are accessible from Hive tables.

    • Checkpoint Management: Maintain a checkpoint to ensure that only new or modified data is synchronized.

  • Full Data Sync:

    • Data Extraction: Extract the entire Reference Data dataset.

    • Data Loading: Load the extracted data into the Datalake so that it is accessible from Hive tables, potentially optimizing the loading process for large datasets.

  • Error Handling: Implement robust error handling mechanisms to prevent data loss or corruption during the synchronization process.

  • Security: Ensure data security and access controls during the synchronization process.
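
To make the incremental and full sync requirements above concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not an IBM Knowledge Catalog or Hive API: `RefValue`, `InMemoryTarget`, and `sync` are hypothetical names, the target is an in-memory stand-in for a Hive-accessible table, the sync is assumed to run one way from Reference Data to the target, and change detection polls a last-modified timestamp as a simplified substitute for a real CDC mechanism.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass(frozen=True)
class RefValue:
    """One reference data value (hypothetical shape, not an IKC type)."""
    code: str
    value: str
    modified_at: datetime
    deleted: bool = False


class InMemoryTarget:
    """Stand-in for the Hive-accessible table (assumption for illustration)."""

    def __init__(self) -> None:
        self.rows: Dict[str, str] = {}

    def upsert(self, rows: List[RefValue]) -> None:
        # Apply a delta: remove deleted codes, insert or update the rest.
        for r in rows:
            if r.deleted:
                self.rows.pop(r.code, None)
            else:
                self.rows[r.code] = r.value

    def replace_all(self, rows: List[RefValue]) -> None:
        # Full load: rebuild the table from the complete extract.
        self.rows = {r.code: r.value for r in rows if not r.deleted}


def sync(source: List[RefValue], target: InMemoryTarget,
         checkpoint: Optional[datetime]) -> Optional[datetime]:
    """Run a full sync when checkpoint is None, otherwise an incremental sync.

    Returns the new checkpoint. If loading raises, the exception propagates
    and the caller keeps the old checkpoint, so the delta is retried later.
    """
    if checkpoint is None:
        changed = list(source)
        target.replace_all(changed)
    else:
        changed = [r for r in source if r.modified_at > checkpoint]
        target.upsert(changed)
    if not changed:
        return checkpoint
    # Advance the checkpoint only after the load has succeeded.
    return max(r.modified_at for r in changed)


t0 = datetime(2024, 9, 1, tzinfo=timezone.utc)
t1 = datetime(2024, 9, 5, tzinfo=timezone.utc)
source = [RefValue("US", "United States", t0), RefValue("PL", "Poland", t0)]
target = InMemoryTarget()

checkpoint = sync(source, target, None)        # full sync
source.append(RefValue("DE", "Germany", t1))   # a later change in the source
checkpoint = sync(source, target, checkpoint)  # incremental sync loads only "DE"
print(sorted(target.rows), checkpoint)
```

In this sketch the checkpoint is returned to the caller rather than stored by the function, so a failed load leaves the previous checkpoint in place and the same delta is picked up on the next run. A real implementation would persist the checkpoint durably and apply access controls on both ends.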

Benefits of Automation:

  • Efficiency: Reduce manual effort and time spent on data synchronization.

  • Accuracy: Minimize the risk of errors during data transfer and updates.

  • Consistency: Ensure consistent data synchronization, reducing the likelihood of inconsistencies between Reference Data and the Datalake tables accessible from Hive.

  • Scalability: Handle synchronization for large datasets and complex scenarios.

  • Real-time Updates: Enable near real-time updates to the Datalake data accessible from Hive tables, driven by changes in Reference Data.

Needed By: Quarter
  • Admin
    Michal Szylar
    Dec 4, 2024

    Hello, could you please confirm that this request is about unidirectional synchronisation from IBM Knowledge Catalog to Hive? Best regards