This portal is to open public enhancement requests against products and services offered by the IBM Data Platform organization. To view all of your ideas submitted to IBM, create and manage groups of Ideas, or create an idea explicitly set to be either visible by all (public) or visible only to you and IBM (private), use the IBM Unified Ideas Portal (https://ideas.ibm.com).
Hi Satish,
a small correction: the expected namespace value pattern is "file://(.+)", meaning you need to append something after the slashes. For the test, "file://localhost" would work, for example.
Best, Jakub
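As a quick way to check namespace values before scanning, the pattern quoted above can be tried out locally. This is only a sketch: the function name is hypothetical, and whether Manta applies the pattern to the whole namespace string (as `fullmatch` does here) is an assumption.

```python
import re

# Pattern as quoted in the comment above. Matching the whole string is an
# assumption about how the pattern is applied, not confirmed behaviour.
NAMESPACE_PATTERN = re.compile(r"file://(.+)")


def is_file_namespace(namespace: str) -> bool:
    """Return True if the namespace has at least one character after 'file://'."""
    return NAMESPACE_PATTERN.fullmatch(namespace) is not None


for candidate in ("file", "file://", "file://localhost"):
    print(candidate, is_file_namespace(candidate))
```

Under this assumption, only the last candidate passes, because "(.+)" needs at least one character after the slashes.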
Hi Satish,
to follow up on today's meeting: I recommend trying out how the solution behaves when we interpret the Datasets as files (case #2 described above). You can do this test on a very small sample of several OpenLineage events, and if you are satisfied with the outcomes, we can discuss next steps.
1. In your OpenLineage payloads, change the namespace of the inputs/outputs from "file" to "file://".
2. Scan the OpenLineage events with Manta (please make sure this isn't mixed with what you scanned before, so that you can observe the "new" behaviour without too much noise).
3. Observe that the Datasets are now represented as files in Manta, under the Filesystem Resource.
4. Create a Virtual Data Source in Alation.
5. Map the Filesystem to the Alation VDS as documented here; use "true" for the "create assets" option.
6. Run the export and upload to Alation, and observe the lineage in Alation.
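The namespace change in the first step could be scripted for a small sample of events, roughly as follows. This is a sketch under assumptions: the function name, the sample event, the job name, and the dataset paths are hypothetical, and the replacement value "file://localhost" is chosen only because the pattern "file://(.+)" mentioned elsewhere in this thread requires something after the slashes.

```python
import copy
import json


def rewrite_file_namespaces(event: dict, new_namespace: str = "file://localhost") -> dict:
    """Return a copy of an OpenLineage event with 'file' dataset namespaces replaced.

    Only the namespaces of inputs/outputs are touched; the job namespace is left as-is.
    """
    updated = copy.deepcopy(event)
    for key in ("inputs", "outputs"):
        for dataset in updated.get(key, []):
            if dataset.get("namespace") == "file":
                dataset["namespace"] = new_namespace
    return updated


# Hypothetical, heavily trimmed event used only for illustration.
sample_event = {
    "eventType": "COMPLETE",
    "job": {"namespace": "COMN_PKG", "name": "example_job"},
    "inputs": [{"namespace": "file", "name": "/opt/mdp/td/spark-warehouse/CORP_CLDR"}],
    "outputs": [{"namespace": "file", "name": "/opt/mdp/td/spark-warehouse/EXAMPLE_OUT"}],
}

print(json.dumps(rewrite_file_namespaces(sample_event), indent=2))
```

Working on a deep copy keeps the original payloads untouched, which makes it easier to compare the old and new behaviour side by side.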
Best,
Jakub
Hi Satish,
there is additional complexity I was not aware of last week. For context:
OpenLineage Data Sets can be represented in Manta in 3 different ways, depending on whether they are "mapped":
1. They are mapped to assets (tables, views) under other scanned technologies. This case is already supported: the export works wherever we support the export of that technology, and we can map the lineage to assets created by Alation connectors.
2. They are represented as files (if they have the "file" namespace prefix). This case is already supported: it is possible to export files into a Filesystem or VDS in Alation (however, it is unlikely that we will be able to match the assets created by the Alation OpenLineage connector in this case; I'm still exploring this).
3. They are represented as generic Data Sets as a fallback; we cannot export these at the moment.
Hierarchy in Manta (how your events are interpreted now):
OpenLineage namespace
Job
Data Set (aka "write task")
ColumnFlow
"Datasets" (Job)
Data Set (actual Data Set)
ColumnFlow
Together with engineering, I verified that the example payloads you shared are incorrectly interpreted as the 3rd case instead of the 2nd case. We should be able to fix that quickly if you confirm this is a good direction. As mentioned above, though, we would probably not be able to match the assets created by Alation in this case; you would need to use Manta to create the file Assets in Alation (details below).
Expected hierarchy for files created in Alation by Manta:
- <filesystem_id/virtual_datasource_id>/opt>mdp>td>spark-warehouse>CORP_CLDR
Hierarchy in Alation created by the custom Alation connector for OpenLineage
- <datasource_id>/COMN_PKG(OpenLineage namespace of the Job as schema)>CORP_CLDR
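To make the mismatch between the two hierarchies concrete, here is a small sketch that builds both paths from the same dataset. The function names are hypothetical, and the dataset name "/opt/mdp/td/spark-warehouse/CORP_CLDR" is an assumption inferred from the path segments shown above; the ">" and "/" separators simply mirror the notation used in this comment.

```python
def split_segments(dataset_name: str) -> list:
    """Split a path-like dataset name into its non-empty segments."""
    return [segment for segment in dataset_name.split("/") if segment]


def manta_file_path(root_id: str, dataset_name: str) -> str:
    """Path of a file asset created by Manta: every folder becomes a level."""
    return root_id + "/" + ">".join(split_segments(dataset_name))


def connector_asset_path(datasource_id: str, job_namespace: str, dataset_name: str) -> str:
    """Path of an asset created by the custom connector: job namespace as schema, file name as asset."""
    return datasource_id + "/" + job_namespace + ">" + split_segments(dataset_name)[-1]


name = "/opt/mdp/td/spark-warehouse/CORP_CLDR"  # assumed dataset name
print(manta_file_path("<virtual_datasource_id>", name))
print(connector_asset_path("<datasource_id>", "COMN_PKG", name))
```

The two results share only the last segment, which is why assets created one way cannot simply be matched against assets created the other way.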
I would also like to confirm whether the inputs and outputs have the "file" namespace in all the cases that are now problematic.
Thank you, Jakub
Hi Satish,
documenting what we reviewed in today's meeting.
- The custom OCF connector creates assets for individual files (represented as inputs/outputs in the OpenLineage payloads) and their columns
- Unclear what Alation data source type is used - please confirm
- The Schema within the Alation data source corresponds to the OpenLineage namespace of the Job
- The Asset name is the name of the File (name field on the input/output in the OpenLineage event). As the name in the OpenLineage event contains a full path to the file, only the last segment of that name needs to be parsed and used as the name of the assets in Alation.
- The column name corresponds to the column name in the OpenLineage event (documented in the schema facet on the input/output)
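The naming convention described in the statements above can be sketched as follows, pending your verification. The function names and the sample dataset (including its column names) are hypothetical; the `facets.schema.fields[].name` layout follows the OpenLineage schema dataset facet.

```python
def alation_asset_name(dataset_name: str) -> str:
    """Use only the last path segment of the OpenLineage dataset name."""
    return dataset_name.rstrip("/").rsplit("/", 1)[-1]


def column_names(dataset: dict) -> list:
    """Read column names from the schema facet of an input/output, if present."""
    fields = dataset.get("facets", {}).get("schema", {}).get("fields", [])
    return [field["name"] for field in fields]


def describe_asset(job_namespace: str, dataset: dict) -> dict:
    """Summarise how one input/output would map to a schema, asset, and columns."""
    return {
        "schema": job_namespace,  # OpenLineage namespace of the Job
        "asset": alation_asset_name(dataset["name"]),
        "columns": column_names(dataset),
    }


# Hypothetical input dataset; the column names are made up for illustration.
sample_dataset = {
    "namespace": "file",
    "name": "/opt/mdp/td/spark-warehouse/CORP_CLDR",
    "facets": {"schema": {"fields": [{"name": "ID", "type": "int"}, {"name": "NAME", "type": "string"}]}},
}

print(describe_asset("COMN_PKG", sample_dataset))
```

If the actual convention differs, for example if more than the last path segment is kept, only `alation_asset_name` would need to change.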
I would like to ask you to:
- verify the statements above
- provide the type of the Alation data source used
- provide information on whether this naming convention is Dell-specific (and potentially subject to change) or not.
Thanks, Jakub
Hi Satish,
Thank you for creating the Idea. The high-level approach we discussed on the call was for Manta to have a configurable behaviour with two options:
1. Manta would create assets in an Alation virtual data source for OpenLineage datasets (and export the lineage for them). (This is only relevant for OpenLineage datasets that are not represented as tables under one of the technologies Manta can scan; for those, the solution would not change.)
2. Manta would only export the lineage and reference assets created by the Alation custom connector.
This is very similar to what we currently support for SAS Tables and Views - you can review the documentation for that functionality here to verify it is in line with your expectations - https://www.ibm.com/docs/en/manta-data-lineage?topic=mfea-manta-flow-alation-automatic-data-lineage-alation-resource-mapping#sas-resource-mapping-config.
Thank you for the examples sent over email; however, I cannot access the links. Would it be possible to attach the example OpenLineage payloads to this Idea? Together with that, could you please provide the list of Assets, their names, and IDs that the Alation custom connector creates? (For option #2 we need to match the IDs of the assets for the stitching to work.)
Thank you, Jakub