IBM Data and AI Ideas Portal for Customers

Status Future consideration
Workspace Connectivity
Created by Guest
Created on Aug 16, 2021

In DataStage, wildcards in the Azure Data Lake Storage connector are only allowed in the filename, not in the filepath

With the Azure Data Lake Storage connector, DataStage can read multiple files at once by specifying wildcards. But these wildcards are only allowed in the filename, not in the filepath. This means that all the files we want to read have to be in the same directory.

In a data lake, files are often distributed over folders and subfolders to improve performance, either by limiting the number of files to be read or by allowing parallel processing.

As an example, let’s assume we are measuring the air quality at hundreds of measurement points. In the data lake, the folder structure and filenames look like this: /gold/air/<measurement point>/<year>/<pollutant>/<month><day>.parquet

Each directory then holds 365 files (366 in leap years) with the measurements of one pollutant in one year at one measurement point.

Some folders:

  • /gold/air/Antwerp/2019/NH3

  • /gold/air/Antwerp/2019/CO2

  • /gold/air/Antwerp/2020/NH3

  • /gold/air/Antwerp/2020/CO2

  • /gold/air/Ghent/2019/NH3

  • /gold/air/Ghent/2019/CO2

  • /gold/air/Ghent/2020/NH3

  • /gold/air/Ghent/2020/CO2

Depending on the question asked, I will select the files based on the measurement point, the year, the pollutant, the month and/or the day of the month.

To get the ammonia (NH3) concentration values for July of all available years, I would use the wildcard “/gold/air/*/*/NH3/07*.parquet”, and for all pollutants in Antwerp in 2019, “/gold/air/Antwerp/2019/*/*.parquet”.

Fast and easy.
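To make the requested matching behaviour concrete, here is a minimal Python sketch of wildcards applied per path segment. It is only an illustration of the semantics I have in mind, not how the connector works or should be implemented. The helper name `matches` and the sample file list are made up for this example, and I am assuming that month and day are zero-padded to two digits each.

```python
from fnmatch import fnmatchcase


def matches(pattern: str, path: str) -> bool:
    """Return True if each path segment matches the pattern segment at the same depth.

    Matching is done segment by segment, so a "*" never crosses a "/".
    """
    pattern_parts = pattern.strip("/").split("/")
    path_parts = path.strip("/").split("/")
    if len(pattern_parts) != len(path_parts):
        return False
    return all(fnmatchcase(part, pat) for pat, part in zip(pattern_parts, path_parts))


# Hypothetical sample of files following the folder layout described above.
files = [
    "/gold/air/Antwerp/2019/NH3/0701.parquet",
    "/gold/air/Antwerp/2019/CO2/0315.parquet",
    "/gold/air/Antwerp/2020/NH3/0702.parquet",
    "/gold/air/Ghent/2019/NH3/0131.parquet",
    "/gold/air/Ghent/2020/CO2/0704.parquet",
]

# Ammonia in July, all measurement points, all years.
july_nh3 = [f for f in files if matches("/gold/air/*/*/NH3/07*.parquet", f)]

# All pollutants in Antwerp in 2019.
antwerp_2019 = [f for f in files if matches("/gold/air/Antwerp/2019/*/*.parquet", f)]
```

With this sample, the first pattern selects the two Antwerp NH3 files from July, and the second selects the two Antwerp files from 2019.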

To achieve the same result with wildcards allowed only in the filename, we would have to put a huge number of files (more than 1,000,000) in a single directory.

Needed By Quarter