IBM Data and AI Ideas Portal for Customers

Status Future consideration
Created by Guest
Created on Oct 21, 2022

File inclusion and file exclusion filtering in crawler

The Dutch Ministry of Defense uses Watson Discovery to archive text documents and to analyze the documents in those archives. They have experience with Watson Explorer Content Analytics and are migrating to Watson Discovery. They are facing serious issues with file system crawling and with the inflexibility of the Watson Discovery file system crawler.

The Ministry of Defense's analytics pipeline relies heavily on the filtering options available in WEX-AC (the include and exclude filtering). What they want in Watson Discovery is the ability to provide a file extension inclusion list (see 2_Crawler_inclusion_filter), just as they know it from the Watson Explorer Content Analytics file system crawler. In essence, the inclusion list directs the crawler to crawl only files with a pre-defined set of extensions, while leaving the rest of the data it may encounter untouched. That would enable them to point the crawler at their data sources without having to clean them prior to indexing.

The reality with WD is different. They have to do tedious, time-consuming and labour-intensive cleaning of, and/or operations on, the data sources before they can ingest data from them. We've tried to point the WD file system crawler at these data sources, but we end up in 'black holes': a) directories with file types that we don't want to index, and b) time-consuming operations to go through folders and zip files containing non-supported file types. And with the amounts of data the Defense Ministry wants to analyze, the system becomes very unstable and unpredictable.

The bad news is that, because of the file system crawling issues experienced, the Ministry of Defense is contemplating migrating back to WEX-AC.
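To make the request concrete, here is a minimal sketch of the kind of extension-based include and exclude filtering we have in mind. It is not the Watson Discovery or WEX-AC implementation, and it uses no IBM API; the names `normalize`, `should_crawl` and `crawl` are hypothetical, and the assumption that an exclusion wins over an inclusion is ours.

```python
import os
from pathlib import Path
from typing import Iterable, Iterator, Optional, Set


def normalize(extensions: Iterable[str]) -> Set[str]:
    """Lower-case each extension and make sure it starts with a dot."""
    return {e.lower() if e.startswith(".") else "." + e.lower() for e in extensions}


def should_crawl(path: Path, include: Optional[Set[str]], exclude: Set[str]) -> bool:
    """Decide whether a file is handed to the indexer.

    Assumption: an excluded extension always wins; if no inclusion
    list is given, every non-excluded file is accepted.
    """
    ext = path.suffix.lower()
    if ext in exclude:
        return False
    return include is None or ext in include


def crawl(
    root: str,
    include: Optional[Iterable[str]] = None,
    exclude: Iterable[str] = (),
) -> Iterator[Path]:
    """Walk `root` and yield only the files that pass the filters.

    Files that do not match are never opened, so unwanted file types
    cost a directory listing and nothing more.
    """
    inc = normalize(include) if include is not None else None
    exc = normalize(exclude)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            candidate = Path(dirpath) / name
            if should_crawl(candidate, inc, exc):
                yield candidate
```

With a filter like this, a call such as `crawl("/data/archive", include=["pdf", "docx", "txt"], exclude=["zip"])` (the path is only an example) would skip zip files and unsupported types without any prior cleaning of the data source.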
Needed By Yesterday (Let's go already!)