IBM Data and AI Ideas Portal for Customers



Status Delivered
Created by Guest
Created on Sep 19, 2018

Data Rules - Limit the number of exception output records

• Introduce a runtime option for logging a configurable maximum number of detailed violations, based on the design-time configuration of the particular rule:
o -1 is the default and means no maximum (all detailed violations are logged, as long as the rule is configured at design time to output the violations)
o 0 means that no detailed violations are logged, effectively ignoring any design-time configuration in the rule that would output the records with violations
o 100 means that only 100 detailed violations are logged (e.g. probably just the first 100 violations); after that, any further violations are not recorded
• In all cases, the metrics (i.e. percentage passed / failed) are still based on all records; only the output of the failed records themselves is limited
This would give us maximum flexibility: we could get the full performance benefit where we don't need the failed records, and still heavily optimise the scenarios where we do want them (i.e. issue remediation). Even in the latter case, a table with millions of records is unlikely to be actionable, but being able to review the first 100 or 1000 would allow investigation of the root cause and remediation.
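
To make the requested semantics concrete, here is a minimal Python sketch of the -1 / 0 / N behaviour described above. It is not Information Analyzer code: `run_rule`, `RuleResult`, `max_violations`, and `output_failed` are hypothetical names invented for illustration, and "the first N violations" is assumed to mean the first N encountered in input order.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class RuleResult:
    """Hypothetical result container: metrics plus the logged violations."""
    total: int = 0
    failed: int = 0
    logged_violations: List[dict] = field(default_factory=list)

    @property
    def percent_failed(self) -> float:
        # Metrics are always computed over all evaluated records.
        return 100.0 * self.failed / self.total if self.total else 0.0


def run_rule(
    records: Iterable[dict],
    rule: Callable[[dict], bool],
    output_failed: bool = True,   # stands in for the design-time output setting
    max_violations: int = -1,     # stands in for the requested runtime option
) -> RuleResult:
    """Evaluate `rule` on every record, logging at most `max_violations` failures.

    -1 means no maximum, 0 means log nothing, N > 0 means log the first N.
    """
    result = RuleResult()
    for record in records:
        result.total += 1
        if rule(record):
            continue
        result.failed += 1  # counted regardless of the logging limit
        under_limit = (
            max_violations < 0
            or len(result.logged_violations) < max_violations
        )
        if output_failed and under_limit:
            result.logged_violations.append(record)
    return result


# Example data: ages -5..4, so five of the ten records fail "age >= 0".
sample = [{"id": i, "age": i - 5} for i in range(10)]
unlimited = run_rule(sample, lambda r: r["age"] >= 0)
none_logged = run_rule(sample, lambda r: r["age"] >= 0, max_violations=0)
first_two = run_rule(sample, lambda r: r["age"] >= 0, max_violations=2)
```

In this sketch all three runs report the same failure percentage; only the number of records kept in `logged_violations` differs.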

The approach itself would also be very similar to what we can already configure for column analysis in Information Analyzer:
• we can configure the number of "distinct values" recorded from frequency distribution
• we can configure a sampling method: maximum number of records, percentage of overall records, etc
The requested option should be available for both the command-line (IAAdmin) and the REST API (RunTask) ways of running data quality rules.
  • Guest
    Aug 27, 2020

    Shipped with IA version 11.7.1 FP1

  • Guest
    Mar 13, 2019

    It should be a property of the output definition of the data rule, like the existing properties "Output distinct rows only" or "Output type", rather than a workspace or runtime property. That is, next to the field that lets the user choose the output type (no record, records passing the rule, records failing the rule, all records), there should be a single field that lets the rule author set a maximum number of output records, and this value should be stored in the data rule itself.
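
As a rough illustration of this suggestion, the Python sketch below models the limit as one more field of a rule's output definition. The class, field, and method names (`OutputDefinition`, `max_output_records`, `should_write`) are hypothetical and are not taken from the product; `None` is assumed here to mean "no limit".

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class OutputType(Enum):
    # The four output types listed in the comment above.
    NO_RECORDS = "no record"
    PASSING = "records passing the rule"
    FAILING = "records failing the rule"
    ALL = "all records"


@dataclass(frozen=True)
class OutputDefinition:
    """Hypothetical output definition stored with the data rule itself."""
    output_type: OutputType
    distinct_rows_only: bool = False
    max_output_records: Optional[int] = None  # assumption: None means unlimited

    def should_write(self, passed: bool, written_so_far: int) -> bool:
        """Decide whether one evaluated record goes to the output table."""
        if self.output_type is OutputType.NO_RECORDS:
            return False
        if self.output_type is OutputType.PASSING and not passed:
            return False
        if self.output_type is OutputType.FAILING and passed:
            return False
        return (
            self.max_output_records is None
            or written_so_far < self.max_output_records
        )


# A rule author limits the output to the first 100 failing records.
definition = OutputDefinition(OutputType.FAILING, max_output_records=100)
```

Keeping the limit in the definition means every run of the rule applies the same cap without the caller having to pass a runtime option.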