Provide a means to limit the number of retries when a connection cannot be established
This idea relates to a real-life scenario we encountered in production. We raised a case but were advised that CDC is working as designed and that we should raise an AHA idea.
Currently we have a number of subscriptions defined and active across several of our production z-Series sysplexes. The subscriptions are marked as persistent, and this works well for us. However, when a subscription encounters an issue while attempting to reestablish its connection to the target (in our case CDC Kafka on Linux), CDC queues retries with no limit on the number of retries.
CDC issued warning messages CHC6453W/CHC9694W and queued a restart attempt after n minutes (in our case 5). However, the number of retries is unlimited, and nothing is written to the CDC JESMSGLG, only to CHCPRINT.
Regardless of the root cause (which in our case required a restart of the target instance), we expect CDC on the source side not to queue an unlimited number of retries. At some point it should escalate to a hard error and write a message to JESMSGLG.
We were advised there is no such capability for subscriptions marked persistent.
We referred to the handling of decompression errors, where there is a capability for subscriptions marked as persistent to have a finite number of retries following a log read error, based on a setting such as ONDECOMPRESSIONERROR=(60,STOP).
We accept this is not quite the same as an issue attempting to reestablish a connection to a target instance. Nonetheless, we feel CDC on the source side should not initiate an unlimited number of retries for any issue without, at some point, issuing an error message to JESMSGLG and giving up. This would permit customers to code a message rule for automation to alert the relevant support team to review and take the required actions.
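To make the requested behaviour concrete, here is a minimal sketch of a bounded retry loop that escalates from warnings to a single hard error once a limit is reached. It is purely illustrative and is not CDC code: the function name `reconnect_with_limit`, the `max_retries` and `interval_seconds` parameters, and the message texts are all hypothetical, chosen only to mirror the "finite retries, then stop" idea described above.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RetryOutcome:
    connected: bool                 # True if a connection attempt succeeded
    attempts: int                   # number of attempts actually made
    messages: List[str] = field(default_factory=list)  # warnings and final error


def reconnect_with_limit(
    connect: Callable[[], bool],
    max_retries: int,
    interval_seconds: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> RetryOutcome:
    """Try to connect at most max_retries times, then give up with an error.

    `connect` is a hypothetical callable that returns True on success.
    """
    messages: List[str] = []
    for attempt in range(1, max_retries + 1):
        if connect():
            return RetryOutcome(True, attempt, messages)
        # Each failed attempt produces a warning, as happens today.
        messages.append(
            f"WARNING: connection attempt {attempt} of {max_retries} failed"
        )
        if attempt < max_retries:
            sleep(interval_seconds)
    # The requested change: a hard error once the limit is exhausted,
    # which an automation message rule could then act on.
    messages.append(f"ERROR: giving up after {max_retries} failed attempts")
    return RetryOutcome(False, max_retries, messages)
```

In this sketch the final `ERROR` line stands in for the message we would like to see written to JESMSGLG, so that one automation rule is enough to raise an alert.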
As it stands, there is an exposure that can result in replication being down for many hours with no notification.
We are progressing health check scripts as a backup to automated alerting on specific CHC messages written to JESMSGLG, but would greatly appreciate a correction to the current logic of unlimited retries.
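For illustration, a health check of this kind could be as simple as counting the retry warnings in a copy of the CHCPRINT output and raising an alert above a threshold. The sketch below assumes only that each message ID appears as a whole word somewhere in a log line. The function names, the threshold, and the way the log text is obtained are hypothetical and would need adapting to the actual CHCPRINT layout.

```python
import re
from typing import Dict, Iterable, Sequence

# Message IDs observed in our scenario; extend as needed.
WATCHED_IDS: Sequence[str] = ("CHC6453W", "CHC9694W")


def count_warnings(
    lines: Iterable[str], watched: Sequence[str] = WATCHED_IDS
) -> Dict[str, int]:
    """Count how often each watched message ID appears in the given lines."""
    pattern = re.compile(r"\b(" + "|".join(re.escape(m) for m in watched) + r")\b")
    counts = {msg_id: 0 for msg_id in watched}
    for line in lines:
        for match in pattern.findall(line):
            counts[match] += 1
    return counts


def needs_alert(lines: Iterable[str], threshold: int = 3) -> bool:
    """Return True if any watched message ID occurs at least `threshold` times."""
    return any(n >= threshold for n in count_warnings(lines).values())
```

A scheduled job could call `needs_alert` on the most recent log lines and notify the support team when it returns `True`. This remains a workaround, since it depends on polling rather than on a definitive error message from CDC.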