Provide a means to limit the number of retries after a failure to establish a connection

This idea relates to a real-life scenario we encountered in production. We raised a case, but we were advised that CDC is working as designed and that we should raise an AHA idea.

We currently have a number of subscriptions defined and active across several of our production zSeries sysplexes. The subscriptions are marked as persistent, and this works well for us. However, when a subscription encounters a problem re-establishing its connection to the target (in our case, CDC for Kafka on Linux), CDC queues retries with no limit on their number.

CDC issued warning messages CHC6453W and CHC9694W and queued a restart attempt after n minutes (in our case, 5). However, the number of retries is unlimited, and nothing is written to the CDC JESMSGLG; the messages go only to CHCPRINT.

Regardless of the root cause (which in our case required a restart of the target instance), we expect CDC on the source side not to queue an unlimited number of retries without, at some point, escalating to a hard error and writing a message to JESMSGLG.

We were advised there is no such capability for subscriptions marked persistent.

We referred to the handling of decompression errors, where subscriptions marked as persistent can be given a finite number of retries following a log read error, based on a setting such as ONDECOMPRESSIONERROR=(60,STOP).

We accept that this is not quite the same as a problem re-establishing a connection to a target instance. Nonetheless, we feel CDC on the source side should not initiate an unlimited number of retries for any issue without, at some point, writing an error message to JESMSGLG and giving up. This would let customers code a message rule for automation that alerts the relevant support team to review the problem and take the required action.
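To make the request concrete, here is a minimal sketch of the kind of bounded retry we have in mind, modelled loosely on the ONDECOMPRESSIONERROR=(60,STOP) pattern. It is purely illustrative and assumes nothing about CDC internals: `RetryPolicy`, `reconnect_with_limit`, and the message text are hypothetical names, not CDC parameters, APIs, or message IDs, and the `console` list simply stands in for JESMSGLG.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RetryPolicy:
    """Hypothetical bounded-retry policy (not a real CDC setting)."""
    max_attempts: int         # finite limit on connection attempts
    interval_s: float         # wait between attempts, e.g. 300.0 for 5 minutes
    on_exhausted: str = "STOP"  # action once the limit is reached


def reconnect_with_limit(
    connect: Callable[[], bool],
    policy: RetryPolicy,
    console: List[str],
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try to connect up to max_attempts times; escalate if every attempt fails."""
    for attempt in range(1, policy.max_attempts + 1):
        if connect():
            return True  # connection re-established, nothing to report
        if attempt < policy.max_attempts:
            sleep(policy.interval_s)  # wait before the next queued attempt
    # Limit reached: write one hard-error line to the (simulated) job log
    # so an automation message rule can alert the support team.
    console.append(
        f"HYPOTHETICAL ERROR: target unreachable after "
        f"{policy.max_attempts} attempts; action={policy.on_exhausted}"
    )
    return False
```

The essential part is the last step: once the limit is reached, the loop stops and a single, automation-friendly message is written, instead of retries continuing indefinitely with output only to CHCPRINT.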

As it stands, there is an exposure that can leave replication down for many hours with no notification.

We are developing health check scripts as a backup to automated alerting on specific CHC messages written to JESMSGLG, but we would greatly appreciate a correction to the current logic of unlimited retries.

Thanks,

Howard

Needed By

Yesterday (Let's go already!)

Guest

Aug 26, 2021

Thanks for raising this IDEA. We have reviewed the suggestion and agree that this is a valid requirement. It is currently in development and targeted for the Q3 2021 release.
