Enhancing DataStage Monitoring and Debugging Capabilities

A critical part of monitoring, tuning, or debugging individual jobs or an entire DataStage project is understanding which database activity is triggered by which stage of which job. Currently, DataStage offers no way to view, at the project, job, or stage level, how many connections it has opened against a given database system, or the current status of each connection from the DataStage perspective (for example waiting for a database response, processing, waiting for input data, or idle).

Databases typically maintain a unique identifier for each session. As a first step, we propose logging this identifier to the DataStage job log at the start and end of each session. This would provide a clear picture of when each connection is initiated and terminated, allowing for more effective monitoring and debugging.
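
To illustrate what such an identifier looks like, here is a minimal sketch that takes Oracle as an example. It assumes the values are read from inside the session that DataStage opened, using the built-in SYS_CONTEXT function. Which of these values DataStage would actually log is an open design question, not something this idea prescribes.

```sql
-- Oracle: identifiers of the current session that could be written to the job log
SELECT SYS_CONTEXT('USERENV', 'SID')               AS sid,               -- session ID
       SYS_CONTEXT('USERENV', 'SESSIONID')         AS audit_session_id,  -- auditing session ID
       SYS_CONTEXT('USERENV', 'CLIENT_IDENTIFIER') AS client_identifier  -- NULL unless set by the client
FROM   dual;
```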

Furthermore, it would be beneficial to log the total waiting time from a DataStage perspective at the end of each session. This information would provide valuable insights into potential bottlenecks or inefficiencies in the system, enabling optimizations and improvements to be made. This enhancement would greatly improve the transparency and manageability of DataStage projects, leading to more efficient and reliable data processing operations.

Needed By: Quarter

Guest

May 10, 2024

Oracle allows a client session identifier (CLIENT_IDENTIFIER) to be set at the start of a session. Currently, one can set the identifier explicitly in the conductor (before-sql) and player processes (before-sql (node)) like this:
call dbms_session.set_identifier('#DSProjectName#_#DSJobName#.#DSJobInvocationId#_#DSStageName#');
This requires manual setup in each stage, though. An option to set it automatically would be beneficial. With the identifier set, one can trace the execution from within the database system.
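
As a rough sketch of that tracing, a DBA could look up the tagged sessions in Oracle's V$SESSION view. The prefix 'MyProject_MyJob' is a hypothetical project and job name, and the query assumes the caller has SELECT privilege on the view.

```sql
-- Oracle: list sessions tagged by a DataStage job via CLIENT_IDENTIFIER
-- 'MyProject_MyJob' is a hypothetical prefix; '_' is escaped because it is a LIKE wildcard
SELECT sid,
       serial#,
       client_identifier,
       status,           -- e.g. ACTIVE or INACTIVE
       event,            -- current or most recent wait event
       state,            -- e.g. WAITING
       wait_time_micro,  -- duration of the current or last wait, in microseconds
       logon_time
FROM   v$session
WHERE  client_identifier LIKE 'MyProject\_MyJob%' ESCAPE '\'
ORDER  BY logon_time;
```

Note that Oracle limits the client identifier to 64 bytes, so long project, job, and stage names may need shortening.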
Oracle is just an example here. Other database systems provide similar capabilities; Teradata, for example, supports Query Bands, which can also be set from the DataStage job or stage.
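
For Teradata, a comparable before-sql statement might look like the sketch below. The key names Project, Job, and Stage are made up for illustration; only the name=value; pair format and the FOR SESSION clause come from Teradata's query band syntax.

```sql
-- Teradata: tag the session with a query band (key names are hypothetical)
SET QUERY_BAND = 'Project=#DSProjectName#;Job=#DSJobName#;Stage=#DSStageName#;' FOR SESSION;
```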
I think we would need both: logging database sessions with their identifier in DataStage, and setting the client identifier automatically.
