Enhancement for DB2 z/OS database backup/recovery

BUSINESS REQUIREMENT

Database Recovery is probably the most critical area for DBAs and Application Support teams. As important as new features and performance are, Recovery is absolutely crucial – Production data loss is unacceptable, long database outages are unacceptable, and in recovery situations databases have to be recovered quickly. We have a lot of critical applications that use DB2 z/OS with millions of customers accessing databases 24 X 7 X 365. Business units can’t tolerate down time without serious consequences, so database recoveries have to run quickly; failure to recover data quickly may result in financial loss, reputational impact, legal consequences, etc.

Database recovery should be simple and quick. This is what business demands from us, and this is actually what business is used to – we have thousands of databases in MS SQL Server, DB2 LUW and Oracle environments, and recoveries are done on a database level in a very fast and simple way. DB2 on z/OS team chose a different approach, and it doesn’t meet our business needs anymore.

Application teams want to be able to use database-level recovery in backout situations. They plan and test their implementations, but complex implementations sometimes have to be backed out. Recovery of the database to the point-in-time prior to implementation is often the most acceptable, fastest way to backout failed implementation, and this is the method widely used by applications that use MS SQL Server, DB2 LUW and Oracle. Some of these implementations include database schema change, so we require a quick and simple way to restore database objects and data to the point-in-time prior to the implementation.

PROBLEM

DB2 z/OS doesn’t offer BACKUP DATABASE / RECOVER DATABASE utilities. The largest recovery unit is a tablespace.

Complex implementations, in addition to code changes, often include database changes (creation of new objects, alteration and deletion of existing objects), and data changes (batch updates that change a lot of data).

Recovery of unchanged objects can be done with RECOVER utility. DBAs prepare multi-step jobs that take tablespaces offline (in utility mode), RECOVER tablespaces/partitions, followed by REBUILD index utility step for each index, followed by CHECK DATA, CHECK LOB (if necessary), and then bring tablespaces online.
Recoveries of the altered DB2 objects are more complex. When we change one of the existing objects (and the change can be simple – addition of a new column, expansion of existing columns, data type change), we cannot use pre-implementation Image Copy dataset for recovery. So we have to save old definitions, unload data prior to implementation, and in case of a backout situation, we drop/recreate all these objects, reload data, run CHECK DATA and CHECK LOB utilities for RI and LOB constraints, grant privileges, rebind packages, etc.
Recovery of the dropped objects requires another approach. RECOVER utility cannot be used; we have option of using DSN1COPY utility and translate object numbers (so we manually save and store PSID and OBID of the objects that we drop), but that is too slow and too manual. Since we cannot use RECOVER utility to recover dropped tablespaces from the last copy, we often unload them prior to implementation, recreate them with all dependent objects, reload them with data, run CHECK DATA and CHECK LOB utilities for RI and LOB constraints, grant privileges, rebind packages, etc.

These recovery scenarios are not simple and fast.

Even when there are no database schema changes and RECOVER utility can be used, it recovers individual tablespaces even when we group them together into one list with LISTDEF utility. Application databases often include referential integrity constraints, LOBs, etc. These constraints need to be resolved after recovery. In DB2 z/OS (and only in DB2 z/OS) environments, we have to follow actual recovery steps with CHECK DATA and CHECK LOB steps in the right sequence, which take additional time that results in longer outages for application databases. We currently have Production databases with hundreds of RI constrains, and we need to add CHECK DATA steps in the right sequence to our recovery jobs.

DB2 z/OS RECOVER utility requires a LOGPOINT value for Point-in-time recovery. We understand that LOGPOINT is a translation of the Timestamp into a format that is used by DB2 Transaction Logs. We think DB2 should be able to do this translation, it should not be done manually.

RECOVERY SCENARIO THAT MEETS MOST BUSINESS RECOVERY AND BACKOUT REQUIREMENTS

Recovery has to be intuitive and simple. It should be easy to initiate (accept the timestamp as input, preferably in local time zone), have as little manual steps as possible, and mostly automated .When recovery is initiated, DB2 should identify all objects that should be recovered, put databases/tablespaces in utility mode, recover data, rebuild indexes, resolve RI and LOB constraints, undo uncommitted transactions, and bring database online.

Other RDBMSs do exactly that. IBM’s own DB2 LUW takes the timestamp and recovers full database, resolving all referential constraints, reversing in-flight transactions, leaving database in the consistent, usable state. It is quite good, it is fast, it is hard to make mistakes with this type of recovery. It takes seconds to initiate, and doesn’t require any additional work.

Here is how DB2 Backup and Recovery works in DB2 LUW. I specifically chose another IBM product, but I can provide equally good samples for MS SQL Server:

db2 backup database D2PA online to c:\database\db2

db2 restore database D2PA from c:\database\db2 taken at 20210507100444 without rolling forward without prompting

For PIT recoveries:

db2 rollforward database D2PA to 2021-05-08-19.20.52 and complete

For full recoveries:

db2 rollforward database D2PA to end of logs and complete

This is what we need in DB2 z/OS, we can use it for most recovery scenarios – full recoveries, recoveries to the last backup, PIT recoveries.

SOLUTION

We would like to ask you to add two additional utilities to the suite of utilities that are included in DB2 z/OS product: BACKUP DATABASE and RECOVER DATABASE. Alternatively, you can add additional options to the existing COPY and RECOVER utilities to enable database-level backup and recovery.

We are not asking for a suite of additional tools that we have to acquire and maintain separately, and retest with every upgrade (like DB2 Recovery Expert). These new recovery options should be added to the existing list of utilities (in other words, to database engine), and they should be easy to use.

BACKUP DATABASE should be a simple utility, DB2 knows which tablespaces belong to each database and it should not be hard to create one backup file for all of them. For online backups taken with SHRLEVEL CHANGE, it should be easy to include Transaction Log details into backups datasets, to create a point of consistency and decrease dependence on DB2 Transaction Logs during recovery.

RECOVER DATABASE is definitely more complex, however we don’t need a lot of bells and whistles, we currently have plenty of options with recovery of tablespaces, indexes, page sets, etc. A simple database recovery control statement that indicates a timestamp, similar to the sample that I provided (above) should be sufficient. DB2 can do the rest – figure-out LOGPOINT, put database objects into utility mode, recover database structure and data, apply logs, reverse uncommitted transactions, bring database online.

CONSIDERATIONS

We do understand that architecture of DB2 z/OS is different, and that metadata (DB2 Catalog) is shared between multiple databases. It is probably easier to recover database in DB2 LUW because it has its own set of metadata, so you can recover DB2 catalog when you recover databases. However, DDL changes can be recorded in DB2 catalog in the way that it can be used for recovery, and the info needed for recovery can be stored in in DB2 Transaction logs. Considering the fact that DB2 Recovery Expert provides option for recovery of a dropped object, I think most of pieces are already in place.

DB2 Recovery Expert and other recovery tools can simplify recovery to some degree. They typically take care of the LOGPOINT translation, and they help us figuring out RI constraints and put CHECK DATA commands in the right order. It helps, but it is not good enough, it is still too manual and too involving. In cases when we want to recover full database, we would like to avoid creating and running a job that creates a complex multi-step recovery job for us. So tools like DB2 Recovery Expert can be useful in some cases, but full recovery of the database should be simpler than that.

CONCLUSION

Current suite of recovery solutions offered by DB2 z/OS doesn’t provide database recovery option that meets our business needs. We would like you to pass our requirement to your team that works on database backup/recovery utilities. We believe that this functionality can add value to Db2 z/OS in the long run, solidify its place as the premier database solution for large organizations, and help us to keep Applications using DB2 z/OS vs converting to other database technologies.

We strongly believe that RBC is not the only DB2 z/OS client that faces similar challenges with database recoveries. If there is a way to poll other DB2 z/OS shops and ask them to rate the importance of a simple database recovery solution that is similar to what other RDBMSs provide, I am quite sure they will support this technical requirement.

Needed By

Week

Post comment

Guest

Reply
| Oct 22, 2021

Thank you for submitting this enhancement request. While we view all the enhancement requests we receive as valuable and we would like to implement all of them, in an effort to have maximum transparency we are closing the ones that don’t currently appear on our two year product roadmap. This is not to say that these enhancements may not still be implemented at some point in the future, as we constantly evaluate our requirements on the basis of customer demand and technical impact and try to deliver maximum value to our customers. We appreciate your feedback and would be happy to discuss your enhancement requests further.

0 reply Hide replies

Guest

Reply
| Aug 19, 2021

So it looks like Haakon understands the importance of database recovery, however he thinks is should be handled by tools, not by the DB2.

This is not optimal, as it requires us to buy and support additional tools, upgrade and retest them, etc. Additionally, it is much easier to have a utility job handy in Production and just change a few simple parameters (like database name and timestamp) than use tools to generate new recovery jobs every time we need it. The easier and faster it is, the better. Tools can provide additional bells and whistles, and maybe these additional features will have a lot of value for some applications; but basic, bread-and-butter database recovery should be straightforward, simple, easy to use and intuitive.

We do undersand the fact that DB2 catalog is shared and that it adds additional complexity to recoveries. We are asking for a significant change that is not easy, but not impossible. We think it is time that DB2 z/OS provides this functionality, the environement is such that recoveries should be done fast. Down time is not tolerated anymore. Every other RDBMS provides this functionality, so Application teams expect it.

There must be a few ways to deal with it. For example, BACKUP DATABASE utility can actually store metadata together with data, so that RECOVER DATABASE command can use it for recovery. Also, DB2 Catalog stores a lot of historical information, so it is not really hard to store information about database object changes. DB2 knows about relationships between DB2 objects (hence it can take care of RI constraints at recovery time), it knows which indexes belong to tables that are bing recovered (hence it can rebuild these indexes), and it knows at which point the changes to database objects were done (it can be stored in DB2 Transaction Log and DB2 Catalog), so it can undo these changes.

The message that we are trying to send is that we need a simple recovery of the database. If application requires a complex ‘Application-level’ recovery – fine, App teams and DBAs can buy and use tools for it. Our requirement is totally different, it is a simple recovery statement that DB2 can process and get all the info from the catalog and recover all database tablespaces to PIT or end of logs.

0 reply Hide replies

By clicking the "Post Comment" or "Submit Idea" button, you are agreeing to the IBM Ideas Portal Terms of Use.
Do not place IBM confidential, company confidential, or personal information into any field.

Please enter your email address

RELATED IDEAS

Enhancement for DB2 z/OS database backup/recovery