Database Recovery is probably the most critical area for DBAs and Application Support teams. As important as new features and performance are, Recovery is absolutely crucial – Production data loss is unacceptable, long database outages are unacceptable, and in recovery situations databases have to be recovered quickly. We have a lot of critical applications that use DB2 z/OS with millions of customers accessing databases 24 X 7 X 365. Business units can’t tolerate down time without serious consequences, so database recoveries have to run quickly; failure to recover data quickly may result in financial loss, reputational impact, legal consequences, etc.
Database recovery should be simple and quick. This is what business demands from us, and this is actually what business is used to – we have thousands of databases in MS SQL Server, DB2 LUW and Oracle environments, and recoveries are done on a database level in a very fast and simple way. DB2 on z/OS team chose a different approach, and it doesn’t meet our business needs anymore.
Application teams want to be able to use database-level recovery in backout situations. They plan and test their implementations, but complex implementations sometimes have to be backed out. Recovery of the database to the point-in-time prior to implementation is often the most acceptable, fastest way to backout failed implementation, and this is the method widely used by applications that use MS SQL Server, DB2 LUW and Oracle. Some of these implementations include database schema change, so we require a quick and simple way to restore database objects and data to the point-in-time prior to the implementation.
DB2 z/OS doesn’t offer BACKUP DATABASE / RECOVER DATABASE utilities. The largest recovery unit is a tablespace.
Complex implementations, in addition to code changes, often include database changes (creation of new objects, alteration and deletion of existing objects), and data changes (batch updates that change a lot of data).
Recovery of unchanged objects can be done with RECOVER utility. DBAs prepare multi-step jobs that take tablespaces offline (in utility mode), RECOVER tablespaces/partitions, followed by REBUILD index utility step for each index, followed by CHECK DATA, CHECK LOB (if necessary), and then bring tablespaces online.
Recoveries of the altered DB2 objects are more complex. When we change one of the existing objects (and the change can be simple – addition of a new column, expansion of existing columns, data type change), we cannot use pre-implementation Image Copy dataset for recovery. So we have to save old definitions, unload data prior to implementation, and in case of a backout situation, we drop/recreate all these objects, reload data, run CHECK DATA and CHECK LOB utilities for RI and LOB constraints, grant privileges, rebind packages, etc.
Recovery of the dropped objects requires another approach. RECOVER utility cannot be used; we have option of using DSN1COPY utility and translate object numbers (so we manually save and store PSID and OBID of the objects that we drop), but that is too slow and too manual. Since we cannot use RECOVER utility to recover dropped tablespaces from the last copy, we often unload them prior to implementation, recreate them with all dependent objects, reload them with data, run CHECK DATA and CHECK LOB utilities for RI and LOB constraints, grant privileges, rebind packages, etc.
These recovery scenarios are not simple and fast.
Even when there are no database schema changes and RECOVER utility can be used, it recovers individual tablespaces even when we group them together into one list with LISTDEF utility. Application databases often include referential integrity constraints, LOBs, etc. These constraints need to be resolved after recovery. In DB2 z/OS (and only in DB2 z/OS) environments, we have to follow actual recovery steps with CHECK DATA and CHECK LOB steps in the right sequence, which take additional time that results in longer outages for application databases. We currently have Production databases with hundreds of RI constrains, and we need to add CHECK DATA steps in the right sequence to our recovery jobs.
DB2 z/OS RECOVER utility requires a LOGPOINT value for Point-in-time recovery. We understand that LOGPOINT is a translation of the Timestamp into a format that is used by DB2 Transaction Logs. We think DB2 should be able to do this translation, it should not be done manually.
RECOVERY SCENARIO THAT MEETS MOST BUSINESS RECOVERY AND BACKOUT REQUIREMENTS
Recovery has to be intuitive and simple. It should be easy to initiate (accept the timestamp as input, preferably in local time zone), have as little manual steps as possible, and mostly automated .When recovery is initiated, DB2 should identify all objects that should be recovered, put databases/tablespaces in utility mode, recover data, rebuild indexes, resolve RI and LOB constraints, undo uncommitted transactions, and bring database online.
Other RDBMSs do exactly that. IBM’s own DB2 LUW takes the timestamp and recovers full database, resolving all referential constraints, reversing in-flight transactions, leaving database in the consistent, usable state. It is quite good, it is fast, it is hard to make mistakes with this type of recovery. It takes seconds to initiate, and doesn’t require any additional work.
Here is how DB2 Backup and Recovery works in DB2 LUW. I specifically chose another IBM product, but I can provide equally good samples for MS SQL Server:
db2 backup database D2PA online to c:\database\db2
db2 restore database D2PA from c:\database\db2 taken at 20210507100444 without rolling forward without prompting
For PIT recoveries:
db2 rollforward database D2PA to 2021-05-08-19.20.52 and complete
For full recoveries:
db2 rollforward database D2PA to end of logs and complete
This is what we need in DB2 z/OS, we can use it for most recovery scenarios – full recoveries, recoveries to the last backup, PIT recoveries.
We would like to ask you to add two additional utilities to the suite of utilities that are included in DB2 z/OS product: BACKUP DATABASE and RECOVER DATABASE. Alternatively, you can add additional options to the existing COPY and RECOVER utilities to enable database-level backup and recovery.
We are not asking for a suite of additional tools that we have to acquire and maintain separately, and retest with every upgrade (like DB2 Recovery Expert). These new recovery options should be added to the existing list of utilities (in other words, to database engine), and they should be easy to use.
BACKUP DATABASE should be a simple utility, DB2 knows which tablespaces belong to each database and it should not be hard to create one backup file for all of them. For online backups taken with SHRLEVEL CHANGE, it should be easy to include Transaction Log details into backups datasets, to create a point of consistency and decrease dependence on DB2 Transaction Logs during recovery.
RECOVER DATABASE is definitely more complex, however we don’t need a lot of bells and whistles, we currently have plenty of options with recovery of tablespaces, indexes, page sets, etc. A simple database recovery control statement that indicates a timestamp, similar to the sample that I provided (above) should be sufficient. DB2 can do the rest – figure-out LOGPOINT, put database objects into utility mode, recover database structure and data, apply logs, reverse uncommitted transactions, bring database online.
We do understand that architecture of DB2 z/OS is different, and that metadata (DB2 Catalog) is shared between multiple databases. It is probably easier to recover database in DB2 LUW because it has its own set of metadata, so you can recover DB2 catalog when you recover databases. However, DDL changes can be recorded in DB2 catalog in the way that it can be used for recovery, and the info needed for recovery can be stored in in DB2 Transaction logs. Considering the fact that DB2 Recovery Expert provides option for recovery of a dropped object, I think most of pieces are already in place.
DB2 Recovery Expert and other recovery tools can simplify recovery to some degree. They typically take care of the LOGPOINT translation, and they help us figuring out RI constraints and put CHECK DATA commands in the right order. It helps, but it is not good enough, it is still too manual and too involving. In cases when we want to recover full database, we would like to avoid creating and running a job that creates a complex multi-step recovery job for us. So tools like DB2 Recovery Expert can be useful in some cases, but full recovery of the database should be simpler than that.
Current suite of recovery solutions offered by DB2 z/OS doesn’t provide database recovery option that meets our business needs. We would like you to pass our requirement to your team that works on database backup/recovery utilities. We believe that this functionality can add value to Db2 z/OS in the long run, solidify its place as the premier database solution for large organizations, and help us to keep Applications using DB2 z/OS vs converting to other database technologies.
We strongly believe that RBC is not the only DB2 z/OS client that faces similar challenges with database recoveries. If there is a way to poll other DB2 z/OS shops and ask them to rate the importance of a simple database recovery solution that is similar to what other RDBMSs provide, I am quite sure they will support this technical requirement.
Do not place IBM confidential, company confidential, or personal information into any field.