Hello IBM Db2 development team,
could you please pay attention to the following observation and idea for an improvement from one of our customers (Deutsche Bank AG):
Background:
In a world where IBM have been continuing to make the mainframe an attractive platform in terms of cost of ownership they have introduced a feature that often does the opposite.
I really feel the way they implemented DPSI (Data Partitioned) indexes has created so much potential for negative performance impacts. I am seeing it all the time. I think this implementation requires a re-think.
For example, I often see the keyword PARTITIONED replicated through multiple indexes on tables by mistake without any true understanding of the effect or even knowledge that it has actually happened.
I also see partitioned tables with 1 index, the clustering index that is defined as partitioned even though the columns are not leading on the partitioning columns.
Now of course today with table controlled partitioning there is complete independence from partitioning and clustering etc.
I do however think the subject is so complex in itself that many people do not understand it and often the keyword PARTITIONED gets replicated across indexes by mistake.
This creates 2 specific problems if the index is a DPSI instead of a traditional partitioning or Non-partitioned index when the partition columns are not a matching part of the query predicates:
For normal index lookups, it creates multiple traverses through the index tree that are completely invisible in the plan table.
For joins, DB2 cannot do multiple parses and so what should be a matching index access turns to non-matching indexes access causing real performance headaches.
There are not many very highly specialised DB2 experts that understand all of these intricate details. Not everyone is always up-to-date with firstly the differences and secondly how to understand and find these performance problems.
I really think DPSI indexes only have a very small set of use-cases where the positive impacts outweigh the negative ones. Honestly, I see virtually no real use practical use cases for them so in my opinion people should either specify them specifically DATA PARTITIONED or they should be warned that the type of idex being created is not partitioned it is Data Partitioned, that would make them think twice.
Suggestion:
I really suggest that to make this safer there should be a DDL specification difference where you clearly specify DATA PARTITIONED instead of just PARTITIONED.
I think if the PARTITIONED clause is used and the index is DPSI that at least a warning should be issued to say “Be careful this index will be data partitioned are you really sure”.
The long term approach should be that PARTITIONED should not accepted for DPSI, only for partitioning indexes, so 2 DDL specifications.
Does that makes sense?
In conclusion, in my experience Data Partitioned Indexes create many more performance problems than the small number of things they solve. I have witnessed a lot of invisible difficult to detect performance problems and even ones where statements are costing more that they should and they are just left so. It is a little cancer in DDL that is quite quickly being replicated through systems by a series of easy to make database admin mistakes that keeps growing.
There needs to be a differentiation between PARTITIONED and DATA PARTITIONED.
I really think it is an important enhancement.
Best Regards,
-Martin Blazik, Kyndryl Client Center, Brno, CZ
Dear Martin, Thank you for submitting this enhancement request. The Db2 for z/OS team reviewed this and have provided the following feedback:
The implications of using DPSIs are documented. Changing existing DDL syntax, or disallowing existing syntax, is problematic due to implications for existing processes. There are no plans to restrict or provide warnings regarding the creation of DPSIs at this time.
Sincerely,
The Db2 for z/OS Team