Add a new apply parameter, DBLOAD_OPTIONS, to influence the automatic load performed by the Q-Replication apply program on column-organized tables

A column compression dictionary is used to compress data in a column of a column-organized table.
When you load data into a column-organized table, the first phase is the analyze phase, which is unique to column-organized tables.
The analyze phase occurs only if column compression dictionaries must be built, which happens during a LOAD REPLACE operation, a LOAD REPLACE RESETDICTIONARY operation, a LOAD REPLACE RESETDICTIONARYONLY operation, or a LOAD INSERT operation (if the column-organized table is empty). The load utility analyzes the input data to determine the best encoding schemes for building column compression dictionaries.
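The rule for when the analyze phase runs can be written as a small decision function. This is a minimal sketch that encodes only the conditions listed above; the names `LoadMode` and `needs_analyze_phase` are hypothetical and not part of any Db2 or Q-Replication API.

```python
from enum import Enum


class LoadMode(Enum):
    """LOAD operations named in the text above (hypothetical helper type)."""
    REPLACE = "REPLACE"
    REPLACE_RESETDICTIONARY = "REPLACE RESETDICTIONARY"
    REPLACE_RESETDICTIONARYONLY = "REPLACE RESETDICTIONARYONLY"
    INSERT = "INSERT"


def needs_analyze_phase(mode: LoadMode, table_is_empty: bool) -> bool:
    """Return True if LOAD must build column compression dictionaries,
    which is exactly when the analyze phase runs."""
    if mode is LoadMode.INSERT:
        # LOAD INSERT builds dictionaries only if the column-organized table is empty.
        return table_is_empty
    # Every REPLACE variant listed above builds dictionaries.
    return True


if __name__ == "__main__":
    print(needs_analyze_phase(LoadMode.INSERT, table_is_empty=False))  # False
    print(needs_analyze_phase(LoadMode.REPLACE_RESETDICTIONARY, table_is_empty=False))  # True
```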

DB2 V11.1 introduces two new file type modifiers for the MODIFIED BY clause that influence the analyze phase: maxanalyzesize=x and cdeanalyzefrequency=x.
maxanalyzesize=x
x is a size specified as <Number><Megabytes|Gigabytes>. The default size is 128 GB. maxanalyzesize controls how much data is sampled in the ANALYZE phase to produce the compression dictionary.
In a massively parallel processing (MPP) environment, the sampling size is not aggregated across members; the ANALYZE phase stops when the first member reaches the maximum.
cdeanalyzefrequency=x
x is an integer between 0 and 99 inclusive. This value controls how much data is sampled in the ANALYZE phase to produce the compression dictionary. In a massively parallel processing (MPP) environment, the sampling size is not aggregated across members; the ANALYZE phase stops when the first member reaches the maximum.
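The two modifiers can be validated and combined into a MODIFIED BY clause before they are handed to LOAD. The sketch below checks the documented 0 to 99 range for cdeanalyzefrequency. The exact textual form of the size (an integer followed by `M` or `G`) is an assumption, as is the helper name `modified_by_clause`; confirm the suffix syntax against the Db2 documentation before relying on it.

```python
import re
from typing import Optional

# ASSUMPTION: the size is written as an integer followed by M (megabytes)
# or G (gigabytes); verify the exact suffix syntax in the Db2 documentation.
_SIZE_PATTERN = re.compile(r"[0-9]+[MG]")


def modified_by_clause(max_analyze_size: Optional[str] = None,
                       cde_analyze_frequency: Optional[int] = None) -> str:
    """Build a MODIFIED BY clause from the two analyze-phase modifiers.

    Returns an empty string if neither modifier is given.
    """
    modifiers = []
    if max_analyze_size is not None:
        if not _SIZE_PATTERN.fullmatch(max_analyze_size):
            raise ValueError(f"invalid maxanalyzesize value: {max_analyze_size!r}")
        modifiers.append(f"maxanalyzesize={max_analyze_size}")
    if cde_analyze_frequency is not None:
        # The documented range for cdeanalyzefrequency is 0 to 99 inclusive.
        if not isinstance(cde_analyze_frequency, int) or not 0 <= cde_analyze_frequency <= 99:
            raise ValueError("cdeanalyzefrequency must be an integer between 0 and 99 inclusive")
        modifiers.append(f"cdeanalyzefrequency={cde_analyze_frequency}")
    if not modifiers:
        return ""
    # Multiple file type modifiers are separated by blanks.
    return "MODIFIED BY " + " ".join(modifiers)


if __name__ == "__main__":
    print(modified_by_clause("10G", 25))
```

Rejecting out-of-range values up front keeps a bad setting from surfacing only as a LOAD failure late in a long-running replication load.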

The analyze phase on column-organized tables can be very long and expensive, especially if the source tables are huge, for example billions of records.
The Q-Replication apply program needs a new option for passing LOAD options, in particular to influence the LOAD into column-organized target tables.
For example, it should be possible to perform a LOAD REPLACE or a LOAD REPLACE RESETDICTIONARY operation, or to build only the dictionary without loading the data (LOAD REPLACE RESETDICTIONARYONLY).
It should also be possible to specify how much data is analyzed, or to define the percentage of the source data to be sampled.
In addition, the analyze phase requires extra temporary storage on the target server to build the histogram and the dictionary for the column-organized target tables.
What is required is a new apply parameter for the automatic load performed by the Q-Replication apply program, for example DBLOAD_OPTIONS.
This new LOAD option should be configurable for each subscription.
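To illustrate the request, the sketch below shows how a per-subscription DBLOAD_OPTIONS value might be turned into a Db2 LOAD FROM CURSOR command for one target table. Everything here is hypothetical: DBLOAD_OPTIONS does not exist yet, and the `DbLoadOptions` type, the subscription-keyed dictionary, the cursor name, and the fallback to a plain REPLACE are illustrative assumptions, not the apply program's actual behavior.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# LOAD modes mentioned in this request.
ALLOWED_MODES = {
    "INSERT",
    "REPLACE",
    "REPLACE RESETDICTIONARY",
    "REPLACE RESETDICTIONARYONLY",
}


@dataclass
class DbLoadOptions:
    """Hypothetical value of the proposed DBLOAD_OPTIONS for one subscription."""
    mode: str = "REPLACE"  # ASSUMPTION: fallback mode when nothing is configured
    modifiers: List[str] = field(default_factory=list)


def build_load_command(cursor_name: str, target_table: str, options: DbLoadOptions) -> str:
    """Compose LOAD FROM <cursor> OF CURSOR [MODIFIED BY ...] <mode> INTO <table>."""
    if options.mode not in ALLOWED_MODES:
        raise ValueError(f"unsupported load mode: {options.mode}")
    parts = [f"LOAD FROM {cursor_name} OF CURSOR"]
    if options.modifiers:
        parts.append("MODIFIED BY " + " ".join(options.modifiers))
    parts.append(f"{options.mode} INTO {target_table}")
    return " ".join(parts)


def command_for_subscription(subname: str, cursor_name: str, target_table: str,
                             dbload_options: Dict[str, DbLoadOptions]) -> str:
    """Look up the subscription's options (hypothetical) and build its LOAD command."""
    options = dbload_options.get(subname, DbLoadOptions())
    return build_load_command(cursor_name, target_table, options)


if __name__ == "__main__":
    config = {
        "SUB_SALES": DbLoadOptions(
            mode="REPLACE RESETDICTIONARY",
            modifiers=["maxanalyzesize=10G", "cdeanalyzefrequency=10"],
        ),
    }
    print(command_for_subscription("SUB_SALES", "C1", "DWH.SALES", config))
    print(command_for_subscription("SUB_ORDERS", "C1", "DWH.ORDERS", config))
```

Keying the options by subscription name matches the requirement that each subscription can carry its own LOAD settings, while subscriptions without an entry keep a default load.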

At the moment, the only way to influence the analyze phase for a Q-Replication column-organized target table is to perform a manual load.
Huge source tables in particular can only be loaded manually into column-organized tables, because the automatic load offers no way to specify LOAD options.
An automatic load called by the Q-Replication apply program with a freely defined list of LOAD options, however, could run without DBA attention, whereas a manual load requires more effort and attention.
