Dec 18 2008
One of the new features in DB2 version 9 which got a lot of attention is
certainly 'Row Compression'. While compression algorithms are in place already
in DB2 before version 9 (for instance backup compression, value compression),
row compression is unique for several reasons. But first let's distinguish
between value compression and row compression. Both methods are used to compress
table data and therefore save space in the database. Value compression,
available since version 8.1 of DB2, does so by introducing a new row format
which saves space for certain data types if they are empty or SQL null.
applicability in SAP systems Row compression on the
other hand uses standard data compression algorithms to
compress a row. This includes the creation of a
'dictionary' which defines symbols for particular groups
of bytes which appear frequently in the table data.
The Algorithm of Database DB2
To be more precise, row compression in DB2 uses a
lossless compression technique with a static
dictionary and works on a 'per table' basis. Static here
means that the dictionary, once build, is fixed and will
not change upon INSERTs or UPDATEs. The dictionary must
exists before the table can be compressed. Creation of
the dictionary and compressing the table is done in one
step with an offline table REORG. To create the
dictionary, the algorithm looks for repeating entries
(or patterns) in rows. These entries may span over
several adjacent columns. They are replaced by a symbol
and the symbol is stored together with the repeating
entries in the dictionary. During decompression (which
is completely transparent to the database user) the DBMS
replaces the symbols found in a row with the entries
which are stored in the dictionary for this symbol. This
way, a row gets shorter on disk and more rows will fit
into one page. The dictionary itself is stored together
with the table data and is not accessible from the
outside. Upon access to a compressed table the
dictionary is loaded into (DBHEAP-)memory for faster
decompression. Dictionaries can not exceed the size of
150 KByte, they are usually about 75 KByte in size.
Here are a few other facts about row compression:
- As indexes are used for fast access to records
index data will not be compressed. Also, long and
LOB table data is not compressed.
- Compressed data is represented as it is in the
buffer pool. In other words, the content of the
buffer pool is also compressed if it is related to a
compressed table. This allows for a more effective
memory usage.
- As already mentioned, row compression works on a
per table basis (it can be switched on for single
tables). In case of a multi partition environment,
it works on a 'per table/per partition' basis in the
sense that every compressed table has their own
dictionary on every partition.
How it works in DB2
Assuming you are an admin running a SAP system on DB2
for Linux, UNIX and Windows, let's see what you have to
do to use row compression. I'm using again a SAP system
'Q1I' here to demonstrate the steps. The first step is
to identify tables which are suitable for row
compression. Naturally, the largest tables of the system
should be considered first. In my system Q1I I
identified table '/SAPAPO/MATLOC' as a large table:
db2 => SELECT data_object_l_size, data_object_p_size
FROM TABLE(sysproc.admin_get_tab_info('SAPQ1I','/SAPAPO/MATLOC')) AS X
DATA_OBJECT_L_SIZE DATA_OBJECT_P_SIZE
-------------------- --------------------
2075296 2075296
I used the table function ADMIN_GET_TAB_INFO to
determine the size in KBytes of this table (there are no
long- or LOB-columns in this table). 2075297 KBytes are
2026 MByte which means that some of you will laugh if I
tell you that a 2 GByte table is 'large'. Nevertheless
let's go on with this table in this example. To estimate
the space savings which can be achieved by row
compression, the DB2 command INSPECT can be used with
the new option ROWCOMPESTIMATE:
db2 => INSPECT ROWCOMPESTIMATE TABLE NAME
"/SAPAPO/MATLOC" SCHEMA sapq1i RESULTS KEEP
matlocestimate
DB20000I The INSPECT command completed successfully.
Because of the 'RESULTS KEEP matlocestimate' option I
can find the output of this INSPECT run in the file
matlocestimate in the db2dump directory. As my system
Q1I has two partitions, the partition number (000) is
added to the file name. To read the output it must be
converted first with db2inspf:
db2inspf matlocestimate.000 matlocestimate.000.txt
Here is the result:
DATABASE: Q1I
VERSION : SQL09010
2006-08-31-17.11.22.440352
Action: ROWCOMPESTIMATE TABLE
Schema name: SAPQ1I
Table name: /SAPAPO/MATLOC
Tablespace ID: 5 Object ID: 2861
Result file name: matlocestimate
Table phase start (ID Signed: 2861, Unsigned: 2861;
Tablespace ID: 5) : SAPQ1I./SAPAPO/MATLOC
Data phase start. Object: 2861 Tablespace: 5
Row compression estimate results:
Percentage of pages saved from compression: 87
Percentage of bytes saved from compression: 87
Percentage of rows ineligible for compression due to small row size: 0
Compression dictionary size: 56832 bytes.
Expansion dictionary size: 32768 bytes.
Data phase end.
Table phase end.
Processing has completed. 2006-08-31-17.12.54.592391
The two lines 'Percentage of pages/bytes saved from
compression' tell us here that a space reduction by 87%
is expected from row compression which is certainly a
lot. (I can now admit that I selected this table as it
constitutes a very good showcase for row compression :).
Also, the complete dictionary size for the table (on
this partition) is estimated with 56832 + 32768 = 89600
Bytes = 87,5 KByte. This table is identified as an
rewarding object for row compression.
The next step is to switch on row compression. This
is done with the new addition 'COMPRESS YES' for the
ALTER TABLE statement. Note that this addition is also
available for the CREATE TABLE statement. In both cases
the table will not be compressed with this
statement! Instead it is more that the specified table
is now 'eligible' from compression. The field
'COMPRESSION' in the SYSCAT.TABLES view tells us if
compression is switched on for a table or not.
db2 => SELECT compression FROM syscat.tables WHERE tabname = '/SAPAPO/MATLOC'
COMPRESSION
-----------
N
1 record(s) selected.
db2 => ALTER TABLE "SAPQ1I"."/SAPAPO/MATLOC" COMPRESS YES
DB20000I The SQL command completed successfully.
db2 => SELECT compression FROM syscat.tables WHERE tabname = '/SAPAPO/MATLOC'
COMPRESSION
-----------
R
1 record(s) selected.
The COMPRESSION field in SYSCAT.TABLES can also contain
a 'V' for value compression or an 'B' for both (value
and row compression). To actually compress the table
data and create the dictionary an offline table REORG is
needed:
db2 => REORG TABLE "SAPQ1I"."/SAPAPO/MATLOC" USE PSAPTEMP16 RESETDICTIONARY
DB20000I The REORG command completed successfully.
REORG has the two new options RESETDICTIONARY and
KEEPDICTIONARY. So it is possible to do a reorganization
without running the compress algorithm again by
retaining an existing dictionary. For the exact meaning
of these options please lookup REORG in the DB2 Command
Reference.
What to expect in DB2
How successful was the compression? Let's check:
db2 => SELECT dictionary_size, data_object_l_size,
data_object_p_size
FROM
TABLE(sysproc.admin_get_tab_info('SAPQ1I','/SAPAPO/MATLOC'))
AS X
DICTIONARY_SIZE DATA_OBJECT_L_SIZE
DATA_OBJECT_P_SIZE
-------------------- --------------------
--------------------
89600 273184
273184
With ADMIN_GET_TAB_INFO I can check the size of the
dictionary. Exactly like estimated by INSPECT it's size
is 87.5 KByte. The table now occupies only
(273184*100)/2075296 = 13.1% of the original space on
disk so that indeed 100% - 13% = 87% of the disk space
are saved.
Now you may think that every good thing has a
downside. Do we have to pay for these space savings? It
is very difficult to find a general answer for the costs
associated with row compression. Obviously, we need more
CPU time for decompression. If on the other hand your
database is more I/O bound the savings from less I/O may
(and in most cases will) outperform these additional CPU
cycles. So there is no universal answer.
Otherwise we can examine how the various SQL
statements are affected by row compression. Remember
that the algorithm works with a static dictionary which
is not changed by DML statements issued against a
compressed table. From measurements conducted by our IBM
colleagues here at SAP (thanks!) we found out that there
is nearly no overhead for SELECTs and only a small
overhead for INSERTs. UPDATEs to existing rows need a
bit more attention. If the update changes the table data
in a way that a compression symbol must be expanded to
the original entries the row will to be larger after the
update than before. As a result the row will not fit in
its original location. If there is still free space left
on the data page DB2 will do a page reorg to assemble
the free space and fit the row back in. If there is not
enough free space DB2 will need to create an overflow
record (that means that only a pointer will be left on
the page and the remainder of the row will be stored on
another page). These additional operations will bring
along some additional cost if they occur. You therefore
may see a slight decrease in update performance compared
to an uncompressed table. It very much depends on your
actual data and your update frequency if you see a
decrease in update performance.
Regarding the compression ratio, 87% like in this
example is certainly a very good showcase. For the most
tables which are suitable for row compression a
compression rate around 70% would be more realistic.
Planned support for row compression in SAP systems
Row compression is a very useful feature and it will
be supported by various SAP products.
- Starting today you can transparently enable row
compression for any table in your SAP system through
a manual DBA action. To find suitable candidates for
row compression in your SAP system the new UDF
'INSPECT_TABLE_ROWCOMPESTIMATE' is available. UDF
can be used. The note also describes how to create
DB2 scripts to enable row compression on multiple
tables.
- SAP BI will fully support DB2 row compression
starting with it's next version. Row compression
will become the default for BI fact tables. Support
for existing versions of SAP BI/BW is also available
soon, There will be more information about row
compression in SAP BI posted on SDN soon.
- R3load has the new option '-loadprocedure fast
COMPRESS'. Started with this option it loads part of
the data, does a REORG to create the compression
dictionary and then loads the rest of the data. This
way the table does not grow up to it's full size
before it is compressed.
- The SAP DBA Cockpit (transaction 'DBACOCKPIT')
will fully support row compression starting with the
next release of SAP Netweaver. See below a sneak
preview from one of our development systems. Note
that the development is still ongoing here and there
might be changes!
The 'Single Table Analysis' shows if a row and/or value compression is
switched on for the table. The picture shows the state of our table
after row compression took place.
On the new compression tab you can find all information regarding row
compression. All the things I did above manually (including the INSPECT
run) can be scheduled very easy via the DBA cockpit.
Johannes Heinrich Johannes works as a developer in
the 'DB2 UDB for Unix and Windows' porting team at SAP.

Webmaster Said:
Thank you.
lars svanberg Said:
hariprakash Said: