applicability in SAP systems
Row compression on the other
hand uses standard data
compression algorithms to
compress a row. This
includes the creation of a
'dictionary' which defines
symbols for particular
groups of bytes which appear
frequently in the table
data.
The Algorithm
To be more precise, row
compression in DB2 uses a
lossless compression
technique with a static
dictionary and works on a
'per table' basis. Static
here means that the
dictionary, once build, is
fixed and will not change
upon INSERTs or UPDATEs. The
dictionary must exists
before the table can be
compressed. Creation of the
dictionary and compressing
the table is done in one
step with an offline table
REORG. To create the
dictionary, the algorithm
looks for repeating entries
(or patterns) in rows. These
entries may span over
several adjacent columns.
They are replaced by a
symbol and the symbol is
stored together with the
repeating entries in the
dictionary. During
decompression (which is
completely transparent to
the database user) the DBMS
replaces the symbols found
in a row with the entries
which are stored in the
dictionary for this symbol.
This way, a row gets shorter
on disk and more rows will
fit into one page. The
dictionary itself is stored
together with the table data
and is not accessible from
the outside. Upon access to
a compressed table the
dictionary is loaded into (DBHEAP-)memory
for faster decompression.
Dictionaries can not exceed
the size of 150 KByte, they
are usually about 75 KByte
in size.
Here are a few other
facts about row compression:
- As indexes are used
for fast access to
records index data will
not be compressed. Also,
long and LOB table data
is not compressed.
- Compressed data is
represented as it is in
the buffer pool. In
other words, the content
of the buffer pool is
also compressed if it is
related to a compressed
table. This allows for a
more effective memory
usage.
- As already
mentioned, row
compression works on a
per table basis (it can
be switched on for
single tables). In case
of a multi partition
environment, it works on
a 'per table/per
partition' basis in the
sense that every
compressed table has
their own dictionary on
every partition.
How it works
Assuming you are an admin
running a SAP system on DB2
for Linux, UNIX and Windows,
let's see what you have to
do to use row compression.
I'm using again a SAP system
'Q1I' here to demonstrate
the steps. The first step is
to identify tables which are
suitable for row
compression. Naturally, the
largest tables of the system
should be considered first.
In my system Q1I I
identified table '/SAPAPO/MATLOC'
as a large table:
db2 => SELECT data_object_l_size, data_object_p_size
FROM TABLE(sysproc.admin_get_tab_info('SAPQ1I','/SAPAPO/MATLOC')) AS X
DATA_OBJECT_L_SIZE DATA_OBJECT_P_SIZE
-------------------- --------------------
2075296 2075296
I used the table function
ADMIN_GET_TAB_INFO to
determine the size in KBytes
of this table (there are no
long- or LOB-columns in this
table). 2075297 KBytes are
2026 MByte which means that
some of you will laugh if I
tell you that a 2 GByte
table is 'large'.
Nevertheless let's go on
with this table in this
example. To estimate the
space savings which can be
achieved by row compression,
the DB2 command INSPECT can
be used with the new option
ROWCOMPESTIMATE:
db2 => INSPECT
ROWCOMPESTIMATE TABLE NAME
"/SAPAPO/MATLOC" SCHEMA
sapq1i RESULTS KEEP
matlocestimate
DB20000I The INSPECT
command completed
successfully.
Because of the 'RESULTS KEEP
matlocestimate' option I can
find the output of this
INSPECT run in the file
matlocestimate in the
db2dump directory. As my
system Q1I has two
partitions, the partition
number (000) is added to the
file name. To read the
output it must be converted
first with db2inspf:
db2inspf matlocestimate.000
matlocestimate.000.txt
Here is the result:
The two lines 'Percentage of
pages/bytes saved from
compression' tell us here
that a space reduction by
87% is expected from row
compression which is
certainly a lot. (I can now
admit that I selected this
table as it constitutes a
very good showcase for row
compression :). Also, the
complete dictionary size for
the table (on this
partition) is estimated with
56832 + 32768 = 89600 Bytes
= 87,5 KByte. This table is
identified as an rewarding
object for row compression.
The next step is to
switch on row compression.
This is done with the new
addition 'COMPRESS YES' for
the ALTER TABLE statement.
Note that this addition is
also available for the
CREATE TABLE statement. In
both cases the table will
not be compressed with
this statement! Instead it
is more that the specified
table is now 'eligible' from
compression. The field
'COMPRESSION' in the
SYSCAT.TABLES view tells us
if compression is switched
on for a table or not.
db2 => SELECT compression FROM syscat.tables WHERE tabname = '/SAPAPO/MATLOC'
COMPRESSION
-----------
N
1 record(s) selected.
db2 => ALTER TABLE "SAPQ1I"."/SAPAPO/MATLOC" COMPRESS YES
DB20000I The SQL command completed successfully.
db2 => SELECT compression FROM syscat.tables WHERE tabname = '/SAPAPO/MATLOC'
COMPRESSION
-----------
R
1 record(s) selected.
The COMPRESSION field in
SYSCAT.TABLES can also
contain a 'V' for value
compression or an 'B' for
both (value and row
compression). To actually
compress the table data and
create the dictionary an
offline table REORG is
needed:
db2 => REORG TABLE "SAPQ1I"."/SAPAPO/MATLOC" USE PSAPTEMP16 RESETDICTIONARY
DB20000I The REORG command completed successfully.
REORG has the two new
options RESETDICTIONARY and
KEEPDICTIONARY. So it is
possible to do a
reorganization without
running the compress
algorithm again by retaining
an existing dictionary. For
the exact meaning of these
options please lookup REORG
in the DB2 Command
Reference.
What to expect
How successful was the
compression? Let's check:
db2 => SELECT dictionary_size, data_object_l_size, data_object_p_size
FROM TABLE(sysproc.admin_get_tab_info('SAPQ1I','/SAPAPO/MATLOC')) AS X
DICTIONARY_SIZE DATA_OBJECT_L_SIZE DATA_OBJECT_P_SIZE
-------------------- -------------------- --------------------
89600 273184 273184
With ADMIN_GET_TAB_INFO I
can check the size of the
dictionary. Exactly like
estimated by INSPECT it's
size is 87.5 KByte. The
table now occupies only
(273184*100)/2075296 = 13.1%
of the original space on
disk so that indeed 100% -
13% = 87% of the disk space
are saved.
Now you may think that
every good thing has a
downside. Do we have to pay
for these space savings? It
is very difficult to find a
general answer for the costs
associated with row
compression. Obviously, we
need more CPU time for
decompression. If on the
other hand your database is
more I/O bound the savings
from less I/O may (and in
most cases will) outperform
these additional CPU cycles.
So there is no universal
answer.
Otherwise we can examine
how the various SQL
statements are affected by
row compression. Remember
that the algorithm works
with a static dictionary
which is not changed by DML
statements issued against a
compressed table. From
measurements conducted by
our IBM colleagues here at
SAP (thanks!) we found out
that there is nearly no
overhead for SELECTs and
only a small overhead for
INSERTs. UPDATEs to existing
rows need a bit more
attention. If the update
changes the table data in a
way that a compression
symbol must be expanded to
the original entries the row
will to be larger after the
update than before. As a
result the row will not fit
in its original location. If
there is still free space
left on the data page DB2
will do a page reorg to
assemble the free space and
fit the row back in. If
there is not enough free
space DB2 will need to
create an overflow record
(that means that only a
pointer will be left on the
page and the remainder of
the row will be stored on
another page). These
additional operations will
bring along some additional
cost if they occur. You
therefore may see a slight
decrease in update
performance compared to an
uncompressed table. It very
much depends on your actual
data and your update
frequency if you see a
decrease in update
performance.
Regarding the compression
ratio, 87% like in this
example is certainly a very
good showcase. For the most
tables which are suitable
for row compression a
compression rate around 70%
would be more realistic.
Planned support for row
compression in SAP systems
Row compression is a very
useful feature and it will
be supported by various SAP
products.
- Starting today you
can transparently enable
row compression for any
table in your SAP system
through a manual DBA
action. To find suitable
candidates for row
compression in your SAP
system the new UDF
'INSPECT_TABLE_ROWCOMPESTIMATE'
is available. UDF can be
used. The note also
describes how to create
DB2 scripts to enable
row compression on
multiple tables.
- SAP BI will fully
support DB2 row
compression starting
with it's next version.
Row compression will
become the default for
BI fact tables. Support
for existing versions of
SAP BI/BW is also
available soon, There
will be more information
about row compression in
SAP BI posted on SDN
soon.
- R3load has the new
option '-loadprocedure
fast COMPRESS'. Started
with this option it
loads part of the data,
does a REORG to create
the compression
dictionary and then
loads the rest of the
data. This way the table
does not grow up to it's
full size before it is
compressed.
- The SAP DBA Cockpit
(transaction
'DBACOCKPIT') will fully
support row compression
starting with the next
release of SAP
Netweaver. See below a
sneak preview from one
of our development
systems. Note that the
development is still
ongoing here and there
might be changes!
The 'Single Table Analysis'
shows if a row and/or value
compression is switched on
for the table. The picture
shows the state of our table
after row compression took
place.
On the new compression tab
you can find all information
regarding row compression.
All the things I did above
manually (including the
INSPECT run) can be
scheduled very easy via the
DBA cockpit.