The optimizer is the heart and soul of DB2. It analyzes SQL statements and
determines the most efficient access path available for satisfying each
statement (see Figure 1). DB2 UDB accomplishes this by parsing the SQL statement
to determine which tables and columns must be accessed. The DB2 optimizer then
queries system information and statistics stored in the DB2 system catalog to
determine the best method of accomplishing the tasks necessary to satisfy the
Figure 1. DB2 optimization in action.
The optimizer is equivalent in function to an expert system. An expert
system is a set of standard rules that, when combined with situational data,
returns an "expert" opinion. For example, a medical expert system takes the set
of rules determining which medication is useful for which illness, combines it
with data describing the symptoms of ailments, and applies that knowledge base
to a list of input symptoms. The DB2 optimizer renders expert opinions on data
retrieval methods based on the situational data housed in DB2's system catalog
and a query input in SQL format.
Optimizing data access in DB2
The notion of optimizing data access in the DBMS is one of the most powerful
capabilities of DB2. Remember, you access DB2 data by telling DB2 what to
retrieve, not how to retrieve it. Regardless of how the data is physically
stored and manipulated, DB2 and SQL can still access that data. This separation
of access criteria from physical storage characteristics is called physical data
independence. DB2's optimizer is the component that accomplishes this physical
If you remove the indexes, DB2 can still access the data (although less
efficiently). If you add a column to the table being accessed, DB2 can still
manipulate the data without changing the program code. This is all possible
because the physical access paths to DB2 data are not coded by programmers in
application programs, but are generated by DB2.
Compare this with non-DBMS systems in which the programmer must know the
physical structure of the data. If there is an index, the programmer must write
appropriate code to use the index. If someone removes the index, the program
will not work unless the programmer makes changes. Not so with DB2 and SQL. All
this flexibility is attributable to DB2's capability to optimize data
manipulation requests automatically.
The optimizer performs complex calculations based on a host of information.
To visualize how the optimizer works, picture the optimizer as performing a
- Receive and verify the syntax of the SQL statement.
- Analyze the environment and optimize the method of satisfying the SQL
- Create machine-readable instructions to execute the optimized SQL.
- Execute the instructions or store them for future execution.
The second step of this process is the most intriguing. How does the
optimizer decide how to execute the vast array of SQL statements that you can
send its way?
The optimizer has many types of strategies for optimizing SQL. How does it
choose which of these strategies to use in the optimized access paths? IBM does
not publish the actual, in-depth details of how the optimizer determines the
best access path, but the optimizer is a cost-based optimizer. This means
the optimizer will always attempt to formulate an access path for each query
that reduces overall cost. To accomplish this, the DB2 optimizer applies query
cost formulas that evaluate and weigh four factors for each potential access
path: the CPU cost, the I/O cost, statistical information in the DB2 system
catalog, and the actual SQL statement.