Ever wanted to improve a badly performing Oracle 11 database, or parts of it? How would you know for sure the performance improved for end users?
An Oracle database is a complex entity. It has all sorts of mechanisms to improve and optimize performance, like caching results, creating and caching execution plans, caching dictionaries, etc. When measuring query times, often the first attempt will take several seconds, while next attempts only takes a few milliseconds. That is because Oracle caches almost everything during that first attempt. In the real world though, where databases are under heavy use, caches expire. Performance is based upon first and second attempts together. This means, the more diversified the queries, the less advantage you get from caching.
But what about Oracle's Explain Plan? Well, it gives information about the execution plan and estimates performance... Yes, explain plans are great, unfortunately they are only theoretical and don't provide real world figures! An Oracle Explain Plan is simply an estimate of the execution plan. The real execution plan may very well differ from the Explain Plan estimation.
Worst case testing
To get consistent time measurements without being affected by caches, you'll have to clear all caches before executing a query under test:
- Clear Oracle's buffer_cache:
alter system flush buffer_cache;
- Clear Oracle's shared_pool:
alter system flush shared_pool;
- Clear disk cache (e.g. for CentOS):
echo 3 > /proc/sys/vm/drop_caches
The buffer_cache is part of Oracle's SGA (System Global Area) and buffers blocks read from disk to minimize disk IO. The shared_pool is another part of the SGA and is used for caching the library, dictionary and session information. The OS disk cache is used by the OS to buffer blocks read from disk.
The disadvantage of this approach is flushing the dictionary, part of Oracle's shared_pool. Reading the dictionary of a table can take a long time, especially when a table contains many (sub)partitions. Comparing query times between differently partitioned tables may give unrealistic results. One may assume dictionaries of frequently used tables will be loaded in the dictionary cache almost permanently.
Realistic worst case testing
We had this test setup: We optimized a query using Oracle's Explain Plan. The Explain Plan predicted the improved version of the query would perform better. But using the worse case scenario test environment, as described in the previous paragraph, the optimized query actually performed worse then the original. After generating a trace file, we saw the actual query time was better for the optimized query. But we had altered the partitioning layout of the table for the optimized query, and it caused an extra delay caused by reading the dictionary.
This time we want to have the dictionary already cached, before running the query under test (see list above for the exact commands):
- Clear Oracle's buffer_cache
- Clear Oracle's shared_pool
- Clear os disk cache
- Run warmup queries
- Clear buffer_cache
- Clear os disk cache
A warmup query should be a query selecting from the same tables as the query under test, but with a different Execution Plan (i.e. where clause), and returning an empty result set. For example, when you have a partitioned table, use a where clause that does not match any partition. This will load the dictionary, but not any index or rows for scanning. By not clearing the shared_pool after the warmup queries have run, the cached dictionary is still available when executing the query under test.
Now we got results that match our expectations and can be linked to explain plans. We finally can prove the optimized query gives better performance.
So it is possible to get consistent performance test results with Oracle!