The query optimizer can choose between two goals: "Best Throughput" or "Best Response Time". Best Throughput is the default.
Best throughput means that the optimizer uses the least amount of resources necessary to process all rows accessed by the statement. This is the default goal for the query optimizer.
It is best for batch applications (for example, reporting applications), because the user is concerned with the time necessary for the application to complete and does not examine the results of individual statements while the application is running.
Best response time:
The optimizer uses the least amount of resources necessary to process the first row accessed by a SQL statement. It is best for interactive applications (for example, Forms applications or SQL*Plus queries), because the interactive user is waiting to see the first row or first few rows accessed by the statement.
Factors That Affect Query Optimizer Goals
The optimizer's behavior and goal for a SQL statement are affected by the following factors:
OPTIMIZER_MODE Initialization Parameter
The OPTIMIZER_MODE initialization parameter selects among several modes for query optimization:
RULE: The rule-based optimizer is used.
CHOOSE: Causes the optimizer to choose between the rule-based and cost-based approaches.
Rule-based = no statistics are available for any table in the query.
Cost-based = statistics are available for some tables in the query; statistics for the missing tables are estimated by the optimizer.
ALL_ROWS: The optimizer uses a cost-based approach with a goal of best throughput.
Note: The presence of statistics is not required.
Note: Most appropriate for data warehousing queries.
FIRST_ROWS_n: The optimizer uses a cost-based approach with a goal of best response time to return the first n rows; n can equal 1, 10, 100, or 1000.
Note: The presence of statistics is not required.
Note: This mode is most appropriate for OLTP-type queries.
FIRST_ROWS: The optimizer uses a mix of costs and heuristics (rules of thumb) to find a best plan for fast delivery of the first few rows.
Note: FIRST_ROWS is available for backward compatibility and plan stability; use FIRST_ROWS_n instead.
ALTER SESSION SET optimizer_mode = first_rows_1; --current session
ALTER SYSTEM SET optimizer_mode = first_rows_1; --instance level
If the optimizer uses the cost-based approach and the tables in a statement have no statistics, the optimizer uses internal information (such as the number of data blocks allocated to those tables) to estimate the missing statistics.
It is recommended that you always use the cost-based optimizer. It is better in theory and in practice: features such as materialized views are considered by the cost-based optimizer, whereas the rule-based optimizer does not recognize them.
Optimizer SQL Hints for Changing the Query Optimizer Goal
Hints can be used to direct the Oracle query optimizer to use a specific optimization technique for a query, and they can override the OPTIMIZER_MODE initialization parameter for that SQL statement.
Hints are embedded as comments in a SQL query, with the following syntax:
sql_action /*+ hint */
sql_action --+ hint
SELECT /*+ ALL_ROWS */ empno, ename
Cost-based approach with a goal of best throughput, regardless of the presence of statistics.
SELECT /*+ FIRST_ROWS_1 */ empno, ename
Cost-based approach with a goal of best response time, regardless of the presence of statistics.
If a hint is incorrect or invalid, Oracle ignores the hint without causing an error.
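As an illustration, a complete hinted statement might look like the following (the EMP table and its columns are assumed here, following the demo schema used in the earlier examples):

```sql
-- The hint applies only to this statement and overrides the session's
-- OPTIMIZER_MODE: optimize for fast delivery of the first 10 rows.
SELECT /*+ FIRST_ROWS_10 */ empno, ename, sal
FROM   emp
WHERE  deptno = 20
ORDER BY sal DESC;
```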
Query Optimizer Statistics in the Data Dictionary
The statistics used by the query optimizer are stored in the data dictionary. You can collect them by using the DBMS_STATS package. Accurate statistics enable the query optimizer to choose the best execution plan.
If no statistics are available, the optimizer performs dynamic sampling, which may cause slower parse times. Dynamic sampling is controlled by the initialization parameter OPTIMIZER_DYNAMIC_SAMPLING.
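For example, dynamic sampling can be adjusted at the session level. The level shown here is illustrative; 0 disables dynamic sampling, and higher levels sample more blocks:

```sql
-- Levels range from 0 (off) upward; level 2 is a common setting
-- that samples tables which have not been analyzed.
ALTER SESSION SET optimizer_dynamic_sampling = 2;
```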
The types of statistics used by the cost-based optimizer are listed below.
Entity: Types of statistics
Table: Number of rows, blocks, and unused blocks
Average available free space per block
Number of chained rows
Average row length
Remote average row length
Column: Number of distinct values per column (cardinality)
Second-lowest column value
Second-highest column value
Column density factor
Number of NULLs for the column
Data distribution factor
Index: Depth of the index B*-tree structure
Number of leaf blocks
Number of distinct values
Average number of leaf blocks per key
Average number of data blocks per key
System: I/O performance and utilization (new in Oracle9i)
CPU performance and utilization (new in Oracle9i)
You should collect statistics periodically. Prior to Oracle8i, the ANALYZE command was used to collect statistics; Oracle8i introduced the DBMS_STATS package.
You can collect statistics for the complete database or for particular objects, either manually or automatically, for example after creating a new index or performing a data load. Updating statistics should be part of a general maintenance routine.
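A minimal sketch of manual collection with DBMS_STATS follows; the SCOTT schema and EMP table names are illustrative:

```sql
-- Gather statistics for one table, including its indexes (CASCADE => TRUE).
EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname => 'SCOTT', tabname => 'EMP', cascade => TRUE);

-- Gather statistics for every object in a schema.
EXEC DBMS_STATS.GATHER_SCHEMA_STATS(ownname => 'SCOTT', cascade => TRUE);
```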
If you feel that the cost-based optimizer is working appropriately and you do not want new statistics to change execution plans, you can save the existing statistics with the DBMS_STATS.EXPORT_SCHEMA_STATS procedure before you update them.
If the new set of statistics does not deliver the desired performance, you can use the DBMS_STATS.IMPORT_SCHEMA_STATS procedure to reimport the saved statistics. You can also store multiple versions of statistics in a statistics table that you specify.
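The save-and-restore cycle described above might be sketched as follows; the SCOTT schema and the STATS_BACKUP statistics-table name are illustrative:

```sql
-- Create a user statistics table to hold exported statistics.
EXEC DBMS_STATS.CREATE_STAT_TABLE(ownname => 'SCOTT', stattab => 'STATS_BACKUP');

-- Save the current schema statistics before regathering.
EXEC DBMS_STATS.EXPORT_SCHEMA_STATS(ownname => 'SCOTT', stattab => 'STATS_BACKUP');

-- ... gather new statistics and test the resulting execution plans ...

-- If the new statistics perform worse, restore the saved set.
EXEC DBMS_STATS.IMPORT_SCHEMA_STATS(ownname => 'SCOTT', stattab => 'STATS_BACKUP');
```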
The CBO assumes that the values for a column are evenly distributed. For instance, if there are two distinct values in a column, the cost-based optimizer assumes that each value applies to 50% of the entries in the column. This assumption can be incorrect for columns with extreme data value skew and can result in the optimizer making the wrong choice for an execution plan.
Table columns that contain values with large variations in the number of duplicates are said to hold skewed data. You can create histograms on such columns to avoid this potential problem. Introduced with Oracle8i, histograms give the optimizer a more detailed view of the distribution of data values in a column. You can create a histogram with procedures in the DBMS_STATS package.
Histograms require some overhead, so you should not use them for all columns by default, but they can help to improve the accuracy of execution plans involving columns with low selectivity.
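A histogram can be requested through the METHOD_OPT argument of DBMS_STATS. In this sketch, the SCOTT.EMP table and SAL column are illustrative; SIZE sets the maximum number of histogram buckets:

```sql
-- Gather table statistics plus a histogram of up to 10 buckets on SAL.
EXEC DBMS_STATS.GATHER_TABLE_STATS( -
  ownname    => 'SCOTT', -
  tabname    => 'EMP', -
  method_opt => 'FOR COLUMNS SIZE 10 sal');
```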