Joins are statements that retrieve data from more than one table. A join is characterized by multiple tables in the FROM clause, and the relationship between the tables is defined through the existence of a join condition in the WHERE clause. In a join, one row set is called inner, and the other is called outer.

How the Query Optimizer Executes Join Statements

To choose an execution plan for a join statement, the optimizer must make these interrelated decisions:

Access Paths:

As for simple statements, the optimizer must choose an access path to retrieve data from each table in the join statement.

 Join Method :

To join each pair of row sources, Oracle must perform a join operation. Join methods include nested loop, sort merge, cartesian, and hash joins.

 Join Order:

To execute a statement that joins more than two tables, Oracle joins two of the tables and then joins the resulting row source to the next table. This process is continued until all tables are joined into the result.

How the Query Optimizer Chooses Execution Plans for Joins

The query optimizer considers the following when choosing an execution plan:

  • Optimizer places tables first in the join order those have UNIQUE or PRIMARY KEY constraints and return at most one row. The optimizer then optimizes the join of the remaining set of tables.

  • The table with the outer join operator must come after the other table in the condition in the join order (optimizer does not consider that violate this rule)

The optimizer generates a set of execution plans, according to possible join orders, join methods, and available access paths. The optimizer then estimates the cost of each plan and chooses the one with the lowest cost. The optimizer estimates costs in the following ways:

  • The cost of a nested loops operation is based on the cost of reading each selected row of the outer table and each of its matching rows of the inner table into memory using statistics in data dictionary.

  • The cost of a sort merge join is based largely on the cost of reading all the sources into memory and sorting them.

  • he cost of a hash join is based largely on the cost of building a hash table on one of the input sides to the join and using the rows from the other of the join to probe it.

The optimizer also considers other factors when determining the cost of each operation. For example:

  • A smaller sort area size is likely to increase the cost for a sort merge join because sorting takes more CPU time and I/O in a smaller sort area.

  • A larger multiblock read count is likely to decrease the cost for a sort merge join in relation to a nested loop join. If a large number of sequential blocks can be read from disk in a single I/O, then an index on the inner table for the nested loop join is less likely to improve performance over a full table scan. The multiblock read count is specified by the initialization parameterDB_FILE_MULTIBLOCK_READ_COUNT.

With the query optimizer, the optimizer's choice of join orders can be overridden with the ORDERED hint. If the ORDERED hint specifies a join order that violates the rule for an outer join, then the optimizer ignores the hint and chooses the order. Also, you can override the optimizer's choice of join method with hints.

Nested Loop Joins

Nested loop joins are useful when small subsets of data are being joined and if the join condition is an efficient way of accessing the second table.

First optimizer returns all the rows from row source 1 and Then optimizer probe row source 2 once for each row returned from row source

The optimizer determines the driving table and designates it as the outer table.

The other table is designated as the inner table.

  1. The optimizer determines the driving table and designates it as the outer table.
  2. The other table is designated as the inner table.
  3. For every row in the outer table, Oracle accesses all the rows in the inner table. The outer loop is for every row in outer table and the inner loop is for every row in the inner table. The outer loop appears before the inner loop in the execution plan, as follows:
NESTED LOOPS     outer_loop     inner_loop  

Row source 1
Row 1 -------------- -- Probe -> Row source 2
Row 2 -------------- -- Probe -> Row source 2
Row 3 -------------- -- Probe -> Row source 2
Row source 1 IS known AS the OUTER TABLE
Row source 2 IS known AS the INNER TABLE

For each relevant row in the first table (driving table), find all matching rows in the other table (probed table).

Accessing row source 2 is known a probing the inner table For nested loops to be efficient it is important that the first row source returns as few rows as possible as this directly controls the number of probes of the second row source. Also it helps if the access method for row source 2 is efficient as this operation is being repeated once for every row returned by row source 1.

When the Optimizer Uses : When joining small number of rows, with a good driving condition between the two tables.

Nested Loop Join Hints : USE_NL(table1 table2)

Nesting Nested Loops : The outer loop of a nested loop can be a nested loop itself. You can nest two or more outer loops together to join as many tables as needed.

Hash Joins

Hash joins are used for joining large data sets. The optimizer uses the smaller of two tables or data sources to build a hash table on the join key in memory. It then scans the larger table, probing the hash table to find the joined rows.

This method is best used when the smaller table fits in available memory. The cost is then limited to a single read pass over the data for the two tables.

When the Optimizer uses: The optimizer uses a hash join to join two tables if they are joined using an equijoin and if either of the following conditions are true:

  • A large amount of data needs to be joined.
  • A large fraction of a small table needs to be joined.

Hash Join Hints: USE_HASH

Sort Merge Joins

Rows are produced by Row Source 1 and are then sorted Rows from Row Source 2 are then both sides are merged together (joined)

/ \
| |
Row Source 1 Row Source 2

Sort merge joins can be used to join rows from two independent sources. Hash joins generally perform better than sort merge joins. On the other hand, sort merge joins can perform better than hash joins if both of the following conditions exist:

  • The row sources are sorted already.
  • A sort operation does not have to be done.

However, if a sort merge join involves choosing a slower access method (an index scan as opposed to a full table scan), then the benefit of using a sort merge might be lost.

Sort merge joins are useful when the join condition between two tables is an inequality condition (but not a nonequality) like <, <=, >, or >=. Sort merge joins perform better than nested loop joins for large data sets. You cannot use hash joins unless there is an equality condition.

In a merge join, there is no concept of a driving table. The join consists of two steps:

  1. Sort join operation: Both the inputs are sorted on the join key.
  2. Merge join operation: The sorted lists are merged together.

If the input is already sorted by the join column, then a sort join operation is not performed for that row source.

When the Optimizer Uses : The optimizer can choose a sort merge join over a hash join for joining large amounts of data if any of the following conditions are true:

  • The join condition between two tables is not an equi-join.
  • Because of sorts already required by other operations, the optimizer finds it is cheaper to use a sort merge than a hash join.

Sort Merge Join Hints : USE_MERGE

Cartesian Joins

A Cartesian join is used when one or more of the tables does not have any join conditions to any other tables in the statement. The optimizer joins every row from one data source with every row from the other data source, creating the Cartesian product of the two sets.

When the Optimizer Uses : The optimizer uses Cartesian joins when it is asked to join two tables with no join conditions.

Cartesian Join Hints : ORDERED

Outer Joins

An outer join extends the result of a simple join. An outer join returns all rows that satisfy the join condition and also returns some or all of those rows from one table for which no rows from the other satisfy the join condition.

Nested Loop Outer Joins

This operation is used when an outer join is used between two tables. The outer join returns the outer (preserved) table rows, even when there are no corresponding rows in the inner (optional) table.

In a regular outer join, the optimizer chooses the order of tables (driving and driven) based on the cost. However, in a nested loop outer join, the order of tables is determined by the join condition. The outer table, with rows that are being preserved, is used to drive to the inner table.

The optimizer uses nested loop joins to process an outer join in the following circumstances:

  • It is possible to drive from the outer table to inner table.

  • Data volume is low enough to make the nested loop method efficient.

For an example of a nested loop outer join, you can add the USE_NL hint to Example 13-8 to instruct the optimizer to use a nested loop. For example:

SELECT /*+ USE_NL(c o) */ cust_last_name,sum(nvl2(o.customer_id,0,1)) "Count"

Hash Join Outer Joins

The optimizer uses hash joins for processing an outer join if the data volume is high enough to make the hash join method efficient or if it is not possible to drive from the outer table to inner table.

The order of tables is determined by the cost. The outer table, including preserved rows, may be used to build the hash table, or it may be used to probe one.

Example 13-8 shows a typical hash join outer join query. In this example, all the customers with credit limits greater than 1000 are queried. An outer join is needed so that you do not miss the customers who do not have any orders.

Hash Join Outer Joins

SELECT cust_last_name, sum(nvl2(o.customer_id,0,1)) "Count"    
FROM customers c, orders o
WHERE c.credit_limit > 1000
AND c.customer_id = o.customer_id(+)
GROUP BY cust_last_name;
| Id | Operation | Name |Rows | Bytes | Cost (%CPU)|
| 0 | SELECT STATEMENT | | 168 | 3192 | 6 (17)|
| 1 | HASH GROUP BY | | 168 | 3192 | 6 (17)|
|* 2 | NESTED LOOPS OUTER | | 260 | 4940 | 5 (0) |
|* 3 | TABLE ACCESS FULL | CUSTOMERS | 260 | 3900 | 5 (0) |
|* 4 | INDEX RANGE SCAN | ORD_CUSTOMER_IX | 105 | 420 | 0 (0) |
Predicate Information (IDENTIFIED BY operation id):
3 - filter("C"."CREDIT_LIMIT">1000)
4 - access("C"."CUSTOMER_ID"="0"."CUSTOMER_ID"(+))

The query looks for customers which satisfy various conditions. An outer join returns NULL for the inner table columns along with the outer (preserved) table rows when it does not find any corresponding rows in the inner table. This operation finds all the customers rows that do not have any orders rows.

In this case, the outer join condition is the following:

customers.customer_id = orders.customer_id(+)

The components of this condition represent the following:

  • The outer table is customers.

  • The inner table is orders.

  • The join preserves the customers rows, including those rows without a corresponding row in orders.

You could use a NOT EXISTS subquery to return the rows. However, because you are querying all the rows in the table, the hash join performs better (unless the NOT EXISTS subquery is not nested).

In Example 13-9, the outer join is to a multitable view. The optimizer cannot drive into the view like in a normal join or push the predicates, so it builds the entire row set of the view.

Outer Join to a Multitable View

SELECT c.cust_last_name, sum(revenue)    
FROM customers c, v_orders o WHERE c.credit_limit > 2000
AND o.customer_id(+) = c.customer_id
GROUP BY c.cust_last_name;
| Id | Operation | Name |Rows | Bytes | Cost (%CPU)|
| 0 | SELECT STATEMENT | | 144 | 4608 | 16 (32)|
| 1 | HASH GROUP BY | | 144 | 4608 | 16 (32)|
|* 2 | HASH JOIN OUTER | | 663 | 21216 | 15 (27)|
|* 3 | TABLE ACCESS FULL | CUSTOMERS | 195 | 2925 | 6 (17)|
| 4 | VIEW | V_ORDERS | 665 | 11305 | |
| 5 | HASH GROUP BY | | 665 | 15960 | 9 (34)|
|* 6 | HASH JOIN | | 665 | 15960 | 8 (25)|
|* 7 | TABLE ACCESS FULL| ORDERS | 105 | 840 | 4 (25)|
| 8 | TABLE ACCESS FULL| ORDER_ITEMS | 665 | 10640 | 4 (25)|
Predicate Information (IDENTIFIED BY operation id):
2 - access("O"."CUSTOMER_ID"(+)="C"."CUSTOMER_ID")
3 - filter("C"."CREDIT_LIMIT">2000)
6 - access("O"."ORDER_ID"="L"."ORDER_ID")
7 - filter("O"."CUSTOMER_ID">0)

The VIEW definition IS AS follows:
CREATE OR REPLACE VIEW v_orders AS SELECT l.product_id,
SUM(l.quantity*unit_price) revenue,
o.order_id, o.customer_id
FROM orders o, order_items l
WHERE o.order_id = l.order_id
GROUP BY l.product_id, o.order_id, o.customer_id;Sort Merge OUTER Joins

When an outer join cannot drive from the outer (preserved) table to the inner (optional) table, it cannot use a hash join or nested loop joins. Then it uses the sort merge outer join for performing the join operation.

The optimizer uses sort merge for an outer join:

  • If a nested loop join is inefficient. A nested loop join can be inefficient because of data volumes.

  • The optimizer finds it is cheaper to use a sort merge over a hash join because of sorts already required by other operations.

Full Outer Joins

A full outer join acts like a combination of the left and right outer joins. In addition to the inner join, rows from both tables that have not been returned in the result of the inner join are preserved and extended with nulls. In other words, full outer joins let you join tables together, yet still show rows that do not have corresponding rows in the joined tables.

The query in Example 13-10 retrieves all departments and all employees in each department, but also includes:

  • Any employees without departments

  • Any departments without employees

Full Outer Join

SELECT d.department_id, e.employee_id
FROM employees e
FULL OUTER JOIN departments d ON
e.department_id = d.department_id
ORDER BY d.department_id;
The statement produces the following output:

------------- -----------
10 200
20 201
20 202
30 114
30 115
30 116
125 rows selected

When a SQL query is submitted to an Oracle database, Oracle query optimizer decide how to access the data. The process of making this decision is called query optimization, Oracle looks for the optimal way to retrieve the data, using the execution path.

Go to top
JSN Boot template designed by