SQL JOIN
I received a call from a developer that was concerned about how long his query was running. Looking at the query explain plan, existing indexes, how the query was written, etc, I concluded that it would just take that long. It was doing a LOT of IO by the nature of what he had written.
A few hours later, I received a follow-up call. He had figured out the issue. He changed OUTER JOINs to INNER JOINs, and it ran faster. It was obvious that he had significantly reduced the storage IO, which generally means that the query runs faster, but he did not seem to realize that he had completely changed the query and had different results.
So, let’s take a look at the several types of JOINs in SQL and how they change the results. I’ll use MariaDB V10.2 do generate examples, and Venn diagrams to help visualize the results. I’ll also add quick notes to identify simple set theory (ie, simple AND/OR/NOT/XOR).
My first goals is to give you a clear and concise representation of each of the ways you can JOIN tables. I’ve included the table definitions and INSERT statements so you can reproduce my results.
The Venn diagram shows that there are 3 areas that can be shaded or not shaded (ie, 2 states). 2 raised to the 3rd power is 8, so there are 8 ways represent the Venn diagram. My second goal is to show you all 8, then talk about some other results you may experience.
Tables I’ll be using
The CREATE TABLE for the two tables used to provide sample data for the below 8 joins are listed here. The CREATE DATABASE and USE are included for completeness. L_table is the table on the left, and R_table is the table on the right of the join. Ie, L_table is mentioned first in the SQL statement containing the JOIN, and R_table is mentioned second.
— a place to put the example tables below
CREATE DATABASE db_sample;
USE db_sample;
— create two tables to use for the 8 possible results below
CREATE TABLE L_table (L_id TINYINT, L_text VARCHAR(6));
CREATE TABLE R_table (R_id TINYINT, R_text VARCHAR(6));
— insert data for the simplest example
INSERT INTO L_table VALUES (1,’L only’);
INSERT INTO L_table VALUES (2,’L + R’);
INSERT INTO R_table VALUES (2,’L + R’);
INSERT INTO R_table VALUES (3,’R only’);
— verify
SELECT * FROM L_table;
L_id | L_text
—–+——-
1 | L only
2 | L + R
SELECT * FROM R_table;
R_id | R_text
—–+——-
2 | L + R
3 | R only
INNER JOIN
— inner join
SELECT *
FROM L_table
INNER JOIN R_table ON L_id=R_ID
L_id | L_text | R_id | R_text
—–+——–+——+——-
2 | L + R | 2 | L + R
In set theory: LEFT AND RIGHT
Returns those rows from LEFT and RIGHT that match the join criteria.
The INNER JOIN is the most common JOIN. When referring to JOINs, this is what comes to mind. It’s so common that some engines allow you to drop the term “INNER” in favor of just “JOIN”. Other JOINS (OUTER JOINS) return NULL for a key value, which is the definition of an OUTER JOIN. More specifically…
OUTER JOINs return rows for the designated table (ie, left table as in LEFT OUTER JOIN) with null values for the non-existent values from the other table, AND the rows that match the JOIN condition.
LEFT [OUTER] JOIN
— left outer join
SELECT *
FROM L_table
LEFT JOIN R_table ON L_id=R_ID
;
L_id | L_text | R_id | R_text
—–+——–+——+——-
2 | L + R | 2 | L + R
1 | L only | NULL | NULL
In set theory: LEFT
Returns those rows from LEFT and RIGHT that match the join criteria. Also returns the values that did not match the join criteria from the LEFT with the RIGHT’s values set to NULL.
Note: SQL, unless there is an ORDER BY clause, returns rows in an undetermined order.
RIGHT [OUTER] JOIN
— right join
SELECT *
FROM L_table
RIGHT JOIN R_table ON L_id=R_ID
;
L_id | L_text | R_id | R_text
—–+——–+——+——-
2 | L + R | 2 | L + R
NULL | NULL | 3 | R only
In set theory: RIGHT
Returns those rows from LEFT and RIGHT that match the join criteria. Also returns the values that did not match the join criteria from the RIGHT with the LEFT’s values set to NULL.
LEFT EXCLUSIVE/INNER JOIN
— left exclusive/inner join
SELECT *
FROM L_table
LEFT JOIN R_table ON L_id=R_ID
WHERE L_id IS NULL OR R_id IS NULL
;
L_id | L_text | R_id | R_text
—–+——–+——+——-
1 | L only | NULL | NULL
In set theory: LEFT AND NOT RIGHT
Returns only the non-matching values from the LEFT with the RIGHT’s values set to NULL.
For the “EXCLUSIVE/INNER JOINs”, notice that my WHERE clause is generic in the sense that it contains an “OR” which is not always required, but it’s the same WHERE clause throughout the examples.
RIGHT EXCLUSIVE/INNER JOIN
— right exclusive/inner join
SELECT *
FROM L_table
RIGHT JOIN R_table ON L_id=R_ID
WHERE L_id IS NULL OR R_id IS NULL
;
L_id | L_text | R_id | R_text
—–+——–+——+——-
NULL | NULL | 3 | R only
In set theory: RIGHT AND NOT LEFT
Returns only the values that did not match the join criteria from the RIGHT with the LEFT’s values set to NULL.
FULL [OUTER] JOIN
— full [outer] join = left outer join + right exclusive/inner join
(
SELECT *
FROM L_table
LEFT JOIN R_table ON L_id=R_ID
)
UNION
(
SELECT *
FROM L_table
RIGHT JOIN R_table ON L_id=R_ID
WHERE L_id IS NULL OR R_id IS NULL
)
;
L_id | L_text | R_id | R_text
—–+——–+——+——-
2 | L + R | 2 | L + R
1 | L only | NULL | NULL
NULL | NULL | 3 | R only
In set theory: LEFT OR RIGHT
Returns those rows from LEFT and RIGHT that match the join criteria. Also returns the values that did not match the join criteria from the LEFT with the RIGHT’s values set to NULL. Also returns the values that did not match the join criteria from the RIGHT with the LEFT’s values set to NULL.
Some database engines will allow the “FULL OUTER JOIN tbl ON condition” syntax. Some use “(+)” just after the left or right table. The above syntax is generic.
OUTER EXCLUSIVE/INNER JOIN
— full exclusive/inner join = left exclusive/inner join + right exclusive/inner join
(SELECT *
FROM L_table
LEFT JOIN R_table ON L_id=R_ID
WHERE L_id IS NULL OR R_id IS NULL
)
UNION
(
SELECT *
FROM L_table
RIGHT JOIN R_table ON L_id=R_ID
WHERE L_id IS NULL OR R_id IS NULL
)
L_id | L_text | R_id | R_text
—–+——–+——+——-
1 | L only | NULL | NULL
NULL | NULL | 3 | R only
In set theory: LEFT XOR RIGHT
Returns the values that did not match the join criteria from the LEFT with the RIGHT’s values set to NULL. Also returns the values that did not match the join criteria from the RIGHT with the LEFT’s values set to NULL.
NO JOIN
This JOIN serves no purpose other than to complete the example with 8 Venn diagrams.
STRAIGHT JOIN
Some database engines like MariaDB have a “STRAIGHT JOIN” clause. This is simply an INNER JOIN that evaluates the tables in the order they are listed in the SQL, rather than in the order the cost based optimizer determines is most efficient.
This JOIN is used when the cost based optimizer needs a little help in determining the order to evaluate tables.
I’d recommend never using this clause in favor of only using INNER JOINS. If you’re having issues with performance, focus on making sure the cost based optimizer has good information. For example, insure statistics are not out of date, estimated row counts are close to reality, etc. With MariaDB default file storage, InnoDB, ANALYZE may be a good thing as long as you’ve accounted for LOCKS.
STRAIGHT JOINs are a form of optimizer hint. If you do use them, keep in mind that optimizer hints are a maintenance nightmare! Remember your data will change over time and your optimizer hints will become outdated.
CROSS JOIN
SELECT *
FROM L_table, R_table;
L_id | L_text | R_id | R_text
—–+——–+——+——-
1 | L only | 2 | L + R
2 | L + R | 2 | L + R
1 | L only | 3 | R only
2 | L + R | 3 | R only
In set theory: LEFT * RIGHT
Returns each row of the LEFT table is joined to each row of the RIGHT table. There are no NULLs added.
A CROSS JOIN is sometimes referred to as a CARTESIAN PRODUCT. Results grow exponentially with the size the two tables. The number or rows returned is the number of rows in the LEFT table times the number of rows in the RIGHT table.
CROSS JOINs are almost always an error. The only legitimate use I can think of is to generate a Cross Tabulation Report.
The SQL Standard
SQL Standard 2006 SQL/Foundation 7.7
THE MAKING OF THE VENN DIAGRAMS
I googled: venn diagram templates, to find a couple of connected circles to use as a template. I then used sketch.io to fill the red portions of each diagram.
Troy Frericks.
blog 31-Dec-2018
=
Copyright 2015-2018 by Troy Frericks, http://dba.frericks.us/.
#