What is the difference between union and union all in sql server? Is this related to joins(I can specify full join). Thank you all.
What is the difference between union and union all in sql server? Is this related to joins(I can specify full join). Thank you all.
I find the best way of explaining this is in terms of the execution plan.
If you look at the operator that is used to combine the two sets for a UNION ALL, you'll see it's a Concatenation operator, combining two sets of data without any further consideration of whether rows have already been seen, etc.
For a UNION, you can see a number of operations in play. You may see a Distinct Sort as it makes sure it doesn't return the same row twice. You may see a Full Outer Join, performed as either a Merge Join or a Hash Match. Or if each set involved in the UNION can be seen by the system as having no duplicates and no matches with the other set, you may find that it's performed in exactly the same way as a UNION ALL.
Almost every time you ask yourself what's different between two ways of writing a query, look at the differences in the execution plan - and remember that the differences you see are actually down to other aspects of the data.
Which one is better? Generally UNION ALL, but you need to focus on what your query needs to do and make sure you're asking the system the right question.
The main difference is UNION ALL will include all rows from the multiple recordsets being UNION'd together... Including any duplicates. BOL has a full description at http://msdn.microsoft.com/en-us/library/ms180026.aspx
We use union and union all to combine result sets which return the same number of columns with same data types. And no, it is not same as full outer join.
Lets take an example with one column.
Table1
Col1
1
2
3
4
Table2
Col2
3
4
5
6
Now, when I write
SELECT Col1 FROM Table1
UNION
SELECT Col2 FROM Table2
That would give me 1,2,3,4,5,6 - So it has removed the duplicates.
If I do
SELECT Col1 FROM Table1
UNION ALL
SELECT Col2 FROM Table2
It would return me 1,2,3,4,3,4,5,6 - It has got the duplicates with it.
Hope this help.
UNION is for joining 2 result sets, JOIN is for joining 2 tables, views, table-valued functions, table expressions and the like.
UNION ALL simply joins 2 result sets including all the rows. UNION joins 2 result sets and removes duplicate rows. UNION by itself incurs the overhead of a merge join so that it can remove these dupes. It also has a side affect of ordering the rows. If you do not care about dupes UNION ALL is more performant
Queue's answer covers the difference between UNION and UNION ALL well.
You should also be aware of EXCEPT and INTERSECT which are two further horizontal concatenation operators.
In terms of the difference between JOIN and UNION - think of one in terms of vertical concatenation, and think of the other in terms of horizontal concatenation. Each of them can return less rows than were present in the source (with the exception that join can repeat on the one side when there are multiples on the other), and each of them take the dimensions of your result set and extend them.
That's how I usually explain them, anyhow.
To illustrate the many plan possibilities for UNION and UNION ALL:
-- UNION implemented using a Concatenation
SELECT data FROM (VALUES (1), (2)) A (data)
UNION
SELECT data FROM (VALUES (3), (4)) B (data);
CREATE TABLE #A (data INT NOT NULL);
CREATE TABLE #B (data INT NOT NULL);
INSERT #A VALUES (1),(1), (3), (2), (2);
INSERT #B VALUES (1),(2), (2), (1), (4);
-- Concatenate then Distinct Sort on the result
SELECT data FROM #A
UNION
SELECT data FROM #B OPTION (CONCAT UNION, ORDER GROUP);
-- Concatenate then Hash Distinct on the result
SELECT data FROM #A
UNION
SELECT data FROM #B OPTION (CONCAT UNION, HASH GROUP);
-- Sort Distinct on the inputs, then Merge Union
SELECT data FROM #A
UNION
SELECT data FROM #B OPTION (MERGE UNION, ORDER GROUP);
-- Hash Distinct and Sort on the inputs to a Merge Union
SELECT data FROM #A
UNION
SELECT data FROM #B OPTION (MERGE UNION, HASH GROUP);
-- Sort Distinct on the probe input to a Hash Union
SELECT data FROM #A
UNION
SELECT data FROM #B OPTION (HASH UNION, ORDER GROUP);
-- Hash Distinct on the probe input to a Hash Union
SELECT data FROM #A
UNION
SELECT data FROM #B OPTION (HASH UNION, HASH GROUP);
-- UNION ALL performed by concatenation
SELECT data FROM #A
UNION ALL
SELECT data FROM #B;
-- Add indexes to provide sorted input (optimiser now has more options)
CREATE INDEX nc1 ON #A (data);
CREATE INDEX nc1 ON #B (data);
-- UNION ALL performed by Merge Concatenation
SELECT data FROM #A
UNION ALL
SELECT data FROM #B ORDER BY data;
-- Stream Aggregate Distinct on the inputs to a Merge Union
SELECT data FROM #A
UNION
SELECT data FROM #B;
-- Stream Aggregate Distinct on the inputs to a Merge Concatenation, then another Stream Aggregate Distinct
SELECT data FROM #A
UNION
SELECT data FROM #B OPTION (CONCAT UNION, ORDER GROUP);
DROP TABLE #A, #B;
SQL is not a truly relational language and one of its non-relational features is that SQL tables and queries can contain duplicate rows. The need to support duplicates complicates the language and its implementations very significantly. This is arguably SQL's most serious flaw (at least that's what Ted Codd thought).
SQL supports ALL and DISTINCT versions of some operations. The ALL keyword means duplicate rows are included while the DISTINCT keyword eliminates duplicates. To make matters worse, although both keywords are optional the behaviour when neither one is specified is not consistent. In the SELECT clause ALL is the default but with the UNION operator DISTINCT is the default.
2 People are following this question.