twingirl avatar image
twingirl asked

Union CTE's with different # of Columns

I'm trying to do a union of two CTE's. The first CTE has 7 columns and the other one has only 6 columns. I tried to pad it, but I invalid column name and not a defined system type.

This is what I did convert(dest, varchar, null)

Dest is the column in the first CTE that isn't in the 2nd CTE.

10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Oleg avatar image
Oleg answered

There are a couple of eproblems with your attempt.

If first CTE has 7 columns, and the second - only 6, then you probably should not use the UNION but opt for UNION ALL instead. This is because you would probably want the seventh column from the first CTE to still appear in the results, opting to use NULL to "fill in the blanks" from the rows of the second CTE which does not have such column. This implies that there is no reason whatever to torture the engine by using the UNION when UNION ALL makes perfect sense.

Also, if the dest column is present in the first CTE and does not exist in the second then I am not sure why you would want to convert it to varchar. Even if you do for some unknown reason, the syntax should be convert(varchar, dest), not convert(dest, varchar). In the list of select columns from the second CTE you should just opt for NULL, and that is all.

Here is the snippet which shows the idea:

;with firstCTE as (
        Col1, Col2, Col3, Col4, Col5, Col6, dest
        from someTable
secondSTE as (
        ColA, ColB, ColC, ColD, ColE, ColF
        from someOtherTable
        Col1, Col2, Col3, Col4, Col5, Col6, dest
        from firstCTE
    union all
        ColA, ColB, ColC, ColD, ColE, ColF, null
        from secondCTE;

Something like this will work if the data types for the first 6 columns do match or at least are compatible. The names of the columns in the final select will be driven by the list from the first CTE, so in this example, the column names of the final result will be Col1, Col2, Col3, Col4, Col5, Col6, and dest.

Hope this helps.


10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

HanShi avatar image
HanShi answered

The remark about the UNION / UNION ALL in the first paragraph of the answer of @Oleg is wrong It has nothing to do with the number of columns in the query. When using UNION ALL it will show all ROWS from the result including any duplicates. When using UNION it remove duplicate rows from the final result, thus only showing unique rows.

When using UNION or UNION ALL (and also the operators like EXCEPT, INTERSECT, etc.) the number of columns of all SELECT statements must match and the datatype of each column must be (implicitly) the same. You can use the technique described by @Oleg and shown in his code snippet to add additional "dummy" columns to the second (and optional third, fourth, etc.) SELECT statement to match the number of the first SELECT.

10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Kev Riley avatar image Kev Riley ♦♦ commented ·

@Oleg never mentioned anything about UNION/UNION ALL being about the number of columns, he explained that the 2 distinct data sets will be different (due to the derived 7th column), so there's little point in having SQL waste performance on de-duping the data.

0 Likes 0 ·
Show more comments
Oleg avatar image Oleg commented ·

@HanShi I am not sure what possessed you to assess that the remark about union / union all in my answer is wrong. If the first CTE has more columns then it is necessary to add some dummy column to the select list from the second CTE so that the number of columns is the same for combining the rows to work. In this case, assuming that first CTE has some values in that extra column, using the UNION instead of UNION ALL is a pretty silly idea as it creates an extra overhead without changing the result. This is because there is nothing but NULL in all rows from the second CTE for that extra column. The UNION has one extra step when compared with UNION ALL, namely it removes the duplicates. If there is no possibility for any duplicates to be removed then why would anyone in the right mind consider it? The UNION is somewhat akin to UNION ALL to which implicit GROUP BY [all columns in the select list] is added.

The bottom line is that yes, it would be better to use UNION ALL, not the UNION in this case. Also, I already had the answer with all that explained, so I am not sure what is the purpose of your answer, it did not really provide any additional solution to what I have already posted.

0 Likes 0 ·

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.