question

Raj More avatar image
Raj More asked

Cross Database Unions

My warehouse data is split across 3 databases on the same server. While pulling into a cube, I am essentially doing this:

EDIT: Changed to UNION ALL based on suggestions

SELECT * FROM DB1.dbo.Fact_Pres
UNION ALL
SELECT * FROM DB2.dbo.Fact_Pres
UNION ALL
SELECT * FROM DB3.dbo.Fact_Pres

Should I actually consolidate the data into ONE table? Will that make my processing any faster?

sql-server-2008performance
1 comment
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Raj More avatar image Raj More commented ·
I left the 3 tables in 3 DBs and used the UNION ALL and found decent performance using the proper indexes.
0 Likes 0 ·
David Wimbush avatar image
David Wimbush answered

One thing that might help. UNION looks for, and eliminates, duplicate rows. A bit like SELECT DISTINCT. If you just want all the rows, use UNION ALL which will cut out this step and that should make it faster.

I doubt whether consolidating into one table would make much difference, though, if the databases are all on the same server.

10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Sriram avatar image
Sriram answered

Yes it will definitely improve the cube processing perforamnce if the whole data is in a single table. you can test the exact time difference by just processing once with teh query, and dumping the records and poiniting it as the cube data source.

4 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Matt Whitfield avatar image Matt Whitfield ♦♦ commented ·
I would think the converse would be true, because multiple threads would be assigned to read data from the multiple tables... Especially if the I/O subsystem is set up to split data onto separate disks for separate DBs
2 Likes 2 ·
TimothyAWiseman avatar image TimothyAWiseman commented ·
Without doing thorough testing on this, it is my expectation that the difference would be virtually nonexistant, assuming all other things being equal.
1 Like 1 ·
Håkan Winther avatar image Håkan Winther commented ·
I agree with Matt, and if you keep the tables in different databases, you will have the option to move (scale out) the databases to different servers to even increase the performance (by using openquery).
1 Like 1 ·
Raj More avatar image Raj More commented ·
Sriram, can you help me understand why it will be faster?
0 Likes 0 ·
Håkan Winther avatar image
Håkan Winther answered

In this case you will get three table scans, but if you add a where clause you could benefit from index seek (clustered), and if you do you may gain more performance from three tables, but that depends on the index depth, index selectivity, statistics etc. And remember, it is easier to rebuild three small indexes rather than one large.

Also keep in mind that "Auto update stats" for an index will execute when 20% of the data in the table is changed. 20% of 100 000 000 records is 20 000 000, but 20% of 33 000 000 is only 6 600 000 records. ( a little simplified )

Some of my large table doesn't get the statistics auto updated because i never pass the 20% limit, and the performance suffer from not having a correct statistics.

I know this answer is not exactly what you are looking for, but the lesson learned is that smaller tables are easier to maintain.

2 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Raj More avatar image Raj More commented ·
I do have indexes on the appropriate columns and they are doing INDEX SEEEKS and not SCANS.
0 Likes 0 ·
Raj More avatar image Raj More commented ·
This is a data mart - I expect to add data on a monthly load and rebuild my indexes as a part of the ETL
0 Likes 0 ·
Håkan Winther avatar image
Håkan Winther answered

If you decide to merge them into one table, I recommend you to look at the new partitioning features in SQL server 2008. By using a partitioned table you could benefit from multiple threads to read data from the table, especially if you can place your partitions onto separate disks.

10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.