Question

Wilfred van Dijk asked:

Change datatype in a table with 65,000,000,000 rows

I want to change a column type from float to numeric(8,3). This saves 3 bytes per row, about 180 GB in total. However, this table has a lot of rows: 65,000,000,000. Cleaning up data is not an option. The table has only 4 columns: 3 integer columns and 1 float column, with the primary clustered key on the 3 integer columns. SQL Server is SQL 2008 Standard Edition SP3 (yes, I already advised 2016 Enterprise).

Options (whichever option I choose, foreign keys and triggers are temporarily disabled):

1. Execute ALTER TABLE .. ALTER COLUMN. However, I think this will eat all the available log space, and how long will this run?
2. Create the new table next to the old table and move the data into it. However, if I move 10,000 records per second this will take me 1,800 hours...
3. Either option requires rebuilding the clustered index (option 1 to release the saved space, option 2 because of fragmentation). No idea how long this will take...

Any ideas?
Tags: table, conversion, datatype
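
For reference, option 1 in its simplest form is a single statement; a rough sketch follows, where dbo.BigTable and the float column val are placeholder names. On SQL Server 2008, changing float to numeric(8,3) rewrites every row and is fully logged, which is exactly the log-space concern raised above.

    -- Hedged sketch of option 1; placeholder names, adjust NULL/NOT NULL to match the existing column.
    ALTER TABLE dbo.BigTable
    ALTER COLUMN val numeric(8,3) NULL;

    -- A rebuild afterwards reclaims the space saved per row (the concern in option 3).
    ALTER TABLE dbo.BigTable REBUILD;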
5 comments

Wilfred van Dijk commented:
Update: I did some testing, and it seems the "DELETE TOP (...) OUTPUT deleted.* INTO" trick is able to process 200,000 rows per second, which reduces the data movement to about 90 hours.
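
In T-SQL, that trick looks roughly like the sketch below. Table and column names (dbo.BigTable, dbo.BigTable_new, k1/k2/k3, val) are placeholders, and the float column is cast to numeric(8,3) on the way into the new table instead of using deleted.*.

    SET NOCOUNT ON;

    DECLARE @batch int = 500000;   -- rows per batch; tune for your I/O and log throughput
    DECLARE @rows  int = @batch;

    WHILE @rows > 0
    BEGIN
        DELETE TOP (@batch)
        FROM dbo.BigTable
        OUTPUT deleted.k1, deleted.k2, deleted.k3,
               CAST(deleted.val AS numeric(8,3))   -- convert float -> numeric(8,3) here
        INTO dbo.BigTable_new (k1, k2, k3, val);

        SET @rows = @@ROWCOUNT;

        CHECKPOINT;   -- with SIMPLE recovery this helps keep the log in check; otherwise take log backups
    END;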
Scot Hauder commented:
You advised the Enterprise version, and they said no? Page-level compression will save more than 180 GB.
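
If the Enterprise upgrade does happen, a rough sketch of what that would look like is below (dbo.BigTable is a placeholder name; data compression requires Enterprise Edition on SQL Server 2008):

    -- Estimate the savings first (only available on editions that support compression).
    EXEC sp_estimate_data_compression_savings
         @schema_name      = 'dbo',
         @object_name      = 'BigTable',
         @index_id         = NULL,
         @partition_number = NULL,
         @data_compression = 'PAGE';

    -- Then rebuild the table (and its clustered index) with page compression.
    ALTER TABLE dbo.BigTable
    REBUILD WITH (DATA_COMPRESSION = PAGE);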
ThomasRushton commented:
That's still a very long outage... :-( Do let us know what process you go with in the end.
Wilfred van Dijk commented:
*Update*: I did some performance testing on the client's server with 10,000,000 rows, and it seems the "DELETE TOP (...) ... OUTPUT deleted" trick performs quite well. If I use DELETE TOP (500000), the whole process will take about 80 hours.
Wilfred van Dijk commented:
Enterprise will be an option in the future. It's a big financial step to upgrade a 24-core SQL Server 2008 server to 2016...
ThomasRushton answered:
What about adding a new, blank column of type numeric(8,3) with default NULL, and gradually populating that? When you're done, rename the old column and rename the new column to the original name. It would probably take a long time, but it can be done in chunks over a period of time. Unless the addition of the numeric(8,3) column causes the table to get rebuilt the first time around. Hmm.
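
A rough sketch of this approach, with placeholder names (dbo.BigTable, existing float column val, new column val_new). Adding a nullable column without a default is a metadata-only change on SQL Server 2008, so it should not rebuild the table:

    -- Hedged sketch of "add a column, backfill in chunks, swap names". Placeholder names throughout.
    ALTER TABLE dbo.BigTable ADD val_new numeric(8,3) NULL;   -- metadata-only: nullable, no default

    DECLARE @rows int = 1;
    WHILE @rows > 0
    BEGIN
        UPDATE TOP (500000) dbo.BigTable
        SET val_new = CAST(val AS numeric(8,3))
        WHERE val_new IS NULL
          AND val IS NOT NULL;

        SET @rows = @@ROWCOUNT;
    END;
    -- In practice, walking the clustered key in ranges will beat the repeated WHERE val_new IS NULL scan.

    -- When the backfill is complete, swap the column names.
    EXEC sp_rename 'dbo.BigTable.val', 'val_old', 'COLUMN';
    EXEC sp_rename 'dbo.BigTable.val_new', 'val', 'COLUMN';
    -- Later, in its own maintenance window: ALTER TABLE dbo.BigTable DROP COLUMN val_old;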

ThomasRushton answered:
What about creating a new table that matches what you want, and creating a view that merges the two tables for SELECT statements? New data goes into the new table; gradually migrate data between the two; updates of old data go into the new table while the rows are removed from the old one. When the old table is empty, just drop it and the view.
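
A minimal sketch of the two-tables-plus-view idea, again with placeholder names; the old table is renamed first so the view can take over the original name (the renaming is discussed in the comments below):

    EXEC sp_rename 'dbo.BigTable', 'BigTable_old';
    GO

    CREATE TABLE dbo.BigTable_new
    (
        k1  int          NOT NULL,
        k2  int          NOT NULL,
        k3  int          NOT NULL,
        val numeric(8,3) NULL,
        CONSTRAINT PK_BigTable_new PRIMARY KEY CLUSTERED (k1, k2, k3)
    );
    GO

    -- Readers keep using the original name, which is now a view over both tables.
    CREATE VIEW dbo.BigTable
    AS
    SELECT k1, k2, k3, CAST(val AS numeric(8,3)) AS val FROM dbo.BigTable_old
    UNION ALL
    SELECT k1, k2, k3, val FROM dbo.BigTable_new;
    GO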
5 comments

Wilfred van Dijk commented:
I am also thinking about this option, but I have to discuss it with the data analysts. I am not sure whether they can change the table name in their reports.
ThomasRushton commented:
You could rename the old table as part of the script - that way they won't need to change their reports; they'll be selecting from the view instead of the table. Of course, this requires that you can control the CRUD procedures, and that there's no outstanding code writing to the table. :-/
Wilfred van Dijk commented:
Every night, several rows are added from several servers (using SSIS). Since I don't want to touch those packages, a view with the original table name is not an option. There are no updates during working hours, just reads.
Tom Staab commented:
Great back-and-forth between both of you as you consider the various implications. I like the idea, Thomas, and it's nice to see the thought process as it develops. Wilfred, please let us know what you end up doing.
Wilfred van Dijk commented:
Migration is planned for Nov 11th... I'll post my scripts when I am ready
frankf answered:
Except for the VIEW option, the others are all sound approaches architecturally in the long run. The important first step is to copy such a big table to a staging table in a DEV environment (you can use the read-uncommitted isolation level for the extract), make the changes over there, then move the data back to production when it is complete. The suggestion of adding a blank decimal(8,3) column is a good one in detail.

Magnus Ahlkvist answered:
This is an exceptionally large table (in terms of rows), so regardless of method it will take time to make this change. I would try option 2: create a new table, then bulk insert into it using SSIS. I would guess it's faster than the DELETE ... OUTPUT ... INTO approach. If your source is a query which selects rows in the order of the clustered index, and you add the ORDER hint to the fast-load options of the destination, you are likely to get pretty decent performance even when loading into a clustered index. You will probably have to fiddle a bit with some settings, like maximum insert commit size and rows per batch; trial and horror will get you to whatever is "optimal" with your server and memory configuration.

EDIT: I didn't mention this originally, but my idea would be to load the table with the clustered index defined on the target table, not to load into a heap and then build the clustered index. But this could also be challenged: if you bulk load into a heap using a sorted source query, creating the clustered index as a last step could (or could not) be faster than loading into a clustered index, since the data pumped into the table is sorted and the heap is therefore likely to be "sorted" as well.
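
For anyone doing this in T-SQL rather than SSIS, a roughly equivalent sketch is below (placeholder names; assumes SIMPLE or BULK_LOGGED recovery, and on SQL Server 2008 a minimally logged insert into a clustered index additionally requires trace flag 610):

    CREATE TABLE dbo.BigTable_new
    (
        k1  int          NOT NULL,
        k2  int          NOT NULL,
        k3  int          NOT NULL,
        val numeric(8,3) NULL,
        CONSTRAINT PK_BigTable_new PRIMARY KEY CLUSTERED (k1, k2, k3)
    );

    -- TABLOCK plus a source ordered on the target clustered key plays the same role
    -- as the ORDER hint in the SSIS fast-load options.
    INSERT INTO dbo.BigTable_new WITH (TABLOCK) (k1, k2, k3, val)
    SELECT k1, k2, k3, CAST(val AS numeric(8,3))
    FROM dbo.BigTable
    ORDER BY k1, k2, k3;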
4 comments

Wilfred van Dijk commented:
I did some testing with several DELETE TOP (n) values, but a value higher than 500,000 doesn't give me any more performance, so I/O must be at its maximum. Because of this I expect SSIS not to be faster, but it's an interesting suggestion!
Magnus Ahlkvist commented:
Since DELETE and INSERT are both fully logged operations, I would think that a minimally logged bulk insert with SSIS would perform quite a lot faster.
Wilfred van Dijk commented:
This means I need an additional 2 TB of storage (the size of the table) before I can drop the old table.
Magnus Ahlkvist commented:
Yes, that's the downside of this solution. If the table had been partitioned, you could have bulk-loaded into and truncated (also not fully logged) one partition at a time, but since you are on Standard Edition, that's not an option. If temporarily adding 2 TB of storage to the server is possible, I think this is the fastest solution. If you, like me, are using SAN storage, that's not such a big thing; there just has to be 2 TB available from the SAN for a day. But if you are using physical drives in the server, it would be more of a hassle.
