Change data type in a table with 65,000,000,000 rows
I want to change a column type from float to numeric(8,3). This saves 3 bytes per row, about 180 GB in total. However, the table has a lot of rows: 65,000,000,000. Cleaning out data is not an option. The table has only 4 columns: 3 integer columns and 1 float column, with the clustered primary key on the 3 integer columns. The server is SQL Server 2008 Standard Edition SP3 (yes, I have already advised upgrading to 2016 Enterprise).

Options (whichever option I choose, foreign keys and triggers are temporarily disabled):

1. Execute ALTER TABLE ... ALTER COLUMN. However, I think this will eat all the available log space, and how long would it run?
2. Create the new table beside the old table and move the data across. However, if I move 10,000 records per second, this will take about 1,800 hours...
3. Either option requires rebuilding the clustered index (option 1 to release the saved space, option 2 because of fragmentation). No idea how long this will take...

Any ideas?
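For reference, option 1 would be a single statement along these lines (table and column names invented for illustration). The whole rewrite happens in one transaction, which is exactly the log-space worry:

-- Hypothetical names: dbo.BigTable with float column Val.
-- Narrowing float to numeric(8,3) is a size-of-data change, so every row
-- is rewritten in place, fully logged, in a single transaction.
ALTER TABLE dbo.BigTable ALTER COLUMN Val numeric(8,3) NULL;  -- match the existing nullability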
What about adding a new blank column of type numeric(8,3), default NULL, and gradually populating it? When you're done, rename the old column out of the way and rename the new column to the original name. It would probably take a long time, but it can be done in chunks over a period of time. Unless the addition of the numeric(8,3) column causes the table to get rebuilt the first time around. Hmm. A rough sketch of the idea follows.
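Something like this, all names hypothetical (dbo.BigTable keyed on k1, k2, k3 with float column Val), and assuming Val is NOT NULL so the backfill predicate terminates:

-- Adding a NULLable column with no default is a metadata-only change.
ALTER TABLE dbo.BigTable ADD ValNew numeric(8,3) NULL;

-- Backfill in small batches so each transaction (and its log usage) stays small.
-- Walking the clustered key range would scale better than re-scanning for NULLs,
-- but this shows the shape of it.
DECLARE @rows int = 1;
WHILE @rows > 0
BEGIN
    UPDATE TOP (50000) dbo.BigTable
    SET ValNew = CONVERT(numeric(8,3), Val)
    WHERE ValNew IS NULL;
    SET @rows = @@ROWCOUNT;
END

-- Swap the names when the backfill is complete.
EXEC sp_rename 'dbo.BigTable.Val', 'ValOld', 'COLUMN';
EXEC sp_rename 'dbo.BigTable.ValNew', 'Val', 'COLUMN';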
What about creating a new table that matches what you want, and creating a view that merges the two tables for SELECT statements? New data goes into the new table; old data is gradually migrated between the two; updates of old data go into the new table while the row is removed from the old. When the old table is empty, just drop it and the view. Roughly like the sketch below.
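A minimal sketch of that pattern, with invented names (dbo.BigTable_Old keeps the float column, dbo.BigTable_New has the numeric(8,3)):

-- The view presents one logical table while rows live in either physical table.
CREATE VIEW dbo.BigTableView
AS
SELECT k1, k2, k3, CONVERT(numeric(8,3), Val) AS Val FROM dbo.BigTable_Old
UNION ALL
SELECT k1, k2, k3, Val FROM dbo.BigTable_New;
GO

-- Migrate in batches: delete from the old table and land the rows in the new
-- one atomically via the OUTPUT clause (the target must have no enabled
-- triggers or foreign keys, which matches the question's setup).
DELETE TOP (50000) FROM dbo.BigTable_Old
OUTPUT deleted.k1, deleted.k2, deleted.k3, CONVERT(numeric(8,3), deleted.Val)
INTO dbo.BigTable_New (k1, k2, k3, Val);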
Except for the VIEW option, these are all architecturally sound approaches in the long run. The important first step is to dump such a big table into a staging table in a DEV environment (you can read the source under the read-uncommitted isolation level), make the change there, then move it back to production when it's complete; see the copy sketched below. As for the details, adding the blank decimal(8,3) column is a good option.
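For the staging copy, something along these lines (names invented; assumes the DEV database can reach the source, e.g. via a restored backup or a linked server):

-- SELECT ... INTO is minimally logged under the SIMPLE or BULK_LOGGED recovery
-- model; NOLOCK gives the read-uncommitted scan suggested above.
SELECT k1, k2, k3, Val
INTO dbo.BigTable_Staging
FROM dbo.BigTable WITH (NOLOCK);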
This is an exceptionally large table (in terms of rows), so regardless of method, the change will take time. I would try option 2: create a new table, then bulk insert into it using SSIS. My guess is that it will be faster than a plain SELECT ... INTO.

If your source is a query that selects the rows in the order of the clustered index, and you add the ORDER hint to the fast load options of the destination, you are likely to get pretty decent performance even when loading into a clustered index. You will probably have to fiddle a bit with some settings, like maximum insert commit size and rows per batch; trial and error will get you to whatever is "optimal" with your server and memory configuration.

EDIT: I didn't mention this originally, but my idea would be to load the table with the clustered index already defined on the target, not to load into a heap and then build the clustered index. But this could also be challenged: if you bulk load into a heap using a sorted source query, creating the clustered index as a last step could (or could not) be faster than loading into a clustered index, since the data pumped into the table is sorted and the heap is therefore likely to be "sorted" as well. The T-SQL equivalent of both variants is sketched below.
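For reference, a T-SQL equivalent of that sorted bulk load, with invented names; whether the insert is minimally logged depends on the recovery model and, on SQL 2008, on trace flag 610 for clustered-index targets:

-- Sorted insert straight into the clustered-index target. TABLOCK is required
-- for minimal logging; the ORDER BY matches the target's clustered key.
INSERT INTO dbo.BigTable_New WITH (TABLOCK) (k1, k2, k3, Val)
SELECT k1, k2, k3, CONVERT(numeric(8,3), Val)
FROM dbo.BigTable
ORDER BY k1, k2, k3;

-- The alternative discussed above: load into a heap first, then build the
-- clustered primary key as a final step (sorting in tempdb to spare the
-- database's own files).
ALTER TABLE dbo.BigTable_New
ADD CONSTRAINT PK_BigTable_New PRIMARY KEY CLUSTERED (k1, k2, k3)
WITH (SORT_IN_TEMPDB = ON);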