technette avatar image
technette asked

sort 1 million 32-bit integers

I have been researching the following question: What is the most effective way to sort 1 million 32-bit integers? Why? Came across a C++ book that states that Hashing Agorithms can achieve this more efficiently that Tree because it it easier to implement... What are your thoughts?
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Magnus Ahlkvist avatar image Magnus Ahlkvist commented ·
The sorting operation on its own is one thing. The other thing is how you want to access the data. If you'll always access all the data at once one algorithm might fit, if you want to search specific items in the sorted data, another algorithm might suit better. Given that this is a SQL Server site, I'd say the best solution is to put all the data in a column in a table and create an index on it. Then you'll get a tree structure, so I vote for tree :)
4 Likes 4 ·
Scot Hauder avatar image Scot Hauder commented ·
Radix sort for integer data
1 Like 1 ·

1 Answer

TimothyAWiseman avatar image
TimothyAWiseman answered
There are two questions that need to be answered before this can be addressed, one is if we can safely make any assumptions about the structure of the data ahead of time, and the second is, as Magnus asks, how you intend to use the data. Now, as a practical matter, all modern languages implement a sort function, and normally this sort function is both highly optimized and well enough designed to consider a variety of different algorithms depending on the circumstances. So, unless there was some overwhelming concern generated by those two questions, I would just invoke the languages built in sort function and be done with it. Some concerns that would change that would be if I new there were large sections of the data set that were already sorted. If I could even reasonably guess at where the boundaries were, then a merge sort would probably be best, and if I could definitely break it up into subsections such that I could let the algorithm not even check that those sub-lists were sorted, then merge sort is definitely best. Of course, merge sort will often be less efficient in practice than a heapsort if I cannot make assumptions about the data have some order ahead of time. And, looking at the question of how the data will be used afterwards, if I wanted to repeatedly access this through a database, then as Magnus said, the best approach is just to index it and let it be. Generally though, for just a list of integers with no special post processing concerns and where I can make no specific assumptions about the data structure (other than it all being integers), than a radix sort will normally be fasts, as Scot Hauder said. With all of that said, this site is primarily about databases and you might be better of trying this on StackOverflow.
1 comment
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

technette avatar image technette commented ·
Thank you Scot, Magnus and Timothy, this really great input. Even when dealing with programming, I really like to get good database expert input regarding data.
0 Likes 0 ·

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.