Ok, so something I need to do is a bit of investigation into how collations are managed, and how you could use .NET code to compare a string in the same way that any given collation would...
So, for this, I need to try and create a data set which contains all possible two-character unicode strings. Seeing as this will end up being 4 billion rows, it could potentially be quite (very) time consuming.
So I thought I would ask you guys how you would go about doing it.
A bit of background:
This is because I need to be able to take the output of two ordered SELECT statements from two DBs with identical schema, and in .NET be able to tell which row would come 'later'. This is so that if the databases contain different data, I can read from the side which has extra rows until the streams are back in sync. This is pretty easy for numeric types, but I need to work with string types too.
asked Jan 25, 2010 at 06:42 AM in Default
Matt Whitfield ♦♦
Don't create a table with all the values, make one with all the Unicode characters - nchar(0) to nchar(65535) - joining it to itself to get the two-character strings.
answered Jan 25, 2010 at 08:00 AM
What exactly is meant by "managed"? If you want to gain a better understanding of collations in general (their naming conventions, different versions, etc), the following three areas of MSDN documentation will help greatly:
I am not certain about getting an approximation for collations with names starting in SQL_, but for non-SQL_ collations you should be able to get the native .NET comparisons to be identical (or nearly identical depending on any differences between Unicode 5.0 and 6.0 since the version being used in .NET is dependent upon OS version and .NET Framework version). If you had the collation name of the field, you should be able to:
This statement is based on a common misunderstanding of Unicode and SQL Server's implementation of it via the
There are actually 1,112,064 possible characters that can be represented in Unicode and yes, stored within the
The 4 billion number comes from thinking in terms of the typical
I might be missing something, but I don't see why this is necessary. If the two result sets are using the same field to sort on and have the same collation, then you can just look at the row numbers of the result sets. And if the collations are different, then it doesn't matter as there is no way to equate them. If you want to sync the data between two DBs and not worry about collation differences, then just use a collation that has a name ending in