There are three main aspects of Unicode collating that are responsible for this behavior (mainly 1 and 3):
VARCHARdata using a SQL Server collation (i.e. a collation name starting with
SQL_) uses a static mapping table for sorting and comparison operations. But, both
VARCHARdata using a Windows collation (i.e. a collation name not starting with
NVARCHARdata (regardless of the type of collation) uses Unicode sorting rules. And Unicode uses a multi-level comparison that looks across the entire string for each type of sensitivity, instead of character-by-character. So it will look at all accents across the string, then the casing of each character across the entire string, etc.
According to the official Unicode data, there are many characters marked as "variable", but in Microsoft's implementation of Unicode, it appears to be mainly "apostrophe" and "hyphen-minus".
DECLARE @SortTest TABLE (Things NVARCHAR(10) COLLATE SQL_Latin1_General_CP1_CI_AS); INSERT INTO @SortTest ([Things]) VALUES (N'AA'), (N'AC'), (N'AB'), (N'ZA'), (N'A-A'), (N'a''A'), (N'A,B'), (N'A''''''A'), (N'A''-''A'), (N'A-B'), (N''''), (N'-'), (N'AA'), (N','); SELECT [Things] FROM @SortTest ORDER BY [Things] -- COLLATE Latin1_General_100_BIN2
Now, you perhaps could use a binary collation (i.e. name ending in
_BIN2; don't use the older
_BIN collations) to get those two characters to not be ignored in this context, but that also introduces a different problem. For example, uncomment the
COLLATE clause at the end of that query and execute it. It will return the following:
Notice how "a'A" comes after "ZA"; it is not grouped with the upper-case "A"s.
You could also try to sort by an expression that converts those 2 characters to something that would sort closer to where you expect them to go. This would allow for the actual values to remain as they are. But I wouldn't recommend this as it can get messy and hard to maintain, and won't likely perform well for large result sets as it likely won't be able to use an index (if you don't have an index on this column then it won't be as bit of a problem).
You can use the International Components for Unicode (ICU) Collation Demo to see a more ideal / faithful implementation of the Unicode specification. Just click on the "ICU Collation Demo" link and do the following:
You should get the follow results:
Notable differences between ICU collation demo and
SQL_Latin1_General_CP1_CI_AS in SQL Server:
Since you are working with
NVARCHAR and intend on fully supporting Unicode (based on your comment on John's answer), here are some notes for working with
NVARCHAR, but string literals need to be prefixed with an upper-case "N". Sure, in this particular case it wouldn't alter any data, but it's a good habit to get into.
UNICODE()function instead of the
For more information on working with collations in general, please visit: Collations Info
16 People are following this question.