x

navchar datatype returns incorrect results when using equal

CREATE TABLE #nVarcharDemo
(C1 INT,
C2 NVARCHAR(90)
)


INSERT INTO #nVarcharDemo
 -- Will Insert as "prhan" since you did't used "N"
 SELECT 1, CONVERT(NVARCHAR(90),N'¢§ķɵƆːƠØĆ˿ŷŏɟʄJ{ŖɰŤDŽʒgNjƁ˶Jčʴ!ǝ0ʭ')
 UNION 
 -- Insert with 'N'
 SELECT 2, CONVERT(NVARCHAR(90),N'¢§ķɵƆːƠØĆ˿ŷŏɟʄJ{ŖɰŤDŽʒgNjƁ˶Jčʴ!ǝ0ʪ')

SELECT * FROM #nVarcharDemo
where C2 =N'¢§ķɵƆːƠØĆ˿ŷŏɟʄJ{ŖɰŤDŽʒgNjƁ˶Jčʴ!ǝ0ʭ'

Drop table #nVarcharDemo
The above query returns both rows.
more ▼

asked Feb 01, 2011 at 01:49 AM in Default

Jayanth gravatar image

Jayanth
27 1 1 2

I fixed the issue using the comment from William , I had used SQL collation which is langaguage specific , I needed to change it to Windows collation and set BIN2 in order for sql to understand the UTF 16 data and perform a bin level comparison on the nvarchar data regardless of UTF 8 or UTF 16.

The Below link helped understand the reasons http://msdn.microsoft.com/en-us/library/ms143350.aspx http://blogs.msdn.com/b/qingsongyao/archive/2009/04/10/sql-server-and-utf-8-encoding-1-true-or-false.aspx
Feb 01, 2011 at 03:16 AM Jayanth
(comments are locked)
10|1200 characters needed characters left

3 answers: sort voted first

You are having problems with collation comparisons not matching correctly.

Example:

-- collation changed to one that recognises the differences in the
-- example text you supplied

DECLARE @nVarcharDemo AS TABLE
    (id int,
     c2 nvarchar(90))

INSERT  INTO @nVarcharDemo
 -- Will Insert as "prhan" since you did't used "N"
        SELECT  1,
                CONVERT(nvarchar(90), N'¢§ķɵƆːƠØĆ˿ŷŏɟʄJ{ŖɰŤDŽʒgNjƁ˶Jčʴ!ǝ0ʭ')
        UNION 
 -- Insert with 'N'
        SELECT  2,
                CONVERT(nvarchar(90), N'¢§ķɵƆːƠØĆ˿ŷŏɟʄJ{ŖɰŤDŽʒgNjƁ˶Jčʴ!ǝ0ʪ')

-- Collation not forced = 2 rows
SELECT  *
FROM    @nVarcharDemo
WHERE   C2 = N'¢§ķɵƆːƠØĆ˿ŷŏɟʄJ{ŖɰŤDŽʒgNjƁ˶Jčʴ!ǝ0ʭ'

-- Again, collation forced to use a binary sort = 1 row returned!
SELECT  *
FROM    @nVarcharDemo
WHERE C2 = N'¢§ķɵƆːƠØĆ˿ŷŏɟʄJ{ŖɰŤDŽʒgNjƁ˶Jčʴ!ǝ0ʭ' COLLATE Latin1_General_BIN
more ▼

answered Feb 01, 2011 at 02:21 AM

WilliamD gravatar image

WilliamD
25.9k 17 19 41

The data I am trying to store is random UTF characters and not belonging to any specific language. Does the above solution still work ??
Feb 01, 2011 at 02:23 AM Jayanth

It only has to do with the collation of your database.

Try out a few different collations with my example code and you will see that it will return 1 or 2 rows depending.

I suggest you do a binary collation comparison to always find only one row.

It will depend upon the codepage and sort order required.

Try SELECT * FROM fn_helpcollations() to see what collations are available.

Trying my example with the collation Latin1_General_BIN also returned 1 row (example code changed to reflect this).
Feb 01, 2011 at 02:45 AM WilliamD
(comments are locked)
10|1200 characters needed characters left

It looks like unicode characters 681 to 686 are threated as the same character, though they look very different...

The below would logically select 0 but it does select 1

select case when nchar(681) = nchar(682) then 1 else 0 end

while

select case when nchar(681) = nchar(689) then 1 else 0 end

will select 0

Even "funnier":

select nchar(681)
union all
select nchar(682)
union all
select nchar(683)
union all
select nchar(684)
union all
select nchar(685)
union all
select nchar(686)

selects this:

ʩ
ʪ
ʫ
ʬ
ʭ
ʮ

while

select nchar(681)
union 
select nchar(682)
union 
select nchar(683)
union 
select nchar(684)
union 
select nchar(685)
union 
select nchar(686)

selects only one row:

ʩ

So it seems like SQL Server THINKS these are all the same character, though one can see by looking at them that they are different.

Though this doesn't answer your question - WHY is this happening? - it's a confirmation that it's indeed happening also on other installations than yours...
more ▼

answered Feb 01, 2011 at 02:27 AM

Magnus Ahlkvist gravatar image

Magnus Ahlkvist
16.6k 17 20 33

In case it helps , when inserting the data using Union instead of Union All SQl is able to identify the difference. Its only suring select that I face the issue.
Feb 01, 2011 at 02:30 AM Jayanth

@Magnus - it is a collation issue. I will bet the database collation is a language centric collation (default for SQL Server installations).

If he ran this on a SAP standard installation (their default is a binary sorted collation), the problem wouldn't occur.
Feb 01, 2011 at 02:48 AM WilliamD
(comments are locked)
10|1200 characters needed characters left

your example has N in both INSERTS but your comments seem to say something different. Do you need to review the TSQL?

Try this code:
INSERT INTO #nVarcharDemo -- Will Insert as "prhan" since you did't used "N" SELECT 1, CONVERT(NVARCHAR(90),'¢§ķɵƆːƠØĆ˿ŷŏɟʄJ{ŖɰŤDŽʒgNjƁ˶Jčʴ!ǝ0ʭ') UNION -- Insert with 'N' SELECT 2, CONVERT(NVARCHAR(90),N'¢§ķɵƆːƠØĆ˿ŷŏɟʄJ{ŖɰŤDŽʒgNjƁ˶Jčʴ!ǝ0ʪ')

and your query only returns one row. try removing the N in N'¢§ķɵƆːƠØĆ˿ŷŏɟʄJ{ŖɰŤDŽʒgNjƁ˶Jčʴ!ǝ0ʭ' from your code
more ▼

answered Feb 01, 2011 at 01:58 AM

Fatherjack gravatar image

Fatherjack ♦♦
42.8k 75 79 108

Hi

the values are Nvarchar data and distinct N'¢§ķɵƆːƠØĆ˿ŷŏɟʄJ{ŖɰŤDŽʒgNjƁ˶Jčʴ!ǝ0**ʭ**' AND N'¢§ķɵƆːƠØĆ˿ŷŏɟʄJ{ŖɰŤDŽʒgNjƁ˶Jčʴ!ǝ0**ʪ**'

However when I query for

N'¢§ķɵƆːƠØĆ˿ŷŏɟʄJ{ŖɰŤDŽʒgNjƁ˶Jčʴ!ǝ0**ʭ**'

I Still get both values.
Feb 01, 2011 at 02:08 AM Jayanth
I have edited my answer, I think your first insert is wrong
Feb 01, 2011 at 02:25 AM Fatherjack ♦♦
(comments are locked)
10|1200 characters needed characters left
Your answer
toggle preview:

Up to 2 attachments (including images) can be used with a maximum of 524.3 kB each and 1.0 MB total.

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

SQL Server Central

Need long-form SQL discussion? SQLserverCentral.com is the place.

Topics:

x1853
x22

asked: Feb 01, 2011 at 01:49 AM

Seen: 1559 times

Last Updated: Feb 01, 2011 at 02:49 AM