SQLexpress 2008 doesn't recognize my U0242 (glottal stop)
Hi, I have a problem that is driving me crazy. No matter how I form the query (or what the collation), I can't seem to get the glottal stop character to be searchable. I've tried escaping it, changing collations, changing codepages, custom stoplists, etc. and can't seem to find a solution. Any ideas? Thanks!
@guitar_guy The query you try: Select * from Table WHERE Column LIKE N'ɂ' is supposed to return zero rows because there are no wildcards and therefore the LIKE acts like equal sign, meaning you are trying to find rows which have a value of ɂ exactly. The way the glottal stop character is stored depends on how it was inserted. If the column is NVARCHAR column (2 bytes per letter) then the actual value for glottal stop will be either 0x4202 in case if N'' was used when inserting or 0x3F00 if it was not (sadly, 0x3F is just a regular question mark so if the N'' was not used then the value is lost as it was translated to regular question mark). If the column is VARCHAR column then the value is always 0x3F for that character. Here is the sample to show the difference (please note the 2 different ways used to insert the values): declare @t table (col1 nvarchar(3)); insert into @t values (N'A'), -- ɂ not present (N'Bɂ'), -- begins with ɂ inserted as nvarchar (N'ɂC'), -- ends with ɂ inserted as nvarchar (N'DɂE'), -- contains ɂ inserted as nvarchar ('Fɂ'), -- begins with ɂ inserted as varchar, ɂ = ? due to info loss ('ɂG'), -- ends with ɂ inserted as varchar, ɂ = ? due to info loss ('HɂK'); -- contains ɂ inserted as varchar, ɂ = ? due to info loss select col1, cast(col1 as varbinary(4)) stored_as from @t; -- results: col1 stored_as ---- ---------- A 0x4100 Bɂ 0x42004202 ɂC 0x42024300 DɂE 0x44004202 F? 0x46003F00 ?G 0x3F004700 H?K 0x48003F00 Please note that the glottal stop is either inserted correctly as 0x4202 or as a plain question mark 0x3F00 depending on whether N'' was used or not. Unfortunately, using **LIKE '%ɂ%'** returns incorrect rows (the ones with regular question mark) and using **LIKE N'%ɂ%'** returns all rows as you already discovered. The workaround is to use something else in place of LIKE. For example, the following will do the trick: select * from @t where charindex(0x4202, cast(col1 as varbinary(max))) > 0; -- results are correct (rows 2, 3, and 4 are returned): col1 ---- Bɂ ɂC DɂE You don't have to cast to varbinary(max) but can cast to varbinary(your\_column\_size) instead. **EDIT** Actually, I just tried the binary collation and it works just fine, so it would be much easier than casting to varbinary: select * from @t where col1 like N'%ɂ%' collate SQL_Latin1_General_Cp850_BIN; -- results are also correct (rows 2, 3, and 4 are returned): col1 ---- Bɂ ɂC DɂE Hope this helps. Oleg
Thanks, Oleg. Actually, I found that it turns out there are 3 unicode code positions for glottal stop: u294, u241, and u242 (don't recall the hex values). First two are uppercase, and 242 is lowercase, but belongs with 241. These are new(er) than u294. You are bang on about the collations. I also found that using Latin_100 also worked. Anything in the "100" collations are unicode 5.0 compliant, so that's what I ended up using. BINs also work, as you found. Cheers, Chris