question

Dick Rosenberg avatar image
Dick Rosenberg asked

Characters getting corrputed copying from nvarchar to text

SQL Server 2005. I was asked to get the contens of an Excel spreadsheet into a SQL Server table my manager had set up that had a couumn with datatype text (temp_catalogue_master). After a few false starts I ended up using the SSIS Import Wizard which created a table (UG_Programs) where the column in question (catalog_text) had a data type of nvarchar(max). I wrote a simple query to populate temp_catalogue_master from ug_programs insert into dbo.TEMP_CATALOG_MASTER select convert(char(10),CRS_CDE), convert(text,CATALOG_TEXT) from dbo.UG_Programs That worked fine but there is a problem with the data in the catalog_text column in temp_catalog_master. wherever the was the combination of letters "fi" on the ug_Programs side the is a question mark "?" on the temp_catalog_master side. Can anybody explan this and, better yet, tell me how I can tweak my Insert query so that the data will be populated coorectly in the column with data type Text? Thanks, Dick Rosenberg
datatypesunicode
10 |1200 characters needed characters left characters exceeded

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

ThomasRushton avatar image
ThomasRushton answered
The "n" in "nvarchar" indicates "unicode" - ie 2 bytes per character. And that text data is normal 1-byte per character ascii. Try using "ntext" rather than "text" in your target database schema and "convert" statement. [ http://msdn.microsoft.com/en-us/library/ms187752.aspx][1] for more on SQL Server data types (albeit SQL 2008, but the concepts are good) [ http://msdn.microsoft.com/en-us/library/aa933104(SQL.80).aspx][2] for more on Unicode data types (SQL2000, but...) --edit-- Oh, and for more on Unicode, check [Joel Spolsky's blog from 2003][3]... [1]: http://msdn.microsoft.com/en-us/library/ms187752.aspx [2]: http://msdn.microsoft.com/en-us/library/aa933104(SQL.80).aspx [3]: http://www.joelonsoftware.com/articles/Unicode.html
10 |1200 characters needed characters left characters exceeded

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Dick Rosenberg avatar image
Dick Rosenberg answered
I completely understand that. what I don't understand is why the combination of the letter "f" followed by the letter "i" (both valid ascii values, not in some esoteric alophabet that Unicode was designed to handle) on the nvarchar side are showing up as question marks on the text side (all the other "f"s and "i"s are fine. Dick
1 comment
10 |1200 characters needed characters left characters exceeded

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

(use the "add new comment" button - it makes it easier to follow the thread...) The only thing I can think of for that "fi" thing is if there's some language / word out there that has a separate character that something has autocorrected from "fi"... In the original text, can you select the "f" & "i" individually?
0 Likes 0 ·
Dick Rosenberg avatar image
Dick Rosenberg answered
I'll provide some more information. I thought, "What if I save my spreadsheet in a text format and tried to work with the text file?". Well, doing that puts the question marks in where the fi combinations were (at least when viewed by Notepad).
3 comments
10 |1200 characters needed characters left characters exceeded

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Right, so that demonstrates that the input data is a non-ascii character, which means you're going to have problems storing it as text. Sanitise your input - see if you can do a search & replace for whatever-the-hell that fi thing is and swap it with the individual letters before bringing it into SQL Server.
0 Likes 0 ·
ligature, if I remember correctly from my quark xpress days :)
0 Likes 0 ·
http://www.google.co.uk/search?sourceid=chrome&ie=UTF-8&q=define:ligature agrees. Thanks. I was getting wrapped up in diacriticals, and diaresises and things like that. :-)
0 Likes 0 ·
Dick Rosenberg avatar image
Dick Rosenberg answered
Oh, and another thing in response to ThomasRushton's answer. The ultimate destination of the data in question is a database provided by the vendor of our system, so I have no control over the datatype in that table. It has to be text. Thanks, Dick
10 |1200 characters needed characters left characters exceeded

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Dick Rosenberg avatar image
Dick Rosenberg answered
ThomasRushton put something in an email the system sent me (which I can't see from here. I'm new to this system). His comment was The only thing I can think of for that "fi" thing is if there's some language / word out there that has a separate character that something has autocorrected from "fi"... In the original text, can you select the "f" & "i" individually? I believe he is correct. when I open up the Excel spreadsheet I cannot select the "f" and the "i" individually. I kind of reached the same realization. In my text column I converted all of the question marks to "fi". I then selected from each table and saved the results as .csv files. I then used a file compare utility to compare them. It showed all the lines that originally had the question mark as being different than the lines that came from the Unicode table. And I couldn't separate select the "f" from the "i" in that utility program either. So the problem is solved (or at least explained). Thank you ThomasRushton and I will try and give you a positive vote. Dick
2 comments
10 |1200 characters needed characters left characters exceeded

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

And don't forget the "tick" under the thumbs-up! Shows that it was the right answer...
0 Likes 0 ·
Oh, and I wrote that thing that you were notified about in a comment on one of your non-answers...
0 Likes 0 ·

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.