question

ayahmsa avatar image
ayahmsa asked

How can I successfully read unicode with special characters from Oracle db to SQL db through SSIS?

I created a dataflow task which has a source oledb connection to Oracle and a destination oledb to SQL server. While testing the data I noticed that special characters in Oracle are being inserted as ? in SQL server, next I filtered the data of the source query to check only one record and when I did a preview in SSIS it showed with the ? instead of the special character. I tried several things: - Changed the code page to 65001 instead of 1252 but that didn't work. - Changed the source data type from DT_STR to DT_WSTR but still didn't show correctly. - Changed the source driver from "Microsoft OLE DB Provider for Oracle" to "Oracle Provider for OLE DB", however the processing is SO slow it took over 2 hours to process around 3 million records. P.S. The source data type is varchar2 Any suggestions? More info: The column in question is ITEM_DSCR. Source (Oracle): COLUMN_NAME DATA_TYPE DATA_LENGTH DATA_PRECISION NULLABLE ITEM_DSCR VARCHAR2 120 NULL Y Source OLEDB Connection (SSIS): I changed it to DT_WSTR from advanced editor for OLEDB connection, under OLEDB Source Output (Output Columns). P.S. every time I change the same under OLEDB Source Output (External Columns) it gets changed back automatically to DT_STR. Examples of the special characters involved: ¹ ¬ ¦ ¿
ssisoledbunicodevarchar2
9 comments
10 |1200 characters needed characters left characters exceeded

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Did you try source data type to be nvarchar2 ?
3 Likes 3 ·
The destination columns were varchar but I changed them to nvarchar. I changed the code-page of the ole source db component from its properties which in turn changed it for all the columns altogether but of course using DT_WSTR omits the code-page property.
0 Likes 0 ·
I can't change anything in the source (Oracle side) as it is out of my control, or do you mean the source oledb connection in SSIS?
0 Likes 0 ·
The issue is most likely finding the correct database Character Set being used in Oracle (since the field is `VARCHAR2`) and then finding the matching collation on the SQL Server side for use with `VARCHAR`. So what Character Set is being used on the Oracle side?
0 Likes 0 ·
Changing the source to `DT_WSTR` probably won't help since `VARCHAR2` should equate to `DT_STR`, while `NVARCHAR2` would equate to `DT_WSTR` (but you aren't using `NVARCHAR2`). This is also probably why it keeps changing back to `DT_STR` each time. Please find out the character set being used in the Oracle database for `CHAR` and `VARCHAR2` fields, and we can go from there.
0 Likes 0 ·
Show more comments
srutzky avatar image
srutzky answered

What we know is this:

  • the source column is of type: `VARCHAR2`.
  • the character encoding for `CHAR` and `VARCHAR2` data is: `AL32UTF8`
  • `AL32UTF8` is full `UTF-8` (variable width Unicode encoding)
  • `UTF-8` equates to Code Page: `65001`
  • Source column _cannot_ change to `NVARCHAR2`

What to do:

  • Destination column needs to be of type `NVARCHAR`
  • source SSIS datatype should be `DT_WSTR` (according to Integration Services Data Types), but might want to also try `DT_NTEXT`
  • You might need to force the datatype on the way out of Oracle by using `UNISTR(ITEM_DSCR)` in the source SQL, as discussed in this thread: SSIS - Shows DT_WSTR for Non-Unicode Oracle Source. Be sure to refresh the columns after making this change, especially if SSIS keeps changing the column type from `DT_WSTR` to `DT_STR`.

Additional Resources:

2 comments
10 |1200 characters needed characters left characters exceeded

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

@srutzky Apologies for the delay, I got caught up fixing other stuff. You are reading my mind, these are the exact steps and thoughts that I did. I even tried to force the source using UNISTR but it still isn't working. Is there a special way to refresh the columns? mb I'm not refreshing correctly although I think I do. I just tried extracting from the source using TO_CHAR but with no avail still. I will give the Attunity connector a try and let you know how that goes...
0 Likes 0 ·
@ayahmsa not sure about the refresh. maybe, just to test, try creating a brand new source and starting with the query that has `UNISTR`? After changing to UNISTR, did you go back and make sure that the field was marked as `DT_WSTR` and not `DT_STR`? And that the `CodePage` was set to `65001`? I don't have time to play with SSIS, but your scenario, at least with the details provided so far, is the same as the last 2 comments on that "SSIS - Shows DT_WSTR..." link. Did you switch back to using the "Microsoft OLE DB Provider for Oracle"?
0 Likes 0 ·
ayahmsa avatar image
ayahmsa answered
The strange thing is that this is working fine on PROD environment which is using older setup (DTS on SQL Server 2000, calling a stored proc which uses an Open Query to linked server). I also have to note that the destination datatype on PROD is varchar and is storing the special characters perfectly! :D However moving to DEV (SSIS 2012) it is not working properly.
10 |1200 characters needed characters left characters exceeded

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.