ColdFusion Muse

Handling Unicode Data types in MS SQL

Mark Kruger July 6, 2006 10:28 PM MS SQL Server Comments (8)

We have a customer who wants to support Asian languages. They have a lightweight CMS tool that they use to update portions of their website. Our first task was to duplicate this functionality for each language supported. Our first hurdle was the Chinese character set. We could update the DB directly through cut and paste, but the CF code we were using resulted in inscrutable question marks. We were using CFQUERYPARAM and none of the character types we tried worked. It looked as if we were up against a great wall.

Reaching back into the distant past I remembered something I had discovered years ago. While using the script generator to build a "create" script for a database I noticed that the "DROP" commands all looked like this:

<cfquery>
if exists
   (select * from dbo.sysobjects
    where    id = object_id(N'[dbo].[FK_architectural_feature_doors_architectural_features]')
    and    OBJECTPROPERTY(id, N'IsForeignKey') = 1
    )
...Drop constraint....
</cfquery>
I had wondered why all the table and constraint definitions (character variables in the code above) had a big N in front of them. I discovered that an N keeps the DB server from applying a code page. In other words, it allows the characters to be treated "as is" instead of trying to format them. In reality it stands for "Native Language" - and forces the server to not treat it as any particular language. Of course you still need to be using double byte data types (ntext, nvarChar nchar instead of text, varchar and char). Here's a blurb from MS books on line about it.

MS Books Online RE: Constants

Unicode strings
Unicode strings have a format similar to character strings but are preceded by an N identifier (N stands for National Language in the SQL-92 standard). The N prefix must be uppercase. For example, 'Michél' is a character constant while N'Michél' is a Unicode constant. Unicode constants are interpreted as Unicode data, and are not evaluated using a code page. Unicode constants do have a collation, which primarily controls comparisons and case sensitivity. Unicode constants are assigned the default collation of the current database, unless the COLLATE clause is used to specify a collation. Unicode data is stored using two bytes per character, as opposed to one byte per character for character data. For more information, see Using Unicode Data.

  • Share:

8 Comments

  • O?uz Demirkap?'s Gravatar
    Posted By
    O?uz Demirkap? | 7/6/06 9:44 PM
  • O?uz Demirkap?'s Gravatar
    Posted By
    O?uz Demirkap? | 7/6/06 9:45 PM
    As you can see in my name, same problem for BlogCFC code on SQL Server.
  • Jorrit's Gravatar
    Posted By
    Jorrit | 7/7/06 4:08 AM
    Better practice is to use CFQUERYPARAM.
    The only thing you have to do when using this is to modify your datasource settings:
    Click on "advanced settings" and check "Enable Unicode for data sources configured for non-Latin characters".
    That's all!
  • mkruger's Gravatar
    Posted By
    mkruger | 7/7/06 9:05 AM
    Jorrit,

    Excellent - thanks for that. We can save our cfqueryparam code.
  • Oğuz Demirkapı's Gravatar
    Posted By
    Oğuz Demirkapı | 7/8/06 9:18 PM
    Unicode Test

    ğĞ üÜ ıİ şŞ öÖ çÇ äÄ ß 시험
  • Oğuz Demirkapı's Gravatar
    Posted By
    Oğuz Demirkapı | 7/8/06 9:18 PM
    Looks good! :)
  • Mark's Gravatar
    Posted By
    Mark | 4/7/08 4:33 PM
    &#8482;
  • Jay Sylvester's Gravatar
    Posted By
    Jay Sylvester | 7/9/08 1:58 PM
    Do these characters work?

    <,>,",“,”,’,—,&,©,™