Re: UTF-8 Coding of MSSQL 2005 DB

Re: UTF-8 Coding of MSSQL 2005 DB

am 15.04.2008 16:47:54 von raymond.mancy

On Mar 26, 11:11 am, Shawn Beasley wrote:
> Hi List,
>
> I am searching franticly for a solution (or the procedure) to setting
> the coding of a new DB to UTF-8. I can find no setting in the Server
> Manager, during creation of the DB, to influence this. Can someone
> please show me the way? Thanks
>
> --Shawn


I have converted my data from Latin 1 to UTF-8 in MSSQL.

The first thing to realise is (as has already been mentioned), MSSQL
uses UCS-2, a 2 byte Unicode encoding scheme. Not UTF-8, but we can
still use it.

What I did was create another table with an ntext column type (instead
of the old text used for Latin 1, of course your column in question
may be varchar etc, so you will need to create a table with a column
of type nvarchar etc).

What I then did is create a script to pull the data out of the old
table and insert it into the new table with the ntext column.
For each line I pulled out I used a function to encode it into UTF-8
(I was using php, maybe you could use
Convert::Scalar::utf8_encode() ? ). For that line I executed an SQL
statement to insert the line into the new table, however I prefixed
the Letter 'N' to the value of the newly UTF-8 encoded string. This
tells the db to treat the string as a unicode constant and not to
apply any of its own formatting/encoding to the string.

Now the newly encoded values in your new table may look like garbage
through the MSSQL gui, but the data is correctly encoded in UTF-8 and
will display in a webpage with UTF-8 content-type headers.

Maybe there is an easier way to do it, but this worked for me.

Hope that helps at least a little.
Raymond