UTF-8

Having grown up in a society evolved beyond the confines of 7-bit ASCII and lived through the nightmare of codepages as well as cursed the illiterate that have so little to say they can manage with 26 letters in their alphabet, I was pleased when I read Joel Spolsky’s tutorial on Unicode. It was a relief – finally somebody understood.

Years passed and I thought I knew now how to do things right. And then I had to do Windows C in anger and was lost in the jungle of wchar_t and TCHAR and didn’t know where to turn.

Finally I found this resource here:

http://utf8everywhere.org/

And the strategies outlined to deal with UTF-8 in Windows are clear:

  • Define UNICODE and _UNICODE
  • Don’t use wchar_t or TCHAR or any of the associated macros. Always assume std::string and char * are UTF-8. Call Wide windows APIs and use boost nowide or similar to widen the characters going in and narrowing them coming out.
  • Never produce any text that isn’t UTF-8.

Do note however that the observations that Windows would not support true UTF-16 are incorrect as this was fixed before Windows 7.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s