If you write Ajax applications and your pages are encoded in a non-UTF-8 character set,
you will most probably need a conversion mechanism so that the data you send to the server is encoded properly, without damaging native characters.
Recently I was in that situation, and I wrote two methods (one server-side, one client-side) to sort out the issue.
Although my solution only covers the Turkish character set (ISO-8859-9), it can be generalized to suit your needs.
XMLHttpRequest uses UTF-8 encoding to send data to the server.
You would normally use JavaScript's escape function to convert the data you wish to send into something the server will not misread.
As you may know, escaping a string replaces special characters such as space, ampersand (&) and percent (%) with their percent-encoded equivalents (%20, %26, %25), so that they do not damage the format of the query string when it is posted to the server.
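To make this concrete, here is a small sketch of what escape returns for a few inputs. The variable names are only for illustration, and encodeURIComponent is shown purely for comparison; it is not part of the approach described in this post.

```javascript
// escape() leaves ASCII letters and digits alone and percent-encodes the rest.
const reserved = escape("a b&c%d");           // "a%20b%26c%25d"

// Characters below code point 256 become %XX, a single Latin-1 style byte.
const singleByte = escape("ü");               // "%FC"

// Characters at code point 256 and above become the non-standard %uXXXX form.
const unicodeForm = escape("ı");              // "%u0131"

// encodeURIComponent() emits the UTF-8 bytes of the character instead.
const utf8Umlaut = encodeURIComponent("ü");   // "%C3%BC"
const utf8Dotless = encodeURIComponent("ı");  // "%C4%B1"
```

Note that neither "%FC" nor "%u0131" is a UTF-8 sequence, which is why the Turkish letters need extra handling.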
Under normal circumstances, escaping the data before sending it to the server is enough to encode it. However, in our special case (a non-UTF-8 charset, ISO-8859-9, along with native characters) it is not enough: escape does not produce UTF-8 for non-ASCII characters, so we need to convert the native Turkish characters to their UTF-8 percent-encoded equivalents as well.
Here is how to do it:
function iso88599Escape(strText)
{
    // escape() runs first, so by now each Turkish letter is in escaped form
    // (%XX below code point 256, %uXXXX above). Rewrite those to UTF-8 sequences.
    strText=escape(strText);
    strText=strText.replace(/%u0131/g,"%C4%B1"); // ı
    strText=strText.replace(/%DC/g,"%C3%9C");    // Ü
    strText=strText.replace(/%FC/g,"%C3%BC");    // ü
    strText=strText.replace(/%u011F/g,"%C4%9F"); // ğ
    strText=strText.replace(/%u011E/g,"%C4%9E"); // Ğ
    strText=strText.replace(/%u0130/g,"%C4%B0"); // İ
    strText=strText.replace(/%u015F/g,"%C5%9F"); // ş
    strText=strText.replace(/%u015E/g,"%C5%9E"); // Ş
    strText=strText.replace(/%E7/g,"%C3%A7");    // ç
    strText=strText.replace(/%C7/g,"%C3%87");    // Ç
    strText=strText.replace(/%F6/g,"%C3%B6");    // ö
    strText=strText.replace(/%D6/g,"%C3%96");    // Ö
    return strText;
}
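As an aside, and not the method used in this post: encodeURIComponent already emits UTF-8 percent-encoding for every character, so it yields the same sequences for the Turkish letters without a lookup list. A minimal sketch, where buildQuery and the name and city parameters are hypothetical:

```javascript
// Hypothetical helper: builds "key=value&key=value" with UTF-8 percent-encoding.
function buildQuery(params) {
  return Object.keys(params)
    .map(function (key) {
      return encodeURIComponent(key) + "=" + encodeURIComponent(params[key]);
    })
    .join("&");
}

var query = buildQuery({ name: "Işık Öğüt", city: "İzmir" });
// query === "name=I%C5%9F%C4%B1k%20%C3%96%C4%9F%C3%BCt&city=%C4%B0zmir"
```

Either way, what reaches the server is UTF-8, so the server-side caveat that follows still applies.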
There is another caveat here, though:
We are sending the data in the query string to the server as UTF-8.
However, the server is configured to interpret the data it receives as an 8-bit ISO-8859-9 encoded string. For native characters this encoding differs from UTF-8: each Turkish letter is a single byte in ISO-8859-9 but two bytes in UTF-8.
So we need another conversion method on the server that turns the UTF-8 data it received into a properly encoded ISO-8859-9 string.
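The mismatch is easy to see by putting the two byte representations side by side. This is only an illustrative sketch: ISO_8859_9_BYTE, toHex, utf8Hex and compare are made-up names, and the table lists just the six positions where ISO-8859-9 differs from ISO-8859-1.

```javascript
// ISO-8859-9 byte values of the six letters that replace Latin-1 characters.
const ISO_8859_9_BYTE = { "Ğ": 0xD0, "İ": 0xDD, "Ş": 0xDE, "ğ": 0xF0, "ı": 0xFD, "ş": 0xFE };

// One byte as two uppercase hex digits.
function toHex(byte) {
  return byte.toString(16).toUpperCase().padStart(2, "0");
}

// UTF-8 bytes of a string as space-separated hex.
function utf8Hex(text) {
  return Array.from(new TextEncoder().encode(text), (b) => toHex(b)).join(" ");
}

// Both representations of one of the six letters above.
function compare(letter) {
  return { iso88599: toHex(ISO_8859_9_BYTE[letter]), utf8: utf8Hex(letter) };
}

const sCedilla = compare("ş"); // { iso88599: "FE", utf8: "C5 9F" }
```

Read one byte per character, the two bytes C5 9F come out as two unrelated characters instead of ş, which is the garbage the server ends up with.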
A quick and dirty solution would be a brute-force replacement of misinterpreted character sequences:
public static string AjaxRequestToIso88599(string value)
{
    // Left: the letter's UTF-8 bytes misread one byte per character.
    // Right: the letter that was actually sent.
    return value.Replace("Ãœ","Ü")
        .Replace("Åž","Ş")
        .Replace("Äž","Ğ")
        .Replace("Ã‡","Ç")
        .Replace("Ä°","İ")
        .Replace("Ã–","Ö")
        .Replace("Ã¼","ü")
        .Replace("ÅŸ","ş")
        .Replace("ÄŸ","ğ")
        .Replace("Ã§","ç")
        .Replace("Ä±","ı")
        .Replace("Ã¶","ö");
}
I hear you say, "There should be a better way to do it."
And yes, you are right.
Let us go through it step by step:
UTF-8 data as byte array (Ajax request)
-> [server (Request.QueryString)]
-> ISO-8859-9 encoded string
The data posted to the server (i.e. the query string we just formed) is in UTF-8 format.
The server, however, interprets it as if it were a Latin-5 stream (namely a stream with the ISO-8859-9 charset). This creates those cryptic characters.
So we need to convert the string back into what it once was: a UTF-8 string!
To do that, we first recover the original byte array by encoding the incorrectly decoded string back to bytes using ISO-8859-9.
Then we decode those bytes as UTF-8.
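// Requires: using System.Text;
public static string Iso88599ToUTF8(string value)
{
    // Recover the raw bytes the server misread as ISO-8859-9,
    // then decode those bytes as the UTF-8 they really are.
    return Encoding.GetEncoding("UTF-8").GetString(
        Encoding.GetEncoding("ISO-8859-9").GetBytes(value)
    );
}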
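For readers who want to try the round trip without a server, the same idea can be sketched in JavaScript. Everything here is an assumption made for illustration: decodeIso88599, encodeIso88599 and iso88599ToUtf8 are hypothetical helpers, and the ISO-8859-9 table is hand-written from the six positions where it differs from ISO-8859-1.

```javascript
// The six byte values where ISO-8859-9 differs from ISO-8859-1.
const TURKISH = { 0xD0: "Ğ", 0xDD: "İ", 0xDE: "Ş", 0xF0: "ğ", 0xFD: "ı", 0xFE: "ş" };
const TO_BYTE = {};
for (const [byte, letter] of Object.entries(TURKISH)) TO_BYTE[letter] = Number(byte);

// Bytes -> string, one character per byte (what the misconfigured server does).
function decodeIso88599(bytes) {
  return Array.from(bytes, (b) => TURKISH[b] || String.fromCharCode(b)).join("");
}

// String -> bytes, the inverse of decodeIso88599.
function encodeIso88599(text) {
  return Uint8Array.from(text, (ch) => {
    if (ch in TO_BYTE) return TO_BYTE[ch];
    const code = ch.charCodeAt(0);
    // Reject anything ISO-8859-9 cannot hold, including the six replaced Latin-1 letters.
    if (code > 0xFF || code in TURKISH) {
      throw new RangeError("Not representable in ISO-8859-9: " + ch);
    }
    return code;
  });
}

// Undo the misreading: back to the raw bytes, then decode them as UTF-8.
function iso88599ToUtf8(value) {
  return new TextDecoder("utf-8").decode(encodeIso88599(value));
}

const sent = "Işık Öğüt";
const misread = decodeIso88599(new TextEncoder().encode(sent)); // garbled, 14 characters
const repaired = iso88599ToUtf8(misread);                       // equal to sent again
```

The repair works because the misreading is lossless: every byte maps to exactly one character, so the bytes can be recovered exactly.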
Easy cheesy!
One line of code and your String is properly converted.
Enjoy!