More on encoding and ajax
This is a follow-up to my prior post about ajax and charset conversion.
If you haven't read it, you had better take a look at it first; otherwise you may be lost from the beginning.
Let us recap where we left off:
We created a custom method that converts an improperly decoded ISO-8859-9 string back into properly decoded UTF-8 text.
As a reminder, here is our final method:
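// requires: using System.Text;
public static string Iso88599ToUTF8(string value)
{
    // Re-encoding with ISO-8859-9 recovers the raw UTF-8 bytes the browser sent,
    // which are then decoded as UTF-8.
    return Encoding.GetEncoding("UTF-8").GetString(
        Encoding.GetEncoding("ISO-8859-9").GetBytes(value)
    );
}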
It takes UTF-8 AJAX data that the server has misread as ISO-8859-9 and turns it back into a correct Turkish string without loss, provided that the request and response encodings are set to ISO-8859-9 in your web.config file and your globalization settings are correct.
Here is the relevant part of my web.config:
<system.web>
  <globalization
    requestEncoding="iso-8859-9"
    responseEncoding="iso-8859-9"
    fileEncoding="iso-8859-9"
    culture="tr-TR"
    uiCulture="tr-TR" />
</system.web>
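The effect of the method can be checked outside ASP.NET. The following is a minimal sketch, not code from the original post: a C# 9 console program (top-level statements) that simulates what the server does to an AJAX post under the settings above. It assumes .NET Core or .NET 5+, where ISO-8859-9 is only available after registering CodePagesEncodingProvider; the sample string is arbitrary.

```csharp
using System;
using System.Text;

// .NET Core / .NET 5+ only: make legacy code pages such as ISO-8859-9
// (code page 28599) available. Drop this line on the classic .NET Framework.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

Encoding iso88599 = Encoding.GetEncoding("ISO-8859-9");

// What the browser sends: the UTF-8 bytes of the Turkish text.
string original = "ğüşıöç ĞÜŞİÖÇ";
byte[] sentByBrowser = Encoding.UTF8.GetBytes(original);

// What the server does with requestEncoding="iso-8859-9": it decodes those
// bytes as ISO-8859-9, so "ı" (UTF-8: C4 B1) shows up as "Ä±".
string received = iso88599.GetString(sentByBrowser);

// The method from the previous post, written as a static local function.
static string Iso88599ToUTF8(string value) =>
    Encoding.UTF8.GetString(Encoding.GetEncoding("ISO-8859-9").GetBytes(value));

string repaired = Iso88599ToUTF8(received);

Console.WriteLine(received == original);  // False
Console.WriteLine(repaired == original);  // True
```

Re-encoding with ISO-8859-9 is lossless here because ISO-8859-9 is a single-byte encoding in which every byte value 0x00-0xFF decodes to some character and encodes back to the same byte, so the original UTF-8 bytes survive the detour intact.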
Well, I thought I had solved the problem. But that was only the beginning of the story.
I also needed to:
1. Write this data to a DB (in which the data was stored ISO-8859-9 encoded).
2. Retrieve the data from the DB.
The first round was to convert the UTF-8 string returned by the Iso88599ToUTF8 method above into an ISO-8859-9 string (my request and response encoding):
public static string ProperUTF8ToIso88599(string properutf8string)
{
    // UTF-8 bytes -> ISO-8859-9 bytes -> string decoded as ISO-8859-9
    return Encoding.GetEncoding("ISO-8859-9").GetString(
        Encoding.Convert(
            Encoding.GetEncoding("UTF-8"), Encoding.GetEncoding("ISO-8859-9"),
            Encoding.GetEncoding("UTF-8").GetBytes(properutf8string)
        )
    );
}
This method converts a properly encoded UTF-8 string to an ISO-8859-9 string.
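Because .NET strings are UTF-16 internally, this method only changes characters that ISO-8859-9 cannot represent; Turkish letters pass through unchanged. The sketch below checks that and shows the byte values involved: one byte per Turkish letter in ISO-8859-9, two in UTF-8. It is a C# 9 console program that assumes .NET Core / .NET 5+ (hence the CodePagesEncodingProvider registration), with an arbitrary sample string.

```csharp
using System;
using System.Text;

// .NET Core / .NET 5+ only: register legacy code pages such as ISO-8859-9.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// The post's ProperUTF8ToIso88599, as a static local function.
static string ProperUTF8ToIso88599(string properutf8string) =>
    Encoding.GetEncoding("ISO-8859-9").GetString(
        Encoding.Convert(
            Encoding.UTF8, Encoding.GetEncoding("ISO-8859-9"),
            Encoding.UTF8.GetBytes(properutf8string)));

string turkish = "ğışĞİŞ";

// One byte per letter in ISO-8859-9 ...
string isoHex = BitConverter.ToString(Encoding.GetEncoding("ISO-8859-9").GetBytes(turkish));
// ... two bytes per letter in UTF-8.
string utf8Hex = BitConverter.ToString(Encoding.UTF8.GetBytes(turkish));

Console.WriteLine(isoHex);   // F0-FD-FE-D0-DD-DE
Console.WriteLine(utf8Hex);  // C4-9F-C4-B1-C5-9F-C4-9E-C4-B0-C5-9E
Console.WriteLine(ProperUTF8ToIso88599(turkish) == turkish);  // True
```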
Then, to make things more modular, I combined those two methods:
public static string UTF8AsIso88599ToProperIso88599(string utf8actinglikeiso88599)
{
    return ProperUTF8ToIso88599(Iso88599ToUTF8(utf8actinglikeiso88599));
}
/* give the method a human-friendly alias */
public static string Ajaxify(string value)
{
    return UTF8AsIso88599ToProperIso88599(value);
}
That's it!
Calling Ajaxify on any mis-encoded UTF-8 string will create a properly encoded ISO-8859-9 string.
(If you are not lost up to this point, I assure you that you will be in the next few paragraphs.)
After working for hours, I found some other odd things about my web server.
- Although my response encoding is ISO-8859-9, the web application was sending an ISO-8859-1 encoded byte array to the database. Since the data in the DB is stored as an ISO-8859-9 encoded byte array, this results in data loss when storing.
- Similarly, although my request encoding is ISO-8859-9, when the web application reads data from the database it decodes the bytes as if they were an ISO-8859-1 encoded stream (they are in fact ISO-8859-9 encoded). Thus another corruption, although this one is reversible.
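Both symptoms are easy to reproduce in isolation. This is a hedged sketch, a C# 9 console program that assumes .NET Core / .NET 5+ (hence the CodePagesEncodingProvider registration). It uses explicit replacement fallbacks so that unmappable characters always become '?' instead of a version-dependent best-fit look-alike; the word "ışık" is just sample data.

```csharp
using System;
using System.Text;

// .NET Core / .NET 5+ only: register legacy code pages such as ISO-8859-9.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// Explicit replacement fallbacks: unmappable characters become '?' on every
// .NET version instead of a best-fit look-alike.
Encoding latin1 = Encoding.GetEncoding("ISO-8859-1",
    EncoderFallback.ReplacementFallback, DecoderFallback.ReplacementFallback);
Encoding iso88599 = Encoding.GetEncoding("ISO-8859-9",
    EncoderFallback.ReplacementFallback, DecoderFallback.ReplacementFallback);

string word = "ışık";

// Writing: ISO-8859-1 has no "ı" or "ş", so encoding with it loses them for good.
byte[] written = latin1.GetBytes(word);
Console.WriteLine(BitConverter.ToString(written));  // 3F-3F-3F-6B ("???k")

// Reading: the DB holds ISO-8859-9 bytes, but they are decoded as ISO-8859-1.
byte[] stored = iso88599.GetBytes(word);            // FD-FE-FD-6B
string read = latin1.GetString(stored);
Console.WriteLine(read == "ýþýk");                  // True: garbled, but the bytes are intact
```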
Let us handle each step separately:
Step 1 - Writing to the database:
- If I extract the bytes of my ISO-8859-9 encoded string (using ISO-8859-9 encoding), I get an ISO-8859-9 byte array (say, byte[] b).
- If I then turn those bytes into an ISO-8859-1 string, the database tier will create an ISO-8859-1 encoded byte array out of it, so the bytes transferred to the DB will be exactly identical to byte[] b, without any loss.
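The trick relies on one property of ISO-8859-1: it maps byte n to the character U+00nn and back, for all 256 byte values, so decoding any byte array as ISO-8859-1 and re-encoding it returns the same bytes. A minimal C# 9 check of that property (ISO-8859-1 is built into every .NET version, so no provider registration is needed):

```csharp
using System;
using System.Text;

// ISO-8859-1 is built into every .NET version; no provider registration needed.
Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");

// b holds every possible byte value, 0x00 through 0xFF.
byte[] b = new byte[256];
for (int i = 0; i < b.Length; i++) b[i] = (byte)i;

// Decode as ISO-8859-1 (byte n becomes U+00nn), then encode again.
string carrier = latin1.GetString(b);
byte[] roundTripped = latin1.GetBytes(carrier);

// Compare byte by byte.
bool identical = roundTripped.Length == b.Length;
for (int i = 0; identical && i < b.Length; i++)
    identical = roundTripped[i] == b[i];

Console.WriteLine(identical);  // True
```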
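public static string ResponseStringToServerString(string value)
{
    // Carry the ISO-8859-9 bytes of 'value' inside an ISO-8859-1 string,
    // so that the data layer's ISO-8859-1 encoding reproduces them exactly.
    return Encoding.GetEncoding("ISO-8859-1").GetString(
        Encoding.GetEncoding("ISO-8859-9").GetBytes(value)
    );
}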
The method creates an improperly encoded ISO-8859-1 string.
The server (actually the ADODB Command object) encodes it into a byte array when sending it to the database, with something like:
Encoding.GetEncoding("ISO-8859-1").GetBytes(strValue)
and streams that byte array to the database.
Let us plug the former method into this:
byte[] bytesSentToDb = Encoding.GetEncoding("ISO-8859-1").GetBytes(
    Encoding.GetEncoding("ISO-8859-1").GetString(
        Encoding.GetEncoding("ISO-8859-9").GetBytes(value)
    )
);
If you play with it for a while, you will see that the result is identical to
Encoding.GetEncoding("ISO-8859-9").GetBytes(value)
since the two ISO-8859-1 conversions cancel out: ISO-8859-1 maps each of the 256 byte values to exactly one character and back. This means that if we use
ResponseStringToServerString(strValue)
as the parameter value, and strValue is an ISO-8859-9 string, it will automagically be written to the DB as an ISO-8859-9 byte array.
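The round trip can be simulated end to end. In this sketch (a C# 9 console program assuming .NET Core / .NET 5+), SimulatedDriverWrite is a hypothetical stand-in for the data layer's ISO-8859-1 encoding described above, not a real ADO or ADO.NET API; it uses a replacement fallback so unmappable characters deterministically become '?'.

```csharp
using System;
using System.Text;

// .NET Core / .NET 5+ only: register legacy code pages such as ISO-8859-9.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// The step 1 method from the post, as a static local function.
static string ResponseStringToServerString(string value) =>
    Encoding.GetEncoding("ISO-8859-1").GetString(
        Encoding.GetEncoding("ISO-8859-9").GetBytes(value));

// HYPOTHETICAL stand-in for the data layer, which encodes string parameters
// with ISO-8859-1; the replacement fallback turns unmappable characters into '?'.
static byte[] SimulatedDriverWrite(string parameter) =>
    Encoding.GetEncoding("ISO-8859-1",
        EncoderFallback.ReplacementFallback, DecoderFallback.ReplacementFallback)
    .GetBytes(parameter);

string value = "ışık";

byte[] expected = Encoding.GetEncoding("ISO-8859-9").GetBytes(value);
byte[] naive = SimulatedDriverWrite(value);                                  // without the trick
byte[] viaTrick = SimulatedDriverWrite(ResponseStringToServerString(value)); // with the trick

Console.WriteLine(BitConverter.ToString(expected)); // FD-FE-FD-6B
Console.WriteLine(BitConverter.ToString(naive));    // 3F-3F-3F-6B
Console.WriteLine(BitConverter.ToString(viaTrick)); // FD-FE-FD-6B
```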
That ends round #1.
Step 2 - Reading from the database and displaying it:
I will not elaborate on this part much, since it is simply the inverse of the previous step.
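public static string ServerStringToResponseString(string value)
{
    // Inverse of step 1: DataReadCodePage is presumably "ISO-8859-1" (how the data
    // layer decoded the bytes) and DatabaseCodePage "ISO-8859-9" (how they are stored).
    return Encoding.GetEncoding(EnvironmentVariable.DatabaseCodePage).GetString(
        Encoding.GetEncoding(EnvironmentVariable.DataReadCodePage).GetBytes(value)
    );
}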
This method converts the string coming from the database (from a DataReader or a similar object) into a properly encoded ISO-8859-9 string.
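A matching sketch for the read side. Since the post describes step 2 as the inverse of step 1, it assumes that EnvironmentVariable.DatabaseCodePage holds "ISO-8859-9" and EnvironmentVariable.DataReadCodePage holds "ISO-8859-1", and writes those values out directly; the stored bytes and the DataReader behaviour are simulated. It is a C# 9 console program assuming .NET Core / .NET 5+.

```csharp
using System;
using System.Text;

// .NET Core / .NET 5+ only: register legacy code pages such as ISO-8859-9.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// Step 2 with the code pages written out. ASSUMPTION: DatabaseCodePage is
// "ISO-8859-9" and DataReadCodePage is "ISO-8859-1", i.e. the inverse of step 1.
static string ServerStringToResponseString(string value) =>
    Encoding.GetEncoding("ISO-8859-9").GetString(
        Encoding.GetEncoding("ISO-8859-1").GetBytes(value));

// Bytes as stored in the ISO-8859-9 column for the word "ışık".
byte[] stored = { 0xFD, 0xFE, 0xFD, 0x6B };

// Simulated (hypothetical) DataReader behaviour: the bytes are decoded as ISO-8859-1.
string asRead = Encoding.GetEncoding("ISO-8859-1").GetString(stored);

Console.WriteLine(asRead == "ýþýk");                          // True
Console.WriteLine(ServerStringToResponseString(asRead) == "ışık");  // True
```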
And as a final step, let us give them human-friendly aliases:
public static string ToServer(string value)
{
    return ResponseStringToServerString(value);
}
public static string ToResponse(string value)
{
    return ServerStringToResponseString(value);
}
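With the aliases in place, ToResponse should undo ToServer for any text ISO-8859-9 can represent. Here is a quick property check over the Turkish alphabet, written as a C# 9 console program assuming .NET Core / .NET 5+, with the step 1 and step 2 conversions spelled out using literal code page names (ISO-8859-1 for the data layer, ISO-8859-9 for the database, which is an assumption for step 2).

```csharp
using System;
using System.Text;

// .NET Core / .NET 5+ only: register legacy code pages such as ISO-8859-9.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// Step 1 (ResponseStringToServerString): ISO-8859-9 bytes carried in an ISO-8859-1 string.
static string ToServer(string value) =>
    Encoding.GetEncoding("ISO-8859-1").GetString(
        Encoding.GetEncoding("ISO-8859-9").GetBytes(value));

// Step 2 (ServerStringToResponseString), code pages assumed as ISO-8859-1 -> ISO-8859-9.
static string ToResponse(string value) =>
    Encoding.GetEncoding("ISO-8859-9").GetString(
        Encoding.GetEncoding("ISO-8859-1").GetBytes(value));

// The full Turkish alphabet, lower and upper case.
string alphabet = "abcçdefgğhıijklmnoöprsştuüvyz ABCÇDEFGĞHIİJKLMNOÖPRSŞTUÜVYZ";

string onTheWire = ToServer(alphabet);     // Latin-1 look-alikes, e.g. "ş" becomes "þ"
string backAgain = ToResponse(onTheWire);

Console.WriteLine(backAgain == alphabet);  // True
```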
Bottom Line
Sorting out encoding issues is a real pain, and I believe there are many more configuration-specific variations of this problem.
I tried to demonstrate how to approach a particular encoding problem. What you mainly do is play with various combinations of byte arrays, streams and strings.
What you mainly need is:
- Luck
- Concentration
- Strong nerves
- Patience
If you can, avoid all these complications and use UTF-8 as your default encoding. Then you will have more time for basic needs like eating and sleeping...
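To illustrate why UTF-8 saves all this trouble: it round-trips any string, whereas ISO-8859-9 silently loses characters outside its repertoire. A small C# 9 sketch, assuming .NET Core / .NET 5+; the Greek letter is arbitrary sample data that ISO-8859-9 cannot hold.

```csharp
using System;
using System.Text;

// .NET Core / .NET 5+ only: register legacy code pages such as ISO-8859-9.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// Replacement fallback: characters ISO-8859-9 cannot hold become '?'.
Encoding iso88599 = Encoding.GetEncoding("ISO-8859-9",
    EncoderFallback.ReplacementFallback, DecoderFallback.ReplacementFallback);

string text = "ışık π";   // Turkish word plus a Greek letter

string viaUtf8 = Encoding.UTF8.GetString(Encoding.UTF8.GetBytes(text));
string viaIso = iso88599.GetString(iso88599.GetBytes(text));

Console.WriteLine(viaUtf8 == text);  // True
Console.WriteLine(viaIso == text);   // False: the π became '?'
```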
- permalink: 10:53 AM