.

Wednesday, May 31, 2006

Database Normalization for Dummies #

I'm currently normalizing one of my client's database, and thought it's a nice time to mention about normalization.

Some of you may think that normalization is just for academic studies and does not apply to real-life situations. Some of you have "somewhat heard" about it. Some of you wonder whether it is a food to eat, or a country to visit.

Actually normalization is a way of optimizing data. While optimizing, you ensure data consistency at the same time. There are six levels of normalization which you can find more details in the wikipedia.

There are six levels of normalization. However most applications use third normal form (3NF) and therefore we will be dealing with the first three levels of normalization in this post.

1. First Normal Form (1NF)

This is the most elementary requirement. It states that data (the domain of attribute) should be atomic. No composite values are allowed. The database should contain a separate table for multi-valued data.

For instance the table below is not in its first normal form (and most of you are sure that it is not a good idea to store n favorite colors in n separate columns)

personfavorite color 1favorite color 2favorite color 2
Aliceredgreenblue
Bobcyanmagentablack


Now let us normalize it.

person
Alice
Bob

personfavorite color
Alicered
Alicegreen
Aliceblue
Bobcyan
Bobmagenta
Bobblack

More organized, isn't it?

Now let us move to the next level of normalization.

2. Second Normal Form (2NF)

Second normal form requires the data to be in the first normal form.

A table is said to be in the 2NF if and only if it is in the 1NF and every non-key column is irreducably (i.e. not partially) dependent on the primary key (1)

(from #1) 2NF applies to tables that have composite primary keys, where two or more columns comprise the primary key. Because if the key were a singleton key, then the relation would be guaranteed to be in at least 2NF by definition (#1), provided that it fullfills the criteria of being 1NF. (2)

Let us clarify with an example:

order number (PK)client id (PK)client nameorder total
11alice$100
21alice$120
32bob$130
42bob$130


Here, setting order number and client id as a composite primary key is correct (since they fully determine the relation: a client may have more than one order). The order total is also fully functionally dependent on the primary key.

However "client name" is only dependent on the client id and thus breaks the second normal form (2NF).

Why is it that important? Let us assume we have billions of those records, and all of a sudden alice decides to change her name to "Felicia"(well it's her name, and she can change it as she wishes and demands it from a court). Now you need to update half a billion rows, changing all instances of "Alice" to "Felicia".

Now let us put it into 2NF.

order number (PK)client id (PK)order total
11$100
21$120
32$130
42$130


client id (PK)client name
1Alice
2Bob


Now instead of a billion updates, you only need to update a single row.

And let us see the third normal form (3NF) finally.


3. Third Normal Form (3NF)

Third normal form states that, the data should be in the second normal form (2NF) first of all.
And in addition there should be no dependencies of a non-key column on a field other than the primary key.

Let us give an example by slightly modifying our first table given in the 2NF example.

order number (PK)client nameclient cityorder total
1aliceistanbul$100
2aliceistanbul$120
3bobizmir$130
4bobizmir$130


This table is in 2NF by definition (#2). Since there is one primary key, all columns are irreducably dependent on that key.

However the table breaks 3NF because client city is dependent on the non-key attribute "client name". Thus we need to take it apart.

Here is the proper 3NF form:

order number (PK)client nameorder total
1alice$100
2alice$120
3bob$130
4bob$130


client nameclient city
aliceistanbul
bobizmir


Bottom Line

If all of these seem too complicated, just keep these two in mind and 99% of the time you are done:

  1. Group logical parts apart.

  2. Create relations to protect referential integrity.

Cheers.

 bu yaziyi sevdin mi?  hemen una ekle!

Tuesday, May 30, 2006

some node analysis #

I've created node maps of various pages using aharef's dom visualizer applet.

The results are really interesting to share.

Let me start with sarmal.com. Here is how the site currently looks like:



and here follows the corresponding node graph for it:


What do all those colors mean?

blue
: for links (the A tag)
red: for tables (TABLE, TR and TD tags)
green: for the DIV tag
violet: for images (the IMG tag)
yellow: for forms (FORM, INPUT, TEXTAREA, SELECT and OPTION tags)
orange: for linebreaks and blockquotes (BR, P, and BLOCKQUOTE tags)
black: the HTML tag, the root node
gray: all other tags

...

When I first saw it I was surprised how such a complicated-looking site has this simple node structure. It's obviously because the site's layout is CSS-based.

We'll examine the node graph deeper but just to reassure I'll show the node structure of a table-based site with equal visual complexity (I cannot give the URL because the site owner is my client)

Here is the node structure ot that site:



As you can see the structure is way too much complicated. The red nodes in the middle are for three nested tables. Other red nodes located on the main trunks are various placeholding TR and TD elements.

The upper right corner is another nested table group, nested within the master table (the master table actually consists of three tables nested inside each other)

In addition we see a heavy use of images to make the site look pretty (which can better be done using css backgrounds -- the purples nodes represent images)

Let us split our original graph to regions:



Region 4 is the main content (page body describing what "sarmal" is in Turkish)
Region 7 is the header part and Region 5 is the footer part

It's really nice to observe how divs (the green nodes) group content together.

Region 3 is the head part of the HTML.

Region 8 (the dotted one) is the side link container (on the left). You can see links grouped using unordered lists (Region 1 and Region 2 as two examples, there are three other)
imho the best technique for creating a semantic navigation is using unordered lists.

Finally Region 6, an unordered list grouping six paragraphs, is also inside the main body. Thus the main body of the page can be visualized as a trunk starting from region 4 ang going all the way to region 6. While the leaves (which are regions 5, 7, and 8) are header, footer and side navigation respectively.

It may not be the best semantic design in the world. But having seen all these I'm enchanted with the beauty of its structure.

...

The second site we'll observe is sardalya.

Here is how the site looks:



And its corresponding node structure:


...

Finally let us observe this blog:


Looking at the graph, you can easily see how content is centered around container divs (the page is an archive page which contains 5 blog posts, it is nice seeing how those five posts -- the clouds that are centered around green dots -- i.e. divs -- are grouped together)

And if we just click the permalink of a single post, its node structure would be:




Its amazing!
...or at least I think so.

 bu yaziyi sevdin mi?  hemen una ekle!

Wednesday, May 24, 2006

use EncodeURIComponent when possible #

Though I cannot frequently update this blog, I try to add useful stuff so that others may not spend hours of development time, from the things that I have suffered already.

Turgay says regarding my post about ajax and charset conversion

Thats the article what i really look for it. Thank you very much for your help. By the way, it would be better to use encodeURIComponent() rather than escape() method. Then there should not written client-side iso function.

Thank you for pointing this out. You're absolutely right.

Actually I've been using that method in s@rdalya . Especially in XHRequest object which is used to send AJAX requests to the server.

You may want to look at s@rdalya wiki for examples and api documentation.

Haven't you tried it yet? (%100 yerli malı)

As a sidenote; for those Turkish guys out there, you may find a related reading on ileriseviye.org.

Cheers.

 bu yaziyi sevdin mi?  hemen una ekle!

Tuesday, May 16, 2006

modifying default behavior for elements #

Brent repliest to my former "never rely on javascript" post, reminding us there may be legitimite uses of overriding the default behavior:

This is actually the second time I've run across this entry, and I agreed with it the first time, but now I'm not so sure.

I'm in a situation where I'm being asked to disable enter on a form only under a particular circumstance. Briefly, I have a drop-down search results box, a la Google Suggest, and the powers that be want the enter key to select an option from that box, rather than attempt to submit the form. I think this is a reasonable request. The points you bring up about not using Javascript for security are completely valid, but there are cases where we might want to modify the default behavior for non-security related reasons. These ought to be taken into account.

I also want to add that this form does degrade gracefully, and that accessibility is not compromised in this scenario, because enter is only disabled while the user is performing a specific task (selected an option from the menu).

Well I'm not against that. I've written it as a reply to that post. But I think it's worth mentioning here once more:

Javascript is to enhance functionality, not to create. That's okay.

And by giving enter key an alternative function only for a particular element, you enhance the way user interacts with your form.

Moreover, Brent indicates that he has also made sure that the form works equally well without JavaScript.

So he is not obstructing anything. On the contrary he is enhancing a form element's functionality.

Modifying the default behaviour, or modifying anything for that matter, is not bad if you know what you are doing and if you have a valid and reasonable explanation for it.

Which leads us to another discussion that being a standardista does not necessarily require being a standards blind:

don't be a standards blind!

And if you're interested, here is a follow up describing the intentional non-standard exceptions I do on my site.

 bu yaziyi sevdin mi?  hemen una ekle!

Wednesday, May 10, 2006

More on encoding and ajax #

This is a follow-up to my prior post about ajax and charset conversion.

If you have'nt read it you had better take a look at it. Else you may be lost from the beginning.

Let us see where we left:

We created a custom method that converts an improperly formatted iso8859-9 string to a properly formatted utf-8 string.

To remember here is our final method:

public static string Iso88599ToUTF8(string value)
{
return Encoding.GetEncoding("UTF-8").GetString(
Encoding.GetEncoding("ISO-8859-9").GetBytes(value)
);
}

Which converts a UTF-8 ajax response to a ISO-8859-9 Turkish string without loss.
(
provided that your request encoding and response encoding are set to ISO-8859-9 in your web.config file as well as you have proper globalization settings:

Here is the necessary part of my web.config:

[system.web>
[globalization
requestEncoding="iso-8859-9"
responseEncoding="iso-8859-9"
fileEncoding="iso-8859-9"
culture="tr-TR"
uiCulture="tr-TR" /]
[/system.web]

)

Well, I thought I had solved. But that was only the beginning of the story.

I needed to

1. Write this data to a DB (in which data was iso-8859-9 encoded)
2. Retrieve the data from the DB.

The first round was to convert the UTF-8 string that Iso88599ToUTF8 method above returns to an iso-8859-9 string (which is my request and response encoding)

public static string ProperUTF8ToIso88599(string properutf8string)
{
return Encoding.GetEncoding("ISO-8859-9").GetString(
Encoding.Convert(
Encoding.GetEncoding("UTF-8"),Encoding.GetEncoding("ISO-8859-9"),
Encoding.GetEncoding("UTF-8").GetBytes(properutf8string)
)
);
}


This method converts a properly encoded UTF-8 string to an ISO-8859-9 string.

Then to make things more modular I combined those two methods:

public static string UTF8AsIso88599ToProperIso88599(string utf8actinglikeiso88599)
{
return ProperUTF8ToIso88599(Iso88599ToUTF8(utf8actinglikeiso88599));
}

/* give the method a human-friendly alias */
public static string Ajaxify(string value)
{
return ResponseStringToServerString(UTF8AsIso88599ToProperIso88599(value));
}


That's it!
Calling Ajaxify on any mis-encoded UTF-8 string will create a properly encoded ISO-8859-9 string.


(if you are not lost up to this point, I assure you will in the next few paragraphs)


After working for hours, I found some other odd things about my web server.
  1. Although my response encoding is ISO-8859-9, the web application was sending an ISO-8859-1 encoded byte array to the database. Since data in the db is stored as a ISO-8859-9 encoded byte array this results in data loss when storing.
  2. Similarly, although my request encoding is ISO-8859-9,when the web application reads data from the database, it reads the bytes as if they were an ISO-8859-1 encoded stream (they are ISO-8859-9 encoded however). Thus another data loss.

Let us handle with each step separately:

step 1 - Writing to the database:
  1. If I extract the bytes of my iso-8859-9 ecoded string (using iso-8859-9 encoding); I will have an iso-8859-9 byte array (say it is byte[] b]
  2. Then if I convert those bytes to an iso-8859-1 encoded string, since the database tier will create an iso-8859-1 encoded byte array out of it; the bytes transferred to the db will be exactly identical to byte[] b, without any loss.
Hard to tell. Let us demonstrate it with code:

public static string ResponseStringToServerString(string value)

{
return Encoding.GetEncoding(
"ISO-8859-1").GetString(
Encoding.GetEncoding("ISO-8859-9"
).GetBytes(value)
);
}


The method will create an improperyl encoded iso-8859-1 string.

The server (actually the adodb command object) decodes it to a byte array when sending to the database with something like:

Encoding.GetEncoding("ISO-8859-1").GetBytes(strValue)

And streams it as a byte array.

let us plug the former method into this.

Encoding.GetEncoding("ISO-8859-1").GetBytes(
Encoding.GetEncoding(
"ISO-8859-1").GetString(
Encoding.GetEncoding("ISO-8859-9"
).GetBytes(value)
)

)


If you play it for some time you will see that the result is identical to

Encoding.GetEncoding("ISO-8859-9").GetBytes(value)

since the conversions cancel out. Which means that if we use

ResponseStringToServerString(strValue);

and strValue is an ISO-8859-9 string; then it will be automagically entered to the db as a ISO-8859-9 byte array.


That ends round #1.

step 2 - Reading from the database and displaying it:

I will not eloborate on this part much, since it is simply the inverse of what we do in the former step.

public static string ServerStringToResponseString(string value)
{
return Encoding.GetEncoding(
EnvironmentVariable.DatabaseCodePage).GetString(
Encoding.GetEncoding(EnvironmentVariable.DataReadCodePage
).GetBytes(value)
);
}


will convert the stream coming from the database (from a DataReader or some similar object), to a properly encoded ISO-8859-9 string.

And as a final step, let us give them human-friendly aliases:

public static string ToServer(string value)
{
return ResponseStringToServerString(value);
}

public static string ToResponse(string value)
{
return ResponseStringToServerString(value);
}


Bottom Line

Sorting out encoding issues is a real pain and I believe there are many more unique configuration-specific instances.

I tried to demonstrate how to approach to a particular encoding problem. What you do is mainly playing with various combinations of byte arrays, streams and strings.

What you need is mainly:
  • Luck
  • Concentration
  • Strong nerves
  • and patience
Just my 2c

If you can, avoid all those complications and use UTF as your default encoding. Then you will have more time on basic needs like eating, sleeping...

 bu yaziyi sevdin mi?  hemen una ekle!

Tuesday, May 09, 2006

A simple class to get the IP address of the visitor #

Recently I needed to get the IP address of the visitor to an ASP.Net (C#) web application to log into the a hit log table in the db.

Though it is fairly easy; it has some tricky points.




Here is the class to get the IP of the client:


using System.Web;

/*
* sarmal.com - 2006 (C)
* http://www.sarmal.com/
*/

namespace com.sarmal.net
{
public sealed class AddressInfo
{
public static string getIP(HttpRequest request)
{
string REMOTE_ADDR = request.ServerVariables["REMOTE_ADDR"];
/*
* if the request is redirected through
* a proxy, then the correct IP is this one.
*/
string HTTP_X_FORWARDED_FOR = request.
ServerVariables["HTTP_X_FORWARDED_FOR"];


if (HTTP_X_FORWARDED_FOR != null)
{
return HTTP_X_FORWARDED_FOR;
}
else
{
return REMOTE_ADDR;
}
}

private AddressInfo()
{
}
}
}





The code is self explanatory, I suppose.

calling AddressInfo.getIP(); anywhere in your asp.net web application will give the user's proper IP address.

Hope that it helps someone.

 bu yaziyi sevdin mi?  hemen una ekle!



Recent Posts

RSS

RSS register icon

Other Blogs

Archive

Various

Sponsor

Profile Information

Browser I Suggest

Sponsor

Dikkatimi Çekenler