Introduction: Normalization of a Database to Third Normal Form

The estimated time to complete this instructable, Including the end excercise should be anywhere from 10 to 20 minutes

What Is Normalization?

Normalization is the process of removing redundancies and dependencies from a database. This instructable will explain how to normalize a database to 3rd normal form.

Step 1: Terms to Be Covered

Anomalies: Update, Edit, & Delete

Multi-value attributes

Functional Dependency

Transitive Dependency

Primary Keys

Concatenate Keys

Foreign Keys

First Normal Form

Second Normal Form

Third Normal Form

Step 2: Anomalies

In order to explain why you may want to implement normalization to your database, it is important to know the general flaws a non-normalized database may encounter.

A database can be affected by three anomalies: update, insert, and delete.

Update Anomaly

The update anomaly occurs when we try to update a field in the database. If there happens to be multiple rows of data that include a related attribute, then it would be necessary to update each individual field in the rows that were affected if an update were to occur.

Insert Anomaly

The insert anomaly occurs when a new data set cannot be inserted into the database because it requires a field that, at the moment, does not have an attribute associated with it.

Delete Anomaly

The delete anomaly occurs when there is a row of data in which deleting one attribute results in the deletion of another attribute. To combat this, a person would generally place a dummy variable as a placeholder, which is not the most efficient means of storing data.

Think about it

Assume the table above is your entire database. Think about how and why this database is flawed in respects to the three anomalies. For example, if you needed to delete the fifth order, you would consequently also delete the customer Tom and all his information from your system.

Step 3: First Normal Form

First normal form is the method in which to remove multi-valued attributes from a data set.

What is an attribute?

An attribute is a characteristic of the data in the table, describing a field or cell in a table.

Multi-value Attribute

A multi-value attribute occurs when a single field serves as a placeholder for multiple attributes. An example of this can be seen in the figure above.

In the figure above, the attribute of children can have multiple values. This is an example of a multi-valued attribute, and the use of them can restrict the functionality of a database. If a query was created to figure out what children belong to john, the database could provide the correct answer.

However, if we were to try and create a query that was to determine which parent Adam belonged to, it would fail to give a response. This happens because when the query reads the attribute of Johns children, it will read "Adam, Sam, Dean". It will not be able to separate the multiple values given.

Step 4: Second Normal Form

The requirements to meet second normal form is that the database must be in first normal form and have full functional dependency.

Functional Dependency

Functional dependency occurs when all non-key attributes are dependent on the primary key. So if a table has only one primary key, it is fully functional dependent. The figure above does not meet the requirements of second normal form because the non-primary attribute (Item Name) is only dependent on the primary key (Item #). This one table should be broken into two.

What is a primary key?

A primary key is a unique identifier for each row of data in a relational database. In addition, to be a primary key, the the identifier can never be null, or empty.

Concatenated Key

Generally a primary key is one attribute, but is some instances a table will require the combination of two or more attribute fields to uniquely identify a row of data. This combination is called a concatenated primary key. A good example for the use of a concatenated key would be in a purchase order of more than one item. To record the quantity of each item purchased, it would be necessary to identify each record with the order number and the product number.

What is a foreign key?

A foreign key is a primary key from another table.

Step 5: Third Normal Form

A database is in third normal form when it meets the requirements of second normal form, in addition to having no transitive functional dependencies.

Transitive Functional Dependency

A transitive dependency is when one field (State) is functionally dependent to another field (Zip), which is functionally dependent on another field (Name), Such that (Name) is transitively dependent to (State). This can be seen in the figure above. The state is functionally dependent on the zip code, and the zip code is functionally dependent on the person.

To remove transitive functional dependency means that all non-key attributes are functionally dependent on the primary key.

Step 6: Practicing First Normal Form

To test your knowledge of what you have learned about first normal form:

- identify any problems in the table that breaks the rule of first normal form.

- with a piece of paper and pencil, separate the table above into multiple tables that meet the requirements of first normal form.

Note: for this example, just assume that only one item can be placed on a single order.

The next step will show the above table normalized to 3rd normal form.

Step 7: How Did You Do?

Here is how the previous table could be broken down into first normal form, assuming you made the same assumptions. The original table did not have any multi-value attributes, and was already in third normal form. To convert it into third normal form, it was necessary to break the table into multiple tables where all non-key attributes were dependent on only the primary key.

Want more Info?

Feel free to check out these sites for more information on database normalization:

http://databases.about.com/od/specificproducts/a/n...

http://www.1keydata.com/database-normalization/