Database Normalisation – a review of the rules to refresh my knowledge

Here is a review of the 6 forms of normalisation (yes the 4th is called 3.5 for some reason) – there are lots of summaries on the web of these, but I wanted to re-write them in my own words to ensure I understand them fully. Revision of these post real world experience is particularly enlightening.

Normalisation rules are cumulative ie a database or set of tables can be said to be 1 through to 5 compliant. To rank at the highest level it must comply with all previous forms.

That is to say that a database (or table) to be in the 2nd Normal Form must first fulfill all the criteria of the 1st Normal Form.

A database is said to be properly normalised if the first three normalised forms are inviolate.

Violations of normalisation forms subsequent to the third normal form may be legitimately required and as such are NOT absolutely required for a database to be classified as properly normalised.

Some web explanations do not list forms post 3 I would argue there is still a lot of value in trying to adhere to these less well forms but must admit to not always including them in design usually due to poor knowledge of the application domain.

The differing forms of normalisation start simple becoming increasingly complicated.
If you have a lot of experience in creating relational backends you may have deduced these issues for yourself.

(1NF) First Normal Form – Edgar F Codd – first stated 1970
Eliminate duplicate columns from the same table.
Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).

(2NF) Second Normal Form – Edgar F Codd – first stated 1971
No partial dependency on anything but the key fields

(3NF) Third Normal Form – Edgar F Codd – first stated 1971
Every non prime attribute of the primary key is non transitively dependent on every key of R. The example I saw was that if you had winners of wimbledon some of which repeated you would not store there birth dates. You would have a junction table of the winners and link to the names of the individuals and it would be in this table that you would store the dates of birth.

The 3.5 Normal Form or the Boyce Codd Normal Form
This is a slightly stronger version of the third normal form developed in 1974 by Raymond F. Boyce and Edgar F. Codd to address certain types of anomaly not dealt with by the 3rd Normal form as originally defined.

Only in rare cases does a 3rd Normal Form table not meet the requirements of Boyd Codd Normal Form. A 3rd Normal Form table which does not have multiple overlapping candidate keys is guaranteed to be in Boyce Codd Normal Form.

Web investigation indicates that the most concise description is;
Every determinant must be a candidate key

and that Boyd Codd Normalised Form is not always possible

Nearest Shops
Person Shop Type Nearest Shop
Mark Jewellers H Samuel
Mark Barbers Mr Man
Natalia Coffee Shop Starbucks
John Coffee Shop Costa
John Barbers Mr Man
John Jewellers H Samuel

The issue above is that Mr Man on record 2 could be changed to a Hairdressers for shop type but the second last record John would not change and as such it would seem to be inconsistent. In this case I would split this table into two with Nearest Shop and Shop Type being linked and there being some kind of table that links People to Nearest shops.

The 4th Normal Form
A table is in the 4th Normal Form if it has no multi-valued dependencies.
Definition: Multivalued dependencies occur when the presence of one or more rows in a table implies the presence of one or more other rows in that same table.

For example lets say you have a you have a shop selling coffee – lets call it Starbucks and you have a table that lists the permutations of cofee sold in its store
CoffeeShop Coffee Product Location
Starbucks Café Late Livingston
Starbucks Americano Livingston
Starbucks Café Late Edinburgh
Starbucks Americano Edinburgh

If we know that all Starbucks sell the same products if there is a new product in one shop then we would expect that new product to be in the other shops so adding a Skinny Late to the Livingston shop implies that there should be another record for Skinny Lates in the Edinburgh shop.

To eliminate the possibility of anomalies we must place the facts about the products and the location of the shops into two tables

Products
Starbucks Café Late
Starbucks Americano
Starbucks Skinny Late

Location of Coffee shops
Starbucks Livingston
Starbucks Edinburgh

Alternatively if the coffee types did vary from location to location the original table would satisfy 4NF

The 5th Normal Form
A table is in fifth normal form or Project-Join Normal Form if it is in the Fourth Normal Form and it cannot have a loss-less decomposition into any number of smaller tables.

Or as I like to call it the – I know about it and try to adhere to it but sometimes I’m a bit lazy form.

I certainly try to keep my designs to the full six normalisation principles but there are instances where certain tables do not comply.