SQL Tutorials: Normalization

Database normalization or normalization, is the process of organizing the columns (attributes) and tables (relations) of a relational database to reduce data redundancy and improve data integrity.

1NF - First Normal Form

First normal form enforces these criteria;

o Contains only atomic value. [value that cannot be divided OR Each field should contain ONE value]

o Eliminate repeating groups in individual tables.

o Create a separate table for each set of related data.

o Identify each set of related data with a primary key [the table should contain at least 1 unique key (primary key)]

Employee Raw data:

EmployeeName	Degree
Krishnan	BSc, MSc, Phd
Raman	BSc, MSc
Durga	BSc

After enforcing the First Normal Form criteria:

EmployeeName	Degree
Krishnan	BSc
Krishnan	MSc
Krishnan	Phd
Raman	BSc
Raman	MSc
Durga	BSc

2NF - Second Normal Form

A database is in second normal form if it satisfies the following conditions:

o It is in first normal form.

o All non-key attributes are fully functional dependent on the primary key.

We see here, EmployeeName (non-key attribute) can be identified by EmployeeID and Degree(non-key attribute) can be identified by DegreeID.

After enforcing the second normal form:

EmployeeID	EmployeeName	DegreeID	DegreeID	Degree
1	Krishnan	1	1	BSc
1	Krishnan	2	2	MSc
1	Krishnan	3	3	Phd
2	Raman	1
2	Raman	2
3	Durga	1

3NF - Third Normal Form

A database is in third normal form if it satisfies the following conditions:

o It is in second normal form.

o No non-key fields are dependent on another non-key field.

We can see here, non-key attribute (City) is dependent on another non-key attribute (Zip)

EmployeeID	EmployeeName	City	Zip
1	Krishnan	Mangalore	575001
2	Raman	Mangalore	575001
3	Durga	Mangalore	575002
4	Saratha	Coimbatore	641104

After enforcing the third normal form:

EmployeeID	EmployeeName	Zip	Zip	City
1	Krishnan	575001	575001	Mangalore
2	Raman	575001	575002	Mangalore
3	Durga	575002	641104	Coimbatore
4	Saratha	641104

BCNF - Boyce–Codd Normal Form

A database is in third normal form if it satisfies the following conditions:

o It is in third normal form.

o For any non-trivial functional dependency, X→ A, X must be a super-key

The above case, EmployeeID is the super-key in the relation Employee detail and Zip is the super-key in the relation ZipCodes. So,

EmployeeID -> EmployeeName, Zip and Zip -> City

4NF - Fourth Normal Form, For more details refer below article. https://en.wikipedia.org/wiki/Fourth_normal_form

5NF - Fifth Normal Form, For more details refer below article. https://en.wikipedia.org/wiki/Fifth_normal_form

6NF - Fifth Normal Form, For more details refer below article. https://en.wikipedia.org/wiki/Fifth_normal_form

Advantages of Normalization

1. Smaller database:

By eliminating duplicate data, you will be able to reduce the overall size of the database.

2. Better performance:

Narrow tables: Having more fine-tuned tables allows your tables to have less columns and allows you to fit more records per data page.

Fewer indexes per table mean faster maintenance tasks such as index rebuilds.

Only join tables that you need.

Disadvantages of Normalization

More tables to join: By spreading out your data into more tables, you increase the need to join tables.

Tables contain codes instead of real data: Repeated data is stored as codes rather than meaningful data. Therefore, there is always a need to go to the lookup table for the value.

Data model is difficult to query against: The data model is optimized for applications, not for ad hoc querying.

EXAMPLE:

Computer	Software
1	Word
2	Access, Word, Excel
3	Word, Excel

1 NF: Eliminate repeating groups.

Computer	Software
1	Word
2	Access
2	Word
2	Excel
3	Word
3	Excel

2 NF: Eliminate reduntant Data plus (1 NF) Eliminate repeating groups.

Computer	Software
1	Word
2	Access
2	Word
2	Excel
3	Word
3	Excel

To eliminate redundant data, we need to create 2 tables The first table which stores a reference SoftwareID to the New table that has unique list of Software titles.

Computer	SoftwareID		SoftwareID	Software
1	1		1	Word
2	2		2	Access
2	3		3	Excel
2	3
3	1
3	3

NF 3 : Eliminate columns not dependent on key plus 1NF and 2NF. In this table, we have data that contains both data about the computer and the user

Computer	User Name	User Hire	Date Purchased
1	Joe	4/1/2000	5/1/2003
2	Mike	9/5/2003	6/15/2004

NF 3 : Steps : To eliminate columns not dependent on the key, we would create the following tables. Now the data stored in the computer table is only related to the computer, and the data stored in the user table is only related to the user

Computer	Date Purchased
1	5/1/2003
2	6/15/2004

UserID	User Name	User Hire
1	Joe	4/1/2000
2	Mike	9/5/2003

UserID	Computer
1	1
2	2

SQL Tutorials

Navigation

Normalization

No comments:

Post a Comment

Pages

Pages

Pages