Database normalization or normalization, is
the process of organizing the columns (attributes) and tables (relations) of a
relational database to reduce data redundancy and improve data integrity.
- 1NF - First Normal Form
First normal form enforces these criteria;
o Contains
only atomic value [value that cannot be divided OR Each field should contain
ONE value]
o Eliminate
repeating groups in individual tables.
o Create
a separate table for each set of related data.
o Identify
each set of related data with a primary key.[the table should contain at least
1 unique key (primary key)]
Employee Raw data:
EmployeeName
|
Degree
|
Krishnan
|
BSc, MSc, Phd
|
Raman
|
BSc, MSc
|
Durga
|
BSc
|
After
enforcing the First Normal Form criteria:
EmployeeName
|
Degree
|
Krishnan
|
BSc
|
Krishnan
|
MSc
|
Krishnan
|
Phd
|
Raman
|
BSc
|
Raman
|
MSc
|
Durga
|
BSc
|
- 2NF - Second Normal Form
A database is in second normal form if it
satisfies the following conditions:
o It
is in first normal form.
o All
non-key attributes are fully functional dependent on the primary key.
We see here, EmployeeName (non-key
attribute) can be identified by EmployeeID and Degree(non-key attribute) can be identified by
DegreeID.
After enforcing the second
normal form:
EmployeeID
|
EmployeeName
|
DegreeID
|
DegreeID
|
Degree
|
|
1
|
Krishnan
|
1
|
1
|
BSc
|
|
1
|
Krishnan
|
2
|
2
|
MSc
|
|
1
|
Krishnan
|
3
|
3
|
Phd
|
|
2
|
Raman
|
1
|
|||
2
|
Raman
|
2
|
|||
3
|
Durga
|
1
|
- 3NF - Third Normal Form
A database is in third normal form if it
satisfies the following conditions:
o It
is in second normal form.
o No non-key fields are dependent on another non-key
field.
We can see here, non-key attribute (City) is
dependent on another non-key attribute (Zip)
EmployeeID
|
EmployeeName
|
City
|
Zip
|
|
1
|
Krishnan
|
Mangalore
|
575001
|
|
2
|
Raman
|
Mangalore
|
575001
|
|
3
|
Durga
|
Mangalore
|
575002
|
|
4
|
Saratha
|
Coimbatore
|
641104
|
EmployeeID
|
EmployeeName
|
Zip
|
Zip
|
City
|
|
1
|
Krishnan
|
575001
|
575001
|
Mangalore
|
|
2
|
Raman
|
575001
|
575002
|
Mangalore
|
|
3
|
Durga
|
575002
|
641104
|
Coimbatore
|
|
4
|
Saratha
|
641104
|
|||
- BCNF - Boyce–Codd Normal Form
A database is in third normal form if it
satisfies the following conditions:
o It
is in third normal form.
o For
any non-trivial functional dependency, X→ A, X must be a super-key.
The above case, EmployeeID is
the super-key in the relation Employee detail and Zip is the super-key in the
relation ZipCodes. So,
EmployeeID -> EmployeeName, Zip and Zip -> City
- 4NF - Fourth Normal Form, For more details refer below article. https://en.wikipedia.org/wiki/Fourth_normal_form
- 5NF - Fifth Normal Form, For more details refer below article. https://en.wikipedia.org/wiki/Fifth_normal_form
- 6NF - Fifth Normal Form, For more details refer below article. https://en.wikipedia.org/wiki/Fifth_normal_form
Advantages of Normalization
|
1. Smaller database:
|
By eliminating duplicate data, you will be able to reduce the overall size of the database.
|
2. Better performance:
|
Narrow tables: Having more fine-tuned tables allows your tables to have less columns and allows you to fit more records per data page.
|
Fewer indexes per table mean faster maintenance tasks such as index rebuilds.
|
Only join tables that you need.
|
Disadvantages of Normalization
|
More tables to join: By spreading out your data into more tables, you increase the need to join tables.
|
Tables contain codes instead of real data: Repeated data is stored as codes rather than meaningful data. Therefore, there is always a need to go to the lookup table for the value.
|
Data model is difficult to query against: The data model is optimized for applications, not for ad hoc querying.
|
EXAMPLE:
Computer
|
Software
| ||||
1
|
Word
| ||||
2
|
Access, Word, Excel
| ||||
3
|
Word, Excel
| ||||
1 NF: Eliminate repeating groups.
| |||||
Computer
|
Software
| ||||
1
|
Word
| ||||
2
|
Access
| ||||
2
|
Word
| ||||
2
|
Excel
| ||||
3
|
Word
| ||||
3
|
Excel
| ||||
2 NF: Eliminate reduntant Data plus (1 NF) Eliminate repeating groups.
| |||||
Computer
|
Software
| ||||
1
|
Word
| ||||
2
|
Access
| ||||
2
|
Word
| ||||
2
|
Excel
| ||||
3
|
Word
| ||||
3
|
Excel
| ||||
To eliminate redundant data, we need to create 2 tables The first table which stores a reference SoftwareID to the New table that has unique list of Software titles.
| |||||
Computer
|
SoftwareID
|
SoftwareID
|
Software
| ||
1
|
1
|
1
|
Word
| ||
2
|
2
|
2
|
Access
| ||
2
|
3
|
3
|
Excel
| ||
2
|
3
| ||||
3
|
1
| ||||
3
|
3
| ||||
NF 3 : Eliminate columns not dependent on key plus 1NF and 2NF. In this table, we have data that contains both data about the computer and the user
| |||||
Computer
|
User Name
|
User Hire
|
Date Purchased
| ||
1
|
Joe
|
4/1/2000
|
5/1/2003
| ||
2
|
Mike
|
9/5/2003
|
6/15/2004
| ||
NF 3 : Steps : To eliminate columns not dependent on the key, we would create the following tables. Now the data stored in the computer table is only related to the computer, and the data stored in the user table is only related to the user
| |||||
Computer
|
Date Purchased
| ||||
1
|
5/1/2003
| ||||
2
|
6/15/2004
| ||||
UserID
|
User Name
|
User Hire
| |||
1
|
Joe
|
4/1/2000
| |||
2
|
Mike
|
9/5/2003
| |||
UserID
|
Computer
| ||||
1
|
1
| ||||
2
|
2
|
No comments:
Post a Comment