Normalization

Database normalization or normalization, is the process of organizing the columns (attributes) and tables (relations) of a relational database to reduce data redundancy and improve data integrity.


  • 1NF - First Normal Form

First normal form enforces these criteria;
o    Contains only atomic value. [value that cannot be divided OR Each field should contain ONE value]
o    Eliminate repeating groups in individual tables.
o    Create a separate table for each set of related data.
o    Identify each set of related data with a primary key [the table should contain at least 1 unique key (primary key)]

             
            Employee Raw data:

EmployeeName
Degree
Krishnan
BSc, MSc, Phd
Raman
BSc, MSc
Durga
BSc


            After enforcing the First Normal Form criteria:
EmployeeName
Degree
Krishnan
BSc
Krishnan
MSc
Krishnan
Phd
Raman
BSc
Raman
MSc
Durga
BSc



  • 2NF - Second Normal Form

A database is in second normal form if it satisfies the following conditions:
o    It is in first normal form.
o    All non-key attributes are fully functional dependent on the primary key.

We see here, EmployeeName (non-key attribute) can be identified by EmployeeID and Degree(non-key attribute) can be identified by DegreeID.

After enforcing the second normal form:

EmployeeID
EmployeeName
DegreeID
DegreeID
Degree
1
Krishnan
1
1
BSc
1
Krishnan
2
2
MSc
1
Krishnan
3
3
Phd
2
Raman
1
2
Raman
2
3
Durga
1


  • 3NF - Third Normal Form

A database is in third normal form if it satisfies the following conditions:
o    It is in second normal form.
o    No non-key fields are dependent on another non-key field.

We can see here, non-key attribute (City) is dependent on another non-key attribute (Zip)
EmployeeID
EmployeeName
City
Zip
1
Krishnan
Mangalore
575001
2
Raman
Mangalore
575001
3
Durga
Mangalore
575002
4
Saratha
Coimbatore
641104

 After enforcing the third normal form:
EmployeeID
EmployeeName
Zip
Zip
City
1
Krishnan
575001
575001
Mangalore
2
Raman
575001
575002
Mangalore
3
Durga
575002
641104
Coimbatore
4
Saratha
641104

  • BCNF - Boyce–Codd Normal Form

A database is in third normal form if it satisfies the following conditions:
o    It is in third normal form.
o    For any non-trivial functional dependency, X→ A, X must be a super-key

The above case, EmployeeID is the super-key in the relation Employee detail and Zip is the super-key in the relation ZipCodes. So,
EmployeeID -> EmployeeName, Zip and Zip -> City





Advantages of Normalization

1. Smaller database:
              By eliminating duplicate data, you will be able to reduce the overall size of the database.

2. Better performance:

            Narrow tables: Having more fine-tuned tables allows your tables to have less columns and allows you to fit more records per data page.

            Fewer indexes per table mean faster maintenance tasks such as index rebuilds.

            Only join tables that you need.
Disadvantages of Normalization
          
            More tables to join: By spreading out your data into more tables, you increase the need to join tables.

           Tables contain codes instead of real data: Repeated data is stored as codes rather than meaningful data. Therefore, there is always a need to go to the lookup table for the value.
           Data model is difficult to query against: The data model is optimized for applications, not for ad hoc querying.

  EXAMPLE:

Computer
Software
1
Word
2
Access, Word, Excel
3
Word, Excel
1 NF: Eliminate repeating groups.
Computer
Software
1
Word
2
Access
2
Word
2
Excel
3
Word
3
Excel
2 NF: Eliminate reduntant Data plus (1 NF) Eliminate repeating groups.
Computer
Software
1
Word
2
Access
2
Word
2
Excel
3
Word
3
Excel
To eliminate redundant data, we need to create 2 tables  The first table which stores a reference SoftwareID to the New table that has unique list of Software titles. 
Computer
SoftwareID
SoftwareID
Software
1
1
1
Word
2
2
2
Access
2
3
3
Excel
2
3


3
1


3
3


NF 3 : Eliminate columns not dependent on key plus 1NF and 2NF. In this table, we have data that contains both data about the computer and the user
Computer
 User Name
 User Hire
 Date Purchased
1
Joe
4/1/2000
5/1/2003
2
Mike
9/5/2003
6/15/2004
NF 3 : Steps : To eliminate columns not dependent on the key, we would create the following tables. Now the data stored in the computer table is only related to the computer, and the data stored in the user table is only related to the user
Computer
 Date Purchased
1
5/1/2003
2
6/15/2004
UserID
 User Name
 User Hire
1
Joe
4/1/2000
2
Mike
9/5/2003
UserID
Computer
1
1
2
2

No comments:

Post a Comment