Tuesday 15 November 2016

Entity Relationship Model


ER Model defines conceptual view of a database. It is a logical relationship of entities (or objects) in order to create a database.

The elements of ER Model are;

·         Entity – Entity can be real-world object. It is a collection of related attributes or properties.

Example: Employee, Department, etc.


o    Strong Entity –An entity that has a primary key is called as Strong entity. Rectangle represents strong entity.


Example: Employee, Department, etc, we can use “INNER JOIN” to join between tables.

o    Weak Entity –Weak entity doesn’t have key attribute (primary key) of its own.  Double rectangle represents weak entity.

Example: Employee, Dependant, Employee can exist without dependants but no dependent can exist without the employees. We must use “LEFT OUTER JOIN”/ “RIGHT OUTER JOIN” to join between entities.

Differences between Strong Entity and Weak Entity:


#
Strong Entity
Weak Entity
1
It has its own primary key.
It does not have sufficient attributes to form a primary key on its own.
2
It is represented by a rectangle.
It is represented by a double rectangle.
3
It contains a primary key represented by an eclipse with underline.
It contains partial key or discriminator represented by an eclipse with dashed underline.
4
The member of strong entity set is called as dominant entity set
The member of weak entity set is called as subordinate entity set.
5
The primary key is one of its attributes which uniquely identifies its member.
The primary key of weak entity set is combination of partial key and primary key of the strong entity set.
6
The relationship between two strong entity set is represented by a diamond symbol.
The relationship between one strong and weak entity set is represented by a double diamond symbol.
7
The line connecting strong entity set with identifying the relationship is single.
The line connecting weak entity set with identifying the relationship is double.
8
Total participation in the relationship may or may not exists
Total participation in the identifying relationship always exists.


·     Entity Set – It is a collection of similar type of entities. An entity set may contain entities with attribute sharing similar values.



Example: “Staff” set may contain all the staffs of a company from all employees.





·        Attribute - It is a property/element of an object. Entities are named/represented by the means of their properties.



Example: EmployeeName, EmployeeAge, EmployeeSalary, etc,



The various types of Attributes are;
o   Simple attribute - Simple attributes are atomic values, which cannot be divided further. Eclipse represents key attribute.


Example: An employee's phone number is an atomic value of 10 digits.Eclipse represents attribute.


o    Composite attribute - Composite attributes are made of more than one simple attribute.

Example: An employee's complete name may have FirstName and LastName.

o    Derived attribute - Derived attributes that do not exist in the physical database, but their values are derived from other attributes present in the database. It depicted by dased eclipse.

Example: AverageSalary in a department should not be saved directly in the database, instead it can be derived. Another example is, Age can be derived from DateOfBirth.

o    Single-value attribute - Single-value attributes contain single value. It depicted by eclipse with underlining.

Example:  EmployeePAN, EmployeeSSN, etc,
o    Multi-value attribute - Multi-value attributes may contain more than one value. It depicted by doubled eclipse.

Example: An employee can have more than one EmailAddress, PhoneNumber, etc,



·         Keys - Key is just a single attribute (which is just a column) that can uniquely identify a row.
Example: EmployeeID attribute of Employee table.
The various types of Keys are;
o    Super Key - A set of attributes (one or more) that collectively identifies an entity in an entity set. It’s any combination of column(s) for which that combination of values will be unique across all rows in a table.

   Example: EmployeeID is always unique in Employee table. Combination of EmployeeID and EmployeeName there will be no two rows in the table that share the exact same combination of values, because the EmployeeID will always be unique and different for each row.

o    Candidate Key - A minimal super key is called a candidate key. An entity set may have more than one candidate key. It is minimum number of columns that can be used to uniquely identify a single row. In other words, the minimum number of columns, which when combined, will give a unique value for every row in the table.

Example: EmployeeID is always unique in Employee table.

o    Primary Key - Only one super key is used to uniquely identify a single row of table is called primary key.

Example: EmployeeID is always unique in Employee table.

o    Foreign Key -  It is used to define relationship between two tables.

o    Alternate Key - Any table have more than one candidate key, then after choosing primary key from those candidate key, rest of candidate keys are known as an alternate key of that table.

  Example: We have a table named Employee which has two columns EmployeeID and EmployeeEmail, both have not null attributes and unique value. So both columns are treated as candidate key. Now we make EmployeeID as a primary key to that table then EmployeeEmail is known as alternate key.

o    Composite Key - Create keys with more than one column then that key is known as composite key. It is used to uniquely identify a single row of table.

   Example: Combination of (EmployeeID, DegreeID) columns uniquely identified as single row of EmployeeDegree table.



·         Mapping Cardinalities (Relations) - Cardinality defines the number of entities in one entity set, which can be associated with the number of entities of other set called relations.

Relationships are defined on the basis of matching key columns. In SQL server, these relationships are defined using Primary Key-Foreign Key constraints. Rhombus/Diamond represents relationship.  

There are three types of relationship that exist between Entities.

o    Binary Relationship - Relationship means relation between two Entities.
o    Unary/Recursive Relationship - An Entity is related with itself.
o    Ternary Relationship - Relationship of degree (number) three is called Ternary relationship.

The various types of Relationship are;

  • One to One [Binary Relationship]

One element of entity/table can be associated with at most one element of another entity(table) and vice versa.
Example: An employee must have one address.

  • One to Many [Binary Relationship]
One element of entity/table can be associated with more than one elements of another entity(table).
Example: An employee holds many degrees.
  • Many to One [Binary Relationship]
More than one elements of entity/table can be associated with at most one element of another entity(table).


Example: Many employees belong to one department.

  • Many to Many

One element of entity/table can be associated with more than one elements of another entity(table) and vice versa.


Example: Many employees are associated to one project and many projects are assigned to many employees.

  • Unary/Recursive Relationship
An entity is related with itself
 

Example: Employee table has columns like EmployeeID and ManagerID. Here EmployeeID is candidate key/ primary key of Employee table, Employee table’s ManagerID column is associated with same Employee table EmployeeID  

  • Ternary Relationship
Relationship of degree (number) three is called Ternary relationship.


                    









Example: Employees are associated to Projects and Technologies, Technologies associated to Projects. 



Entity Relationship Diagram


Entity Relationship Diagram is a graphical representation of logical relationship among entities (tables) in the database.

ER diagram depicts the attributes relationship among entities/entity set.

Symbols and Notations for ER Diagram are as follows.



Example for ER Diagram:

Try it out:


1) Understand the above diagram and write the requirements.

2) Draw an ERD for following system and create a database design.
- Allows to store number (ID), first name, last name, gender, date of birth, PAN, salary, reporting manager, street address, city and zip and qualifications/degrees (SSLC, PUC, BSc, MSc, Phd) of employees. (number would be used to identify the unique record of employee)
- Allows storing number (ID), name and technologies (PHP, SQL, HTML) of projects. (number would be used to identify the unique record of project)
- Allows storing identification number, name, street address, city and zip of company (number would be used to identify the unique record of company)
- Employee can exist without project,
- Allows storing percentage of allocation of each project to the assigned Employee
- Employees must belong to a company
- Qualifications must be stored in a separate entity named as degree 
- Employee holds one or many qualifications
- Allows storing multiple technologies of project  
- Allows getting the average salary of employees
- Search allows for getting the employee record based on first name and last name 


  

Normalization

Database normalization or normalization, is the process of organizing the columns (attributes) and tables (relations) of a relational database to reduce data redundancy and improve data integrity.

  • 1NF - First Normal Form
First normal form enforces these criteria;
o  Contains only atomic value [value that cannot be divided OR Each field should contain ONE value]
o    Eliminate repeating groups in individual tables.
o    Create a separate table for each set of related data.
o    Identify each set of related data with a primary key.[the table should contain at least 1 unique key (primary key)]

             
            Employee Raw data:

EmployeeName
Degree
Krishnan
BSc, MSc, Phd
Raman
BSc, MSc
Durga
BSc


            After enforcing the First Normal Form criteria:
EmployeeName
Degree
Krishnan
BSc
Krishnan
MSc
Krishnan
Phd
Raman
BSc
Raman
MSc
Durga
BSc


  • 2NF - Second Normal Form
A database is in second normal form if it satisfies the following conditions:
o    It is in first normal form.
o    All non-key attributes are fully functional dependent on the primary key.

We see here, EmployeeName (non-key attribute) can be identified by EmployeeID and Degree(non-key attribute) can be identified by DegreeID.

After enforcing the second normal form:

EmployeeID
EmployeeName
DegreeID
DegreeID
Degree
1
Krishnan
1
1
BSc
1
Krishnan
2
2
MSc
1
Krishnan
3
3
Phd
2
Raman
1
2
Raman
2
3
Durga
1


  • 3NF - Third Normal Form
A database is in third normal form if it satisfies the following conditions:
o    It is in second normal form.
o    No non-key fields are dependent on another non-key field.

We can see here, non-key attribute (City) is dependent on another non-key attribute (Zip)
EmployeeID
EmployeeName
City
Zip
1
Krishnan
Mangalore
575001
2
Raman
Mangalore
575001
3
Durga
Mangalore
575002
4
Saratha
Coimbatore
641104

 After enforcing the third normal form:
EmployeeID
EmployeeName
Zip
Zip
City
1
Krishnan
575001
575001
Mangalore
2
Raman
575001
575002
Mangalore
3
Durga
575002
641104
Coimbatore
4
Saratha
641104

  • BCNF - Boyce–Codd Normal Form
A database is in third normal form if it satisfies the following conditions:
o    It is in third normal form.
o    For any non-trivial functional dependency, X→ A, X must be a super-key.

The above case, EmployeeID is the super-key in the relation Employee detail and Zip is the super-key in the relation ZipCodes. So,
EmployeeID -> EmployeeName, Zip and Zip -> City


Advantages of Normalization

1. Smaller database:
              By eliminating duplicate data, you will be able to reduce the overall size of the database.

2. Better performance:

            Narrow tables: Having more fine-tuned tables allows your tables to have less columns and allows you to fit more records per data page.

            Fewer indexes per table mean faster maintenance tasks such as index rebuilds.

            Only join tables that you need.
Disadvantages of Normalization
          
            More tables to join: By spreading out your data into more tables, you increase the need to join tables.

           Tables contain codes instead of real data: Repeated data is stored as codes rather than meaningful data. Therefore, there is always a need to go to the lookup table for the value.
           Data model is difficult to query against: The data model is optimized for applications, not for ad hoc querying.

  EXAMPLE:


Computer
Software
1
Word
2
Access, Word, Excel
3
Word, Excel
1 NF: Eliminate repeating groups.
Computer
Software
1
Word
2
Access
2
Word
2
Excel
3
Word
3
Excel
2 NF: Eliminate reduntant Data plus (1 NF) Eliminate repeating groups.
Computer
Software
1
Word
2
Access
2
Word
2
Excel
3
Word
3
Excel
To eliminate redundant data, we need to create 2 tables  The first table which stores a reference SoftwareID to the New table that has unique list of Software titles. 
Computer
SoftwareID
SoftwareID
Software
1
1
1
Word
2
2
2
Access
2
3
3
Excel
2
3


3
1


3
3


NF 3 : Eliminate columns not dependent on key plus 1NF and 2NF. In this table, we have data that contains both data about the computer and the user
Computer
 User Name
 User Hire
 Date Purchased
1
Joe
4/1/2000
5/1/2003
2
Mike
9/5/2003
6/15/2004
NF 3 : Steps : To eliminate columns not dependent on the key, we would create the following tables. Now the data stored in the computer table is only related to the computer, and the data stored in the user table is only related to the user
Computer
 Date Purchased
1
5/1/2003
2
6/15/2004
UserID
 User Name
 User Hire
1
Joe
4/1/2000
2
Mike
9/5/2003
UserID
Computer
1
1
2
2