8.4 Normalization
Normalization is a technique for producing a set of relations with desirable properties, given the data requirements of an enterprise (Connolly et. al 2002). Normalization always performed as a series of tests on relations to determine whether it satisfies the requirements of a given normal form. There are three normal forms that were normally used called first normal form (1NF), second normal form (2NF) and third normal form (3NF). All this normal forms are normally based on functional dependencies among attributes in a relation. In this section, we will explain these three types of normal forms. Before that, we need to explain the concept of functional dependencies.
8.4.1 Functional Dependencies
In order to explain the concept of functional dependencies, let us assume that we have a relation R. This relation R has several attributes named, a1, a2, a3 and a4. So,
R (a1, a2, a3, a4)
In this relation R, if a1 and a2 are the attributes of R, a2 is functionality dependent on a1 (denoted as a1-> a2). Meaning that a1 is determinant on a2. Figure 8-3 shows that a2 is functionality dependent on a1. Determinant refers to attributes or group of attributes on the left-hand side of the arrow of a functional dependency (Connolly et. al 2002).
8.4 Normalization
Figure 8-3: a1 is a determinant of a2
The rest of this section, we will use this example below. Assume that we have a Relation STUDENT with the following attributes :
STUDENT ( student_ID, student_name, program_code, course_id, course_name)
In this relation, noted that student_name is functionality dependent on student_ID. student_ID is the determinant of student_name as shown in figure below.
Figure 8-4: student_ID is a determinant of student_name
8.4 Normalization
8.4.2 First Normal Form
First normal form (1NF) is a relation in which the intersection of each row and column contains one and only one value. The process of normalization begins with a table in unnormalized form (UNF). To transform this unnormalized form to 1NF, we should identify and remove repeating groups in the table. A repeating group is an attribute or group of attributes in the table that occurs multiple values for a single occurrence of the key attribute for that table. Figure 8-5 shows there are repeating groups occurred when course data is repeated for each student.
Figure 8-5: A Table GRADE with repeating groups
8.4 Normalization
There are two approaches to remove repeating groups from unnormalized form:
| 1. |
Remove the repeating groups by entering an appropriate data in the empty columns containing repeating data. |
|
|
| 2. |
Remove the repeating groups by placing the repeating data, along with a copy of the original key attribute in a separate new relation. |
|
|
Using the first approach, we remove the repeating group by entering an appropriate data in each row. The result is shown in Figure 8-6. From here, we select StudId and CourseId as primary keys. Then, the STUDENT relation is defined as below :
GRADE(StudId, StudName,Tel, Major, CourseId, CourseTitle, TeachName, Teach Grade)
Figure 8-6: First Normal Form
8.4 Normalization
8.4.3 Second Normal Form
Second normal form (2NF) is based on the concept of full functional dependency. Second normal form (2NF) is a relation that is in 1NF and every non key attribute is fully functional dependent on the key. Full functional dependency is when in relation R, if a1 and a2 are the attributes of R, a2 is fully functionality dependent on a1 but not on any proper subset of a1. Partial functionality dependent is a condition when some attributes that can be removed from a1 and the dependencies still holds. From First Normal Form below, we identify partial and fully dependencies for each attribute. Figure 8-7 shows results for Second Normal Form and its relations.
 Figure 8-7: Second Normal Form |
 |
8.4 Normalization
8.4.4 Third Normal Form
Second normal form (2NF) has less redundancy than 1NF, but the redundancy are still exist. Third normal form (3NF) is a relation that is in first and second normal form and a condition where no non-key attribute is transitively dependent on the key. Third normal form (3NF) is based on the concept of transitive dependency. Transitive dependency is a condition where
a1,
a2 and
a3 are attributes of relation R. If
a1 -> a2 and
a2 -> a3, then
a3 is transitively dependent on
a1 via
a2. Transitive dependency is a condition where non key attributes is depends on non key attributes. So, we should remove all non key attributes with transitive dependencies to a new relation and copy together with its key attribute. Figure 8-8 shows Third Normal Form and its relations.
 Figure 8-7: Third Normal Form |
 |