First Normal Form

29.03.2017 |

Episode #5 of the course Relational database design by Ben Brumm

 

Welcome to the fifth lesson.

In this lesson, we’ll learn how to transform this design into something that’s efficient and well-designed using a process called “normalization.”

 

What is “Normalization”?

Normalization is the process of transforming a database design into a design that meets a set standard for quality.

There are three stages to normalization, and in each stage, we apply steps to our design to meet a certain format.

The stages are called:

• First normal form
• Second normal form
• Third normal form

To move a database design from an initial design, we ensure it meets a specific rule. This rule is the same for all databases, so if a database meets this rule, it is said to be in “first normal form.”

As we go through this course, we’ll step through each of these stages, showing you what the database looks like at each stage.

Once we get to third normal form, I’ll explain why this format is ideal and why it’s better than our original design.

 

First Normal Form

Our sample database looks like this:

Student (first name, last name, date of birth, address)

Subject (subject name, category)

Teacher (first name, last name, date of birth, address, subjects taught)

University (name, address)

Now, the rule of first normal form means:

Each set of columns must uniquely identify a row.

In this case, the columns are the attributes. I’ll start referring to entities as tables and attributes as columns.

Let’s start with our student table.

Student (first name, last name, date of birth, address)

Now, does the set of these columns uniquely identify a row? Does the combination make it unique?

If we have a student with a first name, last name, date of birth, and address, is that unique?

The answer is no.

There could be someone of the same name and same date of birth living at the same address. It’s rare but still possible. What can we use to uniquely identify this row? We can’t use any column individually or a combination of these columns (e.g., first and last name). A new column is needed. This new column will be an identifier—a unique value to identify a student.

This is called a “primary key.” Primary keys are used to uniquely identify a row. The value must be unique.

We’ll add a new column and call it “student ID.”

We often call our primary key fields IDs, and adding the name of the table into the field name makes it clear.

So, our table now looks like this, with our primary key underlined:

Student (student ID, first name, last name, date of birth, address)

Now, what about the subject table?

We could have the same subject name in the same category, so these fields don’t uniquely identify a row.

The subject name, or the category, isn’t unique.

Let’s make a new column called subject ID for the primary key.

Subject (subject ID, subject name, category)

The teacher table:

Teacher (first name, last name, date of birth, address, subjects taught)

It’s rare, but we could have two teachers with the same name, date of birth, address, and same subject. Let’s add a teacher ID column as the primary key for consistency.

Teacher (teacher ID, first name, last name, date of birth, address, subjects taught)

Now, our university table:

University (name, address)

Does this uniquely identify a row?

Yes. Two universities with the same name would have a different address. We don’t need a primary key, but creating one is up to you. I like to create primary keys in this instance. So let’s create a new column called university ID.

Our complete database looks like this:

Student (student ID, first name, last name, date of birth, address)

Subject (subject ID, subject name, category)

Teacher (teacher ID, first name, last name, date of birth, address, subjects taught)

University (university ID, name, address)

Our diagram looks like this:

 

In the next lesson, we’ll look at the relationships between tables.

Until next time,
Ben

 

Recommended book

“The Data Model Resource Book, Vol. 1: A Library of Universal Data Models for All Enterprises” by Len Silverston

 

Share with friends