This is the third in a series of posts teaching normalization.
The second post focused on the first normal form, its definition, and examples to hammer it home.
Now it is time to take a look at the second normal form. I like to think the reason we place tables in 2nd normal form is to narrow them to a single purpose. Doing so brings clarity to the database design, makes it easier for us to describe and use a table, and tends to eliminate modification anomalies.
This stems from the primary key identifying the main topic at hand, such as identifying buildings, employees, or classes, and the columns, serving to add meaning through descriptive attributes.
An EmployeeID
isn’t much on its own, but add a name, height, hair color and age, and now you’re starting to describe a real person.
So what is the definition of 2nd normal form?
2NF – Second Normal Form Definition
A table is in 2nd Normal Form if:
- The table is in 1st normal form, and
- All the non-key columns are dependent on the table’s primary key.
We already know about the 1st normal form, but what about the second requirement? Let me try to explain.
The primary key provides a means to uniquely identify each row in a table. When we talk about columns depending on the primary key, we mean that in order to find a particular value, such as what color is Kris’ hair, you would first have to know the primary key, such as an EmployeeID
, to look up the answer.
Once you identify a table’s purpose, then look at each of the table’s columns and ask yourself, “Does this column serve to describe what the primary key identifies?”
- If you answer “yes,” then the column is dependent on the primary key and belongs in the table.
- If you answer “no,” then the column should be moved to a different table.
When all the columns relate to the primary key, they naturally share a common purpose, such as describing an employee. That is why I say that when a table is in second normal form, it has a single purpose, such as storing employee information.
Issues with our Example Data Model
So far, we have taken our example to the first normal form, and it has several issues.
The first issue is the SalesStaffInformation
table has two columns which aren’t dependent on the EmployeeID
. Though they are used to describe which office the SalesPerson
is based out of, the SalesOffice
and OfficeNumber
columns themselves don’t serve to describe who the employee is.
The second issue is that there are several attributes which don’t completely rely on the entire Customer
table primary key. For a given customer
, it doesn’t make sense that you should have to know both the CustomerID
and EmployeeID
to find the customer
.
It stands to reason you should only need to know the CustomerID
. Given this, the Customer
table isn’t in 2nd normal form as there are columns that aren’t dependent on the full primary key. They should be moved to another table.
These issues are identified below in red.
Fix the Model to 2NF Standards
Since the columns identified in red aren’t completely dependent on the table’s primary key, it stands to reason they belong elsewhere. In both cases, the columns are moved to new tables.
In the case of SalesOffice
and OfficeNumber
, a SalesOffice
was created. A foreign key was then added to SalesStaffInformation
so we can still describe in which office a sales person is based.
The changes to make Customer
a second normal form table are a little trickier. Rather than move the offending columns CustomerName
, CustomerCity
, and CustomerPostalCode
to new table, recognize that the issue is EmployeeID
! The three columns don’t depend on this part of the key. Really, this table is trying to serve two purposes:
- To indicate which customers are called upon by each employee
- To identify customers and their locations.
For the moment, remove EmployeeID
from the table. Now the table’s purpose is clear, it is to identify and describe each customer
.
Now let’s create a table named SalesStaffCustomer
to describe which customer
s a sales person calls upon. This table has two columns, CustomerID
and EmployeeID
. Together, they form a primary key. Separately, they are foreign keys to the Customer
and SalesStaffInformation
tables respectively.
With these changes made, the data model, in second normal form, is shown below:
To better visualize this, here are the tables with data.
As you review the data in the tables, notice that the redundancy is mostly eliminated. Also, see if you can find any update, insert, or deletion anomalies. Those too are gone. You can now eliminate all the sales people, yet retain customer records. Also, if all the SalesOffices
close, it doesn’t mean you have to delete the records containing sales people.
The SalesStaffCustomer
table is a strange one. It’s just all keys! This type of table is called an intersection table. An intersection table is useful when you need to model a many-to-many relationship.
Each column is a foreign key. If you look at the data model, you’ll notice that there is a one to many relationship to this table from SalesStaffInformation
and another from Customer
. In effect, the table allows you to bridge the two tables together.
For all practical purposes, this is a pretty workable database. Three out of the four tables are even in third normal form, but there is one table which still has a minor issue, preventing it from being so.
More tutorials are to follow! Remember! I want to remind you all that if you have other questions you want answered, then post a comment or tweet me. I’m here to help you. What other topics would you like to know more about?