This is the second in a series of posts teaching normalization.
The first post introduced database normalization, its importance, and the types of issues it solves.
In this article, we’ll explore the first normal form. For the examples, we’ll use the Sales Staff Information shown below as a starting point. As we pointed out in the last post’s modification anomalies section, there are several issues with keeping the information in this form. By normalizing the data, we’ll eliminate duplicate data as well as modification anomalies.
1NF – First Normal Form Definition
The first step in making a proper SQL table is to ensure the information is in first normal form. Once a table is in first normal form, it is easier to search, filter, and sort the information. The rules to satisfy first normal form are:
- The data is in a database table. The table stores information in rows and columns, where one or more columns, called the primary key, uniquely identify each row.
- Each column contains atomic values, and there are no repeating groups of columns.
Tables in first normal form cannot contain sub-columns. That is, if you are listing several cities, you cannot list them in one column and separate them with semicolons. When a value is atomic, it cannot be meaningfully subdivided any further. For example, the value “Chicago” is atomic, whereas “Chicago; Los Angeles; New York” is not. Related to this requirement is the concept that a table should not contain repeating groups of columns such as Customer1Name, Customer2Name, and Customer3Name.
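To make the atomic-value rule concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names OfficePacked and OfficeCity, the column names, and the sample rows are all hypothetical and chosen only for illustration. It stores the same cities once as a semicolon-separated list and once as one city per row, then runs the same equality search against each.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: several cities packed into one column (not atomic).
conn.execute(
    "CREATE TABLE OfficePacked (EmployeeID INTEGER PRIMARY KEY, Cities TEXT)"
)
conn.executemany(
    "INSERT INTO OfficePacked VALUES (?, ?)",
    [(1, "Chicago; Los Angeles; New York"), (2, "Chicago")],
)

# Hypothetical 1NF version: one atomic city per row.
conn.execute(
    "CREATE TABLE OfficeCity ("
    " EmployeeID INTEGER, City TEXT, PRIMARY KEY (EmployeeID, City))"
)
packed_rows = conn.execute("SELECT EmployeeID, Cities FROM OfficePacked").fetchall()
for employee_id, cities in packed_rows:
    for city in cities.split(";"):
        conn.execute(
            "INSERT INTO OfficeCity VALUES (?, ?)", (employee_id, city.strip())
        )

# The same equality search behaves very differently on the two designs.
packed_matches = conn.execute(
    "SELECT EmployeeID FROM OfficePacked WHERE Cities = 'New York'"
).fetchall()
atomic_matches = conn.execute(
    "SELECT EmployeeID FROM OfficeCity WHERE City = 'New York'"
).fetchall()

print(packed_matches)  # the packed column never equals a single city
print(atomic_matches)  # the atomic column matches directly
```

With the packed column you would have to fall back on pattern matching or string splitting to find a city, which is exactly the kind of work atomic values avoid.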
Our example table is transformed to first normal form by placing the repeating customer-related columns into their own table. This is shown below:
The repeating groups of columns now become separate rows in the Customer table, linked by the EmployeeID foreign key. As mentioned in the lesson on Data Modeling, a foreign key is a value which matches back to another table’s primary key. In this case, the Customer table contains the corresponding EmployeeID for the SalesStaffInformation row. Here is our data in first normal form.
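The split described above can be sketched in code. This is only an illustration under assumptions: SalesStaffOriginal, CustomerID, CustomerName, and the sample rows are hypothetical names and data, and the real tables contain more columns than shown here. The sketch uses Python's sqlite3 module to move the three repeating customer columns into rows of a Customer table that points back to SalesStaffInformation through EmployeeID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# Hypothetical original shape: three repeating customer columns per salesperson.
conn.execute("""
    CREATE TABLE SalesStaffOriginal (
        EmployeeID    INTEGER PRIMARY KEY,
        SalesPerson   TEXT,
        Customer1Name TEXT,
        Customer2Name TEXT,
        Customer3Name TEXT
    )""")
conn.executemany(
    "INSERT INTO SalesStaffOriginal VALUES (?, ?, ?, ?, ?)",
    [(1, "Alice", "Acme", "Birch", "Cobalt"), (2, "Bob", "Delta", None, None)],
)

# First normal form: staff in one table, one row per customer in another.
conn.execute("""
    CREATE TABLE SalesStaffInformation (
        EmployeeID  INTEGER PRIMARY KEY,
        SalesPerson TEXT
    )""")
conn.execute("""
    CREATE TABLE Customer (
        CustomerID   INTEGER PRIMARY KEY,  -- hypothetical surrogate key
        EmployeeID   INTEGER NOT NULL REFERENCES SalesStaffInformation(EmployeeID),
        CustomerName TEXT NOT NULL
    )""")

conn.execute(
    "INSERT INTO SalesStaffInformation "
    "SELECT EmployeeID, SalesPerson FROM SalesStaffOriginal"
)
# Column names come from this fixed tuple, so formatting them into SQL is safe here.
for column in ("Customer1Name", "Customer2Name", "Customer3Name"):
    conn.execute(
        f"INSERT INTO Customer (EmployeeID, CustomerName) "
        f"SELECT EmployeeID, {column} FROM SalesStaffOriginal "
        f"WHERE {column} IS NOT NULL"
    )

customers = conn.execute(
    "SELECT EmployeeID, CustomerName FROM Customer "
    "ORDER BY EmployeeID, CustomerName"
).fetchall()
print(customers)
```

Each customer now occupies its own row, and empty customer slots simply produce no row instead of a NULL column.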
This design is superior to our original table in several ways:
- The original design limited each SalesStaffInformation entry to three customers. In the new design, the number of customers associated with each entry is practically unlimited.
- It was nearly impossible to sort the original data by customer. You could, if you used UNION, but it would be cumbersome. Now, it is simple to sort customers.
- The same holds true for filtering on the Customer table. It is much easier to filter on one customer name column than three.
- The insert and deletion anomalies for Customer have been eliminated. You can delete all the customers for a SalesPerson without having to delete the entire SalesStaffInformation row.
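The sorting, filtering, and deletion points above can be compared side by side. The sketch below is an illustration only: SalesStaffOriginal, CustomerID, CustomerName, and the sample rows are hypothetical. It sorts customers from the original layout with UNION ALL, runs the equivalent query against the Customer table, then filters by name and deletes one salesperson's customers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical original layout with repeating customer columns.
    CREATE TABLE SalesStaffOriginal (
        EmployeeID INTEGER PRIMARY KEY, SalesPerson TEXT,
        Customer1Name TEXT, Customer2Name TEXT, Customer3Name TEXT);
    INSERT INTO SalesStaffOriginal VALUES (1, 'Alice', 'Cobalt', 'Acme', 'Birch');
    INSERT INTO SalesStaffOriginal VALUES (2, 'Bob', 'Delta', NULL, NULL);

    -- Hypothetical first normal form layout.
    CREATE TABLE SalesStaffInformation (
        EmployeeID INTEGER PRIMARY KEY, SalesPerson TEXT);
    CREATE TABLE Customer (
        CustomerID INTEGER PRIMARY KEY,
        EmployeeID INTEGER NOT NULL REFERENCES SalesStaffInformation(EmployeeID),
        CustomerName TEXT NOT NULL);
    INSERT INTO SalesStaffInformation VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO Customer (EmployeeID, CustomerName) VALUES
        (1, 'Cobalt'), (1, 'Acme'), (1, 'Birch'), (2, 'Delta');
""")

# Sorting the original layout: one SELECT per repeating column, glued together.
union_sorted = conn.execute("""
    SELECT Customer1Name AS CustomerName FROM SalesStaffOriginal
        WHERE Customer1Name IS NOT NULL
    UNION ALL
    SELECT Customer2Name FROM SalesStaffOriginal WHERE Customer2Name IS NOT NULL
    UNION ALL
    SELECT Customer3Name FROM SalesStaffOriginal WHERE Customer3Name IS NOT NULL
    ORDER BY CustomerName
""").fetchall()

# Sorting and filtering the 1NF layout: one column, one clause each.
simple_sorted = conn.execute(
    "SELECT CustomerName FROM Customer ORDER BY CustomerName"
).fetchall()
birch_owner = conn.execute(
    "SELECT EmployeeID FROM Customer WHERE CustomerName = 'Birch'"
).fetchall()

# Deleting all of one salesperson's customers leaves the staff row in place.
conn.execute("DELETE FROM Customer WHERE EmployeeID = 2")
staff_left = conn.execute(
    "SELECT SalesPerson FROM SalesStaffInformation ORDER BY EmployeeID"
).fetchall()

print(union_sorted == simple_sorted, birch_owner, staff_left)
```

Both sort queries return the same names, but the UNION ALL version grows by one SELECT for every extra customer column, while the query against Customer stays the same no matter how many customers a salesperson has.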
Modification anomalies remain in both tables, but these are fixed once we reorganize the tables into second normal form. More tutorials are to follow! Remember, if you have other questions you want answered, post a comment or tweet me. I’m here to help you. What other topics would you like to know more about?