Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / database / PostgreSQL

A Closer Look at Lookup Tables

4.60/5 (3 votes)
20 Oct 2016CPOL6 min read 21.2K  
The two most common database lookup table patterns are examined

Introduction

Fundamental design choices made on a database can have profound effects on your application. Take for example the humble lookup table. There are basically two ways to do them. The first way is to create a new table for each lookup type. The second, often called the One True Look-up Table (OTLT), is a single lookup table that stores all lookup values. I have always created separate lookup tables for each type, but have recently worked on a project that put all the lookup tables into one table. I have to say the OTLT approach left a bad taste in my mouth. I will explain why.

Background

A large application may have dozens of lookup tables. For example, an Address table might have a lookup for AddressType. The Address table would probably have an AddressTypeID column which could have an efficient tinyint data type that has a constraint against the AddressType table to ensure data integrity. Now repeat this many times for all the other lookup tables that are required as part of a properly normalized database.

Sounds like a lot of work, why don't we just create one big lookup table that can be added to on-the-fly, the OTLT? When the system changes, we just add some new records to the OTLT and we are done, there is no need to create any new tables. On the surface, and in theory this sounds like a good thing. However, when you dig down and look at how it all works in practice, the OTLT model quickly falls apart.

The Bigger Issues

The AddressType table will at most have a dozen or so records in it, so the tinyint datatype is a good choice. If we where to use the OTLT, the primary key will have many more values than 256, so a larger datatype (int or bigint) will have to be used in both the Address table and the OTLT. Not the end of the world, but not the best choice for efficiency.

So we add five records for address type to the OTLT. These are the first records added, so the keys generated are 0, 1, 2, 3, 4. Next, you add some more lookup records for a different table called ItemStatus. These records have keys 5, 6,…. Again, not the end of the world, but inconvenient. Most Lookup tables have keys starting at 0 to N, they are easy to understand. Now you add another AddressType record and it has a key 321, since there have been a significant amount of records added to the table since you first added AddressType records. Now we have a set of AddressType keys that look like 0, 1, 2, 3, 4, 321. That is just plain ugly and very annoying when it comes to working with these `magic` numbers. Debugging and development becomes more difficult. Multiply that annoyance by the number lookup table types in your OTLT and it becomes more than an annoyance, it becomes a problem.

So let's solve that problem by adding a compound key to the OTLT, that contains a column for the foreign key, and another for the LookupType. Now you can reuse the primary key many times since the lookup value must be unique across the two columns. But wait, it is now possible to have many records with the value 3 as their lookup value. How do you ensure that when you join the OTLT table to the Address table, you only get the AddressType records? Why not create a view called AddressType which filters the OTLT table for only AddressType records and join to that view. At this point, I ask why not just create the AddressType table in the first place? Also, adding a LookupType to the OTLT would probably require another table if it is normalized properly. So now we have a lookup table for the lookup table, which is contrary to the entire purpose of the OTLT pattern. I call this "Solution Divergence" and it is a sure sign that you are doing something wrong.

Let's keep going with the OTLT issues. We might need a table called CountryCode which contains records of all countries in the world based on the ISO 3166 two character code. This would join to the CountryCode column in the Address table.

The OTLT is keyed on a numeric column. The ISO 3166 code is an excellent choice as a country key, but it won't work since it uses a character key. There is no easy answer to this problem. Maybe you add another two character based key to the OTLT table. Now we are really out of control since which key column do you use when? Adding a special key column that is seldom used? I won't even mention that the OTLT key could be a string value that would be joined to a numeric column or that all keys in all tables should be characters. That is just crazy talk. Or maybe we just add a separate table called CountryCode. So now we have some lookups in one place, and other lookups in another. The problem just keeps getting worse.

Because the OTLT table is generic, the columns have generic names. For example, you might have a columns called ID, Value, Name and Description. You need to display a dropdown list of address type in your application. Which column do you bind to? ID or Value? Which column do you display to the user? The Value or Description? Depending on how the data was added to the OTLT table, the answer might change, and it also might be different for different lookup types. The biggest problem is that the OTLT is not self-documenting. We don't really know what the values mean since they are stored out of context, that is all values for all lookup tables are in a single table, not related explicitly to the table which they are joined to.

This also brings up another annoying thing. When doing large queries, you end up doing many joins to the same OTLT table to fetch the required lookup values. This can be a real pain to manage since it is easy to get the wrong value without careful query construction. Also, as a developer, when I need to see the AddressType values, I would simply browse the AddressType table. If they are in the OTLT table, I would need to open the OTLT table, filter the records to show only AddressType values, and then interpret what those fields mean. Repeat this many times during the day and it's just another thing that slows you down. The bottom line is that OTLTs are difficult to work with.

There are also performance issues when using an OTLT that will be discovered as your system grows. Hopefully, at this point, you have decided not to go the OTLT route and never have to deal with the performance issues in the future.

Summary

Below is a quick summary of some of the pros and cons of each lookup table pattern. I'm sure there are several more, but this should cover the biggest issues.

Issue OTLT Separate Tables
Data Integrity Difficult Easy
Self Documenting No Yes
Flexible No Yes
Efficient No Yes
Maintainable Easy Easy
Easy to Work With No Yes
Scalability/Performance Poor Good

Conclusion

While it might seem like a good idea at the time, the OTLT model trades off what appears to be an easier way to manage lookup data for a plethora of problems just under the surface. If you are thinking of using a OTLT, just say no. If for some reason you require all the values in one table, simply create a single view that unions all the separate lookup tables.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)