In this article, we will see why primary keys are important, different types of keys you can use, and a primary key with values that are guaranteed to have never, ever been used in the entire universe (with a 1/10^37 chance of error).
Changing the schema for large relational databases costs companies millions of dollars every year. Relational databases are highly intertwined, so one small change can have unexpected, cascading effects.
The foundation for a table in a relational database is the primary key. Because of its importance, we have to give special consideration to how we select our primary keys.
In this article, I’ll discuss why primary keys are important, different types of keys you can use, and a primary key with values that are guaranteed to have never, ever been used in the entire universe (with a 1/10^37 chance of error).
Understanding Keys in SQL
Relational databases have been around for almost 50 years. In its simplest form, the relational data model is a collection of tables containing rows of data. These databases receive the “relational” qualifier because “related” tables are connected together by keys.
There are two types of keys in SQL: primary and foreign.
Primary keys uniquely identify rows in a table. By placing a PRIMARY KEY
constraint on a specific column in a table, the database engine will guarantee that no row can be entered with the same key as an existing row. In the next section, I cover how to create a primary key in Postgres.
A foreign key is a value in a second table that references the primary key for the first table. This primary/foreign key relationship is the glue that ties relational tables together.
In the relationship diagram above, id
in the Pets
table is the primary key. The Pet Owners
table creates a foreign key, pet_id
, that is refers is the same value as id
in the Pets
table. Because these two values are the same, now the Pets
and Pet Owners
can be connected with a join
statement.
Natural vs. Surrogate Keys
Primary keys can have either natural or surrogate key values. Natural keys are associated with a business context. For example, if you used a dogs breed as the primary key for the Pets
table.
Now, that would be inadvisable since many pets can be the same breed.
A surrogate key is a generated unique identifier with no additional meaning. The two primary keys we discuss later in this post, UUIDs and auto-incremented, fall into this category. In our Pets
table, a surrogate key would be a better option because it's guaranteed to be unique.
Syntax for Creating a Primary Key with PostgreSQL
The primary key is defined when you create the table:
CREATE TABLE PINK_FLOYD (
id INTEGER PRIMARY KEY,
album_name TEXT NOT NULL,
release_date DATE NOT NULL…
);
In the background, Postgres implements primary keys by combining the UNIQUE
and NOT NULL
constraints.
Creating a Composite Primary Key
A primary key can also be created by combining two (or more) columns. In this case, the primary key is declared at the bottom of the CREATE TABLE
command, rather than at the top.
CREATE TABLE PINK_FLOYD (
id INTEGER,
album_name TEXT NOT NULL,
release_date DATE NOT NULL,
PRIMARY KEY (id, album_name,)
);
This type of concatenation is useful for organizing your database when you can be sure that the combination of the natural keys will be unique.
If you were at a very small company, for example, then you could use a combination of your employee’s first and last names as a primary key. However, this may not work at a larger company which employs more than one John Smith.
Primary keys guarantee uniqueness for each row of data within the same table. But what if we want a primary key that is unique across tables and even between databases?
A Universally Unique ID? Intro to UUIDs
UUID_v1 Example
Universally Unique Identifiers, or UUIDs, were first introduced in 1990 by Apollo Computers. They were later standardized by the Open Software Foundation (OSF) as an ID that guaranteed “uniqueness across space and time.”
UUIDs are also referred to as GUIDs (Globally Unique Identifiers), which comes from Microsoft’s name for them.
UUIDs, or GUIDs, are 128-bit values, represented as 32 base-16 digits. But there are different methods for generating these 32 digits.
UUID_v4 vs UUID_V1
In UUID version 1 (UUID_v1
), the values are generated by the computer system using a combination of the current time and the computer's MAC address (shown in the example above).
Alternatively, UUID_v4 generates the 32 random digits using a random number generator. This is the most commonly used UUID.
There is also a UUIDV5, but it is not random, so you have to make sure it’s unique by watching the inputs.
Creating a UUID Primary Key Using uuid-osp - PostgreSQL Example
Installing a SQL Client
The first step is downloading a SQL client to run your SQL commands.
Installing uuid-osp
First, check if you have the extension already installed by running SELECT * FROM pg_extension
.
If uuid-ossp
is not in the returned list, first you would run the make
and make install
commands to build from the source distribution. make
runs a compiling program, and make-install
copies the compiled files and sends them to their locations for you.
Now we can install the uuid-ossp
module by running:
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
Adding UUID to a table - Syntax
Let’s say we wanted to create a table to store all of Pink Floyd’s albums. This is how we could add a UUID to that table using uuid_generate_v4
:
CREATE TABLE PINK_FLOYD (
id uuid DEFAULT uuid_generate_v4 (),
album_name VARCHAR NOT NULL,
release_date DATE NOT NULL,
PRIMARY KEY (id)
);
We declare the primary key at the bottom, like when we were combining two tables to create one.
Alternative to UUID - Auto-Increment Primary Key
What if we don't need to be unique across all space and time? Another type of surrogate key is the auto-increment primary key.
An auto-increment primary key increases the key value by 1 for every new row that is added. MySQL uses an AUTO_INCREMENT
keyword, but in Postgres, we create auto-incrementing keys by setting the data type as SERIAL
:
CREATE TABLE PINK_FLOYD (
id SERIAL PRIMARY KEY,
album_name TEXT NOT NULL,
release_date DATE NOT NULL…
PRIMARY KEY (id)
);
Now, every time you INSERT
, Postgre will make a new, auto-incremented key for your insertion. For example, if you wanted to add data for a new album, you could run the following command without having to specify the id in VALUES
:
INSERT INTO PINK_FLOYD (album_name, release_date) VALUES ("The Wall", 1979-11-30);
UUID vs Auto-Increment Primary Key
Auto-incremented primary keys increase read performance because they’re simpler than UUIDs and faster to scan, but they have some major downsides:
- Revealing potentially sensitive data
- Duplicate keys across databases
Auto-incremented keys reveal how many rows your table has, and you may not want that information to be public-- and sometimes you might want primary keys displayed in your URLs to keep them dependable.
Auto-incremented primary keys also have the potential to be duplicated across databases. This could cause a problem in distributed computing systems that require a key to be unique across each node.
But are UUIDs Really Unique?
You might say ok, but couldn't a UUID generate create the same primary key within a table? Since UUIDs are randomly generated, that’s technically possible.
Possible, but the probability is infinitesimally low. If you had 103 trillion version-4 UUIDs, the probability of finding a duplicate is one in a billion. At a rate of 1 UUID per second, it would take 3 million years to get to 103 trillion IDs.
UUIDs are sufficiently unique or almost every use case, but they can be a lot to store. If your table is too large to fit into the cache all at once, your UUID might be outside the cache if you’re selecting by UUID, slowing everything down instead of just moving to the next id.
So should I use a UUID?
It depends.
If security is your top concern and you have a lot of storage, UUIDs are a good strategy. If you’ve got a large, but relatively unexposed database, you may want to opt for auto-incrementation.
Conclusion
In summary, primary keys uniquely identify rows in a table.
You can create a primary key in a couple of different ways, including by making a composite primary key.
If you want to use UUIDs, you’ll need to install uuid-ossp
in postgres and then include your UUID version in your CREATE TABLE
command.
There are pros and cons to UUIDs -- they’re very good for security, especially if your ids are in your URLS, but if your database is huge, they can slow it down in comparison to auto-incremented ids.
But can you put a price tag on being unique? Go forth and be the proud owners of a 128 digit number that no one else in the world has! (probably)