Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / Languages / SQL

Relational Database Schema Versioning in 55 Lines of Code

5.00/5 (2 votes)
22 Oct 2018BSD4 min read 10.6K  
How to reliably version database schemas
The article describes an approach to reliably version relational database schemas. The proposed solution aims to solve the relational database schema versioning problem in a most streamlined and lightweight way without imposing any new build artifacts and build steps.

Introduction

The database schema versioning is a problem often thought of last during the implementation of database-backed applications (like web applications or micro services). Sometimes, even the basic rules get broken. The application code gets carefully versioned under version control, but the database schema versioning is neglected. Database creation scripts dumped from an ERD tool get stuffed into code repository in the hope that this is enough. More advanced teams provide delta scripts used to upgrade the production databases, which are already populated with data, and cannot be simply dropped and recreated from scratch. Often, the database schema upgrade record is being tracked manually within tools like Jira, MS Excel or just plain text file lying on DBAs desktop. These approaches are neither repeatable nor reliable.

Solution

On the internet, one can find solutions for managing database schema modification DDL scripts like flywaydb.org. These tools are fine but they impose complexity into the building process. The proposed solution, on the other hand, is more a convention than code and assumes that one:

  • keeps all database creation and modification DDL statements as a set of functions expressed in a programming language
  • uses a separate database table to keep track of schema versions
  • forces the application to check schema version at startup and makes it fail to start if mismatch gets detected
  • provides the application with a distinguished runtime mode (denoted for example by command line parameter) to upgrade the database schema

Implementation

The proposed solution has been implemented as a BSD licensed micro-library called "schemaupgrader". The library consists of a single source file SchemaUpgrader.java (55 lines of code excluding comments and blank lines) which provides two static functions:

  • getVersionOf - which returns the current version of a database schema, and
  • upgradeVersion - which upgrades database schema to a provided version and handles version tracking in a designated table called "versions".

The following code snippet shows proper application of these functions.

Java
import java.sql.Connection;
import static schemaupgrader.SchemaUpgrader.*;
import static java.util.Arrays.asList;

public class Main {

   private final static int EXPECTED_DB_VERSION = 1;
   /****************************************************************************
    *
    ***************************************************************************/
   public static void main(String[] args) throws Exception {

      finalJDBCDataSource ds = null; // I surely should have initialized this

      try (final Connection c = ds.getConnection()) {
         if (asList(args).contains("upgradeDB")) {
            upgradeVersion(c, EXPECTED_DB_VERSION, DatabaseVersions::build);
         } else if (getVersionOf(c) != EXPECTED_DB_VERSION) {
            throw new AssertionError("Database version mismatch.");
         }
      }
      //rest of the application logic here
   }
}

As has been stated before, the main idea is to check the version of the database schema first and exit an application if the version does not match the expected one. Alternatively, an upgrade of the application happens if a special command-line switch is provided. The schema upgrade may be also called from an alternative application entry point (another class with "main" function) or an entirely separate application if many applications share the same database.

In order to perform the actual schema changes, the upgradeSchema function requires a pointer to a builder function that is able to build the requested schema version by applying the set of changes to the previous one.The implementation is totally up to the developer, but for clarity, it is important to keep every version "patch" in a separate function. The example implementation may look like the following (to issue SQL commands, I used another micro-library of mine that can be found here but it is not mandatory).

Java
import java.sql.Connection;
import static fluentJDBC.FluentConnection.using;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

public final class DatabaseVersions {
   /****************************************************************************
    *
    ***************************************************************************/
   public static void build(final Connection c, final int version)
         throws Exception {

      switch (version) {
         case 1:
            v1_createUsersTable(c);
            return;
         case 2:
            v2_createLogsTable(c);
            return;
         case 3:
            v3_trimUserNames(c);
            return;
      }
   }
   /****************************************************************************
    * 
    ***************************************************************************/
   static void v1_createUsersTable(final Connection c) throws SQLException {

      try (final Statement s = c.createStatement()) {
         s.execute("create table users(name varchar(20) primary key, pass varchar(20))");
         s.execute("insert into users values('\ta', 'b')");
      }
   }
   /****************************************************************************
    *
    ***************************************************************************/
   static void v2_createLogsTable(final Connection c) throws SQLException {

      using(c).prepare("create table logs(ts bigint primary key,"
            + "user varchar(20),"
            + "msg varchar(200),"
            + "foreign key (user) references users(name))").andUpdate();
   }
   /****************************************************************************
    *
    ***************************************************************************/
   static void v3_trimUserNames(final Connection c) throws Exception {

      final List<String> userNames = using(c).prepare("select name from users").
            andMap((rs) -> rs.getString(1));

      for (final String name : userNames) {
         using(c).prepare("update users set name = ? where name = ?").
               set(name.trim()).set(name).andUpdate();
      }
   }
}

In this example, the first function creates "users" table with a single user. The second function adds "logs" table. The third function does not change the schema itself but is used to clean the data already populating the database. The obvious constraint of this approach is that one may NEVER modify the existing functions that have been deployed to production, as this will cause loss of consistency.

Pros and Cons

The proposed solution exhibits the following advantages:

  • DDL code is controlled along with application code (syncing a working copy with the repository will always result with newest set schema version patch functions, since they are code).
  • The process is reliable (if the database version does not match the expected version, the application will not start, upgrading database, on the other hand, guarantees proper schema version alignment).
  • The process is repeatable (all schema changes are applied in the same manner: add version patch function, update expected version constant, build the application, run the application against the existing database), and does not require any additional build steps.
  • The solution is flexible (version patches can upgrade the schema, migrate data, clean data, transform data, etc.).
  • The DDL code is testable - every version patch function can be put under test harness; the following listing shows such test.
Java
@Test public void upgradeVersion_upgradesDatabase_forProperInvocation()
         throws Exception {

   upgradeVersion(this.c, 3, DatabaseVersions::build);

   assertEquals(3, getVersionOf(this.c));
   assertEquals(new Integer(3),
            using(c).prepare("select count(*) from versions").
            andMapOne((rs) ->rs.getInt(1)).get());
   assertEquals("b",
            using(c).prepare("select pass from users where name = 'a'").
            andMapOne((rs) ->rs.getString(1)).get());
   assertFalse(using(c).prepare("select pass from users where name = '\ta'").
               andMapOne((rs) ->rs.getString(1)).isPresent());
 }

The proposed solution exhibits the following disadvantages:

  • No SQL DDL script that could be fetched into an ERD tool - this is widely offset by the ability of tools to dump schema from a live database.
  • No SQL syntax checks during development as the SQL code is represented by string constant - offset by testability.
  • Database schema upgrades or data cleaning action may take long for big databases - DDL scripts expose the same issue.
  • Version patch functions unit tests may take a long time because they need to target an actual database.

Conclusion

The proposed solution aims to solve the relational database schema versioning problem in a most streamlined and lightweight way without imposing any new build artifacts and build steps. It is mostly convention over tools, backed by two utility functions placed in a single BDS licensed java file that one can included in a project and use freely.

License

This article, along with any associated source code and files, is licensed under The BSD License