In this article, you will see a simple method for backing up a set of tables in a PostgreSQL database running on a Windows server.
Background
A client of mine had a small team of staff who needed to store geographical data in a proper database in order to improve their business processes. However, being a small team within the company, budget was tight and no one could justify a business case for a full-blown SQL Server.
The Express edition was not an option because the database needed to contain large reference data sets such as building locations, river courses, etc. that blew through the 10GB limit of SQL Server Express before any "user" data tables were defined. The only real option was PostgreSQL with the PostGIS extensions installed.
This was a great choice. The platform is well supported and there are plenty of community users out there offering help and advice. The problem was backup. PostgreSQL comes out of the box with a couple of command line tools to perform backups, but nothing like the rich set of backup options that I had been used to with SQL Server.
Although the out-of-the-box tools could easily back up the database, the sheer volume of data was large. Postgres does not allow cross-database queries in the way that SQL Server does, so all the data you need for your application has to reside in the same database. Did I mention the 20GB plus of static reference data! Getting just the changing, value-added data backed up and shipped off site over poor communications links needed a more refined approach. My solution was a batch file and some information schema querying!
Using pg_dump and psql in Batch Files
The first hurdle when running the Postgres utility programs was passing in a password. The command line utilities are simple enough to use and both share some common switches to specify the particular database connection, but they don't have a command line switch to specify a password.
The standard connection switches are:
-d, --dbname=DBNAME database to dump
-h, --host=HOSTNAME database server host or socket directory
-p, --port=PORT database server port number
-U, --username=NAME connect as specified database user
The documentation describes an environment variable that can be set to hold the password, but that is not something I really wanted to do. It also could not easily be used if you needed to connect to different server databases with different passwords. However, the utility programs also look for a password file that is very easy to configure.
Setting Up the Password File
Make sure you have a user account set up that can be used to run a task, i.e., the account has the "Log on as a batch job" right. You then simply need to create a file in the following location:
%appdata%\postgresql\pgpass.conf
In practice, this will be like:
C:\Users\postgresBackup\AppData\Roaming\postgresql\pgpass.conf
The contents of the file consist of lines of connection details that will be used by the PSQL and PG_DUMP tools. Each line holds five settings separated by colons:
host_name:host_port:your_db_name:your_db_user_name:secret_password
Given that you can specify four of these parameters on the utility command lines, it's not hard for the utility to look for this file and scan its contents for a match on the first four fields in order to look up the fifth field that holds the password!
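As an illustration, an entry matching the connection settings used later in this article might look like the first line below; the password and the second server line are, of course, made-up placeholders:

localhost:5432:your_data_base:postgres:MyS3cretPa55word
dbserver02:5432:another_db:backup_user:An0therPa55word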
N.B.: Make sure that other users can't read this file!
HINT: Make sure the password for the "postgresBackup" user account is strong and probably set to never expire! The last thing you want with a backup script is for it to stop running after 3 months because the underlying account password expired!
Using the Batch File
The script is designed to be run daily and it will maintain a 30-day history of table backups.
If you downloaded the batch file with this article, you will be able to use it from a scheduled task. However, you will need to alter the connection details and various paths at the top of the script.
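For example, assuming the script is saved as C:\DBBackups\backup_tables.bat (a name and location of your own choosing) and run under the postgresBackup account mentioned earlier, the daily task could be registered with something like:

schtasks /Create /TN "PostgresTableBackup" /SC DAILY /ST 02:00 ^
 /TR "C:\DBBackups\backup_tables.bat" /RU postgresBackup /RP *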
You are probably best served by just looking at the batch file directly, but I will discuss the various parts below:
setlocal ENABLEDELAYEDEXPANSION
SET qTool=C:\Program Files\postgreSQL\9.6\bin\psql.exe
SET dumpTool=C:\Program Files\postgreSQL\9.6\bin\pg_dump.exe
SET zipTool=C:\Program Files\7-Zip\7z.exe
SET cUser=postgres
SET cHost=localhost
SET cPort=5432
SET cDB=your_data_base
SET qSchema=your_schema
SET DRota=C:\DBBackups\dailyRotation
SET cOptions=--username=%cUser% --host=%cHost% --port=%cPort% --dbname=%cDB%
IF NOT EXIST %DRota% GOTO message_no_rotation
I have spelled out the full path to each of the command line utilities that I will be calling. These may well be on your PATH, but I was taking no chances! Notice that in addition to the PostgreSQL tools, I am also using 7-Zip because I plan to zip all the extracted table backups into a single daily archive file.
The next part of the script is all about rotation. Rather than play around with date-stamping the archives, I chose to use a 30-day set of folders. The script will create and rotate what it needs as long as the DRota path is valid and writable by the account running the script.
pushd %DRota%
IF EXIST D-30 (
DEL /Q /S D-30\*.*
RmDir D-30
)
FOR /L %%i IN (29,-1,1) DO (
SET /A "j=%%i + 1"
IF EXIST D-%%i RENAME D-%%i D-!j!
IF NOT EXIST D-!j! MkDir D-!j!
)
MkDir D-1
popd
What's most important to notice here is that delayed expansion has been enabled. This means that the variable j is accessed using !j! instead of %j%. This is something to remember when looping in a batch script with the FOR command.
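A stripped-down example (not part of the backup script) shows the difference:

setlocal ENABLEDELAYEDEXPANSION
REM The value of j is read with !j! so it is re-evaluated on every pass;
REM plain percent expansion would be fixed once, when the block is parsed.
FOR /L %%i IN (1,1,3) DO (
SET /A "j=%%i * 10"
echo iteration %%i gives j=!j!
)
endlocal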
What I am doing here is renaming the directories rather than wasting disk activity copying backups from one folder to the next.
I also use PUSHD and POPD to move the current working directory as a convenience and to keep the script legible.
Having renamed 1 through 29, I re-create the "D-1" folder ready for today's backup.
IF NOT EXIST .\WDXD mkdir WDXD
REM Empty the working folder (/Q to suppress prompts and /S to include subdirectories)
DEL /Q /S .\WDXD\*.* > NUL
"%qtool%" %cOptions% -t -o ".\WDXD\tables.txt" -A -c "SELECT table_schema, table_name
FROM information_schema.tables WHERE table_schema = '%qSchema%';"
As another convenience, I created a working folder WDXD. I could have used %TMP%, but this works equally well.
The key to using pg_dump to get just a set of tables is knowing which tables you need to back up. In this case, all the tables I want are held in a particular schema. The following query will give the required results as a two-column recordset.
SELECT table_schema, table_name FROM information_schema.tables WHERE table_schema = '%qSchema%';
This query is embedded in the command line passed to the PSQL utility. Notice that the value used to filter the result set in the WHERE clause is substituted from an environment variable. There is scope here to generate bad SQL if the configured qSchema value happened to contain embedded single quotes.
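To make the substitution concrete, with the placeholder settings from the top of the script the call expands to something like:

"C:\Program Files\postgreSQL\9.6\bin\psql.exe" --username=postgres --host=localhost --port=5432 --dbname=your_data_base -t -o ".\WDXD\tables.txt" -A -c "SELECT table_schema, table_name FROM information_schema.tables WHERE table_schema = 'your_schema';"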
The resulting output from this query is directed to a file using the -o switch. You will get a text file with content like:
schema1|table1
schema1|table2
There will not be any space padding of the result data because the -A switch was used. This suppresses the alignment of data in the results text.
Having this text file lets us do the real work stage.
pushd WDXD
FOR /F "tokens=1,2 delims=|" %%s in (tables.txt) DO (
echo Dump %%s.%%t to %%s_%%t.sql
"%dumpTool%" %cOptions% -c --format=plain --schema=\"%%s\"
--table=\"%%s\".\"%%t\" > %%s_%%t.sql
echo.
echo add %%s_%%t.sql to archive %qSchema%_backup.zip
"%zipTool%" a -tzip %qSchema%_backup.zip %%s_%%t.sql
echo ---------------
)
The FOR /F command reads an input file line by line. The options in quotes following the /F switch control how each line is interpreted. The delims=| option tells the command to split the line into separate fields when a pipe symbol is encountered. The tokens=1,2 directive tells the command to return the first and second fields in variables. So, the first field will be placed into the variable %%s as declared, and the second into the next variable available, %%t.
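If you want to see the token splitting in isolation, a quick experiment with a literal string (rather than the tables.txt file) shows the idea; run it from a batch file, or swap %%s and %%t for %s and %t at a command prompt:

FOR /F "tokens=1,2 delims=|" %%s IN ("schema1|table1") DO echo schema=%%s table=%%t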
The pg_dump command interprets the --table argument as a regular expression to find a table. It can also take numerous --table switches. It would have been possible to generate all the needed table dumps in a single command, but since we don't know ahead of time how many tables we will find in the future, it seems safer to do it one table at a time.
Note: The regular expression interpretation is not what you may expect. Refer to the PostgreSQL documentation on the regular expression rules and evaluations!
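As an aside, if the table names were known up front, a single call with repeated --table switches would do the job; the table names here are purely hypothetical:

"%dumpTool%" %cOptions% -c --format=plain --table=\"your_schema\".\"buildings\" --table=\"your_schema\".\"river_courses\" > reference_tables.sql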
Once the data is extracted, it is immediately added to an archive zip file in the working directory.
Move %qSchema%_backup.zip "%DRota%\D-1\%qSchema%_backup.zip"
popd
Finally, after all the tables are processed, the archive is moved to the D-1 directory and the current directory is returned to where it was when the script started. I always return any changed directories to where they started because you never know whether, over time, your script will be called from other scripts that might not handle an unexpected change of working directory favourably.
At the end of all this, you will have an easy-to-use, 30-day history of the data in your tables.
This may not be as elegant a backup solution as log shipping, but it will get you out of a hole when the inevitable day comes that someone tells you they deleted the wrong record before they went on holiday!
In your production environments, I would recommend running a monthly process to copy the D-1 archives so that you have a long-term backup of "snapshots" of your tables.
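A minimal sketch of such a monthly step, assuming the same C:\DBBackups layout used above and a snapshot folder of your own choosing (note that the %DATE% format is locale dependent, so adjust the substitution for your server's regional settings):

SET DRota=C:\DBBackups\dailyRotation
SET DArchive=C:\DBBackups\monthlySnapshots
REM Swap any slashes in the date for dashes so it can be used as a folder name
SET stamp=%DATE:/=-%
IF NOT EXIST "%DArchive%\%stamp%" MkDir "%DArchive%\%stamp%"
COPY "%DRota%\D-1\*.zip" "%DArchive%\%stamp%\"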
History
- V1: Initial publication (26th April, 2017)
- V2: Correction for typos and formatting (27th April, 2017)