In this article, you will see a simple method for backing up a set of tables in a PostgreSQL database running on a Windows server.
Background
A client of mine had a small team of staff who needed to store geographical data in a proper database in order to improve their business processes. However, being a small team within the company, budget was tight and no one could justify a business case for a full-blown SQL Server.
The Express edition was not an option because the database needed to contain large reference data sets such as building locations, river courses, etc. that blew through the 10GB limit of SQL Server Express before any "user" data tables were defined. The only real option was PostgreSQL with the PostGIS extensions installed.
This was a great choice. The platform is well supported and there are plenty of community users out there offering help and advice. The problem was backup. PostgreSQL comes out of the box with a couple of command line tools to perform backups, but nothing like the rich set of backup options that I had been used to with SQL Server.
Although the out-of-the-box tools could easily back up the database, the sheer volume of data was large. Postgres does not allow cross-database queries in the way that SQL Server does, so all the data you need for your application has to reside in the same database. Did I mention the 20GB plus of static reference data! Getting just the changing, value-added data backed up and shipped off site over poor communications links needed a more refined approach. My solution was a batch file and some information schema querying!
Using pg_dump and psql in Batch Files
The first hurdle when running the Postgres utility programs was passing in a password. The command line utilities are simple enough to use and both share some common switches to specify the particular database connection, but they don't have a command line switch to specify a password.
The standard connection switches are:
-d, --dbname=DBNAME database to dump
-h, --host=HOSTNAME database server host or socket directory
-p, --port=PORT database server port number
-U, --username=NAME connect as specified database user
The documentation describes an environment variable that can be set to hold the password, but that is not something I really wanted to do. It also could not easily be used if you needed to connect to different server databases with different passwords. However, the utility programs also look for a password file that is very easy to configure.
Setting Up the Password File
Make sure you have a user account set up that can be used to run a task, i.e., the account has the "Log on as a batch job" right. You then simply need to create a file in the following location:
%appdata%\postgresql\pgpass.conf
In practice, this will be like:
C:\Users\postgresBackup\AppData\Roaming\postgresql\pgpass.conf
The contents of the file consist of lines of connection details that will be used by the PSQL and PG_DUMP tools. Each line holds five settings separated by colons:
host_name:host_port:your_db_name:your_db_user_name:secret_password
Given that you can specify four of these parameters on the utility command lines, it's not hard for the utility to look for this file and scan its contents for a match on the first four fields in order to look up the fifth field that holds the password!
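As an illustration, an entry matching the connection settings used later in this article might look like the first line below; the password and the second server line are, of course, made-up placeholders:

localhost:5432:your_data_base:postgres:MyS3cretPa55word
dbserver02:5432:another_db:backup_user:An0therPa55word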
N.B.: Make sure that other users can't read this file!
HINT: Make sure the password for the "postgresBackup" user account is strong and probably set to never expire! The last thing you want with a backup script is for it to stop running after 3 months because the underlying account password expired!
Using the Batch File
The script is designed to be run daily and it will maintain a 30-day history of table backups.
If you downloaded the batch file with this article, you will be able to use it from a scheduled task. However, you will need to alter the connection details and various paths at the top of the script.
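For example, assuming the script is saved as C:\DBBackups\backup_tables.bat (a name and location of your own choosing) and run under the postgresBackup account mentioned earlier, the daily task could be registered with something like:

schtasks /Create /TN "PostgresTableBackup" /SC DAILY /ST 02:00 ^
 /TR "C:\DBBackups\backup_tables.bat" /RU postgresBackup /RP *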
You are probably best served by just looking at the batch file directly, but I will discuss the various parts below:
setlocal ENABLEDELAYEDEXPANSION
SET qTool=C:\Program Files\postgreSQL\9.6\bin\psql.exe
SET dumpTool=C:\Program Files\postgreSQL\9.6\bin\pg_dump.exe
SET zipTool=C:\Program Files\7-Zip\7z.exe
SET cUser=postgres
SET cHost=localhost
SET cPort=5432
SET cDB=your_data_base
SET qSchema=your_schema
SET DRota=C:\DBBackups\dailyRotation
SET cOptions=--username=%cUser% --host=%cHost% --port=%cPort% --dbname=%cDB%
IF NOT EXIST %DRota% GOTO message_no_rotation
I have spelled out the full path to each of the command line utilities that I will be calling. These may well be on your PATH, but I was taking no chances! Notice that in addition to the PostgreSQL tools, I am also using 7-Zip because I plan to zip all the extracted table backups into a single daily archive file.
The next part of the script is all about rotation. Rather than play around with date-stamping the archives, I chose to use a 30-day set of folders. The script will create and rotate what it needs as long as the DRota path is valid and writable by the account running the script.
pushd %DRota%
IF EXIST D-30 (
DEL /Q /S D-30\*.*
RmDir D-30
)
FOR /L %%i IN (29,-1,1) DO (
SET /A "j=%%i + 1"
IF EXIST D-%%i RENAME D-%%i D-!j!
IF NOT EXIST D-!j! MkDir D-!j!
)
MkDir D-1
popd
What's most important to notice here is that delayed expansion has been enabled. This means that the variable j is accessed using !j! instead of %j%. This is something to remember when looping in a batch script with the FOR command.
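A stripped-down example (not part of the backup script) shows the difference:

setlocal ENABLEDELAYEDEXPANSION
REM The value of j is read with !j! so it is re-evaluated on every pass;
REM plain percent expansion would be fixed once, when the block is parsed.
FOR /L %%i IN (1,1,3) DO (
SET /A "j=%%i * 10"
echo iteration %%i gives j=!j!
)
endlocal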
What I am doing here is renaming the directories rather than wasting disk activity copying backups from one folder to the next.
I also use PUSHD and POPD to move the current working directory as a convenience and to keep the script legible.
Having renamed 1 through 29, I re-create the "D-1" folder ready for today's backup.
IF NOT EXIST .\WDXD mkdir WDXD
REM Empty the working folder (/Q to suppress prompts and /S to include subdirectories)
DEL /Q /S .\WDXD\*.* > NUL
"%qtool%" %cOptions% -t -o ".\WDXD\tables.txt" -A -c "SELECT table_schema, table_name
FROM information_schema.tables WHERE table_schema = '%qSchema%';"
As another convenience, I created a working folder WDXD. I could have used %TMP%, but this works equally well.
The key to using pg_dump to get just a set of tables is knowing which tables you need to back up. In this case, all the tables I want are held in a particular schema. The following query will give the required results as a two-column recordset.
SELECT table_schema, table_name FROM information_schema.tables WHERE table_schema = '%qSchema%';
This query is embedded in the command line passed to the PSQL utility. Notice that the value used to filter the result set in the WHERE clause is substituted from an environment variable. There is scope here to generate bad SQL if the configured qSchema value happened to contain embedded single quotes.
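To make the substitution concrete, with the placeholder settings from the top of the script the call expands to something like:

"C:\Program Files\postgreSQL\9.6\bin\psql.exe" --username=postgres --host=localhost --port=5432 --dbname=your_data_base -t -o ".\WDXD\tables.txt" -A -c "SELECT table_schema, table_name FROM information_schema.tables WHERE table_schema = 'your_schema';"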
The resulting output from this query is directed to a file using the -o switch. You will get a text file with content like:
schema1|table1
schema1|table2
There will not be any space padding of the result data because the -A switch was used. This suppresses the alignment of data in the results text.
Having this text file lets us do the real work stage.
pushd WDXD
FOR /F "tokens=1,2 delims=|" %%s in (tables.txt) DO (
echo Dump %%s.%%t to %%s_%%t.sql
"%dumpTool%" %cOptions% -c --format=plain --schema=\"%%s\"
--table=\"%%s\".\"%%t\" > %%s_%%t.sql
echo.
echo add %%s_%%t.sql to archive %qSchema%_backup.zip
"%zipTool%" a -tzip %qSchema%_backup.zip %%s_%%t.sql
echo ---------------
)
The FOR /F command reads an input file line by line. The options in quotes following the /F switch control how each line is interpreted. The delims=| option tells the command to split the line into separate fields when a pipe symbol is encountered. The tokens=1,2 directive tells the command to return the first and second fields in variables. So, the first field will be placed into the variable %%s as declared, and the second into the next variable available, %%t.
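If you want to see the token splitting in isolation, a quick experiment with a literal string (rather than the tables.txt file) shows the idea; run it from a batch file, or swap %%s and %%t for %s and %t at a command prompt:

FOR /F "tokens=1,2 delims=|" %%s IN ("schema1|table1") DO echo schema=%%s table=%%t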
The pg_dump command interprets the --table argument as a regular expression to find a table. It can also take numerous --table switches. It would have been possible to generate all the needed table dumps in a single command, but since we don't know ahead of time how many tables we will find in the future, it seems safer to do it one table at a time.
Note: The regular expression interpretation is not what you may expect. Refer to the PostgreSQL documentation on the regular expression rules and evaluations!
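As an aside, if the table names were known up front, a single call with repeated --table switches would do the job; the table names here are purely hypothetical:

"%dumpTool%" %cOptions% -c --format=plain --table=\"your_schema\".\"buildings\" --table=\"your_schema\".\"river_courses\" > reference_tables.sql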
Once the data is extracted, it is immediately added to an archive zip file in the working directory.
Move %qSchema%_backup.zip "%DRota%\D-1\%qSchema%_backup.zip"
popd
Finally, after all the tables are processed, the archive is moved to the D-1 directory and the current directory is returned to where it was when the script started. I always return any changed directories to where they started because you never know whether, over time, your script will be called from other scripts that might not handle an unexpected change of working directory favourably.
At the end of all this, you will have an easy-to-use, 30-day history of the data in your tables.
This may not be as elegant a backup solution as log shipping, but it will get you out of a hole when the inevitable day comes that someone tells you they deleted the wrong record before they went on holiday!
In your production environments, I would recommend running a monthly process to copy the D-1 archives so that you have a long-term backup of "snapshots" of your tables.
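A minimal sketch of such a monthly step, assuming the same C:\DBBackups layout used above and a snapshot folder of your own choosing (note that the %DATE% format is locale dependent, so adjust the substitution for your server's regional settings):

SET DRota=C:\DBBackups\dailyRotation
SET DArchive=C:\DBBackups\monthlySnapshots
REM Swap any slashes in the date for dashes so it can be used as a folder name
SET stamp=%DATE:/=-%
IF NOT EXIST "%DArchive%\%stamp%" MkDir "%DArchive%\%stamp%"
COPY "%DRota%\D-1\*.zip" "%DArchive%\%stamp%\"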
History
- V1: Initial publication (26th April, 2017)
- V2: Correction for typos and formatting (27th April, 2017)