Introduction
For a service I am working on, I need to maintain a baseline: a CSV file of demographic information in which each row describes one person. A cron job periodically creates a temporary CSV file containing updated information or added persons. My service processes that temporary file and updates the baseline to reflect any changed data, any added persons, and any persons to be removed.
This article shows how to use the diff -e option to generate an ed editor script, and how to combine that script with ed and a few other commands to create a new baseline file.
Bash Script & Variables
The script is run in a bash shell, and I create variables to reference the baseline and temporary files:
NOW=$(date +"%Y%m%d%H%M")
BASELINE=baseline/baseline.csv
TEMP=temp/compare.csv
In the script above, the variable $NOW is used to timestamp files. The timestamp format is “yyyymmddhhmm”, which is handy whenever you want to keep track of when a file was created. The variable $BASELINE is the baseline file and $TEMP is the temporary file.
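As a small illustration of the timestamp, the sketch below builds an archive filename from $NOW. The archive/ path mirrors the one used later in this article; the variable ARCHIVE_NAME is hypothetical and exists only for this example.

```shell
# Timestamp in yyyymmddhhmm form, e.g. 202401311545 for 31 Jan 2024, 15:45.
NOW=$(date +"%Y%m%d%H%M")

# ARCHIVE_NAME is a hypothetical variable used only for this illustration.
ARCHIVE_NAME="archive/baseline_${NOW}.csv"
echo "$ARCHIVE_NAME"
```

Because the fields run from most to least significant and have fixed widths, a plain alphabetical listing of the archive folder is also chronological. The resolution is one minute, so two runs within the same minute would produce the same name.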
Creating ed Script
The following line uses the -e option with diff to create an ed script:
diff -e $BASELINE $TEMP > ed-script
The file “ed-script” is an ed editor script: a list of ed commands that, when applied to the baseline, make its contents match the temporary file.
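To make the format concrete, here is a minimal sketch that runs diff -e on two tiny CSV files. The sample rows and the scratch directory are invented for illustration; the real files live in baseline/ and temp/.

```shell
# Hypothetical sample data written to a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' 'id,name,city' '1,Ann,Oslo' '2,Bob,Rome' '3,Cy,Lima' > "$workdir/baseline.csv"
printf '%s\n' 'id,name,city' '1,Ann,Oslo' '2,Bob,Paris' '4,Dee,Cairo' > "$workdir/compare.csv"

# diff exits with status 1 when the files differ, which is expected here,
# so "|| true" keeps that status from being treated as a failure.
ed_script=$(diff -e "$workdir/baseline.csv" "$workdir/compare.csv" || true)
printf '%s\n' "$ed_script"
# Prints:
# 3,4c
# 2,Bob,Paris
# 4,Dee,Cairo
# .

rm -rf "$workdir"
```

The command 3,4c tells ed to change lines 3 through 4 to the text that follows, and the lone dot ends the inserted text. Note that the script contains no w command, which is why one has to be appended before handing it to ed.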
Creating New Baseline
Then, to create a new baseline with the ed script, run the following commands:
cp $BASELINE baseline/new_baseline.csv
(cat ed-script && echo w) | ed - baseline/new_baseline.csv
There are two command lines here. The first creates a copy of the original baseline so that the original is not overwritten yet; the second creates the new baseline. The (cat ed-script && echo w) part writes the contents of ed-script to standard output and then emits a w command, which tells ed to write the file. All of this is piped into ed, which applies the commands to baseline/new_baseline.csv.
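The step above can be tried end to end on sample data. The following is a sketch under a few assumptions: apply_ed_script is a hypothetical helper name, the rows are invented, and it uses ed -s (the POSIX spelling of the older "-" flag) plus an explicit q command.

```shell
# apply_ed_script is a hypothetical helper name, not part of the original script.
# Usage: apply_ed_script <ed-script> <file-to-edit>
apply_ed_script() {
    # Send the diff commands, then w (write) and q (quit), to ed.
    # -s suppresses the byte counts ed would otherwise print.
    { cat "$1"; printf '%s\n' w q; } | ed -s "$2"
}

# Hypothetical sample data in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' 'id,name,city' '1,Ann,Oslo' '2,Bob,Rome' > "$workdir/baseline.csv"
printf '%s\n' 'id,name,city' '2,Bob,Paris' '3,Cy,Lima' > "$workdir/compare.csv"

# diff exits with status 1 when the files differ, so ignore that status.
diff -e "$workdir/baseline.csv" "$workdir/compare.csv" > "$workdir/ed-script" || true

# Work on a copy so the original baseline stays untouched.
cp "$workdir/baseline.csv" "$workdir/new_baseline.csv"
apply_ed_script "$workdir/ed-script" "$workdir/new_baseline.csv"

# The edited copy should now be identical to the temporary file.
if cmp -s "$workdir/new_baseline.csv" "$workdir/compare.csv"; then
    echo "new baseline matches compare.csv"
else
    echo "new baseline differs from compare.csv" >&2
fi
```

The cmp check is a cheap sanity test: an ed script produced by diff -e describes exactly how to turn the first file into the second, so any mismatch means the script was not applied cleanly.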
Backing Up Old Baseline
It’s a good idea to archive (or back up) the old baseline in case anything goes wrong:
mv $BASELINE archive/baseline_$NOW.csv
mv baseline/new_baseline.csv $BASELINE
The first command moves the original baseline to an archive folder and appends a timestamp to the filename. The second move (mv) renames the new file to baseline.csv, which completes the creation of the new baseline.
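The two moves can also be wrapped with a couple of safety checks. This is only a sketch: rotate_baseline is a hypothetical helper, and the checks (a non-empty new baseline, an archive folder that may not exist yet) are assumptions about what could go wrong rather than part of the original script.

```shell
# rotate_baseline is a hypothetical helper, not part of the original script.
# $1 = current baseline, $2 = new baseline, $3 = archive directory
rotate_baseline() {
    # Refuse to replace the baseline with a missing or empty file.
    [ -s "$2" ] || { echo "new baseline is missing or empty: $2" >&2; return 1; }

    # Create the archive directory if it does not exist yet.
    mkdir -p "$3" || return 1

    # Archive the old baseline under a timestamped name, then promote the
    # new one. The second mv runs only if the first one succeeded.
    mv "$1" "$3/baseline_$(date +"%Y%m%d%H%M").csv" && mv "$2" "$1"
}

# Usage with the paths from this article:
# rotate_baseline baseline/baseline.csv baseline/new_baseline.csv archive
```

Chaining the moves with && means a failed archive step leaves both files where they were, so the old baseline is never lost without a copy.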
Entire Script
For reference, here is the entire script:
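# Timestamp (yyyymmddhhmm) and the two input files
NOW=$(date +"%Y%m%d%H%M")
BASELINE=baseline/baseline.csv
TEMP=temp/compare.csv
# Generate the ed script that turns the baseline into the temporary file
diff -e $BASELINE $TEMP > ed-script
# Apply the ed script to a copy of the baseline
cp $BASELINE baseline/new_baseline.csv
(cat ed-script && echo w) | ed - baseline/new_baseline.csv
# Archive the old baseline and promote the new one
mv $BASELINE archive/baseline_$NOW.csv
mv baseline/new_baseline.csv $BASELINE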
Enjoy!