Aiming for beauty through simplicity
Some time ago, I was working on a project to send emails when events happen. I took an aggressively simple approach: "Make it as simple as possible, then make it simpler."
- Use VBScript
- Do not use a database
- Make it robust and restartable
Part of the solution is a persistent, semi-transactional extension of the Scripting.Dictionary
object written in pure VBScript. This post explains how it works and gives the code. This approach would be very simple to implement in other languages like Python, PHP, Ruby, or even C#.
Persistence and serialization
To persist an object from memory, we need to serialize it. That means to take the spread about bits of data in memory and write them out to a file. Files are sequential bytes of data and so the process gets serialization (even though we can actually access files randomly).
One serious challenge when serializing anything from VBScript is the lack of a binary format. The easiest way to serialize and deserialize is to use text streams where a line represents a serialized datum or group of data. The easiest way to encode a group of data into a line is via some delimited character because of VBScript's high speed Join and Split operators. However, simply joining strings is no good because those strings may well have a delimiter or line end characters in them.
Escape()
and Unescape()
come to the rescue. These functions were originally intended to URL escape strings. URLs are in ASCII (8 bit per character) standard and only allow letters, numbers, and a few other symbols. This means that any string, including Unicode, can be escaped into a string which will never have a line end character and will never have a comma in it. So, we can escape all strings and then group data using comma separated values and put one group per line.
This piece of code demonstrates how Escape and Unescape function with Unicode characters:
Dim s,es
s=chrw(256) & chrw(512) & chrw(1024) & chrw(2048) & _
chrw(4096) & chrw(8192) & chrw(16384) & chrw(32768) & chrw(65535)
es=Escape(s)
WScript.Echo es
es=Unescape(es)
WScript.Echo s=es
Persistence, restartability, and transaction logs
The aim of this development is to create a very simple way of persisting data so that:
- Scripts can restart 'knowing' what it did last time.
- If a script crashes, or there is some other failure, it can restart where it left off.
- Storage of data is in some format where it can be 'looked up'.
- We do not have to use some complex database solution like JET or MS SQL Server.
Starting with requirements 3 and 4, I chose to use a Dictionary
object. These are standard in VBScript as they come with the Scripting COM object set as "Scripting.Dictionary
". Dictionaries allow us to look up data by keys, and they do not require a database. Using the Escape and Unescape techniques I discussed above, we can serialize the values and keys of a Dictionary
to a file. That leaves requirements 1 and 2. These, I tackled with a transaction log, coupled with rollback, roll-forward, and commit.
Typically, when a programmer sets out to store something on disk, they write code to serialize the whole of a data-structure at once. This means that if the program crashes in the process of writing the file, then the file is corrupt and none of it is any use. It also means that each time the data is changed, if that change is to be persistent, the whole object has to be serialized to disk again.
A completely different way of thinking about serialization is to store the changes to a data structure to disk. A data structure is the product of all the changes made to it. If a program stores each and every change in exactly the order they occurred, then re-running the changes will recreate the data. This is exactly how many transactional databases pull off their transactional trick. They store an out of date version of the data on disk and then the up to date information is persisted by writing out the changes. I took an even simpler approach and only stored the changes, and so only had the transaction log.
The transaction log approach is great, but it has a few drawbacks:
- Every time a change is made to the data, a write has to be made to disk, which is performance sapping.
- If several changes are part of a single restartable 'unit of work', then it is incorrect to write only some and not others. I.e., for restartability, we need to be able to choose when to persist the changes and persist them in groups.
- For exception handling, if an exception (error) occurs during one of these units of work, it would be nice to be able to rollback the whole unit of work.
This is where rollback, roll-forward, and commit come in. Each time a change is made to my dictionary, a record of the action required to undo the change is recorded in memory, as is a record of the action to do the change. These two records are called rollback and roll-forward. It is only when Commit
is called that the roll-forward records are written to disk. If Rollback
is called, then the undo actions are read in reverse order and the appropriate actions to undo all the changes in the unit of work (transaction) are taken. Because the system is not multi-threaded, I am loath to call it truly transactional. Also, there is no way to get the FileSystemObject.TextStream
object to write through the disk cache, so I call my TransDictionary
'semi-transactional'.
The final issue is that after a long time and a lot of data changes, the transaction log will get rather long. Always recreating the data from the complete list of changes can become inefficient. To help with this, I added a method CreateCleanLog
which writes out the minimum log file required to recreate the data. The idea being that one would write out the clean log, back up the old one, and then replace the old with the clean.
Some examples of using TransDictionary
Option Explicit
Const LogFile="C:\Logs\Log.txt"
Const CleanLogFile="C:\Logs\CleanLog.txt"
Dim transDict,rc
Set transDict = New TransDictionary
transDict.LoadLog(LogFile)
If transDict.Exists("Ran Count") Then
rc=transDict.Item("Ran Count")
rc=rc+1
Else
rc=1
End If
WScript.Echo "This script had been run " & rc & " times"
transDict.SetValue "Ran Count",rc
WScript.Echo "The dictionary has the value " & transDict.Item("Ran Count")
transDict.Rollback
WScript.Echo "After rollabck it has the value " & transDict.Item("Ran Count")
transDict.SetValue "Ran Count",rc
transDict.Commit
transDict.CreateCleanLog CleanLogFile
First run:
Output:
The dictionary has the value 1
After rollabck it has the value
logFile:
R,Ran%20Count
S,Ran%20Count,1
cleanLogFile:
S,Ran%20Count,1
Second run:
Output:
This script had been run 2 times
The dictionary has the value 2
After rollabck it has the value 1
logFile:
S,Ran%20Count,2
cleanLogfile:
R,Ran%20Count
S,Ran%20Count,1
R,Ran%20Count
S,Ran%20Count,2
The Code
Class TransDictionary
Dim dictionary,logFile
Dim rollBk,rollFw,lfn,fso,clean
Public Sub Class_Initialize
Set Me.dictionary=CreateObject("Scripting.dictionary")
Set Me.rollBk=CreateObject("Scripting.dictionary")
Set Me.rollFw=CreateObject("Scripting.dictionary")
Me.clean=TRUE
End Sub
Public Sub LoadLog(logFileName)
Me.lfn=logFileName
Set Me.fso=CreateObject("Scripting.FileSystemObject")
Set Me.logFile=Me.fso.OpenTextFile(Me.lfn,1,true)
While Not Me.logFile.AtEndOfStream
action Me.logFile.ReadLine()
Wend
Me.logFile.Close
Set Me.logFile=Me.fso.OpenTextFile(Me.lfn,8,false)
End Sub
Private Sub Class_Terminate
On Error Resume Next
Me.logFile.Close
End Sub
Private Sub action(line)
Dim row
row=Split(line,",")
If row(0)="S" Then
internalSet Unescape(row(1)),Unescape(row(2))
ElseIf row(0)="R" Then
internalRemove Unescape(row(1))
End If
End Sub
Private Sub addRemove(key)
Me.rollBk.Add Me.rollBk.count,"S," & Escape(key) & _
"," & Escape(Me.dictionary.Item(key))
Me.rollFw.Add Me.rollFw.count,"R," & Escape(key)
End Sub
Private Sub addSet(key,value)
Me.rollBk.Add Me.rollBk.count,"R," & Escape(key)
Me.rollFw.Add Me.rollFw.count,"S," & Escape(key) & "," & Escape(value)
End Sub
Private Sub internalSet(key,value)
If Me.dictionary.Exists(key) Then Me.dictionary.Remove key
Me.dictionary.Add key,value
Me.clean=False
End Sub
Private Sub internalRemove(key)
If Me.dictionary.Exists(key) Then Me.dictionary.Remove key
Me.clean=False
End Sub
Public Sub Commit
Dim i,c
c=Me.rollFw.Count -1
For i=0 To c
Me.logFile.WriteLine Me.rollFw.Item(i)
Me.logFile.Close
Set Me.logFile=Me.fso.OpenTextFile(Me.lfn,8,false)
Next
Me.rollFw.RemoveAll
Me.rollBk.RemoveAll
Me.clean=True
End Sub
Public Sub RollBack
Dim i,c
c=Me.rollBk.Count -1
For i=c To 0 Step -1
action Me.rollBk.Item(i)
Next
Me.rollBk.RemoveAll
Me.rollFw.RemoveAll
Me.clean=true
End Sub
Public Sub CreateCleanLog(newFileName)
If Not Me.clean Then
Err.Raise -1,"Not Me.clean, commit or rollback first"
End If
Me.logFile.Close
Dim olfn
olfn=Me.lfn
Me.lfn=newFileName
Set Me.logFile=Me.fso.OpenTextFile(Me.lfn,2,true)
Dim k
For Each k In Me.dictionary.Keys
addSet k,Me.dictionary.Item(k)
Next
Commit
Me.logFile.Close
Me.lfn=olfn
Set Me.logFile=Me.fso.OpenTextFile(Me.lfn,8,false)
End Sub
Public Sub SetValue(key,value)
If Me.dictionary.Exists(key) Then
addRemove key
End If
addSet key,value
internalSet key,value
End Sub
Public Sub RemoveAll
Dim k
For Each k In Me.dictionary.Keys
Remove(k)
Next
End Sub
Public Sub Remove(key)
If Me.dictionary.Exists(key) Then
addRemove(key)
internalRemove(key)
End If
End Sub
Public Function Items()
Items=Me.dictionary.Items
End Function
Public Function Item(key)
Item=Me.dictionary.Item(key)
End Function
Public Function Keys()
Keys=Me.dictionary.Keys
End Function
Public Function Exists(key)
Exists=Me.dictionary.Exists(key)
End Function
End Class