File Synchronization using a NAnt Task

3 May 2006 (CPOL)
Code for synchronizing files from a local source directory tree to a local or remote target directory tree. It can be run as a NAnt task or from a console.

Figure: Graphic representation of the file synchronization task.

Introduction

Many developers, myself included, have a large enough collection of files to manage that doing it manually is, at the very least, time consuming. Many tools exist for the job; some are free, others are not, and they fit different needs. The one I introduce here is extracted from a build system that we developed to stay flexible and expandable in an evolving (developing!) environment. Being tight on time and budget, developing it from scratch was not our first option. After a search, we found the following open source tools that could be integrated into our system:

  1. The build program NAnt (0.85 Release Candidate 1), which controls the build process using XML (script). It allows one to visualize the task using a tree (see the above picture).
  2. Uwe Keim's application for uploading modified files to an FTP server, which uses a local database to trace the update status of target files.

The code provided here contains an independent console entry point when it is compiled with the _STANDALONE compilation flag defined. The synchronization process is controlled by the same configuration file format as the one provided by Uwe Keim; the added "versionDate" attribute of the <fileSettings> node is used for version control, which is discussed below. Some additional configuration nodes can also be seen in the code, but they are not used by the current functional units. A sample configuration file is included in the source package. Some readers, especially those who know how to run NAnt, may find it more convenient to run the application under NAnt when they need to integrate it into their automated build system.

For those who don't, and who find the script writing rules of NAnt hard to learn in the time available, there are tools to help. One example is NAntPad. We naturally prefer our own :-), namely the X-Script Generator of CryptoGateway. We use it for tasks ranging from simple ones like file synchronization to complex ones like C++ multi-hierarchic state-machine generation and web site framework generation based on data relations, e.g., those that mirror cryptogateway.com. Simply put, it can be used as a syntax-based visual integrator of micro-programs, something traditional UI programs cannot match, although it does not look as fancy. The code provided is a realistic sample assembly that makes sufficiently complete use of the meta tags that extend the NAnt built-in syntax (as we call it) consumed by the X-Script Generator. Interested users of the latter should consult the developer's manual, also included in the binaries package, for more details. The sample can be useful to users of the generator who would like to do some meta programming to unleash its full power, to those who simply like to construct their own languages for fun, and to those who want their constructions accepted quickly by their peers without having to write a document like this one, and get it read, once things get a little involved.

How it works and how to use the code

There are five major functional units and three layers in the code. The five units are:

  1. The IO component - If the console build is used, the input and output parameters are specified in the configuration file, a sample of which is provided in the included package. If it is run under NAnt, the input and output parameters are specified inside the NAnt script file.
  2. The action to perform - It is specified in the "verb" attribute of the top <fileSync> node. There are three actions:
    • "scan" instructs the program to scan a directory tree, starting from the folder specified by the <source> child node of the file <group> node. The scan is recursive inside the directory tree if the "recurs" attribute of the <settings> child node of the file <group> node is either not set or is set to true. Otherwise, only the starting directory is scanned.
    • "listen" instructs the program to watch for file changes in the set of file <group>s and perform the corresponding updates when necessary. It is implemented using the FileSystemWatcher class of the .NET class library. This branch of the code is not well tested yet, since it is not used in our current build system. We found that FileSystemWatcher does not always report file changes correctly, and no real effort has been spent on finding the cause. Users are invited to use it and improve it if it fits their needs, for example by adding a job queue to handle the events asynchronously: the present version is not expected to work well if the target directory is not on the local machine and the average time between two consecutive file change events is shorter than the average time needed to finish a typical file update.
    • "dbsync" instructs the program to insert records of the source file <group> into the database without performing the actual update on the target directory. This can be useful if the user knows that the update has already been performed using other means or tools and the actual update process takes substantial time to finish.
  3. The database component - For each target file storage directory tree, called a <group>, the database records the last modification time of each file contained in the target directory (tree). In each scan of the source directory tree, every source file's last modification date is compared against the database record. If the source file is newer, it is copied to the target directory. Before a scan, each database record belonging to the file <group> is marked "checked out". During the scan, each file found in the source directory tree is either inserted into the database, if it is absent, or marked "checked in". When the scan is done, the program deletes all the target files that are not "checked in", together with the corresponding records. It is, therefore, important to give each file <group> a unique name. The current version of the program cannot handle overlapping file groups, i.e., groups where the root directory of one is a subdirectory of the other.
  4. The FTP and file system component - The original component was written by Alex Kwok. Some modifications were made here to the version contained in Uwe Keim's code to make it work with more FTP servers (no claim is made that it works with all servers). The main differences between FTP servers are concentrated in the following aspects:
    • The response codes - Different servers may respond differently in terms of the return code.
    • The response messages - Some servers send response (information) lines that do not begin with a numerical FTP response code.
    • The directory list format - Different servers have different formats (UNIX, Windows, and a mix of the two) that have to be parsed correctly by implementing their handler.

    The component has to work on a machine-to-machine time scale, in which requests and responses are pipelined with little or no delay. The original one tends to get out of sync under such conditions. The directory tree under the target folder (which must already exist) mirrors the source tree under the folder specified in the "folder" attribute of the <source> node. Any sub-directories that do not exist are created automatically, so the account that receives the target files must be authorized to create directories, and care has to be taken to specify the target directory tree properly. Three elements are combined to form a full (local) path on the FTP server. They are:

    • The "baseFolder" attribute of the <account> node under the <ftp> node specifies the root directory of the FTP account.
    • The "folder" attribute of the <ftp> node under the <sink> node specifies the root target directory under which the source "folder" directory tree is going to be duplicated on the remote FTP account (or another local folder). It is always relative to the "baseFolder" so the leading '/' or '\' character, if present, will be ignored.
    • The relative path of each source file under the source folder (directory tree), which is mirrored under the root target directory. These paths are generated automatically by the file scanner.
  5. The versioning mechanism - The NAnt task adaptor of the application uses the "version" attribute of the <sink> node to turn versioning on or off: if it is set to a non-empty value, versioning is turned on and that value is used as the version. Through the console entry point of the application, the "incrementalBackup" attribute of the <settings> node of the configuration file turns versioning on or off, and the "versionDate" attribute of the <fileSettings> node sets the version value. When versioning is turned on, the target directory contains the version information. Any version identifier can be used. When run under NAnt, one of two ways of marking a file's version can be selected through the "versionPath" attribute of the <sink> node:
    • "subpath" instructs the program to create a subdirectory for each version under the target root directory, mirror the source directory tree in it, and put the target files there. The version identifier should therefore be chosen so that it differs from every child folder name under the target root directory. In the following sample build script, a time-stamp is used since it is thought to be the most informative, but any identifier can be used as long as it does not conflict with an existing folder name.
    • "append" instructs the program to append the version string to the target root directory name. In this case, a user need not worry about existing folder names in the directory tree, but the target root directory name is modified.

    The console version of the application always appends the version information to the target (root) directory.

  6. One more thing: the code also contains an unfinished and little-tested branch that performs difference analysis and differential updates. If anyone is interested in developing it further in an open project, please contact me.
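
The check-out/check-in bookkeeping described for the database component (unit 3) can be illustrated with a minimal Python sketch. It is not the C# code of the package: a plain dictionary stands in for the database table of one file <group>, and the names scan_group, records and source_files are hypothetical. It also assumes that a file newly inserted into the database is copied just like a newer one.

```python
from datetime import datetime


def scan_group(records, source_files):
    """Run one scan for a single file group.

    records      -- dict standing in for the group's database rows:
                    {relative_path: {"mtime": datetime, "checked_in": bool}}
    source_files -- {relative_path: last modification time} found by the scanner
    Returns (to_copy, to_delete) as sorted lists of relative paths.
    """
    # Before the scan: mark every record of the group as "checked out".
    for row in records.values():
        row["checked_in"] = False

    to_copy = []
    for path, mtime in sorted(source_files.items()):
        row = records.get(path)
        if row is None:
            # Absent from the database: insert it (assumed to imply a copy).
            records[path] = {"mtime": mtime, "checked_in": True}
            to_copy.append(path)
            continue
        if mtime > row["mtime"]:
            # Source is newer than the recorded time: copy and update the record.
            row["mtime"] = mtime
            to_copy.append(path)
        row["checked_in"] = True

    # After the scan: whatever is still "checked out" no longer exists in the
    # source tree, so its target file and its record are removed.
    to_delete = sorted(p for p, row in records.items() if not row["checked_in"])
    for path in to_delete:
        del records[path]
    return to_copy, to_delete
```

The function only decides what to copy and what to delete; the actual transfer (FTP upload or local copy) is left out. Because deletion is driven by group membership, the sketch also shows why two groups sharing a name would delete each other's files.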

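The remark about response messages (unit 4) corresponds to the multi-line reply format of the FTP specification (RFC 959): a reply whose first line has the code followed by '-' continues until a line that starts with the same code followed by a space, and the lines in between need not begin with a code at all. The following Python sketch shows a reader that tolerates such lines. It is an illustration only; read_reply is a hypothetical helper, not a function of the package.

```python
def read_reply(lines):
    """Consume one FTP reply from an iterable of text lines.

    Returns (code, text_lines). Raises ValueError on a malformed reply.
    """
    it = iter(lines)
    first = next(it).rstrip("\r\n")
    code = first[:3]
    if len(code) != 3 or not code.isdigit():
        raise ValueError("reply does not start with a 3-digit code: %r" % first)

    text = [first[4:]]
    if first[3:4] == "-":
        # Multi-line reply: keep reading until the terminating line.
        for line in it:
            line = line.rstrip("\r\n")
            # Lenient terminator test: same code, not followed by '-'.
            if line[:3] == code and line[3:4] != "-":
                text.append(line[4:])
                break
            # Intermediate line; it may or may not begin with a code.
            text.append(line)
        else:
            raise ValueError("multi-line reply %s was never terminated" % code)
    return int(code), text
```

Passing the same iterator to successive calls reads successive replies, which is one way to keep requests and responses paired when they arrive back to back.
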
The three layers are:

  1. The C# code, which is briefly introduced above.
  2. The NAnt attributes used by NAnt to recognize its nodes and attributes. Interested users who are new to NAnt should visit its hosting website to find more information.
  3. X-Script Generator tokenization attributes that are used to mark the 'lexical' and 'context' tokens and their relationships in the X-script. If a reader is interested in using the X-Script Generator and in developing their own version of "build languages", read the developer's manual included here or inside the X-Script Generator program package. The doc is in the binary package. It can also be generated from the code provided. The user's manual is online.

Using the task

When it is used in a plain NAnt XML script, the task can be included under any target node of NAnt, provided that the task assembly is preloaded using the built-in <loadtasks> task. The major task elements and their attributes are introduced above; the others are either self-explanatory or part of the NAnt built-in ones. If one loads the X-script project file in the Samples directory using the X-Script Generator, a tooltip popup window for each attribute shows a description of that attribute. The user can also read the code to find these descriptions. They are stored in the "Description" property of the "TokenAttribute" (meta) attribute applied to each (code) property of a task element (class) that carries a "LexicToken" (meta) attribute.

The following NAnt script corresponds to an instance of the figure shown at the beginning of the article (the figure is actually copied from the node editing panel of the X-Script Generator):

XML
<?xml version="1.0" encoding="utf-8"?>
<project name="demo project" 
         default="File Upload" basedir="C:\Build" 
         output_script="C:\Build\scripts\file-sync.build">
    <description>
           Generated By: X-Script Generator 0.2.0.0
           Project File: FileSync.nant.proj
           Working Path: C:\Build\projects
           Other Info: This is a sample project.
           Generated At: 
             Tue, 25 Oct 2005 22:24:57 GMT  
             (Local Time: 2005-10-25 15:24:57Z)
    </description>
    <target name="File Upload">
        <fileGroup name="File Group C" 
                   id="6ec3a262-8768-4066-841a-9ad7cde926f8">
            <settings dbType="MSSQL">
                <dbConn server="10.0.0.12" 
                           database="test-db" 
                user="user" pwd="password" />
            </settings>
            <source folder="C:\SourceDir" />
            <sink versionPath="subpath">
                <ftp folder="/RelativePath/DestinationDir">
                  <account domain="www.somewhere.net" 
                    baseFolder="/" port="21" 
                    user="ftp-user" 
                    password="ftp-password" />
                </ftp>
            </sink>
        </fileGroup>
        <fileSync verb="scan" verbose="false">
            <group name="File Group A">
                <settings dbType="MSSQL">
                    <dbConn server="10.0.0.12" 
                               database="test-db" 
                               user="user" 
                               pwd="password" />
                </settings>
                <source folder="C:\SourceDir" />
                <sink version="2005-10-26 00:10:46" 
                         versionPath="subpath">
                    <ftp folder="/RelativePath/DestinationDir">
                       <account domain="www.somewhere.net" 
                         baseFolder="/" 
                         port="21" user="ftp-user" 
                         password="ftp-password" />
                    </ftp>
                </sink>
                <includes pattern=".txt" />
                <excludes pattern=".exe" />
            </group>
            <group name="File Group B">
                <settings dbType="ACCESS">
                    <dbConn database="test-db" />
                </settings>
                <source folder="C:\SourceDir" />
                <sink versionPath="subpath">
                    <local folder="C:\LocalTargetDir" />
                </sink>
                <includes pattern=".jpg" />
                <includes pattern=".gif" />
                <excludes pattern=".png" />
            </group>
            <group name="" 
              refid="6ec3a262-8768-4066-841a-9ad7cde926f8" />
        </fileSync>
    </target>
    <target name="File Download">
        <downloader outdir="C:\FtpDownloadDir" verbose="false">
            <source path="/RemoteItem1RelativePath">
                <ftp folder="/RemoteItemFolder">
                   <account domain="www.somewhere.net" 
                      baseFolder="/" 
                      port="21" user="ftp-user" 
                      password="ftp-password" />
                </ftp>
            </source>
            <source path="/RemoteItem2RelativePath">
                <ftp folder="/RemoteItemFolder">
                    <account domain="www.somewhere.net" 
                      baseFolder="/" 
                      port="21" user="ftp-user" 
                      password="ftp-password" />
                </ftp>
            </source>
        </downloader>
    </target>
</project>
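
To make the path rules of the FTP component and the versioning mechanism concrete, the following Python sketch composes the remote path of one file of "File Group A" in the script above. It is a sketch under stated assumptions, not the package's code: target_path is a hypothetical helper, backslashes are normalized to '/', and the separator used in "append" mode is a guess, since the text does not say how the version string is joined to the directory name.

```python
import posixpath


def target_path(base_folder, folder, relative_path,
                version="", version_path="subpath", sep="-"):
    """Compose the full remote path for one mirrored file (illustrative)."""
    # "folder" is always relative to baseFolder, so leading separators are dropped.
    root = folder.replace("\\", "/").lstrip("/")
    rel = relative_path.replace("\\", "/").lstrip("/")

    if version:  # an empty version means versioning is off
        if version_path == "subpath":
            # One subdirectory per version under the target root.
            root = posixpath.join(root, version)
        elif version_path == "append":
            # Version appended to the target root name (separator is assumed).
            root = root + sep + version
        else:
            raise ValueError("unknown versionPath: %r" % version_path)

    return posixpath.join(base_folder, root, rel)


# Values taken from "File Group A"; the relative file path is made up.
example = target_path("/", "/RelativePath/DestinationDir", "docs\\readme.txt",
                      version="2005-10-26 00:10:46", version_path="subpath")
```

With "subpath" the version becomes a folder between the target root and the mirrored tree, which is why it must not collide with an existing child folder name; with "append" only the root folder name changes.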

To use it under the X-Script Generator, the assembly has to be pre-loaded so that it can recognize the syntax extensions defined in the assembly.

There is also an FTP file downloader (the <downloader> task in the "File Download" target above), which can be handy in an auto-build process. The current version is quite simple: it is suitable when only a few items are downloaded at a time, and it cannot resume a download that is interrupted by an unexpected event.

What's included

  1. Database and table creation scripts (named "fileinfo_table.sql" and "fileinfo_table_firebird.sql"), located in the main directory, that create the single table (and stored procedures) used by the program on a user selected database. They were generated from a Microsoft SQL Server database and a Firebird database (version 1.5.2.4731). For other databases, slight modifications may be needed.
  2. The code/binaries for the task located in the "FileSync" subdirectory under the main one.
  3. The code/binary for CodeTokenizers.dll, under the "TokenAttrib" subdirectory. This assembly, which mainly consists of .NET reflection attribute definitions, is the one used by the X-Script Generator. It contains the developer's manual, which can be processed into readable HTML form, e.g., using the provided NAnt build script.
  4. The sample NAnt template project file named "file-sync.build" shown above, located in the "Samples" subdirectory. It is initialized with fictitious parameters.
  5. A template configuration file named "sample.syncjob" that can be modified according to the application and then be read by the console version of the application. It is in the "Samples" subdirectory.
  6. The X-Script Generator project file named "FileSync.nant.proj" used to generate the above NAnt script. It is also in the "Samples" subdirectory.
  7. NAnt build files for the assemblies and the developer's manual, in the "Build\nant-scripts" subdirectory. A builder should read the "Readme.txt" file first. Those who prefer to build in an IDE can include all the C# files in a project that references "NAnt.Core.dll" (contained in the NAnt distribution), the CodeTokenizers.dll assembly contained in this package, and the standard .NET libraries (including System.Xml.dll). The X-Script Generator project file that can be used to generate the build files is located in the "Build\XScriptProjs" subdirectory.

Revision History

  • Version 1.0: Nov. 1, 2005. Initial release.
  • Version 1.1: May 2, 2006. Support for the Firebird database is added. It uses stored procedures since Firebird does not seem to support procedural SQL (PSQL) outside of its stored procedures. (MS Access does not support PSQL either, but we used a less efficient way to realize the functionality needed.) Some minor changes in the build scripts are also made.

To Do

  • Interested readers may find that the provided node elements could be integrated further with NAnt built-in elements such as <fileset>, which lets a user define the included file set on a much finer scale than the current <includes> and <excludes> nodes of the task. This is not attempted at present since many of our build scripts depend on the present version. It is not hard to do, though.
  • The Oracle database option is not tested since we do not own an Oracle database. Users may find it easy to extend the code to other databases such as MySQL. Interested programmers may also find it useful to unify database access under an OLE DB provider; this was not done here simply because some of our existing code and experience were used directly.
  • A standard NAnt documentation for the task for those who do not use the X-Script Generator.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)