In this post, we will go through a number of refactors to clean up our application and make it easier to extend.
Let's continue from where we were in the last part of this project.
As always, if you want to follow along, the source code from last time can be found here. The reason there is a new release and not just continuing from last time is that in December, I made some modifications that were not mentioned in the post.
Let’s get up to speed with those changes and then we will see what we have planned for today.
- Updated the automated build command so that our NuGet testing dependencies are included in the build process; otherwise, the build fails.
- Trimmed the trailing backslash from the source and destination paths.
- Updated the projects to use .NET Framework 4.7 (this won’t affect how the application runs, but it lets us use the newer language syntax).
So now, let’s recap where we left off. We created a glorified copy-paste application that copies files from a source path into a destination path. The differences from a normal copy-paste are that the paths are configurable, we don’t get a confirmation prompt for overwriting existing files, and we only overwrite files that are different (based on file size and last write time).
What we want to change in this iteration are the following in order:
| Step | Action | Reason |
| --- | --- | --- |
| 1 | Move the logic inside the library | We might want to reuse the functionality in different applications: desktop applications, web applications, a Windows service, and anything else we might think of. |
| 2 | Create our own file system enumerator | When we try to copy a full drive, the enumerator tries to access system files, which ideally shouldn’t be copied (and applications don’t have access to them most of the time anyway). We will make our own iterator to see how that works, and we can extend it with custom logic. |
| 3 | Create a base class for the copy behavior | At the moment, we cannot see what is going to be copied: which files are new and which files will be updated. With a base class we will be able to reuse the enumerator logic but perform different actions on each file. |
| 4 | Create a strategy and bring our workflow to life | With the changes made earlier, we will see how to combine several workflows into one. |
| 5 | Add filters | We will add filters for specific files, file names, or extensions. |
With that being said, let’s begin.
Move the Logic Inside the Library
First, we are going to create a new class in the FileSystemSyncher.Commons project called FileSystemProcessor. Inside that class, we will create a constructor that accepts an IConfigurationProvider and a method called Run. The constructor takes a configuration provider from outside the class so that we can swap the provider out later if we so wish.
Next, we will move the ConfigFileConfigurationProvider class into the FileSystemSyncher.Commons project as well; every .NET application can have a configuration file, so we can reuse this logic on a web server or in a Windows service.
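To see why the constructor injection pays off, here is a sketch of what a different provider could look like, say one that reads the paths from command-line arguments instead of the config file. The class name and argument layout here are hypothetical; only IConfigurationProvider and ConfigurationOptions come from the project.

```csharp
namespace FileSystemSyncher.Commons
{
    using System;

    // Hypothetical provider: reads "<sourcePath> <destinationPath>" from the
    // command line instead of the App.config file. Swapping it in requires no
    // change to FileSystemProcessor, because both providers implement
    // IConfigurationProvider.
    public sealed class CommandLineConfigurationProvider : IConfigurationProvider
    {
        private readonly string[] _args;

        public CommandLineConfigurationProvider(string[] args)
        {
            _args = args ?? throw new ArgumentNullException(nameof(args));
        }

        public ConfigurationOptions GetOptions()
        {
            if (_args.Length < 2)
            {
                throw new ArgumentException("Expected: <sourcePath> <destinationPath>");
            }
            return new ConfigurationOptions(_args[0], _args[1]);
        }
    }
}
```

Nothing else in the processor would have to change to use this provider instead of the config-file one.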
The changes we’re going to make to the Program.cs file are as follows:
namespace FileSystemSyncher.Console
{
using System;
using System.IO;
using Commons;
public static class Program
{
public static void Main()
{
try
{
IConfigurationProvider configurationProvider =
new ConfigFileConfigurationProvider();
ConfigurationOptions configurationOptions =
configurationProvider.GetOptions();
if (!configurationOptions.DestinationDirectory.Exists)
{
configurationOptions.DestinationDirectory.Create();
}
foreach (FileInfo sourceFile in
configurationOptions.SourceDirectory.EnumerateFiles
("*", SearchOption.AllDirectories))
{
string destinationFilePath = sourceFile.FullName.Replace
(configurationOptions.SourceDirectory.FullName,
configurationOptions.DestinationDirectory.FullName);
FileInfo destinationFile = new FileInfo(destinationFilePath);
if (!destinationFile.Exists ||
sourceFile.Length != destinationFile.Length ||
sourceFile.LastWriteTime != destinationFile.LastWriteTime)
{
if (destinationFile.DirectoryName != null)
{
Directory.CreateDirectory(destinationFile.DirectoryName);
File.Copy(sourceFile.FullName, destinationFilePath, true);
Console.WriteLine(destinationFilePath);
}
}
}
}
catch (Exception e)
{
Console.WriteLine(e);
}
}
}
}
to:
namespace FileSystemSyncher.Console
{
using System;
using Commons;
public static class Program
{
public static void Main()
{
try
{
IConfigurationProvider configurationProvider =
new ConfigFileConfigurationProvider();
FileSystemProcessor fileSystemProcessor =
new FileSystemProcessor(configurationProvider);
fileSystemProcessor.Run();
}
catch (Exception e)
{
Console.WriteLine(e);
}
Console.ReadLine();
}
}
}
There, isn’t that cleaner :D? Also, note the additional Console.ReadLine() call after the catch block (I forgot to add it at the end last time). It keeps the console window open until the user presses Enter, so if an exception is thrown, we actually have time to read it.
Now for the FileSystemProcessor class. It will look as follows:
namespace FileSystemSyncher.Commons
{
using System;
using System.IO;
public sealed class FileSystemProcessor
{
private readonly IConfigurationProvider _configurationProvider;
public FileSystemProcessor(IConfigurationProvider configurationProvider)
{
_configurationProvider = configurationProvider ??
throw new ArgumentNullException(nameof(configurationProvider));
}
public void Run()
{
ConfigurationOptions configurationOptions = _configurationProvider.GetOptions();
foreach (FileInfo sourceFile in
configurationOptions.SourceDirectory.EnumerateFiles
("*", SearchOption.AllDirectories))
{
string destinationFilePath = sourceFile.FullName.Replace(
configurationOptions.SourceDirectory.FullName,
configurationOptions.DestinationDirectory.FullName);
FileInfo destinationFile = new FileInfo(destinationFilePath);
if (!destinationFile.Exists || sourceFile.Length != destinationFile.Length ||
sourceFile.LastWriteTime != destinationFile.LastWriteTime)
{
if (destinationFile.DirectoryName != null)
{
Directory.CreateDirectory(destinationFile.DirectoryName);
File.Copy(sourceFile.FullName, destinationFilePath, true);
System.Console.WriteLine(destinationFilePath);
}
}
}
}
}
}
Things to note in this class:
- Notice the null check inside the constructor; it ensures that this class always receives a configuration provider, since one is necessary for the algorithm to work.
- The if clause that checked whether the destination directory exists has been removed; we no longer care whether it exists, because it will be created anyway when we create the path for each file.
This wraps up the moving of the class step.
Create Our Own File System Enumerator
Now that we moved the logic outside of the application, we can look into creating our own file enumerator so that we can have better control over the file retrieval.
First, let’s create an internal FileSystemEnumerator class inside the FileSystemSyncher.Commons project.
The resulting file will be as follows:
namespace FileSystemSyncher.Commons
{
using System.Collections.Generic;
using System.IO;
using System.Linq;
internal sealed class FileSystemEnumerator
{
private FileSystemEnumerator()
{
}
internal static FileSystemEnumerator CreateInstance()
{
return new FileSystemEnumerator();
}
internal IEnumerable<FileInfo> EnumerateFilesBreadthFirst(DirectoryInfo directoryInfo)
{
foreach (FileInfo file in directoryInfo.EnumerateFiles())
{
yield return file;
}
foreach (DirectoryInfo subDir in directoryInfo.EnumerateDirectories()
.Where(info => !info.Attributes.HasFlag(FileAttributes.System)))
{
foreach (FileInfo file in EnumerateFilesBreadthFirst(subDir))
{
yield return file;
}
}
}
}
}
Let’s analyze this class a bit:
- The class is declared as internal and sealed because, at this point in time, we have no reason to access it from outside the assembly, nor do we see a reason to extend it.
- The class has a private constructor and a static method called CreateInstance. This gives us control over how the class is used and how it’s instantiated.
- The enumerating method is called EnumerateFilesBreadthFirst, which gives some insight into how it traverses the file system: all files in a directory are yielded before descending into its subdirectories.
- The inner foreach construct calls the same method recursively, so any custom logic we apply will still be in effect (more on this when we get to the filters).
- The subdirectory enumeration filters out system folders, like the recycle bin folders in the root of drives. We shouldn’t copy system folders: most of the time they are marked as such by Windows, they might not work from one Windows installation to another, and accessing them may require admin privileges.
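To make the traversal order concrete without touching the real file system, here is a toy model of the same algorithm over an in-memory tree (FakeDir is made up for this illustration; the yield structure mirrors EnumerateFilesBreadthFirst):

```csharp
using System;
using System.Collections.Generic;

// In-memory stand-in for a directory: a list of file names plus subdirectories.
sealed class FakeDir
{
    public List<string> Files = new List<string>();
    public List<FakeDir> SubDirs = new List<FakeDir>();

    public IEnumerable<string> EnumerateFiles()
    {
        foreach (string file in Files)
        {
            yield return file;               // files of the current level first
        }
        foreach (FakeDir sub in SubDirs)     // then recurse into each subfolder
        {
            foreach (string file in sub.EnumerateFiles())
            {
                yield return file;
            }
        }
    }
}
```

For a root containing a.txt and b.txt plus a subfolder containing c.txt, the enumeration yields a.txt and b.txt before c.txt.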
Now, with this class created, we just have to update our FileSystemProcessor
class as follows:
namespace FileSystemSyncher.Commons
{
using System;
using System.IO;
public sealed class FileSystemProcessor
{
private readonly IConfigurationProvider _configurationProvider;
public FileSystemProcessor(IConfigurationProvider configurationProvider)
{
_configurationProvider = configurationProvider
?? throw new ArgumentNullException
(nameof(configurationProvider));
}
public void Run()
{
ConfigurationOptions configurationOptions =
_configurationProvider.GetOptions();
FileSystemEnumerator fileSystemEnumerator =
FileSystemEnumerator.CreateInstance();
foreach (FileInfo sourceFile in fileSystemEnumerator.EnumerateFilesBreadthFirst(
configurationOptions.SourceDirectory))
{
string destinationFilePath = sourceFile.FullName.Replace(
configurationOptions.SourceDirectory.FullName,
configurationOptions.DestinationDirectory.FullName);
FileInfo destinationFile = new FileInfo(destinationFilePath);
if (!destinationFile.Exists || sourceFile.Length != destinationFile.Length
|| sourceFile.LastWriteTime != destinationFile.LastWriteTime)
{
if (destinationFile.DirectoryName != null)
{
Directory.CreateDirectory(destinationFile.DirectoryName);
File.Copy(sourceFile.FullName, destinationFilePath, true);
System.Console.WriteLine(destinationFilePath);
}
}
}
}
}
}
Please note that the enumerator is not passed into the method or constructor; at this point in time it would only complicate things, especially since the enumerator is not open for extension. Later on, if we needed to work with multiple enumerators, we could pass one in as an argument.
That concludes this step in our development. Up next, we’re going to create a base class for processing the files.
Create a Base Class for the Copy Behavior
First, let’s create an abstract class called FileProcessingStrategyBase in the FileSystemSyncher.Commons project.
The class will look as follows:
namespace FileSystemSyncher.Commons
{
using System;
using System.IO;
public abstract class FileProcessingStrategyBase
{
public event EventHandler OnFileProcessed;
public void ProcessFiles(FileInfo sourceFileInfo, FileInfo destinationFileInfo)
{
sourceFileInfo = sourceFileInfo ??
throw new ArgumentNullException(nameof(sourceFileInfo));
destinationFileInfo = destinationFileInfo ??
throw new ArgumentNullException(nameof(destinationFileInfo));
ProcessFilesInternal(sourceFileInfo, destinationFileInfo);
OnFileProcessed?.Invoke(this, EventArgs.Empty);
}
protected abstract void ProcessFilesInternal
(FileInfo sourceFileInfo, FileInfo destinationFileInfo);
}
}
Let’s look at this class:
- The class is declared abstract and public. Being abstract ensures that no one can instantiate it directly, as it serves as a base class for future strategies, and it is public because we expect it to be extended. It is also a good habit to give your abstract classes a Base suffix.
- The ProcessFiles method is public as well, because it is the public API we will be using; it works with two files.
- We added an event called OnFileProcessed in case we want to attach additional logic, such as showing progress when a file is processed.
- We added a protected abstract method called ProcessFilesInternal. This is the Template Method pattern: the public method controls the workflow and lets derived classes fill in parts of it. In this case, we want the event to be raised every time ProcessFiles is called.
- The validation lives inside the public method so that all derivations enforce the same null checks.
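As a quick illustration of the template method in action, here is a hypothetical strategy that performs no copy and only logs the path pair. The subclass name is made up; the base class is repeated from above so the snippet compiles on its own (in the real project the subclass would simply sit next to the existing base class).

```csharp
using System;
using System.IO;

// Repeated from the article so this snippet is self-contained.
public abstract class FileProcessingStrategyBase
{
    public event EventHandler OnFileProcessed;

    public void ProcessFiles(FileInfo sourceFileInfo, FileInfo destinationFileInfo)
    {
        if (sourceFileInfo == null) throw new ArgumentNullException(nameof(sourceFileInfo));
        if (destinationFileInfo == null) throw new ArgumentNullException(nameof(destinationFileInfo));
        ProcessFilesInternal(sourceFileInfo, destinationFileInfo);
        OnFileProcessed?.Invoke(this, EventArgs.Empty);
    }

    protected abstract void ProcessFilesInternal(FileInfo sourceFileInfo, FileInfo destinationFileInfo);
}

// Hypothetical example strategy: the null checks and the OnFileProcessed
// event come from the base class, so the subclass only fills in the hook.
public sealed class LogOnlyStrategy : FileProcessingStrategyBase
{
    protected override void ProcessFilesInternal(FileInfo sourceFileInfo, FileInfo destinationFileInfo)
    {
        Console.WriteLine($"{sourceFileInfo.FullName} -> {destinationFileInfo.FullName}");
    }
}
```

Because ProcessFiles raises OnFileProcessed after every call, subscribers get notified for this strategy too, with no extra code in the subclass.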
And with this, we finished another step :).
Create a Strategy and Bring Our Workflow to Life
Now that we have moved everything and have a base class, we can create different strategies for processing the files, and we will also include a progress display.
Let’s create a public sealed class called FileCopyStrategy in the FileSystemSyncher.Commons project that inherits from the FileProcessingStrategyBase class. We can move the lines that actually copy a file into this class’s implementation.
The class will look like this:
namespace FileSystemSyncher.Commons
{
using System;
using System.IO;
public sealed class FileCopyStrategy : FileProcessingStrategyBase
{
protected override void ProcessFilesInternal
(FileInfo sourceFileInfo, FileInfo destinationFileInfo)
{
if (string.IsNullOrWhiteSpace(destinationFileInfo.DirectoryName))
{
throw new ArgumentException(
"The destination file must be inside a directory",
nameof(destinationFileInfo));
}
Directory.CreateDirectory(destinationFileInfo.DirectoryName);
File.Copy(sourceFileInfo.FullName, destinationFileInfo.FullName, true);
}
}
}
As we can see, we added some basic validation for the input parameters (this strategy requires an additional check beyond the ones in the base class), and then we copy the file as we did before. Notice, though, that we still do not know which files will be copied as new and which will update an existing file. For this, we will create another strategy that reports which files will be copied.
Let’s create a new class called CopyReportStrategy in the same project; it will look like this:
namespace FileSystemSyncher.Commons
{
using System;
using System.IO;
public sealed class CopyReportStrategy : FileProcessingStrategyBase
{
protected override void ProcessFilesInternal
(FileInfo sourceFileInfo, FileInfo destinationFileInfo)
{
if (!destinationFileInfo.Exists)
{
Console.WriteLine($"{destinationFileInfo.FullName} [+](new)");
}
else
{
Console.WriteLine($"{destinationFileInfo.FullName} [!](update)");
}
}
}
}
All this strategy does is report which files will be updated and which will be copied as new.
With these classes created, we want to update our program to pass the strategies in from the application. For this, we will use an array of strategies.
namespace FileSystemSyncher.Console
{
using System;
using Commons;
public static class Program
{
public static void Main()
{
try
{
IConfigurationProvider configurationProvider =
new ConfigFileConfigurationProvider();
long numberOfFileToUpdate = 0;
long numberOfFileUpdated = 0;
CopyReportStrategy copyReportStrategy = new CopyReportStrategy();
copyReportStrategy.OnFileProcessed +=
(sender, args) => numberOfFileToUpdate++;
FileCopyStrategy fileCopyStrategy = new FileCopyStrategy();
fileCopyStrategy.OnFileProcessed += (sender, args) =>
{
numberOfFileUpdated++;
Console.WriteLine(
$"[{numberOfFileUpdated}/{numberOfFileToUpdate}] " +
$"({(numberOfFileUpdated / (double)numberOfFileToUpdate):P})");
};
FileProcessingStrategyBase[] strategies =
{ copyReportStrategy, fileCopyStrategy };
FileSystemProcessor fileSystemProcessor =
new FileSystemProcessor(configurationProvider, strategies);
fileSystemProcessor.Run();
}
catch (Exception e)
{
Console.WriteLine(e);
}
Console.ReadLine();
}
}
}
Notice that we are creating the strategies and attaching event handlers so that we can keep count of how many files will be updated and how many have been updated so far.
Because we now create an array of strategies and pass it into the FileSystemProcessor, we have to update that class as well: the constructor will accept a collection of strategies, and the Run method will use that collection.
namespace FileSystemSyncher.Commons
{
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
public sealed class FileSystemProcessor
{
private readonly IEnumerable<FileProcessingStrategyBase> _strategies;
private readonly IConfigurationProvider _configurationProvider;
public FileSystemProcessor(IConfigurationProvider configurationProvider,
IEnumerable<FileProcessingStrategyBase> strategies)
{
_configurationProvider = configurationProvider
?? throw new ArgumentNullException
(nameof(configurationProvider));
_strategies = strategies ?? Enumerable.Empty<FileProcessingStrategyBase>();
}
public void Run()
{
ConfigurationOptions configurationOptions =
_configurationProvider.GetOptions();
FileSystemEnumerator fileSystemEnumerator =
FileSystemEnumerator.CreateInstance();
foreach (FileProcessingStrategyBase fileProcessingStrategyBase in _strategies)
{
foreach (FileInfo sourceFile in
fileSystemEnumerator.EnumerateFilesBreadthFirst(
configurationOptions.SourceDirectory))
{
string destinationFilePath = sourceFile.FullName.Replace(
configurationOptions.SourceDirectory.FullName,
configurationOptions.DestinationDirectory.FullName);
FileInfo destinationFile = new FileInfo(destinationFilePath);
if (!destinationFile.Exists ||
sourceFile.Length != destinationFile.Length
|| sourceFile.LastWriteTime != destinationFile.LastWriteTime)
{
fileProcessingStrategyBase.ProcessFiles(sourceFile, destinationFile);
}
}
}
}
}
}
This is known as the Strategy pattern, appropriately named, right? It will allow us to build workflows as complex as we require.
Also note that we enumerate the files once per strategy, even though this is less efficient. It guarantees that every strategy works on the actual state of the files. Imagine a strategy that renames or deletes a file; the strategies that follow it might then misbehave because of those changes. More concretely, in our case, if we ran all strategies on each file before moving to the next, the progress would always read 100%: as soon as the report strategy counted a file, the copy strategy would immediately copy it, so the count of files to update and the count of files updated would always be equal. And this is it for this step.
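The progress arithmetic behind that reasoning can be checked with a toy simulation of the two loop orders (plain strings stand in for files, and the class and method names here are mine, not from the project):

```csharp
using System;
using System.Collections.Generic;

static class LoopOrderDemo
{
    // Loop order used in the article: the "report" strategy gets a full pass
    // first, so the total is known before any progress is reported.
    public static List<double> ProgressPerStrategyFirst(string[] files)
    {
        var progress = new List<double>();
        long toUpdate = 0, updated = 0;
        foreach (string _ in files) { toUpdate++; }          // pass 1: count
        foreach (string _ in files)                          // pass 2: copy
        {
            updated++;
            progress.Add(updated / (double)toUpdate);
        }
        return progress;
    }

    // Inverted order: both strategies run on each file before the next file,
    // so the counters stay equal and the ratio is stuck at 1.0.
    public static List<double> ProgressPerFileFirst(string[] files)
    {
        var progress = new List<double>();
        long toUpdate = 0, updated = 0;
        foreach (string _ in files)
        {
            toUpdate++;
            updated++;
            progress.Add(updated / (double)toUpdate);
        }
        return progress;
    }
}
```

With two files, the first order reports 50% and then 100%, while the inverted order reports 100% twice.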
Add Filters
Finally, the last step. When I came up with the idea for this application, I wanted a way to copy my source code into backup storage and keep it in sync. I didn’t want to copy the compiled binary files, but neither did I want to miss some third-party DLLs that were also in the binary folder. The idea, therefore, was to have whitelist and blacklist filters, and fortunately for us, we have almost everything in place to do that. Let’s start.
First off, we need to update the ConfigurationOptions class to include a whitelist and a blacklist of strings, which will represent the files, extensions, and folders that we want to keep or ignore.
namespace FileSystemSyncher.Commons
{
using System.Collections.Generic;
using System.Configuration;
using System.IO;
using System.Linq;
public sealed class ConfigurationOptions
{
public ConfigurationOptions(string sourceDirectory, string destinationDirectory)
{
string sourceFullPath = Path.GetFullPath(sourceDirectory).TrimEnd('\\');
SourceDirectory = new DirectoryInfo(sourceFullPath);
if (!SourceDirectory.Exists)
{
throw new ConfigurationErrorsException
("The folder in the source path does not exist");
}
string destinationPath = destinationDirectory;
if (!string.IsNullOrWhiteSpace(destinationDirectory))
{
destinationPath = destinationPath.TrimEnd('\\');
}
DestinationDirectory = new DirectoryInfo(destinationPath);
Whitelist = Enumerable.Empty<string>();
BlackList = Enumerable.Empty<string>();
}
public DirectoryInfo SourceDirectory { get; }
public DirectoryInfo DestinationDirectory { get; }
public IEnumerable<string> Whitelist { get; set; }
public IEnumerable<string> BlackList { get; set; }
}
}
Since the filters are not mandatory, the constructor just initializes them to empty collections; callers can set them afterwards.
Next, we want to pass in our lists, in this case from the config file. I will use a semicolon as the separator.
namespace FileSystemSyncher.Commons
{
using System;
using System.Configuration;
public sealed class ConfigFileConfigurationProvider : IConfigurationProvider
{
public ConfigurationOptions GetOptions()
{
string sourcePath = ConfigurationManager.AppSettings["SourcePath"];
string destinationPath = ConfigurationManager.AppSettings["DestinationPath"];
ConfigurationOptions configurationOptions =
new ConfigurationOptions(sourcePath, destinationPath);
configurationOptions.Whitelist = ConfigurationManager.AppSettings["Whitelist"]?
.Split(new[] { ";" }, StringSplitOptions.RemoveEmptyEntries);
configurationOptions.BlackList = ConfigurationManager.AppSettings["BlackList"]?
.Split(new[] { ";" }, StringSplitOptions.RemoveEmptyEntries);
return configurationOptions;
}
}
}
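For reference, the corresponding appSettings section could look like the fragment below; the paths and filter values are placeholders, not the ones from my machine.

```xml
<appSettings>
  <add key="SourcePath" value="C:\example\source"/>
  <add key="DestinationPath" value="C:\example\backup"/>
  <add key="Whitelist" value=".cs;.csproj;README.md"/>
  <add key="BlackList" value="bin;obj;.dll"/>
</appSettings>
```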
Next, we have to update our enumerator (this is why it’s good to have your own enumerator in this case):
namespace FileSystemSyncher.Commons
{
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
internal sealed class FileSystemEnumerator
{
private readonly IEnumerable<string> _whitelist;
private readonly IEnumerable<string> _blackList;
private FileSystemEnumerator(IEnumerable<string> whitelist, IEnumerable<string> blackList)
{
_whitelist = whitelist;
_blackList = blackList;
}
public static FileSystemEnumerator CreateInstance(IEnumerable<string> whitelist, IEnumerable<string> blackList)
{
whitelist = whitelist ?? Enumerable.Empty<string>();
blackList = blackList ?? Enumerable.Empty<string>();
return new FileSystemEnumerator(whitelist, blackList);
}
internal IEnumerable<FileInfo> EnumerateFilesBreadthFirst(DirectoryInfo directoryInfo)
{
foreach (FileInfo file in directoryInfo.EnumerateFiles())
{
if (_whitelist.Any(path => IsFileOrExtensionPresent(path, file)))
{
yield return file;
continue;
}
if (_blackList.Any(path => IsFileOrExtensionPresent(path, file)))
{
continue;
}
yield return file;
}
foreach (DirectoryInfo subDir in directoryInfo.EnumerateDirectories()
.Where(info => !info.Attributes.HasFlag(FileAttributes.System)))
{
if (ShouldEnumerateDirectory(subDir))
{
foreach (FileInfo file in EnumerateFilesBreadthFirst(subDir))
{
yield return file;
}
}
}
}
private bool IsFileOrExtensionPresent(string path, FileInfo file)
{
if (path.Equals(file.Name, StringComparison.InvariantCultureIgnoreCase))
{
return true;
}
if (path.Equals(file.FullName, StringComparison.InvariantCultureIgnoreCase))
{
return true;
}
if (path.Equals(file.Extension, StringComparison.InvariantCultureIgnoreCase))
{
return true;
}
return false;
}
private bool ShouldEnumerateDirectory(DirectoryInfo subDir)
{
bool shouldEnumerate = true;
if (_blackList.Any(path => path.Equals
(subDir.Name, StringComparison.InvariantCultureIgnoreCase) ||
path.Equals(subDir.FullName, StringComparison.InvariantCultureIgnoreCase)))
{
shouldEnumerate = false;
}
if (_whitelist.Any(path => path.StartsWith
(subDir.FullName, StringComparison.InvariantCultureIgnoreCase)))
{
shouldEnumerate = true;
}
return shouldEnumerate;
}
}
}
Things to note in this change:
- This is where the factory method for creating the enumerator pays off. Since the filters are not mandatory, the enumerator doesn’t have to validate their presence; we can check and initialize the lists before passing them to the constructor, and we avoid having to throw an exception in this case.
- Since we use the same method recursively, the file checks are simple: if a file is whitelisted, we return it; if it is blacklisted, we skip it; if the same entry is in both lists, the whitelist takes priority; and if no filter matches, we return the file as it is. The same goes for file extensions.
- For folders, we need to check whether any whitelisted path starts with the current directory’s path. For example, if we blacklist the ‘foo’ folder but whitelist ‘foo/bar.dll’, we still want to enumerate into the folder to reach that file. As you can see in ShouldEnumerateDirectory, we use a local boolean so that if a folder is blacklisted (like the binaries folder) but a file under it is whitelisted, we overwrite the flag and mark the folder as valid to traverse.
- Also note the small performance gain from the LINQ Any() method. Initially, I was going to use the collection Contains method, but that would enumerate the list three times per check (once each for the name, the full path, and the extension). With Any(), a single pass over the list compares each entry against all three file properties.
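The priority rules from the bullets above can be distilled into a tiny pure function (the class and method names are mine, for illustration only), which makes the precedence easy to verify:

```csharp
using System;
using System.Linq;

static class FilterRules
{
    // Mirrors the enumerator's per-file decision: whitelist wins over
    // blacklist, blacklist excludes, and with no match the file passes.
    public static bool ShouldYield(string fileName, string[] whitelist, string[] blackList)
    {
        bool whitelisted = whitelist.Any(p =>
            p.Equals(fileName, StringComparison.InvariantCultureIgnoreCase));
        bool blacklisted = blackList.Any(p =>
            p.Equals(fileName, StringComparison.InvariantCultureIgnoreCase));
        if (whitelisted) return true;   // whitelist takes priority
        if (blacklisted) return false;  // otherwise the blacklist excludes
        return true;                    // no filter matched: keep the file
    }
}
```

For example, a file that appears in both lists is still yielded, while a file that only appears in the blacklist is skipped.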
And those are all the changes we need to implement a filtering system.
One important thing to note: if we blacklist the binary folder but whitelist an item inside it, the folder still gets traversed, so unless we also blacklist the extensions of the other files, we will process the whole folder. In my testing, I wanted to copy a specific DLL and ended up copying the whole folder; I handled that by simply adding all the extensions in that folder to the blacklist. Here is an example of what my whitelist and blacklist configuration looks like:
<add key="Whitelist" value="D:\!work\GitHub\FileSystemSyncher\FileSystemSyncher\
FileSystemSyncher.Console\bin\Debug\FileSystemSyncher.Commons.dll"/>
<add key="Blacklist"
value="bin;_ReSharper.Caches;obj;.git;.vs;.user;.dll;.exe;.config;.pdb"/>
As you can see, even though I blacklisted the bin folder, I still needed to blacklist the .dll, .exe, .config, and .pdb extensions because my concrete whitelist entry bypassed the bin filter. With this, I managed to copy only the source code, solution, and project files, plus the whitelisted DLL, and nothing else.
Conclusion
In this post, we went through a number of refactors to clean up our application and make it easier to extend. Here are a couple of ideas for extending it further:
- Use separate whitelist and blacklist files so that we don’t write everything in the config file, or pass the lists in from a user interface.
- Provide a backup strategy for when files get overwritten.
- Extract the comparison if condition into a concrete class that might compare files only by date, or by hashes, or even byte by byte (which may be a little overkill; it depends on what we want to do).
- Create a strategy that does throttling; maybe we don’t want to sync everything constantly, whether because of network traffic or because it affects machine performance.
- Hook up a FileSystemWatcher that, after the initial sync, monitors which files have been modified and synchronizes only those.
- Use this in a web service and run a backup on demand.
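As a rough sketch of the FileSystemWatcher idea: after the initial sync we could watch the source folder and trigger a re-sync on changes. The helper below is hypothetical (debouncing, error handling, and filtering of watcher events are deliberately left out), and the callback is a placeholder for whatever entry point we expose, such as fileSystemProcessor.Run().

```csharp
using System;
using System.IO;

static class WatcherSketch
{
    // Watches sourcePath (including subdirectories) and invokes onChanged
    // whenever a file is created, modified, or renamed.
    public static FileSystemWatcher Start(string sourcePath, Action onChanged)
    {
        var watcher = new FileSystemWatcher(sourcePath)
        {
            IncludeSubdirectories = true,
            NotifyFilter = NotifyFilters.FileName
                         | NotifyFilters.LastWrite
                         | NotifyFilters.Size
        };
        watcher.Changed += (sender, args) => onChanged();
        watcher.Created += (sender, args) => onChanged();
        watcher.Renamed += (sender, args) => onChanged();
        watcher.EnableRaisingEvents = true;
        return watcher;
    }
}
```

In a real setup we would want to coalesce bursts of change events before calling Run again, since a single save can raise several notifications.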
As always, the latest code can be found here, and the changes for this post specifically can be found here.
CodeProject