If you have been following my trials with working with C# and files, you know it hasn’t been a bed of roses. I ran into a roadblock when I was trying to build a high performance file copy tool in the CLR. I eventually found a solution. It works but it isn’t very clean. So, I did a little more work, removed a broken bit and simplified another bit to make it a little more stable. I’m not done yet. I know there are a couple of places I can clean the code up a bit more and a way to speed things up a bit more at the cost of using more memory, but that is for another blog post.
Logging, to the Rescue
The first major change was adding a logging facility to the UBCopy program via log4net. I had stayed away from adding it fearing too much overhead, but after some tweaking, it seems pretty darn fast. This allowed me to do some tight timings and log debug information to a log file to track long running copies without much fuss. It also allows me to turn off all console messaging since this is used in an automated fashion. If something fails, I’ve got the log file to look at.
Please Don’t Lock My File
There are a few places where the file target file being written to could potentially be locked. Right now, I have to create the file and set the file length, then close it before opening it for unbuffered writes. I haven’t done anything about this grab yet. The second one is when I am writing the last block to the file. Since you have to write in page aligned multiples I was closing the file, reopening it in a buffered mode and flushing the last partial buffer to disk. I haven’t fixed these up yet but I am working on a solution. I haven’t encountered a problem yet where the file gets locked during this process but it is still a potential threat that must be dealt with. Even though the file stream is seekable and writeable you can’t issue a setlength command on it. I plan on looking deeper into the filestream code and see what they are doing when you issue a setlength
. If I can eliminate the open, close and open routine, it will speed things up a bit and keep anyone from grabbing my file handle.
Here’s an example of what I’m talking about, after the set file size is done, there is a very small window of opportunity here.
try
{
_outfile = new FileStream(_outputfile, FileMode.Create, FileAccess.Write, FileShare.None, 8,
FileOptions.WriteThrough);
_outfile.SetLength(_infilesize > CopyBufferSize ? _infilesize : CopyBufferSize);
_outfile.Close();
_outfile.Dispose();
}
catch (Exception e)
{
throw;
}
try
{
_outfile = new FileStream(_outputfile, FileMode.Open, FileAccess.Write, FileShare.None, 8,
FileOptions.WriteThrough | FileFlagNoBuffering);
}
catch (Exception e)
{
throw;
}
Unbuffered, Mostly
Since unbuffered writes prevent you from writing data that isn’t memory page aligned (usually 4096), you get an error when writing to the file. To get around this, all writes are the same size period, this really only effects the last write which may have a partially full buffer. It will write down the full buffer, including invalid data. Luckily, an easy fix is to reset the end of file pointer back to the correct location and everything is happy again. This allows me to eliminate an additional bit of code. Here is the old code.
outfile = new FileStream(outputfile, FileMode.Open, FileAccess.Write, FileShare.None, 8,
FileOptions.WriteThrough);
outfile.Seek(infilesize - bytesRead1, 0);
outfile.Write(Buffer1, 0, bytesRead1);
And now the new, faster and better way!
Log.Debug("Close Write File Unbuffered");
_outfile.Close();
_outfile.Dispose();
try
{
if (IsDebugEnabled)
{
Log.Debug("Open File Set Length");
}
_outfile = new FileStream(_outputfile, FileMode.Open, FileAccess.Write, FileShare.None, 8,
FileOptions.WriteThrough);
_outfile.SetLength(_infilesize);
_outfile.Close();
_outfile.Dispose();
}
catch (Exception e)
{
if (IsDebugEnabled)
{
Log.Debug("Failed to open for write set length");
Log.Debug(e);
}
throw;
}
Buffered or Unbuffered
If the file was smaller than the size of the buffer, I was just copying the file buffered. This was a check and route to a different code path for copying files. Not a huge deal but just another place for problems to creep up over time. So, I removed it when I changed the way unbuffered multi-threaded writes were happening I was able to get rid of the buffered copy routine all together. Now, it will initially set the file size to a single buffer. Write the data and reset the file pointer to the original size.
You Want a MOVE but Get a COPY
There was also a bug that on small files UBCopy
wouldn’t do a move just a copy in some cases. There was an error in the code that prevented the delete of the source file after the copy to the new location due to a lock that was being held by the program. Not the end of the world, just wasn’t working like it should 100% of the time.
So, here is the new AsyncUnbuffCopy
routine. Calling it is super easy. If you want to use the code as-is, you will need to download log4net and include it in your project.
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;
using System.Threading;
using log4net;
namespace UBCopy
{
internal class AsyncUnbuffCopy
{
private static readonly ILog Log = LogManager.GetLogger(typeof(AsyncUnbuffCopy));
private static readonly bool IsDebugEnabled = Log.IsDebugEnabled;
private static string _inputfile;
private static string _outputfile;
private static string _infilechecksum;
private static string _outfilechecksum;
private static bool _reportprogress;
private static int _origRow;
private static int _origCol;
private static int _numchunks;
private static bool _readfailed;
private static readonly object Locker1 = new object();
public static int CopyBufferSize;
private static long _infilesize;
public static byte[] Buffer1;
private static int _bytesRead1;
public static byte[] Buffer2;
private static bool _buffer2Dirty;
private static int _bytesRead2;
public static byte[] Buffer3;
private static long _totalbytesread;
private static long _totalbyteswritten;
private static FileStream _infile;
private static FileStream _outfile;
const FileOptions FileFlagNoBuffering = (FileOptions)0x20000000;
private static void AsyncReadFile()
{
try
{
_infile = new FileStream(_inputfile, FileMode.Open,
FileAccess.Read, FileShare.None, CopyBufferSize, FileFlagNoBuffering);
}
catch (Exception e)
{
if (IsDebugEnabled)
{
Log.Debug("Failed to open for read");
Log.Debug(e);
}
throw;
}
while (_totalbytesread < _infilesize)
{
if (IsDebugEnabled)
{
Log.Debug("Read _buffer2Dirty : " + _buffer2Dirty);
}
_bytesRead1 = _infile.Read(Buffer1, 0, CopyBufferSize);
Monitor.Enter(Locker1);
try
{
while (_buffer2Dirty) Monitor.Wait(Locker1);
Buffer.BlockCopy(Buffer1, 0, Buffer2, 0, _bytesRead1);
_buffer2Dirty = true;
_bytesRead2 = _bytesRead1;
_totalbytesread = _totalbytesread + _bytesRead1;
Monitor.PulseAll(Locker1);
if (IsDebugEnabled)
{
Log.Debug("Read : " + _totalbytesread);
}
}
catch (Exception e)
{
Log.Fatal("Read Failed.");
Log.Fatal(e);
_readfailed = true;
throw;
}
finally { Monitor.Exit(Locker1); }
}
_infile.Close();
_infile.Dispose();
}
private static void AsyncWriteFile()
{
try
{
if (IsDebugEnabled)
{
Log.Debug("Open File Set Length");
}
_outfile = new FileStream(_outputfile, FileMode.Create,
FileAccess.Write, FileShare.None, 8,
FileOptions.WriteThrough);
_outfile.SetLength(_infilesize > CopyBufferSize ? _infilesize : CopyBufferSize);
_outfile.Close();
_outfile.Dispose();
}
catch (Exception e)
{
Log.Fatal("Failed to open for write set length");
Log.Fatal(e);
throw;
}
try
{
if (IsDebugEnabled)
{
Log.Debug("Open File Write Unbuffered");
}
_outfile = new FileStream(_outputfile, FileMode.Open,
FileAccess.Write, FileShare.None, 8,
FileOptions.WriteThrough | FileFlagNoBuffering);
}
catch (Exception e)
{
Log.Fatal("Failed to open for write unbuffered");
Log.Fatal(e);
throw;
}
var pctinc = 0.0;
var progress = pctinc;
if (_reportprogress)
{
if (IsDebugEnabled)
{
Log.Debug("Report Progress : True");
}
pctinc = 100.00 / _numchunks;
}
if (IsDebugEnabled)
{
Log.Debug("While Write _totalbyteswritten : " + _totalbyteswritten);
Log.Debug("While Write _infilesize - CopyBufferSize: " +
(_infilesize - CopyBufferSize));
}
while ((_totalbyteswritten < _infilesize) && !_readfailed)
{
if (IsDebugEnabled)
{
Log.Debug("Write Unbuffered _buffer2Dirty : " + _buffer2Dirty);
}
lock (Locker1)
{
if (IsDebugEnabled)
{
Log.Debug("Write Unbuffered Lock");
}
while (!_buffer2Dirty) Monitor.Wait(Locker1);
if (IsDebugEnabled)
{
Log.Debug("Write Unbuffered _buffer2Dirty : " + _buffer2Dirty);
}
Buffer.BlockCopy(Buffer2, 0, Buffer3, 0, _bytesRead2);
_buffer2Dirty = false;
if (IsDebugEnabled)
{
Log.Debug("Write Unbuffered _buffer2Dirty : " + _buffer2Dirty);
}
_totalbyteswritten = _totalbyteswritten + CopyBufferSize;
if (IsDebugEnabled)
{
Log.Debug("Written Unbuffered : " + _totalbyteswritten);
}
Monitor.PulseAll(Locker1);
if (_reportprogress && !IsDebugEnabled)
{
Console.SetCursorPosition(_origCol, _origRow);
if (progress < 101 - pctinc)
{
progress = progress + pctinc;
Console.Write("%{0}", Math.Round(progress, 0));
}
}
}
try
{
_outfile.Write(Buffer3, 0, CopyBufferSize);
}
catch (Exception e)
{
Log.Fatal("Write Unbuffered Failed");
Log.Fatal(e);
throw;
}
}
Log.Debug("Close Write File Unbuffered");
_outfile.Close();
_outfile.Dispose();
try
{
if (IsDebugEnabled)
{
Log.Debug("Open File Set Length");
}
_outfile = new FileStream(_outputfile, FileMode.Open,
FileAccess.Write, FileShare.None, 8,
FileOptions.WriteThrough);
_outfile.SetLength(_infilesize);
_outfile.Close();
_outfile.Dispose();
}
catch (Exception e)
{
if (IsDebugEnabled)
{
Log.Debug("Failed to open for write set length");
Log.Debug(e);
}
throw;
}
}
public static int AsyncCopyFileUnbuffered(string inputfile,
string outputfile, bool overwrite, bool movefile, bool checksum,
int buffersize, bool reportprogress)
{
if (IsDebugEnabled)
{
Log.Debug("inputfile : " + inputfile);
Log.Debug("outputfile : " + outputfile);
Log.Debug("overwrite : " + overwrite);
Log.Debug("movefile : " + movefile);
Log.Debug("checksum : " + checksum);
Log.Debug("buffersize : " + buffersize);
Log.Debug("reportprogress : " + reportprogress);
}
_reportprogress = reportprogress;
_inputfile = inputfile;
_outputfile = outputfile;
CopyBufferSize = buffersize * 1024 * 1024;
Buffer1 = new byte[CopyBufferSize];
Buffer2 = new byte[CopyBufferSize];
Buffer3 = new byte[CopyBufferSize];
_totalbytesread = 0;
_totalbyteswritten = 0;
_bytesRead1 = 0;
_buffer2Dirty = false;
if (File.Exists(outputfile) && !overwrite)
{
if (IsDebugEnabled)
{
Log.Debug("Destination File Exists!");
}
Console.WriteLine("Destination File Exists!");
return 0;
}
if (!Directory.Exists(outputfile))
{
try
{
Directory.CreateDirectory(Path.GetDirectoryName(outputfile));
}
catch (Exception e)
{
Log.Fatal("Create Directory Failed.");
Log.Fatal(e);
Console.WriteLine("Create Directory Failed.");
Console.WriteLine(e.Message);
throw;
}
}
var inputFileInfo = new FileInfo(_inputfile);
_infilesize = inputFileInfo.Length;
_numchunks = (int)((_infilesize / CopyBufferSize) <= 0 ?
(_infilesize / CopyBufferSize) : 1);
if (IsDebugEnabled)
{
Log.Debug("File Copy Started");
}
Console.WriteLine("File Copy Started");
var readfile = new Thread(AsyncReadFile)
{ Name = "ReadThread", IsBackground = true };
readfile.Start();
if (IsDebugEnabled)
{
Log.Debug("Number of Chunks: " + _numchunks);
}
var writefile = new Thread(AsyncWriteFile)
{ Name = "WriteThread", IsBackground = true };
writefile.Start();
if (_reportprogress)
{
_origRow = Console.CursorTop;
_origCol = Console.CursorLeft;
}
readfile.Join();
writefile.Join();
if (_reportprogress)
Console.WriteLine();
if (IsDebugEnabled)
{
Log.Debug("File Copy Done");
}
Console.WriteLine("File Copy Done");
if (checksum)
{
if (IsDebugEnabled)
{
Log.Debug("Checksum Source File Started");
}
Console.WriteLine("Checksum Source File Started");
var checksumreadfile = new Thread(GetMD5HashFromInputFile) {
Name = "checksumreadfile", IsBackground = true };
checksumreadfile.Start();
if (IsDebugEnabled)
{
Log.Debug("Checksum Destination File Started");
}
Console.WriteLine("Checksum Destination File Started");
var checksumwritefile = new Thread(GetMD5HashFromOutputFile) {
Name = "checksumwritefile", IsBackground = true };
checksumwritefile.Start();
checksumreadfile.Join();
checksumwritefile.Join();
if (_infilechecksum.Equals(_outfilechecksum))
{
if (IsDebugEnabled)
{
Log.Debug("Checksum Verified");
}
Console.WriteLine("Checksum Verified");
}
else
{
if (IsDebugEnabled)
{
Log.Debug("Checksum Failed");
Log.DebugFormat("Input File Checksum : {0}", _infilechecksum);
Log.DebugFormat("Output File Checksum: {0}", _outfilechecksum);
}
Console.WriteLine("Checksum Failed");
Console.WriteLine("Input File Checksum : {0}", _infilechecksum);
Console.WriteLine("Output File Checksum: {0}", _outfilechecksum);
}
}
if (movefile && File.Exists(inputfile) && File.Exists(outputfile))
try
{
File.Delete(inputfile);
}
catch (IOException ioex)
{
if (IsDebugEnabled)
{
Log.Error("File in use or locked cannot move file.");
Log.Error(ioex);
}
Console.WriteLine("File in use or locked");
Console.WriteLine(ioex.Message);
}
catch (Exception ex)
{
if (IsDebugEnabled)
{
Log.Error("File Failed to Delete");
Log.Error(ex);
}
Console.WriteLine("File Failed to Delete");
Console.WriteLine(ex.Message);
}
return 1;
}
public static void GetMD5HashFromInputFile()
{
var fs = new FileStream(_inputfile, FileMode.Open,
FileAccess.Read, FileShare.None, CopyBufferSize);
MD5 md5 = new MD5CryptoServiceProvider();
byte[] retVal = md5.ComputeHash(fs);
fs.Close();
var sb = new StringBuilder();
for (var i = 0; i < retVal.Length; i++)
{
sb.Append(retVal[i].ToString("x2"));
}
_infilechecksum = sb.ToString();
}
public static void GetMD5HashFromOutputFile()
{
var fs = new FileStream(_outputfile, FileMode.Open,
FileAccess.Read, FileShare.None, CopyBufferSize);
MD5 md5 = new MD5CryptoServiceProvider();
byte[] retVal = md5.ComputeHash(fs);
fs.Close();
var sb = new StringBuilder();
for (var i = 0; i < retVal.Length; i++)
{
sb.Append(retVal[i].ToString("x2"));
}
_outfilechecksum = sb.ToString();
}
}
}