Introduction
I develop and maintain a Media Automation system for a satellite broadcaster, and I often need to transfer files of about 5GB and larger from one location to another. I used File.Copy or File.Move, making the typical assumption that Microsoft .NET had already written highly efficient code for me. I was wrong! I typically achieved transfer speeds of about 60GB/hour over a Gigabit network connection, depending on other traffic and disk activity, and I thought I was doing reasonably well. Since I wasn't actually waiting for these files while they went through their various stages of processing, I didn't really care.
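In other words, the "before" picture was nothing more elaborate than a single framework call (the paths here are just placeholders):

File.Move (@"\\mediagrid\media\clip.mxf", @"\\playout\ingest\clip.mxf");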
Then I had to do a mass conversion of a 50TB data store of 15,000 files. I thought: 1,000 hours! That would take over 40 days! Well, it was a lot of data, so I had better get started. It was a two-step process: I sent the files to a server for processing, and a program supplied by an outside vendor handled the processing and sent them back to the data store. I was only achieving 25GB/hour on the way out, so I was beginning to get worried. Then I discovered the vendor's side was only achieving 5GB/hour writing back. Well, I raised a stink quickly. After a lot of wrangling, they changed some settings and all of a sudden they were achieving 160GB/hour!
Well, now I was flabbergasted. How had they achieved more than I had? I wiped the dust off 45 years of software development experience and began a deeper look at the system architecture and the way my software was processing the files. A quick search on CodeProject and MSDN turned up nothing. The first thing that came to mind was using a larger buffer size for the file reads and writes to reduce overhead.
Wow! My first test was written in a few minutes and achieved 250GB/hour. Now that was more like it.
I need to explain the production environment so you have a better understanding of the throughput numbers; they may not be reproducible on your system. My network source for the files was a Harmonic MediaGrid, a high-density parallel network storage system designed to serve multiple servers with high-speed access. The destination was a Linux-based media playout server. The system is not yet in production, so there was little contention for resources. I was working on a dual quad-core Xeon Windows Server 2008 machine with 16GB of RAM. The routine averaged about 25% CPU across all eight cores, and the Gigabit network card averaged 555Mb/sec sending and receiving simultaneously, which works out to roughly 250GB/hour of file data passing through the machine.
After I made some enhancements to my program, I retested File.Move and measured a shockingly slow 25GB/hour. A 1MB buffer achieved 256GB/hour, and a 512KB buffer was almost as fast. With any buffer size smaller than that, throughput drops off rapidly.
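If you want to see where that drop-off happens on your own hardware, a small test harness along these lines will do it. CompareBufferSizes and CopyWithBuffer are just illustrative names of mine, not part of the production code; the loop is the same read/write pattern as the FMove routine shown below, run once per buffer size against a test file that should be at least a few GB so the timing is meaningful.

static void CompareBufferSizes (string source, string destination)
{
    // Try buffer sizes from 4KB up to 4MB (powers of two).
    foreach (int power in new int[] { 12, 16, 19, 20, 22 })
    {
        int buffer_size = 1 << power;
        DateTime start = DateTime.Now;
        CopyWithBuffer (source, destination, buffer_size);
        double hours = (DateTime.Now - start).TotalHours;
        double gb_per_hour = new FileInfo (destination).Length / Math.Pow (2, 30) / hours;
        Console.WriteLine ("{0}KB buffer: {1:F0}GB/hour", buffer_size / 1024, gb_per_hour);
        File.Delete (destination);
    }
}

static void CopyWithBuffer (string source, string destination, int buffer_size)
{
    byte[] buffer = new byte[buffer_size];
    using (FileStream fsread = new FileStream
        (source, FileMode.Open, FileAccess.Read, FileShare.None, buffer_size))
    using (FileStream fswrite = new FileStream
        (destination, FileMode.Create, FileAccess.Write, FileShare.None, buffer_size))
    {
        int read;
        while ((read = fsread.Read (buffer, 0, buffer.Length)) > 0)
            fswrite.Write (buffer, 0, read);
    }
}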
I hope you enjoyed my little story and it brought a smile to your face. Here is the code:
public static void MoveTime (string source, string destination)
{
    // Time the move and report throughput in GB/hour.
    DateTime start_time = DateTime.Now;
    FMove (source, destination);
    long size = new FileInfo (destination).Length;
    // Add 1ms so a very fast move cannot cause a divide by zero.
    int milliseconds = 1 + (int) ((DateTime.Now - start_time).TotalMilliseconds);
    // bytes * (ms per hour) / elapsed ms = bytes per hour...
    long tsize = size * 3600000 / milliseconds;
    // ...divided by 2^30 gives GB per hour.
    tsize = tsize / (int) Math.Pow (2, 30);
    Console.WriteLine (tsize + "GB/hour");
}
static void FMove (string source, string destination)
{
    // 512KB buffer (2^19 bytes); see the throughput notes above.
    int array_length = (int) Math.Pow (2, 19);
    byte[] dataArray = new byte[array_length];
    using (FileStream fsread = new FileStream
        (source, FileMode.Open, FileAccess.Read, FileShare.None, array_length))
    {
        using (BinaryReader bwread = new BinaryReader (fsread))
        {
            using (FileStream fswrite = new FileStream
                (destination, FileMode.Create, FileAccess.Write, FileShare.None, array_length))
            {
                using (BinaryWriter bwwrite = new BinaryWriter (fswrite))
                {
                    // Read and write in large chunks until the end of the source file.
                    for (; ; )
                    {
                        int read = bwread.Read (dataArray, 0, array_length);
                        if (0 == read)
                            break;
                        bwwrite.Write (dataArray, 0, read);
                    }
                }
            }
        }
    }
    // Delete the source only after the copy is complete and the streams are closed.
    File.Delete (source);
}
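If you are on .NET 4.0 or later, the inner loop can be replaced with Stream.CopyTo, which accepts an explicit buffer size, and you can pass FileOptions.SequentialScan to hint to Windows that the file will be read once from front to back. I have not benchmarked this variant against the routine above, so treat it as a sketch rather than a measured result:

static void FMoveCopyTo (string source, string destination)
{
    int buffer_size = (int) Math.Pow (2, 19);   // same 512KB buffer as FMove
    using (FileStream fsread = new FileStream
        (source, FileMode.Open, FileAccess.Read, FileShare.None,
         buffer_size, FileOptions.SequentialScan))
    using (FileStream fswrite = new FileStream
        (destination, FileMode.Create, FileAccess.Write, FileShare.None, buffer_size))
    {
        // CopyTo performs the same chunked read/write loop internally.
        fsread.CopyTo (fswrite, buffer_size);
    }
    File.Delete (source);
}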