
Multi-threaded file download manager

5 Jun 2006
A fully working multi-threaded file downloader application.

Introduction

A few months ago, my lovely wife was downloading her lecture notes from her university website, and I noticed that she had to manually click on every file to save it to the hard disk.

I also noticed that all the hyperlinks were on the same page, and the documents she was downloading were either Word documents or PowerPoint presentations.

Right then, a little light bulb went on in my head, and I decided to build her a file download utility. The requirements were simple:

  • I should be able to point to a web page and filter URLs on it. For example, "*.doc" should give me a collection of URLs that have .doc at the end of the link.
  • From the list of available files, I should be able to select the files I want to download.
  • I should be able to download my selected files simultaneously.
  • I should be able to nominate the number of simultaneous threads for download.
  • I should be able to cancel a download at any point in time.
  • I should be informed of the download status of each selected file.
  • I do not want to re-download files that I have downloaded before.

Using the code

The code is divided into three major sections.

  • FileDownloader.cs - accepts a URL and starts downloading it.
  • WebPageInterrogater.cs - accepts a filter string and a URL, and returns a list of hyperlinks that match the filter.
  • Main.cs - contains the UI code.

Let's have a look at the WebPageInterrogater class.

The regular expression below finds all HREFs in a web page:

const string _findAllHrefsPattern = "(?<HTML><a[^>]" + 
      "*href\\s*=\\s*[\\\"\\']?(?<HRef>[^\"'>\\s]*)" + 
      "[\\\"\\']?[^>]*>(?<Title>[^<]+|.*?)?</a\\s*>)";

The WebPageInterrogater class constructor takes a string of filter expressions (for example, *.doc;*.ppt) and builds a regular expression from it.

/// <summary>
/// Crawls the given url looking for hyperlinks
/// and extracts all hyperlinks that match the filter.
/// For example *.doc will return hyperlinks for word documents.
/// </summary>
/// <param name="url"></param>
/// <param name="sFilters"></param>
public WebPageInterrogater(string url, string sFilters)
{
    _url = url;
    string[] filters = sFilters.Split(';');
    string pattern = string.Empty;
    for (int i = 0; i < filters.Length; i++ )
    {
        pattern = "\\" + filters[i].Replace("*", 
                  string.Empty) + "$" + "|";
        _filters += pattern;
    }
    _filters = _filters.Substring(0, _filters.Length-1);
}
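As a quick illustration (the filter string and file names below are made up, and the snippet assumes using System; using System.Text.RegularExpressions;), the loop above turns "*.doc;*.ppt" into the pattern \.doc$|\.ppt$, which matches any URL ending in .doc or .ppt:

// Sketch: rebuild the same pattern outside the class to show the result.
string[] filters = "*.doc;*.ppt".Split(';');
string combined = string.Empty;
foreach (string f in filters)
{
    combined += "\\" + f.Replace("*", string.Empty) + "$" + "|";
}
combined = combined.Substring(0, combined.Length - 1);    // "\.doc$|\.ppt$"

Console.WriteLine(Regex.IsMatch("notes/week1.doc", combined,
                  RegexOptions.IgnoreCase));              // True
Console.WriteLine(Regex.IsMatch("images/logo.png", combined,
                  RegexOptions.IgnoreCase));              // False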

ListFiles() will search for the requested patterns in the target web page and return a collection of matching URLs.

/// <summary>
/// Returns a collection of documents that are eligible to download.
/// </summary>
/// <returns></returns>
public StringCollection ListFiles()
{
    StringCollection sCol = new StringCollection();
    string webPage = GetWebPage();

    Regex regEx = new Regex(_findAllHrefsPattern, 
                  RegexOptions.Compiled | RegexOptions.IgnoreCase);
    Regex regEx2 = new Regex(_filters, 
                   RegexOptions.Compiled | RegexOptions.IgnoreCase);

    MatchCollection matches = regEx.Matches(webPage);
    foreach (Match match in matches)
    {
        // The named "HRef" group holds the hyperlink target.
        string value = match.Groups["HRef"].Value;

        if (regEx2.IsMatch(value))
        {
            sCol.Add(TopLevelUrl + "/" + value);
        }
    }
    return sCol;
}
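Putting the two together, a typical call might look like the sketch below (the URL and filter are placeholders, and the snippet assumes using System.Collections.Specialized; for StringCollection):

// Sketch: list all Word and PowerPoint links on a (made-up) lecture page.
WebPageInterrogater interrogater = new WebPageInterrogater(
    "http://www.example.edu/lectures.html", "*.doc;*.ppt");

StringCollection files = interrogater.ListFiles();
foreach (string file in files)
{
    Console.WriteLine(file);
}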

Now, we have a collection of URLs that we want to download. All we have to do is create an instance of the FileDownloader class and pass in the requested URL and save location.

public FileDownloader(string documentUrl, string directory)
{
    _DocumentUrl = documentUrl;
    _DirectoryPath = directory;
}

Once the class is initialized, the download can begin. This method can be called asynchronously. It will raise the DownloadStarting and DownloadCompleted events when the file starts and stops downloading, respectively.

/// <summary>
/// Starts the download of the attached url into the given directory.
/// </summary>
public void StartDownload()
{
    if (_DocumentUrl.Equals(string.Empty))
    {
        throw new ArgumentException("Please supply a document url.");
    }
    if (_DirectoryPath.Equals(string.Empty))
    {
        throw new ArgumentException("Please supply a directory.");
    }
    _IsStarted = true;
    /* raise the download starting event. */
    DownloadStarting(this);
    _IsDownloading = true;
    _IsDownloadSuccessful = false;
    Stream stream = null;
    FileStream fstream = null;

    try
    {
        string destFileName = _DirectoryPath + "\\" + FileName;
        destFileName = destFileName.Replace("/", 
                          " ").Replace("%20", " ");

        if (File.Exists(destFileName) == false)
        {
            HttpWebRequest request = 
               (HttpWebRequest)WebRequest.Create(_DocumentUrl);
            HttpWebResponse response = 
               (HttpWebResponse)request.GetResponse();
            stream = response.GetResponseStream();

            byte[] inBuffer = ReadFully(stream, 32768);

            fstream = new FileStream(destFileName, 
                      FileMode.OpenOrCreate, FileAccess.Write);
            // Write the whole buffer; writing Length - 1 bytes would drop the final byte.
            fstream.Write(inBuffer, 0, inBuffer.Length);


            fstream.Close();
            stream.Close();
        }
        _IsDownloadSuccessful = true;
        _IsDownloading = false;
        /* raise a download completed event. */
        DownloadCompleted(this, _IsDownloadSuccessful);
    }
    catch
    {
        _IsDownloadSuccessful = false;
    }
    finally
    {
        if (fstream != null)
        {
            fstream.Close();
        }
        if (stream != null)
        {
            stream.Close();
        }
    }
}
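StartDownload itself runs synchronously, so the caller is responsible for getting it off the UI thread. A minimal sketch, assuming using System.Threading; and placeholder URL and directory values, is to queue the call on the thread pool:

// Sketch: download one file on a thread-pool thread so the UI stays responsive.
FileDownloader downloader = new FileDownloader(
    "http://www.example.edu/notes/week1.doc", @"C:\Downloads");

ThreadPool.QueueUserWorkItem(delegate(object state)
{
    downloader.StartDownload();   // DownloadStarting/DownloadCompleted fire from this thread
});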

Let's see how this all hangs together from the UI perspective. The Get Files button takes in the given URL and retrieves all files that match the Target Filter.

The available files are displayed in the listbox below. The user then selects the files he or she wants to download and clicks the Download button. Each download is spawned on a new thread until the maximum number of threads is in use; any remaining files are queued until a thread finishes and becomes available again.
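The download manager in the attached source keeps its own queue of pending files; purely as an illustration of the throttling idea (this is not the code from the download), a semaphore sized to the chosen thread count achieves the same effect. The downloaders list and maxThreads value are assumed to come from the UI, and the snippet assumes using System.Collections.Generic; using System.Threading;:

// Sketch: cap the number of simultaneous downloads with a semaphore.
void DownloadAll(IList<FileDownloader> downloaders, int maxThreads)
{
    Semaphore slots = new Semaphore(maxThreads, maxThreads);

    foreach (FileDownloader d in downloaders)
    {
        FileDownloader current = d;             // capture for the anonymous delegate
        ThreadPool.QueueUserWorkItem(delegate(object state)
        {
            slots.WaitOne();                    // block until a download slot is free
            try { current.StartDownload(); }
            finally { slots.Release(); }        // hand the slot to the next queued file
        });
    }
}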

The overall download progress is displayed in the progress bar, and each file's download status is displayed in the status column next to it. The entire download operation can be cancelled by clicking on the Cancel All button.

One last point. You will notice that the URL textbox "remembers" your last URL between application restarts.

This feature relies on a persistence library that I wrote in this article. If you need persistence for any other textbox, just put the text "persist" in the Tag property.

Points of Interest

I learnt a lot about multi-threaded UI programming while writing this application. Updating the main UI thread from multiple worker threads is not easy if you have not planned for multi-threaded access. This was especially painful when programming the progress bar, and placing locks in the right places in the code helped a great deal. I did not want unnecessary locks slowing the code down, so I erred on the side of caution and improved the code incrementally until the UI behaved consistently.
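For readers facing the same problem, the usual WinForms pattern is to marshal updates back onto the UI thread with Control.Invoke. The sketch below is only an illustration (the control name and the method itself are hypothetical, not taken from the attached source):

// Sketch: safe progress-bar update callable from any download thread.
private void OnFileProgress(int completedFiles, int totalFiles)
{
    if (progressBar1.InvokeRequired)
    {
        // Called from a download thread - re-invoke on the UI thread.
        progressBar1.Invoke(new MethodInvoker(delegate
        {
            OnFileProgress(completedFiles, totalFiles);
        }));
        return;
    }

    progressBar1.Maximum = totalFiles;
    progressBar1.Value = completedFiles;
}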

Revisions

Thank you, everyone, for your suggestions. I have taken your advice on board and released a new version with the following additions/fixes:

  • Fixed the missing byte problem (tested).
  • Added proxy server authentication (untested). It would be great if readers behind a proxy server could test it and post comments in the discussion section.
  • To do: allow FTP downloads (next version).

History

  • 21/May/2006 - Initial version.
  • 04/June/2006 - Revision 1.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
