Downloading Multiple Files over HTTP Connection

25 Mar 2009 | CPOL | 3 min read
An application that can download the files, as listed on an HTML page, over an HTTP connection.

Introduction

My goal is to download multiple files from a directory that is listed as an HTML page (see the directory index example in the figure below) over an HTTP connection. I could not find an application that does both of the following:

  1. Parse the HTML page for all the files of interest, and 
  2. Download the files via HTTP

So I started to write this C# application.

FileIndexHtml.JPG

Reference

I found many helpful tutorials online while implementing this application. For more information, refer to the articles "Fetching Web Pages with HTTP" by Joe Mayo and "Creating a download manager in C#" by Andrew Pociu.

How It Works

The download demo application downloads files from the target host to a local directory using the HTTP protocol. It is a very simple application with a single Windows form, as shown in the following screenshot.

DownloadMgmr001.JPG

The three main methods that provide the functionality described in the introduction are:

  • GetHTTPContent – Method to get (download) the HTML page content
  • ParseFileNamesFromWebPage – Method that parses the HTML page and returns a list of filenames
  • DownloadFile – Method that downloads a single file over HTTP

For the download demo application, my goal is to download files over an HTTP connection. The application uses the System.Net.WebRequest and System.Net.HttpWebRequest classes to interact with the HTTP server, that is, to send requests for data and to retrieve the server's responses. Note that the WebRequest class supports several uniform resource identifier (URI) schemes, including HTTP, HTTPS, and file. Users can modify the download demo code to work with the other schemes.
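To illustrate the scheme support mentioned above, here is a minimal sketch of how WebRequest.Create selects a concrete request class from the URI scheme. The class name SchemeDemo and the example file URI are made up for illustration and are not part of the demo application. Creating the request object does not send anything over the network.

```csharp
using System;
using System.Net;

public static class SchemeDemo
{
    // Reports which kind of request object WebRequest.Create returns for a URI.
    // Only the request object is created; no connection is opened here.
    public static string Describe(string uri)
    {
        WebRequest request = WebRequest.Create(uri);

        if (request is HttpWebRequest)
        {
            return "HTTP";   // used for both http: and https: URIs
        }
        if (request is FileWebRequest)
        {
            return "file";   // used for file: URIs
        }
        return "other";
    }
}
```

For example, SchemeDemo.Describe("http://MyWebServer:8080/MyFolder/") reports an HTTP request, while a file: URI such as "file:///C:/Temp/example.txt" reports a file request.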

The GetHTTPContent method uses the WebRequest.Create method to create a request for the URI. In this example, the URI is a Web page on my internal server (MyWebServer):

C#
HttpWebRequest request = 
	(HttpWebRequest)WebRequest.Create("http://MyWebServer:8080/MyFolder/");

The WebRequest object returned by Create is cast to an HttpWebRequest reference, so the communication takes place over the HTTP protocol. The method then retrieves the server's response by calling GetResponse on the request. The response is read as a data stream into a byte buffer, converted to text, and appended to the webPageString object. Below is the complete code:

C#
// Requires: using System.IO; using System.Net; using System.Text;

// Buffer used on each read operation
byte[] buffer = new byte[8192];
// Accumulates the text of the page
StringBuilder webPageString = new StringBuilder();
int count = 0;

// Create the WebRequest instance
HttpWebRequest request = 
	(HttpWebRequest)WebRequest.Create("http://MyWebServer:8080/MyFolder/");

// Query for the response; the using blocks release the connection when done
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
using (Stream responseStream = response.GetResponseStream())
{
    do
    {
        // Read the next chunk of the response stream
        count = responseStream.Read(buffer, 0, buffer.Length);

        if (count != 0)
        {
            // Convert from bytes to ASCII text
            webPageString.Append(Encoding.ASCII.GetString(buffer, 0, count));
        }
    }
    while (count > 0);
}
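The manual buffer loop above can also be written with a StreamReader, which performs the byte-to-text conversion itself. This is only a sketch of that alternative, not the code used in the demo application. The names PageReader and ReadAllText are hypothetical, and the caller is assumed to know which encoding the server uses.

```csharp
using System.IO;
using System.Text;

public static class PageReader
{
    // Reads an entire response stream into a string using the given encoding.
    // The reader (and the underlying stream) is closed when the method returns.
    public static string ReadAllText(Stream responseStream, Encoding encoding)
    {
        using (StreamReader reader = new StreamReader(responseStream, encoding))
        {
            return reader.ReadToEnd();
        }
    }
}
```

With the response object from the listing above, a call might look like PageReader.ReadAllText(response.GetResponseStream(), Encoding.ASCII).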

The ParseFileNamesFromWebPage method extracts all the filenames from the web page by searching for a known token of the form NAME="filename.extension". The code below demonstrates this:

C#
// Get the index of the first token (-1 if the page contains none).
int tokenIndex = webPageContent.IndexOf(knownToken);

// Parse the page to get all the file names.
while (tokenIndex >= 0)
{
    // The file name runs from the end of the token to the closing quote.
    int nameStart = tokenIndex + knownToken.Length;
    int nameEnd = webPageContent.IndexOf('"', nameStart);
    if (nameEnd < 0)
    {
        break; // No closing quote: stop rather than read past the end.
    }
    fileNames.Add(webPageContent.Substring(nameStart, nameEnd - nameStart));

    // Find the next token after this file name.
    tokenIndex = webPageContent.IndexOf(knownToken, nameEnd + 1);
}
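The same extraction can be sketched with a regular expression instead of manual index arithmetic. This is an alternative technique, not the author's implementation; the class name FileNameRegexParser is hypothetical, and the pattern assumes the token is written exactly as NAME="..." in upper case with double quotes.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FileNameRegexParser
{
    // Returns every value found inside NAME="..." in the page, in document order.
    public static List<string> Parse(string webPageContent)
    {
        List<string> fileNames = new List<string>();

        // Group 1 captures everything between NAME=" and the next double quote.
        foreach (Match match in Regex.Matches(webPageContent, "NAME=\"([^\"]*)\""))
        {
            fileNames.Add(match.Groups[1].Value);
        }
        return fileNames;
    }
}
```

Given a page containing NAME="report.pdf" and NAME="data.zip", the method returns those two names in that order.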

Finally, the application downloads all the files via the DownloadFile method, using the file names retrieved by the ParseFileNamesFromWebPage method. The DownloadFile method is similar to the GetHTTPContent method, but this time the code uses the System.Net.WebClient and System.Net.WebRequest classes together.

The WebClient class is a higher-level wrapper and is easier to use, but the WebRequest class provides the additional information I need to perform the download and update the download progress control. In this download demo application, I use a WebRequest to get the size of the file and a WebClient to download the file through a stream. Users may choose to read the file through the WebResponse.GetResponseStream method instead; either way works.

C#
// Requires: using System.Net;
WebClient wcDownload = new WebClient();

try
{
    // Open the URL for download
    streamResponse = wcDownload.OpenRead(downloadFileName);

    // Loop until the stream is exhausted (Read returns 0)
    while ((bytesSize = streamResponse.Read(
        downBuffer, 0, downBuffer.Length)) > 0)
    {
        // Write the data from the buffer to the local hard drive
        fileStream.Write(downBuffer, 0, bytesSize);
        totalSize += bytesSize;
    }
}
finally
{
    // Close the stream even if the download fails part-way
    if (streamResponse != null)
    {
        streamResponse.Close();
    }
}
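The following sketch shows one way the copy loop above could report progress once the file size is known. It is an illustration under assumptions, not the demo application's code: the names ProgressCopier, Copy, and onProgress are hypothetical, the 2048-byte buffer size is arbitrary, and totalBytes is assumed to come from the ContentLength property of the response obtained through the WebRequest.

```csharp
using System;
using System.IO;

public static class ProgressCopier
{
    // Copies source to destination chunk by chunk and returns the bytes copied.
    // After each chunk, onProgress receives the percentage completed (0-100).
    // If totalBytes is zero or negative (size unknown), no progress is reported.
    public static long Copy(Stream source, Stream destination,
                            long totalBytes, Action<int> onProgress)
    {
        byte[] downBuffer = new byte[2048];
        long totalSize = 0;
        int bytesSize;

        while ((bytesSize = source.Read(downBuffer, 0, downBuffer.Length)) > 0)
        {
            destination.Write(downBuffer, 0, bytesSize);
            totalSize += bytesSize;

            if (totalBytes > 0 && onProgress != null)
            {
                onProgress((int)(totalSize * 100 / totalBytes));
            }
        }
        return totalSize;
    }
}
```

In a Windows Forms application, the onProgress callback would be the place to marshal the update back to the user interface thread.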

Note that the download demo application runs the entire download step on a separate thread so that the user interface does not freeze while a download is in progress. Thread-safe calls to Windows Forms controls must be used so that the download thread can safely update the status controls. This topic is outside the scope of this article, but details can be found in the download demo application package.

The download demo application lets users include or exclude files from the download. It currently accepts only one inclusion string and one exclusion string, as shown in the following screenshots:

DownloadMgmr002.JPG

DownloadMgmr003.JPG
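A filter with one inclusion string and one exclusion string could be sketched as follows. The matching rule here (case-insensitive substring match, with an empty string meaning "no filter") is an assumption made for illustration; the demo application's actual rule may differ. The names FileNameFilter and Apply are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class FileNameFilter
{
    // Keeps names that contain the inclusion string (if one is given)
    // and do not contain the exclusion string (if one is given).
    public static List<string> Apply(IEnumerable<string> fileNames,
                                     string include, string exclude)
    {
        List<string> selected = new List<string>();

        foreach (string name in fileNames)
        {
            bool included = string.IsNullOrEmpty(include)
                || name.IndexOf(include, StringComparison.OrdinalIgnoreCase) >= 0;
            bool excluded = !string.IsNullOrEmpty(exclude)
                && name.IndexOf(exclude, StringComparison.OrdinalIgnoreCase) >= 0;

            if (included && !excluded)
            {
                selected.Add(name);
            }
        }
        return selected;
    }
}
```

For example, with the names a.zip, b.txt, and a_old.zip, an inclusion string of ".zip" and an exclusion string of "old" selects only a.zip.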

Note

This download demo application can only download the files within a single directory. If you need to download more than one directory, you must modify the source code to download files from sub-directories. Alternatively, you can run multiple instances of the application, one for each sub-directory.

History

  • March 2009 - First release

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)