Introduction
My goal is to download multiple files over an HTTP connection from a directory that is listed as an HTML page (see the directory index example in the figure below). I could not find an application that can do both of the following:
- Parse the HTML page for all the files of interest, and
- Download the files via HTTP
So I started to write this C# application.
Reference
I found a lot of helpful online tutorials that helped me implement this application. For more information, refer to the articles "Fetching Web Pages with HTTP" by Joe Mayo and "Creating a download manager in C#" by Andrew Pociu.
How It Works
The download demo application downloads files from the target host to a local directory using the HTTP protocol. It is a very simple application with a single Windows form, as shown in the following screen shot.
The three main methods that provide the functionality described in the introduction are:
- GetHTTPContent: gets (downloads) the HTML page content
- ParseFileNamesFromWebPage: parses the HTML page and returns a list of filenames
- DownloadFile: downloads a single file over HTTP
For the download demo application, my goal is to download files over an HTTP connection. The application uses the System.Net.WebRequest and System.Net.HttpWebRequest classes to interact with the HTTP server, that is, to request data and retrieve the server's response. Note that the WebRequest class supports a variety of uniform resource identifier (URI) schemes, including HTTP, HTTPS, and file. The user may be able to adapt the download demo code to the other schemes.
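To illustrate the point about other schemes, here is a minimal sketch in which the same WebRequest code path reads a local file through a file URI. The class name SchemeDemo, the helper ReadAll, and the temporary sample file are my own illustrative assumptions and are not part of the download demo.

```csharp
using System;
using System.IO;
using System.Net;

public static class SchemeDemo
{
    // Reads the full content behind any URI scheme that WebRequest supports.
    // (Hypothetical helper, not taken from the download demo.)
    public static string ReadAll(Uri uri)
    {
        WebRequest request = WebRequest.Create(uri);
        using (WebResponse response = request.GetResponse())
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        // Sample file created only so the sketch has something to read.
        string path = Path.GetTempFileName();
        File.WriteAllText(path, "hello from a file URI");
        try
        {
            // An absolute local path converts to a file-scheme URI.
            Console.WriteLine(ReadAll(new Uri(path)));
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

Passing an http or https URI to the same helper would go through HttpWebRequest instead, which is why the demo code should need few changes for other schemes.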
The GetHTTPContent method uses the WebRequest.Create method to request access to the URI. In this example, the URI is a Web page on my internal server (MyWebServer), as shown below:
HttpWebRequest request =
(HttpWebRequest)WebRequest.Create("http://MyWebServer:8080/MyFolder/");
The WebRequest returned by WebRequest.Create is cast to an HttpWebRequest reference, so the communication takes place over the HTTP protocol. The method then obtains the server's response to the request via WebRequest.GetResponse. The response is read as a data stream into a byte buffer, and each chunk is decoded and appended to a string buffer. Below is the complete code:
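// Requires: using System.IO; using System.Net; using System.Text;
byte[] buffer = new byte[8192];
// webPageString is assumed to be a StringBuilder, as the Append call implies.
StringBuilder webPageString = new StringBuilder();
int count;
HttpWebRequest request =
    (HttpWebRequest)WebRequest.Create("http://MyWebServer:8080/MyFolder/");
// Dispose the response and its stream so the connection is released.
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
using (Stream responseStream = response.GetResponseStream())
{
    do
    {
        // Read returns 0 once the end of the stream is reached.
        count = responseStream.Read(buffer, 0, buffer.Length);
        if (count != 0)
        {
            // ASCII is a single-byte encoding, so decoding chunk by chunk is safe.
            webPageString.Append(Encoding.ASCII.GetString(buffer, 0, count));
        }
    }
    while (count > 0);
}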
The ParseFileNamesFromWebPage method extracts all the filenames from the web page by searching for a known token of the form NAME="filename.extension". The code is shown below:
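// knownToken is assumed to end with the opening quote of the NAME attribute,
// and fileNames is assumed to be a list of strings created by the caller.
int tokenIndex = webPageContent.IndexOf(knownToken);
while (tokenIndex >= 0)
{
    // The file name runs from the end of the token to the closing quote.
    int nameStart = tokenIndex + knownToken.Length;
    int nameEnd = webPageContent.IndexOf('"', nameStart);
    if (nameEnd < 0)
    {
        break; // Unterminated attribute: stop instead of throwing.
    }
    fileNames.Add(webPageContent.Substring(nameStart, nameEnd - nameStart));
    // Continue searching after the closing quote.
    tokenIndex = webPageContent.IndexOf(knownToken, nameEnd + 1);
}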
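As an alternative to the index arithmetic above, the same extraction can be sketched with a regular expression. This is a different technique from the one the demo uses. The class name ListingParser, the pattern, and the sample HTML string are illustrative assumptions, and a real directory index may format its NAME attributes differently.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ListingParser
{
    // Captures the quoted value of every NAME attribute, e.g. NAME="file.txt".
    // The \b keeps the pattern from matching inside a longer word such as FILENAME.
    private static readonly Regex NamePattern =
        new Regex(@"\bNAME=""([^""]*)""");

    public static List<string> ParseFileNames(string webPageContent)
    {
        List<string> fileNames = new List<string>();
        foreach (Match match in NamePattern.Matches(webPageContent))
        {
            // Group 1 is the text between the quotes.
            fileNames.Add(match.Groups[1].Value);
        }
        return fileNames;
    }

    public static void Main()
    {
        // Made-up page fragment, used only to exercise the parser.
        string page = "<A NAME=\"a.txt\">a.txt</A><A NAME=\"b.zip\">b.zip</A>";
        foreach (string name in ParseFileNames(page))
        {
            Console.WriteLine(name);
        }
    }
}
```

The regular expression avoids manual bounds checks, at the cost of being a little harder to read than the IndexOf loop.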
Finally, the DownloadFile method downloads each of the files whose names were returned by the ParseFileNamesFromWebPage method. DownloadFile is similar to GetHTTPContent, but this time the code uses the System.Net.WebClient and System.Net.WebRequest classes together.
The WebClient class is a higher-level class that is easier to use, but the WebRequest class provides the additional information I need to perform the download and update the download progress control. In this download demo application, I use a WebRequest to get the size of the file and a WebClient to download the file as a stream. The user may decide to use the WebResponse.GetResponseStream method instead; either approach works.
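// streamResponse, downBuffer, bytesSize, fileStream, and totalSize are
// assumed to be declared elsewhere in the demo.
WebClient wcDownload = new WebClient();
streamResponse = wcDownload.OpenRead(downloadFileName);
try
{
    // Copy the response to the local file one buffer at a time.
    while ((bytesSize = streamResponse.Read(
        downBuffer, 0, downBuffer.Length)) > 0)
    {
        fileStream.Write(downBuffer, 0, bytesSize);
        totalSize += bytesSize; // Running total, used for the progress update.
    }
}
finally
{
    // Release the stream and the client even if the copy fails part-way.
    streamResponse.Close();
    wcDownload.Dispose();
}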
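The way the two classes cooperate can be sketched as follows: one helper asks the server for the file size, and another copies a stream while reporting a percentage. This is only a sketch under my own assumptions. The names DownloadSketch, GetFileSize, CopyWithProgress, and the reportPercent callback are hypothetical, the buffer size is arbitrary, and in-memory streams stand in for the HTTP response and the local file so the example runs without a server.

```csharp
using System;
using System.IO;
using System.Net;

public static class DownloadSketch
{
    // Asks the server for the file size. ContentLength is -1 when the
    // server does not report a length.
    public static long GetFileSize(string url)
    {
        WebRequest request = WebRequest.Create(url);
        using (WebResponse response = request.GetResponse())
        {
            return response.ContentLength;
        }
    }

    // Copies source to destination and reports percent complete per chunk.
    public static long CopyWithProgress(Stream source, Stream destination,
        long expectedSize, Action<int> reportPercent)
    {
        byte[] downBuffer = new byte[2048];
        long totalSize = 0;
        int bytesSize;
        while ((bytesSize = source.Read(downBuffer, 0, downBuffer.Length)) > 0)
        {
            destination.Write(downBuffer, 0, bytesSize);
            totalSize += bytesSize;
            if (expectedSize > 0)
            {
                // Integer arithmetic; multiply first to keep precision.
                reportPercent((int)(totalSize * 100 / expectedSize));
            }
        }
        return totalSize;
    }

    public static void Main()
    {
        byte[] payload = new byte[5000];
        using (MemoryStream source = new MemoryStream(payload))
        using (MemoryStream destination = new MemoryStream())
        {
            long copied = CopyWithProgress(source, destination, payload.Length,
                percent => Console.WriteLine(percent + "%"));
            Console.WriteLine(copied + " bytes copied");
        }
    }
}
```

In the real application, the source stream would come from WebClient.OpenRead, the expected size from GetFileSize, and the callback would update the progress control through a thread-safe call.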
Note that this download demo application runs the entire download step on a separate thread so that the user interface does not freeze while the download is in progress. Thread-safe calls to Windows Forms controls must be used so that the download thread can safely update the controls with status information. This topic is outside the scope of this article, but details can be found in the download demo application package.
The download demo application lets users include or exclude files from the download. It currently accepts only one exclusion string and one inclusion string, as shown in the screen shots of the application.
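A filter of this kind can be sketched as below. The matching rule is my assumption: a name is kept when it contains the inclusion string (if one is given) and does not contain the exclusion string (if one is given), ignoring case. The demo application may match differently, and the names FileFilter and Apply are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class FileFilter
{
    // Returns the names that pass the single inclusion and exclusion strings.
    // An empty or null string disables that side of the filter.
    public static List<string> Apply(IEnumerable<string> fileNames,
        string inclusion, string exclusion)
    {
        List<string> selected = new List<string>();
        foreach (string name in fileNames)
        {
            bool included = string.IsNullOrEmpty(inclusion)
                || name.IndexOf(inclusion, StringComparison.OrdinalIgnoreCase) >= 0;
            bool excluded = !string.IsNullOrEmpty(exclusion)
                && name.IndexOf(exclusion, StringComparison.OrdinalIgnoreCase) >= 0;
            if (included && !excluded)
            {
                selected.Add(name);
            }
        }
        return selected;
    }

    public static void Main()
    {
        // Made-up file names, used only to exercise the filter.
        string[] names = { "a.txt", "b.log", "notes.txt", "a_backup.txt" };
        foreach (string name in Apply(names, ".txt", "backup"))
        {
            Console.WriteLine(name);
        }
    }
}
```

Supporting several inclusion or exclusion strings would mostly be a matter of replacing each string parameter with a list and looping over it.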
Note
This download demo application can only download the files within a single directory. If you need to download more than one directory, you must modify the source code to download files from sub-directories. Alternatively, you can run multiple instances of this application to get the files from the sub-directories.
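One possible shape for the sub-directory modification is a recursive walk, sketched below. It rests on assumptions that may not hold for your server: that sub-directory entries in the listing end with "/", and that parent links start with ".." or "/". The names DirectoryWalker, CollectFiles, and the listEntries callback are hypothetical. In the real application, the callback would combine GetHTTPContent and ParseFileNamesFromWebPage; here a fake in-memory listing stands in for the server.

```csharp
using System;
using System.Collections.Generic;

public static class DirectoryWalker
{
    // Returns the URI of every file under directoryUri, descending into
    // entries that look like sub-directories (names ending with "/").
    public static List<Uri> CollectFiles(Uri directoryUri,
        Func<Uri, IEnumerable<string>> listEntries)
    {
        List<Uri> files = new List<Uri>();
        foreach (string entry in listEntries(directoryUri))
        {
            // Skip parent and root links so the walk cannot loop upwards.
            if (entry.StartsWith("..", StringComparison.Ordinal)
                || entry.StartsWith("/", StringComparison.Ordinal))
            {
                continue;
            }
            // Resolve the entry name against the directory URI.
            Uri entryUri = new Uri(directoryUri, entry);
            if (entry.EndsWith("/", StringComparison.Ordinal))
            {
                files.AddRange(CollectFiles(entryUri, listEntries));
            }
            else
            {
                files.Add(entryUri);
            }
        }
        return files;
    }

    public static void Main()
    {
        // Fake directory listings keyed by URI path.
        Dictionary<string, string[]> fake = new Dictionary<string, string[]>();
        fake["/MyFolder/"] = new[] { "../", "a.txt", "Sub/" };
        fake["/MyFolder/Sub/"] = new[] { "b.txt" };
        Uri root = new Uri("http://MyWebServer:8080/MyFolder/");
        foreach (Uri file in CollectFiles(root, uri => fake[uri.AbsolutePath]))
        {
            Console.WriteLine(file.AbsolutePath);
        }
    }
}
```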
History
- March 2009 - First release