Introduction
In this article we'll see how to download files from the web. This takes little
effort with the WebRequest and WebResponse classes, which expose the data coming
from the web as a stream, so we can use any of the reader/writer classes
available for handling streams. There are two mechanisms that we can use for
downloading files. For small files we can use the synchronous mechanism; for
large files, or files downloaded from servers whose response times cannot be
predicted, we can use the asynchronous mechanism. I'll demonstrate both methods
in this article.
Synchronous download
void DownloadFile(String* url, String* fpath)
{
    // Create returns the WebRequest subclass that matches the URL scheme
    WebRequest* wrq = WebRequest::Create(url);
    HttpWebResponse* hwr = static_cast<HttpWebResponse*>(wrq->GetResponse());
    Stream* strm = hwr->GetResponseStream();
    FileStream* fs = new FileStream(fpath,FileMode::Create,FileAccess::Write);
    BinaryWriter* br = new BinaryWriter(fs);
    int b;
    // ReadByte returns -1 once the end of the stream is reached
    while((b=strm->ReadByte()) != -1)
    {
        br->Write(Convert::ToByte(b));
    }
    br->Close(); // also closes the underlying FileStream
    strm->Close();
}
I've used five classes there in quick succession. I guess that's just what the
BCL is all about, a lavish abundance of classes. WebRequest is an abstract class
that allows a user to request internet data in a protocol-independent manner. We
use the static method Create to set up the request for our file. The WebRequest
class has a method called GetResponse which returns a WebResponse object. Since
in our particular case we have requested a file over HTTP, we cast our
WebResponse object to an HttpWebResponse object. One big advantage of using
these classes is that they all give us stream access. In our case the
HttpWebResponse class has a GetResponseStream method that returns a Stream
object that encapsulates the requested file from the web. The rest of it is
simple if you have used streams before. If not, you can read my article on files
and streams here on CP. We simply read from the stream returned by the
HttpWebResponse object and write the data to a file.
Asynchronous download
This is a little more complicated than a synchronous download, but as you might
expect, it is the more efficient method when you are downloading several large
files. I vaguely remember someone from MS saying that asynchronous methods use
high-performance techniques like I/O completion ports internally.
We create our WebRequest object just as we did above, but instead of calling
GetResponse we call BeginGetResponse, which begins an asynchronous request for
an Internet resource. We specify a response callback function as one of the
arguments, and we pass our WebRequest object as the state object for that
callback. We then wait on a ManualResetEvent object, which is set from the
callbacks once there is no more data to read, so that our function blocks until
the download has finished.
void DownloadFileAsync(String* url, String* fpath)
{
    WebRequest* wrq = WebRequest::Create(url);
    finished = new ManualResetEvent(false);  // set when reading is complete
    m_writeEvent = new AutoResetEvent(true); // serializes the write callbacks
    buffer = new unsigned char __gc[512];
    OutFile = new FileStream(fpath,
        FileMode::Create,FileAccess::Write);
    // The request itself is passed along as the callback's state object
    wrq->BeginGetResponse(
        new AsyncCallback(this,WebStuffDemo::ResponseCallback),
        wrq);
    finished->WaitOne();
    OutFile->Close();
}
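The "start the work, then block until a callback signals completion" pattern used by DownloadFileAsync can be sketched in standard C++ as well. This is not the .NET ManualResetEvent, only a minimal stand-in built on a condition variable; ManualEvent and run_and_wait are hypothetical names, and the value 42 simply stands in for "the download has been stored".

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Minimal stand-in for a manual-reset event: once set, it stays signalled
// and every current or future waiter is released.
class ManualEvent
{
public:
    void set()
    {
        {
            std::lock_guard<std::mutex> lock(m_);
            signalled_ = true;
        }
        cv_.notify_all();
    }

    void wait()
    {
        std::unique_lock<std::mutex> lock(m_);
        // The predicate guards against spurious wake-ups.
        cv_.wait(lock, [this] { return signalled_; });
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    bool signalled_ = false;
};

// Starts background work, then blocks until the worker signals completion.
int run_and_wait()
{
    ManualEvent finished;
    int result = 0;
    std::thread worker([&] {
        result = 42;     // stands in for "read and store the response"
        finished.set();  // release the waiting caller
    });
    finished.wait();     // plays the role of finished->WaitOne()
    worker.join();
    return result;
}
```

The caller only continues after set has been called, which is the same guarantee the article relies on before closing the output file.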
Response callback
void ResponseCallback(IAsyncResult* ar)
{
    WebRequest* wrq = static_cast<WebRequest*>(ar->AsyncState);
    WebResponse* wrp = wrq->EndGetResponse(ar);
    Stream* strm = wrp->GetResponseStream();
    strm->BeginRead(buffer,0,512,
        new AsyncCallback(this,WebStuffDemo::ReadCallBack),strm);
}
The EndGetResponse method concludes the asynchronous request that was initiated
using the BeginGetResponse method and returns a WebResponse object, from which
we can use GetResponseStream to get the underlying stream object. Now we begin
our next asynchronous operation on the stream: an asynchronous read, started
with BeginRead. If you are wondering why we do this, here is a snippet from
MSDN.
"Using synchronous calls in asynchronous callback methods may result in
severe performance penalties. Internet requests made with WebRequest and its
descendents must use Stream.BeginRead to read the stream returned by the
WebResponse.GetResponseStream method"
Read callback
void ReadCallBack(IAsyncResult* ar)
{
    Stream* strm = static_cast<Stream*>(ar->AsyncState);
    int count = strm->EndRead(ar);
    if(count > 0)
    {
        // Diagnostic only: decode the bytes just read and print the length
        __wchar_t Temp __gc[] = new __wchar_t __gc[512];
        Decoder* d = Encoding::UTF8->GetDecoder();
        int chars = d->GetChars(buffer,0,count,Temp,0);
        String* s = new String(Temp,0,chars);
        Console::WriteLine(s->Length);
        // Copy the data so that the next read can safely reuse buffer
        unsigned char wbuff __gc[] = new unsigned char __gc[512];
        buffer->CopyTo(wbuff,0);
        OutFile->BeginWrite(wbuff,0,count,
            new AsyncCallback(this,WebStuffDemo::WriteCallBack),OutFile);
        strm->BeginRead(buffer,0,512,
            new AsyncCallback(this,WebStuffDemo::ReadCallBack),strm);
    }
    else
    {
        // Nothing more to read: release the waiting DownloadFileAsync
        strm->Close();
        finished->Set();
    }
}
We call EndRead on the stream and get back the count of bytes that were read
from the stream. EndRead is a blocking call and is to be called once for every
BeginRead call we have initiated. If the count of bytes read is greater than
zero, there may be more data left. Otherwise we know that all the data has
arrived, and we close the stream and set the event on which our main function is
waiting. Just as we had to use asynchronous methods to read the data, we must
use asynchronous methods for writing the data to our file; otherwise we'll have
blocking calls inside the asynchronous callback functions, which is highly
inefficient.
So what we do is call the BeginWrite method on our output stream object. We pass
our write-callback function as the callback, and pass the output stream object
as the callback function's state object. Once we do this we call BeginRead on
our input stream object to start another asynchronous read, as there may still
be more data left to be retrieved.
Write callback
void WriteCallBack(IAsyncResult* ar)
{
    m_writeEvent->WaitOne();
    FileStream* out = static_cast<FileStream*>(ar->AsyncState);
    out->EndWrite(ar);
    m_writeEvent->Set();
}
We call EndWrite on our output stream, which ends an asynchronous write
operation started by BeginWrite. EndWrite blocks until all the data has been
written, so we are saved the bother of checking that ourselves. As you can see,
I have used an AutoResetEvent object to make sure that two write callbacks don't
run in parallel, and with the aim of keeping the writes in the correct order. If
multiple write callbacks are invoked, they'll all wait at the WaitOne call and
then run one at a time. I expect them to proceed in the order in which they
called WaitOne, although I would not treat the wake-up order of threads waiting
on an event as a hard guarantee.
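If the order of the writes has to be enforced rather than hoped for, one option is to number each chunk and let a writer proceed only when its number comes up. The sketch below shows that idea in standard C++, not with the .NET classes used in this article; OrderedSink, its ticket parameter, and write_out_of_order are all hypothetical names introduced for illustration.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Appends chunks strictly in ticket order, whatever order the calling
// threads happen to arrive in.
class OrderedSink
{
public:
    void write(std::size_t ticket, const std::string& chunk)
    {
        std::unique_lock<std::mutex> lock(m_);
        // Wait until every chunk with a smaller ticket has been written.
        cv_.wait(lock, [&] { return next_ == ticket; });
        data_ += chunk;
        ++next_;
        cv_.notify_all(); // let the holder of the next ticket proceed
    }

    std::string data()
    {
        std::lock_guard<std::mutex> lock(m_);
        return data_;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::size_t next_ = 0;
    std::string data_;
};

// Starts the writers in reverse so arrival order differs from ticket order.
std::string write_out_of_order()
{
    OrderedSink sink;
    const std::vector<std::string> chunks = {"one ", "two ", "three"};
    std::vector<std::thread> workers;
    for (std::size_t i = chunks.size(); i-- > 0;)
        workers.emplace_back([&sink, &chunks, i] { sink.write(i, chunks[i]); });
    for (auto& w : workers)
        w.join();
    return sink.data();
}
```

Because each writer waits for its own ticket, the result does not depend on which thread the scheduler wakes first.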