Introduction
I wanted to build a web page that, given a URL, would take a snapshot of the page at that address. I Googled a lot for this, but all I came up with was a host of third-party components. It was here on these pages that I found out that the operating system already has a built-in component that handles web browsing. And it so happens that this component is available in the VS 2005 IDE as the WebBrowser control, which wraps the functionality the component provides.
Background
This project is probably unlike any other web project you've done before. To be precise, it is not just a web project; it is also a Windows application project. You are probably wondering how and why we need one. Wait and see.
The Windows Application (prjSnapShot)
Here is the link to what actually made me think that capturing an image of a web page was possible. However, you need to build this project yourself and tune it to your needs. It is quite simple to understand. All you need is a WebBrowser control (named wbCtrl here) docked onto a Windows Form, and then the code below to get you started. It is almost the same as the project I referred to, with some changes to suit my needs.
using System;
using System.Diagnostics; // needed for Debug.Print
using System.Drawing;
using System.Windows.Forms;

namespace Snapshot
{
    public partial class browserForm : Form
    {
        private string _URL;
        private Bitmap docImg;
        private bool DownloadingComplete;

        public browserForm()
        {
            InitializeComponent();
        }

        // Address of the page to capture; must start with http:// or https://.
        public string URL
        {
            get { return _URL; }
            set { _URL = value; }
        }

        // Navigates to URL, waits for the document to load, and returns the captured image.
        public Image GetSnapshot()
        {
            if (string.IsNullOrEmpty(_URL))
                throw new InvalidOperationException("No input url given");
            if (!_URL.StartsWith("http://", StringComparison.OrdinalIgnoreCase) &&
                !_URL.StartsWith("https://", StringComparison.OrdinalIgnoreCase))
                throw new InvalidOperationException("Invalid url input. No http prefix was found");

            // Reset the flag so the form can be reused for another capture.
            DownloadingComplete = false;
            wbCtrl.Navigate(_URL);

            // Keep pumping window messages until DocumentCompleted has fired.
            while (!DownloadingComplete)
                Application.DoEvents();

            // The bitmap was drawn in the DocumentCompleted handler below.
            return docImg;
        }

        private void wbCtrl_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
        {
            Debug.Print("Download Completed @ " + DateTime.Now.ToString());

            // Draw the browser control into a bitmap of the same size.
            docImg = new Bitmap(wbCtrl.Width, wbCtrl.Height);
            wbCtrl.DrawToBitmap(docImg, new Rectangle(0, 0, wbCtrl.Width, wbCtrl.Height));
            DownloadingComplete = true;
        }
    }
}
You might need to run this project a couple of times to test it; to do that, add it to another Windows project, as suggested in the article I've linked to. The beauty of the whole thing is that you never actually make this form visible! After you are done with the testing, convert the project to a DLL. We are going to add this to the next project.
The ASP.NET Project
Add a reference to the above DLL in this project. Let me warn you that if you try to use it as-is at this point, you might run into a host of issues. This project does not involve much "doing", by the way, but you have to understand some concepts, namely how the ASP.NET engine processes your web page. This is a vast topic in itself, so I will only mention a few things:
- Your web page is executed on a Multi-Threaded Apartment (MTA) thread. I guess ASP.NET runs this way to optimize the loading of a web page. The resources required by the web page are handled by the thread on which the processing is initiated. You don't have to look into this deeply. However, ignoring this fact can trigger an exception in your web application.
Here, you have to realize that we have added a Windows project to an ASP.NET project. Most of the controls, and probably the entire form itself, need to run on a Single-Threaded Apartment (STA) thread. Hence you'd get an error saying that the control cannot be instantiated because the current thread is not in a single-threaded apartment.
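If you want to see which apartment a request is actually running in, a small guard like the one below can help. This is only a sketch: ApartmentCheck and RequireSta are hypothetical names that are not part of the original project, and the code relies only on Thread.GetApartmentState, which is available from .NET 2.0 onwards.

```csharp
using System;
using System.Threading;

// Hypothetical helper, not part of the original project.
public static class ApartmentCheck
{
    // Throws a descriptive exception when the calling thread is not an STA thread.
    public static void RequireSta()
    {
        ApartmentState state = Thread.CurrentThread.GetApartmentState();
        if (state != ApartmentState.STA)
        {
            throw new InvalidOperationException(
                "The WebBrowser control needs an STA thread, but this thread is " + state + ".");
        }
    }
}
```

Calling ApartmentCheck.RequireSta() before creating the form would be one place to use it; the failure then names the real cause instead of surfacing from deep inside the control.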
To overcome this hurdle, all you have to do is set the AspCompat attribute of the web page's @ Page directive to true. The ASP.NET engine will then process this web page on an STA thread. In fact, I believe this was how ordinary ASP pages used to get processed.
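For reference, a page directive with this attribute set might look like the following minimal sketch. Only AspCompat="true" matters here; a page generated by the IDE will normally carry further attributes (such as its code-behind file and class name), and those should be left as they are.

```aspx
<%@ Page Language="C#" AspCompat="true" %>
```

The directive goes on the first line of the .aspx file that hosts the button and the image control.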
The next hurdle is something that can really bug you. Everything seems okay, but you get a weird COM exception with a weird-looking error code. Sometimes the first run of the application goes smoothly, but every subsequent request throws an error. Googling the error code is of no use; you need some COM knowledge. I am no expert in this area, but I can safely tell you that you have to tell the operating system when you are done using your COM object; otherwise other applications using the same object can behave erratically. So what you have witnessed here is probably the COM object not getting released properly. If you do run into such a problem, stop the local web server (the one the IDE hosts for you), restart it, and run the project once more within your IDE.
Using the Code
I hope you've understood all the above. Now using the code is as simple as follows:
protected void btnGenerate_Click(object sender, EventArgs e)
{
    // The using blocks dispose the form (and the COM browser it hosts)
    // and the image even if the capture or the save throws.
    using (browserForm frm = new browserForm())
    {
        frm.URL = txtURL.Text;
        using (System.Drawing.Image snapshot = frm.GetSnapshot())
        {
            snapshot.Save(Server.MapPath("snap.jpg"), System.Drawing.Imaging.ImageFormat.Jpeg);
        }
    }
    Image1.ImageUrl = "snap.jpg";
}
Oh, and all my web form ever had was a textbox, a button and an image control.
Things I Want to Know
I have not carried out this exercise in a production environment, so the code probably needs some tuning. However, if you folks want to play with this, you are most welcome!
History
Phew, published my first article after a decade of editing!