
Automating web browsing

mitja g


11 Dec 2006

An article on how to automate web browsing: clicking a button, entering data in a text box etc.


Introduction

In this article I want to present a way to automate the tasks we normally perform with the mouse and keyboard in a web browser: opening a page, entering data in a text box, pressing a button, and so on. These basics can then be extended into more complex operations such as search, analysis of web data, or automatic crawling inside a domain, and they can also serve as the foundation for your own tool for testing web applications.

Usually, the approach for processing web data is something like this:

// WebRequest.Create needs an absolute URI, including the scheme.
WebRequest req = WebRequest.Create("http://www.codeproject.com");
WebResponse res = req.GetResponse();

By parsing the response, one can examine the raw HTML content of the page and find its buttons, hidden fields etc. One can then make further posts and thus imitate a link click or a button click. Here, however, we introduce a different approach: we use the WebBrowser class from the .NET 2.0 framework. With the request-response model, we are dealing with text. With the WebBrowser control, we access a page and its elements on a higher level; e.g., links and buttons become "objects". So we will say something like:

Button btn = browser.ReturnElementByName("myButton");
btn.Click();

Background

The WebBrowser ActiveX control is an instance of your local Internet Explorer browser, with all its features and problems. All communication with this control is done via the MSHTML library. This library has interfaces for all the elements of a web page, and these interfaces are available on all versions of Windows. Basically, you always get back an object and convert it to the interface you need. That's hard if you don't know which interface is the right one, and currently there is no complete wrapper for this. You can, however, see the right interface in the DomElement property of the specific element while debugging the WebBrowser's element collection.

You can read a good description of a similar component here. The article refers to the previous versions of the .NET framework.

Using the code

So let's see how we can open a new page in the browser. It is very simple:

public void OpenPage(WebBrowser browser, string urlToLoad)
{
  // Starts loading the page; the call returns before the page is ready.
  browser.Navigate(urlToLoad);
}

An important thing to mention here: loading a page into the browser is an asynchronous operation, so we never know in advance when the page will be loaded. Therefore, if we want to load a page and then click a button on it, our load scenario must wait until the page is loaded. We can achieve this as follows:

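public bool OpenPage(WebBrowser browser, string urlToLoad){
  bool loadFinished = false;
  int counterTimeOut = 300;  // 300 x 100 ms = 30 s; adjust the limit to your pages

  WebBrowserDocumentCompletedEventHandler handler = delegate { loadFinished = true; };
  browser.DocumentCompleted += handler;
  browser.Navigate(urlToLoad);

  // Wait until the page is loaded or the timeout runs out,
  // letting the application process its events in the meantime.
  while (!loadFinished && counterTimeOut > 0){
    Thread.Sleep(100);
    Application.DoEvents();
    counterTimeOut--;
  }

  // Unsubscribe so that repeated calls do not pile up handlers.
  browser.DocumentCompleted -= handler;
  return loadFinished;  // false means the timeout expired first
}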

What we have done here is subscribe to the DocumentCompleted event, which is raised when the page has fully loaded in our browser. The handler sets the flag to true, which ends the while loop. Inside the loop we simply wait and, in the meantime, let the application process its pending events.

To click a button, we first load a page into the browser. Then we get the button by its name from the document's HtmlElementCollection (for example, browser.Document.All) and cast its DomElement to the MSHTML object HTMLInputElementClass. We can then click the button as follows:

HTMLInputElementClass iElement = (HTMLInputElementClass) button.DomElement;
iElement.click();

To click a link, we must cast the DomElement of the HtmlElement to a different MSHTML object, as follows:

HTMLAnchorElementClass linkElement = 
  (HTMLAnchorElementClass) linkToClick.DomElement;
linkElement.click();

To enter data in an input field, we do not even need a type cast. We get the input box as an HtmlElement from the WebBrowser and simply say:

element.InnerText = valueToFill;

To select a radio button, we use code as follows:

HTMLInputElementClass iElement = 
   (HTMLInputElementClass)radioToSelect.DomElement;
iElement.@checked = true;

To select a value from a combo, we use:

HTMLSelectElementClass iElement = 
   (HTMLSelectElementClass) dropdown.DomElement;
iElement.value = value;
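
The casts above require a reference to the MSHTML interop assembly. As a hedged alternative, clicking and filling in a value can also be sketched with the managed HtmlElement members InvokeMember and SetAttribute alone. The class and method names below (ElementActions, Find, Click, SetValue) are hypothetical and not part of the article's download, and the sketch assumes the page has already finished loading.

```csharp
using System.Windows.Forms;

// Hypothetical helper; the names are assumptions made for illustration.
public static class ElementActions
{
    // Looks the element up by its name or id attribute.
    // Returns null when no page is loaded or nothing matches.
    public static HtmlElement Find(WebBrowser browser, string nameOrId)
    {
        if (browser.Document == null) return null;
        return browser.Document.All[nameOrId];
    }

    // Invokes the element's DOM click() method, e.g. for a button or a link.
    public static bool Click(WebBrowser browser, string nameOrId)
    {
        HtmlElement element = Find(browser, nameOrId);
        if (element == null) return false;
        element.InvokeMember("click");
        return true;
    }

    // Sets the "value" attribute, e.g. for a text input.
    public static bool SetValue(WebBrowser browser, string nameOrId, string value)
    {
        HtmlElement element = Find(browser, nameOrId);
        if (element == null) return false;
        element.SetAttribute("value", value);
        return true;
    }
}
```

A call such as ElementActions.Click(browser, "myButton") returns false instead of throwing when the element is missing, which tends to be convenient in test scripts. Whether this behaves identically to the MSHTML casts for every element type would need to be checked against the pages you automate.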

Saving a page as a picture (snapshot) can also be useful in many scenarios. The WebBrowser control lets us save the current page as a snapshot, for example with this sample code:

// The area to capture: the browser window, starting at the top-left corner.
Rectangle rec = new Rectangle(new Point(0, 0), browser.Document.Window.Size);

using (Bitmap bmp = new Bitmap(rec.Width, rec.Height))
{
  browser.DrawToBitmap(bmp, rec);
  bmp.Save("file.path", ImageFormat.Jpeg);
}

The downloads contain the source code for all these steps, plus a very simple Windows application that shows how the examples can be used. Please note that, in order to keep the code sample as simple as possible, the examples do not use object-oriented techniques such as inheritance. Running the demo requires the .NET framework 2.0 to be installed.

Points of interest

From these simple steps, a complex tool for specific purposes can be written very quickly. We have, for example, developed a complete test tool, controlled via an XML file, where you simply write test cases in XML in this manner: go to a page, click a link, enter data, click a button, save a picture of the page for later review etc.
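
To give a rough idea of what such an XML-driven script could look like, here is a minimal sketch that reads a list of steps from XML. The format (a test root element with step children carrying action, target and value attributes) and the names TestStep and TestScript are assumptions made for illustration; they do not describe the actual tool mentioned above.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// One scripted step; the field names mirror the assumed XML attributes.
public class TestStep
{
    public string Action;  // e.g. "open", "click", "fill", "snapshot"
    public string Target;  // a URL or an element name
    public string Value;   // text to enter; empty when the attribute is absent
}

public static class TestScript
{
    // Parses <test><step action="..." target="..." value="..."/></test>
    // into a list of steps, preserving document order.
    public static List<TestStep> Parse(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);

        List<TestStep> steps = new List<TestStep>();
        foreach (XmlElement node in doc.SelectNodes("/test/step"))
        {
            TestStep step = new TestStep();
            step.Action = node.GetAttribute("action");
            step.Target = node.GetAttribute("target");
            step.Value = node.GetAttribute("value");
            steps.Add(step);
        }
        return steps;
    }
}
```

A runner would then loop over the steps and dispatch on Action to the operations shown earlier (open a page, click, fill in a value, take a snapshot).");
if (steps.Count != 2) throw new Exception("expected 2 steps");
if (steps[0].Action != "open") throw new Exception("wrong action");
if (steps[0].Value != "") throw new Exception("missing attribute should be empty");
if (steps[1].Target != "q" || steps[1].Value != "hello") throw new Exception("wrong second step");
</test>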

However, the things that we haven't yet addressed are:

frame environments (old ASP sites were usually built using frames),
a recording machine (so that one can browse a page and have the history of the actions saved as a collection of simple tasks).
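
As a starting point for the frame case, the following sketch searches the top-level frames of the loaded page for an element, using the managed HtmlWindow.Frames collection. It is only an assumption of how frames could be approached, not something the downloadable code does: the names FrameSearch and FindInFrames are hypothetical, nested frames are not followed, and frames served from another domain are simply skipped.

```csharp
using System;
using System.Windows.Forms;

// Hypothetical helper; not part of the article's download.
public static class FrameSearch
{
    // Returns the first element with the given name or id found in any
    // top-level frame of the current page, or null if there is none.
    public static HtmlElement FindInFrames(WebBrowser browser, string nameOrId)
    {
        if (browser.Document == null || browser.Document.Window == null) return null;

        HtmlWindowCollection frames = browser.Document.Window.Frames;
        if (frames == null) return null;

        foreach (HtmlWindow frame in frames)
        {
            try
            {
                if (frame.Document == null) continue;
                HtmlElement element = frame.Document.All[nameOrId];
                if (element != null) return element;
            }
            catch (UnauthorizedAccessException)
            {
                // Frames from another domain typically refuse access; skip them.
            }
        }
        return null;
    }
}
```

Each frame raises its own DocumentCompleted event, so a frame-aware version of the wait loop would probably also need to check that the whole page, and not just one frame, has finished loading.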

While using this example, do not forget the well-known Firefox developer add-on, the Web Developer toolbar. It helps a lot when you need to look up the names of the elements on a page.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
