A simple .NET based WebClient with JavaScript support

Christian Birkl

26 Mar 2006

Introduction

If you want to access web pages within your .NET code, which means having access to the HTML document object model and evaluation of embedded scripts, the only way to go is to wrap the Internet Explorer COM interface (SHDocVw). But this method has some major drawbacks:

The greatest is the inability to capture an alert() or confirm() script call. This means, if you load a web page in your .NET code with the SHDocVw library, and it has an embedded alert('Hello World'); call, your program will wait until the user presses "OK".

Another drawback is the inability to spawn two or more Internet Explorer instances within the same process with different cookie stores. All browsers instantiated in your runtime share the same resources, so you cannot, for example, log in as two users at the same time to a cookie-based web page. There is a workaround for this (launching two or more iexplore.exe processes via System.Diagnostics.Process and hooking them to the COM objects in your runtime), but it leads to unpredictable behaviour, such as leaving iexplore.exe instances around if your program doesn't shut down gracefully.

For my purposes, I needed a simple web client, which allows me to access the basic HTML document object model objects and simple JavaScript code (like the evaluation of the default ASP.NET client side validators).

Prerequisites

To build a JavaScript-enabled WebClient, we need at least the following three basic components: an HTTP protocol implementation, an HTML document object model, and a JavaScript engine. Since .NET already offers many built-in types for handling similar tasks, we will reuse them as much as possible.

An HTTP protocol implementation

The types System.Net.HttpWebRequest and System.Net.HttpWebResponse implement the HTTP protocol, and are therefore perfect for our job. They can handle GET and POST requests, allow us to read and write HTTP headers, and are capable of making HTTP as well as HTTPS connections.
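
To make this concrete, here is a minimal sketch of a GET request built on these two types. It assumes that each client instance owns its own System.Net.CookieContainer, which is one way to avoid the shared cookie store described in the introduction; the class name HttpSketch, the method names, and the user agent string are made up for this illustration and are not part of the attached source.

```csharp
using System;
using System.IO;
using System.Net;

public static class HttpSketch
{
    // Builds a GET request bound to the given cookie store.
    // Two requests with different CookieContainer instances
    // do not see each other's cookies.
    public static HttpWebRequest CreateGet(string url, CookieContainer cookies)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "GET";
        request.CookieContainer = cookies;
        request.UserAgent = "SketchClient/0.1"; // hypothetical value
        return request;
    }

    // Sends the request and returns the response body as text.
    public static string Download(HttpWebRequest request)
    {
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Calling CreateGet twice with two separate CookieContainer objects would give two independent sessions inside the same process.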

An HTML document object model

Currently, the .NET framework does not provide an HTML object model. But since HTML is very similar to XML, we will use System.Xml.XmlDocument and extend it to our needs.
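
The idea of extending XmlDocument can be sketched roughly as follows. This is not the attached Cb.Web.Html.HtmlDocument; the class name SketchHtmlDocument is hypothetical, and the sketch assumes the markup has already been turned into well-formed XML and that ids contain no single quotes.

```csharp
using System.Xml;

// Hypothetical illustration of an XmlDocument with browser-like members.
public class SketchHtmlDocument : XmlDocument
{
    // XmlDocument.GetElementById only works for ids declared in a DTD,
    // so this sketch looks up the plain "id" attribute via XPath instead.
    public override XmlElement GetElementById(string elementId)
    {
        return SelectSingleNode("//*[@id='" + elementId + "']") as XmlElement;
    }

    // Convenience accessor for the <body> element, or null if absent.
    public XmlElement Body
    {
        get { return SelectSingleNode("/html/body") as XmlElement; }
    }
}

public static class SketchHtmlDocumentDemo
{
    public static string Run()
    {
        SketchHtmlDocument doc = new SketchHtmlDocument();
        doc.LoadXml("<html><body><input id='Email' value=''/></body></html>");
        doc.GetElementById("Email").SetAttribute("value", "a@b.c");
        return doc.GetElementById("Email").GetAttribute("value");
    }
}
```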

A JavaScript engine

The .NET framework offers a JavaScript implementation called 'JScript.NET'. It is available through the Microsoft.JScript.Vsa (Visual Studio for Applications) framework, and can easily be used for our purpose.

Implementation

Cb.Web.WebClient

The type Cb.Web.WebClient is very straightforward. It has an empty default constructor, a void Get(string url); method, and the HTML document object model can be accessed via myWebClient.Window.Document.

It has two events for handling JavaScript dialog calls, alert (OnAlert) and confirm (OnConfirm), and one for capturing JavaScript errors (OnError).

// Basic example

WebClient myWebClient = new WebClient();
myWebClient.OnAlert += new AlertHandler(WebClient_OnAlert);
myWebClient.Get("http://www.thecodeproject.com/");

Cb.Web.Html.HtmlDocument

As already mentioned, the HTML document object model can be accessed through the property Window.Document of a WebClient. The interface is basically the default browser DOM interface (i.e., it provides members like GetElementById, GetElementsByTagName, or Body).

myWebClient.Window.Document.GetElementById("Email").
                       SetAttribute("value", email);
myWebClient.Window.Document.GetElementById("Password").
                       SetAttribute("value", password);
myWebClient.Window.Document.Forms["subForm"].Submit();

By extending System.Xml.XmlDocument, we get all the features of an XML document, for example XPath queries (SelectSingleNode, SelectNodes).
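
As a small illustration of such queries, the following sketch runs SelectNodes and SelectSingleNode against a plain XmlDocument. The sample markup and the names XPathSketch, SamplePage, and CountTextInputs are assumptions made up for this example; the form name and ids only mirror the snippet above.

```csharp
using System.Xml;

public static class XPathSketch
{
    // Builds a tiny, well-formed stand-in for a login page.
    public static XmlDocument SamplePage()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(
            "<html><body><form name='subForm'>" +
            "<input type='text' id='Email'/>" +
            "<input type='password' id='Password'/>" +
            "</form></body></html>");
        return doc;
    }

    // Counts the text inputs inside the form named "subForm".
    public static int CountTextInputs(XmlDocument doc)
    {
        XmlNodeList inputs =
            doc.SelectNodes("//form[@name='subForm']//input[@type='text']");
        return inputs.Count;
    }

    // Returns the first password field, or null if there is none.
    public static XmlNode FindPasswordField(XmlDocument doc)
    {
        return doc.SelectSingleNode("//input[@type='password']");
    }
}
```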

The type Cb.Web.Html.HtmlReader, which is used to populate an HtmlDocument, is a very basic, non-validating HTML parser. In its current state, it won't throw any parsing errors, but it may drop some tags or attributes. I didn't use the SgmlReader mentioned in the article Convert HTML to XHTML and clean unnecessary tags and attributes, because I couldn't figure out an easy way to populate our DOM with it.

Cb.Web.Scripting

For implementation of our JavaScript engine, I used the Microsoft.JScript.Vsa.VsaEngine. The basic usage is explained in the great article VSA Scripting in .NET by Mark Belles.

The interesting part of this implementation is to hide the .NET signature of our DOM objects and provide the standard DOM level signature, and also to add expando features to all scripting objects. 'Expando' means the ability to attach properties to any object at any given time.

To achieve this, all objects exposed to our scripting engine implement System.Reflection.IReflect. This forces them to provide a method MemberInfo[] GetMember(string name, BindingFlags bindingAttr), which the VsaEngine then uses to resolve properties and methods.

class HtmlDocument : IReflect {
    // Note: this example is stripped down; its main purpose is to show
    // the idea behind expando objects and property/method resolving.
    // The remaining IReflect members are omitted.
    public MemberInfo[] GetMember(string name, BindingFlags bindingAttr) {
        switch (name) {
            // Map the DOM-level name to the corresponding .NET method.
            case "getElementById": return GetType().GetMember("GetElementById");
        }
        // ... handle expando properties ...
    }
}
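
The switch statement above maps each DOM name by hand. A more generic mapping could look like the following sketch, which assumes that every DOM-level name differs from its .NET counterpart only in the case of the first letter. That rule, and the names DomNameResolver and SampleNode, are assumptions for illustration and not necessarily how the attached source resolves names.

```csharp
using System;
using System.Reflection;

public static class DomNameResolver
{
    // Translates a script-side name such as "getElementById" into the
    // PascalCase .NET name and looks it up on the given type.
    public static MemberInfo[] Resolve(Type type, string scriptName)
    {
        if (string.IsNullOrEmpty(scriptName))
        {
            return new MemberInfo[0];
        }
        string clrName = char.ToUpperInvariant(scriptName[0]) + scriptName.Substring(1);
        return type.GetMember(clrName, BindingFlags.Public | BindingFlags.Instance);
    }
}

// Stand-in type with a single DOM-like method, used only for the demo.
public class SampleNode
{
    public string GetElementById(string id) { return id; }
}
```

An empty result from Resolve would then be the signal to fall back to the expando handling.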

Expando properties are implemented by creating a dynamic FieldInfo with the given name in the above method; it reads and writes its value through a Hashtable bound to the object. This way, scripts like the following will work:

document.MyVar = "World";
alert("Hello " + document.MyVar);
// Will alert 'Hello World'
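
The Hashtable-backed field described above can be sketched as a small FieldInfo subclass. This is an illustration of the technique under my own assumptions, not the class from the attached source; the names ExpandoField and ExpandoDemo are hypothetical, and the sketch ignores the target object because the store is already bound to one owner.

```csharp
using System;
using System.Collections;
using System.Globalization;
using System.Reflection;

// Hypothetical FieldInfo whose value lives in a Hashtable
// instead of a real field on the target object.
public sealed class ExpandoField : FieldInfo
{
    private readonly string name;
    private readonly Hashtable store;

    public ExpandoField(string name, Hashtable store)
    {
        this.name = name;
        this.store = store;
    }

    public override string Name { get { return name; } }
    public override Type FieldType { get { return typeof(object); } }
    public override FieldAttributes Attributes { get { return FieldAttributes.Public; } }
    public override Type DeclaringType { get { return typeof(object); } }
    public override Type ReflectedType { get { return typeof(object); } }
    public override RuntimeFieldHandle FieldHandle
    {
        get { throw new NotSupportedException(); }
    }

    // Reads the value from the bound store; unset properties yield null.
    public override object GetValue(object obj) { return store[name]; }

    // Writes the value into the bound store.
    public override void SetValue(object obj, object value, BindingFlags invokeAttr,
                                  Binder binder, CultureInfo culture)
    {
        store[name] = value;
    }

    public override object[] GetCustomAttributes(bool inherit) { return new object[0]; }
    public override object[] GetCustomAttributes(Type attributeType, bool inherit) { return new object[0]; }
    public override bool IsDefined(Type attributeType, bool inherit) { return false; }
}

public static class ExpandoDemo
{
    // Mirrors the script above: set MyVar, then read it back.
    public static string Run()
    {
        Hashtable bag = new Hashtable();
        FieldInfo field = new ExpandoField("MyVar", bag);
        field.SetValue(null, "World");
        return "Hello " + field.GetValue(null);
    }
}
```

A GetMember implementation could return such a field whenever the requested name matches no real member, so the script engine treats the unknown name as an ordinary field.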

Final Notes

In its current state, the attached source code can handle simple web pages and evaluate simple JavaScript sources. It has (basically) no CSS support and no real DOM compliance. Its performance isn't cutting-edge either, but it works on basic ASP.NET web pages and can be used in multithreaded environments.

The attached demo shows most of its currently implemented features. It opens "www.thecodeproject.com", searches for one of my articles, and if you modify the source code and provide your username and password, will rate it with 5 points ;-). On a side note, I've removed the try/catch block, which in case of failure would rate it with 1 point...

Versions

Version 1.0 (released on 26.03.2006)

Initial release.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.