
Web/HTTP Automation with Perl

16 Jul 2007
Covers how to automate web activity that involves Java applets or Flash content making HTTP requests, as well as standard web automation.

Introduction

There are quite a few solutions out there that deal with web automation. Some are free or open source and some are not. Most of them fall into one of two categories: they automate by capturing browser activity -- e.g., clicking links, filling in forms, typing in the browser's address bar -- or they automate by capturing HTTP request information, usually via a local proxy. The solution presented in this article follows the latter method, but uses an HTTP header/request capturing tool instead of a proxy.

So what makes this solution worthy of consideration? Well, that depends on your needs. If the other tools already do everything you need, or if you need features that this solution does not offer, then you should use the other tools. However, there are some cases that those tools fail to address and that this solution handles. An example is automating web activity that involves Java applets or Flash content making HTTP requests. See the Background section for more details. This solution also has additional features, which are listed in the Features section at the end of the article.

Background

The solution I present came from a wish to automate a particular task: I needed to configure systems at work before they could be tested via an automated test system. Well, should the configuration be automated, too? That's what I thought and that's what I set out to do. The trouble is that the system provided no command-line configuration interface that could be automated directly. However, I did learn from an engineer who worked on the configuration interface that configuration could be done by interfacing with the ASP pages on the system's web server, which perform the configuration. That was a lead to start with, but there was no master API documentation, and the web configuration interface loads a bunch of Java applets that function as the user interface. After some analysis, I found that these applets communicate with the ASP pages via HTTP GET and POST requests. So, I sort of found a solution.

With Perl, I could create a user agent to do the same thing as a browser, but I did not want to script every page request by hand. I needed a way to generate the Perl script dynamically from recorded browser activity. This probably sounds simple enough, right? I could write this myself, which I ended up doing, or find an existing solution on the web. Well, that's where we run into problems.

I searched the web for existing solutions and tried a few, but none of them solved my problem. The tools that automate by capturing browser activity -- e.g., clicking links, filling in forms, typing in the browser's address bar -- did not catch the Java applet HTTP activity. The tools that automate by capturing HTTP request information via a proxy could not handle forwarding the Java applets from the proxy to the browser: the page loaded so slowly that I gave up waiting, and I am not sure it would ever have finished loading. The best software alternative I've tested that somewhat works for my situation is MaxQ. A review of some of the existing solutions I've tried is posted in a thread following this article.

Using the Script

The script is under 200 lines of code and is pretty much self-explanatory if you are familiar with Perl and libwww. Therefore, you can read the code yourself and contact me if you have questions. I'll focus on how to use the script instead.

The included script source is a generalized version of the one that I wrote. It was designed to work with the ieHTTPHeaders tool for Internet Explorer. It can work with other HTTP analysis tools, but may require some editing to match the HTTP trace format of the other tools. It is only designed to make HTTP GET and POST requests and receive the response. HTTP authentication, HTTPS support, advanced cookie & session management, error checking, response data parsing, and saving the response to a file or database will require manual customization. You can customize it pretty easily if you are familiar with Perl. Consult the Perl libwww library documentation on the web for reference.
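To give a feel for what a generated user agent script does, here is a minimal sketch using libwww. It is not the actual output of the parser script. The record layout and the helper names (`make_agent`, `build_requests`, `replay`) are hypothetical, chosen only for illustration. The in-memory cookie jar is one example of the kind of customization mentioned above.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request::Common qw(GET POST);

# Build a user agent. The in-memory cookie jar lets cookies set by one
# response be sent back with later requests.
sub make_agent {
    my $ua = LWP::UserAgent->new(timeout => 30);
    $ua->cookie_jar({});
    return $ua;
}

# Turn simple records into HTTP::Request objects.
# Hypothetical record layout:
#   { method => 'GET' or 'POST', url => '...', form => [ key => value, ... ] }
sub build_requests {
    my (@records) = @_;
    my @requests;
    for my $rec (@records) {
        if ($rec->{method} eq 'POST') {
            # POST sends the form fields URL-encoded in the request body.
            push @requests, POST($rec->{url}, $rec->{form} || []);
        }
        else {
            push @requests, GET($rec->{url});
        }
    }
    return @requests;
}

# Send the requests in order and print the status line of each response.
sub replay {
    my ($ua, @requests) = @_;
    for my $req (@requests) {
        my $res = $ua->request($req);
        printf "%s %s -> %s\n", $req->method, $req->uri, $res->status_line;
    }
}
```

A script built this way would end with a call such as `replay(make_agent(), build_requests(@records))`, where `@records` holds the requests recovered from the trace.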

To use the solution, you first need to download and install ieHTTPHeaders for Internet Explorer or an equivalent tool like livehttpheaders for Mozilla/Firefox. Then launch the browser. In Internet Explorer, go to View -> Explorer Bar -> ieHTTPHeaders v1.6 (or a similarly named entry) to open the tool. Now simply perform the web browsing activity that you wish to automate. As you do this, the HTTP request headers are displayed in the tool. When you are finished, right-click the tool's display area and select Save to save the HTTP trace to a file. Then use the script ParseHTTPTrace.pl to parse the trace's HTTP requests into a dynamically generated Perl user agent script. The parsing script is invoked as follows:

ParseHTTPTrace.pl [trace input file path] 
    [optional generated output script path]
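The parsing step can be pictured with the following sketch. It is not the code of ParseHTTPTrace.pl. It assumes a trace made of raw HTTP header blocks separated by blank lines, where each request begins with a line such as `GET /path HTTP/1.1`. The real ieHTTPHeaders layout may differ in detail, POST bodies are ignored here, and `parse_trace` is a hypothetical name.

```perl
use strict;
use warnings;

# Parse raw HTTP header text into a list of request records.
# Assumed trace layout: a request line ("GET /path HTTP/1.1"), then
# "Name: value" header lines, then a blank line. Response blocks, which
# start with a status line such as "HTTP/1.1 200 OK", are skipped.
sub parse_trace {
    my ($text) = @_;
    my @requests;
    my $current;    # the request being filled in, or undef

    for my $line (split /\r?\n/, $text) {
        if ($line =~ m{^(GET|POST)\s+(\S+)\s+HTTP/\d\.\d\s*$}) {
            $current = { method => $1, path => $2, headers => {} };
            push @requests, $current;
        }
        elsif ($line =~ /^\s*$/) {
            $current = undef;    # a blank line ends the header block
        }
        elsif ($current && $line =~ /^([\w-]+):\s*(.*?)\s*$/) {
            $current->{headers}{$1} = $2;
        }
    }

    # Build an absolute URL from the Host header when the path is relative.
    for my $req (@requests) {
        my $host = $req->{headers}{Host};
        $req->{url} =
              $req->{path} =~ m{^https?://}i ? $req->{path}
            : defined $host                  ? "http://$host$req->{path}"
            :                                  $req->{path};
    }
    return @requests;
}
```

Each returned record carries the method, the URL and the request headers, which is the information a generator needs to write out one user agent request per captured request.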

If no output path is given, the generated script is written to C:\Temp\WebAutomaton.pl. Once that is done, you can just run WebAutomaton.pl and it should replay the recorded requests. livehttpheaders is now supported as well. I won't go over how to use it in this article, since it works pretty much the same as ieHTTPHeaders. To use it, you first need to convert the livehttpheaders trace file to ieHTTPHeaders format with the following tool:

lhhTraceConvert.pl [trace input file path] 
    [optional converted output trace path]

If no output path is given, the converted trace is written to C:\Temp\outTrace.txt. You can then run the converted trace file through ParseHTTPTrace.pl to get the desired Perl output script. For applications that don't involve the browser, automation would require a network protocol analyzer like Wireshark (formerly Ethereal) to generate the HTTP trace files.
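The conversion step might look roughly like the sketch below. It is not the code of lhhTraceConvert.pl. It assumes that each livehttpheaders entry starts with the requested URL on its own line and ends with a separator line of dashes, and that the target format is plain header blocks separated by blank lines. Both assumptions should be checked against real trace files. `convert_lhh_trace` is a hypothetical name.

```perl
use strict;
use warnings;

# Convert livehttpheaders-style text into plain header blocks.
# Assumed input layout: URL line, blank line, request headers, blank line,
# response headers, then a separator line made of dashes.
sub convert_lhh_trace {
    my ($text) = @_;
    my @out;

    # Split the trace into entries at the dashed separator lines.
    for my $entry (split /^-{10,}[ \t]*\r?\n?/m, $text) {
        my @lines = split /\r?\n/, $entry;

        # Drop leading blank lines, the URL line that opens the entry,
        # and any blank lines that follow it.
        shift @lines while @lines && $lines[0] =~ /^\s*$/;
        shift @lines if @lines && $lines[0] =~ m{^https?://}i;
        shift @lines while @lines && $lines[0] =~ /^\s*$/;

        next unless @lines;
        push @out, join("\n", @lines) . "\n";
    }

    # Entries are separated by a single blank line in the output.
    return join("\n", @out);
}
```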

Script Usage Requirements

Note: The script was written and tested under ActivePerl v5.8.7 and with the ieHTTPHeaders tool only. It should work with most Perl releases on any platform. Use it at your own risk; I can't guarantee that it is bug-free.

  • ieHTTPHeaders, livehttpheaders, Wireshark or other HTTP trace/analysis tools. A proxy could be used for the same thing, but in my experience it doesn't work well with Java applets and such. Note that using any tool other than ieHTTPHeaders or livehttpheaders requires modifying the script to support the tool, so that the headers get parsed in the right format.
  • Perl runtime, any OS platform, release 5.8.7+ recommended
  • Perl LWP::UserAgent and HTTP::Request modules (part of libwww) installed and working
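Before running the scripts, it can be handy to confirm that the required modules are installed. The following is a small sketch of such a check. The helper names `module_available` and `missing_modules` are hypothetical and not part of the download.

```perl
use strict;
use warnings;

# Return 1 if the named module can be loaded by this Perl, 0 otherwise.
sub module_available {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # LWP::UserAgent -> LWP/UserAgent.pm
    return eval { require $file; 1 } ? 1 : 0;
}

# Return the names of the given modules that cannot be loaded.
sub missing_modules {
    return grep { !module_available($_) } @_;
}

# Report on the modules listed in the requirements above.
my @missing = missing_modules('LWP::UserAgent', 'HTTP::Request');
if (@missing) {
    print "Missing modules: @missing\n";
}
else {
    printf "All required modules found (Perl version %s)\n", $];
}
```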

Features of this Solution Over Its Alternatives

  • Can be used to automate web activity involving media-rich applications that make HTTP requests, such as Java applets, Flash content, other plug-ins, AJAX and web services.
  • Output scripts can be compiled into executables -- with third-party tools such as the free Perl Archive "PAR" package, perl2exe or tools from ActiveState -- so that the end users can just run them without installing anything.
  • Scripts are fully customizable with Perl, including both the parser/generator script and the output scripts.
  • Scripts run standalone without additional software. Only Perl and the LWP module are required. HTTP analyzer tools are only required for capturing the HTTP requests for parsing by the Perl parser script.
  • Scripts run quickly, since they issue the HTTP requests directly instead of driving a browser.
  • Uses the well-known and documented Perl LWP module.
  • For testing web applications, it removes the need to automate a website's GUI in order to automate the tests.
  • Useful for automating 100+ HTTP requests in one go. Did you know that a single web email reading session could possibly take up 100+ HTTP requests? Imagine scripting that manually!
  • Solution works seamlessly, like MS Office macro recording and playback, requiring manual editing only for customization and performance tuning. Note that seamless execution assumes the required tools have been set up and are working.

Possible Applications

  • Automate web browser (HTTP) activity that involves Java applets, Flash content, etc. that would be tricky to automate. Could also be used as an alternative to GUI configuration automation for cases that don't involve testing GUI interactivity.
  • Automate web browser-based configuration, particularly for configurations that never change.
  • Automate website feature testing, i.e., HTTP request input and HTTP response output functionality tests.
  • Automate website load testing, i.e., run many scripts on one workstation or run from many workstations against a specified target server.
  • Automate REST or REST-CSV based web services testing; this works much like website testing.
  • Automate web browsing activity, e.g., check email, check account balance, check news.
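For the feature-testing application listed above, a generated script would need to be extended by hand to check each response. Here is a minimal sketch of such a check, using the HTTP::Response class from libwww. The function name `check_response` and the layout of the expectation hash are assumptions made for illustration.

```perl
use strict;
use warnings;
use HTTP::Response;

# Compare a response with an expectation and return a list of failure
# messages. An empty list means the check passed.
# Hypothetical expectation layout, both keys optional:
#   { code => 200, content_like => qr/pattern/ }
sub check_response {
    my ($res, $expect) = @_;
    my @failures;

    if (defined $expect->{code} && $res->code != $expect->{code}) {
        push @failures,
            sprintf('expected status %d, got %d', $expect->{code}, $res->code);
    }
    if (defined $expect->{content_like}
        && $res->content !~ $expect->{content_like}) {
        push @failures, 'body did not match ' . $expect->{content_like};
    }
    return @failures;
}
```

In a test script, the response returned by the user agent for each request would be passed to `check_response`, and any returned messages reported as test failures.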

History

  • 08/10/2006 - Initial release
  • 08/11/2006 - Added HTTP GET request support; updated article with usage requirements
  • 10/01/2006 - Updated article
  • 10/31/2006 - Updated article
  • 07/14/2007 - Added support for livehttpheaders tool with extra pre-parsing/trace conversion script

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
