Driving or Automating GUI Applications

daluu


14 May 2008, CPOL

GUI automation for the purpose of driving or controlling an application via its GUI

Introduction

Sometimes you want or need to control (or automate the use of) an application, but the application provides no automation API (such as a command line interface (CLI), COM component, .NET API DLL, or web service). What do you do then? You are left with graphical user interface (GUI) automation. (For command-line applications, the equivalent would be CLI automation.) This article covers GUI automation only. For CLI automation, the best approach is to call the command-line application many times, passing the necessary parameters each time, or to redirect its input and output (usually to files).
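For the CLI case, here is a minimal Python sketch of both approaches: calling a program repeatedly with different parameters, and redirecting its input while capturing its output. The helper `run_cli` is hypothetical, and the "applications" are tiny inline Python programs standing in for a real command-line tool, so the example runs anywhere. Output is captured in memory here rather than redirected to a file; the idea is the same.

```python
import subprocess
import sys


def run_cli(args, stdin_text=None):
    """Run a command-line program once and return its standard output.

    If stdin_text is given, it is fed to the program's standard input
    (the "redirect input" approach); output is captured, not printed.
    """
    result = subprocess.run(
        args,
        input=stdin_text,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


# Stand-in for a real command-line application: echoes its first argument.
ECHO_APP = [sys.executable, "-c", "import sys; print(sys.argv[1])"]

# Approach 1: call the application many times, passing parameters each time.
for param in ["alpha", "beta"]:
    print(run_cli(ECHO_APP + [param]), end="")

# Approach 2: redirect input to the application and capture its output.
UPPER_APP = [sys.executable, "-c", "import sys; print(sys.stdin.read().upper())"]
print(run_cli(UPPER_APP, stdin_text="hello"), end="")
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems when parameters contain spaces.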

There are many solutions for GUI automation, and it can be complex depending on what you want to do. GUI automation also falls into two categories: GUI/UI testing, and driving or controlling GUI/UI applications. In the testing/QA field, "GUI automation" generally refers to the former, which can be troublesome because the hard part is verifying the state of a given GUI component. I'm sure the commercial QA GUI test tools make verifying such components easier (though still tedious). This article, however, is about the latter. When the goal is to drive an application, automating the GUI is a lot easier: we are generally not concerned about the state of a given GUI component, because we are not "testing" it. Whether the application has been driven to the appropriate state (so that it will do what we intend) can be checked either within the code that drives the application, as necessary, or by the external application or script that invokes the GUI automation.

This article presents several tips or methods for driving a GUI application:

Detecting and setting the active window
Using keyboard shortcuts and navigation
Get and set values of GUI components
Using relative screen coordinates and mouse events

NOTE: These methods are not mutually exclusive. Depending on how the application is designed, you may have to combine several of them to drive it properly. Read the details of each method to get a better idea.
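One possible way to structure such a combination in code is an ordered list of strategies, tried in turn from most to least preferred (control ID, then keyboard, then mouse, matching the preferences argued for in the sections below). This Python sketch is only one option, not something the article prescribes; the strategy functions are hypothetical stand-ins for real automation calls.

```python
from typing import Callable, Sequence, Tuple

# A strategy pairs a descriptive name with an action that returns True on success.
Strategy = Tuple[str, Callable[[], bool]]


def drive_with_fallback(strategies: Sequence[Strategy]) -> str:
    """Try each strategy in order; return the name of the first one that succeeds.

    Raises RuntimeError if every strategy fails.
    """
    for name, action in strategies:
        try:
            if action():
                return name
        except Exception:
            # An error from the automation backend counts as a failed attempt;
            # fall through to the next, less preferred method.
            continue
    raise RuntimeError("no automation method succeeded")


# Hypothetical stand-ins for real automation calls.
def set_by_control_id() -> bool:
    return False  # e.g. the control ID was not found this run


def press_shortcut() -> bool:
    return True  # e.g. the keyboard shortcut worked


def click_coordinates() -> bool:
    return True


print(drive_with_fallback([
    ("control ID", set_by_control_id),
    ("keyboard shortcut", press_shortcut),
    ("mouse click", click_coordinates),
]))  # prints: keyboard shortcut
```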

The article concludes with some tips on automating GUI components by their control IDs, ideas for creating an interface to the GUI automation, and a list of free GUI automation tools that you can use to drive GUI applications.

Detecting and Setting the Active Window

To drive or automate the application properly, you must first ensure that the window or dialog to automate is active, or, if it isn't, make it active. Once it is active, you can begin automation. If you are not able to find or activate the window, you may be out of luck, or your automation will have to assume that the application is active while the automation is running. The exact API calls depend on your GUI automation solution of choice, but they generally resemble the following pseudo-API methods:

WindowActivate("Window name", other parameters)
WaitForActiveWindow("Window name to wait for", timeout period, other parameters)

The "wait for active window" method can be used to verify that a certain window appears as a result of a particular action in driving the application, such as bringing up the File Open/Save dialog or displaying a popup message box.
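Most tools provide this wait built in (AutoIt, for instance, has WinWaitActive), but the underlying logic is a simple poll with a timeout. In this Python sketch, `get_active_title` is a hypothetical stand-in for whatever call your tool offers to read the foreground window's title, and the window titles are made up.

```python
import time
from typing import Callable, Optional


def wait_for_active_window(
    title: str,
    get_active_title: Callable[[], Optional[str]],
    timeout: float = 5.0,
    poll_interval: float = 0.1,
) -> bool:
    """Poll until the active window's title equals `title` or `timeout` expires.

    Returns True if the window became active in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        if get_active_title() == title:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)


# Simulate an "Open" dialog appearing on the third poll after pressing Ctrl+O.
titles = iter(["Untitled - Notepad", "Untitled - Notepad", "Open"])


def fake_active_title() -> str:
    return next(titles, "Open")


print(wait_for_active_window("Open", fake_active_title, timeout=2.0))  # prints: True
```

Returning False instead of raising leaves it to the caller to decide whether a missing dialog is fatal or just a step to retry.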

Using Keyboard Shortcuts and Navigation

Use keyboard shortcuts whenever possible, as they make driving the application much easier. By keyboard shortcuts, I mean things like Alt+F, Ctrl+O, Shift+A, Ctrl+Shift+S, F1, Alt+Tab, Ctrl+Esc, etc. Other keys for navigation and data entry may also be used, like Enter/Return, Tab, Shift+Tab, Page Up/Down, the arrow keys, and alphanumeric keys. However, do not overuse navigation and data entry keys, such as tabbing through form fields and typing in text. While this is possible, it makes for crappy code that is hard to follow (how many Tabs are needed to reach field C, or how many Shift+Tabs to go back to field A?) and that breaks as soon as the tab order changes. So to conclude this section: use the keyboard primarily for shortcuts, and use keyboard-based navigation and text entry only when absolutely necessary (when the other automation methods do not work).

The exact API call depends on your GUI automation solution of choice, but it is generally some variant of the SendKeys command found in Microsoft's Visual Basic, Windows Script Host, and .NET.
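Because SendKeys strings are terse, a small helper that translates readable shortcut names into SendKeys-style notation (^ for Ctrl, + for Shift, % for Alt, braces for named keys, as in .NET's SendKeys) can keep scripts legible. This is a Python sketch: the function name and the key table are my own and cover only a few keys, and other tools use slightly different notation (AutoIt, for example, uses ! for Alt).

```python
# Modifier prefixes in .NET / VB SendKeys notation.
MODIFIERS = {"ctrl": "^", "shift": "+", "alt": "%"}

# A few named keys and their SendKeys codes (not exhaustive).
SPECIAL_KEYS = {
    "enter": "{ENTER}",
    "tab": "{TAB}",
    "esc": "{ESC}",
    "pageup": "{PGUP}",
    "pagedown": "{PGDN}",
    "up": "{UP}",
    "down": "{DOWN}",
    "left": "{LEFT}",
    "right": "{RIGHT}",
}


def to_sendkeys(shortcut: str) -> str:
    """Convert a readable shortcut such as "Ctrl+Shift+S" into SendKeys notation ("^+s")."""
    # Normalize each part: "Page Down" -> "pagedown", "Ctrl" -> "ctrl".
    *mods, key = [part.strip().lower().replace(" ", "") for part in shortcut.split("+")]
    prefix = "".join(MODIFIERS[m] for m in mods)
    if key in SPECIAL_KEYS:
        code = SPECIAL_KEYS[key]
    elif len(key) > 1 and key[0] == "f" and key[1:].isdigit():
        code = "{" + key.upper() + "}"  # function keys: f1 -> {F1}
    elif len(key) == 1 and key.isalnum():
        code = key
    else:
        raise ValueError(f"unsupported key: {key!r}")
    return prefix + code


for s in ["Alt+F", "Ctrl+Shift+S", "F1", "Alt+Tab"]:
    print(s, "->", to_sendkeys(s))
```

The resulting strings would then be passed to your tool's SendKeys-like call.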

Get and Set Values of GUI Components

This is the preferred method of automation, where possible. To use it, you need a GUI component/control spy tool that can tell you details about each GUI component, such as its control identifier (control ID), name, class, and instance. The control ID is what matters most: in my experience, a control is easy to automate by its control ID, whereas automating it with just the class name or other control parameters often doesn't work.

To be sure the control can be automated, verify that the GUI component you wish to control has a control ID and that the ID is static, i.e., it never changes between runs of the application. To check this, spy on the control for its ID, close the application, reopen it, and spy on the control again; the ID should be the same. You may also wish to repeatedly bring up the File Open/Save dialog or popup message box and verify that the ID of the control of interest doesn't change, or restart the operating system and repeat the spying process. If the ID doesn't change, you are good to proceed with this method. If it does change, you will need to use the other automation methods to drive the application.
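If you record what the spy tool shows in each session, the comparison can be automated. A Python sketch, assuming you transcribe each session as a mapping from a control description of your choosing to the ID observed; the control names and IDs below are made up for illustration.

```python
from typing import Dict, List, Set

# One snapshot per spying session: control description -> control ID observed.
Snapshot = Dict[str, int]


def stable_controls(snapshots: List[Snapshot]) -> Set[str]:
    """Return the controls whose ID was present and identical in every snapshot."""
    if not snapshots:
        return set()
    # Keep only controls seen in every session...
    common = set(snapshots[0])
    for snap in snapshots[1:]:
        common &= set(snap)
    # ...and of those, only the ones whose ID never varied.
    return {name for name in common if len({snap[name] for snap in snapshots}) == 1}


sessions = [
    {"File name box": 1148, "Open button": 1, "Status text": 5012},
    {"File name box": 1148, "Open button": 1, "Status text": 5370},  # after a restart
]
print(sorted(stable_controls(sessions)))  # prints: ['File name box', 'Open button']
```

Controls that drop out of the result (like "Status text" here) are candidates for the keyboard or mouse methods instead.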

Once you have the control IDs and know that they don't change, reading or setting a control's value is simply a matter of calling the proper automation API methods; refer to your GUI automation solution of choice for the exact calls. See the section Tips on Automating GUI Components for more on this method. Use whatever GUI automation solution you prefer, but for the GUI spy tool, I prefer and recommend RanorexSpy.
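Since the calls differ between tools (AutoIt, for example, has ControlSetText and ControlGetText), it can help to hide them behind a small interface and read each value back after setting it. In this Python sketch, `ControlBackend` and its method names are hypothetical; `FakeBackend` is an in-memory stand-in so the example runs without a GUI, and the window title and control ID are made up.

```python
from typing import Dict, Protocol, Tuple


class ControlBackend(Protocol):
    """Hypothetical interface; adapt it to your tool's real get/set calls."""

    def get_text(self, window: str, control_id: int) -> str: ...
    def set_text(self, window: str, control_id: int, text: str) -> None: ...


def fill_fields(backend: ControlBackend, window: str, values: Dict[int, str]) -> None:
    """Set each control's text by control ID, then read it back to verify."""
    for control_id, text in values.items():
        backend.set_text(window, control_id, text)
        actual = backend.get_text(window, control_id)
        if actual != text:
            raise RuntimeError(
                f"control {control_id} in {window!r} holds {actual!r}, expected {text!r}"
            )


class FakeBackend:
    """In-memory stand-in for a real automation backend."""

    def __init__(self) -> None:
        self.controls: Dict[Tuple[str, int], str] = {}

    def get_text(self, window: str, control_id: int) -> str:
        return self.controls.get((window, control_id), "")

    def set_text(self, window: str, control_id: int, text: str) -> None:
        self.controls[(window, control_id)] = text


backend = FakeBackend()
fill_fields(backend, "Open", {1148: "report.txt"})  # 1148: made-up control ID
print(backend.get_text("Open", 1148))  # prints: report.txt
```

The read-back check is one way to do the in-code verification mentioned in the introduction.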

Using Relative Screen Coordinates and Mouse Events

Although it is generally best to avoid the mouse in automation, sometimes it is the best option. This applies when there is no keyboard shortcut for the action (or the keyboard navigation alternative requires many more steps) and the GUI component of interest has no control ID or its ID is not static (changes all the time). CounterPath's eyeBeam and X-Lite softphones are a good example of this dilemma: GUI spy tools cannot detect most of their GUI controls, such as the keypad, Flash button, and record button. To make matters worse, their keyboard shortcuts don't cover the Flash and call record features. So for those features you have to use mouse events, since you can neither detect the control nor easily navigate to it with the keyboard.

To make the best use of the mouse in driving the application, use relative coordinates, or more specifically, "coordinates relative to the active window". With this approach, it doesn't matter where the active window is located on the screen or what the screen resolution is, as long as the window's own size and layout stay the same. Depending on your GUI automation solution of choice, it may already use relative coordinates by default, or you may have to explicitly enable relative coordinates.
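If your tool only accepts absolute screen coordinates, converting is just an offset by the window's top-left corner. A Python sketch, assuming you can obtain the active window's rectangle (left, top, right, bottom) in screen coordinates from your tool; on Windows, this is what the Win32 GetWindowRect function reports. The rectangles and the point are made-up values.

```python
from typing import NamedTuple, Tuple


class Rect(NamedTuple):
    left: int
    top: int
    right: int
    bottom: int


def to_screen(window: Rect, rel_x: int, rel_y: int) -> Tuple[int, int]:
    """Translate window-relative coordinates into absolute screen coordinates.

    Raises ValueError if the point falls outside the window, which usually
    means the window was resized or its layout changed since you measured.
    """
    width = window.right - window.left
    height = window.bottom - window.top
    if not (0 <= rel_x < width and 0 <= rel_y < height):
        raise ValueError(f"({rel_x}, {rel_y}) lies outside a {width}x{height} window")
    return (window.left + rel_x, window.top + rel_y)


# The same relative point lands correctly wherever the window has been moved.
print(to_screen(Rect(100, 50, 500, 650), 42, 310))   # prints: (142, 360)
print(to_screen(Rect(800, 200, 1200, 800), 42, 310))  # prints: (842, 510)
```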

So how do you figure out the relative coordinates for automation? With an image editor. Generally, any good image editor will do, but I prefer the free Paint.NET. First, make sure the window to automate is active and in focus, then take a screenshot (Print Screen, or better yet Alt+Print Screen, which captures only the active window). Paste the screenshot into your image editor of choice. If you took a screenshot of the whole desktop rather than just the active window, you will need to crop out the active window. Next, move your mouse over the GUI component or area of interest. Your image editor should have an area or tool (often in the status bar) that shows the coordinates of the mouse relative to the image you are editing. Jot down the coordinates of interest. Finally, using your GUI automation solution of choice, invoke a mouse click or mouse down event at those coordinates against the active window; refer to its documentation for the exact API call.
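The jotted-down coordinates can live in a named table so scripts read like "click the Flash button" rather than raw numbers, and a point measured on a full-desktop screenshot can be converted by subtracting the window's top-left corner in that screenshot (the arithmetic equivalent of cropping). In this Python sketch, the target names and coordinates are hypothetical placeholders, and `click_relative` stands in for your tool's mouse-click call in window-relative mode.

```python
from typing import Callable, Dict, Tuple

Point = Tuple[int, int]

# Coordinates jotted down from the image editor, relative to the window's
# top-left corner. Names and values are hypothetical placeholders.
CLICK_TARGETS: Dict[str, Point] = {
    "flash_button": (37, 212),
    "record_button": (88, 212),
}


def from_desktop_screenshot(measured: Point, window_origin: Point) -> Point:
    """Convert a point measured on a full-desktop screenshot into
    window-relative coordinates (equivalent to cropping to the window first)."""
    return (measured[0] - window_origin[0], measured[1] - window_origin[1])


def click_target(name: str, click_relative: Callable[[int, int], None]) -> None:
    """Look up a named target and click it via the supplied click function."""
    x, y = CLICK_TARGETS[name]
    click_relative(x, y)


# Window's top-left corner was at (400, 300) in the desktop screenshot.
print(from_desktop_screenshot((437, 512), (400, 300)))  # prints: (37, 212)
click_target("flash_button", lambda x, y: print(f"click at {x},{y}"))  # prints: click at 37,212
```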

Tips on Automating GUI Components

I will update this section as I have more to add. I haven't done much GUI automation for driving applications myself, so I haven't encountered all the possible pitfalls of getting and setting control values by control ID. Generally, text fields are the easiest to work with, and possibly radio buttons, checkboxes, and buttons as well. The following controls are a bit trickier to work with:

Select boxes or drop-down menus

In my experience, to get this control to work properly, you first need to click the control or issue a "show drop-down list" command to make the list appear. Then you can use the list selection command to select the appropriate value. You may then have to click again or issue a "hide drop-down list" command to finalize the selection. If you skip this procedure, the control's value doesn't seem to get set. A GUI spy tool will show that the drop-down control has one control ID while the list is collapsed, and that the expanded list, when visible, has a different control ID. Use the ID of the collapsed control (the one shown when the list is not dropped down).
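The show, select, hide sequence above can be wrapped in one function so it is never done halfway. Some tools expose these steps directly (AutoIt's ControlCommand, for example, offers ShowDropDown, SelectString, and HideDropDown sub-commands), but in this Python sketch the backend interface and its method names are hypothetical, `RecordingBackend` is a fake that just records calls, and the control ID and item are made up.

```python
from typing import List, Protocol


class DropDownBackend(Protocol):
    """Hypothetical backend; the method names are placeholders for your tool's calls."""

    def show_dropdown(self, control_id: int) -> None: ...
    def select_item(self, control_id: int, item: str) -> None: ...
    def hide_dropdown(self, control_id: int) -> None: ...
    def get_selection(self, control_id: int) -> str: ...


def choose_from_dropdown(backend: DropDownBackend, control_id: int, item: str) -> None:
    """Select `item` with the show -> select -> hide sequence, then verify.

    `control_id` is the ID of the collapsed drop-down control, not of the
    list that appears when it is dropped down.
    """
    backend.show_dropdown(control_id)
    backend.select_item(control_id, item)
    backend.hide_dropdown(control_id)  # finalize the selection
    if backend.get_selection(control_id) != item:
        raise RuntimeError(f"drop-down {control_id} did not accept {item!r}")


class RecordingBackend:
    """Fake backend that records the calls made, for illustration."""

    def __init__(self) -> None:
        self.calls: List[str] = []
        self.selection = ""

    def show_dropdown(self, control_id: int) -> None:
        self.calls.append("show")

    def select_item(self, control_id: int, item: str) -> None:
        self.calls.append("select")
        self.selection = item

    def hide_dropdown(self, control_id: int) -> None:
        self.calls.append("hide")

    def get_selection(self, control_id: int) -> str:
        return self.selection


backend = RecordingBackend()
choose_from_dropdown(backend, 1001, "English")  # 1001: made-up control ID
print(backend.calls)  # prints: ['show', 'select', 'hide']
```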

Multiple selection list boxes? Not sure at this point.

Expandable show/hide Tree menu structures? Not sure at this point.

File/folder browsing tree structures? Not sure at this point.

Popup menus? Not sure at this point.

You are welcome to share your experiences with these and other controls so I can update this section.

Interface to GUI Automation

Except in a fixed or single-use-case scenario, you will need to provide an interface to the GUI automation that drives the application, because it is in essence a new "tool" that you have developed, not a QA test script. There are several options:

Wrap automation with a command line interface (CLI)
Wrap automation with a COM component interface
Wrap automation with an API (DLL) library interface, such as a .NET assembly
Wrap automation with a web service interface

I generally prefer to wrap the automation in a CLI, since a CLI can be called from any script or other application, and it is effectively cross-platform: it can even be invoked over the network using SSH, Telnet, etc. The other options may be a better fit, depending on your needs and platform of choice.
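A minimal Python sketch of such a CLI wrapper, using the standard argparse module. The subcommands (`dial`, `hangup`) and the action functions are hypothetical placeholders echoing the softphone example earlier; in a real tool each action would contain the GUI automation itself. Returning an exit code lets the calling script do the verification mentioned in the introduction.

```python
import argparse
from typing import List, Optional


# Hypothetical automation actions; each would drive the GUI application.
def dial(number: str) -> int:
    print(f"dialing {number}")  # placeholder for the real GUI automation
    return 0


def hang_up() -> int:
    print("hanging up")  # placeholder for the real GUI automation
    return 0


def main(argv: Optional[List[str]] = None) -> int:
    """Parse the command line and dispatch to an automation action.

    The return value is meant to be used as the process exit code, so
    calling scripts can check whether the automation succeeded.
    """
    parser = argparse.ArgumentParser(description="Drive the GUI application")
    sub = parser.add_subparsers(dest="command", required=True)
    dial_cmd = sub.add_parser("dial", help="dial a number")
    dial_cmd.add_argument("number")
    sub.add_parser("hangup", help="end the current call")

    args = parser.parse_args(argv)
    if args.command == "dial":
        return dial(args.number)
    return hang_up()
```

From the script's entry point, `raise SystemExit(main())` turns the return value into the exit code, so a caller could run something like `python drive_app.py dial 5551234` (script name hypothetical) locally or over SSH.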

Resources: Some Tools for GUI Automation

The following are tools you can use to drive or automate GUI applications:

AutoIt - Free and very good GUI automation solution with its own spy tool, custom scripting language, and compiler (to compile scripts to executables). Also usable from C++, VB, and VBScript via its COM component (AutoItX).
Ranorex - GUI automation solution based on .NET. Free for personal use; commercial use requires a purchased license. The complete package comes with a spy tool and a script recorder/generator utility. Supports C++ (on .NET), C#, and Python, and probably works with VB.NET as well. Works very well; if only it were completely free. The spy tool is available standalone and is free for all use.
Perl Win32::GuiTest module - Free GUI automation solution for Perl. Works well, but is not as full-featured as AutoIt or Ranorex; for more demanding GUI automation it may or may not be enough, because some features are lacking or not well supported.
RanorexSpy - Free and very good GUI spy tool for automation use.
WinSpy++ - Another free GUI spy tool for automation use.
Paint.NET - Good free image editor I use to get relative mouse coordinates for automation.
There are quite a few more GUI automation solutions and tools, but I have tried the ones mentioned here and consider them some of the best.

History

Dec 30th, 2007 - Initial release

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)