Introduction
In this article, we examine methods for reading text displayed on screen by another process (also called screen-scraping). The methods presented here can be useful for programmatically determining the state of another program from the user's point of view. Though the attached example code is written for Windows Mobile 5 and newer, the concepts presented should transfer to desktop Windows without much difficulty.
Locating the Text to Read
The first thing we must do is determine what text we want to read. Microsoft introduces a useful concept in their Spy++ tool called the Finder which extends handily to our application.
The user can drag the target icon from the Finder to the object containing the text to be read.
In our application, we implement the Finder control by handling the WM_LBUTTONUP and WM_COMMAND messages. We call SetCapture() so that our application receives the WM_LBUTTONUP message even when the stylus is lifted outside the boundary of our application's dialog. Note that while the image above shows the target cursor over the item, that won't actually happen in Windows Mobile. The image is altered to make its meaning more obvious.
BEGIN_MSG_MAP( CMainDlg )
MESSAGE_HANDLER( WM_INITDIALOG, OnInitDialog )
COMMAND_ID_HANDLER( IDC_FINDER, OnFinder )
MESSAGE_HANDLER( WM_LBUTTONUP, OnLButtonUp )
END_MSG_MAP()
LRESULT OnInitDialog( UINT ,
WPARAM ,
LPARAM ,
BOOL& bHandled )
{
finder_image_.LoadBitmap( MAKEINTRESOURCE( IDB_FINDER ) );
finder_empty_image_.LoadBitmap( MAKEINTRESOURCE( IDB_FINDER_EMPTY ) );
return ( bHandled = FALSE );
}
LRESULT OnFinder( WORD ,
WORD ,
HWND ,
BOOL& )
{
SetCapture();
finder_.SetBitmap( finder_empty_image_ );
return 0;
}
LRESULT OnLButtonUp( UINT ,
WPARAM ,
LPARAM lParam,
BOOL& )
{
if( m_hWnd == GetCapture() )
{
ReleaseCapture();
finder_.SetBitmap( finder_image_ );
}
return 0;
}
CStatic finder_;
CBitmap finder_image_;
CBitmap finder_empty_image_;
Locating the Text's Window
WM_LBUTTONUP gives us the client coordinates of the point where the user releases the stylus or left mouse button. We will use the WindowFromPoint() function to determine what window lives at those coordinates.
LRESULT OnLButtonUp( UINT ,
WPARAM ,
LPARAM lParam,
BOOL& )
{
POINT screen_point = { GET_X_LPARAM( lParam ),
GET_Y_LPARAM( lParam ) };
// Convert from our dialog's client coordinates to screen coordinates.
ClientToScreen( &screen_point );
HWND target = ::WindowFromPoint( screen_point );
}
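Because the mouse is captured, the stylus can be lifted above or to the left of our dialog, so the client coordinates packed into lParam may be negative. The following is a minimal, portable sketch of what the unpacking and the client-to-screen step amount to. The names Point, SignedWord, UnpackClientPoint, and ToScreen are our own stand-ins for illustration and are not part of the Win32 API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the Win32 POINT structure.
struct Point { int x; int y; };

// Sign-extends a 16-bit word, as GET_X_LPARAM and GET_Y_LPARAM do.
inline int SignedWord(std::uint32_t word)
{
    assert(word <= 0xFFFFu);
    const int value = static_cast<int>(word);
    return value >= 0x8000 ? value - 0x10000 : value;
}

// The low word of the message parameter carries x, the high word carries y.
inline Point UnpackClientPoint(std::uint32_t lparam)
{
    Point p;
    p.x = SignedWord(lparam & 0xFFFFu);
    p.y = SignedWord((lparam >> 16) & 0xFFFFu);
    return p;
}

// Client-to-screen conversion adds the screen position of the
// window's client-area origin.
inline Point ToScreen(Point client, Point client_origin_on_screen)
{
    Point p;
    p.x = client.x + client_origin_on_screen.x;
    p.y = client.y + client_origin_on_screen.y;
    return p;
}
```

This is why the article's code uses GET_X_LPARAM() and GET_Y_LPARAM() rather than treating the two words as unsigned values.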
Unfortunately, WindowFromPoint() has a limitation. From its MSDN page:
The WindowFromPoint function does not retrieve a handle to a hidden or disabled window, even if the point is within the window. An application should use the ChildWindowFromPoint function for a nonrestrictive search.
To use ChildWindowFromPoint() as the documentation suggests, we must provide a parent window and client coordinates relative to that parent. So, our code must be changed to:
{
POINT screen_point = { GET_X_LPARAM( lParam ),
GET_Y_LPARAM( lParam ) };
ClientToScreen( &screen_point );
HWND parent = ::GetParent( ::WindowFromPoint( screen_point ) );
POINT client_point;
HWND target = GetChildMost( parent, screen_point, &client_point );
}
HWND GetChildMost( HWND parent_window,
const POINT& screen_point,
POINT* parent_point )
{
*parent_point = screen_point;
::ScreenToClient( parent_window, parent_point );
HWND child = ::ChildWindowFromPoint( parent_window, *parent_point );
if( NULL == child || child == parent_window )
return parent_window;
return GetChildMost( child, screen_point, parent_point );
}
Now, we will always locate the correct window regardless of its state.
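To make the descent easier to follow without a device at hand, here is a small, portable model of the same idea. It is only a sketch: Window, ChildAt, and DeepestAt are hypothetical names, the window tree is a plain data structure, and the search deliberately ignores the visible and enabled flags, which is the behaviour we want from the nonrestrictive search.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Point { int x; int y; };

// Hypothetical model of a window: a rectangle positioned in its parent's
// client coordinates, plus state flags and children. Not a Win32 type.
struct Window
{
    int left, top, width, height;
    bool visible;
    bool enabled;
    std::vector<Window> children;
};

// Plays the role of ChildWindowFromPoint(): returns the first direct child
// containing the point, regardless of its flags, or a null pointer.
inline const Window* ChildAt(const Window& parent, Point pt)
{
    for (std::size_t i = 0; i < parent.children.size(); ++i)
    {
        const Window& c = parent.children[i];
        if (pt.x >= c.left && pt.x < c.left + c.width &&
            pt.y >= c.top && pt.y < c.top + c.height)
            return &c;
    }
    return 0;
}

// Descends until no child contains the point. On return, *local holds the
// point expressed in the coordinates of the window that was found.
inline const Window* DeepestAt(const Window& root, Point pt, Point* local)
{
    assert(local != 0);
    const Window* current = &root;
    *local = pt;
    for (;;)
    {
        const Window* child = ChildAt(*current, *local);
        if (!child)
            return current;
        local->x -= child->left;
        local->y -= child->top;
        current = child;
    }
}
```

The model uses a loop where the article's GetChildMost() uses recursion; the two are equivalent here, and both hand back the point in the found window's own coordinates.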
Reading Text from the Static Control
We've located the control with the text we want to read, so we will now examine several methods of extracting that text and discuss the limitations of each method. We will start with the simplest control to read - the Static control or Label. Later in this article, we will examine more complex controls.
Naïve Method
The most obvious and easiest way to get the text of a given window is to call GetWindowTextLength() and GetWindowText(). We could implement that as below:
if( ( ( ::GetWindowLong( target, GWL_STYLE ) & SS_ICON ) == 0 ) )
{
DWORD text_length = ::GetWindowTextLength( target );
if( text_length > 0 )
{
std::vector< wchar_t > window_text_buffer( text_length + 1 );
wchar_t* window_text = &window_text_buffer.front();
if( ::GetWindowText( target, window_text, text_length + 1 ) )
{
}
}
}
Unfortunately, this method has a number of limitations. As Raymond Chen points out in The Old New Thing - "The secret life of GetWindowText", GetWindowText() is mainly used to get the title of a frame. It doesn't work if you're using it from another process to get the text of a control that does custom text management. For that, we need to use WM_GETTEXT and WM_GETTEXTLENGTH.
Naïve Method II
Changing from GetWindowText() to WM_GETTEXT isn't much work, really. Just replace GetWindowText() with a couple of SendMessage() calls, and voilà:
if( ( ( ::GetWindowLong( target, GWL_STYLE ) & SS_ICON ) == 0 ) )
{
DWORD text_length = ::SendMessage( target, WM_GETTEXTLENGTH, 0, 0 );
if( text_length > 0 )
{
std::vector< wchar_t > window_text_buffer( text_length + 1 );
wchar_t* window_text = &window_text_buffer.front();
if( ::SendMessage( target,
WM_GETTEXT,
text_length + 1,
reinterpret_cast< LPARAM >( window_text ) ) )
{
}
}
}
Too bad it doesn't work. If we run this code, we find that WM_GETTEXTLENGTH returns the length of the text string as expected. But, though WM_GETTEXT succeeds, it returns an empty string [1]. Why? Consider what we're doing: we're sending a message to another process and asking it to populate a buffer within our process with data. That's a big no-no. For this to work, we need a region of memory that both processes can access. Memory-mapped files to the rescue!
The Memory Mapped File Standard Allocator
The memory-mapped file gives us access to a piece of the virtual address space that can be used to share either a file or memory between processes. In our case, we aren't likely to need to share enough data to warrant using a file, so we will use "RAM-backed mapping" where the data resides entirely in RAM and is never paged out.
The memory-mapped file API consists of three functions that are relevant to our program:
- CreateFileMapping - Creates a memory-mapped file in the shared virtual address space
- MapViewOfFile - Gives us a pointer to a view of the file
- UnmapViewOfFile - Releases and invalidates our pointer to the file
It would be a terrible thing to have to go from the elegance of using a std::vector<> to having to put all that memory-mapped file code around every call to SendMessage(). Fortunately, the standard library has a little-used facility to deal with just this sort of situation. Standard library containers such as std::vector<> take at least two template parameters. The first (and by far the most commonly used) defines what will be stored in the container. The second parameter, however, defines how the container should allocate space for those objects. We will define an allocator that std::vector<> can use to allocate space in a memory-mapped file.
template< class T >
class MappedFileAllocator
{
public:
typedef T value_type;
typedef size_t size_type;
typedef ptrdiff_t difference_type;
typedef T* pointer;
typedef const T* const_pointer;
typedef T& reference;
typedef const T& const_reference;
pointer address( reference r ) const { return &r; };
const_pointer address( const_reference r ) const { return &r; };
void construct( pointer p, const_reference val ) { new( p ) T( val ); };
void destroy( pointer p ) { p; p->~T(); };
template< class U >
struct rebind { typedef MappedFileAllocator< U > other; };
MappedFileAllocator() throw() : mapped_file_( INVALID_HANDLE_VALUE )
{
};
template< class U >
explicit MappedFileAllocator( const MappedFileAllocator< U >& other ) throw()
: mapped_file_( INVALID_HANDLE_VALUE )
{
::DuplicateHandle( GetCurrentProcess(),
other.mapped_file_,
GetCurrentProcess(),
&this->mapped_file_,
0,
FALSE,
DUPLICATE_SAME_ACCESS );
};
pointer allocate( size_type n, const void* = 0 )
{
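// Both calls take sizes in bytes, so scale the element count by sizeof( T ).
const size_type bytes = n * sizeof( T );
mapped_file_ = ::CreateFileMapping( INVALID_HANDLE_VALUE,
NULL,
PAGE_READWRITE,
0,
bytes,
NULL );
return reinterpret_cast< T* >( ::MapViewOfFile( mapped_file_,
FILE_MAP_READ | FILE_MAP_WRITE,
0,
0,
bytes ) );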
};
void deallocate( pointer p, size_type n )
{
if( NULL != p )
{
::FlushViewOfFile( p, n * sizeof( T ) );
::UnmapViewOfFile( p );
}
if( INVALID_HANDLE_VALUE != mapped_file_ )
{
::CloseHandle( mapped_file_ );
mapped_file_ = INVALID_HANDLE_VALUE;
}
};
size_type max_size() const throw()
{
return std::numeric_limits< size_type >::max() / sizeof( T );
};
private:
void operator=( const MappedFileAllocator& );
HANDLE mapped_file_;
};
Now, we are able to define a memory buffer with all the advantages of std::vector<> that can be shared between processes.
typedef std::vector< byte, MappedFileAllocator< byte > > MappedBuffer;
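To see the allocator hook in isolation, the sketch below swaps the memory-mapped file for plain malloc() and counts what the container asks for. It assumes a C++11 or later compiler, where an allocator needs far fewer members than the C++03-style class above. CountingAllocator, AllocationStats, and CountedBuffer are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical counters that make the allocator's activity observable.
struct AllocationStats
{
    static std::size_t& live_blocks() { static std::size_t n = 0; return n; }
    static std::size_t& bytes_requested() { static std::size_t n = 0; return n; }
};

template <class T>
struct CountingAllocator
{
    typedef T value_type;

    CountingAllocator() {}
    template <class U>
    CountingAllocator(const CountingAllocator<U>&) {}

    T* allocate(std::size_t n)
    {
        // A mapped-file allocator would create and map a view here.
        // The container passes an element count, so convert to bytes.
        const std::size_t bytes = n * sizeof(T);
        void* p = std::malloc(bytes ? bytes : 1);
        if (!p)
            throw std::bad_alloc();
        ++AllocationStats::live_blocks();
        AllocationStats::bytes_requested() += bytes;
        return static_cast<T*>(p);
    }

    void deallocate(T* p, std::size_t)
    {
        // A mapped-file allocator would unmap the view and close the handle.
        assert(AllocationStats::live_blocks() > 0);
        std::free(p);
        --AllocationStats::live_blocks();
    }
};

template <class T, class U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) { return false; }

typedef std::vector<unsigned char, CountingAllocator<unsigned char> > CountedBuffer;
```

Constructing a CountedBuffer of 64 bytes routes the request through allocate(), and destroying it routes the release through deallocate(). MappedBuffer relies on exactly that path to create and tear down its mapping.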
Naïve Method III
Let's revisit our last method with this memory-mapped buffer and see how it works.
if( ( ( ::GetWindowLong( target, GWL_STYLE ) & SS_ICON ) == 0 ) )
{
DWORD text_length = ::SendMessage( target, WM_GETTEXTLENGTH, 0, 0 );
if( text_length > 0 )
{
MappedBuffer window_text_buffer( ( text_length + 1 ) * sizeof( wchar_t ) );
wchar_t* window_text = reinterpret_cast< wchar_t* >( &window_text_buffer.front() );
if( ::SendMessage( target,
WM_GETTEXT,
text_length + 1,
reinterpret_cast< LPARAM >( window_text ) ) )
{
}
}
}
Our code barely changed at all, but the result is exactly what we want... With one exception. SendMessage() waits for the target process to respond before returning. What if the target process is frozen? With our code the way it is, we could freeze our application waiting for the other process to get its act together. Fortunately, Microsoft has taken care of this eventuality with SendMessageTimeout().
The Final Method
Putting it all together, we end up with an algorithm that can safely retrieve text from any static, button, check-box, combo-box, or edit control.
DWORD timeout = 1000;
if( ( ( ::GetWindowLong( target, GWL_STYLE ) & SS_ICON ) == 0 ) )
{
DWORD text_length = 0;
if( ( ::SendMessageTimeout( target,
WM_GETTEXTLENGTH,
0,
0,
SMTO_NORMAL,
timeout,
&text_length ) ) &&
( text_length > 0 ) )
{
MappedBuffer window_text_buffer( ( text_length + 1 ) * sizeof( wchar_t ) );
wchar_t* window_text =
reinterpret_cast< wchar_t* >( &window_text_buffer.front() );
DWORD copied = 0;
if( ( ::SendMessageTimeout( target,
WM_GETTEXT,
text_length + 1,
reinterpret_cast< LPARAM >( window_text ),
SMTO_NORMAL,
timeout,
&copied ) > 0 ) &&
( copied > 0 ) )
{
}
}
}
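The innermost block above is left empty; it is where the retrieved text would be consumed. One way to do that, sketched below under our own assumptions, is to copy the characters out of the shared byte buffer into a std::wstring while never trusting the reported count beyond the buffer's capacity. TextFromBuffer is a hypothetical helper, and a plain std::vector<unsigned char> stands in for MappedBuffer so the sketch compiles anywhere.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Copies at most `copied` characters out of a byte buffer that a control
// has filled with wide-character text. `copied` is the character count
// reported by the control, excluding the terminating null.
inline std::wstring TextFromBuffer(const std::vector<unsigned char>& buffer,
                                   std::size_t copied)
{
    // The buffer is expected to hold a whole number of wide characters.
    assert(buffer.size() % sizeof(wchar_t) == 0);
    const std::size_t capacity = buffer.size() / sizeof(wchar_t);
    if (capacity == 0)
        return std::wstring();

    // Clamp the reported count to what the buffer can actually hold.
    const std::size_t count = copied < capacity ? copied : capacity;
    std::wstring text(count, L'\0');
    if (count > 0)
        std::memcpy(&text[0], &buffer[0], count * sizeof(wchar_t));

    // Stop at an embedded terminator, if there is one.
    const std::wstring::size_type end = text.find(L'\0');
    if (end != std::wstring::npos)
        text.resize(end);
    return text;
}
```

Copying the text out before the buffer goes out of scope matters, because the mapping is released when the vector is destroyed.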
Reading Text from the List View Control
Being able to get the text from statics, buttons, check-boxes, combo-boxes, and edit controls is great, but there are lots of other controls out there. Let's take a look at a more complex control, the List View, where WM_GETTEXT doesn't work. The List View is used in applications like File Explorer and Task Manager. There is a three-step process for retrieving its text:
- Verify there are items in the list view - LVM_GETITEMCOUNT
- Locate the item our cursor is over - LVM_SUBITEMHITTEST
- Get the text of that item - LVM_GETITEM
Since there's no sense in our program looking for text in an empty List View, let's first check to see if there are any items in the view.
bool CheckValiditiy( HWND target, DWORD timeout = INFINITE )
{
DWORD item_count = 0;
if( ::SendMessageTimeout( target,
LVM_GETITEMCOUNT,
0,
0,
SMTO_NORMAL,
timeout,
&item_count ) > 0 )
{
return item_count > 0;
}
return false;
};
You may have wondered why our GetChildMost() function needed to return the mouse point in client coordinates for the child window whose text we were scraping. After all, we didn't need it to get the static control text. But, more complex controls, like the List View, have multiple text elements. We will use the client coordinates to determine which text element we're looking at using a "hit test".
typedef struct {
int item;
int subitem;
} item_type;
bool LocateItem( HWND target,
const POINT& pt,
item_type* item,
DWORD timeout = INFINITE )
{
MappedBuffer hti_buffer( sizeof( LVHITTESTINFO ) );
LVHITTESTINFO* hti =
reinterpret_cast< LVHITTESTINFO* >( &hti_buffer.front() );
hti->pt = pt;
int res = 0;
if( ::SendMessageTimeout( target,
LVM_SUBITEMHITTEST,
0,
reinterpret_cast< LPARAM >( hti ),
SMTO_NORMAL,
timeout,
reinterpret_cast< DWORD* >( &res ) ) > 0 &&
res > -1 )
{
item->item = hti->iItem;
item->subitem = hti->iSubItem;
return true;
}
return false;
};
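For readers unfamiliar with hit testing, the sketch below models what the control does on our behalf: it maps a client-area point to a row (item) and a column (subitem). This is a simplified illustration under our own assumptions (fixed row height, a header strip, no scrolling), not a description of how the List View is actually implemented. ListModel, Cell, and HitTest are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Point { int x; int y; };

// Hypothetical result type, shaped like the article's item_type.
struct Cell { int item; int subitem; };

// Hypothetical model of a report-style list: a header strip on top,
// fixed-height rows, and one width per column. Not a Win32 type.
struct ListModel
{
    int header_height;
    int row_height;
    int row_count;
    std::vector<int> column_widths;
};

// Returns true and fills *cell when pt (in client coordinates) lies on a
// populated row and inside one of the columns.
inline bool HitTest(const ListModel& list, Point pt, Cell* cell)
{
    assert(cell != 0 && list.row_height > 0);
    if (pt.x < 0 || pt.y < list.header_height)
        return false;

    const int row = (pt.y - list.header_height) / list.row_height;
    if (row >= list.row_count)
        return false;

    int right_edge = 0;
    for (std::size_t col = 0; col < list.column_widths.size(); ++col)
    {
        right_edge += list.column_widths[col];
        if (pt.x < right_edge)
        {
            cell->item = row;
            cell->subitem = static_cast<int>(col);
            return true;
        }
    }
    return false;
}
```

The real control also accounts for scrolling, icons, and view modes, which is one reason to ask it with LVM_SUBITEMHITTEST rather than compute the answer ourselves.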
Now that we know which item and sub item our coordinates point to, we send the List View an LVM_GETITEM message to receive the text for the item under the cursor.
bool GetText( HWND target,
const item_type& item,
DWORD length,
std::wstring* text,
DWORD timeout = INFINITE )
{
MappedBuffer lvi_buffer(
sizeof( LV_ITEM ) + sizeof( wchar_t ) * length );
LV_ITEM* lvi =
reinterpret_cast< LV_ITEM* >( &lvi_buffer.front() );
lvi->mask = LVIF_TEXT;
lvi->iItem = item.item;
lvi->iSubItem = item.subitem;
lvi->cchTextMax = length;
lvi->pszText = reinterpret_cast< wchar_t* >(
&lvi_buffer.front() + sizeof( LV_ITEM ) );
BOOL success = FALSE;
if( ::SendMessageTimeout( target,
LVM_GETITEM,
0,
reinterpret_cast< LPARAM >( lvi ),
SMTO_NORMAL,
timeout,
reinterpret_cast< DWORD* >( &success ) ) > 0 &&
success )
{
*text = lvi->pszText;
return true;
}
return false;
};
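GetText() packs the request structure and the text storage into one shared buffer, with pszText pointing just past the structure. The portable sketch below shows that layout on its own. ItemRequest, PrepareRequest, and AnswerRequest are hypothetical stand-ins; in particular, AnswerRequest only imitates a control filling in the text and is not how any real control is written.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>
#include <string>
#include <vector>

// Hypothetical stand-in for LV_ITEM / TCITEM: a header whose text pointer
// refers to storage that follows the header in the same buffer.
struct ItemRequest
{
    int item;
    int subitem;
    wchar_t* text;
    int text_capacity;   // in characters, including the terminator
};

// Lays out the header and the text storage in one contiguous block.
inline ItemRequest* PrepareRequest(std::vector<unsigned char>* buffer,
                                   int item, int subitem, int text_capacity)
{
    assert(buffer != 0 && text_capacity > 0);
    buffer->assign(sizeof(ItemRequest) + sizeof(wchar_t) * text_capacity, 0);
    ItemRequest* request = new (&(*buffer)[0]) ItemRequest();
    request->item = item;
    request->subitem = subitem;
    request->text =
        reinterpret_cast<wchar_t*>(&(*buffer)[0] + sizeof(ItemRequest));
    request->text_capacity = text_capacity;
    return request;
}

// Hypothetical "control side": writes the answer where the request points,
// refusing values that would not fit together with their terminator.
inline bool AnswerRequest(ItemRequest* request, const std::wstring& value)
{
    if (value.size() + 1 > static_cast<std::size_t>(request->text_capacity))
        return false;
    std::memcpy(request->text, value.c_str(),
                (value.size() + 1) * sizeof(wchar_t));
    return true;
}
```

Note that the embedded pointer is only meaningful if the receiver can dereference that same address, which is what the shared buffer is for. The capacity must also leave room for the terminating null.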
Reading Text from the Tab Control
As with the List View control, there is a three-step process for scraping the text from a Tab control:
- Verify there are items in the tab control - TCM_GETITEMCOUNT
- Locate the tab our cursor is over - TCM_HITTEST
- Get the text of that tab - TCM_GETITEM
As before, we first check to see if there are any tabs in the control.
bool CheckValiditiy( HWND target, DWORD timeout = INFINITE )
{
DWORD item_count = 0;
if( ::SendMessageTimeout( target,
TCM_GETITEMCOUNT,
0,
0,
SMTO_NORMAL,
timeout,
&item_count ) > 0 )
{
return item_count > 0;
}
return false;
};
Then, we determine which tab our pointer is over.
BOOL LocateItem( HWND target,
const POINT& pt,
item_type* item,
DWORD timeout = INFINITE )
{
MappedBuffer tch_buffer( sizeof( TCHITTESTINFO ) );
TCHITTESTINFO* tch =
reinterpret_cast< TCHITTESTINFO* >( &tch_buffer.front() );
tch->pt = pt;
// For the Tab control item_type is an int; start at "no tab".
item_type it = -1;
if( ::SendMessageTimeout( target,
TCM_HITTEST,
0,
reinterpret_cast< LPARAM >( tch ),
SMTO_NORMAL,
timeout,
reinterpret_cast< DWORD* >( &it ) ) > 0 )
{
if( it > -1 )
{
*item = it;
return true;
}
}
return false;
};
Lastly, we query the tab control for the text of that tab.
bool GetText( HWND target,
const item_type& item,
DWORD length,
std::wstring* text,
DWORD timeout = INFINITE )
{
MappedBuffer tc_buffer( sizeof( TCITEM ) + sizeof( wchar_t ) * length );
TCITEM* tc = reinterpret_cast< TCITEM* >( &tc_buffer.front() );
tc->cchTextMax = length;
tc->mask = TCIF_TEXT;
tc->pszText = reinterpret_cast< wchar_t* >(
&tc_buffer.front() + sizeof( TCITEM ) );
BOOL success = FALSE;
if( ::SendMessageTimeout( target,
TCM_GETITEM,
item,
reinterpret_cast< LPARAM >( tc ),
SMTO_NORMAL,
timeout,
reinterpret_cast< DWORD* >( &success ) ) > 0 )
{
if( success )
{
*text = tc->pszText;
return true;
}
}
return false;
}
Bringing it all Together
By now, a pattern has clearly emerged. We can get the screen text for any control type by following a fairly general procedure:
- Check the validity of the control.
- Locate the text item within the control.
- Get the length of the text.
- Get the text.
We can generalize each of these procedural elements into a 'traits' structure:
struct TabTraits
{
typedef int item_type;
static const wchar_t* ClassName() { return WC_TABCONTROL; };
static bool CheckValiditiy( HWND target, DWORD timeout = INFINITE );
static BOOL LocateItem( HWND target,
const POINT& pt,
item_type* item,
DWORD timeout = INFINITE );
static DWORD GetTextLength( HWND target,
const item_type& item,
DWORD timeout = INFINITE );
static bool GetText( HWND target,
const item_type& item,
DWORD length,
std::wstring* text,
DWORD timeout = INFINITE );
};
We supply the 'traits' structure as a template parameter to a generalized algorithm that performs each step.
template< class T >
bool DoReadScreenText( HWND target,
const POINT& client_point,
std::wstring* screen_text,
DWORD timeout = INFINITE )
{
if( T::CheckValiditiy( target, timeout ) )
{
typename T::item_type item;
if( T::LocateItem( target, client_point, &item, timeout ) )
{
DWORD length = T::GetTextLength( target, item, timeout );
if( length > 0 )
{
return T::GetText( target, item, length, screen_text, timeout );
}
}
}
return false;
}
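The traits pattern can be exercised without any windows at all. The sketch below drives the same four steps against an in-memory fake, which is handy for testing the algorithm in isolation. FakeTabs, FakeTabTraits, and ReadText are hypothetical names; the extra target_type typedef is our own addition so that the fake can stand in for an HWND.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Point { int x; int y; };

// Hypothetical in-memory "control" used in place of an HWND.
struct FakeTabs
{
    std::vector<std::wstring> labels;
    int tab_width;
};

struct FakeTabTraits
{
    typedef int item_type;
    typedef FakeTabs target_type;

    static bool CheckValidity(const FakeTabs& t)
    {
        return !t.labels.empty() && t.tab_width > 0;
    }
    static bool LocateItem(const FakeTabs& t, const Point& pt, item_type* item)
    {
        if (pt.x < 0)
            return false;
        const int index = pt.x / t.tab_width;
        if (index >= static_cast<int>(t.labels.size()))
            return false;
        *item = index;
        return true;
    }
    static std::size_t GetTextLength(const FakeTabs& t, const item_type& item)
    {
        return t.labels[item].size();
    }
    static bool GetText(const FakeTabs& t, const item_type& item,
                        std::size_t, std::wstring* text)
    {
        *text = t.labels[item];
        return true;
    }
};

// Same four steps as the article's algorithm: validate, locate, measure, read.
template <class Traits>
bool ReadText(const typename Traits::target_type& target,
              const Point& pt,
              std::wstring* text)
{
    assert(text != 0);
    if (!Traits::CheckValidity(target))
        return false;
    typename Traits::item_type item;
    if (!Traits::LocateItem(target, pt, &item))
        return false;
    const std::size_t length = Traits::GetTextLength(target, item);
    if (length == 0)
        return false;
    return Traits::GetText(target, item, length, text);
}
```

Standard C++ requires the typename keyword before a dependent type such as Traits::item_type, even though some compilers accept the declaration without it.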
Using GetClassName(), we can determine the type of control we're reading. This allows us to write a dispatch function that reads the text from any supported on-screen control.
bool ReadScreenText( HWND target,
const POINT& client_point,
std::wstring* screen_text,
DWORD timeout )
{
wchar_t class_name[ 257 ] = { 0 };
::GetClassName( target, class_name, _countof( class_name ) );
if( wcsstr( class_name, TabTraits::ClassName() ) )
{
return DoReadScreenText< TabTraits >( target,
client_point,
screen_text,
timeout );
}
else if( wcsstr( class_name, ... ) )
{
}
else if ...
}
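As the list of supported controls grows, the chain of else-if branches can be replaced by a small lookup table. The sketch below shows one possible shape for that, assuming the two common-control class names "SysTabControl32" and "SysListView32". ControlKind, ClassEntry, and ClassifyControl are hypothetical names, and entries for further control types would need their own class names added.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>

enum ControlKind { kUnknown, kTab, kListView };

struct ClassEntry
{
    const wchar_t* name;
    ControlKind kind;
};

// Maps a window class name to a control kind by substring match, the same
// test the article performs with wcsstr(). The match is case-sensitive.
inline ControlKind ClassifyControl(const wchar_t* class_name)
{
    assert(class_name != 0);
    static const ClassEntry table[] = {
        { L"SysTabControl32", kTab },
        { L"SysListView32", kListView },
    };
    for (std::size_t i = 0; i < sizeof(table) / sizeof(table[0]); ++i)
    {
        if (std::wcsstr(class_name, table[i].name) != 0)
            return table[i].kind;
    }
    // Unsupported controls fall through to an explicit "unknown" result.
    return kUnknown;
}
```

A caller would then switch on the returned kind and invoke the matching DoReadScreenText instantiation, returning false for kUnknown.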
The attached code has methods for reading from Static, Tab, List View, and List Box controls. Reading from other control types such as Headers, Menus, Tree Views, Today-Screen plugins, or other custom controls is left as an exercise to the interested reader.
Footnotes
- This isn't strictly true. WM_GETTEXT won't always return an empty string. There are three window messages that are treated specially: WM_GETTEXT, WM_SETTEXT, and WM_COPYDATA. The result of sending these messages with process-local memory buffers seems to vary depending on what version of Windows is being used and how that control handles the message. For this to work in the general case, we provide it with a memory-mapped file. It won't hurt in cases where it's not necessary.