Making Your Browser Talk

21 Apr 2008 · CPOL
Use the Speech SDK to make Internet Explorer read documents or portions of documents to the user.

Requirements

  • Microsoft Speech SDK 5.1

I used two MSDN articles to create this add-on: http://msdn2.microsoft.com/en-us/library/bb250489.aspx (Building a BHO with VS2005) and
http://msdn2.microsoft.com/en-us/library/bb735853(VS.85).aspx#IEAddOnsMenus_topic2 (Creating Menus for IE). I will skip how to create a Browser Helper Object (BHO) and add it to the Internet Explorer Tools menu, and get straight to the interesting part. Most of the code in speakThread that actually speaks comes from the MSDN documentation on the Speech API.

The Speech Thread

C++
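#include <sapi.h>   // Speech API: ISpVoice, CLSID_SpVoice
...
// Thread entry point. Takes ownership of the BSTR passed in and frees it.
DWORD WINAPI speakThread(LPVOID param)
{
    BSTR text = static_cast<BSTR>(param);

    // Every thread that uses COM has to initialize it for itself.
    if (FAILED(::CoInitialize(NULL)))
    {
        ::SysFreeString(text);
        return (DWORD)-1;
    }

    ISpVoice *pVoice = NULL;
    HRESULT hr = ::CoCreateInstance(CLSID_SpVoice, NULL, CLSCTX_ALL, IID_ISpVoice,
                                    (void **)&pVoice);
    if (SUCCEEDED(hr))
    {
        // With flags set to 0 (SPF_DEFAULT), Speak is synchronous.
        hr = pVoice->Speak(text, 0, NULL);
        // Only strictly needed with SPF_ASYNC; harmless here.
        pVoice->WaitUntilDone(INFINITE);
        pVoice->Release();
        pVoice = NULL;
    }

    ::SysFreeString(text);   // safe even if text is NULL
    ::CoUninitialize();
    return hr;
}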

Here we pass the thread a BSTR disguised as an LPVOID. The thread initializes COM for itself and creates an ISpVoice object. It then calls Speak on that object with the string it received and waits until speech has finished before cleaning up.
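Because the text reaches the thread through a single untyped parameter, it helps to be explicit about who frees it. The following is a minimal, portable sketch of that handoff, not the add-on's code: `std::wstring` stands in for the BSTR, `spokenLog` stands in for the voice, and `speakThreadModel` and `startSpeaking` are hypothetical names. The entry point is called directly here, where the add-on would pass it to CreateThread.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for the audio output: records text instead of speaking it.
std::vector<std::wstring>& spokenLog()
{
    static std::vector<std::wstring> log;
    return log;
}

// Entry point in the CreateThread style: one untyped parameter.
// It adopts the heap string, so the memory is released on every return path.
unsigned long speakThreadModel(void* param)
{
    std::unique_ptr<std::wstring> text(static_cast<std::wstring*>(param));
    if (!text)
        return 1;                      // nothing was handed over
    spokenLog().push_back(*text);      // the real thread would call Speak here
    return 0;
}

// Caller side: allocate the string, then give up ownership at the handoff.
unsigned long startSpeaking(const std::wstring& text)
{
    std::unique_ptr<std::wstring> owned(new std::wstring(text));
    // release() hands the raw pointer over; the caller must not touch it again.
    return speakThreadModel(owned.release());
}
```

The point of the sketch is the contract: after the handoff only the receiving side frees the string, which is the same rule a BSTR passed to a worker thread has to follow.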

C++
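STDMETHODIMP CCIESpeechBHO::Exec(const GUID *pguidCmdGroup,
  DWORD nCmdID, DWORD nCmdExecOpt,
  VARIANTARG *pvaIn, VARIANTARG *pvaOut) {

    CComPtr<IDispatch> spDoc;
    HRESULT hr = m_pBrowser->get_Document(&spDoc);
    if (FAILED(hr) || !spDoc)
        return hr;

    CComQIPtr<IHTMLDocument2> spHTMLDoc = spDoc;
    if (!spHTMLDoc)
        return hr;   // not an HTML document, nothing to read

    // Find out what kind of selection, if any, the user has made.
    CComPtr<IHTMLSelectionObject> spSelection;
    CComBSTR type;
    if (SUCCEEDED(spHTMLDoc->get_selection(&spSelection)) && spSelection)
        spSelection->get_type(&type);

    CComBSTR text;
    if (type.Length() > 0 && _wcsicmp(type, L"Text") == 0)
    {
        // Text is selected: read only the selected range.
        CComPtr<IDispatch> spRangeDisp;
        hr = spSelection->createRange(&spRangeDisp);
        CComQIPtr<IHTMLTxtRange> spRange = spRangeDisp;
        if (spRange)
            hr = spRange->get_text(&text);
    }
    else
    {
        // No text selection: read the whole body.
        CComPtr<IHTMLElement> spBody;
        hr = spHTMLDoc->get_body(&spBody);
        if (SUCCEEDED(hr) && spBody)
            hr = spBody->get_innerText(&text);
    }

    if (SUCCEEDED(hr) && text.Length() > 0)
    {
        // Speak on a worker thread so the browser UI stays responsive.
        // speakThread is responsible for the BSTR from here on.
        BSTR toSpeak = text.Detach();
        DWORD dwThreadID = 0;
        HANDLE hThread = ::CreateThread(NULL, 0, speakThread, toSpeak, 0, &dwThreadID);
        if (hThread)
            ::CloseHandle(hThread);     // the thread keeps running without our handle
        else
            ::SysFreeString(toSpeak);   // thread never started, so we still own the string
    }

    return hr;
}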

The browser calls Exec when the menu item is selected. First we ask the browser for the current document and query it for IHTMLDocument2. Next we get the document's IHTMLSelectionObject and check whether any text is selected. If it is, we create a text range and read its text; if not, we get the body element and read its innerText. Finally we create a thread and pass it the text we obtained from the selection or the body.
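The choice between selection and body can be isolated from the COM plumbing. This is a minimal sketch under stated assumptions, not the add-on's code: `chooseTextToSpeak` and `equalsIgnoreCase` are hypothetical helpers, and plain `std::wstring` values stand in for the strings returned by get_type, get_text and get_innerText. The comparison is case-insensitive and leaves its inputs unmodified.

```cpp
#include <cstddef>
#include <cwctype>
#include <string>

// Case-insensitive comparison that does not modify either string.
bool equalsIgnoreCase(const std::wstring& a, const std::wstring& b)
{
    if (a.size() != b.size())
        return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (std::towlower(a[i]) != std::towlower(b[i]))
            return false;
    }
    return true;
}

// selectionType: the type string reported by the selection object.
// Returns the selected text for a text selection, otherwise the body text.
std::wstring chooseTextToSpeak(const std::wstring& selectionType,
                               const std::wstring& selectedText,
                               const std::wstring& bodyText)
{
    if (equalsIgnoreCase(selectionType, L"Text"))
        return selectedText;
    return bodyText;
}
```

For example, `chooseTextToSpeak(L"None", L"", bodyText)` falls back to the body text, while any casing of "Text" selects the highlighted range.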

Notes

  • The text must be spoken on a separate thread; otherwise Internet Explorer locks up until the text has been read.
  • If the page contains hidden text, that text is read as well, such as the links on MSDN's website.
  • Because speech runs on its own thread, you can view other pages without waiting for the text to finish. However, this add-on does not implement a way to stop speech, so the user has to wait until the current text is finished before a new selection can be made and read.
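One way a stop feature could be approached is to speak the text in small chunks and check a flag between them. The sketch below is an assumption-laden illustration, not part of the add-on: `splitIntoSentences` and `speakUntilStopped` are hypothetical names, the sentence rule is deliberately naive, and the `speak` callback stands in for a blocking call such as ISpVoice::Speak. SAPI's own facilities for interrupting speech are not shown here.

```cpp
#include <atomic>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Split text into chunks ending at '.', '!' or '?' (a deliberately naive rule).
std::vector<std::wstring> splitIntoSentences(const std::wstring& text)
{
    std::vector<std::wstring> out;
    std::wstring current;
    for (wchar_t ch : text) {
        current += ch;
        if (ch == L'.' || ch == L'!' || ch == L'?') {
            out.push_back(current);
            current.clear();
        }
    }
    if (!current.empty())
        out.push_back(current);   // trailing text without a terminator
    return out;
}

// Speak chunk by chunk, checking the stop flag before each chunk.
// Returns the number of chunks that were actually spoken.
std::size_t speakUntilStopped(const std::wstring& text,
                              const std::atomic<bool>& stopRequested,
                              const std::function<void(const std::wstring&)>& speak)
{
    std::size_t spoken = 0;
    for (const std::wstring& sentence : splitIntoSentences(text)) {
        if (stopRequested.load())
            break;                // the user asked to stop
        speak(sentence);
        ++spoken;
    }
    return spoken;
}
```

In this design a stop request takes effect only at the next chunk boundary, so the current sentence is always finished first.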

Things I Would Like To See

  • An Adobe Acrobat version
  • A Microsoft Word version
  • A way to skip and replay paragraphs

History

  • 21st April, 2008: Initial post

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)