Introduction
Sometimes your application needs to retrieve information from the web, or
submit information to it, but you don't want to write a full library for the
job. You would rather focus on your specific need and work directly with the
page once it is available as HTML or as an mshtml DOM.
KUMO is made for this. From a web macro you can call your own objects, defined
as plug-ins, and you can export the macro as a DLL or an .EXE.
Background
In 1998, Compaq introduced WebL, a language for automating actions on the web
(http://research.compaq.com/SRC/WebL/). The project has since stopped, and a
few commercial products and Java frameworks now provide web automation
features. Unfortunately, nothing serious has appeared for .NET.
Using the code
The code is based on the KUMO web macro methodology: a web macro is written in
C# extended with ## instructions. A line prefixed with ## means that the macro
must wait until the browser has finished its other work before it moves on.
Another property of ## instructions is that the return type does not need to
be declared.
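To picture the "wait until the browser is done" behaviour, here is a minimal sketch of a polling wait with a timeout. It is not KUMO's implementation, which this article does not show: the class BrowserWait, the method Until and the 50 ms polling interval are all hypothetical names and values chosen for illustration.

```csharp
using System;
using System.Threading;

// Hypothetical helper for illustration only; not part of KUMO.
public static class BrowserWait
{
    // Polls isReady until it returns true or the timeout elapses.
    // Returns true if the condition was met in time, false otherwise.
    public static bool Until(Func<bool> isReady, TimeSpan timeout)
    {
        DateTime deadline = DateTime.UtcNow + timeout;
        while (!isReady())
        {
            if (DateTime.UtcNow >= deadline)
            {
                return false;
            }
            Thread.Sleep(50); // assumed polling interval
        }
        return true;
    }
}
```

A caller would pass a condition such as "the page has finished loading" as the isReady delegate. A browser control hosted on a UI thread would also need to keep processing window messages while waiting, which this sketch leaves out.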
The web macro uses three objects: SPBrowser, SPBrowserObject, and
SPBrowserCollection. SPBrowser represents the current browser, SPBrowserObject
is a wrapper around an mshtml.IHTMLElement object, and SPBrowserCollection is
an array of SPBrowserObject.
By writing your own .NET DLL that implements the KUMOFrwk.Plugin.IPlugin
interface and placing it in the /Plugins directory under the KUMO installation
folder, you can add your own custom methods to the three objects SPBrowser,
SPBrowserObject, and SPBrowserCollection. The methods then appear in the KUMO
editor, whose AutoComplete feature recognizes plugins.
As a simple example, the following macro calls the getEmails() function of the
ContactPlugin plugin, which I describe later:
## browser.goToURL("http://www.google.com/jobs/eng.html");
## emails = browser.getEmails();
if (emails.Length > 0)
{
    MessageBox.Show(emails[0]);
}
The plugin source code is available in the source code download. The important
part is the doFunction function, which KUMO launches. The version defined here
searches all objects of the current web page for those whose text looks like
an email address. Of course, there are several ways to optimize this function
to get faster results, but that is not the point of this article.
public object doFunction(params object[] allparameters)
{
    // Matches text that consists entirely of one email address.
    string strRegex = @"^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}"
        + @"\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\"
        + @".)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$";
    Regex emailReg = new Regex(strRegex);

    // doc and localBrowser are assumed to be members of the plugin class.
    doc = (mshtml.HTMLDocument)localBrowser.Document;
    mshtml.IHTMLElementCollection allTags = doc.all;
    System.Collections.Queue aQueue = new System.Collections.Queue();

    // Collect the text of every element that looks like an email address.
    foreach (mshtml.IHTMLElement anObj in allTags)
    {
        string text = anObj.innerText;
        if (text != null && text != "" && emailReg.IsMatch(text))
        {
            aQueue.Enqueue(text);
        }
    }

    // Use the array length as the loop bound: aQueue.Count shrinks on
    // every Dequeue, so testing against it would skip part of the queue.
    string[] allEmails = new string[aQueue.Count];
    for (int i = 0; i < allEmails.Length; i++)
    {
        allEmails[i] = (string)aQueue.Dequeue();
    }
    return allEmails;
}
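One possible optimization, sketched here on plain strings rather than on the mshtml DOM, is to drop the ^ and $ anchors and let Regex.Matches pull every address out of a block of text in a single pass. The class EmailScan, the method FindEmails and the simplified pattern are assumptions made for this sketch; they are not part of the ContactPlugin source.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not part of KUMO or ContactPlugin.
public static class EmailScan
{
    // Simplified, unanchored pattern (an assumption for this sketch):
    // it finds addresses embedded anywhere in a longer text.
    private static readonly Regex EmailPattern = new Regex(
        @"[a-zA-Z0-9_\-\.]+@([a-zA-Z0-9\-]+\.)+[a-zA-Z]{2,4}");

    // Returns every distinct address found in the text,
    // in order of first appearance.
    public static string[] FindEmails(string text)
    {
        List<string> found = new List<string>();
        if (string.IsNullOrEmpty(text))
        {
            return found.ToArray();
        }
        foreach (Match m in EmailPattern.Matches(text))
        {
            if (!found.Contains(m.Value))
            {
                found.Add(m.Value);
            }
        }
        return found.ToArray();
    }
}
```

In a plugin, the input could be the innerText of the document body, so that the page is scanned once instead of once per element. Like the original pattern, this one only accepts top-level domains of two to four letters.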
Points of Interest
KUMO can be downloaded from www.softmorning.net.