Introduction
Regular expression engine.
Background
This application searches google.com for keywords and reports the position at which a domain appears in the results. It also supports searching for multiple keywords and comparing your domain against a list of competitor domains.
The project is built from three parts:
- core: The logic layer. It contains the functions that process the data coming from the GUI layer. It does not depend on a specific GUI, so you can reuse it in any application type (web application, WPF).
- keys match: The GUI layer. All the checks and validations are made there.
- keysmatch_setup: The setup project for the application.
What this article covers:
- Using the WebClient object
- Using regular expressions
- Saving files and settings
- A log viewer
- A setup project
Transfer Objects
The transfer objects are passed between the layers and give the whole project a common set of parameters:
public class client_obj
{
    string domain;
    public string Domain { get { return domain; } set { domain = value; } }
    ArrayList keyword = new ArrayList();
    public ArrayList KeyWord { get { return keyword; } set { keyword = value; } }
    ArrayList competiters = new ArrayList();
    public ArrayList Competiters { get { return competiters; } set { competiters = value; } }
}
public class keyword_obj
{
    string keyword;
    public string KeyWord { get { return keyword; } set { keyword = value; } }
    int position;
    public int Position { get { return position; } set { position = value; } }
    int page;
    public int Page { get { return page; } set { page = value; } }
}
public class competitors
{
    string domain;
    public string Domain { get { return domain; } set { domain = value; } }
    ArrayList keyword = new ArrayList();
    public ArrayList KeyWord { get { return keyword; } set { keyword = value; } }
}
competitors: Represents a competitor's values
keyword_obj: Represents a keyword's values
client_obj: Contains the client's search values
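To make the role of the transfer objects more concrete, here is a minimal sketch of how a GUI layer might fill one before handing it to the core layer. The names KeywordResult, ClientSearch and Build are hypothetical and are not part of the project. The sketch also assumes generic lists instead of ArrayList and stores competitors as plain domain names, which simplifies the original classes.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical typed counterpart of keyword_obj.
public class KeywordResult
{
    public string KeyWord { get; set; }
    public int Position { get; set; }
    public int Page { get; set; }
}

// Hypothetical typed counterpart of client_obj.
public class ClientSearch
{
    public string Domain { get; set; }
    public List<KeywordResult> KeyWords { get; } = new List<KeywordResult>();
    // Assumption: competitors reduced to their domain names.
    public List<string> Competitors { get; } = new List<string>();
}

public static class TransferObjectsDemo
{
    // Builds the object the GUI layer would pass to the core layer.
    public static ClientSearch Build(string domain, IEnumerable<string> keywords)
    {
        var client = new ClientSearch { Domain = domain };
        foreach (string keyword in keywords)
        {
            // Position and Page stay 0 until a search fills them in.
            client.KeyWords.Add(new KeywordResult { KeyWord = keyword });
        }
        return client;
    }

    public static void Main()
    {
        ClientSearch client = Build("http://example.com", new[] { "regex", "webclient" });
        client.Competitors.Add("http://example.org");
        Console.WriteLine(client.Domain + ": " + client.KeyWords.Count + " keywords");
    }
}
```

Typed lists remove the casts that ArrayList requires, such as the (keyword_obj) cast used later in the GUI layer.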
Using the HtmlParse Function
The program is based on a simple regular expression function. It takes the HTML page downloaded by the WebClient object and returns an ArrayList containing the result links of your search.
private ArrayList HtmlParse(string html)
{
    Match m;
    ArrayList LinkArray = new ArrayList();
    // Group 1 captures the href value, either quoted or unquoted.
    string HRefPattern = "<h3 class=r><a href\\s*=\\s*(?:\"(?<1>[^\"]*)\"|(?<1>\\S+))";
    m = Regex.Match(html, HRefPattern, RegexOptions.IgnoreCase);
    // Walk through every match in the page and collect the captured links.
    while (m.Success)
    {
        string TempLink = m.Groups[1].Value;
        LinkArray.Add(TempLink);
        m = m.NextMatch();
    }
    return LinkArray;
}
The HRefPattern variable contains the regular expression (what to search for in the string). As you can see, it looks for the link address inside an H3 tag with class r. In Google's result page, this represents the address of a result.
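To see what the pattern captures, the following sketch runs it against a small hand-written HTML string. The class HrefPatternDemo, the method ExtractLinks and the sample markup are hypothetical and only serve the illustration. Real result pages may use different markup, in which case the pattern finds nothing.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HrefPatternDemo
{
    // Same pattern as in HtmlParse, written as a verbatim string.
    const string HRefPattern =
        @"<h3 class=r><a href\s*=\s*(?:""(?<1>[^""]*)""|(?<1>\S+))";

    public static List<string> ExtractLinks(string html)
    {
        var links = new List<string>();
        foreach (Match m in Regex.Matches(html, HRefPattern, RegexOptions.IgnoreCase))
        {
            // Group 1 holds the href value from whichever alternative matched.
            links.Add(m.Groups[1].Value);
        }
        return links;
    }

    public static void Main()
    {
        // First link is quoted; the second is unquoted and ends at the next space.
        string sample =
            "<h3 class=r><a href=\"http://example.com/a\">A</a></h3>" +
            "<h3 class=r><a href=http://example.org/b class=l>B</a></h3>";

        foreach (string link in ExtractLinks(sample))
        {
            Console.WriteLine(link);
        }
        // Prints:
        // http://example.com/a
        // http://example.org/b
    }
}
```

Note that the unquoted alternative (\S+) captures everything up to the next whitespace character, so an unquoted href that is directly followed by > would also capture the rest of the tag.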
Web Client Function
This function opens a connection to the site and returns a string containing all the HTML generated for the request.
private string RuunerGoogleHtml(string key, string RunStep)
{
    // The using block disposes the WebClient even if the download throws.
    using (WebClient Client = new WebClient())
    {
        Client.Encoding = Encoding.GetEncoding("windows-1255");
        // Escape the keyword so that characters such as & or # do not break the query string.
        string fixUrl = "http://www.google.com/search?q=" +
            Uri.EscapeDataString(key) + "&hl=iw&as_qdr=all&num=10&start=" + RunStep + "&sa=N";
        return Client.Encoding.GetString(Client.DownloadData(fixUrl));
    }
}
fixUrl contains the Google search parameters. You can see them in your browser's address bar when you run a search.
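As an illustration of how the query string is put together, here is a minimal sketch that builds the same URL for a given keyword and result offset. The class SearchUrlDemo and the method BuildSearchUrl are hypothetical. It assumes that start is the zero-based offset of the first result and that num is the number of results per page; how the project derives RunStep from the page number is not shown in the article.

```csharp
using System;

public static class SearchUrlDemo
{
    // Builds the search URL with the same parameters as fixUrl.
    public static string BuildSearchUrl(string keyword, int start)
    {
        // EscapeDataString percent-encodes spaces and reserved characters.
        return "http://www.google.com/search?q=" + Uri.EscapeDataString(keyword)
            + "&hl=iw&as_qdr=all&num=10&start=" + start + "&sa=N";
    }

    public static void Main()
    {
        Console.WriteLine(BuildSearchUrl("c# regex", 10));
        // Prints:
        // http://www.google.com/search?q=c%23%20regex&hl=iw&as_qdr=all&num=10&start=10&sa=N
    }
}
```

Without the escaping, the # in a keyword such as "c# regex" would start a URL fragment and everything after it would be dropped from the query.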
Log Engine
The application creates a log file for every search. You can save the output file as history, and you can also save your search settings.
For more information on these functions, see the class in file_manager.cs.
GUI Layer
This layer reads the user's input from the controls and makes the request by sending the request parameters to the core layer:
private void btn_start_Click(object sender, EventArgs e)
{
    if (chk_google.Checked)
    {
        if (txb_client_domain.Text != "http://")
        {
            if (txb_keyword.Text != "")
            {
                lbl_Status.Text = "";
                myClient = null;
                MyKeyWords = null;
                myClient = new client_obj();
                MyKeyWords = new ArrayList();
                SaveInput();
                SetGoogle_progress_bar();
                StopStatus = false;
                googleWorker = new BackgroundWorker();
                googleWorker.WorkerSupportsCancellation = true;
                googleWorker.DoWork += new DoWorkEventHandler(googleWorker_DoWork);
                googleWorker.RunWorkerCompleted += new RunWorkerCompletedEventHandler(googleWorker_RunWorkerCompleted);
                googleWorker.RunWorkerAsync();
            }
            else
            {
                lbl_Status.Text = "Enter keywords for search";
            }
        }
        else
        {
            lbl_Status.Text = "Insert domain for search";
        }
    }
    else
    {
        lbl_Status.Text = "Active your search engine";
    }
}
void googleWorker_DoWork(object sender, DoWorkEventArgs e)
{
    for (int a = 0; a < MyKeyWords.Count; a++)
    {
        if (!StopStatus)
        {
            dal_regex myRegex = new dal_regex();
            keyword_obj mykey = (keyword_obj)MyKeyWords[a];
            for (int b = 0; b < TotalPage2Search; b++)
            {
                // Run the search on the worker thread so the UI stays responsive.
                myClient = myRegex.CatchValues(myClient, mykey.KeyWord, b);
                // Only the control updates are marshalled to the UI thread.
                MethodInvoker Invoker = delegate
                {
                    // The statement that used this status message is truncated in the source:
                    // " Page Position: " + b.ToString() + " Hit: " + hit + "- your domain result";
                    // Advance the progress bar without exceeding its maximum.
                    google_progress.Value = Math.Min(google_progress.Value + 1, google_progress.Maximum);
                    google_progress.Refresh();
                    lbl_page.Text = (b + 1).ToString();
                    lbl_Status.Text = "In progress...";
                    lbl_page.Refresh();
                    lbl_key.Text = mykey.KeyWord;
                    lbl_key.Refresh();
                };
                this.Invoke(Invoker);
            }
        }
    }
}
googleWorker is a BackgroundWorker, a component from the System.ComponentModel namespace that runs an operation on a separate thread. It runs the search for every keyword and result page in the background, so the search does not block the main (UI) thread of the program.
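The following console sketch shows the BackgroundWorker pattern in isolation, including cooperative cancellation through CancellationPending. The class WorkerDemo, the method RunPages and its parameters are hypothetical, and the counter increment stands in for downloading and parsing one result page. The project itself uses its own StopStatus flag, so this is one possible approach rather than the project's implementation.

```csharp
using System;
using System.ComponentModel;
using System.Threading;

public static class WorkerDemo
{
    // Processes up to totalPages pages on a background thread and
    // requests cancellation once cancelAfter pages are done (0 = never).
    public static int RunPages(int totalPages, int cancelAfter)
    {
        int processed = 0;
        using (var done = new ManualResetEvent(false))
        using (var worker = new BackgroundWorker())
        {
            worker.WorkerSupportsCancellation = true;
            worker.DoWork += (sender, e) =>
            {
                var self = (BackgroundWorker)sender;
                for (int page = 0; page < totalPages; page++)
                {
                    // Check the flag before each page and stop early if it is set.
                    if (self.CancellationPending)
                    {
                        e.Cancel = true;
                        return;
                    }
                    processed++; // stand-in for searching one result page
                    if (processed == cancelAfter)
                    {
                        self.CancelAsync(); // only sets CancellationPending
                    }
                }
            };
            // Signal the waiting thread when the worker has finished.
            worker.RunWorkerCompleted += (sender, e) => done.Set();
            worker.RunWorkerAsync();
            done.WaitOne();
        }
        return processed;
    }

    public static void Main()
    {
        Console.WriteLine(RunPages(5, 0)); // prints 5
        Console.WriteLine(RunPages(5, 2)); // prints 2
    }
}
```

In a Windows Forms application the cancel request would normally come from a button handler calling CancelAsync, and there is no need to block on a wait handle because RunWorkerCompleted is raised on the UI thread.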
Conclusion
I hope to see more versions of this project.
Enjoy coding.
Don't forget to comment. :)