
Google Spider .NET

9 Jul 2009
Search your keyword against your competitors

Introduction

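This article presents a small .NET application that uses a regular expression engine to find where a domain appears in Google's search results for a given keyword.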

Background

The code searches google.com for a keyword and returns the position of your domain in the results. It also supports searching several keywords at once and accepts a list of competitors to compare against your domain.

The project is built from three parts:

  • core: The logic layer. It contains the support functions that process the data coming in from the GUI layer. It does not depend on a specific GUI, so you can reuse it in any application type (web application, WPF).
  • keys match: The GUI layer. All input checks and validations are made here.
  • keysmatch_setup: The setup project for the application.

[Image: schema.jpg]

What you will learn from this article:

  1. Using the WebClient object
  2. Using regular expressions
  3. Saving files and settings
  4. Building a log viewer
  5. Creating a setup project

Transfer Objects

These objects carry data between the layers and give the whole project a common vocabulary for its parameters:

// Requires: using System.Collections;

// The client's search values: its domain, its keywords and its competitors.
public class client_obj
{
    string domain;
    public string Domain { get { return domain; } set { domain = value; } }

    ArrayList keyword = new ArrayList();
    public ArrayList KeyWord { get { return keyword; } set { keyword = value; } }

    ArrayList competiters = new ArrayList();
    public ArrayList Competiters { get { return competiters; } set { competiters = value; } }
}

// One keyword together with the position and page where it was found.
public class keyword_obj
{
    string keyword;
    public string KeyWord { get { return keyword; } set { keyword = value; } }

    int position;
    public int Position { get { return position; } set { position = value; } }

    int page;
    public int Page { get { return page; } set { page = value; } }
}

// One competitor: its domain and its keyword values.
public class competitors
{
    string domain;
    public string Domain { get { return domain; } set { domain = value; } }

    ArrayList keyword = new ArrayList();
    public ArrayList KeyWord { get { return keyword; } set { keyword = value; } }
}
  • competitors: Represents a competitor's values
  • keyword_obj: Represents a keyword's values
  • client_obj: Contains the client's search values
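
As an illustration of how the Position and Page values of a keyword_obj might be filled, here is a minimal sketch. It is not the author's code: KeywordResult is a hypothetical stand-in for keyword_obj, RankFinder and its parameters are invented for this example, and it assumes that a result "belongs" to a domain when the link contains the domain text.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for keyword_obj, written with auto-properties.
public sealed class KeywordResult
{
    public string KeyWord { get; set; }
    public int Position { get; set; } // 1-based rank counted across all pages
    public int Page { get; set; }     // 1-based result page
}

public static class RankFinder
{
    // Looks for the domain in the links of one result page.
    // pageIndex is zero-based; returns null when the domain is not on this page.
    public static KeywordResult Find(string keyword, string domain,
        IList<string> pageLinks, int pageIndex, int resultsPerPage)
    {
        for (int i = 0; i < pageLinks.Count; i++)
        {
            // Assumption: a case-insensitive substring match is good enough here.
            if (pageLinks[i].IndexOf(domain, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                return new KeywordResult
                {
                    KeyWord = keyword,
                    Position = pageIndex * resultsPerPage + i + 1,
                    Page = pageIndex + 1
                };
            }
        }
        return null;
    }
}
```

With 10 results per page, a match in the second link of the second page (pageIndex 1) gives Position 12 and Page 2.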

Using the HtmlParse Function

The program is built around one simple regular expression function. It takes the HTML page downloaded by the WebClient object and returns an ArrayList with the result links found in it.

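// Requires: using System.Collections; using System.Text.RegularExpressions;
private ArrayList HtmlParse(string html)
{
    Match m;
    ArrayList LinkArray = new ArrayList();

    // Capture the href value (quoted or unquoted) of the link inside <h3 class=r>.
    string HRefPattern = "<h3 class=r><a href\\s*=\\s*(?:\"(?<1>[^\"]*)\"|(?<1>\\S+))";
    m = Regex.Match(html, HRefPattern, RegexOptions.IgnoreCase);

    // Walk through every match and collect the captured address.
    while (m.Success)
    {
        string TempLink = m.Groups[1].Value;
        LinkArray.Add(TempLink);
        m = m.NextMatch();
    }
    return LinkArray;
}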

The HRefPattern variable holds the regular expression, that is, what to search for in the string. As you can see, it looks for the address of the link inside each H3 tag with class r. In Google's result page, that is the address of a search result.
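
To see what the pattern captures, it can be run against a small hand-written HTML fragment. The sketch below reuses the pattern from HtmlParse unchanged; the LinkExtractor class, the use of List<string> instead of ArrayList, and the sample HTML are assumptions made for this example only.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LinkExtractor
{
    // Same pattern as in HtmlParse: the href of a link directly inside <h3 class=r>.
    private const string HRefPattern =
        "<h3 class=r><a href\\s*=\\s*(?:\"(?<1>[^\"]*)\"|(?<1>\\S+))";

    public static List<string> ExtractLinks(string html)
    {
        var links = new List<string>();
        foreach (Match m in Regex.Matches(html, HRefPattern, RegexOptions.IgnoreCase))
        {
            // Group 1 holds the address, whether it was quoted or not.
            links.Add(m.Groups[1].Value);
        }
        return links;
    }
}
```

For the fragment `<h3 class=r><a href="http://example.com/a">A</a></h3><p><a href="http://ignored.example/">x</a></p><h3 class=r><a href=http://example.org/b class=l>B</a></h3>`, the function returns the two addresses inside the H3 tags and skips the link inside the paragraph.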

Web Client Function

This function opens a connection to the site and returns a string containing all the HTML generated for the request.

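// Requires: using System; using System.Net; using System.Text;
private string RuunerGoogleHtml(string key, string RunStep)
{
    // The using block disposes the WebClient even if the download throws.
    using (WebClient Client = new WebClient())
    {
        // The request asks for Hebrew results (hl=iw), so decode as windows-1255.
        Client.Encoding = Encoding.GetEncoding("windows-1255");

        // Escape the keyword so spaces and special characters survive in the URL.
        string fixUrl = "http://www.google.com/search?q=" +
            Uri.EscapeDataString(key) + "&hl=iw&as_qdr=all&num=10&start=" + RunStep + "&sa=N";
        return Client.Encoding.GetString(Client.DownloadData(fixUrl));
    }
}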

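fixUrl contains Google's search parameters: q is the keyword, hl the interface language, num the number of results per page and start the offset of the first result. You can see the same parameters in your browser's address bar when you run a search.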
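
The article does not show how the RunStep value is derived from the page number. A plausible mapping, given num=10, is to multiply the zero-based page index by the number of results per page. The sketch below shows that mapping; GoogleUrl, Build and ResultsPerPage are hypothetical names introduced here, not part of the project.

```csharp
using System;

public static class GoogleUrl
{
    // Assumption: matches the num=10 parameter used in the article's URL.
    public const int ResultsPerPage = 10;

    // Builds the search URL for a keyword and a zero-based result page.
    public static string Build(string keyword, int pageIndex)
    {
        if (pageIndex < 0)
        {
            throw new ArgumentOutOfRangeException(nameof(pageIndex));
        }

        // "start" is the offset of the first result on the requested page.
        int start = pageIndex * ResultsPerPage;

        return "http://www.google.com/search?q=" + Uri.EscapeDataString(keyword)
            + "&hl=iw&as_qdr=all&num=" + ResultsPerPage
            + "&start=" + start + "&sa=N";
    }
}
```

For example, `GoogleUrl.Build("code project", 2)` requests the third page, so the URL ends with `num=10&start=20&sa=N` and the space in the keyword is sent as `%20`.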

Log Engine

The application creates a log file for every search. You can save the output file to keep a history, and you can also save your search settings.

For more information on these functions, see the class file file_manager.cs.
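
The contents of file_manager.cs are not shown in this article. As a rough idea of what a per-search log entry could look like, here is a minimal sketch; the SearchLog class, its method names and the tab-separated line format are all assumptions made for this example.

```csharp
using System;
using System.Globalization;
using System.IO;

public static class SearchLog
{
    // Formats one log line. The layout is an assumption, not the project's format.
    public static string FormatLine(DateTime when, string keyword, int position, int page)
    {
        return string.Format(CultureInfo.InvariantCulture,
            "{0:yyyy-MM-dd HH:mm:ss}\t{1}\tposition={2}\tpage={3}",
            when, keyword, position, page);
    }

    // Appends one line to the log file, creating the file if it does not exist.
    public static void Append(string path, DateTime when, string keyword, int position, int page)
    {
        File.AppendAllText(path, FormatLine(when, keyword, position, page) + Environment.NewLine);
    }
}
```

Because each search only appends a line, the same file can double as the history that a log viewer reads back.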

GUI Layer

This layer reads the user's input from the controls and passes it as request parameters to the core layer:

private void btn_start_Click(object sender, EventArgs e)
{
    // Validate the input first; each failed check reports its own message.
    if (!chk_google.Checked)
    {
        lbl_Status.Text = "Active your search engine";
        return;
    }
    if (txb_client_domain.Text == "http://")
    {
        lbl_Status.Text = "Insert domain for search";
        return;
    }
    if (txb_keyword.Text == "")
    {
        lbl_Status.Text = "Enter keywords for search";
        return;
    }

    // Reset the state of the previous search.
    lbl_Status.Text = "";
    myClient = new client_obj();
    MyKeyWords = new ArrayList();
    SaveInput();
    SetGoogle_progress_bar();
    StopStatus = false;

    // Run the search on a background thread.
    googleWorker = new BackgroundWorker();
    googleWorker.WorkerSupportsCancellation = true;
    googleWorker.DoWork += new DoWorkEventHandler(googleWorker_DoWork);
    googleWorker.RunWorkerCompleted += new RunWorkerCompletedEventHandler
                    (googleWorker_RunWorkerCompleted);
    googleWorker.RunWorkerAsync();
}
void googleWorker_DoWork(object sender, DoWorkEventArgs e)
{
    for (int a = 0; a < MyKeyWords.Count; a++)
    {
        if (!StopStatus)
        {
            dal_regex myRegex = new dal_regex();
            keyword_obj mykey = (keyword_obj)MyKeyWords[a];
            for (int b = 0; b < TotalPage2Search; b++)
            {
                // Download and parse on the worker thread so the UI stays responsive.
                myClient = myRegex.CatchValues(myClient, mykey.KeyWord, b);

                // Controls may only be touched on the UI thread, hence Invoke.
                MethodInvoker Invoker = delegate
                {
                    //lbl_Status.Text = "Search: " + tempKey.KeyWord +
                    //    " Page Position: " + b.ToString() + " Hit: " + hit + "- your domain result";

                    // Never step past the maximum of the progress bar.
                    google_progress.Value =
                        Math.Min(google_progress.Value + 1, google_progress.Maximum);
                    google_progress.Refresh();
                    lbl_page.Text = (b + 1).ToString();
                    lbl_Status.Text = "In progress...";
                    lbl_page.Refresh();
                    lbl_key.Text = mykey.KeyWord;
                    lbl_key.Refresh();
                };
                this.Invoke(Invoker);
            }
        }
    }
}

googleWorker is a BackgroundWorker, a component that runs work on a separate thread. It runs the search for every keyword over the configured number of result pages, so the main (UI) thread of the program is not blocked while the search is in progress.
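
BackgroundWorker can also report progress itself through ReportProgress and the ProgressChanged event, which in a Windows Forms application is raised on the UI thread when the worker was started from it. The sketch below shows that variant as an alternative to calling Invoke by hand. It is not the author's code: SearchRunner, Start and the fetchPage callback (a stand-in for the per-page download and parse step) are hypothetical names.

```csharp
using System;
using System.ComponentModel;

public static class SearchRunner
{
    // Starts a background search over all keywords and pages.
    // fetchPage is a hypothetical stand-in for the download-and-parse step.
    public static BackgroundWorker Start(string[] keywords, int pagesPerKeyword,
        Action<string, int> fetchPage,
        ProgressChangedEventHandler onProgress,
        RunWorkerCompletedEventHandler onCompleted)
    {
        var worker = new BackgroundWorker
        {
            WorkerReportsProgress = true,
            WorkerSupportsCancellation = true
        };

        worker.DoWork += (sender, e) =>
        {
            int done = 0;
            int total = keywords.Length * pagesPerKeyword;
            foreach (string keyword in keywords)
            {
                for (int page = 0; page < pagesPerKeyword; page++)
                {
                    // Stop early when CancelAsync() was called on the worker.
                    if (worker.CancellationPending)
                    {
                        e.Cancel = true;
                        return;
                    }

                    fetchPage(keyword, page); // slow work, off the UI thread
                    done++;

                    // Percentage plus the current keyword as user state.
                    worker.ReportProgress(done * 100 / total, keyword);
                }
            }
            e.Result = done; // number of pages processed
        };

        if (onProgress != null) worker.ProgressChanged += onProgress;
        if (onCompleted != null) worker.RunWorkerCompleted += onCompleted;

        worker.RunWorkerAsync();
        return worker;
    }
}
```

In a form, the onProgress handler would update the progress bar and labels directly, and a Stop button could call CancelAsync() on the returned worker instead of setting a flag.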

Conclusion

I hope to see more versions of this project.

Enjoy coding.

Don't forget to comment. :)

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
