Introduction
I wrote this article purely out of curiosity. A friend of mine built a pub crawl site (barcrawl) that uses a small web spider to crawl a mapping site and extract map co-ordinates. I had never done anything like this before, so I decided to give it a try.
But first, I needed a site to get some data out of, and being totally into The Code Project, I picked it. The next question was what data to fetch. Being the vain sort of chap (as I'm sure we all can be at times), I decided to write a web spider that fetches my article summary area, similar to the area shown below. This area can be found by navigating to any CodeProject user's article list using a URL like: http://www.codeproject.com/script/articles/list_articles.asp?userid=userId.
That summary area is what this article sets out to grab. In doing so, I hope it shows how to extract useful information from what is potentially a vast amount of data.
So How Is It All Done?
The VainWebSpider application works in six steps:
- Get a user ID, from which the full URL of that user's article list can be formed.
- Store this user ID within the Registry to allow the VainWebSpider application to know which user to fetch data for the next time it's run.
- Grab the entire web page based on the full URL, such as http://www.codeproject.com/script/articles/list_articles.asp?userid=569009.
- Store the web page in a suitable object; a string is fine.
- Split the string to keep only the area of interest, greatly reducing the size of the string in memory. We are only interested in the article summary details; we don't care about anything else.
- Use Regular Expressions to grab the data we want.
These six steps are the basis of this article.
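Steps 1 and 5 are plain string handling. As a rough sketch (the PageTrimmer class and its method names are hypothetical, not part of VainWebSpider), forming the URL and cutting the page down to the summary area might look like this. It assumes, as the WebScreenScraper listing does, that the summary area starts at the <a name='#Author0'> anchor.

```csharp
using System;

// Hypothetical helper illustrating steps 1 and 5 (names are illustrative only)
public static class PageTrimmer
{
    // Step 1: form the full article-list URL for a CodeProject user ID
    public static string BuildArticlesUrl(long userId)
    {
        return "http://www.codeproject.com/script/articles/" +
               "list_articles.asp?userid=" + userId;
    }

    // Step 5: keep only the text from the first author anchor onwards;
    // returns an empty string if there is no page or no anchor (no articles)
    public static string TrimToSummaryArea(string html)
    {
        const string anchor = "<a name='#Author0'>";
        if (string.IsNullOrEmpty(html))
        {
            return string.Empty;
        }
        int idx = html.IndexOf(anchor, StringComparison.Ordinal);
        return idx >= 0 ? html.Substring(idx) : string.Empty;
    }
}
```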
Step 6 is really the most interesting as it allows us to pluck the data we want out of the junk. Let's have a look at an example, shall we?
Using Regular Expressions to Grab the Data
<hr size=1 noshade><a name='#Author0'></a><h2>Articles
by Sacha Barber (10 articles found)</h2><h4>Average article rating:
4.5</h4><h3>C# Algorithms</h3>
<p><a href='http://www.codeproject.com/cs/algorithms/#Evolutional'>Evolutional</a></p>
<div class=smallText style='width:600;margin-left:40px;margin-top:10px'>
<a href='http://www.codeproject.com/cs/algorithms/Genetic_Algorithm.asp'>
<b>AI - Simple Genetic
Algorithm (GA) to solve a card problem</b></a><div style='font-size:8pt;
color:#666666'>Last Updated: 8 Nov 2006 Page views: 7,164 Rating:
4.7/5 Votes: 17 Popularity: 5.8</div>
<div style='margin-top:3;font-size:8pt;'>A simple Genetic Algorithm (GA)
to solve a card problem.</div>
</div>
The actual web site content for an article is shown above. Let's say we only want to grab the number of views for the article. How can this be done? Quite simply, actually. We just create a well-formed Regular Expression and collect its matches:
private List<long> getViews()
{
// "+" rather than "*", so an empty match can never reach long.Parse
string pattern = "Page views: [0-9,]+";
MatchCollection matches = Regex.Matches(this.webContent,
pattern, RegexOptions.ExplicitCapture);
List<long> lViews = new List<long>();
foreach (Match m in matches)
{
int idx = m.Value.LastIndexOf(":") + 2;
lViews.Add(long.Parse(m.Value.Substring(idx).Replace(",", "")));
}
return lViews;
}
This nifty little bit of code is enough to match all the "Page views: XXXX" entries in the web content. Each match holds text such as "Page views: 7,164"; the loop strips the prefix and the commas, so for the example above lViews ends up containing 7164. From here it's easy: we simply repeat this for all the parts of the web content we are interested in.
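The matches themselves still contain the "Page views: " prefix, which is why the loop has to strip it with LastIndexOf(":") + 2 and remove the commas. One alternative, sketched below on the assumption that the markup looks like the sample above, is to put the digits in a named capture group (views is an arbitrary name) and parse them with the invariant culture, so the result does not depend on the PC's regional settings. ViewsParser is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

// Variant of getViews in which a capture group isolates the number,
// instead of locating it with LastIndexOf(":") + 2
public static class ViewsParser
{
    private static readonly Regex ViewsRegex =
        new Regex(@"Page views:\s*(?<views>[0-9,]+)");

    public static List<long> GetViews(string webContent)
    {
        var views = new List<long>();
        foreach (Match m in ViewsRegex.Matches(webContent))
        {
            // "7,164" -> 7164; AllowThousands with the invariant culture
            // treats the comma as a group separator on any machine
            views.Add(long.Parse(m.Groups["views"].Value,
                NumberStyles.AllowThousands, CultureInfo.InvariantCulture));
        }
        return views;
    }
}
```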
We end up with Regular Expressions within the WebScreenScraper class to grab the following details:
- Article Views
- Article Votes
- Article Popularity
- Article Ratings
- Article URLs
All of this happens inside the WebScreenScraper class. Once we have the results, they are made available as a standard ADO.NET DataTable, so that the main interface (frmMain) can display them neatly.
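One thing to be aware of is that each detail is matched by its own Regular Expression, and the results are then paired up by position, so a single article with a missing field would shift every value after it. A hedged alternative, assuming the statistics always appear in the order shown in the sample HTML (Page views, Rating, Votes, Popularity), is to read all four with one expression per article. The ArticleStats and StatsParser types are hypothetical, and the article URL, which sits in a separate element, is left out for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical record holding the statistics of one article
public sealed class ArticleStats
{
    public long Views;
    public string Rating;
    public int Votes;
    public float Popularity;
}

// One regex reads all the statistics of an article at once, so values
// belonging to different articles can never be paired up by mistake
public static class StatsParser
{
    private static readonly Regex StatsRegex = new Regex(
        @"Page views:\s*(?<views>[0-9,]+)\s+" +
        @"Rating:\s*(?<rating>[0-9.]+/[0-9]+)\s+" +
        @"Votes:\s*(?<votes>[0-9]+)\s+" +
        @"Popularity:\s*(?<popularity>[0-9.]+)");

    public static List<ArticleStats> Parse(string webContent)
    {
        var result = new List<ArticleStats>();
        foreach (Match m in StatsRegex.Matches(webContent))
        {
            result.Add(new ArticleStats
            {
                Views = long.Parse(m.Groups["views"].Value,
                    NumberStyles.AllowThousands, CultureInfo.InvariantCulture),
                Rating = m.Groups["rating"].Value,
                Votes = int.Parse(m.Groups["votes"].Value,
                    CultureInfo.InvariantCulture),
                Popularity = float.Parse(m.Groups["popularity"].Value,
                    NumberStyles.Float, CultureInfo.InvariantCulture)
            });
        }
        return result;
    }
}
```

Because \s matches newlines too, this copes with the statistics being wrapped over several lines, as they are in the sample above.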
Class Diagram
The VainWebSpider class diagram is as follows:
Code Listings
The code to do this is basically as follows:
Program Class
This class holds various pop-ups and commonly used functions, as well as the Main method:
using System;
using System.Collections.Generic;
using System.Windows.Forms;
using Microsoft.Win32;
namespace VainWebSpider
{
#region Program CLASS
public static class Program
{
#region Instance fields
private static long userId;
#endregion
#region Public Methods/Properties
public static long UserID
{
get { return Program.userId; }
set
{
Program.userId = value;
Program.writeToRegistry(value);
}
}
public static void writeToRegistry(long userId)
{
try
{
// Writing under HKEY_LOCAL_MACHINE may require administrator rights;
// any failure is reported to the user below
using (RegistryKey hkSoftware =
Registry.LocalMachine.OpenSubKey("Software", true))
using (RegistryKey hkVainWebSpider =
hkSoftware.CreateSubKey("VainWebSpider"))
{
hkVainWebSpider.SetValue("userId", userId);
}
}
catch (Exception)
{
Program.ErrorBox(
"There was a problem creating " +
"the Registry key for VainWebSpider");
}
}
public static long readFromRegistry()
{
try
{
using (RegistryKey hkVainWebSpider =
Registry.LocalMachine.OpenSubKey(@"Software\VainWebSpider"))
{
// The key and value do not exist the first time the application runs
if (hkVainWebSpider == null)
{
return -1;
}
object value = hkVainWebSpider.GetValue("userId");
return value == null ? -1 : long.Parse(value.ToString());
}
}
catch (Exception)
{
return -1;
}
}
public static string InputBox(string prompt,
string title, string defaultValue)
{
InputBoxDialog ib = new InputBoxDialog();
ib.FormPrompt = prompt;
ib.FormCaption = title;
ib.DefaultValue = defaultValue;
ib.ShowDialog();
string s = ib.InputResponse;
ib.Close();
return s;
}
public static void ErrorBox(string error)
{
MessageBox.Show(error,"Error",
MessageBoxButtons.OK,MessageBoxIcon.Error);
}
public static void InfoBox(string info)
{
MessageBox.Show(info, "Information",
MessageBoxButtons.OK, MessageBoxIcon.Information);
}
public static DialogResult YesNoBox(string query)
{
return MessageBox.Show(query,
"Confirmation", MessageBoxButtons.YesNo,
MessageBoxIcon.Question);
}
#endregion
#region MAIN THREAD
[STAThread]
static void Main()
{
Application.EnableVisualStyles();
Application.SetCompatibleTextRenderingDefault(false);
Application.Run(new frmLoader());
}
#endregion
}
#endregion
}
WebScreenScraper Class
This class does the job of fetching and extracting the data from the relevant CodeProject web page:
using System;
using System.Collections.Generic;
using System.Text;
using System.Data;
using System.Text.RegularExpressions;
using System.IO;
using System.Net;
using System.Windows.Forms;
namespace VainWebSpider
{
#region WebScreenScraper CLASS
public class WebScreenScraper
{
#region Instance Fields
private List<CPArticle> cpArticlesForUser;
private bool hasArticles;
private string authorName;
private long userId;
public string webContent;
public event EventHandler EndParse;
public event EventHandler StartParse;
#endregion
#region Constructor
public WebScreenScraper(long userId)
{
this.hasArticles = true;
this.cpArticlesForUser = new List<CPArticle>();
this.userId = userId;
}
#endregion
#region Public Properties / Methods
public void getInitialData()
{
this.OnStartParse(this, new EventArgs());
this.readInSiteContents();
this.getArticleSummaryArea();
}
public DataTable getWebData()
{
List<long> lViews = this.getViews();
List<string> lRatings = this.getRatings();
List<int> lVotes = this.getVotes();
List<float> lPopularity = this.getPopularity();
List<string> lURLS = this.getArticleURLS();
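// The five lists are filled by separate regexes, so only pair up as
// many entries as every list actually holds
int count = Math.Min(Math.Min(lViews.Count, lRatings.Count),
Math.Min(Math.Min(lVotes.Count, lPopularity.Count), lURLS.Count));
for (int i = 0; i < count; i++)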
{
this.cpArticlesForUser.Add(new CPArticle(
lViews[i],
lRatings[i],
lVotes[i],
lPopularity[i],
lURLS[i]));
}
this.OnEndParse(this, new EventArgs());
return this.createDataSet();
}
public bool HasArticles
{
get { return this.hasArticles; }
}
public int NoOfArticles
{
get { return this.cpArticlesForUser.Count; }
}
public string AuthorName
{
get { return this.authorName; }
}
#endregion
#region Events
public void OnEndParse(object sender, EventArgs e)
{
if (this.EndParse != null)
{
this.EndParse(this, e);
}
}
public void OnStartParse(object sender, EventArgs e)
{
if (this.StartParse != null)
{
this.StartParse(this, e);
}
}
#endregion
#region Private Methods
private DataTable createDataSet()
{
DataTable dt = new DataTable("CPArticles");
dt.Columns.Add("ArticleURL", Type.GetType("System.String"));
dt.Columns.Add("Views", Type.GetType("System.Int64"));
dt.Columns.Add("Ratings", Type.GetType("System.String"));
dt.Columns.Add("Votes", Type.GetType("System.Int32"));
dt.Columns.Add("Popularity", Type.GetType("System.Single"));
foreach (CPArticle cpa in this.cpArticlesForUser)
{
DataRow row = dt.NewRow();
row["ArticleURL"] = cpa.ArticleURL;
row["Views"] = cpa.Views;
row["Ratings"] = cpa.Ratings;
row["Votes"] = cpa.Votes;
row["Popularity"] = cpa.Popularity;
dt.Rows.Add(row);
}
return dt;
}
private void getArticleSummaryArea()
{
this.cpArticlesForUser.Clear();
if (string.IsNullOrEmpty(this.webContent) ||
this.webContent.Contains("(No articles found)"))
{
this.webContent = "";
this.hasArticles = false;
this.authorName = "";
}
else
{
int idx = this.webContent.IndexOf("<a name='#Author0'>", 0);
if (idx >= 0)
{
this.webContent = this.webContent.Substring(idx);
this.hasArticles = true;
this.authorName = getAuthor();
}
else
{
this.webContent = "";
this.hasArticles = false;
this.authorName = "";
}
}
}
private string getAuthor()
{
// Capture the name after "Articles by" in a named group; the old
// LastIndexOf("by") approach broke on names containing "by" (e.g. "Bobby")
Match m = Regex.Match(this.webContent, @"Articles\s+by\s+(?<name>[a-z\sA-Z]*)");
return m.Success ? m.Groups["name"].Value.Trim() : string.Empty;
}
private List<string> getArticleURLS()
{
string pattern = @"<a href='([-a-zA-Z_/#0-9]*)\.asp'>";
MatchCollection matches = Regex.Matches(this.webContent,
pattern, RegexOptions.ExplicitCapture);
List<string> urls = new List<string>();
foreach (Match m in matches)
{
urls.Add(m.Value.Replace("<a href='", "").Replace("'>", ""));
}
return urls;
}
private List<float> getPopularity()
{
string pattern = "Popularity: [0-9.]+";
MatchCollection matches = Regex.Matches(this.webContent,
pattern, RegexOptions.ExplicitCapture);
List<float> lPopularity = new List<float>();
foreach (Match m in matches)
{
int idx = m.Value.LastIndexOf(":") + 2;
// Parse with the invariant culture: the site always writes "5.8" with a dot
lPopularity.Add(float.Parse(m.Value.Substring(idx),
System.Globalization.CultureInfo.InvariantCulture));
}
return lPopularity;
}
private List<string> getRatings()
{
string pattern = "Rating: [0-9./]*";
MatchCollection matches = Regex.Matches(this.webContent,
pattern, RegexOptions.ExplicitCapture);
List<string> lRatings = new List<string>();
foreach (Match m in matches)
{
int idx = m.Value.LastIndexOf(":") + 2;
lRatings.Add(m.Value.Substring(idx));
}
return lRatings;
}
private List<long> getViews()
{
// "+" rather than "*", so an empty match can never reach long.Parse
string pattern = "Page views: [0-9,]+";
MatchCollection matches = Regex.Matches(this.webContent,
pattern, RegexOptions.ExplicitCapture);
List<long> lViews = new List<long>();
foreach (Match m in matches)
{
int idx = m.Value.LastIndexOf(":") + 2;
lViews.Add(long.Parse(m.Value.Substring(idx).Replace(",", "")));
}
return lViews;
}
private List<int> getVotes()
{
string pattern = "Votes: [0-9]+";
MatchCollection matches = Regex.Matches(this.webContent,
pattern, RegexOptions.ExplicitCapture);
List<int> lVotes = new List<int>();
foreach (Match m in matches)
{
int idx = m.Value.LastIndexOf(":") + 2;
lVotes.Add(int.Parse(m.Value.Substring(idx)));
}
return lVotes;
}
private void readInSiteContents()
{
string url = "http://www.codeproject.com/script/articles/" +
"list_articles.asp?userid=" + this.userId;
try
{
// DownloadString fetches the whole page in one call; the using
// block disposes the WebClient even if the download fails
using (WebClient wc = new WebClient())
{
// Decode as UTF-8, which is what the StreamReader used by default
wc.Encoding = Encoding.UTF8;
this.webContent = wc.DownloadString(url);
}
}
catch (Exception)
{
// Leave an empty page so getArticleSummaryArea reports "no articles"
// instead of failing on a null string
this.webContent = string.Empty;
Program.ErrorBox("Could not access web site " + url);
}
}
#endregion
}
#endregion
}
CPArticle Class
This class represents a CodeProject article:
using System;
using System.Collections.Generic;
using System.Text;
namespace VainWebSpider
{
#region CPArticle CLASS
public class CPArticle
{
#region Instance fields
private string articleURL;
private float popularity;
private string ratings;
private long views;
private int votes;
#endregion
#region Constructor
public CPArticle(long views, string ratings, int votes,
float popularity, string articleURL)
{
this.views = views;
this.ratings = ratings;
this.votes = votes;
this.popularity = popularity;
this.articleURL = articleURL;
}
#endregion
#region Public Properties
public long Views
{
get { return this.views; }
}
public string Ratings
{
get { return this.ratings; }
}
public int Votes
{
get { return this.votes; }
}
public float Popularity
{
get { return this.popularity; }
}
public string ArticleURL
{
get { return this.articleURL; }
}
#endregion
}
#endregion
}
frmLoader Class
This class is the initial form shown (for the complete Designer listing, refer to the attached application).
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Text;
using System.Windows.Forms;
using Microsoft.Win32;
namespace VainWebSpider
{
#region frmLoader CLASS
public partial class frmLoader : Form
{
#region Constructor
public frmLoader()
{
InitializeComponent();
}
#endregion
#region Private Methods
private void lnkChangeUser_LinkClicked(object sender,
LinkLabelLinkClickedEventArgs e)
{
string stringEntered =
Program.InputBox("Enter a new user ID to examine",
"Enter a new user ID", "");
if (stringEntered.Equals(string.Empty))
{
Program.ErrorBox("You must enter a value for the userId");
}
else
{
long userId;
if (!long.TryParse(stringEntered, out userId))
{
Program.ErrorBox("The value you entered was not valid\r\n" +
"The user ID must be a number");
}
else if (userId > 0)
{
Program.UserID = userId;
lblCurrentUser.Text =
"Currently set-up to fetch articles for user ID : " +
Program.UserID.ToString();
}
else
{
Program.ErrorBox("User ID must be a positive value");
}
}
}
private void frmLoader_Load(object sender, EventArgs e)
{
long userId = Program.readFromRegistry();
Program.UserID = userId;
if (userId != -1)
{
lblCurrentUser.Text = "Currently set-up to fetch " +
"articles for user ID : " + userId.ToString();
}
else
{
lblCurrentUser.Text = "Not setup for any user as yet, " +
"use the link to pick a new user";
}
}
private void lnkLoadMainForm_LinkClicked(object sender,
LinkLabelLinkClickedEventArgs e)
{
frmMain fMain = new frmMain();
this.Hide();
fMain.ShowDialog(this);
}
#endregion
}
#endregion
}
frmMain Class
This class is the main interface (for the complete Designer listing, refer to the attached application).
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Text;
using System.Windows.Forms;
using Microsoft.Win32;
namespace VainWebSpider
{
#region frmMain CLASS
public partial class frmMain : Form
{
#region Instance Fields
private Boolean formShown = true;
#endregion
#region Constructor
public frmMain()
{
InitializeComponent();
}
#endregion
#region Private Methods
private void nfIcon_DoubleClick(object sender, EventArgs e)
{
if (formShown)
{
this.Hide();
formShown = false;
}
else
{
this.Show();
formShown = true;
}
}
private void showFormToolStripMenuItem_Click(object sender, EventArgs e)
{
this.Show();
}
private void hideFormToolStripMenuItem_Click(object sender, EventArgs e)
{
this.Hide();
}
private void exitToolStripMenuItem_Click(object sender, EventArgs e)
{
DialogResult dr = MessageBox.Show("Are you sure you want to quit?", "Exit",
MessageBoxButtons.YesNo, MessageBoxIcon.Question);
if (dr.Equals(DialogResult.Yes))
{
Application.Exit();
}
}
private void bgw_DoWork(object sender, DoWorkEventArgs e)
{
WebScreenScraper wss = new WebScreenScraper(Program.UserID);
wss.StartParse += new EventHandler(wss_StartParse);
wss.EndParse += new EventHandler(wss_EndParse);
wss.getInitialData();
if (this.InvokeRequired)
{
this.Invoke(new EventHandler(delegate
{
if (wss.HasArticles)
{
DataTable dt = wss.getWebData();
lblCurrentUser.Text = wss.AuthorName + " " +
wss.NoOfArticles + " articles available";
if (dt.Rows.Count > 0)
{
dgArticles.Columns.Clear();
dgArticles.DataSource = dt;
alterColumns();
resizeColumns();
dgArticles.Visible = true;
pnlResults.Visible = true;
this.Invalidate();
Application.DoEvents();
}
else
{
dgArticles.Visible = false;
pnlResults.Visible = false;
this.Invalidate();
Application.DoEvents();
}
}
else
{
pnlResults.Visible = false;
lblCurrentUser.Text = "Unknown Or Unpublished Author";
lblProgress.Visible = false;
prgBar.Visible = false;
dgArticles.Visible = false;
pnlResults.Visible = false;
pnlUser.Visible = true;
this.Invalidate();
Application.DoEvents();
Program.InfoBox(
"There are no CodeProject articles avaialble for user ("
+ Program.UserID + ")");
}
}));
}
}
private void alterColumns()
{
// Drop the plain-text URL column; it is replaced by a link column below
if (dgArticles.Columns.Contains("ArticleURL"))
{
dgArticles.Columns.Remove("ArticleURL");
}
DataGridViewImageColumn imgs = new DataGridViewImageColumn();
imgs.Image = global::VaneWebSpider.FormResources.LinkIcon;
imgs.DisplayIndex = 0;
imgs.Width = 40;
dgArticles.Columns.Add(imgs);
DataGridViewLinkColumn links = new DataGridViewLinkColumn();
links.HeaderText = "ArticleURL";
links.DataPropertyName = "ArticleURL";
links.ActiveLinkColor = Color.Blue;
links.LinkBehavior = LinkBehavior.SystemDefault;
links.LinkColor = Color.Blue;
links.SortMode = DataGridViewColumnSortMode.Automatic;
links.TrackVisitedState = true;
links.VisitedLinkColor = Color.Blue;
links.DisplayIndex = 1;
links.Width = 300;
dgArticles.Columns.Add(links);
}
private void resizeColumns()
{
dgArticles.Columns[0].Width = 60;
dgArticles.Columns[1].Width = 60;
dgArticles.Columns[2].Width = 60;
dgArticles.Columns[3].Width = 60;
}
private void wss_EndParse(object sender, EventArgs e)
{
lblProgress.Visible = false;
prgBar.Visible = false;
pnlUser.Visible = true;
pnlGridMainFill.Visible = true;
this.Invalidate();
Application.DoEvents();
}
private void wss_StartParse(object sender, EventArgs e)
{
if (this.InvokeRequired)
{
this.Invoke(new EventHandler(delegate
{
lblProgress.Visible = true;
prgBar.Visible = true;
this.Invalidate();
Application.DoEvents();
}));
}
}
private void dgArticles_CellContentClick(object sender,
DataGridViewCellEventArgs e)
{
int LINK_COLUMN_INDEX = 5;
if (e.ColumnIndex == LINK_COLUMN_INDEX)
{
startProcess(@"http://www.codeproject.com" +
dgArticles[e.ColumnIndex, e.RowIndex].Value.ToString());
}
}
private void startProcess(string target)
{
if (null != target && (target.StartsWith("www") ||
target.StartsWith("http")))
{
try
{
System.Diagnostics.Process.Start(target);
}
catch (Exception ex)
{
Program.ErrorBox("Problem with starting process " + target);
}
}
}
private void frmMain_Load(object sender, EventArgs e)
{
pnlUser.Visible = false;
pnlGridMainFill.Visible = false;
BackgroundWorker bgw = new BackgroundWorker();
bgw.DoWork += new DoWorkEventHandler(bgw_DoWork);
bgw.RunWorkerAsync(Program.UserID);
}
private void lnkChangeUser_LinkClicked(object sender,
LinkLabelLinkClickedEventArgs e)
{
string stringEntered =
Program.InputBox("Enter a new user ID to examine",
"Enter a new user ID", "");
if (stringEntered.Equals(string.Empty))
{
Program.ErrorBox("You must enter a value for the userId");
}
else
{
long uId;
if (!long.TryParse(stringEntered, out uId))
{
Program.ErrorBox("The value you entered was not valid\r\n" +
"The user ID must be a number");
}
else if (uId > 0)
{
Program.UserID = uId;
BackgroundWorker bgw = new BackgroundWorker();
bgw.DoWork += new DoWorkEventHandler(bgw_DoWork);
bgw.RunWorkerAsync(Program.UserID);
}
else
{
Program.ErrorBox("User ID must be a positive value");
}
}
}
private void frmMain_FormClosed(object sender, FormClosedEventArgs e)
{
nfIcon.Visible = false;
Application.Exit();
}
private void lnkResults_LinkClicked(object sender,
LinkLabelLinkClickedEventArgs e)
{
frmPie fPie = new frmPie();
fPie.GridIsUse = dgArticles;
fPie.AuthorString = lblCurrentUser.Text;
this.Hide();
fPie.ShowDialog(this);
this.Show();
}
#endregion
}
#endregion
}
frmPie Class
This class is the pie chart display window (for the complete Designer listing, refer to the attached application). This form makes use of a third-party DLL by Julijan Sribar, which is available right here at CodeProject: pie library. Credit where credit is due: thanks Julijan, great work.
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Text;
using System.Windows.Forms;
using Microsoft.Win32;
using System.Drawing.PieChart;
namespace VainWebSpider
{
#region frmPie CLASS
public partial class frmPie : Form
{
#region Instance Fields
private DataGridView gridInUse;
private int CLUSTERED_THRESHOLD = 20;
#endregion
#region Constructor
public frmPie()
{
InitializeComponent();
}
#endregion
#region Public Properties
public DataGridView GridIsUse
{
set { gridInUse = value; }
}
public string AuthorString
{
set { lblCurrentUser.Text = value; }
}
#endregion
#region Private Methods
private void setupPie()
{
populatePieData();
pnlPie.Font = new Font("Arial", 8F);
pnlPie.ForeColor = SystemColors.WindowText;
pnlPie.EdgeColorType = EdgeColorType.DarkerThanSurface;
pnlPie.LeftMargin = 10F;
pnlPie.RightMargin = 10F;
pnlPie.TopMargin = 10F;
pnlPie.BottomMargin = 10F;
pnlPie.SliceRelativeHeight = 0.25F;
pnlPie.InitialAngle = -90F;
}
private void populatePieData()
{
switch (cmbViewData.SelectedItem.ToString().ToLower())
{
case "views" :
getGridData("views", 0);
break;
case "votes":
getGridData("votes", 2);
break;
case "popularity":
getGridData("popularity", 3);
break;
case "ratings":
getGridData("ratings", 1);
break;
default:
getGridData("views", 0);
break;
}
}
private void getGridData(string type, int column)
{
try
{
int qty = gridInUse.RowCount;
decimal[] results = new decimal[qty];
string[] pieToolTips = new string[qty];
string[] pieText = new string[qty];
float[] pieRelativeDisplacements = new float[qty];
Color[] pieColors = new Color[qty];
int alpha = 60;
Random rnd = new Random();
Color[] colorsAvailable = new Color[] { Color.FromArgb(alpha, Color.Red),
Color.FromArgb(alpha, Color.Green),
Color.FromArgb(alpha, Color.Yellow),
Color.FromArgb(alpha, Color.Blue),
Color.FromArgb(alpha, Color.CornflowerBlue),
Color.FromArgb(alpha, Color.Cyan),
Color.FromArgb(alpha, Color.DarkGreen),
Color.FromArgb(alpha, Color.PeachPuff),
Color.FromArgb(alpha, Color.Plum),
Color.FromArgb(alpha, Color.Peru) };
for (int i = 0; i < gridInUse.RowCount; i++)
{
pieToolTips[i] = "URL " + gridInUse[5, i].Value.ToString() + " " +
"Views " + gridInUse[0, i].Value.ToString() + " " +
"Rating " + gridInUse[1, i].Value.ToString() + " " +
"Votes " + gridInUse[2, i].Value.ToString() + " " +
"Popularity " + gridInUse[3, i].Value.ToString();
if (type.Equals("ratings"))
{
string val = gridInUse[column, i].Value.ToString();
int idx = val.LastIndexOf("/");
string sNewValue = val.Substring(0, idx);
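// Ratings come from the web page as "4.7/5", always with a dot
results[i] = decimal.Parse(sNewValue,
System.Globalization.CultureInfo.InvariantCulture);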
}
else
{
results[i] = decimal.Parse(gridInUse[column, i].Value.ToString());
}
if (gridInUse.RowCount < CLUSTERED_THRESHOLD)
{
pieText[i] = gridInUse[column, i].Value.ToString();
}
else
{
pieText[i] = " ";
}
pieRelativeDisplacements[i] = 0.1F;
// Random.Next's upper bound is exclusive, so pass Length to allow every colour
int idxColor = rnd.Next(0, colorsAvailable.Length);
pieColors[i] = colorsAvailable[idxColor];
}
pnlPie.ToolTips = pieToolTips;
pnlPie.Texts = pieText;
pnlPie.SliceRelativeDisplacements = pieRelativeDisplacements;
pnlPie.Colors = pieColors;
pnlPie.Values = results;
}
catch (Exception ex)
{
}
}
private void frmPie_Load(object sender, EventArgs e)
{
cmbViewData.SelectedIndex = 1;
setupPie();
}
private void cmbViewData_SelectedValueChanged(object sender, EventArgs e)
{
setupPie();
}
#endregion
}
#endregion
}
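In getGridData, a rating arrives as a string such as 4.7/5, so only the part before the slash is turned into a number for the pie. A small sketch of that conversion follows; RatingParser is a hypothetical name, and it assumes the site always writes the decimal point as a dot, hence the invariant culture.

```csharp
using System;
using System.Globalization;

// Sketch of turning a scraped rating such as "4.7/5" into a number for the
// pie chart, mirroring what getGridData does with LastIndexOf("/")
public static class RatingParser
{
    public static decimal ParseRating(string rating)
    {
        int slash = rating.LastIndexOf('/');
        // Keep only the score before the slash; accept a bare number too
        string score = slash >= 0 ? rating.Substring(0, slash) : rating;
        // Parse with the invariant culture rather than the user's settings
        return decimal.Parse(score, NumberStyles.Number,
            CultureInfo.InvariantCulture);
    }
}
```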
InputBoxDialog Class
This class is a simple input box.
using System;
using System.Drawing;
using System.Collections;
using System.ComponentModel;
using System.Windows.Forms;
namespace VainWebSpider
{
#region InputBoxDialog CLASS
public class InputBoxDialog : System.Windows.Forms.Form
{
#region Instance Fields
string formCaption = string.Empty;
string formPrompt = string.Empty;
string inputResponse = string.Empty;
string defaultValue = string.Empty;
private System.Windows.Forms.Label lblPrompt;
private System.Windows.Forms.Button btnOK;
private System.Windows.Forms.Button btnCancel;
private System.Windows.Forms.TextBox txtInput;
private System.ComponentModel.Container components = null;
#endregion
#region Constructor
public InputBoxDialog()
{
InitializeComponent();
}
#endregion
#region Windows Form Designer generated code
private void InitializeComponent()
{
this.lblPrompt = new System.Windows.Forms.Label();
this.btnOK = new System.Windows.Forms.Button();
this.btnCancel = new System.Windows.Forms.Button();
this.txtInput = new System.Windows.Forms.TextBox();
this.SuspendLayout();
this.lblPrompt.Anchor = (
(System.Windows.Forms.AnchorStyles)((((
System.Windows.Forms.AnchorStyles.Top |
System.Windows.Forms.AnchorStyles.Bottom)
| System.Windows.Forms.AnchorStyles.Left)
| System.Windows.Forms.AnchorStyles.Right)));
this.lblPrompt.BackColor = System.Drawing.SystemColors.Control;
this.lblPrompt.Font = new System.Drawing.Font("Microsoft Sans Serif",
8.25F, System.Drawing.FontStyle.Regular,
System.Drawing.GraphicsUnit.Point, ((byte)(0)));
this.lblPrompt.Location = new System.Drawing.Point(9, 35);
this.lblPrompt.Name = "lblPrompt";
this.lblPrompt.Size = new System.Drawing.Size(302, 22);
this.lblPrompt.TabIndex = 3;
this.btnOK.DialogResult = System.Windows.Forms.DialogResult.OK;
this.btnOK.Location = new System.Drawing.Point(265, 59);
this.btnOK.Name = "btnOK";
this.btnOK.Size = new System.Drawing.Size(60, 20);
this.btnOK.TabIndex = 1;
this.btnOK.Text = "Ok";
this.btnOK.Click += new System.EventHandler(this.btnOK_Click);
this.btnCancel.DialogResult = System.Windows.Forms.DialogResult.Cancel;
this.btnCancel.Location = new System.Drawing.Point(331, 59);
this.btnCancel.Name = "btnCancel";
this.btnCancel.Size = new System.Drawing.Size(60, 20);
this.btnCancel.TabIndex = 2;
this.btnCancel.Text = "Cancel";
this.btnCancel.Click += new System.EventHandler(this.btnCancel_Click);
this.txtInput.Location = new System.Drawing.Point(8, 59);
this.txtInput.MaxLength = 40;
this.txtInput.Name = "txtInput";
this.txtInput.Size = new System.Drawing.Size(251, 20);
this.txtInput.TabIndex = 0;
this.AutoScaleBaseSize = new System.Drawing.Size(5, 13);
this.ClientSize = new System.Drawing.Size(398, 103);
this.Controls.Add(this.txtInput);
this.Controls.Add(this.btnCancel);
this.Controls.Add(this.btnOK);
this.Controls.Add(this.lblPrompt);
this.FormBorderStyle = System.Windows.Forms.FormBorderStyle.FixedDialog;
this.KeyPreview = true;
this.MaximizeBox = false;
this.MinimizeBox = false;
this.Name = "InputBoxDialog";
this.StartPosition = System.Windows.Forms.FormStartPosition.CenterScreen;
this.Text = "InputBox";
this.KeyDown += new System.Windows.Forms.KeyEventHandler(
this.InputBoxDialog_KeyDown);
this.Load += new System.EventHandler(this.InputBox_Load);
this.ResumeLayout(false);
this.PerformLayout();
}
#region Dispose
protected override void Dispose(bool disposing)
{
if (disposing)
{
if (components != null)
{
components.Dispose();
}
}
base.Dispose(disposing);
}
#endregion
#endregion
#region Public Properties
public string FormCaption
{
get { return formCaption; }
set { formCaption = value; }
}
public string FormPrompt
{
get { return formPrompt; }
set { formPrompt = value; }
}
public string InputResponse
{
get { return inputResponse; }
set { inputResponse = value; }
}
public string DefaultValue
{
get { return defaultValue; }
set { defaultValue = value; }
}
#endregion
#region Form and Control Events
private void InputBox_Load(object sender, System.EventArgs e)
{
this.txtInput.Text = defaultValue;
this.lblPrompt.Text = formPrompt;
this.Text = formCaption;
this.txtInput.SelectionStart = 0;
this.txtInput.SelectionLength = this.txtInput.Text.Length;
this.txtInput.Focus();
}
private void btnOK_Click(object sender, System.EventArgs e)
{
InputResponse = this.txtInput.Text;
this.Close();
}
private void btnCancel_Click(object sender, System.EventArgs e)
{
this.Close();
}
private void InputBoxDialog_KeyDown(object sender, KeyEventArgs e)
{
if (e.KeyCode == Keys.Enter)
{
InputResponse = this.txtInput.Text;
this.Close();
}
}
#endregion
}
#endregion
}
Demonstration Screenshots
The first screen that is shown is the form (frmLoader), which looks as follows:
The VainWebSpider user can either select another CodeProject user to get articles about, or can simply proceed to the main interface using the links provided.
It can be seen in the screenshot above that a user is already configured for the application. This user ID is stored in the Registry, and the Registry is updated any time a new user ID is picked.
The VainWebSpider key and its values are stored under HKEY_LOCAL_MACHINE\SOFTWARE: a new key called "VainWebSpider" is created there, and the currently selected user ID is stored in it as the userId value. This allows the VainWebSpider application to know at start-up which user it used last time, or whether there was a previous user at all. The first time the application runs, there won't be any Registry key or values; they are created as soon as a user ID is selected.
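Before a new user ID is stored, both frmLoader and frmMain check that it is a positive whole number. The same check can be written with long.TryParse, which avoids relying on exceptions for ordinary bad input. This is only a sketch: UserIdInput.TryParseUserId is a hypothetical helper, and allowing surrounding spaces is my own assumption.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: accepts only a positive whole number as a user ID
public static class UserIdInput
{
    public static bool TryParseUserId(string text, out long userId)
    {
        // No sign is allowed, so "-5" fails to parse; "0" parses but is
        // rejected by the > 0 check
        return long.TryParse(text,
                   NumberStyles.AllowLeadingWhite | NumberStyles.AllowTrailingWhite,
                   CultureInfo.InvariantCulture, out userId)
               && userId > 0;
    }
}
```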
The main interface (frmMain), when loaded, looks as shown above: all the articles for the currently selected user are presented in a standard Windows Forms DataGridView, bound to the ADO.NET DataTable produced by WebScreenScraper. The user may sort these entries using any of the column headers, and may open an article by clicking on the hyperlink provided for it.
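The header-click sorting comes from the data binding. As far as I understand the Windows Forms binding model, a DataGridView bound to a DataTable displays the table's DefaultView, so the same ordering can also be applied in code by setting that view's Sort expression. A minimal sketch (SortDemo is a hypothetical name) that orders the results with the most viewed article first:

```csharp
using System;
using System.Data;

// Sketch: sort the scraped results in code through the table's DefaultView
public static class SortDemo
{
    public static DataView SortByViews(DataTable articles)
    {
        DataView view = articles.DefaultView;
        // Same syntax as an SQL ORDER BY clause: column name plus ASC/DESC
        view.Sort = "Views DESC";
        return view;
    }
}
```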
The main interface (frmMain) also provides a notify icon in the system tray that lets the VainWebSpider user hide or show the main interface, or exit the application entirely.
From the main interface (frmMain), the VainWebSpider user may also examine the web results using pie charts, courtesy of Julijan Sribar's great (award-winning, even) pie chart library, which is available here and which I simply had to find a use for. The VainWebSpider user may choose which result is shown in the pie chart, and the tooltips show all the web results for an article as the user hovers over its pie slice.
The VainWebSpider user may also select a new user from the main interface (frmMain) using the "Select a different user" link provided. The application reads the new entry through the input box (InputBoxDialog). If the value entered is a positive integer, the relevant web page is queried and the new data extracted. This is shown below for CodeProject user number 1, that is, Chris Maunder, the originator of CodeProject.
It can be seen that Chris Maunder has quite a few articles (102 when this article was published), so the pie chart does not include any text on the slices. frmPie only labels the slices when there are fewer than 20 articles (its CLUSTERED_THRESHOLD); beyond that, the pie simply becomes too cluttered to read.
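The rule frmPie applies is simple enough to sketch on its own: slice texts are shown only when there are fewer than CLUSTERED_THRESHOLD (20) articles, and otherwise each text is a single space. PieLabels is a hypothetical name used only for illustration.

```csharp
using System;
using System.Collections.Generic;

// Sketch of frmPie's labelling rule, pulled out of getGridData
public static class PieLabels
{
    public const int ClusteredThreshold = 20;

    public static string[] Build(IList<string> values)
    {
        var labels = new string[values.Count];
        bool showText = values.Count < ClusteredThreshold;
        for (int i = 0; i < values.Count; i++)
        {
            // A single space keeps the slice but hides its text
            labels[i] = showText ? values[i] : " ";
        }
        return labels;
    }
}
```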
What Do You Think?
That's it. I would just like to ask, if you liked the article, please vote for it.
Conclusion
I think this article shows just how easy it is to trawl a potentially very large amount of data and extract only what is required. On a personal level, I am fairly happy with how it all works, and will probably use it, as it's quicker than firing up Firefox, going to my articles and examining them, and it also shows the results in nice pie charts (again, thanks to Julijan Sribar for his great, award-winning pie chart library, which is available here).
Bugs
None that I know of.
History
- v1.0: 22/12/06: Initial issue.