Download CPAM3.zip - 807KB (last updated 05/11/2011)
Introduction
This article describes an assembly I've developed to help me keep a watchful eye over my articles, tips, and reputation points. Since its primary discovery mechanism involves the act of "scraping" the CodeProject web site for data, its day-to-day viability is subject to the whims of the CodeProject gods, and those whims they are a changin'.
Back in 2008, I wrote an article that described a program that performed essentially the same process - scraping the CodeProject web site in order that someone might retrieve the current vote status of their articles. That version of the code has been made significantly outdated by CodeProject's continuing evolution, and the code needed to scrape relevant data from the site is so significantly different that I chose to write another article instead of performing a massive edit on the earlier one. I also wanted to give people the opportunity to compare code between the two versions if they were so inclined.
CPAMLib - General Architecture
This library of code will allow you to scrape the articles, blogs, and tips, as well as keep track of your reputation scores (support for blogs, tips, and reputation points are the new items in this version of the code). The scraping code itself is multi-threaded and posts progress events (as you'll see in the sample app described in Part 2 of this article series).
In essence, this library maintains a collection of Articles and a collection of reputation objects, as well as tracking changes to the objects in those lists. Once scraped, the articles and reputations are persisted into an XML file. This way, you can track changes on application startup (if you wish to do so).
To make things easy on the WPF guys, all of the collections are derived from ObservableCollection. Beyond that, however, there is no direct support for WPF, but that shouldn't prevent you from writing a WPF application that uses this library. In all actuality, I started to do it, but got more interested in seeing some results, so I kind of abandoned the WPF project. I'll talk more about that in Part 2.
Finally, I'm not going to say any of this stuff is really very clever, but it does work, and most of the time, that's really all that counts.
The Article Class
This class represents one of three items - an article, a tip/trick, or a blog entry. Its task is simply to contain the properties for those items, and determine if those property values have changed since the prior scrape.
Initialization
Although it wasn't really necessary, I provided five overloads for the constructor. One accepts no parameters (and really shouldn't be used, so its access level is private), one accepts an XElement parameter (for use when loading the properties from an XML data file), and the other three are identical except for the last parameter, which allows you to set the group the article object belongs to (article, blog, or tip). This parameter can be specified as the appropriate enum, an integer representing the ordinal value of the enum, or as a string that represents the enum item's name.
private Article()
public Article(XElement value)
public Article(string title, string desc, string url, DateTime posted, DateTime updated,
int votes, int views, int bookmarks, decimal rating, decimal popularity,
int downloads, ItemGroup group)
public Article(string title, string desc, string url, DateTime posted, DateTime updated,
int votes, int views, int bookmarks, decimal rating, decimal popularity,
int downloads, int group)
public Article(string title, string desc, string url, DateTime posted, DateTime updated,
int votes, int views, int bookmarks, decimal rating, decimal popularity,
int downloads, string group)
Since all of the overloads do exactly the same thing, they all call the InitObject method, which sets properties using the specified parameters.
private void InitObject(string title, string desc, string url, DateTime posted, DateTime updated,
int votes, int views, int bookmarks, decimal rating, decimal popularity,
int downloads, ItemGroup group)
{
this.RecentChanges = new ChangesDictionary();
this.Title = title;
this.Description = desc;
this.Url = url;
this.DatePosted = posted;
this.LastUpdated = updated;
this.Votes = votes;
this.Views = views;
this.Bookmarks = bookmarks;
this.Rating = rating;
this.Popularity = popularity;
this.Group = group;
this.Downloads = downloads;
this.TimeUpdated = new DateTime(0);
}
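The int and string overloads have to turn their last argument into an ItemGroup before they can call the common initializer. The article doesn't show that conversion, so what follows is only a minimal sketch of one way it could be done. The ItemGroupParser class and its method names are hypothetical, and the enum is a stand-in whose member order is an assumption.

```csharp
using System;

// Hypothetical stand-in for the library's ItemGroup enum; the real member
// order and values are not shown in the article.
public enum ItemGroup { Articles, Blogs, Tips }

// Hypothetical helper, not part of CPAMLib.
public static class ItemGroupParser
{
    // Converts an ordinal value, rejecting numbers that match no enum member.
    public static ItemGroup FromInt(int value)
    {
        if (!Enum.IsDefined(typeof(ItemGroup), value))
        {
            throw new ArgumentOutOfRangeException(nameof(value), value, "Unknown item group.");
        }
        return (ItemGroup)value;
    }

    // Converts a member name (case-insensitive). Enum.TryParse also accepts
    // numeric strings such as "7", so the result is checked with IsDefined.
    public static ItemGroup FromName(string name)
    {
        if (Enum.TryParse(name, true, out ItemGroup group) &&
            Enum.IsDefined(typeof(ItemGroup), group))
        {
            return group;
        }
        throw new ArgumentException("Unknown item group: " + name, nameof(name));
    }
}
```

Validating up front means a bad value in the XML data file fails loudly at load time instead of producing an article that belongs to no group.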
Tracking Property Values
All trackable properties (rating, views, popularity, etc.) have their current value as well as their prior value (if any) represented. This allows you to illustrate value changes in your application. Because these values are being tracked, we can also determine which of the articles in a given group have the highest rating or popularity, or the highest value for any of the other tracked properties. When the article is scraped, the values are updated with the ApplyChanges method:
public void ApplyChanges(Article incoming, DateTime updateTime)
{
if (this.Url.ToLower() == incoming.Url.ToLower())
{
RecentChanges.AddUpdate(DataItem.Bookmarks, ChangedValue(this.Bookmarks, incoming.Bookmarks));
RecentChanges.AddUpdate(DataItem.Description, ChangedValue(this.Description, incoming.Description));
RecentChanges.AddUpdate(DataItem.Downloads, ChangedValue(this.Downloads, incoming.Downloads));
RecentChanges.AddUpdate(DataItem.LastUpdated, ChangedValue(this.LastUpdated, incoming.LastUpdated));
RecentChanges.AddUpdate(DataItem.Popularity, ChangedValue(this.Popularity, incoming.Popularity));
RecentChanges.AddUpdate(DataItem.Rating, ChangedValue(this.Rating, incoming.Rating));
RecentChanges.AddUpdate(DataItem.Votes, ChangedValue(this.Votes, incoming.Votes));
RecentChanges.AddUpdate(DataItem.Views, ChangedValue(this.Views, incoming.Views, this.ViewChangeThreshold));
RecentChanges.AddUpdate(DataItem.Title, ChangedValue(this.Title, incoming.Title));
this.Downloads = incoming.Downloads;
this.Bookmarks = incoming.Bookmarks;
this.LastUpdated = incoming.LastUpdated;
this.Popularity = incoming.Popularity;
this.Rating = incoming.Rating;
this.Votes = incoming.Votes;
this.Views = incoming.Views;
this.Title = incoming.Title;
this.Description = incoming.Description;
this.TimeUpdated = updateTime;
}
}
The RecentChanges property represents a Dictionary collection which allows us to maintain any trackable property that we might want to track. I used a Dictionary because in its simplest form, the data was representable as a key/value pair (the property and its value).
Since I wanted to make the library usable by a WPF application, I used a class I found on the internet called ObservableDictionary. Please see the section below that talks about code I used that I didn't write.
The previous method makes a call to the ChangedValue method for each tracked property so that it can be determined whether or not a value has changed, and in what direction. This info is used by the UI to determine what to display and how to display it after an update scrape.
private ChangeType ChangedValue(int currentValue, int newValue, int threshold)
{
ChangeType changeType = ChangeType.None;
int changeAmount = newValue - currentValue;
if (changeAmount >= threshold)
{
changeType = ChangeType.Up;
}
else if (changeAmount < 0 && Math.Abs(changeAmount) > threshold)
{
changeType = ChangeType.Down;
}
return changeType;
}
private ChangeType ChangedValue(int currentValue, int newValue)
{
return DetermineChangeType(currentValue.CompareTo(newValue));
}
private ChangeType ChangedValue(DateTime currentValue, DateTime newValue)
{
return DetermineChangeType(currentValue.CompareTo(newValue));
}
private ChangeType ChangedValue(decimal currentValue, decimal newValue)
{
return DetermineChangeType(currentValue.CompareTo(newValue));
}
private ChangeType ChangedValue(string currentValue, string newValue)
{
ChangeType changeType = ChangeType.None;
if (newValue != currentValue)
{
changeType = ChangeType.Changed;
}
return changeType;
}
private ChangeType DetermineChangeType(int compareResult)
{
ChangeType changeType = ChangeType.None;
switch (compareResult)
{
case -1 : changeType = ChangeType.Up; break;
case 0 : changeType = ChangeType.None; break;
case 1 : changeType = ChangeType.Down; break;
}
return changeType;
}
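The three CompareTo-based overloads above differ only in their parameter type, so they could be folded into one generic method. This is a sketch under assumptions rather than the library's code: ChangeDetector is a hypothetical name and the ChangeType enum here only mirrors the members used in the article. It also tests the sign of the comparison instead of matching -1 and 1 exactly, because CompareTo is only documented to return a negative number, zero, or a positive number.

```csharp
using System;

// Mirror of the members the article uses; the real enum may differ.
public enum ChangeType { None, Up, Down, Changed }

// Hypothetical helper, not part of CPAMLib.
public static class ChangeDetector
{
    // Works for int, decimal, DateTime, and any other IComparable<T>.
    public static ChangeType Compare<T>(T currentValue, T newValue)
        where T : IComparable<T>
    {
        int result = currentValue.CompareTo(newValue);
        if (result < 0)
        {
            return ChangeType.Up;     // current is smaller, so the value rose
        }
        if (result > 0)
        {
            return ChangeType.Down;   // current is larger, so the value fell
        }
        return ChangeType.None;
    }
}
```

The threshold overload and the string overload would stay separate, since they answer slightly different questions.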
To support the UI, the following method determines if the specified item has a new value.
public bool GetFieldChanged(DataItem dataItem)
{
ChangeType changeType = ChangeType.None;
try
{
if (RecentChanges.ContainsKey(dataItem))
{
changeType = RecentChanges[dataItem];
}
else
{
RecentChanges.AddUpdate(dataItem, ChangeType.None);
changeType = ChangeType.None;
}
}
catch
{
throw new Exception (string.Format("DataItem {0} is invalid.", dataItem.ToString()));
}
return (changeType != ChangeType.None);
}
As you can see, the article object simply contains the data we'll be interested in from the UI point of view.
The ArticleManager Class
This class represents a list of article objects. It is responsible for saving them to and loading them from a file on the local disk. It is also responsible for scraping the info from CodeProject.
Initialization is a straightforward affair. First the constructor, which calls the discrete initialization methods:
public ArticleManager()
{
Clear();
InitReputationList();
InitAverages();
InitScraperURLs();
this.SaveScrapeResults = true;
this.ScrapePosted = true;
}
The initialization methods simply prepare the object for work. It's really not very interesting, but I did leave the method comments in the code block below so you can see what's what.
private void InitScraperURLs()
{
if (this.m_scraperURLs == null)
{
this.m_scraperURLs = new Dictionary<ScraperType, string>();
this.m_scraperURLs.Add(ScraperType.MyArticles, "http://www.codeproject.com/script/Articles/MemberArticles.aspx?amid={0}");
this.m_scraperURLs.Add(ScraperType.User, "http://www.codeproject.com/script/Membership/View.aspx?mid={0}");
}
}
private void InitAverages()
{
if (Averages == null)
{
Averages = new AveragesDictionary();
Averages.Add("ArticlesRating", 0M);
Averages.Add("ArticlesPopularity", 0M);
Averages.Add("TipsRating", 0M);
Averages.Add("TipsPopularity", 0M);
Averages.Add("BlogsRating", 0M);
Averages.Add("BlogsPopularity", 0M);
Averages.Add("OverallRating", 0M);
Averages.Add("OverallPopularity", 0M);
}
}
private void InitReputationList()
{
if (this.Reputations == null)
{
this.Reputations = new ReputationList();
}
}
The actual act of scraping CodeProject is run in a thread so it can be interrupted. To that end, I set up a thread method. I was having problems with CodeProject taking too long to respond and failing to retrieve info as a result, and this spurred me into implementing the automatic retries as well as the HtmlAgilityPack code.
public void ScrapeWebPage(string userID)
{
this.m_maxConnectTries = (this.AutoRefresh) ? 10 : 3;
this.UserID = userID;
if (this.ScraperThread != null)
{
if (this.ScraperThread.ThreadState == System.Threading.ThreadState.Running)
{
try
{
this.ScraperThread.Abort();
}
catch (Exception)
{
}
}
this.ScraperThread = null;
}
this.ScraperThread = new Thread(new ThreadStart(ScrapeArticles));
this.ScraperThread.Start();
}
private void ScrapeArticles()
{
bool alreadyHasData = (this.Count > 0);
this.LastScrapeResult = ScrapeResult.Fail;
if (!ValidUserID())
{
return;
}
this.m_validUser = true;
GetUserInfo(true);
GetArticles();
RaiseEventThreadComplete();
}
public void Abort()
{
if (this.ScraperThread != null && this.ScraperThread.ThreadState == System.Threading.ThreadState.Running)
{
this.ScraperThread.Abort();
}
}
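One thing worth knowing about the code above: Thread.Abort is not supported on .NET Core and .NET 5 or later, where calling it throws PlatformNotSupportedException. If you ever port the library, cooperative cancellation is the usual replacement. The following is a minimal sketch of that idea, not the library's implementation. CancellableScraper and its members are hypothetical, the Thread.Sleep call stands in for downloading a page, and Task.Run needs .NET 4.5 or later.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical class, not part of CPAMLib.
public sealed class CancellableScraper
{
    private CancellationTokenSource _cts;
    private Task _worker;

    // Number of simulated pages finished so far.
    public int PagesScraped { get; private set; }

    public void Start(int pageCount)
    {
        Stop();                       // cancel any scrape that is still running
        PagesScraped = 0;
        _cts = new CancellationTokenSource();
        CancellationToken token = _cts.Token;
        _worker = Task.Run(() => Scrape(pageCount, token));
    }

    public void Stop()
    {
        if (_cts == null)
        {
            return;
        }
        _cts.Cancel();
        _worker.Wait();               // returns once the loop has seen the request
        _cts.Dispose();
        _cts = null;
    }

    private void Scrape(int pageCount, CancellationToken token)
    {
        for (int page = 0; page < pageCount; page++)
        {
            if (token.IsCancellationRequested)
            {
                return;               // leave cleanly between pages
            }
            Thread.Sleep(10);         // placeholder for downloading and parsing a page
            PagesScraped++;
        }
    }
}
```

The worker only stops between pages, so a slow download still has to finish or time out first. That is the price of never tearing a thread down in the middle of an operation.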
Parsing the data is a drawn out affair. Here's an overview of how it goes:
- The user's profile page is scraped to make sure the ID is valid. If it isn't, all further scraping is aborted.
- User info (reputation scores) is retrieved.
- Articles, blogs and tips are retrieved. Due to the nature of the page layout, all of this info is scraped at one time, but the info is parsed for each group separately.
- If necessary, the article original post dates are scraped from each individual article page. This is only done if this is the first time the article appears in the list. One item of interest is that Chris has offered to start including the posted dates on the My Articles page with all of the other info. I've modified the code to take advantage of this when/if it actually happens, and until then, the previously described method will continue to work.
- Article group averages are calculated (for use by the UI if desired).
- Article high values are calculated (for use by the UI if desired).
- During the processing, periodic progress events are posted so that the UI can update itself to reflect what's happening.
Due to the breadth of the code necessary to parse scraped info, I decided it would be best to refer you to the code itself.
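The automatic retries mentioned earlier (3 tries normally, 10 when auto-refresh is on) boil down to a small loop. The real retry code lives in the download, so this is only a sketch of the general pattern. RetryHelper and Execute are hypothetical names, and catching every exception type is a simplification; real code would more likely retry only on network errors.

```csharp
using System;
using System.Threading;

// Hypothetical helper, not part of CPAMLib.
public static class RetryHelper
{
    // Runs the action until it succeeds or maxTries attempts have failed.
    // The exception from the final attempt is allowed to propagate.
    public static T Execute<T>(Func<T> action, int maxTries, TimeSpan delay)
    {
        if (maxTries < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(maxTries));
        }
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return action();
            }
            catch (Exception) when (attempt < maxTries)
            {
                Thread.Sleep(delay);  // give the server a moment before retrying
            }
        }
    }
}
```

A call might look like `RetryHelper.Execute(() => DownloadPage(url), 3, TimeSpan.FromSeconds(2))`, where DownloadPage is whatever method fetches the HTML.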
The ReputationCategory Class
This class represents a reputation category, such as Author, Editor, etc. This item is responsible for determining its status (platinum, gold, etc.), and providing the appropriate display color (for the UI). It does this by maintaining status levels in a dictionary collection, and manipulating that collection:
private void SetCurrentStatus()
{
if (this.StatusLevels == null)
{
BuildStatusLevels();
}
this.Status = (this.Points < StatusLevels[ReputationLevel.Platinum]) ? ReputationLevel.Gold : ReputationLevel.Platinum;
this.Status = (this.Points < StatusLevels[ReputationLevel.Gold]) ? ReputationLevel.Silver : this.Status;
this.Status = (this.Points < StatusLevels[ReputationLevel.Silver]) ? ReputationLevel.Bronze : this.Status;
this.Status = (this.Points < StatusLevels[ReputationLevel.Bronze]) ? ReputationLevel.None : this.Status;
}
private void BuildStatusLevels()
{
if (this.StatusLevels == null)
{
int categoryType = 1;
switch (this.Category)
{
case ReputationCategoryType.Author:
case ReputationCategoryType.Authority:
categoryType = 2;
break;
}
this.StatusLevels = new ReputationStatusLevels();
this.StatusLevels.Add(ReputationLevel.None, 0);
this.StatusLevels.Add(ReputationLevel.Bronze, (categoryType == 1) ? 100 : 50);
this.StatusLevels.Add(ReputationLevel.Silver, (categoryType == 1) ? 500 : 1000);
this.StatusLevels.Add(ReputationLevel.Gold, (categoryType == 1) ? 1500 : 5000);
this.StatusLevels.Add(ReputationLevel.Platinum, (categoryType == 1) ? 2500 : 10000);
}
}
public string GetStatusColorForBrowser()
{
string color;
switch (this.Status)
{
case ReputationLevel.Bronze : color = "#F4A460"; break;
case ReputationLevel.Silver : color = "#D3D3D3"; break;
case ReputationLevel.Gold : color = "#FFD700"; break;
case ReputationLevel.Platinum : color = "#ADD8E6"; break;
default : color = "#FFFFFF"; break;
}
return color;
}
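The chain of ternaries in SetCurrentStatus amounts to "pick the highest level whose threshold the points have reached". Here is a sketch of the same lookup written as a loop, assuming the thresholds rise from Bronze to Platinum as they do in BuildStatusLevels. StatusLookup is a hypothetical name and the enum is a local mirror of the one in the article.

```csharp
using System;
using System.Collections.Generic;

// Mirror of the levels named in the article.
public enum ReputationLevel { None, Bronze, Silver, Gold, Platinum }

// Hypothetical helper, not part of CPAMLib.
public static class StatusLookup
{
    private static readonly ReputationLevel[] HighestFirst =
    {
        ReputationLevel.Platinum, ReputationLevel.Gold,
        ReputationLevel.Silver, ReputationLevel.Bronze
    };

    // Returns the highest level whose threshold is at or below the points.
    public static ReputationLevel FromPoints(
        int points, IDictionary<ReputationLevel, int> thresholds)
    {
        foreach (ReputationLevel level in HighestFirst)
        {
            if (points >= thresholds[level])
            {
                return level;
            }
        }
        return ReputationLevel.None;
    }
}
```

With the Author thresholds shown above (50, 1000, 5000, 10000), 1200 points comes out as Silver and 49 points as None.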
The ReputationList Class
This class allows us to add or update current values for each status item. In all actuality, this class is pretty boring and contains few methods that would bear any real scrutiny in this article, but in the interest of completeness, here's the code.
Here are the obligatory overloaded AddOrUpdate methods which actually add the reputation category to the collection.
public void AddOrUpdate(ReputationCategoryType categoryType, int points)
{
ReputationCategory item = Find(categoryType);
if (item == null)
{
item = new ReputationCategory(categoryType, points);
this.Add(item);
}
else
{
item.Points = points;
}
}
public void AddOrUpdate(XElement element)
{
ReputationCategoryType categoryType = Globals.StringToEnum(element.GetValue("Name",
ReputationCategoryType.Unknown.ToString()),
ReputationCategoryType.Unknown);
int points = Convert.ToInt32(element.GetValue("Points", "0"));
AddOrUpdate(categoryType, points);
}
This method lets us find (and return) a category in the list:
public ReputationCategory Find(ReputationCategoryType categoryType)
{
foreach (ReputationCategory item in this)
{
if (item.Category == categoryType)
{
return item;
}
}
return null;
}
And this method allows us to total the points for all categories.
public void SetTotalPoints()
{
int total = 0;
foreach(ReputationCategory category in this)
{
total += category.Points;
}
this.TotalPoints = total;
}
The ChangesDictionary Class
I added a method to this class that allows me to add new objects to, *or* update existing objects in the collection. Here's the method:
public void AddUpdate(DataItem dataItem, ChangeType changeType)
{
if (this.ContainsKey(dataItem))
{
this[dataItem] = changeType;
}
else
{
this.Add(dataItem, changeType);
}
}
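For a plain Dictionary, the indexer setter already adds the key when it is missing and overwrites the value when it is present, so the ContainsKey test can be dropped. The sketch below shows that on a class derived from Dictionary. It is an illustration only: the real ChangesDictionary sits on top of ObservableDictionary, whose indexer I can't vouch for, and the class name and enums here are stand-ins.

```csharp
using System.Collections.Generic;

// Stand-ins with just enough members for the example.
public enum DataItem { Votes, Views }
public enum ChangeType { None, Up, Down, Changed }

// Hypothetical class; it derives from Dictionary, not ObservableDictionary.
public class ChangesDictionarySketch : Dictionary<DataItem, ChangeType>
{
    public void AddUpdate(DataItem dataItem, ChangeType changeType)
    {
        // The Dictionary indexer setter adds or replaces in one step.
        this[dataItem] = changeType;
    }
}
```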
Other Points of Interest
I hate typing, especially pointy brackets. For that reason, I almost always create a class derived from the appropriate collection type. This serves a double purpose of giving me a place to extend the collection to support the specific object type contained in the collection. This keeps me from having to cast my ass off (a pretty inefficient operation in .NET). Many times, there is no need to add any functionality, but that's okay - I think it's much easier to read the code when you're referring to something like this:
ChangesDictionary m_changesDictionary;
rather than this:
Dictionary<string, ChangeType> m_changesDictionary;
The ExtensionMethods class
I like extension methods - they let you extend any class that comes with the .NET framework. In my case, I extended the XElement and ObservableCollection classes. First, I wanted to make it simpler to get values from an XElement without having all that ugly code in the code that's closer to the programmer. Since it's kind of self-defeating to simply process an exception when something doesn't happen as expected, I came up with this extension method for getting the value of the element and returning the specified default if the child element doesn't exist. I have additional code (from another project) that overloads this method for various types other than a string object, but I didn't need them in this project, so they aren't included. If you want them, just ask.
public static string GetValue(this XElement root, string name, string defaultValue)
{
return (string)root.Elements(name).FirstOrDefault() ?? defaultValue;
}
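The typed overloads mentioned above aren't in the download, so here is a guess at what an int version might look like. It is a sketch, not the author's code: the XElementExtensionsSketch class name is made up, and the choice to fall back to the default when the text isn't a valid integer is an assumption.

```csharp
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

// Hypothetical class name; the string overload repeats the one shown above.
public static class XElementExtensionsSketch
{
    public static string GetValue(this XElement root, string name, string defaultValue)
    {
        return (string)root.Elements(name).FirstOrDefault() ?? defaultValue;
    }

    // Assumed int overload: returns the default when the child element is
    // missing or when its text cannot be parsed as an integer.
    public static int GetValue(this XElement root, string name, int defaultValue)
    {
        string text = (string)root.Elements(name).FirstOrDefault();
        int parsed;
        return int.TryParse(text, NumberStyles.Integer, CultureInfo.InvariantCulture, out parsed)
            ? parsed
            : defaultValue;
    }
}
```

With this in place, `element.GetValue("Points", 0)` would do the job of the `Convert.ToInt32(element.GetValue("Points", "0"))` call in ReputationList without throwing on bad data.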
As far as ObservableCollection is concerned, I needed to be able to sort it, but it has no functionality for sorting, so I had to improvise. A quick web search resulted in the following code being found on the "SDN forums". I tried to find the code again, but I couldn't, so you'll just have to trust me that I didn't come up with the code, but I did find it elsewhere. One caveat is that the items in the collection must implement IComparable.
public static void Sort<T>(this ObservableCollection<T> collection) where T : IComparable
{
List<T> sorted = collection.OrderBy(x => x).ToList();
for (int i = 0; i < sorted.Count(); i++)
{
collection.Move(collection.IndexOf(sorted[i]), i);
}
}
public static void Sort<T>(this ObservableCollection<T> collection, GenericComparer<T> comparer) where T : IComparable
{
List<T> sorted = collection.ToList();
sorted.Sort(comparer);
for (int i = 0; i < sorted.Count(); i++)
{
collection.Move(collection.IndexOf(sorted[i]), i);
}
}
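One wrinkle with the Move/IndexOf approach: IndexOf always returns the first match, so when the collection holds items that compare as equal, an item that was already placed can get moved again. Starting from 3, 1, 2, 1, for example, the loop ends with 1, 2, 1, 3. The variant below is a sketch of one way around that, searching only the part of the collection that hasn't been placed yet. SortInPlace and the class name are hypothetical, and the constraint is changed to IComparable<T> for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Linq;

// Hypothetical class, not part of CPAMLib.
public static class ObservableCollectionSortSketch
{
    public static void SortInPlace<T>(this ObservableCollection<T> collection)
        where T : IComparable<T>
    {
        List<T> sorted = collection.OrderBy(x => x).ToList();
        EqualityComparer<T> equality = EqualityComparer<T>.Default;

        for (int target = 0; target < sorted.Count; target++)
        {
            // Everything before 'target' is already in its final position,
            // so look for the next item in the remaining tail only.
            int current = target;
            while (!equality.Equals(collection[current], sorted[target]))
            {
                current++;
            }
            if (current != target)
            {
                collection.Move(current, target);
            }
        }
    }
}
```

Using Move rather than clearing and refilling keeps the change notifications small, which matters if a UI is bound to the collection.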
Other Code Not of My Own Design
I'm lazy. I don't want to have to write any more code than necessary, so I often scour the web for code that might already have been implemented. If it works for me, I use it. This project is no different, and here are the pieces I found somewhere else:
The HtmlAgilityPack Library
I earlier discussed the HtmlAgilityPack assembly and the ObservableDictionary class. In the case of the HtmlAgilityPack, I only included the compiled DLL. Its file date is Feb 2010, so you might want to see if there's an update to it. The version I supply with my code works with my code, and I have no personal desire to make sure I have the latest and greatest.
As of 12/2010, you could find the latest code here:
HtmlAgilityPack on CodePlex
The GenericComparer Class
I honestly don't recall where I found this, but when I did a Google search for "GenericComparer", I got back over 6500 results. If you want to find the nearest version to satisfy your curiosity, be my guest. I'm only including this text so that nobody can accuse me of claiming something as mine that is not, in fact, my own invention.
The ObservableDictionary Classes
In a desire to be as compatible with WPF as I could possibly be, I searched for and found the ObservableDictionary class. I *think* it came from Dr. WPF's blog. Here's the URL to the applicable blog entry: Dr. WPF - Can I bind my ItemsControl to a dictionary?
My project contains a folder with this code in it, as well as a text file containing some observations and notes about the class. Keep in mind that this class generates warnings in VS2010 that I haven't bothered to address (once again, I remind you that I'm lazy, and since nothing bad appears to be happening as a result of the warnings, I've chosen to ignore them).
Final Comments About CPAMLib
This library is currently up-to-date (as of 10 December 2010) with all of the latest format and content changes made to the user profile page, articles/tips/blogs list page, and the individual article pages themselves. I don't personally have any blog entries, so I admit to a gratuitous amount of laziness where testing this aspect of the scraping is concerned. According to theory, it *should* work, but I haven't extensively tested it in that area.
I also put in support for a change that I *think* the CP dudes are going to implement regarding the original posted date of articles, tips, and blogs. Even if they don't do it, the code will still work fine (and can still scrape the article posted date, albeit much slower).
The Sample Application
To keep things simple, I decided to go with a WinForms application. This allowed me to borrow heavily from the original application in terms of layout, controls, and content. After all, there's really no point in completely re-inventing the wheel, and besides, it's REALLY hard to improve on perfection (grin). In other words, the old version of this app was essentially a pig beggin' for some lipstick and rouge.
The program grabs data scraped from the CodeProject web site (via the CPAMLib assembly) and displays it in a WebBrowser control. It also keeps track of changes, lets you sort the displayed data on the various properties, and can auto-scrape the web site every hour. I don't personally let it run long enough to auto-scrape, but someone else may want that feature, so there it is.
Visual Presentation
If you're familiar with the original CPAM application, this one should look familiar.
There are three primary areas of the window:
0) The Settings panel - this panel allows the user to select sorting options, what is/isn't displayed, and control the actual scraping of the data.
1) The Averages panel - This panel shows current scores of articles, tips, and blogs, as well as average popularity of those groups of items.
2) The Main panel - this panel shows the list of articles, tips, and/or blogs, along with appropriate status and statistic information for each individual item.
By default, the application is configured to show articles, blogs and tips, arranged in their individual groups, showing all of those items for the selected userID, and sorting them in descending order by rating. It is also initially configured to show the user's reputation scores.
Since all the truly dirty stuff is hidden inside the CPAM3Lib assembly, all we have to do is start the scrape process, wait for it to finish, and then display the results.
Data Members
The following data members are needed. As you can see, there isn't a lot to manage:
#region Data Members
private BackgroundWorker m_refreshWorker = new BackgroundWorker();
private bool m_hasNavigateMsgHandler = false;
#endregion Data Members
#region Custom Events and Delegates
private enum ScrapeEvents { Progress, Fail, Complete, Start};
private event TimeToGoEventHandler TimeToGo = delegate {};
private delegate void DelegateUpdateForm(ScrapeEvents scrapeEvent, ScrapeEventArgs e);
private delegate void DelegateUpdateStatusStripResult();
private delegate void DelegateUpdateStatusStripProgress(ScrapeEventArgs e);
private delegate void DelegateUpdateStatusStripTimeToGo(TimeToGoArgs e);
#endregion Custom Events and Delegates
Initialization
The constructor performs some necessary duties, such as initializing the article manager object (declared as a static data member of the Globals class), reading historic data from the data file, hooking into the article manager's event pump, and initializing the controls on the form.
public Form1()
{
InitializeComponent();
Globals.CreateAppDataFolder("CPAM3");
Globals.Manager.AppDataPath = Globals.AppDataFolder;
Globals.Manager.LoadData();
this.comboBoxSortCategory.SelectedIndex = this.comboBoxSortCategory.FindStringExact("Rating");
InitListView();
InitRefreshWorker();
Globals.Manager.ScrapeComplete += new ScraperEventHandler(articleManager_ScrapeComplete);
Globals.Manager.ScrapeProgress += new ScraperEventHandler(articleManager_ScrapeProgress);
Globals.Manager.ScrapeFail += new ScraperEventHandler(articleManager_ScrapeFail);
this.TimeToGo += new TimeToGoEventHandler(Form1_TimeToGo);
InitFormControls();
}
private void InitFormControls()
{
this.textboxUserID.Text = CPAM3Browser.Settings.Default.UserID;
this.checkBoxShowArticles.Checked = CPAM3Browser.Settings.Default.ShowArticles;
this.checkBoxShowTips.Checked = CPAM3Browser.Settings.Default.ShowTips;
this.checkBoxShowBlogs.Checked = CPAM3Browser.Settings.Default.ShowBlogs;
this.checkBoxShowInGroups.Checked = CPAM3Browser.Settings.Default.ShowInGroups;
this.checkNewInfo.Checked = CPAM3Browser.Settings.Default.ShowChangesOnly;
this.checkShowIcons.Checked = CPAM3Browser.Settings.Default.ShowIcons;
this.checkShowIconLegend.Checked = CPAM3Browser.Settings.Default.ShowIconLegend;
this.checkShowReputation.Checked = CPAM3Browser.Settings.Default.ShowReputation;
this.checkboxSortDescending.Checked = CPAM3Browser.Settings.Default.SortDescending;
this.checkAutoRefresh.Checked = (string.IsNullOrEmpty(this.textboxUserID.Text))
? false
: CPAM3Browser.Settings.Default.AutoRefresh;
this.comboBoxSortCategory.SelectedIndex = CPAM3Browser.Settings.Default.SortColumn;
}
Automatic Refresh
You have the option of having the application automatically refresh the results every 60 minutes. If this is turned on, a background worker object sits and spins, kicking off a refresh at the top of every hour. I've hooked all of the events for the background worker, but as you can see below, only DoWork does any real work at the moment; the ProgressChanged and RunWorkerCompleted handlers are empty.
private void InitRefreshWorker()
{
this.m_refreshWorker.WorkerReportsProgress = true;
this.m_refreshWorker.WorkerSupportsCancellation = true;
this.m_refreshWorker.RunWorkerCompleted += new RunWorkerCompletedEventHandler(refreshWorker_RunWorkerCompleted);
this.m_refreshWorker.ProgressChanged += new ProgressChangedEventHandler(refreshWorker_ProgressChanged);
this.m_refreshWorker.DoWork += new DoWorkEventHandler(refreshWorker_DoWork);
}
void refreshWorker_DoWork(object sender, DoWorkEventArgs e)
{
BackgroundWorker worker = sender as BackgroundWorker;
DateTime now = DateTime.Now;
DateTime nextTime = new DateTime(0);
int interval = 1000;
int updateMinutes = 60;
do
{
if (!worker.CancellationPending && now >= nextTime)
{
DelegateUpdateForm method = new DelegateUpdateForm(UpdateFormControls);
Invoke(method, ScrapeEvents.Start, new ScrapeEventArgs(""));
Globals.Manager.ScrapeWebPage(this.textboxUserID.Text);
if (nextTime.Ticks == 0)
{
TimeSpan span = new TimeSpan(0, updateMinutes - now.Minute, 0);
if (updateMinutes > 5)
{
nextTime = now.AddMinutes((span.Minutes < 5)
? updateMinutes + span.Minutes
: span.Minutes);
}
else
{
nextTime = now.AddMinutes(span.Minutes);
}
}
else
{
nextTime = now.AddMinutes(updateMinutes);
}
}
if (!worker.CancellationPending)
{
RaiseEventTimeToGo(nextTime);
Thread.Sleep(interval);
now = DateTime.Now;
}
} while (!worker.CancellationPending);
}
void refreshWorker_ProgressChanged(object sender, ProgressChangedEventArgs e)
{
}
void refreshWorker_RunWorkerCompleted(object sender, RunWorkerCompletedEventArgs e)
{
}
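The scheduling arithmetic inside refreshWorker_DoWork is easier to follow (and to test) when it's pulled out into a function with no side effects. The sketch below is an assumption about the intent rather than a copy of the logic above: it aims for the next top of the hour, and skips to the hour after that when the next one is less than a few minutes away. RefreshSchedule and NextTopOfHour are hypothetical names.

```csharp
using System;

// Hypothetical helper, not part of the sample application.
public static class RefreshSchedule
{
    // Returns the next top-of-the-hour time, or the one after it when the
    // next one is closer than minimumLeadMinutes.
    public static DateTime NextTopOfHour(DateTime now, int minimumLeadMinutes)
    {
        DateTime next = new DateTime(
            now.Year, now.Month, now.Day, now.Hour, 0, 0, now.Kind).AddHours(1);

        if ((next - now).TotalMinutes < minimumLeadMinutes)
        {
            next = next.AddHours(1);
        }
        return next;
    }
}
```

Because the function takes the current time as a parameter, you can check the edge cases (a few minutes before the hour, just before midnight) without waiting around for the clock.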
The WebBrowser Control
The meat and potatoes of the application is the WebBrowser control. I figured HTML was a simple way to show the data without too much fuss on my part. Boy, was I wrong. First, we have to initialize the control. We use the about:blank URL to give the control someplace to browse to, and then we update the control.
private void InitListView()
{
this.webBrowser1.Navigate("about:blank");
HtmlDocument doc = this.webBrowser1.Document;
doc.Write(string.Empty);
if (Globals.Manager.Count > 0)
{
UpdateListView(true);
}
}
private void DisplayWebBrowser_Load(object sender, EventArgs e)
{
this.Anchor = AnchorStyles.Bottom |
AnchorStyles.Left |
AnchorStyles.Right |
AnchorStyles.Top;
}
public void UpdateListView(bool startingWithData)
{
if (!startingWithData)
{
this.Text = string.Format("(last update - {0})",
Globals.Manager.UpdateTime.ToString("yyyy/MM/dd at hh:mm"));
}
if (m_hasNavigateMsgHandler)
{
this.webBrowser1.Navigating -= new WebBrowserNavigatingEventHandler(webBrowser1_Navigating);
}
ShowArticles = this.checkBoxShowArticles.Checked;
ShowBlogs = this.checkBoxShowBlogs.Checked;
ShowTips = this.checkBoxShowTips.Checked;
ShowChanges = this.checkNewInfo.Checked;
ShowIcons = this.checkShowIcons.Checked;
ShowIconLegends = this.checkShowIconLegend.Checked;
ShowReputation = this.checkShowReputation.Checked;
bool foundGroup = false;
string html = "";
StringBuilder htmlAll = new StringBuilder("");
if (Globals.Manager.Count <= 0)
{
htmlAll.Append("<html><body style='font-family:arial;'>");
htmlAll.Append("No articles found. CodeProject might be temporarily unavailable.");
htmlAll.Append("</body></html>");
}
else
{
htmlAll.Append(BuildHtmlHeader());
htmlAll.Append(BuildReputationHtml());
if (ShowArticles)
{
html = BuildArticleHtml(ItemGroup.Articles, ref foundGroup);
if (foundGroup)
{
htmlAll.Append(html);
}
}
if (this.ShowBlogs)
{
html = BuildArticleHtml(ItemGroup.Blogs, ref foundGroup);
if (foundGroup)
{
htmlAll.Append(html);
}
}
if (this.ShowTips)
{
html = BuildArticleHtml(ItemGroup.Tips, ref foundGroup);
if (foundGroup)
{
htmlAll.Append(html);
}
}
htmlAll.Append(BuildHtmlFooter());
}
this.webBrowser1.DocumentText = htmlAll.ToString();
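this.webBrowser1.Navigating += new WebBrowserNavigatingEventHandler(webBrowser1_Navigating);
// Remember that the handler is attached so the next update can detach it first.
m_hasNavigateMsgHandler = true;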
}
In the interest of brevity, I'm not going to show the methods that actually build the HTML (you can see them being called in the code snippet above). The app is simply building output from the data contained in the article manager. I will mention, though, that to keep memory use down to a dull roar, I used StringBuilder objects to hold the HTML as it's being constructed.
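To give a flavor of what such a builder method looks like, here is a small sketch that turns titles and ratings into table rows with a StringBuilder. It is not the code from the download: ArticleTableSketch and BuildRows are hypothetical, the markup is invented for the example, and the real methods work from Article objects rather than key/value pairs. Titles go through WebUtility.HtmlEncode so that characters like & and < in an article title can't break the page.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Net;
using System.Text;

// Hypothetical helper, not part of the sample application.
public static class ArticleTableSketch
{
    // Each pair is (article title, rating).
    public static string BuildRows(IEnumerable<KeyValuePair<string, decimal>> articles)
    {
        StringBuilder html = new StringBuilder("<table>");
        foreach (KeyValuePair<string, decimal> article in articles)
        {
            html.Append("<tr><td>")
                .Append(WebUtility.HtmlEncode(article.Key))
                .Append("</td><td>")
                .Append(article.Value.ToString("0.00", CultureInfo.InvariantCulture))
                .Append("</td></tr>");
        }
        return html.Append("</table>").ToString();
    }
}
```

The returned string could then be assigned to the WebBrowser control's DocumentText property, the same way UpdateListView does with its own output.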
What The Icons Mean
In the legend below, "article" should be taken to mean either an article, tip, or blog entry. All of the groups are individually analyzed, and therefore, each group has its own icons.
- Indicates a new article. All articles will display as new when you initially start the application.
- Indicates the article with the best rating.
- Indicates the article with the worst rating.
- Indicates the article with the most votes.
- Indicates the article with the most page views.
- Indicates the most popular article.
- Indicates the article with the most bookmarks.
- Indicates the article with the most downloads (not provided by CP yet)
- Indicates that the associated field increased in value.
- Indicates that the associated field decreased in value.
Features of the Sample Application
- Scrape status, time to next scrape event, and current scrape progress are all reported in the status bar at the bottom of the window.
- To keep the reported changes to something that I consider sane, an article's view count has to grow by 10 before it is reported as "changed" in the statistics. You can increase or decrease this value via the private Article.ViewChangeThreshold data member.
- Initially, the program displays articles in order of rating in descending order. You can change this at any time by selecting a different property in the combo box at the top of the form.
- Articles that report changes are displayed with backgrounds that are shades of blue, as opposed to unchanged articles that are white/gray. Just so you can see the differences side-by-side, here's a screen shot:
History
- 11/13/2011 - Another site change that a) renamed some element IDs, and b) exposed a coding error involving cleaning the data before parsing it.
- 05/11/2011 - I fixed the titlebar (it was showing weird info), and fixed an issue where alternate tips were being counted in the tip/trick rating and popularity averages showing at the top of the form. Since they don't acquire votes, they're not supposed to be counted in the average values.
- 05/05/2011 - It seems that the averages shown in the statistics box at the top right corner of the form were displaying questionable values. I modified the way the data source was being initialized, and jerked the averages back into their proper alignment.
- 05/04/2011 - Added support for sorting by number of downloads. I also added a new line to each group table that shows the total votes, views, bookmarks, and downloads for each group (articles, blogs, and tips).
- 03/27/2011 - Added support for number of downloads. It *should* adapt with no problems, but if things look strange, just delete C:\Program Data\CPAM3\*.xml, and rerun the program. It will create a new file, and you'll be good to go.
- 03/22/2011 - The fix I made yesterday was thwarted by a simple misspelling of the word "Organiser" (CP spells it wrong - it should be "Organizer" - but I have to conform to their usage if I want my code to work). So if you've already run the old new code once, you have to run this version twice before it fixes the display problem. The problem is that a new reputation points category called "Unknown" is displayed in the first table, and the Organizer points aren't updated, leading to an incorrect total points value. In any case, this fixes those problems, but remember, you have to run it twice before you see it completely fixed.
- 03/22/2011 - The user profile page was changed a little, and that broke this application. I had to completely change ArticleManager.ParseReputationScores to effect the necessary modifications. If you have any problems, let me know in the forum below.
- 12/29/2010 - I discovered that alternate tip/tricks do NOT accumulate a view count (yet), so I had to add a method that normalized empty strings to contain at least a valid numeric value of zero ("0"). This same fix also addressed the issue that was causing the posted dates on tips/tricks to always come up as 01/01/1990.
- 12/13/2010 - Solved a problem that was causing the application to completely fail after a site update was made. Also fixed the bug that caused problems for people that weren't in the US (thanks Petr!). Many thanks to Chris Maunder for helping out with the format stuff as well. I was really surprised that he'd gotten around to it so quickly.
- 12/12/2010 - The article list page format was changed, and I'm working out the issues with CP admins. In the meantime, the download for this article has been disabled until the problems are resolved.
- 12/11/2010 - Original submission.