Introduction
Here we are with another fun word cloud applet. In my previous article, I demonstrated receiving tweets from Twitter and displaying the results in a word cloud. That got me curious about what's happening on GitHub. The above screenshot shows three word clouds:
- Events
- Project descriptions
- Languages
Much of the code used in the Twitter word cloud article is re-used here, so I will only show the code relevant to this application. If you're interested in how a Force Directed Graph was used to generate the word cloud, I suggest you read the previous article.
The Source Code
The source code can be obtained by cloning:
https:
Authenticating with GitHub
This was easier said than done, though the code itself ends up being very simple.
The Easy Way
Ironically, I came across this information after figuring things out the hard way, just as I was sitting down to write this article. To do it the easy way:
- Follow these instructions.
- Put the generated token into the file "authorization.txt" as the third line (the first two lines can be blank) in the bin\Debug and/or bin\Release folders.
The Hard Way
If you want to do things the hard way:
- Create a GitHub application (you'll see that option on the same page as step #1 above).
- Put your client ID and client secret as the first two lines, respectively, of the file "authorization.txt" in the bin\Debug and/or bin\Release folders. Leave the third line blank.
Now, when you run the application, it will bring up a web browser for you to log in to your GitHub account and authenticate the application.
Behind the scenes, a dialog is displayed containing a WebBrowser control, and the code wires up the Navigated event. The code also takes you to GitHub's OAuth login page:
auth = new Authorize();
auth.Show();
auth.browser.Navigated += OnNavigated;
auth.browser.Navigate("https://github.com/login/oauth/authorize?scope=user:notifications&client_id=" + clientId);
Notice that here we use just the client ID of the application.
After you log in, GitHub will attempt to navigate to the URL provided in your application setting. Here we employ trick #1, intercepting the navigation event:
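protected void OnNavigated(object sender, WebBrowserNavigatedEventArgs e)
{
    // GitHub redirects to the application's callback URL with "?code=..." appended.
    if (e.Url.Query.Contains("?code"))
    {
        // RightOf and Between are the project's string extension helpers, not BCL methods.
        authCode = e.Url.Query.RightOf("=");

        // Exchange the authorization code for an access token.
        using (WebClient wc = new WebClient())
        {
            accessToken = wc.DownloadString("https://github.com/login/oauth/access_token?client_id=" +
                clientId + "&client_secret=" + secretId + "&code=" + authCode + "&accept=json").Between("=", "&");
        }

        auth.Close();
        // Save the token as the third line of authorization.txt.
        File.WriteAllLines("authorization.txt", new string[] { clientId, secretId, accessToken });
        StartQueryThread();
    }
}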
This gives us an authorization code, which we can then use to acquire the access token. Notice how the above code uses both the application's client ID and the client secret as well as the authorization code. From here on out, we can use the access token.
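The code above pulls the token out with Between("=", "&"), which works as long as the token is the first field in the response. As a minimal sketch, assuming the response body is form-encoded (name=value pairs joined by "&"), the fields could instead be looked up by name. The TokenResponse class and the sample body are hypothetical and not part of the project:

```csharp
using System;
using System.Collections.Generic;

public static class TokenResponse
{
    // Splits a form-encoded body such as "a=1&b=2" into a dictionary.
    // Names and values are URL-decoded.
    public static Dictionary<string, string> Parse(string body)
    {
        var result = new Dictionary<string, string>();
        foreach (string pair in body.Split(new[] { '&' }, StringSplitOptions.RemoveEmptyEntries))
        {
            int eq = pair.IndexOf('=');
            if (eq < 0) continue; // skip malformed pairs that have no '='
            string key = Uri.UnescapeDataString(pair.Substring(0, eq));
            string value = Uri.UnescapeDataString(pair.Substring(eq + 1));
            result[key] = value;
        }
        return result;
    }

    public static void Main()
    {
        // Made-up sample body, not a real token.
        var fields = Parse("access_token=abc123&scope=notifications&token_type=bearer");
        Console.WriteLine(fields["access_token"]); // prints "abc123"
    }
}
```

Looking the field up by name means the order of the fields in the response no longer matters.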
Querying GitHub
Querying GitHub is done in a worker thread. I could have used Task objects or async/await, but that all seemed overly complicated. What we need is a continuous background process that queries the "events" API and coordinates updating the word cloud with the main application thread. So we simply create a background thread:
protected void StartQueryThread()
{
    queryThread = new Thread(new ThreadStart(QueryGitHubThread));
    queryThread.IsBackground = true;
    queryThread.Start();
}
Using the Access Token
Without the access token, you are allowed to access the API only 60 times an hour. With the access token, you can access the API 5000 times an hour. By the way, to see how many accesses you have left, run this on the command line:
curl -i https:
In the header, you will see two fields you can use to verify your access rate limit and remaining accesses:
X-RateLimit-Limit: 5000
X-RateLimit-Remaining: 4999
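The same two headers can also be read from code. Here is a small sketch, assuming the response headers are available as a NameValueCollection (which is what HttpWebResponse.Headers derives from). The RateLimit class is a hypothetical helper, not part of the project:

```csharp
using System;
using System.Collections.Specialized;

public static class RateLimit
{
    // Returns the remaining call count, or -1 if the header is missing or not a number.
    public static int Remaining(NameValueCollection headers)
    {
        int remaining;
        return int.TryParse(headers["X-RateLimit-Remaining"], out remaining) ? remaining : -1;
    }

    public static void Main()
    {
        // Simulated headers using the values shown above.
        var headers = new NameValueCollection();
        headers["X-RateLimit-Limit"] = "5000";
        headers["X-RateLimit-Remaining"] = "4999";
        Console.WriteLine(Remaining(headers)); // prints 4999
    }
}
```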
The Worker Thread
The worker thread ensures that we do not exceed this limit by restricting API calls to at most one per second. With 3600 seconds in an hour, that keeps us under the limit of 5000 accesses per hour.
protected void QueryGitHubThread()
{
    then = DateTime.Now;

    while (true)
    {
        ElapseOneSecond();
        string data = GetData("https://api.github.com/events");

        if (!String.IsNullOrEmpty(data))
        {
            ProcessEvents(data);
        }
    }
}
The method ElapseOneSecond does the time check:
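protected void ElapseOneSecond()
{
    // Sleep for whatever remains of the one-second interval since the last call.
    int msToSleep = 1000 - (int)(DateTime.Now - then).TotalMilliseconds;

    if (msToSleep > 0)
    {
        Thread.Sleep(msToSleep);
    }

    // Record the time after sleeping. Recording it before the sleep would count
    // the sleep itself as elapsed time and let the next call through immediately.
    then = DateTime.Now;
}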
Acquiring the Event Data
Figuring out the "trick" here took several hours of digging, and in the source I also left in, as comments, the proposed solutions that did not work, at least in this case. The key piece of information is that the UserAgent property must be set. It was quite frustrating that this was not clearly described somewhere in the GitHub documentation!
protected string GetData(string url)
{
    string ret = String.Empty;
    HttpWebRequest request = WebRequest.Create(url + "?access_token=" + accessToken) as HttpWebRequest;
    request.Method = "GET";
    request.UserAgent = "Hello There";

    try
    {
        using (WebResponse response = request.GetResponse())
        {
            using (StreamReader reader = new StreamReader(response.GetResponseStream()))
            {
                ret = reader.ReadToEnd();
            }
        }
    }
    catch(Exception ex)
    {
        Console.WriteLine(ex);
    }

    return ret;
}
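One caveat: as far as I know, GitHub has since deprecated passing the token as an access_token query parameter and expects it in the Authorization header instead. The following is a minimal sketch of that variation, not the code used in this project. The GitHubRequest class and the placeholder token are made up for illustration, and the request is only built, never sent:

```csharp
using System;
using System.Net;

public static class GitHubRequest
{
    // Builds (but does not send) a GET request that carries the token in a header.
    public static HttpWebRequest Create(string url, string accessToken)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "GET";
        request.UserAgent = "Hello There"; // the User-Agent is still required
        request.Headers["Authorization"] = "token " + accessToken;
        return request;
    }

    public static void Main()
    {
        // "placeholder-token" stands in for a real access token.
        var request = Create("https://api.github.com/events", "placeholder-token");
        Console.WriteLine(request.Headers["Authorization"]);
    }
}
```

With this approach the URL passed to GetData stays free of credentials, so it can be logged without exposing the token.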
Processing the JSON
Newtonsoft.Json comes to the rescue again, as we can pass in the event information and receive a dynamic object that we can use to extract a few key pieces. We then also query the repository's URL (again, making sure we make at most one query per second) for the project description and language:
protected void ProcessEvents(string html)
{
    dynamic events = JsonConvert.DeserializeObject<List<Object>>(html);

    foreach (dynamic ev in events)
    {
        string id = ev.id.ToString();

        if (!eventIdTypeMap.ContainsKey(id))
        {
            string eventType = ev.type.ToString();
            eventIdTypeMap[id] = eventType;
            string repoUrl = ev.repo.url.ToString();
            ElapseOneSecond();
            string repoData = GetData(repoUrl);

            if (!String.IsNullOrEmpty(repoData))
            {
                dynamic repoInfo = JsonConvert.DeserializeObject(repoData);
                string description = repoInfo.description;
                string language = repoInfo.language;

                lock (this)
                {
                    if (!String.IsNullOrEmpty(eventType)) ++totalEvents;
                    AddOrUpdateNode(eventType, rootNodeEvents, eventsWordNodeMap, () => totalEvents);

                    if (!String.IsNullOrEmpty(language)) ++totalLanguages;
                    AddOrUpdateNode(language, rootNodeLanguages, languagesWordNodeMap, () => totalLanguages);

                    if (!String.IsNullOrEmpty(description))
                    {
                        description.Split(' ').ForEach(w =>
                        {
                            if (!EliminateWord(w))
                            {
                                if (descriptionsWordNodeMap.Count > 100)
                                {
                                    RemoveAStaleWord(descriptionsWordNodeMap);
                                }

                                if (!String.IsNullOrEmpty(w)) ++totalDescriptionWords;
                                AddOrUpdateNode(w, rootNodeDescriptions, descriptionsWordNodeMap, () => totalDescriptionWords);
                            }
                        });
                    }
                }
            }
        }
    }
}
This drives the three word clouds: events, descriptions, and languages.
Figuring Out Node Color and Font Size
A fun little detail is the node color and font size, both derived from the node's hit count as a percentage of the total count for its word cloud:
public override void DrawNode(Graphics gr, RectangleF bounds)
{
    int percent = 100 * count / getTotalWords();
    Font font;
    int fontSize = Math.Min(8 + 16 * percent / 100, 24);

    if (!fontSizeMap.TryGetValue(fontSize, out font))
    {
        font = new Font(FontFamily.GenericSansSerif, fontSize);
        fontSizeMap[fontSize] = font;
    }

    if (count >= GitHubDashboard.Dashboard.CountThreshold)
    {
        int red = 255 * percent / 100;
        int blue = 255 - red;
        Brush brush = new SolidBrush(Color.FromArgb(red, 0, blue));
        SizeF strSize = gr.MeasureString(text, font);
        PointF textCenter = PointF.Subtract(bounds.Location,
            new Size((int)strSize.Width / 2 - 5, (int)strSize.Height / 2 - 5));
        Region = Rectangle.FromLTRB((int)textCenter.X,
            (int)textCenter.Y,
            (int)(textCenter.X + strSize.Width),
            (int)(textCenter.Y + strSize.Height));
        gr.DrawString(text, font, brush, textCenter);
        brush.Dispose();
    }
}
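To make the arithmetic concrete, here is a small sketch that pulls the three formulas from DrawNode into standalone functions. The NodeStyle class and its method names are made up for illustration, and the sample counts are arbitrary:

```csharp
using System;

public static class NodeStyle
{
    // Same integer arithmetic as DrawNode, isolated so it can be tried out.
    public static int Percent(int count, int total) { return 100 * count / total; }
    public static int FontSize(int percent) { return Math.Min(8 + 16 * percent / 100, 24); }
    public static int Red(int percent) { return 255 * percent / 100; }

    public static void Main()
    {
        int percent = NodeStyle.Percent(25, 100); // 25
        int red = NodeStyle.Red(percent);         // 63, since 6375 / 100 truncates
        // prints "25 12 63 192": percent, font size, red, blue
        Console.WriteLine("{0} {1} {2} {3}", percent, FontSize(percent), red, 255 - red);
    }
}
```

So a rare word is drawn small and blue, and a word that accounts for all hits in its cloud is drawn at 24 points in pure red.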
Because safe C# code can't hold a pointer to an int field, I can't pass into the node "here's a pointer to the total word counter for the collection that you belong to." Instead, we can pass the constructor a function that returns the count:
public TextNode(string text, PointF location, Func<int> getTotalWords)
and, if you noticed in the code that processes the events, we instantiate nodes with the function that returns the count appropriate for the node's collection, either:
() => totalEvents
or
() => totalLanguages
or
() => totalDescriptionWords
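The reason this works is that the lambda reads the field each time it is invoked, so a node always sees the current total rather than a copy taken at construction time. A minimal sketch with a hypothetical Counter class (not part of the project):

```csharp
using System;

public class Counter
{
    private int total;

    // The returned lambda captures "this" and reads the field when invoked.
    public Func<int> GetTotal() { return () => total; }

    public void Increment() { ++total; }
}

public static class Demo
{
    public static void Main()
    {
        var counter = new Counter();
        Func<int> getTotal = counter.GetTotal(); // handed out while total is still 0

        counter.Increment();
        counter.Increment();

        Console.WriteLine(getTotal()); // prints 2, the live value
    }
}
```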
Conclusion
That's it! It's quite interesting to see the GitHub activity. For the most part, there seems to be a very even distribution of languages. However (and strangely), I've noticed that in the evenings there is more work done on JavaScript, and there are also more fork events. During the day, there are more push events and a lot more language diversity!