
Monitor GitHub Activity With Event, Project Description, and Language Word Clouds

21 Feb 2015
Watch GitHub events in more-or-less real time as we display word clouds of events, project descriptions, and project languages.

Image 1

Introduction

Here we are with another fun word cloud applet.  In my previous article, I demonstrated receiving Twitter tweets and displaying the results in a word cloud.  That got me curious about what's happening on GitHub.  The above screenshot shows three word clouds:

  • Events

  • Project descriptions

  • Languages

Much of the code used in the Twitter word cloud article is re-used here, so I will only show the code relevant to this application.  If you're interested in how a Force Directed Graph was used to generate the word cloud, I suggest you read the previous article.

The Source Code

The source code can be obtained by cloning:

https://github.com/cliftonm/githubdashboard.git

Authenticating with GitHub

This was easier said than done, though the code itself ends up being very simple. 

The Easy Way

Ironically, I came across this information after figuring out the hard way, just as I was sitting down to write this article.  To do it the easy way:

  1. follow these instructions

  2. put the generated token into the file "authorization.txt" as the third line (the first two lines can be blank) in the bin\Debug and/or bin\Release folders.

The Hard Way

If you want to do things the hard way:

  1. create a GitHub application (you'll see that option on the same page as step #1 above)

  2. put your client ID and client secret as the first two lines, respectively, of the file "authorization.txt" in the bin\Debug and/or bin\Release folders.  Leave the third line blank.

Now, when you run the application, it will bring up a web browser for you to log in to your GitHub account and authenticate the application.
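The three-line layout of "authorization.txt" described above can be read with a few lines of code.  This is only a minimal sketch, not the code from the repository: the GitHubCredentials class and its member names are hypothetical, and it assumes that a missing line should be treated as blank.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of the article's source) for the
// three-line authorization.txt layout: client ID, client secret, access token.
public class GitHubCredentials
{
    public string ClientId = String.Empty;
    public string ClientSecret = String.Empty;
    public string AccessToken = String.Empty;

    // True when a token is already present, so the browser login can be skipped.
    public bool HasAccessToken
    {
        get { return !String.IsNullOrEmpty(AccessToken); }
    }

    // Parses the lines of the file. Missing lines are treated as blank (an assumption).
    public static GitHubCredentials Parse(string[] lines)
    {
        GitHubCredentials creds = new GitHubCredentials();
        if (lines.Length > 0) creds.ClientId = lines[0].Trim();
        if (lines.Length > 1) creds.ClientSecret = lines[1].Trim();
        if (lines.Length > 2) creds.AccessToken = lines[2].Trim();
        return creds;
    }

    // Reads the file if it exists; otherwise returns empty credentials.
    public static GitHubCredentials Load(string path)
    {
        return File.Exists(path) ? Parse(File.ReadAllLines(path)) : new GitHubCredentials();
    }
}
```

With the easy way, HasAccessToken is true immediately; with the hard way, it is false until the browser login has completed and the token has been written back to the file.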

Behind the scenes, a dialog is displayed containing a WebBrowser control, and the code wires up the Navigated event.  The code also takes you to GitHub's OAuth login page:

auth = new Authorize();
auth.Show();
auth.browser.Navigated += OnNavigated;
auth.browser.Navigate("https://github.com/login/oauth/authorize?scope=user:notifications&client_id=" + clientId);

Notice how here we use just the client ID of the application.

After you log in, GitHub will attempt to navigate to the URL provided in your application settings.  Here we employ trick #1, intercepting the navigation event:

/// <summary>
/// Once the user authorizes the application, we get a "code" back from GitHub
/// We use that code to obtain the access token.
/// </summary>
protected void OnNavigated(object sender, WebBrowserNavigatedEventArgs e)
{
  if (e.Url.Query.Contains("?code"))
  {
    authCode = e.Url.Query.RightOf("=");
    WebClient wc = new WebClient();
    accessToken = wc.DownloadString("https://github.com/login/oauth/access_token?client_id=" + 
       clientId + "&client_secret=" + secretId + "&code=" + authCode + "&accept=json").Between("=", "&");
    auth.Close();

    File.WriteAllLines("authorization.txt", new string[] { clientId, secretId, accessToken });

    StartQueryThread();
  }
}

Intercepting the redirect gives us a temporary authorization code, which we then exchange for the access token.  Notice how the exchange uses the application's client ID and client secret as well as the authorization code.  From here on out, we can use the access token.
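The RightOf and Between calls in the handler are string extension methods that are not shown in this article.  The following is a minimal sketch of how they might look, assuming they behave as their names suggest; the actual implementations in the repository may differ.  The sample strings in the usage note are illustrative only.

```csharp
using System;

// Assumed behavior of the string helpers used in OnNavigated.
// This is a sketch, not the author's implementation.
public static class StringExtensions
{
    // Returns the text after the first occurrence of delimiter,
    // or an empty string if the delimiter is not found.
    public static string RightOf(this string src, string delimiter)
    {
        int idx = src.IndexOf(delimiter, StringComparison.Ordinal);
        return idx < 0 ? String.Empty : src.Substring(idx + delimiter.Length);
    }

    // Returns the text between the first occurrence of start and the next
    // occurrence of end, or an empty string if either one is not found.
    public static string Between(this string src, string start, string end)
    {
        int from = src.IndexOf(start, StringComparison.Ordinal);
        if (from < 0) return String.Empty;
        from += start.Length;
        int to = src.IndexOf(end, from, StringComparison.Ordinal);
        return to < 0 ? String.Empty : src.Substring(from, to - from);
    }
}
```

Given a redirect query such as "?code=abc123", RightOf("=") yields the code; given a token response such as "access_token=xyz&scope=notifications&token_type=bearer", Between("=", "&") yields the token.  Note that this simple parsing relies on the value of interest being the first parameter in the string.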

Querying GitHub

Querying GitHub is done in a worker thread.  I could have used Task objects or async/await, but that all seemed overly complicated.  What we need is a continuous background process that queries the "events" API and coordinates updating the word cloud with the main application thread.  So we simply create a background thread:

protected void StartQueryThread()
{
  queryThread = new Thread(new ThreadStart(QueryGitHubThread));
  queryThread.IsBackground = true;
  queryThread.Start();
}

Using the Access Token

Without the access token, you are allowed to access the API only 60 times an hour.  With the access token, you can access the API 5000 times an hour.  By the way, to see how many accesses you have left, run this on the command line:

curl -i "https://api.github.com/events?access_token=[your access token]"

In the header, you will see two fields you can use to verify your access rate limit and remaining accesses:

X-RateLimit-Limit: 5000
X-RateLimit-Remaining: 4999
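The same header can also be read from within the application.  This is a minimal sketch, assuming the headers come from the WebResponse of a GitHub API call; the RateLimit class and its method name are hypothetical and do not appear in the article's source.

```csharp
using System;
using System.Net;

// Hypothetical helper for reading the rate limit header shown above.
public static class RateLimit
{
    // Returns the number of requests left in the current window,
    // or -1 if the header is missing or is not a number.
    public static int Remaining(WebHeaderCollection headers)
    {
        int remaining;
        return Int32.TryParse(headers["X-RateLimit-Remaining"], out remaining) ? remaining : -1;
    }
}
```

For example, calling RateLimit.Remaining(response.Headers) right after a request would let the application back off before the limit is actually reached.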

The Worker Thread

The worker thread ensures that we do not exceed this limit by restricting API calls to, at most, one per second.  As there are 3600 seconds in an hour, this keeps us under the limit of 5000 accesses per hour.

protected void QueryGitHubThread()
{
  then = DateTime.Now;

  while (true)
  {
    ElapseOneSecond();
    string data = GetData("https://api.github.com/events");

    if (!String.IsNullOrEmpty(data))
    {
      ProcessEvents(data);
    }
  }
}

The method ElapseOneSecond does the time check:

/// <summary>
/// To avoid exceeding the 5000 requests per hour limit, we ensure that we only 
/// make one request a second (3600 requests per hour)
/// </summary>
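protected void ElapseOneSecond()
{
  int msToSleep = 1000 - (int)(DateTime.Now - then).TotalMilliseconds;

  // If there's any time remaining to sleep before our next query, do so now.
  if (msToSleep > 0)
  {
    Thread.Sleep(msToSleep);
  }

  // Record the time after sleeping, so the next interval is measured from the
  // moment this request is issued. Recording it before the sleep would let
  // every second call skip its wait and exceed one request per second.
  then = DateTime.Now;
}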
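The arithmetic behind the throttle can be isolated in a pure function, which makes it easy to check by hand.  This is a sketch only: the Throttle class and DelayBeforeNext name are hypothetical, and it assumes that lastRequest is the moment the previous request was actually issued (that is, after any sleep).

```csharp
using System;

// Hypothetical pure-function version of the one-second throttle.
public static class Throttle
{
    // Returns how long to wait so that at least minInterval separates two
    // requests. Returns TimeSpan.Zero if enough time has already passed.
    public static TimeSpan DelayBeforeNext(DateTime lastRequest, DateTime now, TimeSpan minInterval)
    {
        TimeSpan remaining = minInterval - (now - lastRequest);
        return remaining > TimeSpan.Zero ? remaining : TimeSpan.Zero;
    }
}
```

If the previous request was issued 300 ms ago and the minimum interval is one second, the delay is 700 ms; if it was issued 1500 ms ago, no delay is needed.  At one request per second, the worst case is 3600 requests per hour.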

Acquiring the Event Data

Figuring out the "trick" here took several hours of digging, and I have left in, as comments, the proposed solutions that did not work, at least in this case.  The key piece of information is that the UserAgent property must be set.  It was quite frustrating that I could not find this clearly described in the GitHub documentation!

protected string GetData(string url)
{
  string ret = String.Empty;
  HttpWebRequest request = WebRequest.Create(url + "?access_token=" + accessToken) as HttpWebRequest;
  request.Method = "GET";
  // After 3 hours of googling and reading answers on SO, I found that this is necessary. Thank you Budda for posting that info.
  request.UserAgent = "Hello There"; 

  // Other answers I found regarding the server error response, but that did not solve the problem:

  // This is unnecessary:
  // request.Accept = "application/json; charset=utf-8";
  // request.KeepAlive = false;
  // request.ContentType = "application/json; charset=utf-8";
  // request.UseDefaultCredentials = true;

  // Also this, in app.config, was not necessary:
  //<system.net>
  // <settings>
  // <httpWebRequest useUnsafeHeaderParsing="true" />
  // </settings>
  //</system.net>

  try
  {
    using (WebResponse response = request.GetResponse())
    {
      using (StreamReader reader = new StreamReader(response.GetResponseStream()))
      {
        ret = reader.ReadToEnd();
      }
    }
  }
  catch(Exception ex)
  {
    Console.WriteLine(ex);
  }

  return ret;
}
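GitHub has since deprecated passing the token in the query string, so a current version of this code would more likely send it in an Authorization header.  The following is a minimal sketch of that alternative, not the code from the repository: the GitHubRequest class and the "GitHubDashboard" user agent value are hypothetical, and the "token" scheme shown is one of the forms GitHub accepts.

```csharp
using System;
using System.Net;

// Hypothetical alternative to appending "?access_token=" to the URL.
public static class GitHubRequest
{
    public static HttpWebRequest Create(string url, string accessToken)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "GET";

        // GitHub rejects requests without a User-Agent; the value here is illustrative.
        request.UserAgent = "GitHubDashboard";

        // Send the token as a header rather than as a query parameter.
        if (!String.IsNullOrEmpty(accessToken))
        {
            request.Headers[HttpRequestHeader.Authorization] = "token " + accessToken;
        }

        return request;
    }
}
```

The response handling in GetData would stay the same; only the creation of the request changes.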

Processing the JSON

Newtonsoft.Json comes to the rescue again, as we can pass in the event information and receive a dynamic object from which we extract a few key pieces.  We then query the API for each event's repository (again, making sure we stay within one query per second) to get the project description and language:

/// <summary>
/// Process the event information. Here, we extract the ID so we don't process the same event multiple times.
/// We also get the repo URL and query the API for the repo's description and language, which are used to
/// populate the other two word clouds.
/// </summary>
/// <param name="html">The JSON text returned by the events API.</param>
protected void ProcessEvents(string html)
{
  dynamic events = JsonConvert.DeserializeObject<List<Object>>(html);

  foreach (dynamic ev in events)
  {
    string id = ev.id.ToString();

    if (!eventIdTypeMap.ContainsKey(id))
    {
      string eventType = ev.type.ToString();
      eventIdTypeMap[id] = eventType;

      string repoUrl = ev.repo.url.ToString();
      ElapseOneSecond(); // Again, don't overtax the API.
      string repoData = GetData(repoUrl);

      if (!String.IsNullOrEmpty(repoData))
      {
        dynamic repoInfo = JsonConvert.DeserializeObject(repoData);
        string description = repoInfo.description;
        string language = repoInfo.language;

        // Don't collide with the WinForm thread's Paint functions.
        // TODO: Could be optimized a bit to spend less time in the locked state.
        lock (this)
        {
          if (!String.IsNullOrEmpty(eventType)) ++totalEvents;
          AddOrUpdateNode(eventType, rootNodeEvents, eventsWordNodeMap, () => totalEvents);

          if (!String.IsNullOrEmpty(language)) ++totalLanguages;
          AddOrUpdateNode(language, rootNodeLanguages, languagesWordNodeMap, () => totalLanguages);

          if (!String.IsNullOrEmpty(description))
          {
            description.Split(' ').ForEach(w =>
            {
              if (!EliminateWord(w))
              {
                // We never show more than 100 description words.
                if (descriptionsWordNodeMap.Count > 100)
                {
                  RemoveAStaleWord(descriptionsWordNodeMap);
                }

                if (!String.IsNullOrEmpty(w)) ++totalDescriptionWords;
                AddOrUpdateNode(w, rootNodeDescriptions, descriptionsWordNodeMap, () => totalDescriptionWords);
              }
            });
          }
        }
      }
    }
  }
}

This drives the three word clouds: events, descriptions, and languages.
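The description handling relies on EliminateWord, which comes from the previous article and is not shown here.  As a rough illustration of the idea only, the sketch below splits a description into words, skips a few common words, and tallies the rest.  The DescriptionWords class, its stop-word list, and the Tally method are all hypothetical; the real EliminateWord may apply different rules.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration of filtering and counting description words.
public static class DescriptionWords
{
    // Illustrative stop-word list; not taken from the article's source.
    private static readonly HashSet<string> StopWords =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        { "a", "an", "and", "for", "in", "of", "the", "to" };

    // Adds each kept word of the description to counts and returns
    // the number of words that were added.
    public static int Tally(string description, Dictionary<string, int> counts)
    {
        int added = 0;
        string[] words = description.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string word in words)
        {
            if (StopWords.Contains(word)) continue;

            int current;
            counts.TryGetValue(word, out current);   // current is 0 for a new word
            counts[word] = current + 1;
            ++added;
        }

        return added;
    }
}
```

Splitting with RemoveEmptyEntries avoids the empty strings that Split(' ') produces for consecutive spaces, which is why the sketch needs no separate check for empty words.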

Figuring Out Node Color and Font Size

A fun little detail is how the node color and font size are determined: both are based on the word's hit count as a percentage of the total count for its word cloud:

public override void DrawNode(Graphics gr, RectangleF bounds)
{
  int percent = 100 * count / getTotalWords();

  Font font;
  int fontSize = Math.Min(8 + 16 * percent / 100, 24);

  if (!fontSizeMap.TryGetValue(fontSize, out font))
  {
    font = new Font(FontFamily.GenericSansSerif, fontSize);
    fontSizeMap[fontSize] = font;
  }

  if (count >= GitHubDashboard.Dashboard.CountThreshold)
  {
    // Blend from blue (rare) toward red (frequent), based on the word's percentage of the total.
    int red = 255 * percent / 100;
    int blue = 255 - red;
    Brush brush = new SolidBrush(Color.FromArgb(red, 0, blue));

    SizeF strSize = gr.MeasureString(text, font);
    PointF textCenter = PointF.Subtract(bounds.Location, 
                        new Size((int)strSize.Width / 2 - 5, (int)strSize.Height / 2 - 5));
    Region = Rectangle.FromLTRB((int)textCenter.X, 
             (int)textCenter.Y, 
             (int)(textCenter.X + strSize.Width), 
             (int)(textCenter.Y + strSize.Height));

    gr.DrawString(text, font, brush, textCenter);

    brush.Dispose();
  }
}  
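To make the scaling concrete, the same integer arithmetic can be pulled out into small functions and evaluated for a few values.  This is a sketch for illustration: the NodeStyle class is hypothetical, and the guard against a zero total is an addition that is not in the DrawNode code above.

```csharp
using System;

// Hypothetical extraction of the arithmetic used in DrawNode.
public static class NodeStyle
{
    // Word count as a whole-number percentage of the cloud's total.
    // The zero check is an added safeguard, not part of the original code.
    public static int Percent(int count, int total)
    {
        return total <= 0 ? 0 : 100 * count / total;
    }

    // Font size grows linearly from 8 (0%) to 24 (100%).
    public static int FontSize(int percent)
    {
        return Math.Min(8 + 16 * percent / 100, 24);
    }

    // Red component grows with the percentage; blue is its complement.
    public static int Red(int percent)
    {
        return 255 * percent / 100;
    }

    public static int Blue(int percent)
    {
        return 255 - Red(percent);
    }
}
```

A word with 5 of 10 hits is at 50%, which gives a font size of 16 and a color of (127, 0, 128).  At 0% the word is drawn at size 8 in pure blue, and at 100% at size 24 in pure red.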

Because C# doesn't support pointers outside of unsafe code, I can't pass into the node "here's a pointer to the total word counter for the collection that you belong to."  Instead, we can pass to the constructor a function that returns the count:

public TextNode(string text, PointF location, Func<int> getTotalWords)

and, as you may have noticed in the code that processes the events, we instantiate nodes with the function that returns the count appropriate for the node's collection, either:

() => totalEvents

or

() => totalLanguages

or

() => totalDescriptionWords
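The reason this works is that the lambda captures the field itself rather than its value at the time the lambda is created, so the node always sees the current total.  A minimal sketch of that behavior, using a hypothetical Counter class that stands in for the dashboard:

```csharp
using System;

// Hypothetical stand-in for the class that owns the running totals.
public class Counter
{
    private int total;

    public void Increment()
    {
        ++total;
    }

    // The lambda captures "this", so it reads the field each time it is
    // invoked instead of returning a value frozen at creation time.
    public Func<int> TotalAccessor()
    {
        return () => total;
    }
}
```

If the accessor is obtained first and Increment is then called twice, invoking the accessor returns 2.  This is what lets every node compute its percentage against an up-to-date total without holding a reference to the dashboard itself.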

Conclusion

That's it!  It's quite interesting to see the GitHub activity.  For the most part, there seems to be a very even distribution of languages.  Strangely, however, I've noticed that in the evenings there is more work done on JavaScript, and there are also more fork events.  During the day:

Image 2

there are more push events and a lot more language diversity!

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.