Have you ever visited Wikipedia and gotten lost in the sheer vastness of knowledge available there? Have you ever wished that something existed to let you easily create complex queries that return exactly what you need, using syntax that you are already familiar with (such as LINQ)? Well then, this may be just the post for you!
Introducing LINQ-to-Wiki
LINQ-to-Wiki is a library designed by Petr Onderka to query any site running MediaWiki (which includes Wikipedia) from any available .NET language. It provides extensive functionality for performing complex queries and is not limited to just reading wiki pages; it can also perform edits, content additions and more. You can request a variety of items that would otherwise require a significant amount of scrolling and clicking, followed by the eventual “how did I get here” several hours later, long after you have lost focus on your original goal because of the sheer magnitude of the site and the borderline addiction to knowledge that it can evoke.
A few of the many things related to Wikipedia content that can be accessed through queries in LINQ-to-Wiki are :
- Listing all of the articles within a category
- Listing all of the links contained within a page
- Grabbing images and related articles
- Full query and search support
LINQ-to-Wiki uses traditional LINQ queries that any .NET developer would be accustomed to, and the library then translates these into requests to the MediaWiki API for whatever big plans you are trying to conquer the world with.
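To get a rough feel for what "translating a query into an API request" might look like, here is a minimal sketch that builds a MediaWiki API URL by hand for the allimages list, filtered by a title prefix. The class and method names (MediaWikiRequestSketch, BuildAllImagesUrl) are made up for this illustration and are not part of LINQ-to-Wiki. The request that the library actually sends may well differ, so treat this as an approximation of the idea rather than a description of its internals.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class MediaWikiRequestSketch
{
    // Hypothetical helper (NOT part of LINQ-to-Wiki): builds a MediaWiki API URL
    // that lists images whose titles start with the given prefix.
    public static string BuildAllImagesUrl(string host, string prefix, int limit)
    {
        var parameters = new List<KeyValuePair<string, string>>
        {
            new KeyValuePair<string, string>("action", "query"),
            new KeyValuePair<string, string>("list", "allimages"),
            new KeyValuePair<string, string>("aiprefix", prefix),      // title prefix filter
            new KeyValuePair<string, string>("aiprop", "url|size"),    // ask for URL and dimensions
            new KeyValuePair<string, string>("ailimit", limit.ToString(CultureInfo.InvariantCulture)),
            new KeyValuePair<string, string>("format", "json"),
        };

        // Percent-encode every key and value, then join them into a query string.
        var queryString = string.Join("&", parameters.Select(p =>
            Uri.EscapeDataString(p.Key) + "=" + Uri.EscapeDataString(p.Value)));

        // Assumes the wiki exposes its API at /w/api.php, as Wikipedia does.
        return "https://" + host + "/w/api.php?" + queryString;
    }
}
```

Calling MediaWikiRequestSketch.BuildAllImagesUrl("en.wikipedia.org", "Microsoft", 10) produces a URL that you can paste into a browser to see the raw response. Hiding this kind of string assembly behind ordinary LINQ operators is a large part of the appeal of a library like this one.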
Getting Started
LINQ-to-Wiki can be obtained in either of the following two ways :
Once you have added the appropriate references to the LINQ-to-Wiki files to your project, then you are ready to get started!
Your First Query
Querying is really where LINQ-to-Wiki shines (as you could imagine with the cosmos of data within Wikipedia)! The actual querying process is very straightforward and really doesn’t differ much from using a traditional DataContext that you would be accustomed to working with in any other flavor of LINQ-to-X (SQL, Entities etc.).
You’ll first need to create an instance of the Wiki class that will act as your DataContext and the source of all of your queries. You can initialize it using actual login information (if you plan on editing and performing more advanced actions), but in this demonstration we will just be focusing on querying, so feel free to make up your own credentials :
var wikipedia = new Wiki("Example");
Once you have created your necessary Wiki object, then you will basically be ready to start querying. However, Wikipedia is a huge, complex data-filled cosmos and before we start adventuring around in our LINQ-powered spaceship, let’s take a look at a map to see where we can go.
Exploring the Cosmos of Wikipedia
Before we delve too deep into some serious querying, let’s review some of the properties and collections that we can use from our Wiki object. Since this post is primarily concerned with querying, we will be looking at the Query property of our Wiki object.
var query = wikipedia.Query.AdventurePlaceholder;
Some of the major properties of our Query object that we will be concerned with are :
- allcategories – This is an enumeration of all of the available Categories
- allimages – This is an enumeration of all of the available Images
- alllinks – This is an enumeration of all of the available Links
- categorymembers – This lists all of the pages in a given category
- backlinks – This finds all pages that link back to a specific page
- search – This allows a full-text search to be performed
From each of these we can use the LINQ methods that we all know and love, such as .Where() and .Select(), and then wrap everything up by executing our query with the .ToEnumerable() method. Each of these items also has specific properties that can be accessed within your inner clauses to further narrow your search, so don’t neglect how wonderful IntelliSense can be.
Blasting off into the Cosmos (Finally!)
So let’s start out with a simple query to get ourselves off the launch pad. We will query Wikipedia for all of the images that start with “Microsoft” and return the title of each :
var query = wikipedia.Query.allimages().Where(i => i.prefix == "Microsoft").Select(s => s.title).ToEnumerable();
That’s it! Using a simple Controller Action within MVC (for this example) we can output each of our results to a basic list within our View :
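public ActionResult QueryWiki()
{
    // Create the Wiki object that acts as the source of our queries
    var wikipedia = new Wiki("Example");
    // Select the title of every image whose name starts with "Microsoft"
    var query = wikipedia.Query.allimages().Where(i => i.prefix == "Microsoft").Select(s => s.title).ToEnumerable();
    // Pass the titles to the View for display
    return View(query);
}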
along with this simple View :
<ul>
@foreach (var image in Model){
<li>@image</li>
}
</ul>
will result in a huge (and very ugly) list of all of the images within Wikipedia that begin with “Microsoft”.
Query results containing all Wikipedia Images that begin with “Microsoft”
Let’s spice it up a bit (because just text is boring)
Let’s make things a little more appealing to the eyes by pulling some additional properties besides the title of the images. We can use the url, height and width properties available from our images to create a similar list that will feature images of each of these items instead of just a plain-jane unordered list.
First, we will create a very simple class that will store the properties that we are concerned about that we can pass across to the View for display :
public class WikiImage
{
    public string Url { get; set; }
    public int Height { get; set; }
    public int Width { get; set; }

    public WikiImage(string url, int height, int width)
    {
        Url = url;
        Height = height;
        Width = width;
    }
}
Using our new and improved query (which will select the url, height and width properties from our image)
var query = wikipedia.Query.allimages()
.Where(i => i.prefix == "Microsoft")
.Select(s => new WikiImage(s.url,s.height,s.width)).ToList();
along with a few minor adjustments to the View (the controller action remains basically the same),
@foreach (var image in Model){
<img src='@image.Url' height='@image.Height' width='@image.Width' /><br />
}
gives us our result…
(err the result is too big to easily display full-size. I’ll adjust the height and width in the view to provide a better example)
*ahem* And gives us our result!
Results from our new query to grab all of the images that start with “Microsoft” on Wikipedia
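If you are wondering how the height and width might be adjusted so that large images do not take over the page, one option is to scale each image down to fit inside a fixed box while keeping its aspect ratio. The sketch below shows that idea. The ImageSizing class and its FitWithin method are hypothetical names invented for this example, and this is not necessarily how the sizes behind the screenshot above were produced.

```csharp
using System;

public static class ImageSizing
{
    // Hypothetical helper: shrinks (width, height) so the longest side is at most
    // maxSide, preserving the aspect ratio. Images that already fit are left alone.
    public static (int Width, int Height) FitWithin(int width, int height, int maxSide)
    {
        if (width <= 0 || height <= 0 || maxSide <= 0)
            throw new ArgumentException("width, height and maxSide must all be positive.");

        int longest = Math.Max(width, height);
        if (longest <= maxSide)
            return (width, height); // never upscale small images

        double scale = (double)maxSide / longest;

        // Round to whole pixels, but never let a side collapse to zero.
        int scaledWidth = Math.Max(1, (int)Math.Round(width * scale));
        int scaledHeight = Math.Max(1, (int)Math.Round(height * scale));
        return (scaledWidth, scaledHeight);
    }
}
```

For example, ImageSizing.FitWithin(1600, 1200, 200) yields a width of 200 and a height of 150, values that could then be written into the height and width attributes of each img tag in the View.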
Additional Complexity Coming Soon!
This post is just a simple example of some of the things that you can do using LINQ-to-Wiki. Next time, we will cover some of the more advanced features, such as using PageResults to create even more complex queries, pulling some additional data and who knows what else!
For More Information (if you just can’t wait to dig in)
If you are interested in learning a bit more about LINQ-to-Wiki, visit the GitHub page, where you can find a plethora of documentation detailing each of the individual methods and properties that you can query against. I would also highly recommend downloading the LINQ-to-Wiki Samples project, which contains all kinds of samples to get you started.
You can also download this example from GitHub from the link below :