
MyStream: Social Lifestreaming with ASP.NET 4.0

11 Aug 2009
Turn your current static website or blog into a lifestreaming portal for all your social activities, using ASP.NET 4.0, C# 4.0, PLINQ, the Task Parallel Library, Dependency Injection, and a plug-in architecture.

Introduction

There are far too many social networking sites out there to keep up with, and that's where lifestreaming services come in. A lifestream is a time-ordered stream of activities that functions as a diary of your electronic "social" life. In this Web 2.0 (or beyond?) era, you use Twitter, Facebook, Delicious, and Flickr, write blog posts, keep your friends up to date, and stay curious about what your friends are up to as well. There are many social websites that help you aggregate all of your social activities into one place; FriendFeed is one of the most popular services that does lifestreaming for you. But these sites are externally hosted and cannot replace your blog or current website, even if they offer cooler features than your own site does. Many may disagree, but I strongly believe that portals hosting all of your online (and social) activity are going to be the next generation of personal sites, and will replace blogs and static pages.

MyStream is a fully customizable, themable, easy-to-configure, open source portal framework built on top of ASP.NET 4 that you can host on your own server. You have full freedom to upgrade to new themes as they appear and to maintain your personal branding by customizing every little piece of it. The current codebase of this pluggable architecture can stream your blog posts via RSS/Atom, your Delicious bookmarks, your Flickr photostream, and your tweets, and you can write your own social plug-in in minutes using this framework. The article also shares some of the experience I gained developing against ASP.NET 4 and the new C# 4.0 features, made possible by Visual Studio 2010 Beta 1, for those who are still wondering about them. Let us see a screenshot of what I have ended up with so far:

Figure 1

The right column has the list of social accounts you are currently subscribed to; depending on the theme you choose, this list may appear elsewhere on the page. Each activity shown in the left column has one or more links to visit. For blog posts and Delicious bookmarks, you will see plain link entries. If you are subscribed to Flickr, you will see the photo embedded. For tweets, of course, you will see all possible links rendered: the profile of the person who tweeted (in this case the site owner), the hashtags used, the user names mentioned (Microsoft in this screenshot), and any ordinary URLs. Also notice the order of the activities: the stream is sorted in descending order by time.

Administration

Subscriptions to social accounts and global site settings can be modified through a password-protected web administration interface. You can change the site title, slogan, cache interval, and theme, and add or delete subscriptions. By the way, the default password is: password. One great feature is that you can add as many subscriptions to your social accounts as you want. None of the social plug-ins currently shipped with MyStream requires a password; you only need to type in a user name. MyStream automatically discovers the required stream, such as the Twitter, Flickr, or Delicious feed, and adds it to your page. That makes it possible to add public feeds, other people's Twitter, Flickr, etc., to your own page! The administrative URL is: http://localhost:9001/Admin/.

Figure 2

Although the plug-ins live in separate assemblies from the Web project for maintainability and decoupled development, the Web UI is fully aware of what needs to be rendered. For example, if you choose one of the subscription types, you are redirected to a page to configure that subscription. The screenshot below shows a sample configuration screen for Flickr. This Flickr-specific UI, including its validation, is generated entirely from the plug-in model, which keeps the plug-ins independent components.

Figure 3

Adding Custom Pages and Controls

You can add custom pages that use the theme currently selected for the site. Add a content page based on DefaultPage.master and inherit from ThemedPageBase, which ensures the page elements stay consistent with the selected theme. Such a content page starts out blank, but with the theme's styles already applied. The following screenshot shows the segments of the master page where you can add new controls. The Social Accounts block below is a User Control; you can remove it or add new controls before or after it.

Figure 4
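
As a rough illustration (not code from the download), the code-behind of such a content page only needs to derive from ThemedPageBase; the class and page names here are hypothetical:

// Hypothetical code-behind for a new About.aspx content page.
// Deriving from ThemedPageBase keeps the page consistent with the selected theme.
public partial class About : ThemedPageBase
{
    protected void Page_Load(object sender, EventArgs e)
    {
        // Custom page logic goes here.
    }
}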

If you would like to add a new menu item, have a look at the following MVC-style menu creation, which also ensures automatic selection of the current menu item based on the URL currently rendered in the browser:

<ul>
    <%= Html.MenuItem("Home", "Default.aspx") %>
    <%= Html.MenuItem("About me", "About.aspx") %>
    <%= Html.MenuItem("Contact", "Contact.aspx") %>
</ul> 
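
The MenuItem helper itself is not listed in this article. Whether it is implemented as a custom static class or as an MVC HtmlHelper extension, the idea is the same; here is a minimal sketch under that assumption (the "selected" CSS class is also an assumption):

// Hypothetical sketch of an Html.MenuItem helper: it emits an <li> and marks it
// as selected when the current request path ends with the target URL.
public static class Html
{
    public static string MenuItem(string text, string url)
    {
        var currentPath = HttpContext.Current.Request.Path;
        var selected = currentPath.EndsWith(url, StringComparison.OrdinalIgnoreCase)
            ? " class=\"selected\"" : string.Empty;

        return string.Format("<li{0}><a href=\"{1}\">{2}</a></li>",
            selected, url, HttpUtility.HtmlEncode(text));
    }
}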

Themes

You can add your own themes to the App_Themes folder; there is only one theme as of now. A theme consists of a stylesheet file and the necessary images. Whichever theme you choose in Administration is applied by ThemedPageBase and reflected immediately on the next page refresh. The diagram below shows how ThemedPageBase is used:

Figure 5
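
Conceptually, ThemedPageBase only has to assign the configured theme name before the page's PreInit phase completes. A minimal sketch could look like the following; reading the theme through a Facade and a Theme property on the site settings is my assumption, not the verbatim implementation:

// Hypothetical sketch of ThemedPageBase: the selected theme is read from the
// site settings and applied in OnPreInit, the last point at which Page.Theme
// may be set.
public class ThemedPageBase : System.Web.UI.Page
{
    protected override void OnPreInit(EventArgs e)
    {
        Page.Theme = new Facade().CurrentSiteInfo.Theme;   // assumed property name
        base.OnPreInit(e);
    }
}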

Plug-in Architecture and Dependency Injection

Plug-ins or add-ons are extensions that interact with a host application to provide specific functionality on demand. The host can be full-blown software such as Firefox or a much smaller program. Applications support plug-ins for various reasons, such as enabling third-party developers to extend the application and supporting features not yet foreseen. A host application usually operates independently of its plug-in components. Strictly speaking, plug-ins are not the same as extensions: a host can operate without its plug-ins, whereas extensions modify the existing behaviors of the host application.

Dependency Injection (DI) is the process of supplying external dependencies to an application. The dependent is an object that needs those dependencies to serve specific needs of the application. The container is the DI component capable of composing the dependent, resolving its dependencies, preparing them for use, and injecting them where needed; it also manages the life-cycle of objects, their instantiation, and their disposal. I read somewhere online that "Dependency Injection" is a 25-dollar term for a 5-cent concept, so don't be afraid if you are new to this space. Here is a popular analogy of DI from Wikipedia: think of a car as the dependent, the engine as the dependency, and a car factory as the provider. A car does not know how to install an engine in itself, but it needs an engine to run; installing the engine is the car factory's responsibility.
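
To make the analogy concrete, here is a tiny constructor-injection example (a generic illustration, not MyStream code):

// The car (dependent) does not construct its own engine (dependency);
// whoever assembles the car (the factory, or a DI container) supplies one.
public interface IEngine { void Start(); }

public class PetrolEngine : IEngine
{
    public void Start() { /* ignite */ }
}

public class Car
{
    private readonly IEngine _engine;

    public Car(IEngine engine)   // the engine is injected from outside
    {
        _engine = engine;
    }

    public void Drive()
    {
        _engine.Start();
    }
}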

Since the objective of this article is not to explore DI itself but to show how we use it to power our plug-ins, for more details on Dependency Injection you may want to read the classic article by Martin Fowler, who coined the term: http://martinfowler.com/articles/injection.html.

Why Does Dependency Injection Come Into Play?

Dependency Injection is well suited to plug-in architectures because we can define the contract for the services a plug-in must provide and the host application requires. An added advantage is better testability, thanks to the loose coupling with the host application. DI provides a level of abstraction through a public interface and removes direct dependencies on concrete plug-ins: the plug-ins are tied together by the architecture rather than to each other, and the duties of creating and wiring them up are transferred to the DI container. The host operates independently of the plug-ins, which makes it possible for third parties to add plug-ins dynamically without changing, recompiling, or redeploying the host project. We keep all the plug-ins in the MyStream.Plugins project, so whenever we need to add or modify a plug-in, we only recompile MyStream.Plugins and copy MyStream.Plugins.dll to the web project's bin folder. It's that easy!

Exploring MyStream.Plugins

There are many variations of DI, and several frameworks and design patterns facilitate it, such as Abstract Factory and Service Locator. There are also several DI containers for .NET available now, such as Castle MicroKernel/Windsor, Autofac, and Unity. I have chosen Unity for this project simply because it is part of the Microsoft Enterprise Library, which I am more familiar with. As explained above, we have a contract that all plug-ins in the project must follow: IPlugin. It contains the signatures of the essential methods a plug-in should have, such as Execute, which returns the social updates for that plug-in to the dependent, in our case MyStream.Business, which contains all the business code. There are also Subscribe, GetFriendlyName, GetIconPath, GetShortName, and GetTypeName methods. Subscribe is used to subscribe to the social account for the first time: it fetches details about that account and saves the subscription to the database so that social updates can be fetched later based on this information. Take a look at the plug-ins hierarchy now:

Figure 6
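
The interface itself is not reproduced in this article, but based on the description above it looks roughly like this; the parameter and return types are inferred from how the methods are used elsewhere, not copied verbatim:

// Approximate shape of the IPlugin contract; signatures are inferred.
public interface IPlugin
{
    List<StreamItem> Execute(Subscription subscription); // fetch the social updates
    void Subscribe(Subscription subscription);           // first-time subscription setup
    string GetFriendlyName();
    string GetIconPath();
    string GetShortName();
    string GetTypeName();
}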

They all implement IPlugin and expose public properties, for example FlickrUserName in FlickrPlugin. In the class diagram, you will also find PluginParameterAttribute, which holds the metadata for a plug-in property.

[PluginParameter("flickr_username", parameterType: 
         PluginParameterType.Required, friendlyName: "Flickr Username")]
public string FlickrUserName { get; set; }

A user interface like the one in Figure 3 is made possible by this attribute. flickr_username is the unique ID of the text HTML element in the form, PluginParameterType.Required marks the parameter as mandatory, and friendlyName provides the label for the text field. This information is used to render the UI and tells the JavaScript validation logic which text fields are required. The following code block digs out the parameters declared on a plug-in:

public static List<PluginParameterAttribute> GetParameters(IPlugin plugin)
{
    var list = new List<PluginParameterAttribute>();
    var type = plugin.GetType();

    // Inspect the plug-in's public properties and collect the
    // PluginParameterAttribute metadata from each decorated property.
    var properties = type.GetProperties();
    foreach (var property in properties)
    {
        object[] attrs = property.GetCustomAttributes(
                            typeof(PluginParameterAttribute), false);
        if (attrs.Count() > 0)
        {
            list.Add(((PluginParameterAttribute[])attrs).Single());
        }
    }

    return list;
}
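
For instance, the admin page can loop over this list and emit one labeled text box per parameter. The snippet below is only illustrative of the idea; the markup, the StringBuilder, and the "required" CSS class are mine, not the project's actual rendering code:

// Illustrative only: building the form fields for a plug-in from its metadata.
var html = new System.Text.StringBuilder();
foreach (var parameter in GetParameters(new FlickrPlugin()))
{
    var required = parameter.ParameterType == PluginParameterType.Required;
    html.AppendFormat(
        "<label for=\"{0}\">{1}</label><input type=\"text\" id=\"{0}\"{2} />",
        parameter.Name,
        parameter.FriendlyName,
        required ? " class=\"required\"" : string.Empty);
}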

The Unity Container

Now that the dependencies (the plug-ins) are ready and the contract is in place, they are just waiting for a dependent to consume them. This is where the container, in our case Unity, comes into play. First of all, we need to tell Unity to register the types we want it to inject whenever needed. Application_Start in Global.asax.cs is a good place for this kind of task:

private static IUnityContainer _Container;
public static IUnityContainer Container
{
    get { return _Container; }
}

protected void Application_Start(object sender, EventArgs e)
{
    if (Container == null)
        _Container = new UnityContainer();

    MyStream.Business.BootstrapTasks.Bootstrapper.Run(_Container);
}

Among other Bootstrapper tasks, there is one that initializes the Unity container for us like this:

public static void Run(IUnityContainer container)
{
    _Container = container;

    // Repository Registration
    _Container.RegisterType<ISiteInfoRepository, SiteInfoRepository>()
        .RegisterType<IStreamDataRepository, StreamDataRepository>()
        .RegisterType<ISubscriptionsRepository, SubscriptionsRepository>()

    // Register plugins - this list will determine the order when shown as list 
        .RegisterType<IPlugin, RssPlugin>(RssPlugin.TYPE_NAME)
        .RegisterType<IPlugin, TwitterPlugin>(TwitterPlugin.TYPE_NAME)
        .RegisterType<IPlugin, DeliciousPlugin>(DeliciousPlugin.TYPE_NAME)
        .RegisterType<IPlugin, FlickrPlugin>(FlickrPlugin.TYPE_NAME);
}

This sets up the data repositories we are using as well as the plug-ins. There are two other ways to register types besides registering them explicitly from code: XML configuration, and the container configuration API for providing custom configuration. What the code above means is: "If I ask the Container for an implementation of ISubscriptionsRepository, it should create and give me an instance of SubscriptionsRepository." For the plug-ins, the meaning is slightly different, since we identify each type by passing a TYPE_NAME constant: "If we ask for an implementation of IPlugin by the name Type.TYPE_NAME, give us an instance of that Type." For simplicity, we resolve all the IPlugin instances registered in Run into the Plugins field of the Facade:

Plugins = IoC.ResolveAll<IPlugin>();

When we need one of them by name, we select one by using a LINQ query like the following:

public IPlugin LoadPlugin(string type)
{
    return Plugins.AsParallel().SingleOrDefault<IPlugin>(p => p.GetTypeName() == type);
}

If you are curious about how we use repositories with Unity, here is a sample Facade constructor for you:

public Facade() : 
    this(IoC.Resolve<ISiteInfoRepository>(), 
         IoC.Resolve<ISubscriptionsRepository>(), 
         IoC.Resolve<IStreamDataRepository>())
{
    if (CurrentSiteInfo == null)
    {
        ReloadCurrentSiteInfo();
    }

    Context = System.Web.HttpContext.Current;
}
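
The IoC helper used in these snippets is not shown above. A minimal sketch, assuming it simply wraps the static Container property exposed by the Global class from Application_Start, might be:

// Hypothetical IoC wrapper around the Unity container created in Application_Start.
public static class IoC
{
    // Resolve the default (unnamed) registration for T.
    public static T Resolve<T>()
    {
        return Global.Container.Resolve<T>();
    }

    // Resolve every named registration for T, e.g. all the IPlugin implementations.
    public static IEnumerable<T> ResolveAll<T>()
    {
        return Global.Container.ResolveAll<T>();
    }
}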

Parallel Programming

Microsoft has invested a lot of effort in .NET 4.0 to let developers implement parallel programming concepts easily and safely. The reason is the shift in recent years from faster processors to processors with more cores. Moore's law tells us that the number of transistors on a chip doubles roughly every two years, and for a long time that translated into higher clock speeds. We still need more speed, but clock speeds have largely stopped climbing, which drove processor makers to ship multiple and multicore processors instead. However, not all applications and compilers can automatically scale on multicore systems: the multicore approach only pays off when the software can actually use the different cores at the same time.

The goal of parallel programming, as opposed to plain multithreading, is to make sure an application takes advantage of multiple cores to improve computational performance, and keeps doing so as more cores are added. Multithreading is nothing new in .NET, and it is not hard either, but synchronizing access to shared resources is often difficult, and tracking down random crashes and freezes is time consuming. Parallel programming in .NET offers the same multithreading capabilities and performance scaling without hardware-specific code: in most cases it automatically handles forking, merging, and partitioning of work, and so minimizes the code you have to write.

Amdahl's Law

In the case of parallelization, Amdahl's law states that if P is the proportion of a program that can be made parallel (i.e., benefit from parallelization), and (1 − P) is the proportion that cannot be parallelized (remains serial), then the maximum speedup that can be achieved by using N processors is:

Performance gain = 1 / ((1-P) + (P/N))

This means that if P is 90%, (1-P) = 10%, and there are 2 cores, your program will run about 1.81 times faster than its sequential implementation. The reason for bringing up Amdahl's law is to give you a realistic idea of how much (or how little) can be gained from parallelization.
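
The same arithmetic in code, just as a sanity check:

double P = 0.90;                                 // parallelizable portion
int N = 2;                                       // number of cores
double speedup = 1.0 / ((1.0 - P) + (P / N));    // ≈ 1.81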

Parallel LINQ

Parallel LINQ (PLINQ) is an implementation of the standard query operators that uses parallel execution algorithms and techniques under the covers of the usual LINQ programming model. PLINQ can automatically choose the appropriate algorithm and determine the degree of parallelism possible. PLINQ supports all the kinds of LINQ usage already in your application: you can query XML documents, arrays, List<T>, or any IEnumerable<T> simply by appending the AsParallel() extension method. Queries against other LINQ providers are still executed by those providers, but PLINQ can then work on the in-memory results fetched from them: sorting, selecting, and joining can all happen in parallel. Let us see an example of how easy it is:

// Project each <item> element of the feed's <channel> into a StreamItem, in parallel.
list = (from item in feed.Element("channel").Elements("item").AsParallel()
        select new StreamItem
        {
            Title = XmlHelper.StripTags(
                     item.Element("title").Value, 160).ToTweet("embedded-anchor"),
            Url = item.Element("link").Value,
            Icon = _Subscription.Icon,
            Timestamp = Utilities.Rfc822DateTime.FromString(
                           (item.Element("pubDate").Value))
        }).ToList();

You may well wonder whether it is really feasible to process, say, 10 tweets retrieved from my account in parallel. Are we spawning a thread for each item? Isn't that going to cost more than processing them sequentially? Let us try to find the answer. Partitioning of data is, of course, a tricky business. Generally, we can think of two fairly simple schemes. One is to lock the data source and partition it into chunks of a constant number of items. The other is to lock the data source and split it into as many ranges as there are cores/processors in the system; with 2 million records and a 2-core CPU, each core would get 1 million. The latter, known as range partitioning, is quite efficient for simple queries.

PLINQ automatically determines the type of partitioning at runtime. Range partitioning is a very common scheme in PLINQ, mostly used for arrays and list implementations where the exact number of items is known. For a plain IEnumerable<T>, there is no way to know in advance how many items remain, so for such non-indexable data sources PLINQ uses chunk partitioning. This algorithm acts as a load balancer across your cores: it scales dynamically as the number of items grows and keeps all the cores equally busy. There are other algorithms PLINQ uses that we are not covering here, but the point I am trying to make is that PLINQ decides the appropriate degree of parallelism for the query and the best algorithm for it.
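
Here is a small, contrived illustration of the two kinds of sources involved (the names are mine, not from MyStream): an array exposes its length, so PLINQ can range-partition it, while an iterator-produced IEnumerable<T> has no known count and is chunk-partitioned instead.

using System;
using System.Collections.Generic;
using System.Linq;

class PartitioningDemo
{
    // Iterator method: PLINQ cannot know the total count in advance,
    // so this source will be chunk-partitioned.
    static IEnumerable<int> ReadItemsLazily()
    {
        for (int i = 0; i < 1000000; i++)
            yield return i;
    }

    static void Main()
    {
        // Array: the length is known, so PLINQ can use range partitioning.
        int[] numbers = Enumerable.Range(0, 1000000).ToArray();
        var fromArray = numbers.AsParallel().Select(n => n * n).ToList();

        var fromIterator = ReadItemsLazily().AsParallel().Select(n => n * n).ToList();

        Console.WriteLine("{0} / {1} items processed", fromArray.Count, fromIterator.Count);
    }
}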

Task Parallel Library

.NET 4.0 provides task parallelism through the Task Parallel Library (TPL). TPL makes it very easy to write code that automatically distributes work among multiple processors/cores. Whereas PLINQ focuses only on data, TPL lets you express both data and task parallelism. There are many ways to split a sequential code block into parallel work; the most common is over an array or a collection of objects, and that is what I will show here. Take a look at the block of code below: we take each item from the subscriptions list and execute its respective plug-in to receive social updates from it.

public List<StreamItem> GetStreamItems()
{
    var subscriptions = GetAllSubscriptions();
    var list = new List<StreamItem>();
    subscriptions.ForEach(s =>
    {
        var plugin = GetPluginFromSubscription(s);
        var items = new List<StreamItem>();
        var cachedItems = Context.Cache[s.ID.ToString()];
        items = cachedItems != null ? (List<StreamItem>)cachedItems : plugin.Execute(s);
        if (items != null && items.Count > 0)
        {
            Context.Cache.Add(s.ID.ToString(), items.ToList(), null,
                DateTime.Now.AddSeconds(CurrentSiteInfo.CacheDuration),
                System.Web.Caching.Cache.NoSlidingExpiration,
                System.Web.Caching.CacheItemPriority.Normal, null);
            list.AddRange(items);
        }
    });

    return list.Where(i => i != null && i.Timestamp != null)
        .OrderByDescending(i => i.Timestamp).ToList();
}

Put simply, as a diagram:

Figure 7

This sequential code block can be converted into code parallelism using Parallel.ForEach:

public List<StreamItem> GetStreamItems()
{
    var list = new List<StreamItem>();

    var subscriptions = GetAllSubscriptions();

    Parallel.ForEach(subscriptions, s =>
    {
        var plugin = GetPluginFromSubscription(s);
        var items = new List<StreamItem>();
        var cachedItems = Context.Cache[s.ID.ToString()];

        items = cachedItems != null ? (List<StreamItem>)cachedItems : plugin.Execute(s);

        if (items != null && items.Count > 0)
        {
            Context.Cache.Add(s.ID.ToString(), items.ToList(), null,
                DateTime.Now.AddSeconds(CurrentSiteInfo.CacheDuration),
                System.Web.Caching.Cache.NoSlidingExpiration,
                System.Web.Caching.CacheItemPriority.Normal, null);

            list.AddRange(items);
        }
    });

    return list.Where(i => i != null && i.Timestamp != null)
        .OrderByDescending(i => i.Timestamp).ToList();
}

It is the same code, except for Parallel.ForEach. TPL now determines the appropriate degree of parallelism and a suitable algorithm depending on the size of the list, the number of cores available, and the kind of work carried out for each item, so that forking, executing, and merging do not cost more than the sequential counterpart. However, there is one inevitable error in the code above: even though TPL manages a lot for us, List<T> is not thread-safe, so adding items from multiple threads at once can throw or corrupt the collection at runtime. To prevent that, we can lock on a dummy object to ensure the list is not being modified by another thread while the current thread changes its contents.

lock (_lock)
{
    list.AddRange(items);
}

To solve this shared-resource problem more cleanly, there is a new class named ConcurrentBag<T> in the System.Collections.Concurrent namespace. It is a thread-safe collection that lets you add objects without worrying about race conditions. One handy method is missing, though: AddRange, so we still have to add the items to the collection one by one. There are other thread-safe collections in System.Collections.Concurrent, such as ConcurrentDictionary, and of course many other classes and features in TPL that we are not covering here.

public List<StreamItem> GetStreamItems()
{
    var list = new System.Collections.Concurrent.ConcurrentBag<StreamItem>();
    var subscriptions = GetAllSubscriptions();

    Parallel.ForEach(subscriptions, s =>
    {
        var plugin = GetPluginFromSubscription(s);
        var items = new List<StreamItem>();
        var cachedItems = Context.Cache[s.ID.ToString()];

        items = cachedItems != null ? (List<StreamItem>)cachedItems : plugin.Execute(s);

        if (items != null && items.Count > 0)
        {
            Context.Cache.Add(s.ID.ToString(), items.ToList(), null,
                DateTime.Now.AddSeconds(CurrentSiteInfo.CacheDuration),
                System.Web.Caching.Cache.NoSlidingExpiration,
                System.Web.Caching.CacheItemPriority.Normal, null);

            items.ForEach(item => list.Add(item));
        }
    });

    return list.Where(i => i != null && i.Timestamp != null)
        .OrderByDescending(i => i.Timestamp).ToList();
}

After parallelization magic:

Figure 8

When NOT to Parallelize

Even though the Task Parallel Library takes a lot of the headache away, it is always better to test, profile, and measure your code to confirm that using it is actually effective. Overusing parallelism can make your program slower than its sequential counterpart; after all, not all problems are the same, and in some cases you should not parallelize at all. See the following code:

public static void Run(IUnityContainer container)
{
    var list = new List<IBootstrapTasks>()
    {
        new SetupIoC(container),
        new EnsureInitialData()
    };

    list.AsParallel().ForAll(h => h.Run());
}

In this block of code, if EnsureInitialData gets picked up first by the scheduler, it will try to access objects that have not been initialized yet; in this case, our IoC container has not been set up, so it will crash.
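
In a case like this, where the second task depends on the side effects of the first, the simplest fix is to keep the loop sequential:

// Order matters here: SetupIoC must finish before EnsureInitialData runs.
list.ForEach(h => h.Run());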

Named and Optional Parameters

These are long-requested features of C# 4.0. Together they give you the flexibility to omit parameters and to ignore the position of parameters when calling methods or delegates. Optional parameters must come after the required ones in the parameter list, with default values to use when they are omitted, and the default values must be compile-time constants; remember that string.Empty is not a compile-time constant. Parameters with ref or out cannot be optional. What about params? A params array can appear after optional parameters, but it cannot itself be optional, nor can it have a default value; if the caller omits it, the method receives an empty array. Let us see an example. The extension method on string listed below converts a tweet into well-formatted HTML: it keeps the hyperlinks already inside the tweet and also renders hyperlinks for the tweeter, mentions, and hashtags. Note the optional parameter anchorCssClass, which renders a specific CSS class on the hyperlinks if specified; you can simply omit this parameter when you call the extension method.

public static string ToTweet(this string s, string anchorCssClass = "")
{ 
    var status = HttpUtility.HtmlEncode(s);

    status = Regex.Replace(status, "[A-Za-z]+://[A-Za-z0-9-_]+." + 
                           "[A-Za-z0-9-_:%&?/.=]+", delegate(Match m)
    {
        return string.Format("<a class=\"{1}\" href=\"{0}\">{0}</a>", 
                             m.Value, anchorCssClass);
    }, RegexOptions.Compiled);

    status = Regex.Replace(status, "[@]+[A-Za-z0-9-_]+", delegate(Match m)
    {
        var user = m.Value.Replace("@", "");
        return string.Format("@<a class=\"{1}\" href=\"http://twitter.com/{0}\">{0}</a>", 
                             user, anchorCssClass);
    }, RegexOptions.Compiled);

    status = Regex.Replace(status, "[#]+[A-Za-z0-9-_]+", delegate(Match m)
    {
        var tag = m.Value.Replace("#", "");
        return string.Format("<a class=\"{1}\" href=\"http://search." + 
                             "twitter.com/search?q=%23{0}\">#{0}</a>", 
                             tag, anchorCssClass);
    }, RegexOptions.Compiled);

    // The tweet is expected in the form "username: text"; link the user name too.
    var name = status.Substring(0, status.IndexOf(": "));
    return string.Format("<a class=\"{1}\" href=\"http://twitter.com/{0}\">{0}</a>{2}", 
                         name, anchorCssClass, status.Substring(name.Length));
}
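
For example, with a hypothetical tweet string, both of the following calls compile; the second simply omits the optional argument and falls back to the empty default:

var tweet = "someuser: Reading about #aspnet4 at http://example.com/article";

// Optional argument supplied (here, by name):
var styled = tweet.ToTweet(anchorCssClass: "embedded-anchor");

// Optional argument omitted:
var plain = tweet.ToTweet();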

Named arguments can be supplied in any order, much like JSON-style key/value pairs, and you can mix positional and named arguments in the same call. In that case the positional arguments must come first, in order, followed by the named ones, and every required parameter must be supplied either by position or by name. Let's look at an example:

public PluginParameterAttribute(string name = "", string friendlyName = "", 
       string description = "", 
       PluginParameterType parameterType = PluginParameterType.Optional)
{
    Name = name;
    FriendlyName = friendlyName;
    Description = description;
    ParameterType = parameterType;
}

The attribute defined above is used on the FlickrUserName property in FlickrPlugin. Notice that the name argument is passed positionally, so it comes first, while the other arguments are specified by name using a colon (:).

[PluginParameter("flickr_username", 
   parameterType: PluginParameterType.Required, 
   friendlyName: "Flickr Username")]
public string FlickrUserName { get; set; }

Conclusion

Understandably, hosting companies will take some time to get their servers ready for .NET 4.0, since the framework itself is still at Beta 1 at the time of writing. I could not even find a host or an affordable VPS for this project yet; maybe in the future. I hope this article on developing a complete lifestream portal will be a helpful resource for enthusiasts. You are more than welcome to join the MyStream development team: add new plug-ins, improve and scale the architecture, fix bugs, and design attractive themes. I look forward to your feedback, your contributions to the project, and having you be part of the development of the next generation of personal sites.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
