Lightning-Fast Nested User Groups in C#

Daniel Miller

5.00/5 (28 votes)

5 Dec 2015CPOL14 min read

43.6K

807

This article describes the C# implementation for a high-performance user/role security principal hierarchy.

Download demo source - 3.4 MB

Introduction

This is the first in a series of follow-ups to a previous article titled "Lightning-Fast Access Control Lists in C#". It described a solution for building and querying an access control list as part of an application security model. The solution was fast and lightweight, and it had no dependencies upon a specific database schema.

Many readers requested a follow-up article to propose a database schema suitable for supporting that solution. Before I propose a schema, there is some additional groundwork that needs be done. This will help to guarantee the schema does not impede a high-performance solution.

As most of you probably know, application security is a difficult problem, and solutions (both good and bad) often bear a striking resemblance to black magic. Software security models have the potential for enormously complex requirements, with priorities that often conflict with one another. For example, flexibility, robustness, fault-tolerance, performance, ease of development, ease of administration, ease of use... any (and all) of these can be important aspects.

A good security model has many different moving parts, and a solid access control list is just one of the primary parts. The purpose of this article is to describe a solution for another primary part that is closely related: managing a hierarchy of security principals.

Paired with an Access Control List (ACL), a user/role hierarchy (or Principal Model) needs to satisfy four basic functional requirements:

A permission is assigned to a principal.
A principal is an individual user or a group of users.
A user may be assigned to multiple groups.
Groups may be nested into a hierarchy with any number of levels.

(This article won't cover nested Operations or nested Resources; that topic will be discussed in a future article.)

In keeping with the theme of the previous article, here, we'll aim for an elegant solution that is also blisteringly fast.

Background

The code makes some foundational assumptions, which should be described before we get into the details.

Most important, it is assumed that every permission is an object with three properties (Principal, Operation, Resource) so it can be read as a declarative statement having this form:

[Principal] is granted permission to [Operation] on [Resource]

You can find definitions and examples for these three properties in the previous article.

The code in this article is focused exclusively on a model to manage and query users assigned to nested groups. The object we're concerned with here is a security Principal Model.

A Principal is understood to represent the identity of a specific user or group of users. Here, we will use the term User to identify a specific individual person, and we will use the term Role to identify a specific group of users. Both Users and Roles are Principals, so a skeletal class model might look like this:

class Principal { };
class User : Principal { };
class Role : Principal { };

Fair warning: I won't be doing that here. :-)

Instead, the focus here is on the overall data structure and algorithm design. At this stage, I am not concerned with Principal, User, or Role class definitions. Those can come later. Instead, I want to keep the discussion (and the initial code) as elementary as possible. The problem we are trying to solve is difficult enough on its own, and at this early stage, we don't want to get lost in secondary details. Therefore, the code here assumes the following deliberate simplifications:

A user is uniquely identified by a string value (i.e., email address).
A role is uniquely identified by a string value (i.e. name). No two roles share the same name, regardless of their positions in the hierarchy.
A role may have no parent role, or it may have one parent role. It cannot have multiple parents.
A role may have no users, or it may have multiple users.
A user may be assigned to any number of roles. (Users are not leaf-nodes in a tree. This is an important distinction.)

The logical model looks like this:

Path notation is a useful shorthand to identify nested roles, in the same way that path notation is a useful way to reference files in a file system or pages on a web site.

For Example...

The path "Springfield Residents/Simpson Family Members/Parents" implies the existence of three roles: Springfield Residents, Simpson Family Members, and Parents. It also defines relationships: Simpson Family Members are a subset of Springfield Residents, and Parents are a subset of Simpson Family Residents.

In this example, it is important to note the assumptions I have described disallow the existence of a role "Springfield Residents/Flanders Family Members/Parents", because the role named "Parents" cannot appear in multiple locations within the hierarchy. Roles names must be unique, and roles can have only one parent.

This constraint exists solely for the sake of simplicity, In a production system, you will most likely use something other than the name of a role to uniquely identify it, and in that case, you might choose to permit multiple roles with the same name (in different locations) differentiated by some other identifier. To illustrate:

What are Nested Roles?

Nested roles define subset (or containment) relationships in this model. If A contains B and B contains C, then A indirectly contains C. This can be visualized using a Venn diagram or a directed graph:

One More Constraint

There is one final constraint, again for the sake of simplicity: Role names should not include commas, slashes, colons, hyphens or bars. It's up to you if you want to enforce this. I like to troubleshoot role hierarchies by dumping them to plain text. I also like to test and debug the code with plain text input. These tasks are easier with some basic role-naming conventions to facilitate plain text.

For example, the string value "curious.george@yellowhathouse.com:Animals/Monkeys,Adventurers" can be used to declare the existence of a user named Curious George, and three roles named Animals, Monkeys, and Adventurers, where Monkeys are a subset of Animals, and the role of Adventurer is not related to Animals or Monkeys.

As another example, troubleshooting both code and data is easier when you can display a group hierarchy in plain text like this:

|-Root
| |-Alpha
| |-Beta
| \-Gamma
|   |-Delta
|   | \-Epsilon
|   |-Zeta
|   \-Eta
\-Theta
  \-Iota
    |-Kappa
    \-Lambda

That should be enough preamble... let's jump into the code...

Into the Code

What's the Goal?

Performance is still the highest priority with this solution. A close second is the requirement to support minimalist calls from client code. The API should be clean and compact. For example, when a principal model and an access control list are paired with one another, a permission check should look something like this:

if (principals.IsUserInRole("Alice","Explorers")) 
{ 
    // Proceed if Alice is assigned to the Explorers role - or to any role that is a superset of
    // (i.e. ancestor to) the Explorers role.
}

Here is another example, in which we want to obtain a collection of all the roles to which Alice is assigned:

var roles = principals.GetUserRoles("Alice");

This should include Alice herself (obviously), and the roles to which she is assigned directly, and the roles to which she is assigned indirectly - which could then be used to query an ACL:

if ( acl.IsGranted( principals.GetUserRoles("Alice"), "Drink", "Mysterious Potion" ) ) 
{
    // Proceed if Alice is granted permission to drink the mysterious potion. This permission 
    // might be granted directly to Alice, or it might be granted to a role that she is directly 
    // assigned to, or it might be granted to a role that contains one of the role to which she 
    // is assigned.
}

Creating and loading a principal model has to be simple as well, so less-experienced developers and administrators can re-use it. Ideally, the calls should look something like this:

// Invoke principals::AddRole( child, parent )
// All Harmless Lunatics are Mad Tea Party Attendees. Not not all attendees are lunatics.
principals.AddRole( "Harmless Lunatics", "Mad Tea Party Attendees" );

To maintain a loose coupling with the rest of the application (front-end and back-end), the code needs to include methods for loading a principal model from a DataTable or a CSV:

model.LoadRoles( "C:\Files\Roles.csv" )
model.LoadUsers( "C:\Files\Users.csv" )

In the end, I settled on two interfaces: one that is responsible for storing the data model, and one that is responsible for navigating and interrogating model. Towards the end of the article, I will explain the reason for this separation.

public interface IPrincipalModel
{
    GroupTree Groups { get; }
    MembershipList Roles { get; }
    MembershipList Users { get; }
    void AddGroup(string name);
    void AddGroup(string child, string parent);
    void AssignUserToRole(string user, string role);
}

public interface IPrincipalFinder
{
    MembershipResultSet FindAllRoles(string user, MembershipList users, IGroupTree groups);
}

Building the Model

GroupTree

First, the principal model needs to store the group hierarchy. This is the full set of roles, outside the context of any particular user's membership. This is our "master list" of groups and the parent/child relationships connecting them. For the sake of example, suppose we have the following roles defined:

Animals
Humans/Explorers
Harmless Lunatics/Mad Tea Party Attendees

Here is another plaint-text representation of the same hierarchy:

# Animals
# Humans
## Explorers
# Harmless Lunatics
## Mad Tea Party Attendees

We can visualize this in a diagram and add a few users:

Notice the entire set of groups is represented in the section titled "Roles", including those roles to which no users are assigned. Relationships between roles are also captured here. The diagram illustrates a number of facts about our principals:

Alice and Dora are both Explorers (directly) and Humans (indirectly).
Neither Alice nor Dora is a Harmless Lunatic.
Dora is not a Mad Tea Party Attendee or an Animal.
Mad Hatter and March Hare are both Harmless Lunatics (directly) and Mad Tea Party Attendees (indirectly).
All Harmless Lunatics are Mad Tea Party Attendees.
Not all Mad Tea Party Attendees are Harmless Lunatics.

Trees are the obvious choice for a data structure to represent a hierarchy of groups, and I have borrowed heavily from an excellent implementation in another article. (For reference, please see "A Generic Tree Collection".) My GroupTree class looks like this:

// The IGroupNode interface is a wrapper for the INode interface, exposing only the properties 
// and methods required by this solution. 
public interface IGroupNode
{
    INode<string> this[string group] { get; }
    void Add(string group);
    bool Contains(string group);
}

// The GroupTree class is a wrapper for the ITree interface, exposing only the properties and
// methods required by this solution. It is context-insensitive with respect to user membership.
public class GroupTree : IGroupNode
{
    // The internal representation for the hierarchy of groups.
    private readonly ITree<string> _tree;
    
    // The root of the tree.
    public INode<string> Root { get { return _tree.Root; } }
        
    // Enables indexing by group name.
    public INode<string> this[string group] { get { return _tree[group]; } }
    
    // Initializes the tree. Two groups are considered equal if they have the same name. Names
    // are sanitized by the StringComparer.Sanitize method.
    public GroupTree()
    {
        var comparer = new StringComparer();
        _tree = NodeTree<string>.NewTree(comparer);
    }
    
    // Adds a new group to the hierarchy. Duplicates are not permitted.
    public void Add(string group)
    {
        if (_tree.Root.Contains(group))
            throw new Exception("Duplicate Not Allowed: " + group);
        _tree.AddChild(group);
    }
    
    // Returns true if the hierarchy already contains the group.
    public bool Contains(string group)
    {
        return _tree.Contains(group);
    }
    
    // Adds a child group to a parent group.
    public void Add(string child, string parent)
    {
        string error;

        // A child cannot be its own parent.
        if (_tree.DataComparer.Equals(child, parent))
        {
            error = string.Format(
                "Self-Reference Not Allowed: {0} cannot be its own parent", child);
            throw new Exception(error);
        }

        // If the child is already in the tree, then the parent must also be in the tree,
        // and the child must be one of its descendents.
        if (Contains(child))
        {
            var childNode = _tree[child];
            if (!Contains(parent))
            {
                error = string.Format("Duplicate Not Allowed: {0} is already in the tree", child);
                throw new Exception(error);
            }
            var parentNode = _tree[parent];

            // If the child is already in the hierarchy under the parent then return quietly.
            if (parentNode.AllChildren.Nodes.Contains(childNode))
                return;
                
            // Throw an exception otherwise. A child can't be assigned to multiple locations in 
            // the hierarchy.
            error = string.Format(
                "Multiple Parents Not Allowed: {0} is already assigned to {1}",
                child,
                parent);
            throw new Exception(error);
        }
            
        // Otherwise, add the parent if it is not already in the tree, and then add the child.
        if (!Contains(parent))
            Add(parent);

        _tree[parent].AddChild(child);    
    }

MembershipList

Next, the security principal model needs a structure in which to store the list of roles to which a user is directly assigned. We need the ability to invoke a method like this:

list.AssignUserToRole("Marge","Painters")
list.AssignUserToRole("Lisa","Musicians")
list.AssignUserToRole("Bleeding Gums Murphy", "Musicians")

Essentially, these are declarative statements about membership. For example, the user named Marge is in the role named Painters. At the same time, the Painters role has Marge as a member. The Dictionary class is a perfect fit here. We can store a collection of roles for each user, and we can store a collection of users for each role. One class can satisfy both requirements. Here is a simplified version of the MembershipList class, which can answer basic questions about user/role membership:

public class MembershipList
{
    // Stores a collection of set members (e.g. roles per user, or users per role).
    private readonly Dictionary<string, commadelimitedstringcollection> _members;
    
    // Constructs an empty list.
    public MembershipList() 
    { 
        _members = new Dictionary<string, commadelimitedstringcollection>(); 
    }
        
    // Adds a member to a set.
    public void Add(string member, string set)
    {
        member = StringHelper.Sanitize(member);
        if (member == null)
            throw new ArgumentNullException();

        var value = GetMembers(set);
        if (!value.Contains(member))
            value.Add(member);
    }
    
    // Returns true if the list contains a collection with the specified name; false otherwise.
    public bool Contains(string set)
    {
        var key = StringHelper.Sanitize(set);
        return _members.ContainsKey(key);
    }
    
    // Returns true if any one of the members has been added to the set.
    public bool Exists(string[] members, string set)
    {
        var key = StringHelper.Sanitize(set);
        if (key == null || members == null)
            return false;

        var value = !_members.ContainsKey(key) ? null : _members[key];
        if (value != null)
        {
            foreach (var member in members)
            {
                var m = StringHelper.Sanitize(member);
                if (m != null && value.Contains(m))
                    return true;
            }
        }
        return false;
    }
    
    // Returns only those members already added to a set. Given a list of members, you might 
    // want to know which one(s) have been added for a specific name.
    public CommaDelimitedStringCollection FindExistingMembers(string set, string[] members)
    {
        var includedPrincipals = new CommaDelimitedStringCollection();
        var key = StringHelper.Sanitize(set);
        if (key == null || !_members.ContainsKey(key))
            return includedPrincipals;

        var value = _members[key];
        foreach (var member in members)
        {
            var m = StringHelper.Sanitize(member);
            if (value.Contains(m))
                includedPrincipals.Add(m);
        }

        return includedPrincipals;
    }
    
    // Returns a specific collection of set members. 
    public CommaDelimitedStringCollection GetMembers(string set)
    {
        var key = StringHelper.Sanitize(set);
        if (key == null)
            throw new ArgumentNullException();

        CommaDelimitedStringCollection value;
        if (!_members.ContainsKey(key))
        {
            value = new CommaDelimitedStringCollection();
            _members.Add(key, value);
        }
        else
        {
            value = _members[key];
        }
        return value;
    }
        
    // Removes a member from a set.
    public void Remove(string member, string set)
    {
        set = StringHelper.Sanitize(set);
        if (set == null)
            throw new ArgumentNullException();

        var value = GetMembers(member);
        if (value.Contains(set))
            value.Remove(set);
    }
}

PrincipalModel

Putting it all together, the code for the principal model itself is clean and compact. Here is a draft of the interface and the class:

public interface IPrincipalModel
{
    GroupTree Groups { get; }
    MembershipList Roles { get; }
    MembershipList Users { get; }
    void AddGroup(string name);
    void AddGroup(string child, string parent);
    void AssignUserToRole(string user, string role);
}
    
// The model contains a "master" hierarchy of groups, a list of roles per user, and a list of 
// users per role.
public class PrincipalModel : IPrincipalModel
{
    // This is the group hierarchy, independent of user/role context.
    private readonly GroupTree _groups;
        
    // Users per role.
    private readonly MembershipList _roles;
        
    // Roles per user.
    private readonly MembershipList _users;
        
    // Constructs an empty hierarchy and empty user/role membership lists.
    public PrincipalModel()
    {
        _roles = new MembershipList();
        _users = new MembershipList();
        _groups = new GroupTree();
    }

    public GroupTree Groups { get { return _groups; } }
    public MembershipList Roles { get { return _roles; } }
    public MembershipList Users { get { return _users; } }
        
    // Assigns a user to a role (and a role to a user).
    public void AssignUserToRole(string user, string role)
    {
        _roles.Add(user, role);
        _users.Add(role, user);
    }
        
    // Adds a group to the hierarchy.
    public void AddGroup(string group)
    {
        if (!_groups.Contains(group))
            _groups.Add(group);
    }
        
    // Adds a child group to a parent group.
    public void AddGroup(string child, string parent)
    {
        _groups.Add(child,parent);
    }
}

Querying the Model

In a typical application, the group hierarchy is relatively static and membership lists change frequently. This means you might want the ability to query the model with membership lists that exist outside the model.

For example, in a web application with 100,000 users, you might not want to store an instance of the entire principal model in memory. Instead, you might want to store the group hierarchy (only) in shared application state, and store the membership list for an individual user in the user's session state. In this scenario, you'll want queries on your access control list to look something like this:

// Find all of the principals to which the current user is assigned (directly or indirectly).
string[] principals = PrincipalFinder
    .FindAllRoles( CurrentUser.Name, CurrentUser.Roles, Global.PrincipalModel.GroupTree );

// Alternatively, you might be able to get away with this in some scenarios (but usually not).
// string[] principals = PrincipalFinder.FindAllRoles( CurrentUser.Name, 
//     Global.PrincipalModel.Roles, Global.PrincipalModel.GroupTree );

// In either case, use the array of principals to query the ACL.
if ( Global.AccessControlList.IsGranted( principals, operation, resource ) )
{
    // Proceed if the current user is granted permission to the operation on the resource.
}

Thinking back to our earlier example, what we want is the ability to invoke a method like this: FindAllRoles( "alice", "explorers, mad tea party attendees", AllGroups ), and have it return a value that looks like this: "alice, explorers, humans, mad tea party attendees".

This is where the PrincipalFinder class comes in, and this is the class in which the real black magic occurs. It contains only a small number of lines of code:

public class PrincipalFinder : IPrincipalFinder
{
    // Traverses the group tree to find all roles assigned to a user (directly or indirectly).
    public MembershipResultSet FindAllRoles(string user, MembershipList users, GroupTree groups)
    {
        // The user is always included in the result set.
        var result = new MembershipResultSet { { PrincipalSubtype.User, user, true, null } };

        // Check the membership list for a matching entry. If one is found then we know the 
        // user is assigned directly to one or more roles.
        if (users.Contains(user))
        {
            // Get all of the roles to which the user is assigned directly.
            var key = StringHelper.Sanitize(user);
            var roles = users.GetMembers(key);

            // Add each 'direct' role to the result set, and find all its descendents.
            foreach (var direct in roles)
            {
                result.Add(PrincipalSubtype.Role, direct, true, null);

                // Look for the roles contained by the 'direct' role. Our user is implicitly
                // assigned to these 'indirect' roles as well. As we work through this, store
                // some additional metadata to assist with troubleshooting questions (e.g. "Why
                // is user A considered a member of role X?"). This is especially helpful in
                // large deeply nested hierarchies.
                if (groups.Contains(direct))
                {
                    var node = groups[direct];
                    foreach (var child in node.AllChildren.Nodes)
                    {
                        var indirect = new MembershipResult
                        {
                            Subtype = PrincipalSubtype.Role,
                            Name = child.Data,
                            IsDirect = false,
                            Path = TreeHelper.CalculatePath(child)
                        };
                        result.Add(indirect);
                    }
                }
            }
        }
        return result;
    }
}

Performance Metrics

How fast is it, really? This was the first question asked by the first readers of my previous article, so I'll answer it right away.

The code can be fast. Very, very fast. But only if it is integrated correctly.

I created several integration test scenarios with a massive principal model. The model contained 100,000 distinct users assigned to 3,000 different roles, with 5 million relationships defined between users and roles (counting only direct role memberships). Then I ran a battery of 5,000 iterations on the model, invoking PrincipalFinder::FindAllRoles.

Scenario #1

If we assume an integration strategy in which the principal model is globally cached, and we never cache the results of a tree-traversal for a specific user's role memberships, then performance is poor. Here are the results:

Number of roles = 3,060
Number of users = 100,000
Number of direct user/role membership tuples = 5,000,000 (not counting implied/indirect memberships)
Number of test iterations = 5,000
Average response time per iteration = 173.77 ms
Best response time = 53.20 ms
Worst response time = 326.61 ms
Standard deviation = 43.38 ms
Theoretical throughput = 5.75 method calls per second

Scenario #2

We know that string comparison and string manipulation is expensive, so what happens if we rewrite the code to use integers (rather than strings) to uniquely identify users and roles? Performance is improved significantly, but it's still not fast enough to peel back anyone's hair:

Average response time per iteration = 3.22 ms
Best response time = 1.92 ms
Worst response time = 7.91 ms
Standard deviation = 0.53 ms
Theoretical throughput = 309 method calls per second

Scenario #3

What if we cache the result after we query the group tree for a user's role membership? We can traverse the group tree when a user is authenticated to determine the user's implied roles, and then cache the result in the user's session, so we perform only one tree-traversal per user session. Adopting this approach, the results are impressive even when we assume string identifiers and worst-case performance for every initial invocation:

Average response time per iteration = 0.0653 ms
Best response time = ~0 ms (depending upon cache speed)
Worst response time = 326.6146 ms
Standard deviation = 4.6186 ms
Theoretical throughput = 15,308 method calls per second

If we assume average response-time for the initial call per user, then throughput jumps to 28,773 calls per second.

And if we replace strings with integers to uniquely identify users and roles, then throughput jumps to more than 1.5 million calls per second. Now we're cooking with gas...

Conclusion

When the model is small, performance is excellent across the board, and the design is quite forgiving if you make a poor choice on integration. For example, when I repeated the first scenario using a model with 1000 users, 30 roles, and 5000 memberships, average response time jumped 192X from the baseline (from 173.77 ms to 0.90 ms per call).

However, when the model is large, loading and traversing the tree is slow. With a careful and considered integration strategy, the code in this article can provide a solution that is blazingly fast, with the added bonus that it does not require high-volume chatter between the application and its database back-end.

Integrating the Solution

Admittedly, this solution is largely theoretical. I have not yet integrated it into a production system. That said, I have plans to do so in the immediate short-term, and then I'm sure I will revisit this article to update this section.

At a high level, the integration strategy I have planned includes these elements:

Store the Access Control List in ApplicationState.
Store the Principal Model in ApplicationState.
Query the Principal Model for role membership when a user is authenticated.
Cache the user's role membership list (including implied/indirect roles) in the user's session.

The idea is to arrive at a solution that supports ACL method calls like this:

Global.AccessControlList.IsGranted( UserSession.Roles, operation, resource );

Improving the Solution

The overall solution is good, but not yet perfect, and there is always room for improvement. Here are a few ideas:

If the Roles in your system are static, then you can create an enumeration type to represent them, and replace the String data type on all references to role variables and parameters.
Create classes to represent users and roles with integer values as unique identifiers.
Define and describe a clean and compact SQL database schema for persistent storage.

If you have other ideas to improve the solution, then please let me know - I'd be interested to hear your feedback.

Points of Interest

Dictionary Redux

The MembershipList class is almost identical to the BaseControlItem class from my previous article. Indexed collections are fast and powerful data structures, and I suspect the Dictionary class might be vastly under-utilized. I don't see it frequently used in solutions I inherit from other developers, and in the past, I have not used it frequently myself. This is one thing that is certain to change in my solutions going forward!

GraphViz

At the beginning of this article, I indicated that users are not leaf nodes in a tree. The role hierarchy is a tree because each role has 0..1 parent roles. A user can be assigned to multiple roles, so the resulting data structure when you combine users and roles is called a directed graph.

GraphViz is an open source program for graph visualization. You can provide it with an input file that looks like this:

digraph principals {
    aliens;    
    animals;    
    mad_tea_party_attendees;    
    harmless_lunatics -> mad_tea_party_attendees;
    humans;
    explorers -> humans;
    dormouse -> animals; 
    dormouse -> mad_tea_party_attendees; 
    march_hare -> animals; 
    march_hare -> harmless_lunatics; 
    mad_hatter -> harmless_lunatics; 
    alice -> mad_tea_party_attendees; 
    alice -> explorers; dora -> explorers;
}

... and it will output a digraph image that looks like this:

In the code attached to the article, you can find a method (PrincipalAdapter::ToDot) that generates a DOT digraph statement to make a graph.

History

December 5, 2015 - First draft
December 7, 2015 - Fixed minor typos

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)