Introduction
This is the first in a series of follow-ups to a previous article titled "Lightning-Fast Access Control Lists in C#". It described a solution for building and querying an access control list as part of an application security model. The solution was fast and lightweight, and it had no dependencies upon a specific database schema.
Many readers requested a follow-up article to propose a database schema suitable for supporting that solution. Before I propose a schema, there is some additional groundwork that needs be done. This will help to guarantee the schema does not impede a high-performance solution.
As most of you probably know, application security is a difficult problem, and solutions (both good and bad) often bear a striking resemblance to black magic. Software security models have the potential for enormously complex requirements, with priorities that often conflict with one another. For example, flexibility, robustness, fault-tolerance, performance, ease of development, ease of administration, ease of use... any (and all) of these can be important aspects.
A good security model has many different moving parts, and a solid access control list is just one of the primary parts. The purpose of this article is to describe a solution for another primary part that is closely related: managing a hierarchy of security principals.
Paired with an Access Control List (ACL), a user/role hierarchy (or Principal Model) needs to satisfy four basic functional requirements:
- A permission is assigned to a principal.
- A principal is an individual user or a group of users.
- A user may be assigned to multiple groups.
- Groups may be nested into a hierarchy with any number of levels.
(This article won't cover nested Operations or nested Resources; that topic will be discussed in a future article.)
In keeping with the theme of the previous article, here, we'll aim for an elegant solution that is also blisteringly fast.
Background
The code makes some foundational assumptions, which should be described before we get into the details.
Most important, it is assumed that every permission is an object with three properties (Principal
, Operation
, Resource
) so it can be read as a declarative statement having this form:
[Principal] is granted permission to [Operation] on [Resource]
You can find definitions and examples for these three properties in the previous article.
The code in this article is focused exclusively on a model to manage and query users assigned to nested groups. The object we're concerned with here is a security Principal Model.
A Principal
is understood to represent the identity of a specific user or group of users. Here, we will use the term User
to identify a specific individual person, and we will use the term Role
to identify a specific group of users. Both User
s and Role
s are Principal
s, so a skeletal class model might look like this:
class Principal { };
class User : Principal { };
class Role : Principal { };
Fair warning: I won't be doing that here. :-)
Instead, the focus here is on the overall data structure and algorithm design. At this stage, I am not concerned with Principal
, User
, or Role
class definitions. Those can come later. Instead, I want to keep the discussion (and the initial code) as elementary as possible. The problem we are trying to solve is difficult enough on its own, and at this early stage, we don't want to get lost in secondary details. Therefore, the code here assumes the following deliberate simplifications:
- A
user
is uniquely identified by a string
value (i.e., email address). - A
role
is uniquely identified by a string
value (i.e. name). No two role
s share the same name, regardless of their positions in the hierarchy. - A
role
may have no parent role, or it may have one parent role. It cannot have multiple parents. - A
role
may have no users, or it may have multiple users. - A
user
may be assigned to any number of role
s. (User
s are not leaf-nodes in a tree. This is an important distinction.)
The logical model looks like this:
Path notation is a useful shorthand to identify nested roles, in the same way that path notation is a useful way to reference files in a file system or pages on a web site.
For Example...
The path "Springfield Residents/Simpson Family Members/Parents" implies the existence of three roles: Springfield Residents, Simpson Family Members, and Parents. It also defines relationships: Simpson Family Members are a subset of Springfield Residents, and Parents are a subset of Simpson Family Residents.
In this example, it is important to note the assumptions I have described disallow the existence of a role "Springfield Residents/Flanders Family Members/Parents", because the role
named "Parents
" cannot appear in multiple locations within the hierarchy. Role
s names must be unique, and role
s can have only one parent.
This constraint exists solely for the sake of simplicity, In a production system, you will most likely use something other than the name of a role
to uniquely identify it, and in that case, you might choose to permit multiple role
s with the same name (in different locations) differentiated by some other identifier. To illustrate:
What are Nested Roles?
Nested roles define subset (or containment) relationships in this model. If A contains B and B contains C, then A indirectly contains C. This can be visualized using a Venn diagram or a directed graph:
One More Constraint
There is one final constraint, again for the sake of simplicity: Role names should not include commas, slashes, colons, hyphens or bars. It's up to you if you want to enforce this. I like to troubleshoot role hierarchies by dumping them to plain text. I also like to test and debug the code with plain text input. These tasks are easier with some basic role-naming conventions to facilitate plain text.
For example, the string
value "curious.george@yellowhathouse.com:Animals/Monkeys,Adventurers
" can be used to declare the existence of a user
named Curious George
, and three role
s named Animals
, Monkeys
, and Adventurers
, where Monkey
s are a subset of Animals
, and the role of Adventurer
is not related to Animals
or Monkeys
.
As another example, troubleshooting both code and data is easier when you can display a group hierarchy in plain text like this:
|-Root
| |-Alpha
| |-Beta
| \-Gamma
| |-Delta
| | \-Epsilon
| |-Zeta
| \-Eta
\-Theta
\-Iota
|-Kappa
\-Lambda
That should be enough preamble... let's jump into the code...
Into the Code
What's the Goal?
Performance is still the highest priority with this solution. A close second is the requirement to support minimalist calls from client code. The API should be clean and compact. For example, when a principal model and an access control list are paired with one another, a permission check should look something like this:
if (principals.IsUserInRole("Alice","Explorers"))
{
}
Here is another example, in which we want to obtain a collection of all the roles to which Alice
is assigned:
var roles = principals.GetUserRoles("Alice");
This should include Alice
herself (obviously), and the roles to which she is assigned directly, and the roles to which she is assigned indirectly - which could then be used to query an ACL:
if ( acl.IsGranted( principals.GetUserRoles("Alice"), "Drink", "Mysterious Potion" ) )
{
}
Creating and loading a principal
model has to be simple as well, so less-experienced developers and administrators can re-use it. Ideally, the calls should look something like this:
principals.AddRole( "Harmless Lunatics", "Mad Tea Party Attendees" );
To maintain a loose coupling with the rest of the application (front-end and back-end), the code needs to include methods for loading a principal
model from a DataTable
or a CSV:
model.LoadRoles( "C:\Files\Roles.csv" )
model.LoadUsers( "C:\Files\Users.csv" )
In the end, I settled on two interfaces: one that is responsible for storing the data model, and one that is responsible for navigating and interrogating model. Towards the end of the article, I will explain the reason for this separation.
public interface IPrincipalModel
{
GroupTree Groups { get; }
MembershipList Roles { get; }
MembershipList Users { get; }
void AddGroup(string name);
void AddGroup(string child, string parent);
void AssignUserToRole(string user, string role);
}
public interface IPrincipalFinder
{
MembershipResultSet FindAllRoles(string user, MembershipList users, IGroupTree groups);
}
Building the Model
GroupTree
First, the principal
model needs to store the group hierarchy. This is the full set of role
s, outside the context of any particular user
's membership. This is our "master list" of groups and the parent/child relationships connecting them. For the sake of example, suppose we have the following roles defined:
Animals
Humans
/Explorers
Harmless Lunatics
/Mad Tea Party Attendees
Here is another plaint-text representation of the same hierarchy:
# Animals
# Humans
## Explorers
# Harmless Lunatics
## Mad Tea Party Attendees
We can visualize this in a diagram and add a few user
s:
Notice the entire set of groups is represented in the section titled "Roles
", including those roles to which no users are assigned. Relationships between roles
are also captured here. The diagram illustrates a number of facts about our principals
:
Alice
and Dora
are both Explorers
(directly) and Humans
(indirectly). - Neither
Alice
nor Dora
is a Harmless Lunatic
. Dora
is not a Mad Tea Party Attendee
or an Animal
. Mad Hatter
and March Hare
are both Harmless Lunatics
(directly) and Mad Tea Party Attendees
(indirectly). - All
Harmless Lunatics
are Mad Tea Party Attendees
. - Not all
Mad Tea Party Attendees
are Harmless Lunatics
.
Trees are the obvious choice for a data structure to represent a hierarchy of groups, and I have borrowed heavily from an excellent implementation in another article. (For reference, please see "A Generic Tree Collection".) My GroupTree
class looks like this:
public interface IGroupNode
{
INode<string> this[string group] { get; }
void Add(string group);
bool Contains(string group);
}
public class GroupTree : IGroupNode
{
private readonly ITree<string> _tree;
public INode<string> Root { get { return _tree.Root; } }
public INode<string> this[string group] { get { return _tree[group]; } }
public GroupTree()
{
var comparer = new StringComparer();
_tree = NodeTree<string>.NewTree(comparer);
}
public void Add(string group)
{
if (_tree.Root.Contains(group))
throw new Exception("Duplicate Not Allowed: " + group);
_tree.AddChild(group);
}
public bool Contains(string group)
{
return _tree.Contains(group);
}
public void Add(string child, string parent)
{
string error;
if (_tree.DataComparer.Equals(child, parent))
{
error = string.Format(
"Self-Reference Not Allowed: {0} cannot be its own parent", child);
throw new Exception(error);
}
if (Contains(child))
{
var childNode = _tree[child];
if (!Contains(parent))
{
error = string.Format("Duplicate Not Allowed: {0} is already in the tree", child);
throw new Exception(error);
}
var parentNode = _tree[parent];
if (parentNode.AllChildren.Nodes.Contains(childNode))
return;
error = string.Format(
"Multiple Parents Not Allowed: {0} is already assigned to {1}",
child,
parent);
throw new Exception(error);
}
if (!Contains(parent))
Add(parent);
_tree[parent].AddChild(child);
}
MembershipList
Next, the security principal
model needs a structure in which to store the list of roles to which a user
is directly assigned. We need the ability to invoke a method like this:
list.AssignUserToRole("Marge","Painters")
list.AssignUserToRole("Lisa","Musicians")
list.AssignUserToRole("Bleeding Gums Murphy", "Musicians")
Essentially, these are declarative statements about membership. For example, the user
named Marge
is in the role named Painters
. At the same time, the Painters
role has Marge
as a member. The Dictionary class is a perfect fit here. We can store a collection of roles for each user, and we can store a collection of users for each role. One class can satisfy both requirements. Here is a simplified version of the MembershipList
class, which can answer basic questions about user/role membership:
public class MembershipList
{
private readonly Dictionary<string, commadelimitedstringcollection> _members;
public MembershipList()
{
_members = new Dictionary<string, commadelimitedstringcollection>();
}
public void Add(string member, string set)
{
member = StringHelper.Sanitize(member);
if (member == null)
throw new ArgumentNullException();
var value = GetMembers(set);
if (!value.Contains(member))
value.Add(member);
}
public bool Contains(string set)
{
var key = StringHelper.Sanitize(set);
return _members.ContainsKey(key);
}
public bool Exists(string[] members, string set)
{
var key = StringHelper.Sanitize(set);
if (key == null || members == null)
return false;
var value = !_members.ContainsKey(key) ? null : _members[key];
if (value != null)
{
foreach (var member in members)
{
var m = StringHelper.Sanitize(member);
if (m != null && value.Contains(m))
return true;
}
}
return false;
}
public CommaDelimitedStringCollection FindExistingMembers(string set, string[] members)
{
var includedPrincipals = new CommaDelimitedStringCollection();
var key = StringHelper.Sanitize(set);
if (key == null || !_members.ContainsKey(key))
return includedPrincipals;
var value = _members[key];
foreach (var member in members)
{
var m = StringHelper.Sanitize(member);
if (value.Contains(m))
includedPrincipals.Add(m);
}
return includedPrincipals;
}
public CommaDelimitedStringCollection GetMembers(string set)
{
var key = StringHelper.Sanitize(set);
if (key == null)
throw new ArgumentNullException();
CommaDelimitedStringCollection value;
if (!_members.ContainsKey(key))
{
value = new CommaDelimitedStringCollection();
_members.Add(key, value);
}
else
{
value = _members[key];
}
return value;
}
public void Remove(string member, string set)
{
set = StringHelper.Sanitize(set);
if (set == null)
throw new ArgumentNullException();
var value = GetMembers(member);
if (value.Contains(set))
value.Remove(set);
}
}
PrincipalModel
Putting it all together, the code for the principal
model itself is clean and compact. Here is a draft of the interface and the class:
public interface IPrincipalModel
{
GroupTree Groups { get; }
MembershipList Roles { get; }
MembershipList Users { get; }
void AddGroup(string name);
void AddGroup(string child, string parent);
void AssignUserToRole(string user, string role);
}
public class PrincipalModel : IPrincipalModel
{
private readonly GroupTree _groups;
private readonly MembershipList _roles;
private readonly MembershipList _users;
public PrincipalModel()
{
_roles = new MembershipList();
_users = new MembershipList();
_groups = new GroupTree();
}
public GroupTree Groups { get { return _groups; } }
public MembershipList Roles { get { return _roles; } }
public MembershipList Users { get { return _users; } }
public void AssignUserToRole(string user, string role)
{
_roles.Add(user, role);
_users.Add(role, user);
}
public void AddGroup(string group)
{
if (!_groups.Contains(group))
_groups.Add(group);
}
public void AddGroup(string child, string parent)
{
_groups.Add(child,parent);
}
}
Querying the Model
In a typical application, the group hierarchy is relatively static
and membership lists change frequently. This means you might want the ability to query the model with membership lists that exist outside the model.
For example, in a web application with 100,000 users, you might not want to store an instance of the entire principal
model in memory. Instead, you might want to store the group hierarchy (only) in shared application state, and store the membership list for an individual user in the user's session state. In this scenario, you'll want queries on your access control list to look something like this:
string[] principals = PrincipalFinder
.FindAllRoles( CurrentUser.Name, CurrentUser.Roles, Global.PrincipalModel.GroupTree );
if ( Global.AccessControlList.IsGranted( principals, operation, resource ) )
{
}
Thinking back to our earlier example, what we want is the ability to invoke a method like this: FindAllRoles( "alice", "explorers, mad tea party attendees", AllGroups )
, and have it return a value that looks like this: "alice, explorers, humans, mad tea party attendees
".
This is where the PrincipalFinder
class comes in, and this is the class in which the real black magic occurs. It contains only a small number of lines of code:
public class PrincipalFinder : IPrincipalFinder
{
public MembershipResultSet FindAllRoles(string user, MembershipList users, GroupTree groups)
{
var result = new MembershipResultSet { { PrincipalSubtype.User, user, true, null } };
if (users.Contains(user))
{
var key = StringHelper.Sanitize(user);
var roles = users.GetMembers(key);
foreach (var direct in roles)
{
result.Add(PrincipalSubtype.Role, direct, true, null);
if (groups.Contains(direct))
{
var node = groups[direct];
foreach (var child in node.AllChildren.Nodes)
{
var indirect = new MembershipResult
{
Subtype = PrincipalSubtype.Role,
Name = child.Data,
IsDirect = false,
Path = TreeHelper.CalculatePath(child)
};
result.Add(indirect);
}
}
}
}
return result;
}
}
Performance Metrics
How fast is it, really? This was the first question asked by the first readers of my previous article, so I'll answer it right away.
The code can be fast. Very, very fast. But only if it is integrated correctly.
I created several integration test scenarios with a massive principal
model. The model contained 100,000 distinct users assigned to 3,000 different roles, with 5 million relationships defined between users and roles (counting only direct role memberships). Then I ran a battery of 5,000 iterations on the model, invoking PrincipalFinder::FindAllRoles
.
Scenario #1
If we assume an integration strategy in which the principal model is globally cached, and we never cache the results of a tree-traversal for a specific user's role memberships, then performance is poor. Here are the results:
- Number of roles = 3,060
- Number of users = 100,000
- Number of direct user/role membership tuples = 5,000,000 (not counting implied/indirect memberships)
- Number of test iterations = 5,000
- Average response time per iteration = 173.77 ms
- Best response time = 53.20 ms
- Worst response time = 326.61 ms
- Standard deviation = 43.38 ms
- Theoretical throughput = 5.75 method calls per second
Scenario #2
We know that string
comparison and string
manipulation is expensive, so what happens if we rewrite the code to use integers (rather than string
s) to uniquely identify users and roles? Performance is improved significantly, but it's still not fast enough to peel back anyone's hair:
- Average response time per iteration = 3.22 ms
- Best response time = 1.92 ms
- Worst response time = 7.91 ms
- Standard deviation = 0.53 ms
- Theoretical throughput = 309 method calls per second
Scenario #3
What if we cache the result after we query the group tree for a user's role membership? We can traverse the group tree when a user is authenticated to determine the user's implied roles, and then cache the result in the user's session, so we perform only one tree-traversal per user session. Adopting this approach, the results are impressive even when we assume string
identifiers and worst-case performance for every initial invocation:
- Average response time per iteration = 0.0653 ms
- Best response time = ~0 ms (depending upon cache speed)
- Worst response time = 326.6146 ms
- Standard deviation = 4.6186 ms
- Theoretical throughput = 15,308 method calls per second
If we assume average response-time for the initial call per user, then throughput jumps to 28,773 calls per second.
And if we replace string
s with integers to uniquely identify users and roles, then throughput jumps to more than 1.5 million calls per second. Now we're cooking with gas...
Conclusion
When the model is small, performance is excellent across the board, and the design is quite forgiving if you make a poor choice on integration. For example, when I repeated the first scenario using a model with 1000 users, 30 roles, and 5000 memberships, average response time jumped 192X from the baseline (from 173.77 ms to 0.90 ms per call).
However, when the model is large, loading and traversing the tree is slow. With a careful and considered integration strategy, the code in this article can provide a solution that is blazingly fast, with the added bonus that it does not require high-volume chatter between the application and its database back-end.
Integrating the Solution
Admittedly, this solution is largely theoretical. I have not yet integrated it into a production system. That said, I have plans to do so in the immediate short-term, and then I'm sure I will revisit this article to update this section.
At a high level, the integration strategy I have planned includes these elements:
- Store the Access Control List in
ApplicationState
. - Store the Principal Model in
ApplicationState
. - Query the Principal Model for role membership when a user is authenticated.
- Cache the user's role membership list (including implied/indirect roles) in the user's session.
The idea is to arrive at a solution that supports ACL method calls like this:
Global.AccessControlList.IsGranted( UserSession.Roles, operation, resource );
Improving the Solution
The overall solution is good, but not yet perfect, and there is always room for improvement. Here are a few ideas:
- If the
Role
s in your system are static
, then you can create an enumeration type to represent them, and replace the String
data type on all references to role
variables and parameters. - Create classes to represent
user
s and role
s with integer values as unique identifiers. - Define and describe a clean and compact SQL database schema for persistent storage.
If you have other ideas to improve the solution, then please let me know - I'd be interested to hear your feedback.
Points of Interest
Dictionary Redux
The MembershipList
class is almost identical to the BaseControlItem
class from my previous article. Indexed collections are fast and powerful data structures, and I suspect the Dictionary class might be vastly under-utilized. I don't see it frequently used in solutions I inherit from other developers, and in the past, I have not used it frequently myself. This is one thing that is certain to change in my solutions going forward!
GraphViz
At the beginning of this article, I indicated that user
s are not leaf nodes in a tree. The role
hierarchy is a tree because each role
has 0..1 parent roles. A user
can be assigned to multiple role
s, so the resulting data structure when you combine user
s and role
s is called a directed graph.
GraphViz is an open source program for graph visualization. You can provide it with an input file that looks like this:
digraph principals {
aliens;
animals;
mad_tea_party_attendees;
harmless_lunatics -> mad_tea_party_attendees;
humans;
explorers -> humans;
dormouse -> animals;
dormouse -> mad_tea_party_attendees;
march_hare -> animals;
march_hare -> harmless_lunatics;
mad_hatter -> harmless_lunatics;
alice -> mad_tea_party_attendees;
alice -> explorers; dora -> explorers;
}
... and it will output a digraph image that looks like this:
In the code attached to the article, you can find a method (PrincipalAdapter::ToDot
) that generates a DOT digraph statement to make a graph.
History
- December 5, 2015 - First draft
- December 7, 2015 - Fixed minor typos