LINQ is fantastic for the ability to write queries that express intent much more clearly than the same SQL, or structured code. One problem that I have run into though is handling NULL
database values that are part of a group by
statement.
Grouping by ProductSKU
Grouping in LINQ allows you to return sets of data from a collection for a given key value. The group by
clause is what the key ends up being in the result set. Let's take a grouping of the Products
by the SKU.
from p in Products
group p by p.ProductSKU
Enumerable IGrouping Collection
This results from the group by
are enumerable groups (IGrouping<String, Product>
) with the String
being the Key
for the groups (the ProductSKU
field from the table). The typical way you walk through this result is a nested for
loop.
var groups = from p in Products
group p by p.ProductSKU;
foreach( var groupentry in groups )
{
Console.WriteLine( "Group: {0}", groupentry.Key );
foreach( var groupitem in groupentry )
{
Console.WriteLine("Product: {0}", groupitem.ProductSKU);
}
}
I end up with a list that looks something like this:
Group: VDB4DBA
Product: VDB4DBA
Group: VDB4DMW
Product: VDB4DMW
Group: VDB4PARTNER
Product: VDB4PARTNER
This works, but not what I really wanted. In this case, the first four characters of the SKU are the same per product family (VDB4 for all VistaDB 4 SKUs). I would like to be able to group by only those first four characters instead of the complete ProductSKU
. You can do this with the following code:
from p in Products
group p by p.ProductSKU.Substring(0, 4)
What If There Are NULL Entries?
But what happens if there is a NULL
entry in the ProductSKU
? You get a ConstraintException
: The property cannot be set to a null
value.
Ternary and Null Coalescing Operators to the Rescue
There are two operators you can use to modify the null
values into something you can use. In SQL, you would use the COALESCE
or ISNULL
operations, these are pretty close matches.
The ternary operator is a shortcut for:
if( condition ) then (true code) : (false code)
The null
coalescing operator is used to define a default value if the variable is null
.
variable = ( condition ) ?? ( defaultvalue)
The code to use both of these follows:
var groups = from p in Products
group p by p.ProductSKU == null ? "<null>" : p.ProductSKU.Substring(0, 4);
var groups2 = from p in Products
group p by p.ProductSKU.Substring(0, 4) ?? "<null>";
In this case, the ternary operator is the only one that will work. This is because the test is independent of the operation. The second example above will crash with the same constraint exception because the ProductSKU.Substring
is attempted to be evaluated first, and substring on a null
doesn’t work!
The null
coalescing operator would work if we only wanted to test if the ProductSKU
was null
, but in this case the ternary is the only way to get the desired result.
Final Result
So the final result after the ternary operator looks like this:
Group: VDB3
Product: VDB3SRC
Group: VDB4
Product: VDB4DMW
Product: VDB4PROB
Product: VDB4CORE
Product: VDB4ASPPAK
Product: VDB4DBA
Now I have cleaner groups like I wanted without having to write string
parsing after the query.
Summary
LINQ has a very expressive syntax that allows you to do some amazing queries without resorting to SQL.
Group by
can also be used on composite keys (more than one column) by projecting into an anonymous type. Maybe I will leave that for another post.