Introduction
It never fails to surprise me what can be gleaned from a single line of code. Gone are the days of BASIC, where each line did one thing: a print statement, a gosub or goto, an if-then-elseif-end if. Nowadays, a single line of code can be a chain of method calls, LINQ expressions, and operators like ?: (ternary if-else), ?. (null conditional, which I'll call null continuation here), and ?? (null coalescing), and even if-then-else implemented as extension methods.
What we'll look at here is what can be gleaned about the implementation from just one line of code. This falls into two categories:
- Red flags that indicate potential bugs with edge cases, faulty input data, and general computation.
- Questions that need to be asked either of the code (look for further details) or of the programmer who wrote it (to understand the thinking).
What one discovers in this process is that a single line of code can address many of the common reasons for doing a code review:
- Security
- SOLID principles (Single Responsibility, Open/Closed, Liskov Substitution, Interface Segregation, Dependency Inversion)
- STUPID principles (Singleton, Tight Coupling, Untestability, Premature Optimization, Indescriptive Naming, Duplication) -- William Durand has a great article on the STUPID principle at http://williamdurand.fr/2013/07/30/from-stupid-to-solid-code/
- Algorithm correctness
- Contractual checks and defensive programming
- Learning something
- Teaching something
- etc.
The code I present here has been changed to protect the guilty.
Code Review, First Example
var foo = sentence?.words?.FirstOrDefault(a => a.Classification == PartOfSpeech.Verb)?.Member as Verb;
First, let's parse this out:
- sentence and words have null continuation operators, indicating that the programmer thinks they might be null.
- words is a collection, as we see FirstOrDefault applied.
- Because we see FirstOrDefault rather than SingleOrDefault, we infer the programmer believes that the qualifier might return zero matches and, more importantly, more than one match, in which case only the first match is returned.
- Since there may be no matches, we see another null continuation operator applied before Member.
- Member is being cast to a concrete type Verb, which implies that Member is at best an interface and at worst an object. Furthermore, there is an implication that when the filter returns a specific classification, we can safely cast Member to Verb.
- Since the as keyword is used, if Member is not of type Verb, foo gets assigned null.
- If sentence or words is null, or the collection contains no words classified as verbs, foo is also assigned null.
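Taken together, these observations amount to a chain of implicit null checks. Here is a minimal sketch of roughly what the line expands to; the Sentence, Word, and PartOfSpeech types are assumptions inferred from the one-liner, not the actual code base:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var sentence = new Sentence();
sentence.words.Add(new Word { Classification = PartOfSpeech.Verb, Member = new Verb() });

// The single line under review:
var one = sentence?.words?.FirstOrDefault(a => a.Classification == PartOfSpeech.Verb)?.Member as Verb;

// Roughly what it expands to:
Verb foo = null;
if (sentence != null && sentence.words != null)
{
    var match = sentence.words.FirstOrDefault(a => a.Classification == PartOfSpeech.Verb);
    if (match != null)
        foo = match.Member as Verb;   // null when Member is not actually a Verb
}

Console.WriteLine(one != null && foo != null);   // prints True

// Hypothetical types inferred from the one-liner.
public enum PartOfSpeech { Noun, Verb }
public class Verb { }
public class Word { public PartOfSpeech Classification; public object Member; }
public class Sentence { public List<Word> words = new List<Word>(); }
```

Seeing the expansion makes it obvious how many distinct "this might be null" decisions are hiding in one line.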
This single line of code reveals an amazing number of problems about the rest of the code. I'll analyze the code by asking questions and identifying problems.
Object Initialization To a Sane Default
The field sentence clearly is a class or structure, as it contains another field called words.
Question 1: Why can sentence be null rather than initialized to at least some instance with sane defaults? Is it valid for sentence to be null?
Question 2: If sentence is null, does this indicate an incorrectly initialized object or a problem with the input data that attempted to parse the sentence into words?
Problem: The entire expression can result in null, which forces the caller to check for null.
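One way out, sketched here under the assumption that we control the Sentence type (Sentence.Empty is a hypothetical addition, not the reviewed code), is a Null Object style default so the chain never encounters null:

```csharp
using System;
using System.Collections.Generic;

// With a guaranteed non-null default, "?." becomes unnecessary.
var sentence = Sentence.Empty;
Console.WriteLine(sentence.words.Count);     // prints 0

public class Word { }
public class Sentence
{
    // A shared, well-defined "no sentence" instance with sane defaults.
    public static readonly Sentence Empty = new Sentence();
    public List<Word> words = new List<Word>();
}
```

A missing sentence then becomes an explicit, inspectable state rather than a null that every caller must remember to guard against.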
Collection Initialization to an Empty Collection
Assuming sentence exists, we next encounter a null continuation for words. We know words is a collection, both from its semantics and because of the following FirstOrDefault call.
Question 1: Why can a collection be null? Collections should typically have two states: empty and not empty.
Question 2: What does a null third state achieve?
Problem: Again, if the collection is null, the result returned will be null.
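LINQ already handles the empty state gracefully, so initializing the collection to empty (rather than leaving it null) removes the need for the ?. entirely. A small stand-alone sketch:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var words = new List<string>();                      // empty, never null

// No "?." needed and no NullReferenceException:
// an empty collection simply yields the default.
var first = words.FirstOrDefault(w => w.StartsWith("run"));
Console.WriteLine(first == null);                    // prints True
```

The "no matches" case still exists, but the "the collection itself doesn't exist" case is gone, which is one less state for every caller to reason about.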
First, Single, or Default
The next operation extracts either the first entry in words or, if the collection is empty, the default, which we know is null because words, given its usage, is a collection of reference types.
Question 1: Why is FirstOrDefault used instead of SingleOrDefault? The usage, as implemented, implies that more than one word classified as a verb can exist in the collection. Is this a valid situation, or would it actually imply an error?
Question 2: Why isn't First or Single used? As written, it implies that the programmer believes the collection might not contain a verb. Is this valid or an error? We should probably assume that it's a valid situation, but it's an assumption that needs to be verified, and it's a good place for a comment explicitly explaining that the collection might not have a verb, and why.
Problem: Again, if the collection is null, the result returned will be null, forcing the caller to check the result for null.
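The difference between these four operators is easy to demonstrate with a small stand-alone example:

```csharp
using System;
using System.Linq;

int[] xs = { 1, 2, 2, 3 };

Console.WriteLine(xs.First(x => x == 2));            // 2: first match; throws if there is none
Console.WriteLine(xs.FirstOrDefault(x => x == 9));   // 0: default(int) when nothing matches

try
{
    xs.SingleOrDefault(x => x == 2);                 // more than one match is an error here
}
catch (InvalidOperationException)
{
    Console.WriteLine("SingleOrDefault throws on multiple matches");
}
```

Choosing Single or SingleOrDefault documents a uniqueness invariant and enforces it at runtime; choosing First or FirstOrDefault silently tolerates duplicates, which is exactly the assumption the reviewer wants verified.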
Interfaces and Unnecessary Casting
Lastly, given that the classification of the word is a verb, we have an explicit cast to Verb.
Question 1: Why this cast? Given the filter, it's probably for the convenience of the programmer, but it paints the programmer into a corner.
Problem 1: This implies that a "verb" part of speech can only ever be cast to Verb. In my "protect the guilty" example, verbs could be specialized into action, transitive, or intransitive types. Similarly, in the actual code, I would imagine that the returned type might be specialized at some point, which would break this code, or at best cast the returned object to the base class Verb, requiring the caller to further down-cast it (and working around the hard cast could get messy).
Problem 2: The cast implies that the collection words is probably a collection of object. If we inspect the code and discover that it's an interface, why isn't the interface returned?
Problem 3: If we're looking for verbs, why not do something like FirstOrDefault(a => a.Member is Verb)? This ensures we get back a Verb or null. As the code was written with as, foo silently becomes null whenever the classification and the actual type disagree, hiding the mismatch from the caller.
Question 2: Why even use a classification system rather than determining the classification from the object type? As it's written, this implies over-complexity and is very prone to mismatches between classification and type. If necessary, metadata could be used, or a type-to-classification dictionary (or vice versa.)
Question 3: Why isn't an interface used? What operations are performed on the concrete type that are specific to the classification? Can these operations be generalized?
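If the type system carried the classification, the filter, the cast, and the enum could collapse into one type-safe query. A sketch, assuming (as the original code hints) that Member holds the typed word object:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var members = new List<object> { new Noun(), new Verb() };

// OfType<T> filters and casts in one step; there is no
// classification enum to keep in sync with the actual types.
Verb verb = members.OfType<Verb>().FirstOrDefault();
Console.WriteLine(verb != null);   // prints True

// Hypothetical word types.
public class Noun { }
public class Verb { }
```

With this approach a classification/type mismatch is impossible by construction, which answers Question 2 above: the enum adds a second source of truth that can drift from the first.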
Over Specification
The entire line is over-specified. It is not a general solution to acquiring parts of speech in a sentence. If we look at other places in the code, we would probably find the same line used again and again for different parts of speech. Instead, given that words is a collection, it could be extended through derivation or by extension methods to something like this (if we want to stay close to the initial implementation):
var foo = sentence?.words?.FirstOrDefault(a => a.Member is Verb);
Suggested Improvements
However, given my previous comments, this is probably better:
var foo = sentence?.words?.Get<Verb>();
Here the implementation returns the word of the specified generic parameter type. If we clean up the null continuation tests (knowing that sentence should also always be a valid object) and use an interface, we could simplify this call to:
IWord word = sentence.GetWord<Verb>();
There are other variations, such as GetWords when we want multiple words of the same type, or even GetSingle or GetMany as potential candidates for method names and their behaviors.
Furthermore, rather than using FirstOrDefault or SingleOrDefault, the existence of a classification (or type) could be determined first with Exists<T>. This has the "advantage" of resulting in an exception if existence isn't checked first, but it can be improved even further with this usage example:
if (sentence.TryGetWord<Verb>(out IWord word))...
Implementing all these methods gives the calling programmer a lot of flexibility, improves readability, and provides a general-purpose method that doesn't rely on an implicit classification mapping.
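A minimal sketch of what these extension methods might look like; IWord, Sentence, and the method bodies are assumptions following the suggestions above, not the reviewed code:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var sentence = new Sentence();
sentence.words.Add(new Verb());

if (sentence.TryGetWord(out Verb verb))
    Console.WriteLine("found a verb");        // prints "found a verb"

// Hypothetical types.
public interface IWord { }
public class Verb : IWord { }
public class Sentence { public List<IWord> words = new List<IWord>(); }

public static class SentenceExtensions
{
    // Returns the first word of the requested type, or null.
    public static T Get<T>(this Sentence sentence) where T : class, IWord
        => sentence.words.OfType<T>().FirstOrDefault();

    // Try-pattern: the bool return makes the "might not exist" case explicit.
    public static bool TryGetWord<T>(this Sentence sentence, out T word) where T : class, IWord
    {
        word = sentence.Get<T>();
        return word != null;
    }
}
```

One deliberate deviation from the usage example above: typing the out parameter as T rather than IWord saves the caller a down-cast at the call site.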
Code Review, Second Example
public string Key => $"{Word?.Classification}-{Position}";
This is a fun one:
- The code uses expression-bodied syntax (=>) for a read-only property.
- It then uses string interpolation to return a string.
- Now we get into the same problems that we encountered in the first example: a null continuation operator.
- We can also wonder why a "key" is needed, and why it's composed of the word's classification and, we assume, its position in the sentence. More on this later.
- We should also wonder what happens when Word is null, and more importantly, why it would be null.
What Would This Look Like in C# 2.0?
Sometimes it's useful to look at code and ask, "What would this look like without all the fancy new stuff in later versions of C#?" This code could have been written like this:
public string Key
{
    get
    {
        if (Word != null)
            return Word.Classification + "-" + Position;
        else
            return "-" + Position;
    }
}
This is equivalent code. Yes, a null is interpolated as an empty string.
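That equivalence is easy to verify; string interpolation substitutes the empty string for null:

```csharp
using System;

string classification = null;   // stands in for a null Word?.Classification
int position = 0;
Console.WriteLine($"{classification}-{position}");   // prints "-0"
```

This is exactly the malformed key discussed below.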
Null Continuation
Problem 1: The use of null continuations should be at least a yellow flag, and in this case a big red flag.
Question 1: Why is Word potentially null? What are the valid cases for when Word is null?
Question 2: Is there a sane initialization for Word? Perhaps there should be a classification for "no word?"
Question 3: If Word is null, why would there be an associated Position value?
Question 4: Is this unnecessary defensive coding?
Question 5: What is the value of Position when Word is null? Is it 0?
Question 6: Why isn't a null Word checked earlier in the code with some meaningful error?
Problem 2: If Word is null, then the return is "-[position]", and "position" will probably be 0.
Question 7: How does the caller handle a null Word? Why even force the caller to handle this case?
Problem 3: This forces us to inspect all the caller cases to see if the caller handles the improperly formed key correctly. Even worse, we might not be able to inspect all the code that uses this property, and any future code that uses this property may not realize that it can get an improperly formed key. This last issue can result in the introduction of unforeseen bugs!
Question 8: Why is Position implemented as a separate property from Word? While at first glance one might assume that this is a separation of concerns, it also indicates that a wrapper class:
public class WordPosition
{
    protected Word word;
    protected int position;

    public string Key => $"{word.Classification}-{position}";
}
might be a better implementation.
About This "Key" Property Name
Question 1: Why is a "key" being constructed from the classification and position?
Question 2: How is this "key" being used later on?
Question 3: Is the return from Key used in the UI, or is it strictly an internal implementation? This is an important question to ask because the key is formatted as an essentially human-readable (albeit slightly encoded) string, which suggests it might be used for some field in the UI or as a human-readable diagnostic when debugging or logging. Further inspection of the code base is required to determine this.
Question 4: Because the key retains the information of the constituent parts that created it (as opposed to a hash function), is there code later on that parses the key back out into its constituent parts? We might do a global search for Split('-') to see if this is the case. If the key is being used in this way, we need to ask why, and whether there's a better approach that is independent of the key's formatting. Even worse, because the key retains the information of its constituent parts, it's ripe for abuse by some future code.
An alternative implementation might be worth considering, for example a struct, which is suitable as a value type in a dictionary (we would have to look at how this whole "key" concept is being used):
public struct Key
{
    public int Classification;
    public int Position;

    public override string ToString()
    {
        return Classification + "-" + Position;
    }
}
This approach avoids the whole encoding issue but still provides a ToString override for "pretty printing" and preserves the constituent elements, so that parsing them back out is not necessary.
Question 5: Is Key encoded the way it is because it may be serialized or persisted? We will probably answer "no, it's not serialized," particularly since it's a read-only getter, but the key might be stored in a database where some other program needs to parse it back out. I see this a lot in interactions between two different systems, particularly when the data has to be serialized to a string format because encoding/decoding object representations is not doable (endian-ness and conversion issues come to mind). So there might be a legitimate reason for a string-ified key, but as demonstrated above, there are probably still better ways to separate out this requirement.
What about Reflection?
It's also important to ask, "Does this code behave in the expected way if it's used in unexpected ways?" In particular, does an expression-bodied read-only property work the same with reflection? A simple test verifies that yes indeed, it does (we would certainly hope so, but it's always important to test our assumptions):
using System;
using System.Reflection;

public class Foo
{
    public string Bar => "bar";
}

class Program
{
    static void Main(string[] args)
    {
        var foo = new Foo();
        Console.WriteLine(foo.Bar);

        var pi = foo.GetType().GetProperty("Bar", BindingFlags.Public | BindingFlags.Instance);
        var pival = pi.GetValue(foo);
        Console.WriteLine(pival);
    }
}
Yes, "bar" is printed twice.
Conclusion
While there's a lot of good information out there on conducting code reviews, the point of this short article is to illustrate concretely how a single line of code can reveal:
- Flaws in error checking.
- Assumptions the programmer is making about the data.
- Overly defensive / misplaced defensive programming.
- Poor design.
- Lack of consideration to future use.
- Where the burden of validation checking occurs: by the caller, or by the code being called?
Look for the Low Hanging Fruit
A single line of code often raises many questions that deserve deeper investigation and will probably reveal further architecture and implementation issues. Doing that deep dive will teach you about large areas of the code base quickly. Code reviews that focus on algorithm correctness, style, comment usage, syntactical efficiency, and performance (to name a few) are often inefficient, time-wasting exercises. A good code review looks for potential bugs in current and future usage. To identify the low-hanging fruit, look first not at entire areas of code, but rather at single lines of code that are making assumptions (remember what ASS-U-ME means). In these examples, I've identified single lines of code that reveal assumptions about:
- Null objects.
- Collections and uniqueness of items in the collection.
- Validity of data.
- Who is supposed to handle errors.
These assumptions drive further questions about the code. We identified these issues by looking at only a few things:
- How null continuation operators are used.
- If null coalescing operators had been used, this would be a "look at this code" trigger as well.
- String-ification of data.
- LINQ: FirstOrDefault can be a real big flag.
In some ways, the programmer made identifying the low-hanging fruit here rather easy through the use of newer C# idiomatic syntax and LINQ expressions, both of which reduce the number of lines of code. If you're looking at older code, you have to go back to the practice of examining if-else statements and loops to identify assumptions, which can be more time consuming.
Getting Into the Mindset of the Programmer
Reviewing actual code requires getting into the mindset of the programmer. It's great when the programmer is available for questions, and this can result in a learning experience for everyone (the programmer may have very good answers for the implementation decisions), but the process can also be painful -- I will not pussyfoot around the issue of criticism and the fact that one's ego can be bruised, no matter how positively the code review is couched. Even if my ego is bruised, the salient point is to pick myself up from the floor and demonstrate that I learned how to code better.
If the programmer isn't available, a deep dive to answer the questions yourself is necessary, which (again, no pussyfooting) is often a frustrating experience with much gnashing of teeth and "this pile of shyte needs to be rewritten from scratch" under-the-breath (or not) commentaries. In such situations, it's often useful to take a step back and seriously consider that question rather than continue to curse the unknown programmer. It may also be possible to make significant improvements to the code with little effort (but probably not!)
Now to go back and look at my own code!