
Is it Really Better to 'Return an Empty List Instead of null'? / Part 4

30 Oct 2014
This article series aims to answer the question: Should we return an empty list or 'null' from functions?

Table of Contents

Part IV: Source Code Examples

Introduction

Welcome to the last installment of this article series.

In part 1, we had a look at the very popular advice to return empty collections, and not null from functions (in case of "no data" to return).

In part 2, we saw that this recommendation must be reconsidered. We looked at examples showing that the opposite is true: It is better to return null instead of an empty collection.

In part 3, we looked at how lists are used in the physical world and our conclusion from part 2 was confirmed.

In this installment, we'll have a thorough look at source code examples that cover very common programming tasks, such as validating input and handling resource errors.

We will start with a trivial example and add some complexity in each subsequent step.

Each example will be shown in two different programming languages and we will compare different approaches.

Most importantly, we will try to find answers to the following questions:

  • What happens if we write 'bad code' - in both languages and with different approaches?

    [Note] Note
    By 'bad code', I mean things like not checking for null or emptiness, not validating input arguments in a function, ignoring resource errors that might appear at runtime, etc.
  • Can the design of a programming language and its standard libraries help us to write more reliable and more maintainable code in less time?

So, this article is not just about "How to ...", it is also about "Why to ...".

Same Same, But Not Same

Thai people are jocular. When they see two similar objects, you won't always hear them saying "These objects are similar". No, they say: "Same same, but not same".

The same could be said about the two programming languages we are going to use in this article: Java and PPL.

If you have never heard of PPL, you are not alone. While Java is one of the most successful languages in the history of programming, supported by a huge developer community and mature enough for big mission-critical enterprise applications, exactly the opposite is true for PPL.

After many years of writing bad code (such as monstrous functions containing lots of variables and deeply nested instructions, ignoring errors returned by functions, forgetting to check for null, etc.) and suffering from its consequences, I wondered whether there was a programming language that could help me write better code. I couldn't find what I was really looking for, but I discovered many good ideas and was sometimes speechless to see that some very effective and proven Fail-Fast! features (for example Design by Contract) were not supported in the majority of popular programming languages. Finally, I started thinking about a new language and began to create it. I named it Practical Programming Language (PPL) to emphasize that it should be useful and suitable for developing real-world applications. The vision and ultimate goal is to provide a programming environment that helps to write more reliable code in less time. I had many ideas to achieve this goal. Some were good. Some were bad. I kept the features that worked well and discarded those that didn't.

Today, I am absolutely convinced that the design of a programming language is a determining factor in how easy or difficult it is for the average programmer to write high quality code in a reasonable time. This doesn't mean, of course, that a programming language can completely prevent programmers from designing bad data structures or writing buggy and unmaintainable code. The point is this:

Some bugs and some bad programming practices can effectively be prevented through the design choices and features provided by the language.

For example, the very frequent bug of forgetting to check for null (which leads to a null pointer error) can be eliminated completely with compile-time null-safety natively built into the language. The source code examples in this article (especially the last example) should demonstrate what I mean by 'helping to write more reliable code in less time'.

[Note] Note

PPL is still a work in progress and not yet ready for developing mission-critical business applications.

For more information please refer to the website and read the FAQ.

Many fundamental concepts are very similar in Java and PPL. For example, both languages are object-oriented, compiled and statically typed, and applications run on a Java Virtual Machine (JVM). However, a number of diametrically opposed design principles are applied in the two languages. The following table is by no means exhaustive. It shows some differences (and one similarity), limited to those that are relevant in this article and that we will encounter in the source code examples.

Table 1. Comparison of some design choices in Java and PPL

Collections
  Java:
    - Collections are mutable by default
    - All collections can be empty
  PPL:
    - Collections are immutable by default
    - Only mutable collections can be empty

Strings
  Java:
    - Strings are immutable by default
    - Any string can be empty
  PPL:
    - Strings are immutable by default
    - Only mutable strings can be empty

Functions that return collections
  Java: Some functions return empty collections, others return null to denote "no data" (depends on the author)
  PPL: All functions consistently return null to denote "no data"

Null handling
  Java:
    - No distinction between nullable and non-nullable types
    - All types are nullable
    - Null can be assigned to any object reference
  PPL:
    - Clear distinction between nullable and non-nullable types
    - Types are non-nullable by default
    - Null can only be assigned to nullable object references

Compile-time null-safety
  Java: Not natively supported, but some third party tools provide limited support for null-handling
  PPL: Natively supported

Optional pattern
  Java: Supported since Java version 8
  PPL: Not supported (uses null and compile-time null-safety)

Design by Contract
  Java: Not natively supported, but third party extensions exist
  PPL: Natively supported

Unit testing
  Java: Not natively supported, but third party extensions exist
  PPL: Natively supported

Error handling
  Java: Uses exception mechanism for program errors (e.g. stack overflow) and runtime errors (e.g. file read error)
  PPL:
    - Uses exception-like mechanism for program errors (e.g. stack overflow)
    - Commands return an 'error' object (besides a nullable 'result' object) in case of runtime errors (e.g. file read error)

Type inference
  Java: Not supported
  PPL: Supported for local script constants and variables

It is interesting to note that (as far as I know) many (if not most) popular programming languages tend to apply the design principles in column 'Java'. Some design choices in PPL might seem strange (for example, there are no empty immutable strings in PPL). But these choices have all been made consciously and deliberately, because they all support the important Fail Fast! principle:

Errors should preferably be automatically detected at compile-time, or else as early as possible at run-time.

We will see how this is achieved.

Example 1: Return a Non-empty List

To get our feet wet, we start with a trivial example of a function that returns a non-empty string list of continent names.

[Note] Note
This is a long article (maybe too long). If you are an experienced programmer in a hurry, then you might want to skip the first two examples and jump to example 3.

Java version

Here is the code in Java:

public static List<String> getContinents() {

   List<String> result = new ArrayList<String>();

   result.add ( "Africa" );
   result.add ( "America" );
   result.add ( "Antarctica" );
   result.add ( "Asia" );
   result.add ( "Australia" );
   result.add ( "Europe" );

   return Collections.unmodifiableList ( result );
}

The above example shows the three basic steps often involved in functions that return a collection:

  1. An empty, mutable collection is created.
  2. Elements are added to the mutable collection.
  3. The mutable collection is converted into an immutable one and returned as the result of the function.
[Note] Note

Some readers might wonder why we cannot simply write ...

return result;

... instead of:

return Collections.unmodifiableList ( result );

The reason is that functions should always return immutable collections, unless there is a specific need for mutability. Readers not familiar with the benefits of immutable objects are encouraged to search the net for terms like 'advantages of immutable data structures'. There are some excellent articles available. In short: Immutable data have a simpler API, are less error-prone (not only in multi-processing environments), have no state transitions to handle and can be freely shared without the need for synchronization.
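One caveat is worth keeping in mind: Collections.unmodifiableList returns a read-only view of the original list, not a copy. This is safe in getContinents() because the mutable list never escapes the method. The following minimal sketch illustrates the behavior; the class name UnmodifiableListDemo and the helper rejectsAdd are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class UnmodifiableListDemo {

   // Hypothetical helper: returns true if the list refuses to accept a new element
   public static boolean rejectsAdd ( List<String> list ) {
      try {
         list.add ( "Atlantis" );
         return false;
      } catch ( UnsupportedOperationException e ) {
         return true;
      }
   }

   public static void main ( String[] args ) {
      List<String> source = new ArrayList<String>();
      source.add ( "Africa" );

      List<String> readOnly = Collections.unmodifiableList ( source );

      // The wrapper refuses modification ...
      System.out.println ( rejectsAdd ( readOnly ) );
      // ... but the wrapped list itself is still mutable
      System.out.println ( rejectsAdd ( source ) );

      // The wrapper is a view: the element added to 'source' is visible through it
      System.out.println ( readOnly.size() );
   }
}
```

If the mutable list were stored in a field and modified later, every holder of the read-only view would see the change. Discarding the reference to the mutable list, as the example function does, avoids this.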

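Java doesn't have list literals, but the above code can be simplified by using the convenient Arrays.asList method. Note that Arrays.asList returns a fixed-size list, not an immutable one: elements cannot be added or removed, but existing elements can still be replaced with set. To get a truly read-only list, we wrap it with Collections.unmodifiableList:

public static List<String> getContinents() {

   return Collections.unmodifiableList (
      Arrays.asList ( "Africa", "America", "Antarctica", "Asia", "Australia", "Europe" ) );
}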
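A quick experiment shows how Arrays.asList behaves: the returned list rejects add but still accepts set, whereas a list wrapped with Collections.unmodifiableList rejects both. The class name AsListDemo and the two helper methods are made up for this sketch.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class AsListDemo {

   // Hypothetical helper: tries to overwrite the first element with itself
   public static boolean allowsSet ( List<String> list ) {
      try {
         list.set ( 0, list.get ( 0 ) );
         return true;
      } catch ( UnsupportedOperationException e ) {
         return false;
      }
   }

   // Hypothetical helper: tries to append an element
   public static boolean allowsAdd ( List<String> list ) {
      try {
         list.add ( "Atlantis" );
         return true;
      } catch ( UnsupportedOperationException e ) {
         return false;
      }
   }

   public static void main ( String[] args ) {
      List<String> fixedSize = Arrays.asList ( "Africa", "America" );
      List<String> readOnly = Collections.unmodifiableList ( fixedSize );

      System.out.println ( "fixedSize allows set: " + allowsSet ( fixedSize ) );
      System.out.println ( "fixedSize allows add: " + allowsAdd ( fixedSize ) );
      System.out.println ( "readOnly allows set:  " + allowsSet ( readOnly ) );
      System.out.println ( "readOnly allows add:  " + allowsAdd ( readOnly ) );
   }
}
```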

In practice, the above implementation could become a performance problem if the method is called very often, because a new list is created on each call. Creating the duplicates takes time, storing them takes memory, and the garbage collector has to reclaim them later. To avoid this, we can cache the data. The basic idea is this: create the list the first time the function is called, keep it in a private static field, and simply return the same list in subsequent calls. Here is an example:

private static List<String> continents = null;

public static List<String> getContinents() {

   if ( continents == null ) {
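      // wrap the fixed-size list so that callers cannot replace elements of the shared instance
      continents = Collections.unmodifiableList (
         Arrays.asList ( "Africa", "America", "Antarctica", "Asia", "Australia", "Europe" ) );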
   }

   return continents;
}

The first time getContinents() is called, the private field continents is null. The list is then created and assigned to the private field. Subsequent calls just return the list that has been created in the first call.
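Note that the null check above is not synchronized, so two threads calling getContinents() at the same time could each create a list. A common Java alternative is the initialization-on-demand holder idiom, sketched below under the assumption that the list never changes after creation. The names ContinentsHolderDemo and Holder are made up for this illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ContinentsHolderDemo {

   // The nested class is initialized only when getContinents() first accesses it,
   // and the JVM guarantees that class initialization runs exactly once,
   // even if several threads arrive at the same time.
   private static class Holder {
      static final List<String> CONTINENTS = Collections.unmodifiableList (
         Arrays.asList ( "Africa", "America", "Antarctica", "Asia", "Australia", "Europe" ) );
   }

   public static List<String> getContinents() {
      return Holder.CONTINENTS;
   }

   public static void main ( String[] args ) {
      System.out.println ( getContinents() );
      // Every call returns the same cached instance
      System.out.println ( getContinents() == getContinents() );
   }
}
```

This keeps lazy initialization without an explicit null check and without locking in the calling code.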

[Note] Note

An even simpler solution would be to define a public static final field that holds the list, as follows:

public static final List<String> continents = Collections.unmodifiableList (
   Arrays.asList ( "Africa", "America", "Antarctica", "Asia", "Australia", "Europe" ) );

But this solution has a drawback: It doesn't use lazy initialization. Even if the list is never used by the application, it is created as soon as the class containing the field is initialized. This consumes time and memory, and can therefore lead to high memory requirements and slow startup time if this pattern is used often.

PPL version

Here is the equivalent of the Java method getContinents(), written in PPL:

command get_continents
   out list<string> result

   script
      const r = mutable_list<string>.create

      r.append ( "Africa" )
      r.append ( "America" )
      r.append ( "Antarctica" )
      r.append ( "Asia" )
      r.append ( "Australia" )
      r.append ( "Europe" )

      result = r.make_immutable
   .
.

As most readers are not familiar with PPL, let's delve a bit into its syntax.

  • command get_continents

    A command in PPL is like a method in Java. It has a name (get_continents in our case), zero or more input arguments, zero or more output arguments, and performs an operation.

    Note that PPL doesn't use curly brackets (braces) to embed a block of code. In PPL, code embedded within a block is indented, and the block is terminated by a dot (.) on a single line. Hence, code like this in a language that uses braces ...

    foo {
       // body
    }

    ... is written like this in PPL:

    foo
       // body
    .

    Note also that, for better readability, PPL uses an underscore (_) to separate words in identifiers. PPL doesn't use camel-case. So, if you are used to ...

    thisIsALongIdentifier

    ... you will see this in PPL:

    this_is_a_long_identifier
  • out list<string> result

    This instruction defines the command's output. PPL provides multiple output arguments - the counterpart of multiple input arguments. Therefore, each output argument also has a name. In our case, we have a single output argument of type list<string>, named result.

    [Note] Note

    Multiple instructions can be written on a single line, separated by the instruction separator symbol |. Hence, we could also write:

    command get_continents | out list<string> result
  • script

    script is a container for the command's implementation (i.e. the instructions that perform the operation).

  • const r = mutable_list<string>.create

    This instruction declares a local constant named r and initialized to an empty mutable list of strings.

  • r.append ( "Africa" )

    The string "Africa" is appended to r.

  • result = r.make_immutable

    This instruction converts the mutable list contained in r into an immutable one which is then assigned to output argument result.

PPL provides list literals. And the script instruction is optional. The code can be shortened as follows:

command get_continents
   out list<string> result

   result = ["Africa", "America", "Antarctica", "Asia", "Australia", "Europe" ]
.

We could now improve the code and use caching and lazy initialization, as we did with the Java version in the previous section. But this is not necessary. PPL implicitly uses lazy initialization for static constant fields (unless we explicitly disable lazy initialization in the source code). We can simply define a constant static field like this:

const list<string> continents default:["Africa", "America", "Antarctica", "Asia", "Australia", "Europe" ]

If continents is never used in the application, it will never be initialized, so there is no waste of time and memory. The list will be created the first time continents is actually accessed in the application, and subsequent accesses will re-use the same list.

Example 2: There Might Be No Data To Return

While the previous example showed a function that always returns a non-empty list, we will now look at a function that might return with "no data".

The function in this example takes a nullable string as input and returns a list of characters containing all digits found in the string. If the input string is null or doesn't contain digits, the function returns null.

Java version

The following code shows an implementation in Java:

public static List<Character> getDigitsInString ( String string ) {

   if ( string == null ) return null;

   List<Character> result = new ArrayList<Character>();

   for ( char ch : string.toCharArray() ) {
      if ( Character.isDigit ( ch ) ) {
         result.add ( ch );
      }
   }

   if ( result.isEmpty() ) {
      return null;
   } else {
      return Collections.unmodifiableList ( result );
   }
}

The interesting part of this code is of course the if statement at the end. If no digits are found in the input string (i.e. result.isEmpty() is true) then the function returns null, otherwise an immutable list containing the digits is returned.

If this pattern is used often, we can write a utility such as the following one:

public static <T> List<T> toUnmodifiableListOrNull ( List<T> list ) {

   if ( list == null || list.isEmpty() ) {
      return null;
   } else {
      return Collections.unmodifiableList ( list );
   }
}

The last if in getDigitsInString can then be replaced by simply writing:

return toUnmodifiableListOrNull ( result );
[Note] Note
An alternative solution would be of course to return an empty list if there are no digits found. But we will not consider this solution here because of the disadvantages of returning empty collections in case of "no data", as explained in part 2 of this article series.
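Since getDigitsInString can return null, every caller has to check the result before using it. The sketch below shows a hypothetical caller, describeDigits, which is not part of the original example. If the null check were omitted, the call to digits.size() would fail with a NullPointerException for any input without digits, and the Java compiler would not warn about it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class DigitsCallerDemo {

   // Same logic as getDigitsInString shown above, condensed
   public static List<Character> getDigitsInString ( String string ) {
      if ( string == null ) return null;

      List<Character> result = new ArrayList<Character>();
      for ( char ch : string.toCharArray() ) {
         if ( Character.isDigit ( ch ) ) {
            result.add ( ch );
         }
      }
      return result.isEmpty() ? null : Collections.unmodifiableList ( result );
   }

   // Hypothetical caller: the null check must come before any use of the list
   public static String describeDigits ( String input ) {
      List<Character> digits = getDigitsInString ( input );
      if ( digits == null ) {
         return "no digits";
      }
      return digits.size() + " digit(s): " + digits;
   }

   public static void main ( String[] args ) {
      System.out.println ( describeDigits ( "asd123" ) );
      System.out.println ( describeDigits ( "asd" ) );
   }
}
```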

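To test our function, we could use a third-party test-framework (like JUnit). But to keep things simple, we will just use standard Java functionality and write a small test method based on assert statements. Note that assert statements are only evaluated when the JVM is started with assertions enabled (option -ea):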

public static void test() {

   // case 1: there are digits
   List<Character> digits = getDigitsInString ( "asd123" );
   assert digits.size() == 3;
   assert digits.get(0) == '1'; // first element
   assert digits.get(digits.size()-1) == '3'; // last element
   assert digits.toString().equals ( "[1, 2, 3]" );

   // case 2: there are no digits
   digits = getDigitsInString ( "asd" );
   assert digits == null;

   // case 3: input is null
   digits = getDigitsInString ( null );
   assert digits == null;
}

If you want to try out the whole code without using an IDE, you can proceed as follows (after ensuring that Java is installed on your system):

  • Create file ListExample_02.java in any directory, with the following content:

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    
    public class ListExample_02 {
    
       public static List<Character> getDigitsInString ( String string ) {
    
          if ( string == null ) return null;
    
          List<Character> result = new ArrayList<Character>();
    
          for ( char ch : string.toCharArray() ) {
             if ( Character.isDigit ( ch ) ) {
                result.add ( ch );
             }
          }
    
          if ( result.isEmpty() ) {
             return null;
          } else {
             return Collections.unmodifiableList ( result );
          }
       }
    
       public static void test() {
    
          System.out.println ( "ListExample_02:" );
    
          // case 1: there are digits
          List<Character> digits = getDigitsInString ( "asd123" );
          assert digits.size() == 3;
          assert digits.get(0) == '1'; // first element
          assert digits.get(digits.size()-1) == '3'; // last element
          assert digits.toString().equals ( "[1, 2, 3]" );
          System.out.println ( digits );
    
          // case 2: there are no digits
          digits = getDigitsInString ( "asd" );
          assert digits == null;
          System.out.println ( digits );
    
          // case 3: input is null
          digits = getDigitsInString ( null );
          assert digits == null;
          System.out.println ( digits );
       }
    
       public static void main ( String[] arguments ) {
          test();
       }
    }
  • Compile the file by typing the following operating system command in a terminal window:

    javac ListExample_02.java
  • Execute the program with assertions enabled (without the -ea option, the assert statements are silently skipped) by typing:

    java -ea ListExample_02
  • The following output will be displayed:

    ListExample_02:
    [1, 2, 3]
    null
    null
[Note] Note
You can download the source code for all examples in a .zip file (see link at the beginning of the article).

PPL version

In PPL, the code looks like this:

command get_digits_in_string
   in nullable string string

   out nullable list<character> result

   script
      if string is null
         result = null
         return
      .

      const r = mutable_list<character>.create

      repeat for each character in string
         if character.is_digit then
            r.append ( character )
         .
      .

      result = r.make_immutable_or_null
   .
.

Here are some points to consider:

  • in nullable string string

    This instruction defines an input argument named string, of type nullable string. In PPL all object references are non-nullable by default. Therefore we have to use the keyword nullable to clearly state that null is a valid input value.

  • out nullable list<character> result

    The command returns a nullable list of characters.

  • if string is null
       result = null
       return
    .

    If the command is called with null as input, it returns with null as result.

  • repeat for each character in string
       if character.is_digit then
          r.append ( character )
       .
    .

    This is the loop that checks each character in the input string. If the character is a digit, then it is appended to the mutable list in r.

  • result = r.make_immutable_or_null

    If the list in r is empty, then the command returns null as result, else it returns an immutable list.

Unit testing is natively supported in PPL. We can write a test script like this:

test
   // case 1: there are digits
   test "asd123"
   verify result is not null
   verify result.size =v 3
   verify result.first =v '1'
   verify result.last =v '3'
   verify result.to_long_string =v "[1, 2, 3]"

   // case 2: there are no digits
   test "asd"
   verify result is null

   // case 3: input is null
   test null
   verify result is null
.

Some explanations:

  • test

    Instructions within a test block are executed to unit-test command get_digits_in_string.

  • test "asd123"

    Each test instruction launches a test case. In this case, command get_digits_in_string is called with "asd123" as input.

  • verify result is not null

    The verify instruction is used to check the result returned by get_digits_in_string. In this case we ensure that the result is non-null.

  • verify result.size =v 3

    There must be 3 elements in the list returned. Note the comparison operator =v. The v stands for value comparison (i.e. the values of the objects are compared, not their references). If we wanted to compare references, we would have to use the reference comparison operator =r.

  • verify result.first =v '1'

    The first element in the result list must be the character 1.

If you want to try out the whole code, you can proceed as follows (after installing PPL):

  • Open a terminal window in any directory.

  • Create a PPL project named list_examples by entering the following command:

    ppl create_project project_id=list_examples

    PPL creates a sub-directory named list_examples that contains a number of sub-directories and files.

  • Create directory examples under the existing sub-directory work/ppl/source_code/list_examples/.

    In this new directory, create file se_list_example_02.ppl with the following content:

    service list_example_02
    
       command get_digits_in_string
          in nullable string string
    
          out nullable list<character> result
    
          script
             if string is null
                result = null
                return
             .
    
             const r = mutable_list<character>.create
    
             repeat for each character in string
                if character.is_digit then
                   r.append ( character )
                .
             .
    
             result = r.make_immutable_or_null
          .
          test
             // case 1: there are digits
             test "asd123"
             verify result is not null
             verify result.size =v 3
             verify result.first =v '1'
             verify result.last =v '3'
             verify result.to_long_string =v "[1, 2, 3]"
             %write_line ( result.to_long_string )
    
             // case 2: there are no digits
             test "asd"
             verify result is null
             %write_line ( se_any_type_to_string.to_long_string ( result ) )
    
             // case 3: input is null
             test null
             verify result is null
             %write_line ( se_any_type_to_string.to_long_string ( result ) )
          .
       .
    
    .
  • Compile and build your project by executing the compile_and_build system file, which is located in your project's root directory (i.e. compile_and_build.sh on Linux/Unix and compile_and_build.bat on Windows).

  • To execute the unit test, run the system command file run_tests located in the PPL project's root directory (i.e. run_tests.sh on a Linux/Unix system, and run_tests.bat on a Windows system).

    A message like the following will be displayed:

    Running unit tests in list_examples
    testing list_examples.examples.se_list_example_02
    [1, 2, 3]
    #null#
    #null#
    
    Objects tested: 1
    
    BRAVO AND CONGRATULATIONS!
    All tests passed without errors.
[Note] Note

To get more information about creating PPL projects, please refer to chapter Developing a standalone PC application in the tutorial.

You can download the PPL project containing all examples in a .zip file (see link at the beginning of the article).

Example 3: Invalid Input Data

Checking input against invalid values is one of the most effective ways to apply defensive programming. It is an absolute requirement for robust, high quality software. In this context, input can mean any kind of data transferred from a source to a destination, such as user input, data read from external sources, input sent to a function, etc. It is important to check validity as soon as the input is received. In the case of a function, this means that checking the input should be the first thing done in the function.

In practice, there are two common ways to protect functions/methods/commands against being called with invalid input:

  • Exception mechanism: The first statements in the function check the input values and an exception is thrown if an invalid value is found.

    This is the way used in Java and many other popular programming languages.

  • Design by Contract (also called Contract Programming): For each input argument, a precondition can be defined. In its simplest form, a precondition is a boolean expression that must evaluate to true for the input value to be valid. If a precondition is not fulfilled, then a program error (exception) is thrown.

At first, both approaches appear to be very similar, because in both cases an exception is thrown if an input value is invalid. There is, however, an important difference that leads to Design by Contract clearly being the better technique.

In the first case (exception mechanism), the input arguments' conditions are defined in the object's implementation code, and not in the code that defines the object's interface. So, in Java (for example) if a string input must not exceed 256 characters, then this condition is tested at the class level, and not at the interface level.

On the other hand, preconditions are part of the object's interface (type), and not of the implementation (factory).

This leads to considerable advantages:

  • The preconditions are part of the official contract between the caller and the callee. They appear in the API documentation. This means that a programmer looking at an API can immediately see the requirements for valid input arguments. For example, we wouldn't just see ...

    String name

    ... for an input argument, but ...

    String name check: name.size <= 256
    

    ... and reliably be told that a program error occurs if the function is called with a name exceeding 256 characters.

  • Preconditions are automatically inherited in child types and they can be made weaker (but not stronger) if appropriate. This is an application of the important Liskov Substitution Principle that states:

    If S is a subtype of T, then objects of type T may be replaced with objects of type S without altering the behavior of the program.
  • Preconditions are automatically enforced in all implementations of a given type. So, if there are three factories/classes implementing the same type/interface, there is no risk of forgetting to code the precondition in one of the factories or to accidentally define different preconditions in the factories. Moreover, there is no risk of code duplication.

  • Individual preconditions can be evaluated programmatically at runtime. So, client code can programmatically check input data before calling a function with preconditions.
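Java has no native preconditions, but the last point can be roughly approximated by hand: the condition is written once as a public predicate, and both the method and its callers use it. The sketch below is only an approximation. It gives none of the automatic inheritance or enforcement described above, and all names in it (NamePreconditionDemo, isValidName, greet, MAX_NAME_LENGTH) are made up for this illustration.

```java
public class NamePreconditionDemo {

   public static final int MAX_NAME_LENGTH = 256;

   // Hypothetical precondition, exposed so that callers can evaluate it themselves
   public static boolean isValidName ( String name ) {
      return name != null && name.length() <= MAX_NAME_LENGTH;
   }

   // Hypothetical operation guarded by the precondition
   public static String greet ( String name ) {
      if ( ! isValidName ( name ) )
         throw new IllegalArgumentException (
            "'name' must be non-null and at most " + MAX_NAME_LENGTH + " characters long" );
      return "Hello " + name;
   }

   public static void main ( String[] args ) {
      String input = "Alice";

      // Client code checks the input before calling the guarded method
      if ( isValidName ( input ) ) {
         System.out.println ( greet ( input ) );
      } else {
         System.out.println ( "Invalid name, please try again" );
      }
   }
}
```

Unlike a real precondition, nothing forces another implementation of the same interface to call isValidName, which is exactly the risk described in the list above.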

[Note] Note
For more information about Design by Contract, please refer to the Wikipedia article or read about it in the PPL language manual.

To see both approaches in action, we will write a function that creates a list of a given size, containing the same word in each position. The function has two input arguments:

  1. The number of elements in the resulting list.

    This number must be an integer greater than zero. This input is optional: if no value is specified by the caller, the value defaults to 1.

  2. The word used to fill the list.

    This is a string that cannot be null and must match the regular expression "\w+" (i.e. a string composed of one or more letters, digits and underscores).

The function's result is a list of strings.

Java version

Here is a possible implementation in Java:

public static List<String> createFilledWordList ( int num_elements, String word ) {

   if ( num_elements <= 0 )
      throw new IllegalArgumentException ( "'num_elements' must be greater than 0" );
   if ( word == null )
      throw new IllegalArgumentException ( "'word' cannot be null" );
   if ( ! word.matches ( "\\w+" ) )
      throw new IllegalArgumentException ( "'" + word + "' is not a word." );

   List<String> result = new ArrayList<String>();

   for ( int i = 0; i < num_elements; i++ ) {
      result.add ( word );
   }

   return Collections.unmodifiableList ( result );
}

In Java, method overloading (not to be confused with method overriding) enables us to define another method with the same name, as long as the parameter lists differ in the number or types of arguments. This can be used to simulate default values for input arguments. Besides the above method, we add the following method, which can be called to create a list with one element:

public static List<String> createFilledWordList ( String word ) {

   return createFilledWordList ( 1, word );
}
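As a side note, the loop in the first method could be replaced by the standard library method Collections.nCopies, which returns an immutable list containing the same object n times. The following variant is a sketch of that alternative, not the version used in the rest of this article. The class name NCopiesDemo is made up for this illustration.

```java
import java.util.Collections;
import java.util.List;

public class NCopiesDemo {

   public static List<String> createFilledWordList ( int num_elements, String word ) {

      // Same input validation as in the version above
      if ( num_elements <= 0 )
         throw new IllegalArgumentException ( "'num_elements' must be greater than 0" );
      if ( word == null )
         throw new IllegalArgumentException ( "'word' cannot be null" );
      if ( ! word.matches ( "\\w+" ) )
         throw new IllegalArgumentException ( "'" + word + "' is not a word." );

      // nCopies already returns an immutable list, so no wrapping is needed
      return Collections.nCopies ( num_elements, word );
   }

   public static void main ( String[] args ) {
      System.out.println ( createFilledWordList ( 3, "foo" ) );
   }
}
```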

Here is a method that illustrates a unit test for createFilledWordList:

public static void test() {

   // case 1: standard case
   List<String> foos = createFilledWordList ( 3, "foo" );
   assert foos.toString().equals ( "[foo, foo, foo]" );

   // case 2: use default value for 'num_elements'
   foos = createFilledWordList ( "foo" );
   assert foos.toString().equals ( "[foo]" );

   // case 3: illegal input value
   // Note: this test would be easier to write with a real unit-test framework
   boolean exceptionRaised = false;
   try {
      foos = createFilledWordList ( "foo bar" );
   } catch ( IllegalArgumentException e ) {
      exceptionRaised = true;
   }
   assert exceptionRaised;
}
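Without a test framework, the try/catch boilerplate of case 3 can be moved into a small helper. The sketch below assumes Java 8 or later (it uses a lambda expression). The helper throwsIllegalArgument and the stand-in method requireWord are made up for this illustration. requireWord only reproduces the word validation of createFilledWordList.

```java
public class ExpectExceptionDemo {

   // Hypothetical helper: runs the action and reports whether it threw IllegalArgumentException
   public static boolean throwsIllegalArgument ( Runnable action ) {
      try {
         action.run();
         return false;
      } catch ( IllegalArgumentException e ) {
         return true;
      }
   }

   // Stand-in for createFilledWordList, reduced to the validation of 'word'
   public static void requireWord ( String word ) {
      if ( word == null || ! word.matches ( "\\w+" ) )
         throw new IllegalArgumentException ( "'" + word + "' is not a word." );
   }

   public static void main ( String[] args ) {
      // invalid input: the helper reports that an exception was thrown
      System.out.println ( throwsIllegalArgument ( () -> requireWord ( "foo bar" ) ) );
      // valid input: no exception
      System.out.println ( throwsIllegalArgument ( () -> requireWord ( "foo" ) ) );
   }
}
```

With such a helper, case 3 of the test above shrinks to a single assert statement.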

PPL version

command create_filled_word_list
   in positive_32 num_elements default:1
   in string word check:word.matches_regex ( regex.create ( '''\w+''' ) )

   out list<string> result

   script
      const r = mutable_list<string>.create

      repeat num_elements times
         r.append ( word )
      .

      result = r.make_immutable
   .
.

Notes to the above code:

  • in positive_32 num_elements default:1

    Here, we declare input argument num_elements of type positive_32 with a default value of 1.

    Besides signed integers, PPL also provides unsigned integers. positive_32 denotes a 32-bit positive integer, excluding zero (there is also type zero_positive_32 which includes zero as a value). In our case, using type positive_32 has two advantages:

    • We don't need to explicitly check if the input value is greater than zero.

    • The risk of calling create_filled_word_list with num_elements set to zero or a negative value (and thus causing a runtime error) is eliminated, because assigning a signed integer to an unsigned one (without explicit conversion) is a type compatibility violation which is reported as an error by the compiler.

    The default clause is used to specify an expression that represents the default value for input num_elements. Calling create_filled_word_list without specifying a value for num_elements is the same as calling the command with the value 1 assigned to num_elements.

    It is not necessary to explicitly check num_elements for null, because, as said already, null is not allowed by default.

  • in string word check:word.matches_regex ( regex.create ( '''\w+''' ) )

    The second input argument is a (non-nullable) string named word, whose value must match the regular expression \w+.

    This instruction demonstrates the use of a precondition (Design by Contract). We use the check clause to define a boolean expression that must be fulfilled before the command is executed. If create_filled_word_list is called with input word not containing a word, then a program error is thrown immediately.

    Note the regular expression embedded between '''. This is an example of a triple-apostrophed string literal, which can contain new lines and characters that would need to be escaped in a standard string literal. Hence '''\w+''' is equivalent to "\\w+".

  • out list<string> result

    The output is a non-nullable list of non-nullable strings.

  • The command's script should be self-explanatory. We create a mutable list of size num_elements, containing word in each position. At the end, an immutable list is returned.
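For comparison, Java has no built-in integer type that excludes zero and negative values, so a rough analogue of positive_32 would have to be written by hand. The following is only a sketch under that assumption: the names PositiveInt and FilledWordListSketch are hypothetical, the regex check on word is left out for brevity, and the range check happens at runtime when the value is created, not at compile time as in PPL.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical wrapper type: holds an int that is guaranteed to be >= 1.
final class PositiveInt {
   private final int value;

   private PositiveInt ( int value ) {
      this.value = value;
   }

   // The only way to obtain an instance; rejects zero and negative values at runtime.
   static PositiveInt of ( int value ) {
      if ( value < 1 ) {
         throw new IllegalArgumentException ( "Value must be greater than zero, but is " + value + "." );
      }
      return new PositiveInt ( value );
   }

   int get() {
      return value;
   }
}

final class FilledWordListSketch {
   // No range check is needed here: the parameter type already guarantees numElements >= 1.
   static List<String> createFilledWordList ( PositiveInt numElements, String word ) {
      List<String> result = new ArrayList<String>();
      for ( int i = 0; i < numElements.get(); i++ ) {
         result.add ( word );
      }
      return Collections.unmodifiableList ( result );
   }
}
```

The check moves from the function body into the type, but a call such as PositiveInt.of ( 0 ) still compiles and only fails when it is executed.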

A test script could be written like this:

test
   // case 1: standard case
   test ( num_elements = 3
      word = "foo" )
   verify result.to_long_string =v "[foo, foo, foo]"

   // case 2: use default value for 'num_elements'
   test ( word = "foo" )
   verify result.to_long_string =v "[foo]"

   // case 3: illegal input value
   test ( word = "foo bar" )
   verify_error // check that a program error is thrown
.

Example 4: Resource Errors

Now comes the most interesting and revealing example. Besides the points already covered in the previous examples, we will add another common difficulty: a system error that might occur at runtime and that has to be anticipated in the source code. Typical examples of these problems are: file read error, database not available, network connection error, etc.

Most importantly, we will look at what happens if we write 'bad code', and we will compare different approaches and analyze how they affect software reliability and maintainability.

The function we are going to write reads a text file and returns lines that don't match a regular expression. The function has two input arguments:

  • the file to be read
  • the regular expression used to check each line

The function returns a list of strings representing the lines that don't match the regular expression.

A practical use case of this function would be to check the validity of a file.

Suppose, for example, a configuration file that contains parameters composed of key/value pairs such as these:

start_line: 10
end_line:20
verbose:yes

The following regular expression could then be used to find invalid entries in such a config file:

\w+:[ \t]*\w+

This regex translates to: The line must start with one or more word characters (letter or digit or underscore), followed by a colon (:), followed by optional white space, followed by one or more word characters.
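To see the regex at work before any file handling is involved, here is a minimal sketch that applies it to a few in-memory lines. The class name ConfigLineRegexSketch and the method invalidLines are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class ConfigLineRegexSketch {

   // Same regex as in the text; backslashes are doubled inside a Java string literal.
   static final Pattern KEY_VALUE = Pattern.compile ( "\\w+:[ \\t]*\\w+" );

   // Returns the lines that do NOT match the key/value pattern.
   static List<String> invalidLines ( List<String> lines ) {
      List<String> invalid = new ArrayList<String>();
      for ( String line : lines ) {
         // matches() requires the whole line to match, not just a part of it
         if ( ! KEY_VALUE.matcher ( line ).matches() ) {
            invalid.add ( line );
         }
      }
      return invalid;
   }

   public static void main ( String[] args ) {
      List<String> lines = Arrays.asList (
         "start_line: 10", "end_line:20", "verbose:yes",
         "illegal", "missing_value:", ":missing_name" );
      // prints [illegal, missing_value:, :missing_name]
      System.out.println ( invalidLines ( lines ) );
   }
}
```

Note that an empty line would also be reported as invalid, because \w+ requires at least one character.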

Java version

Here is a source code suggestion, written in Java:

public static List<String> findLinesNotMachingRegexInFile ( File file, Pattern lineRegex ) throws IOException {

   // read lines from file and store result in 'lines'
   List<String> lines = Files.readAllLines ( file.toPath() );

   // store all lines that don't match the regex in a mutable string list
   List<String> result = new ArrayList<>();
   for ( String line : lines ) {
      if ( ! lineRegex.matcher ( line ).matches() ) {
         result.add ( line );
      }
   }

   return result;
}

Is this code ok?

Suppose we have a file with the following content:

start_line: 10
illegal
verbose:yes

If we call findLinesNotMachingRegexInFile with the regex \w+:[ \t]*\w+ then the function will correctly return a list containing the invalid line illegal.

So, the code is ok, isn't it?

No, it isn't.

Any experienced programmer would point out immediately that the above code is just a 'quick and dirty' version of the code we should write. Yes, the above code is what most of us would like to write ("Please let me write simple code and don't bother me with exceptional cases that are unlikely to happen!"). It is a sad fact that this is indeed the kind of source code we often see in practice (and yes, I have myself often written code like this, and I am not proud of it).

The problem with this code is that the following six corner-cases are not handled explicitly:

  1. The function is called with input argument file set to null.
  2. The function is called with input argument lineRegex set to null.
  3. The file doesn't exist.
  4. The file can't be read (e.g. user has no access right).
  5. The file is empty.
  6. The file contains empty lines.
[Note] Note

In an ideal world, our initial specification of the function would have mentioned all these corner-cases and told us exactly what to do in each case. But we don't live in an ideal world, and real-world specifications often don't mention corner cases. This is no valid excuse to ignore them in our source code too.

[Note] Note

Another corner case would be to call the function for a binary file that doesn't contain text. However, we will not discuss this case because there is unfortunately no easy and reliable way to know if a file contains text or not.

For more information, look at the Stack Overflow question Determining binary/text file type in Java?

Another point is that the above method returns an empty list in case of no lines found, although it is better to return null, as we saw in part 2.

It is not difficult to address all corner-cases. Here is an improved version:

public static List<String> findLinesNotMachingRegexInFile ( File file, Pattern lineRegex )
   throws IOException {

   // check input arguments
   if ( file == null )
      throw new IllegalArgumentException ( "Input argument 'file' cannot be null." );
   if ( ! file.exists() )
      throw new IllegalArgumentException ( "File " + file + " doesn't exist." );
   if ( lineRegex == null )
      throw new IllegalArgumentException ( "Input argument 'lineRegex' cannot be null." );

   // read lines from file and store result in 'lines'
   List<String> lines;
   try {
      lines = Files.readAllLines ( file.toPath() );
   } catch ( IOException e ) {
      throw new IOException ( "File " + file + " cannot be read.", e );
   }

   // if there are no lines in the file then throw an exception
   if ( lines.isEmpty() ) {
      throw new IllegalArgumentException ( "File " + file + " is empty." );
   }

   // store all lines that don't match the regex in a mutable string list
   List<String> result = new ArrayList<>();
   for ( String line : lines ) {
      if ( ! line.isEmpty() ) { // ignore empty lines
         if ( ! lineRegex.matcher ( line ).matches() ) {
            result.add ( line );
         }
      }
   }

   // if lines have been found then return an immutable non-empty list, else return null
   if ( result.isEmpty() ) {
      return null;
   } else {
      return Collections.unmodifiableList ( result );
   }
}

Now all corner-cases are handled explicitly, but ... the method body has more than tripled in size!

Is it worth it to write long code like this? Is this code really better?

In the article's introduction, we asked "What happens if we write 'bad code'?".

To find the answers, let us have a look at each corner-case and compare how the software behaves, depending on the code we write.

  • The function is called with input argument file set to null.

    • In version 1, a NullPointerException is thrown when file.toPath() is executed in the statement:

      List<String> lines = Files.readAllLines ( file.toPath() );
    • In version 2, an IllegalArgumentException is thrown.

    Version 2 is clearly better because:

    • An IllegalArgumentException tells us more than a NullPointerException and helps in debugging. Suppose that the function resides in a third party library and that we don't have access to the source code. Then seeing a NullPointerException occurring in findLinesNotMachingRegexInFile makes it more difficult to find the bug. On the other hand, an IllegalArgumentException with a clear error message (Input argument 'file' cannot be null) immediately tells us the reason for the bug.

    • The behavior in version 1 depends on the function's implementation (i.e. the code of the function's body). In our case, the bug leads to a NullPointerException. If the code changes later, a different exception might be thrown, or no exception at all, with the function simply returning an empty list or null.

    • The behavior can be stated in the method's documentation comment (not shown above), which is useful API information for users of the method.

    • The fact that it is illegal to call the method with input file set to null is part of the official contract between the caller and the callee.

  • The function is called with input argument lineRegex set to null.

    This case is similar to case 1. Version 2 is better for the same reasons.

  • The file doesn't exist.

    • In version 1, the behavior depends on what Files.readAllLines ( file.toPath() ) does if the file doesn't exist. If we look at the Java API documentation, we see that there is no information about this specific case. So, we have to look at the JDK's source code or just try it out. I did the latter and a NoSuchFileException was thrown. However, we have no guarantee that the same will happen in a future version.

    • In version 2, an IllegalArgumentException is thrown. Again, this is better because the exception doesn't depend on statements used within the function. There is a clear and controlled error message.

      Moreover, calling the method with a file that doesn't exist is an error that must be attributed to the caller. So, it is semantically more correct to throw an IllegalArgumentException instead of an arbitrary exception that depends on the method body.

  • The file can't be read (e.g. user has no access right)

    • Similar to the previous case, in version 1 it depends on what Files.readAllLines() does. According to the Java API documentation, an IOException or a SecurityException is thrown.

    • In version 2, we take control over this situation and we throw a specific exception. Our intention is clearly documented in the code and API.

  • The file is empty

    • In version 1, Files.readAllLines() returns an empty list, which means that the result returned from the function will also be an empty list. So, version 1 returns the same result in semantically different cases - in case of an empty file as well as in case of a non-empty file that doesn't contain lines not matching the regular expression.

    • In version 2, we explicitly throw an exception with a clear error message.

    Version 2 is (at least in most scenarios) better, because executing the function for an empty file doesn't make sense and the file is probably empty due to a previous anomaly of the system. The client code is forced to take appropriate action.

    Version 2 also helps in debugging. Suppose that an empty file is a case that shouldn't appear under normal conditions. In version 1, the function simply returns a normal value (and not an error), which means that the anomaly of an empty file is less likely to be detected during the software tests.

  • The file contains empty lines
    • In version 1, all empty lines are contained in the result, which is very probably not what we want.
    • In version 2, empty lines are explicitly ignored.

We can see that version 2 behaves better in all corner-cases.
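The behavior of version 1 described in the list above can be reproduced with a small experiment. This is only a sketch: the class name Version1BehaviorSketch and the helper tempFile are hypothetical, and version1 is a copy of the 'quick and dirty' method shown earlier.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class Version1BehaviorSketch {

   // Copy of version 1 (no corner-case handling).
   static List<String> version1 ( File file, Pattern lineRegex ) throws IOException {
      List<String> lines = Files.readAllLines ( file.toPath() );
      List<String> result = new ArrayList<String>();
      for ( String line : lines ) {
         if ( ! lineRegex.matcher ( line ).matches() ) {
            result.add ( line );
         }
      }
      return result;
   }

   // Hypothetical helper: writes the given lines to a temporary file.
   static File tempFile ( List<String> lines ) throws IOException {
      File file = File.createTempFile ( "temp", null );
      file.deleteOnExit();
      Files.write ( file.toPath(), lines );
      return file;
   }

   public static void main ( String[] args ) throws IOException {
      Pattern regex = Pattern.compile ( "\\w+:[ \\t]*\\w+" );

      // An empty file and a fully valid file are indistinguishable: both yield an empty list.
      System.out.println ( version1 ( tempFile ( new ArrayList<String>() ), regex ).isEmpty() ); // prints true
      System.out.println ( version1 ( tempFile ( Arrays.asList ( "verbose:yes" ) ), regex ).isEmpty() ); // prints true

      // An empty line is reported as an invalid line.
      System.out.println ( version1 ( tempFile ( Arrays.asList ( "verbose:yes", "", "end_line:20" ) ), regex ).size() ); // prints 1

      // A null file surfaces as a NullPointerException from inside the method body.
      try {
         version1 ( null, regex );
      } catch ( NullPointerException e ) {
         System.out.println ( "NullPointerException" );
      }
   }
}
```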

We control every potential runtime problem individually and handle it in an appropriate way. This leads to consistent behavior and makes the code more maintainable, because the error information returned to client code doesn't depend on inner calls to other functions and the error messages are specific and clear.

The downside is of course that we have to write a lot of error handling code. And nothing prevents us from forgetting to appropriately handle any one of the painful edge conditions.

But this shouldn't come as a surprise. It is a well known fact that writing robust and maintainable code is hard and requires a lot of discipline and experience.

Wouldn't it be nice if the programming language helped us and made it less hard?

Before answering this question, let us first look at a unit-test method:

public static void test() {
   try {
      // case 1: lines found

      List<String> params = Arrays.asList
      ( "start_line: 10", "end_line:20", 
      "verbose:yes", "illegal", "missing_value:", ":missing_name" );
      File testFile = createTemporaryTextFile ( params );
      Pattern regex = Pattern.compile ( "\\w+:[ \\t]*\\w+" );
      List<String> linesFound = findLinesNotMachingRegexInFile ( testFile, regex );
      assert linesFound.toString().equals ( "[illegal, missing_value:, :missing_name]" );

      // case 2: no lines found

      params = Arrays.asList ( "start_line: 10", 
      "end_line:20", "", "verbose:yes", "" );
      testFile = createTemporaryTextFile ( params );
      linesFound = findLinesNotMachingRegexInFile ( testFile, regex );
      assert linesFound == null;

      // case 3: empty file

      testFile = createTemporaryTextFile ( new ArrayList<String>() );
      boolean exceptionOccured = false;
      try {
         linesFound = findLinesNotMachingRegexInFile ( testFile, regex );
      } catch ( IllegalArgumentException e ) {
         exceptionOccured = true;
      }
      assert exceptionOccured;

      // case 4: error (file doesn't exist)

      testFile = new File ( "C:\\sdhkdjhgkjdhgkdfgdkghsdfgdfghhdf.txt" );
      exceptionOccured = false;
      try {
         linesFound = findLinesNotMachingRegexInFile ( testFile, regex );
      } catch ( IllegalArgumentException e ) {
         exceptionOccured = true;
      }
      assert exceptionOccured;

   } catch ( IOException e ) {
      e.printStackTrace();
      assert false;
   }
}

private static File createTemporaryTextFile ( List<String> lines ) throws IOException {

   File file = File.createTempFile ( "temp", null );
   file.deleteOnExit();
   Files.write ( file.toPath(), lines, StandardOpenOption.WRITE );

   return file;
}
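Two remarks on the test method above. First, Java's assert statements are only evaluated when the JVM is started with assertions enabled (option -ea); otherwise they are silently skipped. Second, the repeated try/catch/flag pattern can be factored out without a unit-test framework. The following is a minimal sketch, assuming Java 8 or later; the names TestHelperSketch, ThrowingCode and throwsException are hypothetical.

```java
final class TestHelperSketch {

   // Hypothetical functional interface: a piece of code that may throw any exception.
   interface ThrowingCode {
      void run() throws Exception;
   }

   // Returns true if 'code' throws an exception of the expected type (or a subtype of it),
   // and false if it throws a different exception or completes normally.
   static boolean throwsException ( Class<? extends Exception> expected, ThrowingCode code ) {
      try {
         code.run();
      } catch ( Exception e ) {
         return expected.isInstance ( e );
      }
      return false;
   }

   public static void main ( String[] args ) {
      // Integer.parseInt throws NumberFormatException, a subtype of IllegalArgumentException
      System.out.println ( throwsException ( IllegalArgumentException.class, () -> Integer.parseInt ( "foo bar" ) ) ); // prints true
      System.out.println ( throwsException ( IllegalArgumentException.class, () -> Integer.parseInt ( "42" ) ) );      // prints false
   }
}
```

Variables used inside the lambda must be effectively final, so a test that reassigns a variable such as testFile would need a separate variable per case.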

PPL version

The implementation in PPL looks like this:

command find_lines_not_maching_regex_in_file
   in file file check:file.exists
   in regex line_regex
   %system_error_handler_input_argument

   out nullable list<string> result
   out nullable file_error error
   out_check: not ( result #r null and error #r null ) // 'result' and 'error' cannot both be non-null

   script

      // read lines from file and store result in constant 'lines'
      se_text_file_IO.restore_lines_from_text_file (
         file = i_file
         error_handler = i_error_handler ) \
         ( const lines = result
         const file_read_error = error )

      // if there was an error reading the file then return immediately
      if file_read_error is not null then
         result = null
         error = file_error.create (
            description = """File {{file.path}} cannot be read. 
            Reason: {{file_read_error.description}}"""
            cause = file_read_error
            resource = file )
         return
      .

      // if there are no lines in the file then return an error
      if lines is null then
         result = null
         error = file_error.create (
            description = """File {{file.path}} is empty."""
            resource = file )
         return
      .

      // store all lines that don't match the regex in a mutable string list
      const r = mutable_list<string>.create
      repeat for each line in lines
         if line is not null then // ignore empty lines
            if not line.matches_regex ( line_regex ) then
               r.append ( line )
            .
         .
      .

      // if lines have been found then return an immutable list, else return 'null'
      result = r.make_immutable_or_null
      error = null
   .
.

Here are some explanations:

  • in file file check:file.exists
    in regex line_regex

    The command's first input argument is named file. It is of type non-nullable file. The check clause defines a precondition requiring that the file must exist at the moment of calling the command. If the file doesn't exist, a program error is thrown.

    The second input argument is named line_regex and is of type non-nullable regex.

  • %system_error_handler_input_argument

    This is an example of a source code template in PPL. In this article, we will not look at how templates work. Please refer to the language manual if you want more information. It suffices to know that the compiler expands this template identifier to:

    in system_error_handler error_handler default:se_system_utilities.default_system_error_handler

    This means that there is a third input argument named error_handler. We will also not discuss the role of this input argument. The basic idea is that, in PPL, commands that might fail at runtime accept an error handler that, by default, sends an error message to the operating system's error device.

  • out nullable list<string> result
    out nullable file_error error

    PPL supports multiple output arguments. This command has two nullable output arguments: result and error.

    Each time the command is called there are three outcomes possible:

    1. Lines not matching the regex have been found in the file:

      In this case, output argument result holds the list of lines found and error is null.

    2. No lines have been found:

      In this case, result and error are both null.

    3. A runtime error occurs:

      In this case, result is null and error points to an error object describing the problem.

    [Note] Note
    Unlike Java and other programming languages, PPL doesn't use an exception mechanism to signal resource errors to client code. Instead, it uses multiple output arguments, as explained above. The rationale for this is not covered here - it might be the subject of another article in the future.
  • out_check: not ( result #r null and error #r null ) // 'result' and 'error' cannot both be non-null

    This is an example of a post-condition (Design by Contract). It states that the command will never return with output arguments result and error both pointing to a non-null value. A program error is thrown if the command's implementation violates this condition at runtime.

  • // read lines from file and store result in constant 'lines'
    se_text_file_IO.restore_lines_from_text_file (
       file = i_file
       error_handler = i_error_handler ) \
       ( const lines = result
       const file_read_error = error )

    se_text_file_IO is a PPL service (similar to a class in Java with only static members). To read the file's text content, we use command restore_lines_from_text_file in this service. This command has two input arguments:

    • The file to be read - to which we assign input argument file of command find_lines_not_maching_regex_in_file. Note: The i_ prefix in i_file is optional and used here to explicitly state that we assign an input argument.

    • An error_handler used to handle any file error that might occur.

    It also has two output arguments:

    • result is a list of strings, each string representing a line in the text file. This value is stored into a local script constant named lines.

    • error points to an error object in case of a file-read problem. This output value is stored into a local script constant named file_read_error.

The rest of the script should be self-explanatory.
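For readers more familiar with Java, the combination of two nullable outputs and the post-condition 'result and error cannot both be non-null' could be approximated with a small value class. This is only a rough sketch and not how PPL works internally: the class LinesOutcome and its methods are hypothetical, and the error is reduced to a plain description string instead of a file_error object.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical Java analogue of a command with the two nullable outputs 'result' and 'error'.
final class LinesOutcome {
   private final List<String> result; // null if no lines were found or if an error occurred
   private final String error;        // null if no error occurred

   private LinesOutcome ( List<String> result, String error ) {
      // post-condition: 'result' and 'error' cannot both be non-null
      if ( result != null && error != null ) {
         throw new IllegalStateException ( "'result' and 'error' cannot both be non-null." );
      }
      this.result = result;
      this.error = error;
   }

   // Outcome 1: lines have been found (the list must not be empty).
   static LinesOutcome found ( List<String> lines ) {
      if ( lines == null || lines.isEmpty() ) {
         throw new IllegalArgumentException ( "'lines' must contain at least one element." );
      }
      return new LinesOutcome ( Collections.unmodifiableList ( new ArrayList<String> ( lines ) ), null );
   }

   // Outcome 2: no lines have been found; 'result' and 'error' are both null.
   static LinesOutcome nothingFound() {
      return new LinesOutcome ( null, null );
   }

   // Outcome 3: a runtime error occurred; 'result' is null.
   static LinesOutcome failed ( String errorDescription ) {
      if ( errorDescription == null ) {
         throw new IllegalArgumentException ( "'errorDescription' cannot be null." );
      }
      return new LinesOutcome ( null, errorDescription );
   }

   List<String> result() {
      return result;
   }

   String error() {
      return error;
   }
}
```

Unlike in PPL, nothing in Java forces the caller to look at error() before using result(); the class only guarantees that the three outcomes are never mixed up.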

In a previous article with the title Why We Should Love 'null', we saw that the reason for so many null pointer bugs in real world applications is the simple fact that we often forget to check for null. In our present example there are, again, many things we could forget to check. Therefore, let us now analyze what happens if we forget to explicitly handle any one of the six corner-cases we saw already in the previous section.

  1. We forget to check if input argument file is null:

    This can't happen, because in PPL all input arguments (and, more generally, all object references) are non-nullable by default.

  2. We forget to check if input argument line_regex is null:

    This can't happen for the same reason: line_regex is non-nullable by default.

  3. We forget to check that the file must exist:

    In this case se_text_file_IO.restore_lines_from_text_file will report an error and this error will be forwarded to the client code. This is not an ideal solution, because the error forwarded to the client depends on the supplier's implementation, which means the type of error could change when the implementation changes. But, more importantly, the condition 'file must exist' is not specified in the contract between the client and supplier, and is therefore also not visible in the API documentation. It is much better to specify check:file.exists for the input argument, as we did in the code above.

  4. We forget to check the error reported in case of a file read error:

    Suppose that, instead of writing ...

    se_text_file_IO.restore_lines_from_text_file (
       file = i_file
       error_handler = i_error_handler ) \
       ( const lines = result
       const file_read_error = error )

    ... we don't consider file errors and simply write:

    se_text_file_IO.restore_lines_from_text_file (
       file = i_file
       error_handler = i_error_handler ) \
       ( const lines = result )
    [Note] Note

    An alternative syntax (also ignoring the error output) would be:

    const lines = se_text_file_IO.restore_lines_from_text_file.result (
       file = i_file
       error_handler = i_error_handler )

    This can't happen because the PPL compiler emits a warning each time the error output of a command is ignored in client code. This is similar to checked exceptions in Java and other programming languages. Checked exceptions thrown in supplier code must be caught in client code, or else a compiler error occurs.

    One might wonder what happens if we don't forget to store the error reported by restore_lines_from_text_file into a local constant (i.e. we actually write const file_read_error = error), but then we forget to check if an error is reported in file_read_error, i.e. we don't code ...

    if file_read_error is not null then
     ...

    This bug can't happen either, because the compiler reports an error whenever a local constant or variable is declared, but never used in the script.

  5. We forget to check if the file is empty, i.e. we don't write:

    if lines is null then
     ...

    This can't happen, because the compiler would report an error in the instruction:

    repeat for each line in lines
    

    Explanation: The expression that defines the collection to be used in a repeat for each instruction (in our case: lines) must be of type non-nullable, or it must have been previously checked for non-null in the source code.

    In our case, constant lines is a nullable type (i.e. nullable list<nullable string>: it is either null or a list of strings that can be null). The reason is that there is no immutable empty list in idiomatic PPL. If the file is empty, restore_lines_from_text_file returns null as result (and output argument error is null too). So, the compiler knows that lines might be null, and therefore accepts it in the repeat for each instruction only if it has previously been checked to be non-null.

    [Note] Note
    Internally, the compiler uses static code analysis to make this kind of code verification.
  6. We forget to check for empty lines in the file:

    Again, this can't happen, because the compiler would report an error.

    Explanation:

    Look at:

    if not line.matches_regex ( line_regex ) then

    The expression line.matches_regex is only valid if line is a non-nullable type or if line has been checked to be non-null at runtime. This is one of the most important rules embedded in the PPL compiler - in the context of compile-time null-safety. It eliminates the risk for null pointer errors at run-time.

    In our case, line is a nullable type (i.e. nullable string). The reason is that there is no immutable empty string in idiomatic PPL. So, if the third line in the file is an empty line, then the third element in lines will be null, and the loop constant line will point to null at the third round of the repeat for each instruction. Therefore omitting to write ...

    if line is not null then

    ... results in an error reported by the compiler.

As we can see, only one corner-case out of six risks being forgotten by the programmer (see case 3 above: file must exist).

In all other corner-cases, the compiler assists us in writing more robust code, because forgetting to handle a special case results in a compiler error. A 'quick and dirty' version of the code wouldn't compile in PPL.
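For contrast, the Java compiler accepts the equivalent omissions and the problem only shows up when the code runs. A minimal sketch (the class and method names are hypothetical):

```java
import java.util.List;

final class MissingNullCheckSketch {

   // javac compiles this method as it is, although neither 'lines' nor 'line' is checked for null.
   static int countNonEmptyLines ( List<String> lines ) {
      int count = 0;
      for ( String line : lines ) {   // NullPointerException at runtime if 'lines' is null
         if ( ! line.isEmpty() ) {    // NullPointerException at runtime if 'line' is null
            count++;
         }
      }
      return count;
   }
}
```

Calling countNonEmptyLines ( null ), or passing a list that contains a null element, fails with a NullPointerException at runtime, whereas the corresponding PPL code is rejected at compile time.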

It is important to note that this compiler-assistance is possible only because of the design choices mentioned in the introduction of this article. In our case, it is a combination of the following rules that are relevant:

  • immutable collections cannot be empty (i.e. null is used to denote "no data")
  • immutable strings cannot be empty (i.e. null is used to denote "no data")
  • object references are non-nullable by default
  • null-safety is natively supported by the compiler
  • error objects returned by commands cannot be ignored in client code

Without these design choices, there would be more risks of forgetting to properly handle corner-cases.

Besides bugs in the command's implementation, the compiler is also able to detect some bugs in client code that calls the command. For example, the following bugs would lead to a compiler error message:

  • The command is called with input argument file set to null or set to an object of type nullable file that has not been checked to contain a non-null value at the moment of invoking the command.
  • An analogous check is done by the compiler for input argument line_regex
  • The error object returned by the command is ignored in the client code.

Command find_lines_not_maching_regex_in_file can be unit-tested with the following test script:

      test

         // case 1: lines found

         var params = '''start_line: 10
end_line:20
verbose:yes

illegal
missing_value:
:missing_name
'''
         se_text_file_IO.create_temporary_text_file (
            delete_file_on_exit = yes
            text = params ) (
            var test_file = result
            var file_error = error )
         verify file_error is null
         verify test_file is not null

         const regex = regex.create ( '''\w+:[ \t]*\w+''' )

         test ( file = test_file
            line_regex = regex )

         verify error is null
         verify result is not null
         verify result.to_long_string =v "[illegal, missing_value:, :missing_name]"

         // case 2: no lines found

         params = '''start_line: 10
end_line:20

verbose:yes

'''
         file_error = se_text_file_IO.store_string_to_existing_file (
            string = params
            file = test_file )
         verify file_error is null

         test ( file = test_file
            line_regex = regex )

         verify error is null
         verify result is null

         // case 3: empty file

         file_error = se_empty_file_utilities.empty_existing_file (
            file = test_file )
         verify file_error is null

         test ( file = test_file
            line_regex = regex )

         verify error is not null
         verify result is null

         // case 4: error (file doesn't exist)

         test_file = file.create ( file_path.create ( '''C:\sdhkdjhgkjdhgkdfgdkghsdfgdfghhdf.txt''' ) )
         test ( file = test_file
            line_regex = regex )
         verify_error
      .
   .

Conclusion

The last example in this article highlights one of the main reasons it is so hard to write robust and reliable software: corner cases.

It is easy to forget them and it requires a lot of discipline to handle them correctly.

They are a common cause (among others) of software project failures and of large time and budget overruns.

Unfortunately, corner cases appear often in all kinds of algorithms and domains.

Our 'quick and dirty' example code in the previous chapter was a Java method made up of only 7 statements. Nevertheless, there were no fewer than 6 corner cases that were ignored and led to incorrect behavior or unmaintainable software. This is an exceptionally high 'corner case per instruction' ratio. But just suppose that the average were 1 corner case per 10 instructions. Then a small application made up of several thousand instructions would contain hundreds of corner cases. Moreover, corner cases can interact with other corner cases, which leads to higher-level corner cases which can ... (you get the point). Even very experienced programmers with the best intentions to write good code will forget some corner cases. Some of them will not be covered by the software tests, and some of them will pop up in production.

What we need, therefore, is a programming environment that detects unhandled corner cases automatically, as far as this is technically feasible.

The good news is that, as this article has demonstrated, the design of the language can promote automatic detection of some corner cases. The risk of forgetting to handle them can be reduced considerably.

Compile-time null-safety is one very effective technique. It eliminates the null pointer error, the most frequent bug in many applications. Moreover, an object reference that points to null is pretty much always a corner case, and null-safety eliminates the risk of forgetting to explicitly handle that corner case.

Quite often, empty lists and empty strings represent a good number of corner cases. By not using empty lists and empty strings, but instead using null as suggested, the whole set of these corner cases is also covered by null-safety.

There are more techniques to discover other kinds of corner cases at compile-time. For example, the risk of division by zero could also be reported by the compiler and we could be forced to check for the denominator of a division to be non-zero, before being allowed to execute the division. Note: This is one item on PPL's to-do list.
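In Java, an integer division by zero compiles without complaint and throws an ArithmeticException at runtime. One way to make the zero case visible in a method's signature is to return an OptionalInt, as in the following sketch. The class and method names are hypothetical, and this is only a library-level convention, not the compile-time check described above.

```java
import java.util.OptionalInt;

final class SafeDivisionSketch {

   // Returns an empty OptionalInt instead of throwing if the denominator is zero.
   static OptionalInt divide ( int numerator, int denominator ) {
      if ( denominator == 0 ) {
         return OptionalInt.empty();
      }
      return OptionalInt.of ( numerator / denominator );
   }

   public static void main ( String[] args ) {
      System.out.println ( divide ( 6, 3 ).getAsInt() );  // prints 2
      System.out.println ( divide ( 1, 0 ).isPresent() ); // prints false
   }
}
```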

Anyway, the final conclusion is the answer to our second question asked in the introduction:

Yes, the design of a programming language and its standard libraries can definitely help us to write more reliable and more maintainable code in less time.

A Personal Experience

One year ago PPL was different from today.

It didn't have null-safety. All objects were nullable. Collections and strings could be empty. I simply applied what I was used to after more than two decades of programming, and I didn't question the rationale.

However, after some sessions of pondering, I applied the design choices listed in the introduction. Then I was eager to see how these changes would affect my existing code that needed to be refactored in order to bow to the new rules.

Would my code become better?

The answer is a clear YES.

I was often happily surprised to see how I was forced to fix lousy code. The compiler no longer allowed me to write 'quick and dirty' code, such as ignoring potential runtime errors, not checking for null or not handling corner cases. Some bugs that lurked in the source code but hadn't yet popped up at runtime were now automatically detected by the compiler and I had to fix them. No doubt - the quality of my code increased. I wouldn't want to go back anymore.

These observations are of course not representative. Maybe they are just the author's biased feelings. Never mind! My convictions are very strong and therefore I wanted to share and explain them in this article.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
