Part IV: Source Code Examples
Introduction
Welcome to the last installment of this article series.
In part 1, we had a look at the very popular advice to return empty collections, and not null, from functions (in case of "no data" to return).
In part 2, we saw that this recommendation must be reconsidered. We looked at examples showing that the opposite is true: It is better to return null instead of an empty collection.
In part 3, we looked at how lists are used in the physical world and our conclusion from part 2 was confirmed.
In this installment, we'll have a thorough look at source code examples that cover very common programming tasks, such as validating input and handling resource errors.
We will start with a trivial example and add some complexity in each subsequent step.
Each example will be shown in two different programming languages and we will compare different approaches.
Most importantly, we will try to find answers to the following questions:
- What happens if we write 'bad code' - in both languages and with different approaches?
Note: By 'bad code', I mean things like not checking for null or emptiness, not validating input arguments in a function, ignoring resource errors that might appear at runtime, etc.
- Can the design of a programming language and its standard libraries help us to write more reliable and more maintainable code in less time?
So, this article is not just about "How to ...", it is also about "Why to ...".
Same Same, But Not Same
Thai people are jocular. When they see two similar objects, you won't always hear them saying "These objects are similar". No, they say: "Same same, but not same".
The same could be said about the two programming languages we are going to use in this article: Java and PPL.
If you have never heard of PPL, you are not alone. While Java is one of the most successful languages in the history of programming, supported by a huge developer community and mature enough for developing big mission-critical enterprise applications, exactly the opposite is true for PPL.
After many years of writing bad code (such as monstrous functions containing lots of variables and deeply nested instructions; ignoring errors returned by functions; forgetting to check for null; etc.) and suffering from its consequences, I wondered whether a programming language existed that could help me write better code. I couldn't find what I was really looking for, but I discovered many good ideas and was sometimes speechless to see that some very effective and proven Fail-Fast! features (for example Design by Contract) were not supported in the majority of popular programming languages. Finally, I started thinking about a new language and began to create it. I named it Practical Programming Language (PPL) to emphasize that it should be useful and suitable for developing real-world applications. The vision and ultimate goal is to provide a programming environment that helps to write more reliable code in less time. I had many ideas to achieve this goal. Some were good. Some were bad. I kept the features that worked well - and discarded those that didn't.
Today, I am absolutely convinced that the design of a programming language is a determining factor in how easy or difficult it is for the average programmer to write high quality code in a reasonable time. This doesn't mean of course that a programming language can completely prevent programmers from designing bad data structures or writing buggy and unmaintainable code. The point is this:
Some bugs and some bad programming practices can effectively be prevented through the design choices and features provided by the language.
For example, the very frequent bug of forgetting to check for null (leading to a null pointer error) can be completely eliminated with compile-time null-safety natively built into the language. The source code examples in this article (especially the last example) should demonstrate what I mean by 'helping to write more reliable code in less time'.
Note: PPL is still a work in progress and not yet ready for developing mission-critical business applications.
For more information please refer to the website and read the FAQ.
Many fundamental concepts are very similar in Java and PPL. For example, both languages are object-oriented, compiled and statically typed, and applications run on a Java Virtual Machine (JVM). However, a number of diametrically opposed design principles are applied in the two languages. The following table is by no means exhaustive. It shows some differences (and one similarity) - only those that are relevant to this article and that we will encounter in the source code examples.
Table 1. Comparison of some design choices in Java and PPL
| | Java | PPL |
|---|---|---|
| Collections | Collections are mutable by default; all collections can be empty | Collections are immutable by default; only mutable collections can be empty |
| Strings | Strings are immutable by default; any string can be empty | Strings are immutable by default; only mutable strings can be empty |
| Functions that return collections | Some functions return empty collections, others return null to denote "no data" (depends on the author) | All functions consistently return null to denote "no data" |
| Null handling | No distinction between nullable and non-nullable types; all reference types are nullable; null can be assigned to any object reference | Clear distinction between nullable and non-nullable types; types are non-nullable by default; null can only be assigned to nullable object references |
| Compile-time null-safety | Not natively supported, but some third party tools provide limited support for null-handling | Natively supported |
| Optional pattern | Supported since Java version 8 | Not supported (uses null and compile-time null-safety) |
| Design by Contract | Not natively supported, but third party extensions exist | Natively supported |
| Unit testing | Not natively supported, but third party extensions exist | Natively supported |
| Error handling | Uses exception mechanism for program errors (e.g. stack overflow) and runtime errors (e.g. file read error) | Uses exception-like mechanism for program errors (e.g. stack overflow); commands return an 'error' object (besides a nullable 'result' object) in case of runtime errors (e.g. file read error) |
| Type inference | Limited (diamond operator since Java 7; var for local variables since Java 10) | Supported for local script constants and variables |
It is interesting to note that (as far as I know) many (if not most) popular programming languages tend to apply the design principles in column 'Java'. Some design choices in PPL might seem strange (for example, there are no empty immutable strings in PPL). But these choices have all been made consciously and deliberately, because they all support the important Fail Fast! principle:
Errors should preferably be automatically detected at compile-time, or else as early as possible at run-time.
We will see how this is achieved.
Example 1: Return a Non-empty List
To get our feet wet, we start with a trivial example of a function that returns a non-empty string list of continent names.
Note: This is a long article (maybe too long). If you are an experienced programmer in a hurry, then you might want to skip the first two examples and jump to example 3.
Java version
Here is the code in Java:
public static List<String> getContinents() {
List<String> result = new ArrayList<String>();
result.add ( "Africa" );
result.add ( "America" );
result.add ( "Antarctica" );
result.add ( "Asia" );
result.add ( "Australia" );
result.add ( "Europe" );
return Collections.unmodifiableList ( result );
}
The above example shows the three basic steps often involved in functions that return a collection:
- An empty, mutable collection is created.
- Elements are added to the mutable collection.
- The mutable collection is converted into an immutable one and returned as the result of the function.
Note: Some readers might wonder why we cannot simply write ...
return result;
... instead of:
return Collections.unmodifiableList ( result );
The reason is that functions should always return immutable collections, unless there is a specific need for mutability. Readers not familiar with the benefits of immutable objects are encouraged to search the net for terms like 'advantages of immutable data structures'. There are some excellent articles available. In short: Immutable data have a simpler API, are less error-prone (not only in multi-processing environments), have no state transitions to handle and can be freely shared without the need for synchronization.
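Java doesn't have list literals, but the above code can be simplified by using the convenient Arrays.asList method, which returns a fixed-size list. Note that this list is not truly immutable: elements cannot be added or removed, but existing elements can still be replaced with set. Wrap the list in Collections.unmodifiableList (or use List.of in Java 9 and later) if full immutability is required.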
public static List<String> getContinents() {
return Arrays.asList ( "Africa", "America", "Antarctica", "Asia", "Australia", "Europe" );
}
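A detail worth checking when relying on Arrays.asList: the returned list has a fixed size, but its elements can still be replaced. The following sketch compares it with a list wrapped by Collections.unmodifiableList. The class name FixedSizeVsUnmodifiable and its two helper methods are hypothetical names chosen for this illustration; only Arrays.asList and Collections.unmodifiableList come from the standard library.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class FixedSizeVsUnmodifiable {

    // Returns true if replacing the first element succeeds, false if the list rejects it.
    static boolean canReplaceFirstElement ( List<String> list ) {
        try {
            list.set ( 0, "Atlantis" );
            return true;
        } catch ( UnsupportedOperationException e ) {
            return false;
        }
    }

    // Returns true if appending an element succeeds, false if the list rejects it.
    static boolean canAppend ( List<String> list ) {
        try {
            list.add ( "Atlantis" );
            return true;
        } catch ( UnsupportedOperationException e ) {
            return false;
        }
    }

    public static void main ( String[] args ) {
        List<String> fixedSize = Arrays.asList ( "Africa", "America" );
        List<String> readOnly = Collections.unmodifiableList ( Arrays.asList ( "Africa", "America" ) );

        System.out.println ( canAppend ( fixedSize ) );              // false
        System.out.println ( canReplaceFirstElement ( fixedSize ) ); // true
        System.out.println ( canAppend ( readOnly ) );               // false
        System.out.println ( canReplaceFirstElement ( readOnly ) );  // false
    }
}
```

If callers must not be able to alter the list in any way, the wrapped variant is the safer choice.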
In practice, the above implementation could lead to a serious performance bottleneck, because a new list is created each time the method is called. Creating these unnecessary duplicates takes time, as does reclaiming them later in the garbage collector, and storing them takes memory. To avoid this, we can use data caching. The basic idea is this: Create the list the first time the function is called, keep the list in a private static field, and then simply return the same list in subsequent calls. Here is an example:
private static List<String> continents = null;
public static List<String> getContinents() {
if ( continents == null ) {
continents = Arrays.asList ( "Africa", "America", "Antarctica", "Asia", "Australia", "Europe" );
}
return continents;
}
The first time getContinents() is called, the private field continents is null. The list is then created and assigned to the private field. Subsequent calls just return the list that has been created in the first call.
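One caveat of a null check on a plain static field is that it is not synchronized: two threads calling the method at the same time may each create their own list. Where that matters, a commonly used alternative is the initialization-on-demand holder idiom, sketched below. The class names Continents and Holder are hypothetical; the sketch also wraps the list in Collections.unmodifiableList so that callers cannot replace elements.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class Continents {

    // The JVM initializes this nested class only when it is first accessed,
    // and class initialization is guaranteed to be thread-safe.
    private static final class Holder {
        static final List<String> CONTINENTS = Collections.unmodifiableList (
            Arrays.asList ( "Africa", "America", "Antarctica", "Asia", "Australia", "Europe" ) );
    }

    private Continents() {}

    // The first call triggers initialization of Holder; later calls reuse the same list.
    static List<String> getContinents() {
        return Holder.CONTINENTS;
    }
}
```

The list is still created lazily, but without an explicit null check and without the risk of creating it twice.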
Note: An even simpler solution would be to define a public static field that holds the list, as follows:
public static List<String> continents =
Arrays.asList ( "Africa", "America", "Antarctica", "Asia", "Australia", "Europe" );
But this solution has a drawback: It doesn't use lazy initialization. Even if the list is never used by the application, it is still created when the class containing the field is initialized. This consumes time and memory, and can therefore lead to high memory requirements and slow startup time if this pattern is used often.
PPL version
Here is the equivalent of the Java method getContinents()
, written in PPL:
command get_continents
out list<string> result
script
const r = mutable_list<string>.create
r.append ( "Africa" )
r.append ( "America" )
r.append ( "Antarctica" )
r.append ( "Asia" )
r.append ( "Australia" )
r.append ( "Europe" )
result = r.make_immutable
.
.
As most readers are not familiar with PPL, let's delve a bit into its syntax.
- command get_continents
A command in PPL is like a method in Java. It has a name (get_continents in our case), zero or more input arguments, zero or more output arguments, and performs an operation.
Note that PPL doesn't use curly brackets (braces) to embed a block of code. In PPL, code embedded within a block is indented, and the block is terminated by a dot (.) on a single line. Hence, code like this in a language that uses braces ...
foo {
    // body
}
... is written like this in PPL:
foo
    // body
.
Note also that, for better readability, PPL uses an underscore (_) to separate words in identifiers. PPL doesn't use camel-case. So, if you are used to ...
thisIsALongIdentifier
... you will see this in PPL:
this_is_a_long_identifier
- out list<string> result
This instruction defines the command's output. PPL provides multiple output arguments - the counterpart of multiple input arguments. Therefore, each output argument also has a name. In our case, we have a single output argument of type list<string>, named result.
Note: Multiple instructions can be written on a single line, separated by the instruction separator symbol |. Hence, we could also write:
command get_continents | out list<string> result
- script
script is a container for the command's implementation (i.e. the instructions that perform the operation).
- const r = mutable_list<string>.create
This instruction declares a local constant named r, initialized to an empty mutable list of strings.
- r.append ( "Africa" )
The string "Africa" is appended to r.
- result = r.make_immutable
This instruction converts the mutable list contained in r into an immutable one which is then assigned to output argument result.
PPL provides list literals, and the script instruction is optional. The code can be shortened as follows:
command get_continents
out list<string> result
result = ["Africa", "America", "Antarctica", "Asia", "Australia", "Europe" ]
.
We could now improve the code and use caching and lazy initialization, as we did with the Java version in the previous section. But this is not necessary. PPL implicitly uses lazy initialization for static constant fields (unless we explicitly disable lazy initialization in the source code). We can simply define a constant static field like this:
const list<string> continents default:["Africa", "America", "Antarctica", "Asia", "Australia", "Europe" ]
If continents is never used in the application, it will never be initialized, so no time or memory is wasted. The list will be created the first time continents is actually accessed in the application, and subsequent accesses will re-use the same list.
Example 2: There Might Be No Data To Return
While the previous example showed a function that always returns a non-empty list, we will now look at a function that might return with "no data".
The function in this example takes a nullable string as input and returns a list of characters containing all digits found in the string. If the input string is null or doesn't contain digits, the function returns null.
Java version
The following code shows an implementation in Java:
public static List<Character> getDigitsInString ( String string ) {
if ( string == null ) return null;
List<Character> result = new ArrayList<Character>();
for ( char ch : string.toCharArray() ) {
if ( Character.isDigit ( ch ) ) {
result.add ( ch );
}
}
if ( result.isEmpty() ) {
return null;
} else {
return Collections.unmodifiableList ( result );
}
}
The interesting part of this code is of course the if statement at the end. If no digits are found in the input string (i.e. result.isEmpty() is true) then the function returns null, otherwise an immutable list containing the digits is returned.
If this pattern is used often, we can write a utility such as the following one:
public static <T> List<T> toUnmodifiableListOrNull ( List<T> list ) {
if ( list == null || list.isEmpty() ) {
return null;
} else {
return Collections.unmodifiableList ( list );
}
}
The last if in getDigitsInString can then be replaced by simply writing:
return toUnmodifiableListOrNull ( result );
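To see how such a utility behaves from the caller's side, here is a minimal sketch. It repeats the utility so that the block compiles on its own; the class name ListUtilDemo and the method describe are hypothetical and only serve to show a caller that handles the "no data" case explicitly.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ListUtilDemo {

    // Same utility as shown above: null or empty input yields null.
    static <T> List<T> toUnmodifiableListOrNull ( List<T> list ) {
        if ( list == null || list.isEmpty() ) {
            return null;
        } else {
            return Collections.unmodifiableList ( list );
        }
    }

    // Hypothetical caller: a single null check covers the "no data" case.
    static String describe ( List<Character> digits ) {
        if ( digits == null ) {
            return "no digits";
        }
        return digits.size() + " digit(s): " + digits;
    }

    public static void main ( String[] args ) {
        List<Character> found = new ArrayList<>();
        System.out.println ( describe ( toUnmodifiableListOrNull ( found ) ) ); // no digits
        found.add ( '4' );
        found.add ( '2' );
        System.out.println ( describe ( toUnmodifiableListOrNull ( found ) ) ); // 2 digit(s): [4, 2]
    }
}
```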
Note: An alternative solution would be of course to return an empty list if there are no digits found. But we will not consider this solution here because of the disadvantages of returning empty collections in case of "no data", as explained in part 2 of this article series.
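To test our function, we could use a third-party test framework (like JUnit). But to keep things simple, we will just use standard Java functionality and write a small test method like this (note that Java's assert statements are only evaluated when assertions are enabled with the -ea option of the java command):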
public static void test() {
List<Character> digits = getDigitsInString ( "asd123" );
assert digits.size() == 3;
assert digits.get(0) == '1';
assert digits.get(digits.size()-1) == '3';
assert digits.toString().equals ( "[1, 2, 3]" );
digits = getDigitsInString ( "asd" );
assert digits == null;
digits = getDigitsInString ( null );
assert digits == null;
}
If you want to try out the whole code without using an IDE, you can proceed as follows (after ensuring that Java is installed on your system):
-
Create file ListExample_02.java in any directory, with the following content:
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
public class ListExample_02 {
public static List<Character> getDigitsInString ( String string ) {
if ( string == null ) return null;
List<Character> result = new ArrayList<Character>();
for ( char ch : string.toCharArray() ) {
if ( Character.isDigit ( ch ) ) {
result.add ( ch );
}
}
if ( result.isEmpty() ) {
return null;
} else {
return Collections.unmodifiableList ( result );
}
}
public static void test() {
System.out.println ( "ListExample_02:" );
List<Character> digits = getDigitsInString ( "asd123" );
assert digits.size() == 3;
assert digits.get(0) == '1';
assert digits.get(digits.size()-1) == '3';
assert digits.toString().equals ( "[1, 2, 3]" );
System.out.println ( digits );
digits = getDigitsInString ( "asd" );
assert digits == null;
System.out.println ( digits );
digits = getDigitsInString ( null );
assert digits == null;
System.out.println ( digits );
}
public static void main ( String[] arguments ) {
test();
}
}
-
Compile the file by typing the following operating system command in a terminal window:
javac ListExample_02.java
-
Execute the program with assertions enabled (without the -ea option, the assert statements are silently skipped) by typing:
java -ea ListExample_02
-
The following output will be displayed:
ListExample_02:
[1, 2, 3]
null
null
Note: You can download the source code for all examples in a .zip file (see link at the beginning of the article).
PPL version
In PPL, the code looks like this:
command get_digits_in_string
in nullable string string
out nullable list<character> result
script
if string is null
result = null
return
.
const r = mutable_list<character>.create
repeat for each character in string
if character.is_digit then
r.append ( character )
.
.
result = r.make_immutable_or_null
.
.
Here are some points to consider:
- in nullable string string
This instruction defines an input argument named string, of type nullable string. In PPL all object references are non-nullable by default. Therefore we have to use the keyword nullable to clearly state that null is a valid input value.
- out nullable list<character> result
The command returns a nullable list of characters.
- if string is null
result = null
return
.
If the command is called with null as input, it returns null as result.
- repeat for each character in string
if character.is_digit then
r.append ( character )
.
.
This is the loop that checks each character in the input string. If the character is a digit, then it is appended to the mutable list in r.
- result = r.make_immutable_or_null
If the list in r is empty, then the command returns null as result, else it returns an immutable list.
Unit testing is natively supported in PPL. We can write a test script like this:
test
// case 1: there are digits
test "asd123"
verify result is not null
verify result.size =v 3
verify result.first =v '1'
verify result.last =v '3'
verify result.to_long_string =v "[1, 2, 3]"
// case 2: there are no digits
test "asd"
verify result is null
// case 3: input is null
test null
verify result is null
.
Some explanations:
- test
Instructions within a test block are executed to unit-test command get_digits_in_string.
- test "asd123"
Each test instruction launches a test case. In this case, command get_digits_in_string is called with "asd123" as input.
- verify result is not null
The verify instruction is used to check the result returned by get_digits_in_string. In this case we ensure that the result is non-null.
- verify result.size =v 3
There must be 3 elements in the list returned. Note the comparison operator =v. The v stands for value comparison (i.e. the values of the objects are compared, not their references). If we wanted to compare references, we would have to use the reference comparison operator =r.
- verify result.first =v '1'
The first element in the result list must be the character 1.
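For readers coming from Java, the distinction between value comparison and reference comparison roughly corresponds to equals versus == on object references. The following sketch illustrates the Java side only; the class and method names are hypothetical, and it makes no claim about how PPL implements its operators.

```java
final class ValueVsReference {

    // Value comparison: true if both strings contain the same characters.
    static boolean sameValue ( String a, String b ) {
        return a.equals ( b );
    }

    // Reference comparison: true only if both variables point to the same object.
    static boolean sameReference ( String a, String b ) {
        return a == b;
    }

    public static void main ( String[] args ) {
        String first = new String ( "123" );
        String second = new String ( "123" ); // a distinct object with equal content

        System.out.println ( sameValue ( first, second ) );     // true
        System.out.println ( sameReference ( first, second ) ); // false
        System.out.println ( sameReference ( first, first ) );  // true
    }
}
```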
If you want to try out the whole code, you can proceed as follows (after installing PPL):
-
Open a terminal window in any directory.
-
Create a PPL project named list_examples by entering the following command:
ppl create_project project_id=list_examples
PPL creates a sub-directory named list_examples that contains a number of sub-directories and files.
-
Create directory examples under the existing sub-directory work/ppl/source_code/list_examples/.
In this new directory, create file se_list_example_02.ppl with the following content:
service list_example_02
command get_digits_in_string
in nullable string string
out nullable list<character> result
script
if string is null
result = null
return
.
const r = mutable_list<character>.create
repeat for each character in string
if character.is_digit then
r.append ( character )
.
.
result = r.make_immutable_or_null
.
test
// case 1: there are digits
test "asd123"
verify result is not null
verify result.size =v 3
verify result.first =v '1'
verify result.last =v '3'
verify result.to_long_string =v "[1, 2, 3]"
%write_line ( result.to_long_string )
// case 2: there are no digits
test "asd"
verify result is null
%write_line ( se_any_type_to_string.to_long_string ( result ) )
// case 3: input is null
test null
verify result is null
%write_line ( se_any_type_to_string.to_long_string ( result ) )
.
.
.
-
Compile and build your project by executing the compile_and_build system file which is located in your project's root directory (i.e. compile_and_build.sh on Linux/Unix and compile_and_build.bat on Windows).
-
To execute the unit test, run the system command file run_tests located in the PPL project's root directory (i.e. run_tests.sh on a Linux/Unix system, and run_tests.bat on a Windows system).
A message like the following will be displayed:
Running unit tests in list_examples
testing list_examples.examples.se_list_example_02
[1, 2, 3]
#null#
#null#
Objects tested: 1
BRAVO AND CONGRATULATIONS!
All tests passed without errors.
Note: To get more information about creating PPL projects, please refer to chapter Developing a standalone PC application in the tutorial.
You can download the PPL project containing all examples in a .zip file (see link at the beginning of the article).
Example 3: Invalid Input Data
Checking input against invalid values is one of the most effective ways to apply defensive programming. It is an absolute requirement for robust, high quality software. In this context, input can mean any kind of data transferred from a source to a destination, such as user input, data read from external sources, input sent to a function, etc. It is important to check the validity immediately at the instant of getting the input. In case of a function, this means that checking the input should be the first thing done in the function.
In practice, there are two common ways to protect functions/methods/commands against being called with invalid input:
-
Exception mechanism: The first statements in the function check the input values and an exception is thrown if an invalid value is found.
This is the way used in Java and many other popular programming languages.
-
Design by Contract (also called Contract Programming): For each input argument, a precondition can be defined. In its simplest form, a precondition is a boolean expression that must evaluate to true for the input value to be valid. If a precondition is not fulfilled, then a program error (exception) is thrown.
At first, both approaches appear to be very similar, because in both cases an exception is thrown if an input value is invalid. There is, however, an important difference that leads to Design by Contract clearly being the better technique.
In the first case (exception mechanism), the input arguments' conditions are defined in the object's implementation code, and not in the code that defines the object's interface. So, in Java (for example) if a string input must not exceed 256 characters, then this condition is tested at the class level, and not at the interface level.
On the other hand, preconditions are part of the object's interface (type), and not of the implementation (factory).
This leads to considerable advantages:
-
The preconditions are part of the official contract between the caller and the callee. They appear in the API documentation. This means that a programmer looking at an API can immediately see the requirements for valid input arguments. For example, we wouldn't just see ...
String name
... for an input argument, but ...
String name check: name.size <= 256
... and reliably be told that a program error occurs if the function is called with a name exceeding 256 characters.
-
Preconditions are automatically inherited in child types and they can be made weaker (but not stronger) if appropriate. This is an application of the important Liskov Substitution Principle that states:
If S is a subtype of T, then objects of type T may be replaced with objects of type S without altering any of the desirable properties of the program.
-
Preconditions are automatically enforced in all implementations of a given type. So, if there are three factories/classes implementing the same type/interface, there is no risk of forgetting to code the precondition in one of the factories or to accidentally define different preconditions in the factories. Moreover, there is no risk of code duplication.
-
Individual preconditions can be evaluated programmatically at runtime. So, client code can programmatically check input data before calling a function with preconditions.
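Java has no native preconditions, but some of these advantages can be approximated with an interface that checks the condition once in a default method and delegates the actual work to the implementation. The sketch below is only an approximation under that assumption: all names (NameRegistry, register, doRegister, UpperCaseRegistry) are hypothetical, an implementation could still override register, and the condition does not become part of the method signature as it would with real Design by Contract.

```java
interface NameRegistry {

    int MAX_NAME_LENGTH = 256;

    // The precondition as a reusable predicate, so that callers can test input up front.
    static boolean isValidName ( String name ) {
        return name != null && name.length() <= MAX_NAME_LENGTH;
    }

    // Public entry point: checks the precondition once, for every implementation.
    default String register ( String name ) {
        if ( ! isValidName ( name ) ) {
            throw new IllegalArgumentException (
                "'name' must be non-null and at most " + MAX_NAME_LENGTH + " characters long" );
        }
        return doRegister ( name );
    }

    // Implementations provide only the actual work.
    String doRegister ( String name );
}

// One possible implementation; it contains no validation code of its own.
final class UpperCaseRegistry implements NameRegistry {
    @Override
    public String doRegister ( String name ) {
        return name.toUpperCase();
    }
}
```

Client code can call NameRegistry.isValidName before calling register, which mirrors the last advantage in the list above.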
To see both approaches in action, we will write a function that creates a list of a given size, containing the same word in each position. The function has two input arguments:
-
The number of elements in the resulting list.
This number must be a positive integer greater than zero. This input should be optional. If no value is specified by the caller, then the value defaults to 1.
-
The word used to fill the list.
This is a string that cannot be null and must match the regular expression "\w+" (i.e. a string composed of one or more letters, digits and underscores).
The function's result is a list of strings.
Java version
Here is a possible implementation in Java:
public static List<String> createFilledWordList ( int num_elements, String word ) {
if ( num_elements <= 0 )
throw new IllegalArgumentException ( "'num_elements' must be greater than 0" );
if ( word == null )
throw new IllegalArgumentException ( "'word' cannot be null" );
if ( ! word.matches ( "\\w+" ) )
throw new IllegalArgumentException ( "'" + word + "' is not a word." );
List<String> result = new ArrayList<String>();
for ( int i = 0; i < num_elements; i++ ) {
result.add ( word );
}
return Collections.unmodifiableList ( result );
}
In Java, method overloading (not to be confused with method overriding) enables us to define another method with the same name, as long as the parameter lists differ in the number or types of their arguments. This can be used to simulate default input argument values. Besides the above method, we have to add the following method that can be called to create a list with one element:
public static List<String> createFilledWordList ( String word ) {
return createFilledWordList ( 1, word );
}
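As a side note, the loop in the first method could also be replaced by the standard library method Collections.nCopies, which returns an immutable list containing the same element a given number of times. The following is a sketch of that variant, not a replacement for the version used in this article; the class name FilledWordList is hypothetical.

```java
import java.util.Collections;
import java.util.List;

final class FilledWordList {

    // Same validation as in the version above; the loop is replaced by nCopies,
    // which returns an immutable list containing 'word' num_elements times.
    static List<String> createFilledWordList ( int num_elements, String word ) {
        if ( num_elements <= 0 )
            throw new IllegalArgumentException ( "'num_elements' must be greater than 0" );
        if ( word == null )
            throw new IllegalArgumentException ( "'word' cannot be null" );
        if ( ! word.matches ( "\\w+" ) )
            throw new IllegalArgumentException ( "'" + word + "' is not a word." );

        return Collections.nCopies ( num_elements, word );
    }

    // Overload that simulates the default value 1 for num_elements.
    static List<String> createFilledWordList ( String word ) {
        return createFilledWordList ( 1, word );
    }
}
```

The three input checks remain necessary in this variant; only the list construction becomes shorter.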
Here is a method that illustrates a unit test for createFilledWordList:
public static void test() {
List<String> foos = createFilledWordList ( 3, "foo" );
assert foos.toString().equals ( "[foo, foo, foo]" );
foos = createFilledWordList ( "foo" );
assert foos.toString().equals ( "[foo]" );
boolean exceptionRaised = false;
try {
foos = createFilledWordList ( "foo bar" );
} catch ( Exception e ) {
exceptionRaised = true;
}
assert exceptionRaised;
}
PPL version
command create_filled_word_list
in positive_32 num_elements default:1
in string word check:word.matches_regex ( regex.create ( '''\w+''' ) )
out list<string> result
script
const r = mutable_list<string>.create
repeat num_elements times
r.append ( word )
.
result = r.make_immutable
.
.
Notes to the above code:
- in positive_32 num_elements default:1
Here, we declare input argument num_elements of type positive_32 with a default value of 1.
Besides signed integers, PPL also provides unsigned integers. positive_32 denotes a 32-bit positive integer, excluding zero (there is also type zero_positive_32 which includes zero as a value). In our case, using type positive_32 has two advantages:
  - We don't need to explicitly check if the input value is greater than zero.
  - The risk of calling create_filled_word_list with num_elements set to zero or a negative value (and thus causing a runtime error) is eliminated, because assigning a signed integer to an unsigned one (without explicit conversion) is a type compatibility violation which is reported as an error by the compiler.
The default clause is used to specify an expression that represents the default value for input num_elements. Calling create_filled_word_list without specifying a value for num_elements is the same as calling the command with the value 1 assigned to num_elements.
It is not necessary to explicitly check num_elements for null, because, as already mentioned, null is not allowed by default.
- in string word check:word.matches_regex ( regex.create ( '''\w+''' ) )
The second input argument is a (non-nullable) string named word, whose value must match the regular expression \w+.
This instruction demonstrates the use of a precondition (Design by Contract). We use the check clause to define a boolean expression that must be fulfilled before the command is executed. If create_filled_word_list is called with input word not containing a word, then a program error is thrown immediately.
Note the regular expression embedded between '''. This is an example of a triple-apostrophed string literal which can contain new lines and characters that would need to be escaped in a standard string literal. Hence '''\w+''' is equivalent to "\\w+".
- out list<string> result
The output is a non-nullable list of non-nullable strings.
- The command's script should be self-explanatory. We create a mutable list of size num_elements, containing word in each position. At the end, an immutable list is returned.
A test script could be written like this:
test
// case 1: standard case
test ( num_elements = 3
word = "foo" )
verify result.to_long_string =v "[foo, foo, foo]"
// case 2: use default value for 'num_elements'
test ( word = "foo" )
verify result.to_long_string =v "[foo]"
// case 3: illegal input value
test ( word = "foo bar" )
verify_error // check that a program error is thrown
.
Example 4: Resource Errors
Now comes the most interesting and revealing example. Besides the points already covered in the previous examples, we will add another common difficulty: a system error that might occur at runtime and that has to be anticipated in the source code. Typical examples of these problems are: file read error, database not available, network connection error, etc.
Most importantly, we will look at what happens if we write 'bad code', and we will compare different approaches and analyze how they affect software reliability and maintainability.
The function we are going to write reads a text file and returns lines that don't match a regular expression. The function has two input arguments:
- the file to be read
- the regular expression used to check each line
The function returns a list of strings representing the lines that don't match the regular expression.
A practical use case of this function would be to check the validity of a file.
Suppose, for example, a configuration file that contains parameters composed of key/value pairs such as these:
start_line: 10
end_line:20
verbose:yes
The following regular expression could then be used to find invalid entries in such a config file:
\w+:[ \t]*\w+
This regex translates to: The line must start with one or more word characters (letter or digit or underscore), followed by a colon (:), followed by optional white space, followed by one or more word characters.
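The regular expression can be tried out in isolation with a few lines of Java. This is a small sketch, with the hypothetical names ConfigLineRegexDemo and isValidLine, that checks the three sample parameters shown above plus two invalid lines.

```java
import java.util.regex.Pattern;

final class ConfigLineRegexDemo {

    // In a Java string literal, each backslash of the regex must be doubled.
    static final Pattern CONFIG_LINE = Pattern.compile ( "\\w+:[ \\t]*\\w+" );

    // matches() requires the whole line to match, not just a part of it.
    static boolean isValidLine ( String line ) {
        return CONFIG_LINE.matcher ( line ).matches();
    }

    public static void main ( String[] args ) {
        System.out.println ( isValidLine ( "start_line: 10" ) ); // true
        System.out.println ( isValidLine ( "end_line:20" ) );    // true
        System.out.println ( isValidLine ( "verbose:yes" ) );    // true
        System.out.println ( isValidLine ( "illegal" ) );        // false
        System.out.println ( isValidLine ( "" ) );               // false
    }
}
```

Note that an empty line does not match, so a function built on this regex would report empty lines as invalid.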
Java version
Here is a source code suggestion, written in Java:
public static List<String> findLinesNotMachingRegexInFile ( File file, Pattern lineRegex ) throws IOException {
List<String> lines = Files.readAllLines ( file.toPath() );
List<String> result = new ArrayList<>();
for ( String line : lines ) {
if ( ! lineRegex.matcher ( line ).matches() ) {
result.add ( line );
}
}
return result;
}
Is this code ok?
Suppose we have a file with the following content:
start_line: 10
illegal
verbose:yes
If we call findLinesNotMachingRegexInFile with the regex \w+:[ \t]*\w+, then the function will correctly return a list containing the invalid line illegal.
So, the code is ok, isn't it?
No, it isn't.
Any experienced programmer would point out immediately that the above code is just a 'quick and dirty' version of the code we should write. Yes, the above code is what most of us would like to write ("Please let me write simple code and don't bother me with exceptional cases that are unlikely to happen!"). It is a sad fact that this is indeed the kind of source code we often see in practice (and yes, I have myself often written code like this, and I am not proud of it).
The problem with this code is that the following six corner-cases are not handled explicitly:
- The function is called with input argument file set to null.
- The function is called with input argument lineRegex set to null.
- The file doesn't exist.
- The file can't be read (e.g. user has no access right).
- The file is empty.
- The file contains empty lines.
|
Note |
In an ideal world, our initial specification of the function would have mentioned all these corner-cases and told us exactly what to do in each case. But we don't live in an ideal world, and real-world specifications often don't mention corner cases. That is no valid excuse to ignore them in our source code too.
|
|
Note |
Another corner case would be to call the function for a binary file that doesn't contain text. However, we will not discuss this case because there is unfortunately no easy and reliable way to know if a file contains text or not.
For more information, look at the Stackoverflow question Determining binary/text file type in Java? |
Another point is that the above method returns an empty list in case of no lines found, although it is better to return null, as we saw in part 2.
It is not difficult to address all corner-cases. Here is an improved version:
public static List<String> findLinesNotMachingRegexInFile ( File file, Pattern lineRegex )
throws IOException {
if ( file == null )
throw new IllegalArgumentException ( "Input argument 'file' cannot be null." );
if ( ! file.exists() )
throw new IllegalArgumentException ( "File " + file + " doesn't exist." );
if ( lineRegex == null )
throw new IllegalArgumentException ( "Input argument 'lineRegex' cannot be null." );
List<String> lines;
try {
lines = Files.readAllLines ( file.toPath() );
} catch ( IOException e ) {
throw new IOException ( "File " + file + " cannot be read.", e );
}
if ( lines.isEmpty() ) {
throw new IllegalArgumentException ( "File " + file + " is empty." );
}
List<String> result = new ArrayList<>();
for ( String line : lines ) {
if ( ! line.isEmpty() ) {
if ( ! lineRegex.matcher ( line ).matches() ) {
result.add ( line );
}
}
}
if ( result.isEmpty() ) {
return null;
} else {
return Collections.unmodifiableList ( result );
}
}
Now all corner-cases are handled explicitly, but ... the method body has more than tripled in size!
Is it worth it to write long code like this? Is this code really better?
In the article's introduction, we asked "What happens if we write 'bad code'?".
To find the answers, let us have a look at each corner-case and compare how the software behaves, depending on the code we write.
-
The function is called with input argument file set to null.
-
In version 1, a NullPointerException is thrown when file.toPath() is executed in the statement:
List<String> lines = Files.readAllLines ( file.toPath() );
-
In version 2, an IllegalArgumentException is thrown.
Version 2 is clearly better because:
-
An IllegalArgumentException tells us more than a NullPointerException and helps in debugging. Suppose that the function resides in a third-party library and that we don't have access to the source code. Then seeing a NullPointerException occurring in findLinesNotMachingRegexInFile makes it more difficult to find the bug. On the other hand, an IllegalArgumentException with a clear error message (Input argument 'file' cannot be null.) immediately tells us the cause of the bug.
-
The behavior in version 1 depends on the function's implementation (i.e. the code of the function's body). In our case, the bug leads to a NullPointerException. If the code changes later, another exception might be thrown, or no exception might be thrown at all and the function might simply return an empty list or null.
-
The behavior can be stated in the method's documentation comment (not shown above), which is useful API information for users of the method.
-
The fact that it is illegal to call the method with input file set to null is part of the official contract between the caller and the callee.
-
The function is called with input argument lineRegex set to null.
This case is similar to case 1. Version 2 is better for the same reasons.
-
The file doesn't exist.
-
In version 1, it depends on what Files.readAllLines ( file.toPath() ) does if the file doesn't exist. The Java API documentation only states that an IOException is thrown if an I/O error occurs; it says nothing specific about a missing file. So, we have to look at the JDK's source code or just try it out. I did the latter and a NoSuchFileException (a subclass of IOException) was thrown. However, we don't have a guarantee that the same will happen in a future version.
-
In version 2, an IllegalArgumentException is thrown. Again, this is better because the exception doesn't depend on statements used within the function. There is a clear and controlled error message.
Moreover, calling the method with a file that doesn't exist is an error that must be attributed to the caller. So, it is semantically more correct to throw an IllegalArgumentException instead of an arbitrary exception that depends on the method body.
-
The file can't be read (e.g. user has no access right)
-
Similar to the previous case, in version 1 it depends on what Files.readAllLines() does. According to the Java API documentation, an IOException or a SecurityException is thrown.
-
In version 2, we take control over this situation and we throw a specific exception. Our intention is clearly documented in the code and API.
-
The file is empty
-
In version 1, Files.readAllLines() returns an empty list, which means that the result returned from the function will also be an empty list. So, version 1 returns the same result in semantically different cases - in case of an empty file as well as in case of a non-empty file that doesn't contain lines not matching the regular expression.
-
In version 2, we explicitly throw an exception with a clear error message.
Version 2 is (at least in most scenarios) better, because executing the function for an empty file doesn't make sense and the file is probably empty due to a previous anomaly of the system. The client code is forced to take appropriate action.
Version 2 also helps in debugging. Suppose that an empty file is a case that shouldn't appear under normal conditions. In version 1, the function simply returns a normal value (and not an error), which means that the anomaly of an empty file is less likely to be detected during the software tests.
- The file contains empty lines
- In version 1, all empty lines are contained in the result, which is very probably not what we want.
- In version 2, empty lines are explicitly ignored.
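The last two cases are easy to reproduce. The following sketch copies the logic of version 1 into a hypothetical class Version1Demo and runs it against three temporary files. It uses Path instead of File only to keep the demo short. The names Version1Demo, findNonMatchingLinesV1 and tempFileWith are assumptions made for this illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class Version1Demo {

    // Same logic as version 1: no input validation, no special handling of corner-cases
    public static List<String> findNonMatchingLinesV1 ( Path file, Pattern lineRegex ) throws IOException {
        List<String> result = new ArrayList<>();
        for ( String line : Files.readAllLines ( file ) ) {
            if ( ! lineRegex.matcher ( line ).matches() ) {
                result.add ( line );
            }
        }
        return result;
    }

    // Demo helper: writes the given lines to a temporary file that is deleted on exit
    public static Path tempFileWith ( List<String> lines ) throws IOException {
        Path file = Files.createTempFile ( "demo", ".txt" );
        file.toFile().deleteOnExit();
        return Files.write ( file, lines );
    }

    public static void main ( String[] args ) throws IOException {
        Pattern regex = Pattern.compile ( "\\w+:[ \\t]*\\w+" );

        Path emptyFile = tempFileWith ( new ArrayList<String>() );
        Path validFile = tempFileWith ( Arrays.asList ( "start_line: 10", "verbose:yes" ) );
        Path fileWithEmptyLine = tempFileWith ( Arrays.asList ( "start_line: 10", "", "verbose:yes" ) );

        // An empty file and a fully valid file cannot be told apart: both print []
        System.out.println ( findNonMatchingLinesV1 ( emptyFile, regex ) );
        System.out.println ( findNonMatchingLinesV1 ( validFile, regex ) );
        // The empty line is reported as an invalid entry: prints 1
        System.out.println ( findNonMatchingLinesV1 ( fileWithEmptyLine, regex ).size() );
    }
}
```

Client code that receives the empty list has no way of knowing whether the file was valid or simply empty.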
We can see that version 2 behaves better in all corner-cases.
We control every potential runtime problem individually and handle it appropriately. This leads to consistent behavior and makes the code more maintainable, because the error information returned to client code doesn't depend on inner calls to other functions and the error messages are specific and clear.
The downside is of course that we have to write a lot of error handling code. And nothing prevents us from forgetting to appropriately handle any one of the painful edge conditions.
But this shouldn't come as a surprise. It is a well-known fact that writing robust and maintainable code is hard and requires a lot of discipline and experience.
Wouldn't it be nice if the programming language helped us and made it less hard?
Before answering this question, let us first look at a unit-test method:
// Note: 'assert' statements are only evaluated if the JVM is started with option -ea
public static void test() {
try {
// case 1: lines found
List<String> params = Arrays.asList
( "start_line: 10", "end_line:20",
"verbose:yes", "illegal", "missing_value:", ":missing_name" );
File testFile = createTemporaryTextFile ( params );
Pattern regex = Pattern.compile ( "\\w+:[ \\t]*\\w+" );
List<String> linesFound = findLinesNotMachingRegexInFile ( testFile, regex );
assert linesFound.toString().equals ( "[illegal, missing_value:, :missing_name]" );
// case 2: no lines found (empty lines are ignored)
params = Arrays.asList ( "start_line: 10",
"end_line:20", "", "verbose:yes", "" );
testFile = createTemporaryTextFile ( params );
linesFound = findLinesNotMachingRegexInFile ( testFile, regex );
assert linesFound == null;
// case 3: empty file
testFile = createTemporaryTextFile ( new ArrayList<String>() );
boolean exceptionOccurred = false;
try {
linesFound = findLinesNotMachingRegexInFile ( testFile, regex );
} catch ( IllegalArgumentException e ) {
exceptionOccurred = true;
}
assert exceptionOccurred;
// case 4: file doesn't exist
testFile = new File ( "C:\\sdhkdjhgkjdhgkdfgdkghsdfgdfghhdf.txt" );
exceptionOccurred = false;
try {
linesFound = findLinesNotMachingRegexInFile ( testFile, regex );
} catch ( IllegalArgumentException e ) {
exceptionOccurred = true;
}
assert exceptionOccurred;
} catch ( IOException e ) {
// unexpected I/O problem while creating or reading a test file
e.printStackTrace();
assert false;
}
}
private static File createTemporaryTextFile ( List<String> lines ) throws IOException {
File file = File.createTempFile ( "temp", null );
file.deleteOnExit();
Files.write ( file.toPath(), lines, StandardOpenOption.WRITE );
return file;
}
PPL version
The implementation in PPL looks like this:
command find_lines_not_maching_regex_in_file
in file file check:file.exists
in regex line_regex
%system_error_handler_input_argument
out nullable list<string> result
out nullable file_error error
out_check: not ( result #r null and error #r null ) // 'result' and 'error' cannot both be non-null
script
// read lines from file and store result in constant 'lines'
se_text_file_IO.restore_lines_from_text_file (
file = i_file
error_handler = i_error_handler ) \
( const lines = result
const file_read_error = error )
// if there was an error reading the file then return immediately
if file_read_error is not null then
result = null
error = file_error.create (
description = """File {{file.path}} cannot be read.
Reason: {{file_read_error.description}}"""
cause = file_read_error
resource = file )
return
.
// if there are no lines in the file then return an error
if lines is null then
result = null
error = file_error.create (
description = """File {{file.path}} is empty."""
resource = file )
return
.
// store all lines that don't match the regex in a mutable string list
const r = mutable_list<string>.create
repeat for each line in lines
if line is not null then // ignore empty lines
if not line.matches_regex ( line_regex ) then
r.append ( line )
.
.
.
// if lines have been found then return an immutable list, else return 'null'
result = r.make_immutable_or_null
error = null
.
.
Here are some explanations:
-
in file file check:file.exists
in regex line_regex
The command's first input argument is named file. It is of type non-nullable file. The check clause defines a precondition requiring that the file must exist at the moment of calling the command. If the file doesn't exist, a program error is thrown.
The second input argument is named line_regex and is of type non-nullable regex.
-
%system_error_handler_input_argument
This is an example of a source code template in PPL. In this article, we will not look at how templates work. Please refer to the language manual if you want more information. It suffices to know that the compiler expands this template identifier to:
in system_error_handler error_handler default:se_system_utilities.default_system_error_handler
This means that there is a third input argument named error_handler. We will also not discuss the role of this input argument. The basic idea is that, in PPL, commands that might fail at runtime accept an error handler that, by default, sends an error message to the operating system's error device.
-
out nullable list<string> result
out nullable file_error error
PPL supports multiple output arguments. This command has two nullable output arguments: result and error.
Each time the command is called, there are three possible outcomes:
-
Lines not matching the regex have been found in the file:
In this case, output argument result holds the list of lines found and error is null.
-
No lines have been found:
In this case, result and error are both null.
-
A runtime error occurs:
In this case, result is null and error points to an error object describing the problem.
|
Note |
Unlike Java and other programming languages, PPL doesn't use an exception mechanism to signal resource errors to client code. Instead, it uses multiple output arguments, as explained above. The rationale for this is not covered here - it might be the subject of another article in the future. |
-
out_check: not ( result #r null and error #r null ) // 'result' and 'error' cannot both be non-null
This is an example of a post-condition (Design by Contract). It states that the command will never return with output arguments result and error both pointing to a non-null value. A program error is thrown if the command's implementation violates this condition at runtime.
-
// read lines from file and store result in constant 'lines'
se_text_file_IO.restore_lines_from_text_file (
file = i_file
error_handler = i_error_handler ) \
( const lines = result
const file_read_error = error )
se_text_file_IO is a PPL service (similar to a class in Java with only static members). To read the file's text content, we use command restore_lines_from_text_file in this service. This command has two input arguments:
-
The file to be read - to which we assign input argument file of command find_lines_not_maching_regex_in_file. Note: The i_ prefix in i_file is optional and used here to explicitly state that we assign an input argument.
-
An error_handler used to handle any file error that might occur.
It also has two output arguments:
-
result is a list of strings, each string representing a line in the text file. This value is stored into a local script constant named lines.
-
error points to an error object in case of a file-read problem. This output value is stored into a local script constant named file_read_error.
The rest of the script should be self-explanatory.
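For readers more familiar with Java, the two nullable outputs and the out_check post-condition can be approximated as follows. This is only a rough sketch under stated assumptions: the class Outcome and the method describe are hypothetical names, a plain String stands in for PPL's file_error, and the post-condition is checked at runtime in the constructor. Unlike in PPL, the Java compiler does not force the caller to look at the error.

```java
import java.util.Arrays;
import java.util.List;

public class OutcomeDemo {

    // Hypothetical Java counterpart of the two nullable outputs 'result' and 'error'
    public static final class Outcome {
        public final List<String> result; // null: no lines found, or an error occurred
        public final String error;        // null: no error (a String stands in for file_error)

        public Outcome ( List<String> result, String error ) {
            // Post-condition from 'out_check': 'result' and 'error' cannot both be non-null
            if ( result != null && error != null ) {
                throw new IllegalStateException ( "'result' and 'error' cannot both be non-null." );
            }
            this.result = result;
            this.error = error;
        }
    }

    // Distinguishes the three possible outcomes described in the text
    public static String describe ( Outcome outcome ) {
        if ( outcome.error != null ) return "error: " + outcome.error;
        if ( outcome.result == null ) return "no lines found";
        return "lines found: " + outcome.result;
    }

    public static void main ( String[] args ) {
        System.out.println ( describe ( new Outcome ( Arrays.asList ( "illegal" ), null ) ) ); // lines found: [illegal]
        System.out.println ( describe ( new Outcome ( null, null ) ) );                        // no lines found
        System.out.println ( describe ( new Outcome ( null, "File cannot be read." ) ) );      // error: File cannot be read.
    }
}
```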
In a previous article with the title Why We Should Love 'null', we saw that the reason for so many null pointer bugs in real world applications is the simple fact that we often forget to check for null. In our present example there are, again, many things we could forget to check. Therefore, let us now analyze what happens if we forget to explicitly handle any one of the six corner-cases we saw already in the previous section.
-
We forget to check if input argument file is null:
This can't happen, because in PPL all input arguments (and, more generally, all object references) are non-nullable by default.
-
We forget to check if input argument line_regex is null:
This can't happen for the same reason: line_regex is non-nullable by default.
-
We forget to check that the file must exist:
In this case se_text_file_IO.restore_lines_from_text_file will report an error and this error will be forwarded to the client code. This is not an ideal solution, because the error forwarded to the client depends on the supplier's implementation, which means the type of error could change when the implementation changes. But, more importantly, the condition 'file must exist' is not specified in the contract between the client and supplier, and is therefore also not visible in the API documentation. It is much better to specify check:file.exists for the input argument, as we did in the code above.
-
We forget to check the error reported in case of a file read error:
Suppose that, instead of writing ...
se_text_file_IO.restore_lines_from_text_file (
file = i_file
error_handler = i_error_handler ) \
( const lines = result
const file_read_error = error )
... we don't consider file errors and simply write:
se_text_file_IO.restore_lines_from_text_file (
file = i_file
error_handler = i_error_handler ) \
( const lines = result )
|
Note |
An alternative syntax (also ignoring the error output) would be:
const lines = se_text_file_IO.restore_lines_from_text_file.result (
file = i_file
error_handler = i_error_handler )
|
This can't happen because the PPL compiler reports an error each time the error output of a command is ignored in client code. This is similar to checked exceptions in Java and other programming languages: checked exceptions thrown in supplier code must be caught (or declared to be thrown) in client code, or else a compiler error occurs.
One might wonder what happens if we don't forget to store the error reported by restore_lines_from_text_file into a local constant (i.e. we actually write const file_read_error = error), but then we forget to check if an error is reported in file_read_error, i.e. we don't code ...
if file_read_error is not null then
...
This bug can't happen either, because the compiler reports an error whenever a local constant or variable is declared, but never used in the script.
-
We forget to check if the file is empty, i.e. we don't write:
if lines is null then
...
This can't happen, because the compiler would report an error in the instruction:
repeat for each line in lines
Explanation: The expression that defines the collection to be used in a repeat for each instruction (in our case: lines) must be of type non-nullable, or it must have been previously checked for non-null in the source code.
In our case, constant lines is a nullable type (i.e. nullable list<nullable string>: it is either null or a list of strings that can be null). The reason is that there is no immutable empty list in idiomatic PPL. If the file is empty, restore_lines_from_text_file returns null as result (and output argument error is null too). So, the compiler knows that lines might be null, and therefore accepts it in the repeat for each instruction only if it has previously been checked to be non-null.
|
Note |
Internally, the compiler uses static code analysis to make this kind of code verification. |
-
We forget to check for empty lines in the file:
Again, this can't happen, because the compiler would report an error.
Explanation:
Look at:
if not line.matches_regex ( line_regex ) then
The expression line.matches_regex is only valid if line is a non-nullable type or if line has been checked to be non-null at runtime. This is one of the most important rules embedded in the PPL compiler - in the context of compile-time null-safety. It eliminates the risk of null pointer errors at run-time.
In our case, line is a nullable type (i.e. nullable string). The reason is that there is no immutable empty string in idiomatic PPL. So, if the third line in the file is an empty line, then the third element in lines will be null, and the loop constant line will point to null at the third round of the repeat for each instruction. Therefore omitting to write ...
if line is not null then
... results in an error reported by the compiler.
As we can see, there is only one corner-case out of six that risks being forgotten by the programmer (see case 3 above: file must exist).
In all other corner-cases, the compiler assists us in writing more robust code, because forgetting to handle a special case results in a compiler error. A 'quick and dirty' version of the code wouldn't compile in PPL.
It is important to note that this compiler-assistance is possible only because of the design choices mentioned in the introduction of this article. In our case, it is a combination of the following rules that are relevant:
- immutable collections cannot be empty (i.e. null is used to denote "no data")
- immutable strings cannot be empty (i.e. null is used to denote "no data")
- object references are non-nullable by default
- null-safety is natively supported by the compiler
- error objects returned by commands cannot be ignored in client code
Without these design choices, there would be a greater risk of forgetting to properly handle corner-cases.
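For contrast, here is a minimal Java sketch of what happens when the compiler does not enforce such rules. The method countNonMatching contains no null checks at all, yet javac accepts it; the problems only show up at runtime as a NullPointerException. The class and method names are hypothetical and exist only for this illustration.

```java
import java.util.Arrays;
import java.util.List;

public class NullLoopDemo {

    // Counts the lines that don't match the regex. There are no null checks, and the code still compiles.
    public static int countNonMatching ( List<String> lines, String regex ) {
        int count = 0;
        for ( String line : lines ) {          // NullPointerException if 'lines' is null
            if ( ! line.matches ( regex ) ) {  // NullPointerException if 'line' is null
                count++;
            }
        }
        return count;
    }

    // Returns true if the call fails with a NullPointerException at runtime
    public static boolean failsWithNpe ( List<String> lines ) {
        try {
            countNonMatching ( lines, "\\w+:[ \\t]*\\w+" );
            return false;
        } catch ( NullPointerException e ) {
            return true;
        }
    }

    public static void main ( String[] args ) {
        System.out.println ( failsWithNpe ( null ) );                            // true
        System.out.println ( failsWithNpe ( Arrays.asList ( "a:1", null ) ) );   // true
        System.out.println ( failsWithNpe ( Arrays.asList ( "a:1", "b:2" ) ) );  // false
    }
}
```

Both failures correspond to checks that, according to the rules listed above, the PPL compiler would have demanded before the code could run.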
Besides bugs in the command's implementation, the compiler is also able to detect some bugs in client code that calls the command. For example, the following bugs would lead to a compiler error message:
- The command is called with input argument file set to null or set to an object of type nullable file that has not been checked to contain a non-null value at the moment of invoking the command.
- An analogous check is done by the compiler for input argument line_regex.
- The error object returned by the command is ignored in the client code.
Command find_lines_not_maching_regex_in_file can be unit-tested with the following test script:
test
// case 1: lines found
var params = '''start_line: 10
end_line:20
verbose:yes
illegal
missing_value:
:missing_name
'''
se_text_file_IO.create_temporary_text_file (
delete_file_on_exit = yes
text = params ) (
var test_file = result
var file_error = error )
verify file_error is null
verify test_file is not null
const regex = regex.create ( '''\w+:[ \t]*\w+''' )
test ( file = test_file
line_regex = regex )
verify error is null
verify result is not null
verify result.to_long_string =v "[illegal, missing_value:, :missing_name]"
// case 2: no lines found
params = '''start_line: 10
end_line:20
verbose:yes
'''
file_error = se_text_file_IO.store_string_to_existing_file (
string = params
file = test_file )
verify file_error is null
test ( file = test_file
line_regex = regex )
verify error is null
verify result is null
// case 3: empty file
file_error = se_empty_file_utilities.empty_existing_file (
file = test_file )
verify file_error is null
test ( file = test_file
line_regex = regex )
verify error is not null
verify result is null
// case 4: error (file doesn't exist)
test_file = file.create ( file_path.create ( '''C:\sdhkdjhgkjdhgkdfgdkghsdfgdfghhdf.txt''' ) )
test ( file = test_file
line_regex = regex )
verify_error
.
.
Conclusion
The last example in this article highlights one of the main reasons it is so hard to write robust and reliable software: corner cases.
It is easy to forget them and it requires a lot of discipline to handle them correctly.
They are a common cause (among others) of software project failures and of large time and budget overruns.
Unfortunately, corner cases appear often in all kinds of algorithms and domains.
Our 'quick and dirty' example code in the previous chapter was a Java method made up of only 7 statements. Nevertheless, there were no fewer than 6 corner cases that were ignored and led to incorrect behavior or unmaintainable software. This is an exceptionally high 'corner case per instruction' ratio. But just suppose that the average were 1 corner case per 10 instructions. Then a small application made up of several thousand instructions contains hundreds of corner cases. Moreover, corner cases can interact with other corner cases, which leads to higher level corner cases which can ... (you get the point). Even very experienced programmers with the best intentions to write good code will forget some corner cases. Some of these will not be covered by the software tests, and some will pop up in production.
What we need, therefore, is a programming environment that detects unhandled corner cases automatically, as far as this is technically feasible.
The good news is that, as this article has demonstrated, the design of the language can promote automatic detection of some corner cases. The risk of forgetting to handle them can be reduced considerably.
Compile-time null-safety is one very effective technique. It eliminates the null pointer error, the most frequent bug in many applications. Moreover, an object reference that points to null is pretty much always a corner case, and null-safety eliminates the risk of forgetting to explicitly handle that corner case.
Quite often, empty lists and empty strings represent a good number of corner cases. By not using empty lists and empty strings, but instead using null as suggested, the whole set of these corner cases is also covered by null-safety.
There are more techniques to discover other kinds of corner cases at compile-time. For example, the risk of division by zero could also be reported by the compiler and we could be forced to check for the denominator of a division to be non-zero, before being allowed to execute the division. Note: This is one item on PPL's to-do list.
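To illustrate the idea (and not how PPL does or will implement it), here is a small Java sketch. It assumes a hypothetical wrapper type NonZeroInt that can only be created for a non-zero value; in line with this series, its factory returns null for "no valid value". Because the division accepts only a NonZeroInt, the zero check has to happen before the division. Java itself cannot enforce the null check at compile time, so the sketch only approximates what a null-safe compiler could guarantee.

```java
public class SafeDivisionDemo {

    // Hypothetical wrapper type: an instance can only exist for a non-zero value
    public static final class NonZeroInt {
        public final int value;

        private NonZeroInt ( int value ) {
            this.value = value;
        }

        // Returns null if 'value' is zero
        public static NonZeroInt createOrNull ( int value ) {
            return value == 0 ? null : new NonZeroInt ( value );
        }
    }

    // The denominator's type documents that it has already been checked to be non-zero
    public static int divide ( int numerator, NonZeroInt denominator ) {
        return numerator / denominator.value;
    }

    // Client code: the zero check is made once, where the denominator enters the program
    public static String divideOrExplain ( int numerator, int denominator ) {
        NonZeroInt checked = NonZeroInt.createOrNull ( denominator );
        if ( checked == null ) {
            return "cannot divide by zero";
        }
        return String.valueOf ( divide ( numerator, checked ) );
    }

    public static void main ( String[] args ) {
        System.out.println ( divideOrExplain ( 10, 2 ) ); // 5
        System.out.println ( divideOrExplain ( 10, 0 ) ); // cannot divide by zero
    }
}
```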
Anyway, the final conclusion is the answer to our second question asked in the introduction:
Yes, the design of a programming language and its standard libraries can definitely help us to write more reliable and more maintainable code in less time.
A Personal Experience
One year ago PPL was different from today.
It didn't have null-safety. All objects were nullable. Collections and strings could be empty. I simply applied what I was used to after more than two decades of programming, and I didn't question the rationale.
However, after some sessions of pondering, I applied the design choices listed in the introduction. Then I was eager to see how these changes would affect my existing code that needed to be refactored in order to bow to the new rules.
Would my code become better?
The answer is a clear YES.
I was often happily surprised to see how I was forced to fix lousy code. The compiler no longer allowed me to write 'quick and dirty' code, such as ignoring potential runtime errors, not checking for null or not handling corner cases. Some bugs that lurked in the source code but hadn't yet popped up at runtime were now automatically detected by the compiler and I had to fix them. No doubt - the quality of my code increased. I wouldn't want to go back anymore.
These observations are of course not representative. Maybe they are just the author's biased feelings. Never mind! My convictions are very strong and therefore I wanted to share and explain them in this article.