It is recommended (but not required for experienced programmers) to read the articles in their order of publication, starting with Part 1: What? Why? How?.
Introduction
Do you like to write code that handles errors?
Most programmers (including myself) don't. We prefer coding "the happy path" — handling errors is "no fun." We hope that our applications will run in a world where required files always exist, databases never fail, network connections are always available, and malicious people are unheard of.
Practice shows that error-handling is often neglected (especially in the early stages of a project), because it requires a good deal of dedication, discipline, experience, and expertise. A common pitfall is to think that error-handling "can be done later," since this too often means that it'll never be done, because "there are deadlines, and we need and want to add features."
At the same time, we're also aware of the importance of error-handling, because it helps to identify and resolve issues quickly. Good error-handling is critical for creating software that is reliable, robust, secure, fault-tolerant, maintainable, and user-friendly.
In short: error-handling is important, but nobody wants to do it!
Therefore, a type system designed for reliability should:
- protect us from accidentally forgetting to handle errors
- facilitate all error-handling variations as much as possible (including explicitly ignoring errors), and support a succinct syntax that is easy to read and write
This article shows how PTS aims to achieve this.
Practical Considerations
Error-handling is a vast topic — too vast to be fully covered in this article.
The best strategy for handling errors largely depends on the application domain and the potential damages in a worst-case error scenario.
Simply aborting program execution as soon as an error occurs might be an acceptable approach in a stamp-inventory application for personal use, but applying the same approach in mission-critical enterprise software would be irresponsible.
PTS is designed to always be on the safe side by default, because this helps to write reliable, robust, and fault-tolerant software. For example, anticipated errors returned by functions can't be ignored. However, this strict approach also means that code might end up being overly verbose and complex in applications that don't require the highest level of reliability, robustness, and fault-tolerance.
Obviously, it's impossible to provide one-size-fits-all "rules" for error-handling. However, in the following sections, I'll provide some general guidelines (not rules set in stone) that might be useful.
Avoid Returning Many Types of Anticipated Errors!
Consider a high-level function named do_stuff that calls lower-level functions executing various read/write operations on files and directories. These low-level functions in the call tree return anticipated errors such as file_not_found_error, file_read_error, file_write_error, directory_read_error, and directory_access_error. If each function in the tree propagates errors to its parent functions, then do_stuff might end up with a horrible signature like this:
fn do_stuff -> string or \
    file_not_found_error or file_read_error or file_write_error or \
    directory_read_error or directory_access_error
Worse, each time a signature in a lower-level function is changed later on (e.g., an error type is added or removed), the signatures of all parent functions (including do_stuff) need to be adapted accordingly.
While there are different solutions to avoid maintenance nightmares like this, a simple solution for higher-level functions is to just return a common parent type of all errors returned in the call tree. For example, do_stuff can be simplified as follows:
fn do_stuff -> string or directory_or_file_error
Here, we assume that directory_or_file_error is the parent type of all errors returned in the call tree.
Now suppose that, later on, database and network operations are added to the code. do_stuff needs to be adapted:
fn do_stuff -> string or directory_or_file_error or database_error or network_error
But again, we can simplify by using a common parent type:
fn do_stuff -> string or IO_error
In practice, using an appropriate parent type from the outset (e.g., IO_error) is often an acceptable solution, because:
- It facilitates code maintenance.
- Caller functions often don't care about which error occurred; they only care about whether an error occurred or not.
- It hides implementation details which are irrelevant and might change in future versions.
It's important to note that error information is not lost if a higher-level function in the call tree returns a higher-level error in the type hierarchy. For example, if a low-level function returns file_not_found_error, then a higher-level function declared to return IO_error still returns an instance of file_not_found_error (i.e., a child type of IO_error), which can be explored by parent functions, or used for debugging/diagnostic purposes.
As a rule of thumb (that might be ignored if there is a good reason), functions shouldn't return many error types. It is often appropriate to declare a single error type which is a common parent type of all errors returned in the call tree. This leads to simpler and more maintainable code.
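The point that a parent type in the signature doesn't erase the concrete error can be illustrated outside PTS. The following is a minimal Python sketch, not PTS code: the names IOFailure, FileNotFound, and read_config are hypothetical stand-ins for IO_error, file_not_found_error, and a low-level function, and errors are modeled as plain returned values.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class IOFailure:
    """Hypothetical common parent of all I/O errors (stands in for IO_error)."""
    message: str


@dataclass(frozen=True)
class FileNotFound(IOFailure):
    """Hypothetical child type (stands in for file_not_found_error)."""
    path: str


def read_config(path: str) -> str | IOFailure:
    # Low-level function: simulates a failure and returns the specific
    # child error as a value.
    return FileNotFound(message=f"{path} does not exist", path=path)


def do_stuff() -> str | IOFailure:
    # High-level function: declares only the parent type and propagates
    # whatever child error the call tree produced.
    result = read_config("settings.ini")
    if isinstance(result, IOFailure):
        return result
    return result.upper()


outcome = do_stuff()
# The declared type is the parent, but the concrete child instance survives.
print(type(outcome).__name__, "-", outcome.message)
# prints "FileNotFound - settings.ini does not exist"
```

If a new child of IOFailure is introduced later, the signature of do_stuff stays unchanged, which is the maintenance benefit described above.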
Use Wrappers if Appropriate!
There is another solution to the problem of "too many error types returned" explained in the previous section: define a dedicated error type that serves as a wrapper for all low-level errors, and return this wrapper in all low-level functions.
In our example, we could define type stuff_error, a wrapper for all errors in the call tree:
type stuff_error
    inherit: runtime_error
.
The signature of do_stuff becomes:
fn do_stuff -> string or stuff_error
Lower-level functions also return stuff_error, and they store the source error (the cause) in attribute cause:
fn stuff_child -> null or stuff_error
    ...
    const text = read_text_file ( file_path ) \
        on file_read_error as e: return stuff_error.create (
            message = e.message
            cause = e )
    ...
.
To shorten the code, we could define a creator/constructor create_from_cause for stuff_error (not shown here), and then simply write:
const text = read_text_file ( file_path ) \
    on file_read_error as e: return stuff_error.create_from_cause ( e )
Again, the low-level error information isn't lost, since it's stored in attribute cause of stuff_error.
See also: section Returning a Wrapper Error.
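For readers who want to try the wrapper idea in a runnable language, here is a rough Python sketch. It is only an analogy to the PTS code: StuffError, FileReadError, and the simulated read_text_file are hypothetical names, and create_from_cause is one possible shape for the creator that the article leaves unshown.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class FileReadError:
    """Hypothetical low-level error (stands in for file_read_error)."""
    message: str


@dataclass(frozen=True)
class StuffError:
    """Hypothetical wrapper error (stands in for stuff_error)."""
    message: str
    cause: object | None = None

    @classmethod
    def create_from_cause(cls, cause: FileReadError) -> StuffError:
        # Reuse the cause's message and keep the original error attached.
        return cls(message=cause.message, cause=cause)


def read_text_file(path: str) -> str | FileReadError:
    # Simulated failure, so the example needs no real file.
    return FileReadError(message=f"cannot read {path}")


def stuff_child(path: str) -> None | StuffError:
    text = read_text_file(path)
    if isinstance(text, FileReadError):
        # Wrap the low-level error instead of exposing it to callers.
        return StuffError.create_from_cause(text)
    return None


error = stuff_child("data.txt")
print(error.message)  # prints "cannot read data.txt"
```

Callers only ever see StuffError, yet the original FileReadError remains reachable through the cause attribute.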
Use Unanticipated Errors if Appropriate!
Sometimes, we don't want to handle errors: we assume the code to be running in an environment where errors aren't supposed to occur at run-time. If an error still occurs despite our assumption, then an immediate termination is appropriate: the application writes an error message to STDERR, and then aborts with exit code 1.
In other words, instead of handling an error, we opt to just abort execution. This is also referred to as panicking; for example, in Rust the panic! macro can be used to abort the application gracefully and release resources.
Aborting program execution in case of an error (i.e., panicking) is justified in various situations: for example, when experimenting with code, writing a prototype, building a personal stamp-inventory application, or when we just want to write quick and dirty code. Even in applications designed to handle errors, there might be specific cases where an immediate termination is preferable, for example, to avoid corrupt data on a long-term basis. Good advice related to this topic is provided in the chapter To panic! or Not to panic! of The Rust Programming Language.
While PTS is clearly not designed to abort by default, it does offer support for "crash early" approaches. You can abort (panic) if you have a good reason to do so — you just have to be explicit about it.
The basic idea is to convert anticipated errors into unanticipated ones every time you would have to deal with an anticipated error. Thus, instead of returning an anticipated error, you throw an unanticipated one. Let's see different ways to do this.
assert Statement
A first technique is to use an assert statement to declare that an anticipated error isn't supposed to occur:
const result = customer_name_by_id ( customer_id )
assert result is not error
// continue with the customer name stored in 'result'
on error: throw Clause
A better and less verbose technique is to use the on error: throw clause, which was introduced in section Throwing an Unanticipated Error:
const name = customer_name_by_id ( customer_id ) on error: throw
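As a loose illustration of what on error: throw does, consider this Python sketch. Python has no such clause, so a hypothetical helper named or_throw plays that role; CustomerNotFound and the in-memory customer table are likewise assumptions made for the example.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerNotFound:
    """Hypothetical anticipated error, returned as a value."""
    message: str


def customer_name_by_id(customer_id: int) -> str | CustomerNotFound:
    customers = {1: "Alice"}  # stand-in for a real data source
    if customer_id not in customers:
        return CustomerNotFound(f"no customer with id {customer_id}")
    return customers[customer_id]


def or_throw(result):
    """Hypothetical helper: turn a returned error into a raised exception."""
    if isinstance(result, CustomerNotFound):
        raise RuntimeError(result.message)
    return result


name = or_throw(customer_name_by_id(1))
print(name)  # prints "Alice"
```

Calling or_throw(customer_name_by_id(2)) would raise RuntimeError instead of handing an error value back to the caller.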
Utility Functions That Throw
Writing lots of on error: throw clauses can be annoying. A better solution might therefore be to write utility functions that throw unanticipated errors, instead of returning anticipated errors.
For example, suppose that many functions in our quick-and-dirty throw-away prototype read non-empty text files. Under normal circumstances (i.e., where reliability matters), we would call a library function like the following:
// returns 'null' if the file is empty
fn read_text_file ( file_path ) -> string or null or file_read_error
Using this function requires checks for null and file_read_error in client code. To avoid these checks, we could define the following utility function, which assumes that file read errors don't occur and text files are never empty:
fn read_non_empty_text_file_or_throw ( file_path ) -> string (1)
    case type of read_text_file ( file_path )
        is string as content
            return content
        is null
            throw program_error.create ( (2)
                """File {{file_path.to_string}} is empty.""" )
        is file_read_error as e
            throw program_error.create ( (3)
                """Could not read file {{file_path.to_string}}
                Reason: {{e.message}}""" )
    .
.
(1) By convention, the function name suffix _or_throw states that an unanticipated error might be thrown. Under normal conditions, the function returns a string containing the content of the non-empty text file.
(2) An unanticipated error is thrown if the file is empty.
(3) An unanticipated error is thrown if the file can't be read.
A simpler version of the above function could be written as follows:
fn read_non_empty_text_file_or_throw ( file_path ) -> string
    const result = read_text_file ( file_path )
    assert result is not null and result is not error
    return result
.
Client code is now simple and short, because null- and error-handling is no longer needed:
const text = read_non_empty_text_file_or_throw ( file_path.create ( "example.txt" ) )
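The same pattern can be tried out in Python. The sketch below is an approximation under a few assumptions: read_text_file is written here to return the OSError as a value (mimicking an anticipated error), None signals an empty file, and RuntimeError stands in for program_error.

```python
from __future__ import annotations
import os
import tempfile


def read_text_file(file_path: str) -> str | None | OSError:
    """Return the content, None if the file is empty, or the OSError as a value."""
    try:
        with open(file_path, encoding="utf-8") as f:
            content = f.read()
    except OSError as e:
        return e
    return content if content else None


def read_non_empty_text_file_or_throw(file_path: str) -> str:
    """Assume the file exists and is non-empty; raise otherwise."""
    result = read_text_file(file_path)
    if result is None:
        raise RuntimeError(f"File {file_path} is empty.")
    if isinstance(result, OSError):
        raise RuntimeError(f"Could not read file {file_path}. Reason: {result}")
    return result


# Usage: client code needs neither a None check nor an error check.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("hello")
    text = read_non_empty_text_file_or_throw(path)
print(text)  # prints "hello"
```

The trade-off is the one described above: callers get shorter code, but an empty or unreadable file now stops the program unless something further up catches the exception.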
Using Unanticipated Errors in Private Code
Sometimes, it makes sense to use unanticipated errors in unexposed (private) parts of an application, because this can considerably simplify code and increase maintainability.
Suppose we are working on a complex parser with a main function like this:
fn parse ( string ) -> AST or syntax_error
Syntax errors are likely to be detected in low-level, private functions. Using anticipated errors in the whole call tree of function parse can easily lead to verbose code, because all errors need to be handled (e.g., propagated to the parent function) and declared in the function signatures. Whenever error types in function signatures change, a lot of refactoring might be required. To avoid this maintenance burden, it might be better to throw unanticipated errors in the functions called by parse, the root function in the call tree. Function parse uses a try statement to catch any unanticipated error, and converts it into an anticipated error which is then returned. The following simplified code illustrates this approach:
fn parse ( string ) -> AST or syntax_error
    try
        const AST = AST.create_empty
        // parse the string and populate 'AST'
        return AST
    catch unanticipated_error as ue
        return syntax_error.create ( message = ue.message )
    .
.
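To make the idea concrete, here is a small Python sketch of a toy "parser" for whitespace-separated integers. All names (SyntaxProblem, _ParseFailure, _parse_number, _parse_numbers) are hypothetical. One deliberate difference from the PTS code above: the sketch catches only a dedicated private exception rather than every unanticipated error, so that unrelated bugs aren't silently reported as syntax errors.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class SyntaxProblem:
    """Hypothetical anticipated error (stands in for syntax_error)."""
    message: str


class _ParseFailure(Exception):
    """Private exception used only inside the parser's call tree."""


def _parse_number(token: str) -> int:
    # Private helper: raises instead of returning an error value.
    try:
        return int(token)
    except ValueError:
        raise _ParseFailure(f"expected an integer, got {token!r}") from None


def _parse_numbers(source: str) -> list[int]:
    # No error plumbing needed here: failures simply bubble up.
    return [_parse_number(token) for token in source.split()]


def parse(source: str) -> list[int] | SyntaxProblem:
    # Public root function: converts the private exception into a returned error.
    try:
        return _parse_numbers(source)
    except _ParseFailure as failure:
        return SyntaxProblem(message=str(failure))


print(parse("1 2 3"))  # prints "[1, 2, 3]"
print(parse("1 two 3"))
```

Only parse has an error type in its signature; the private helpers stay free of error declarations and can change without touching the public contract.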
Warning
The above techniques that convert anticipated errors into unanticipated ones should usually not be used in public APIs (e.g., public functions in libraries and frameworks).
Public APIs must be expressive and reliable. Public functions should return anticipated errors providing useful error information whenever something goes wrong. Consumers, not suppliers, decide how to handle errors.
However, there are rare exceptions to this rule. For example, it might be better (even in a public API) to abort instead of continuing execution with wrong/corrupted data. Such crash-early/fail-fast behavior should be clearly documented and, as suggested already, functions that might throw should have their name suffixed with _or_throw (e.g., do_it_or_throw).
Don't Use null to Return Error Conditions!
Suppose you're designing the map type (aka dictionary or associative array) in a standard library. Method get takes a key as input and returns the corresponding value stored in the map. Here's an interesting question: What should the method do if the key doesn't exist?
It's tempting to simply return null, as is done in several libraries (e.g., Java Map.get). Using PTS syntax, map could be defined as follows:
type map<key_type child_of:hashable, value_type>
    fn get ( key key_type ) -> value_type or null
    // more methods
.
There are two downsides to this approach:
- If the values in the map are allowed to be null (e.g., map<string, string or null>), an ambiguity arises.
  For example, if a method call like map.get ( "foo" ) returns null, it can mean two things: either there is no entry with key "foo", or there is an entry with key "foo" and value null.
- If the values in the map aren't allowed to be null (e.g., map<string, string>), there's a risk of misinterpretation.
  For example, if a method call like map.get ( "foo" ) returns null, it could erroneously be interpreted in the client code as an entry with key "foo" and value null.
  This risk of misinterpretation increases if a map with nullable values is later changed to a map with non-null values (e.g., from map<string, string or null> to map<string, string>).
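The ambiguity described in the first point is easy to reproduce with Python's built-in dict, whose get method returns None for a missing key by default. The settings dictionary below is just an illustrative example.

```python
from __future__ import annotations

# A map whose values are allowed to be None.
settings: dict[str, str | None] = {"proxy": None}

print(settings.get("proxy"))    # prints "None": the key exists, its value is None
print(settings.get("timeout"))  # prints "None": the key does not exist

# Both calls return None, so the result alone cannot tell the two cases apart.
# A separate membership test is needed to disambiguate:
print("proxy" in settings)      # prints "True"
print("timeout" in settings)    # prints "False"
```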
To eliminate the first problem (ambiguity when null is returned), we can add method contains_key (which is needed anyway):
type map<key_type child_of:hashable, value_type>
    fn contains_key ( key key_type ) -> boolean
    fn get ( key key_type ) -> value_type or null
    // more methods
.
This works, because we can now call contains_key to eliminate the ambiguity. But it doesn't work well. Firstly, method get is error-prone, because one has to read the docs, be careful, and not forget to call contains_key if get returns null. Secondly, calling get first, and contains_key afterward, is verbose and can result in very nasty bugs in case of race conditions caused by sharing a mutable map in concurrent or parallel processing environments.
This error-proneness vanishes if get returns an error (instead of null) whenever a key isn't contained in the map:
type map<key_type child_of:hashable, value_type>
    fn get ( key key_type ) -> value_type or key_not_contained_in_map_error
    // more methods
.
Client code is now required to check for key_not_contained_in_map_error, e.g.:
const value = map.get ( "foo" ) on error as e: return e
We are protected from forgetting to check if the key actually exists in the map. Moreover, the fact that method get could fail is also clearly expressed in the client code.
However, being forced to check for an error can be annoying, and leads to verbose code. If a key isn't contained in the map, you might want to:
- throw an unanticipated error, because you assume that the key ought to be contained in the map
- fall back to a default value
- get null, because you don't need to differentiate between the two possible meanings of null
These use cases can easily be covered by adding variations of the get method:
type map<key_type child_of:hashable, value_type>
    fn get ( key key_type ) -> value_type or key_not_contained_in_map_error
    fn get_or_throw ( key key_type ) -> value_type
    fn get_or_default ( key key_type, default value_type ) -> value_type
    fn get_or_null ( key key_type ) -> value_type or null
    // more methods
.
Providing many choices is sometimes counterproductive, but in this case, it is justified by the fact that map is a fundamental data structure, defined in the standard library, and used in many different ways.
Besides providing a more versatile API, we benefit from the following:
- The behavior of the four getter methods is clearly expressed by their signatures. The programmer probably doesn't need to read the docs to know which getter method to use (although they must still be aware of the potentially ambiguous meaning of null being returned by get_or_null).
- Client code is succinct in all cases and self-documents the behavior in case of a non-existent key. Here are a few examples:
const value_1 = map.get ( "foo" ) ^error
const value_2 = map.get_or_throw ( "foo" )
const value_3 = map.get_or_default ( key = "foo", default = "bar" )
const value_4 = map.get_or_null ( "foo" )
if value_4 is null then
    // handle it
.
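To show how the four getters might behave, here is a minimal Python sketch of such a map. It is not the PTS standard library: the class Map, the error value KeyNotContained (standing in for key_not_contained_in_map_error), and the use of KeyError for the throwing variant are all assumptions made for illustration.

```python
from __future__ import annotations
from collections.abc import Hashable
from dataclasses import dataclass
from typing import Generic, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


@dataclass(frozen=True)
class KeyNotContained:
    """Hypothetical error value returned when a key is missing."""
    key: object


class Map(Generic[K, V]):
    def __init__(self, entries: dict[K, V]) -> None:
        self._entries = dict(entries)

    def get(self, key: K) -> V | KeyNotContained:
        # Missing key is reported as an explicit error value, never as None.
        if key in self._entries:
            return self._entries[key]
        return KeyNotContained(key)

    def get_or_throw(self, key: K) -> V:
        # Missing key is treated as a bug in the caller's assumptions.
        if key not in self._entries:
            raise KeyError(key)
        return self._entries[key]

    def get_or_default(self, key: K, default: V) -> V:
        return self._entries.get(key, default)

    def get_or_null(self, key: K) -> V | None:
        # Ambiguous when values may be None; the name makes that explicit.
        return self._entries.get(key)


m: Map[str, str | None] = Map({"foo": None})
print(m.get("foo"))                    # prints "None": a stored value
print(m.get("bar"))                    # prints "KeyNotContained(key='bar')"
print(m.get_or_default("bar", "baz"))  # prints "baz"
print(m.get_or_null("bar"))            # prints "None"
```

With get, a stored None and a missing key are distinguishable, which is exactly what get_or_null cannot offer.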