This article explains why immutable strings and collections shouldn't be permitted to be empty. It also shows how this concept is implemented in the Practical Type System (PTS). Several source code examples demonstrate the benefits.
Empty String vs Null
Introduction
Have you ever wondered what the difference is between an empty string and null?
For example, what's the difference between:
email = ""
... and:
email = null
Do both statements mean the same or not?
What are the pros and cons of using an empty string vs using null?
Do we need both versions, or could we get rid of one to simplify things?
Let's see.
What's the Problem?
Before revealing the answers to the questions raised in the previous section, it's helpful to first investigate the infamous "test for empty and/or null?" problem, encountered in many popular programming languages (C-family languages, Java, JavaScript, Python, etc.). Chances are you have encountered this problem firsthand, in various situations.
Consider a service provider company that needs to send an important email to inform all customers about an upcoming change of conditions. Before sending the email, the staff wants to check the customers' email addresses, and ensure that each customer has an email address defined in the database.
Here's an excerpt of code that does the job, written in Java (the code would be similar in other languages):
for ( Customer customer : customers ) {
    String id = customer.getId();
    String email = customer.getEmail();
    if ( /* no email address */ ) {
        writeError ( id + ": no email" );
    } else {
        writeInfo ( id + ": " + email );
    }
}
The interesting part in this code is the if statement that checks whether an email address exists for the customer.
Practice shows that different developers are likely to check for the absence of an email address in different ways. Let's look at six possible variations.
Developer A follows the prevalent advice that functions should never return null, but empty strings (and empty collections) instead, in order to simplify code and avoid the dreaded null pointer error. He therefore assumes that customer.getEmail() returns an empty string if there's no email address. The code looks like this:
if ( email.isEmpty() ) {
    writeError ( id + ": no email" );
} else {
    writeInfo ( id + ": " + email );
}
Developer B is a member of a growing tribe of developers who embrace null. Only null should be used to represent the absence of value, and therefore she assumes that customer.getEmail() returns null if there's no email address:
if ( email == null ) {
    writeError ( id + ": no email" );
} else {
    writeInfo ( id + ": " + email );
}
Developer C wants to be on the safe side and tests for null or an empty string:
if ( email == null || email.isEmpty() ) {
    writeError ( id + ": no email" );
} else {
    writeInfo ( id + ": " + email );
}
Developer D also wants to be on the safe side, but he's having a bad day and gets the code wrong:
if ( email.isEmpty() || email == null ) {
    writeError ( id + ": no email" );
} else {
    writeInfo ( id + ": " + email );
}
Note
The above code is wrong because it calls email.isEmpty() before testing for null, which means that a null pointer error is thrown if email points to null.
Developer E also gets it wrong, but not because the order of the operands is wrong. Instead of using the short-circuiting logical OR operator ||, she uses the bitwise inclusive OR operator |, which acts as a non-short-circuiting logical OR operator if both operands are of type boolean. Therefore a null pointer error is thrown if email points to null:
if ( email == null | email.isEmpty() ) {
    writeError ( id + ": no email" );
} else {
    writeInfo ( id + ": " + email );
}
Developer F is having a very bad day and forgets to check if there's an email address:
writeInfo ( id + ": " + email );
Note
If email points to null then the outcome of the above code is language-dependent.
- In Java, appending null to a string appends the string "null", hence a message like C123: null is written.
- Other languages (e.g. C#) append nothing, which results in C123: .
- Some languages might throw an exception at run-time.
- The safest approach (in a null-safe language) is this one: the compiler generates an error, requiring us to decide what to do whenever we try to append a nullable value to a string.
Now let's see what happens at run-time.
Besides considering the above six code variations from the consumer side of customer.getEmail(), we also need to take into account the possible variations on the supplier side.
If there is no email address defined, then customer.getEmail() might:
- return null
- return an empty string
As there are six variations on the consumer side, and two on the supplier side, the (surprisingly high) number of combinations is: 6 x 2 = 12.
The following table shows the outcome for all combinations of supplier/consumer code. You don't need to scrutinize this table — you can just skim over it, because the goal of this example is to provide an idea of the complexity and error-proneness involved in this simple example.
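| Supplier Returns | Consumer Checks     | Output: Correct | Output: Runtime Error | Output: Silently Ignored Error |
|------------------|---------------------|-----------------|-----------------------|--------------------------------|
| empty string     | A: empty            | ✔               |                       |                                |
| empty string     | B: null             |                 |                       | ✗                              |
| empty string     | C: null, then empty | ✔               |                       |                                |
| empty string     | D: empty, then null | ✔               |                       |                                |
| empty string     | E: wrong operator   | ✔               |                       |                                |
| empty string     | F: nothing          |                 |                       | ✗                              |
| null             | A: empty            |                 | ✗                     |                                |
| null             | B: null             | ✔               |                       |                                |
| null             | C: null, then empty | ✔               |                       |                                |
| null             | D: empty, then null |                 | ✗                     |                                |
| null             | E: wrong operator   |                 | ✗                     |                                |
| null             | F: nothing          |                 |                       | ✗                              |
|                  | Count               | 6               | 3                     | 3                              |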
Points of interest:
- The outcome is correct in 50% of the cases (6 of 12).
  There's a 25% chance (3 of 12) for the worst outcome: a silently ignored error.
- Only the code of developer C, which checks for null, then for an empty string, and uses the right operator (||), works correctly in all cases.
- If the code on the supplier or consumer side is changed later on, then the outcome might change too. Code that worked correctly might become buggy, or vice versa.
- If customer.getEmail() sometimes returns null, and sometimes an empty string (depending on the value stored in the database), then the code might work correctly for some customers, but not for others.
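The twelve combinations can also be reproduced programmatically. The following is a minimal Java sketch, not code from the original example: the class name EmailCheckMatrix, the outcome method, and the three outcome labels are assumptions made for illustration. Each developer's check is modeled as a predicate that returns true when it decides "no email", and every predicate is applied to a customer who has no email address.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

public class EmailCheckMatrix {

    // The six consumer-side checks; each returns true when it decides "no email".
    static final Map<String, Predicate<String>> CHECKS = new LinkedHashMap<>();
    static {
        CHECKS.put("A", email -> email.isEmpty());
        CHECKS.put("B", email -> email == null);
        CHECKS.put("C", email -> email == null || email.isEmpty());
        CHECKS.put("D", email -> email.isEmpty() || email == null);
        CHECKS.put("E", email -> email == null | email.isEmpty()); // non-short-circuiting
        CHECKS.put("F", email -> false);                           // no check at all
    }

    // Classifies one supplier/consumer combination for a customer without an email.
    static String outcome(String developer, String suppliedEmail) {
        try {
            boolean reportedMissing = CHECKS.get(developer).test(suppliedEmail);
            return reportedMissing ? "correct" : "silently ignored";
        } catch (NullPointerException e) {
            return "runtime error";
        }
    }

    public static void main(String[] args) {
        for (String supplied : new String[] { "", null }) {
            String label = supplied == null ? "null" : "empty string";
            for (String developer : CHECKS.keySet()) {
                System.out.println(label + " / " + developer + ": " + outcome(developer, supplied));
            }
        }
    }
}
```

Running main prints one line per combination, which makes it easy to re-check the table after changing any of the predicates.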
Note
Some languages support additional values, besides null and empty strings. For example, JavaScript also has undefined. VBScript has four values: Nothing, Empty, Null, and empty strings. When I wrote this article, I initially intended to provide an even more problematic example including undefined, along with null and an empty string. However, because of the exponential increase in combinations, I abandoned this idea swiftly.
The above example demonstrates what you probably knew already: Checking for null and an empty string is cumbersome and error-prone.
Wouldn't it be great if we could get rid of this recurring annoyance?
Testing for the absence of an email address should be straightforward, and there should be only one right way to do it, ideally enforced by the compiler.
Is There a Solution?
"Delete," he said. "Delete, delete, delete."
- Isaacson, Walter. Elon Musk, 2023, p. 402
Because checking for both null and an empty string is a common pattern, C# provides a static String method named IsNullOrEmpty.
Instead of writing:
if ( email == null || email == "" )
... you can simply write:
if ( String.IsNullOrEmpty ( email ) )
Java doesn't provide such a method in its standard library, but some third-party libraries do. For example, Google Guava provides Strings.isNullOrEmpty(), and Apache Commons provides StringUtils.isEmpty().
Such utilities are useful, but no compiler in the world can force us to use them. We are not protected from writing wrong code — all variations shown in the previous section are still allowed. We need a better solution.
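To make the limitation concrete, here is a small Java sketch of such a utility. The class StringChecks and its methods are hypothetical names chosen for this illustration; they are not part of the Java standard library or of any library mentioned above.

```java
public class StringChecks {

    // Hypothetical helper: true if the value is null or has no characters.
    static boolean isNullOrEmpty(String value) {
        return value == null || value.isEmpty();
    }

    // Same logic as the customer example, expressed with the helper.
    static String describe(String id, String email) {
        return isNullOrEmpty(email) ? id + ": no email" : id + ": " + email;
    }

    public static void main(String[] args) {
        System.out.println(describe("C123", null));
        System.out.println(describe("C124", ""));
        System.out.println(describe("C125", "bob@example.com"));
    }
}
```

The helper removes the risk of getting the operand order or the operator wrong, but calling it remains voluntary: code that tests only email.isEmpty(), or nothing at all, still compiles.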
Could we eradicate null and only use empty strings to represent the absence of a string value? If you read my previous articles (especially Union Types in the Practical Type System (PTS) and Null-Safety in the Practical Type System (PTS)), then you know already that this is not an option.
We need null!
What if we eradicated the empty string?
Can we do that?
Should we do that?
Yes and yes!
What might at first look like an unforgivable, barbaric act of destruction will turn out to be a wonderful simplification that increases reliability and makes us sleep better at night!
We can even go a step further.
A string is a sequence/collection of characters or Unicode code points (e.g. "foo" is a collection of the characters 'f', 'o', and 'o'). Hence, if we decide to eradicate the empty string, it's reasonable to ask: Does that mean that we should also eradicate empty collections (list, set, map, etc.)?
Again, the answer is a wholehearted yes, we can and should!
However, as we'll see later, we need to do it properly and keep everything practical.
Important
Remark
This article is part of the How to Design a Practical Type System to Maximize Reliability, Maintainability, and Productivity in Software Development Projects series. My suggestion to eradicate empty strings and empty collections concerns only new languages that implement the Practical Type System (PTS) or a similar paradigm designed for reliability.
I do NOT suggest removing empty strings/collections from existing mainstream languages such as C, C++, C#, Java, JavaScript, Python, and Rust.
In the next section you'll see why we can remove empty strings and empty collections, even though this contradicts established practices and guidelines. After evaluating the pros and cons, it will become clear why we should also remove them. Finally you'll see how it all works in PTS, and a practical example will demonstrate the benefits.
Can We Do It?
If you think that eliminating empty strings and collections is a bad idea, know that you're not alone. We're so used to them that we take them for granted and can't imagine living without them. In this section we'll have a look at some counter-arguments to my suggestion.
Note
Source code examples in this section and the next one are shown in Java, but the concepts discussed are applicable to other programming languages as well.
Argument #1: Empty strings and empty collections are supported in all popular languages and they are used in pretty much all kinds of applications. There must be a good reason for this. We can't eliminate them.
Arguments like "Everybody does it, so it must be right" or "It has always been done like this, so we should do the same" can be flawed.
Staying open-minded for novel ideas, and daring to challenge entrenched concepts (including those that may appear unassailable), is crucial to drive progress.
Note
When I decided to eliminate empty strings and collections in PPL (a proof-of-concept implementation of PTS, now hibernating), I anticipated that I would later regret my idea, after encountering cases that would show me clearly why empty strings and collections are needed (in addition to null). Nevertheless, I decided to just try it out and see what would happen. What did happen was that I never regretted the decision. In the following sections, I'll explain why this turned out to be a beneficial idea (unlike several other ideas that I ultimately had to discard after experimenting with them).
Argument #2: Using empty strings/collections instead of null simplifies code and eliminates some null pointer errors.
Common wisdom dictates that functions ought to return empty strings and collections, instead of null. Lots of articles have been written about this topic, and the advice is supported by many prominent and influential voices. For example, Microsoft states (in Guidelines for Collections): "DO NOT return null values from collection properties or from methods returning collections. Return an empty collection or an empty array instead."
The rationale for this guideline is easy to understand.
Suppose we want to iterate over food in the fridge. If fridge.getFoods() returns an empty collection to represent "no food in the fridge", we can simply write:
for ( Food food : fridge.getFoods() ) {
    System.out.println ( food.toString() );
}
If there's no food in the fridge, the body of the loop won't be executed. We don't need to write:
List<Food> foods = fridge.getFoods();
if ( ! foods.isEmpty() ) {
    for ( Food food : foods ) {
        System.out.println ( food.toString() );
    }
}
On the other hand, if fridge.getFoods() returns null, then a simple loop:
for ( Food food : fridge.getFoods() ) {
    System.out.println ( food.toString() );
}
... results in a null pointer error if there's no food in the fridge (i.e. whenever fridge.getFoods() returns null).
We have to write:
List<Food> foods = fridge.getFoods();
if ( foods != null ) {
    for ( Food food : foods ) {
        System.out.println ( food.toString() );
    }
}
So it looks like using null (instead of an empty collection) does indeed add unnecessary complexity and increase error-proneness, doesn't it?
Yes, but ... this is not the whole story — we have to look at it from a different perspective. We must reconsider this argument in the context of more reliable software development, which is the primary goal of PTS.
For now, suffice it to say that we can use null instead of an empty collection, even if it seems like we shouldn't.
In the next section we'll come back to this essential point.
Argument #3: Sometimes, an empty string/collection has a different meaning than null, and in such cases they must be handled differently in the code. Therefore we need both.
In our introductory example, the meaning of an empty string was the same as for null, because we handled both cases in the same way:
if ( email == null || email.isEmpty() ) {
Whether email is null or empty, the same code is executed: the "then" branch of the if statement. The meaning is the same in both cases: there is no email address defined for the customer.
It turns out that, in practice, the meaning of an empty string/collection and null is always the same, unless we assign them different meanings in specific cases.
For example, we could specify that an empty string means "the customer doesn't have an email address", whereas null means that "we don't know yet whether the customer has an email address or not".
However, this is bad practice, therefore we shouldn't do it. Assigning different meanings to null and an empty string/collection would just be a convention that must be documented and applied everywhere in the code by everybody working on the codebase. This is error-prone, and the compiler can't enforce such conventions.
If we need to differentiate between two (or more) cases then a safe approach is to use different types for the different cases.
For example, consider a function that returns the allergies of a given person. Obviously, it's crucial to differentiate between "this person has no allergies" and "this person has not yet been tested for allergies". It could be tempting to keep it simple and specify that an empty list returned by the function means that the person has no allergies, whereas null means that the person has not yet been tested for allergies. However, this would turn out to be a terrible idea, because every developer needs to be aware of this convention and apply it correctly. Client code would look like this:
List<Allergy> allergies = person.allergies();
if ( allergies == null ) {
    // not yet tested for allergies
} else if ( allergies.isEmpty() ) {
    // no allergies
} else {
    // has allergies
}
Extremely error-prone — we shouldn't do this.
Instead, person.allergies() should return one of three types: AllergiesNotTested, NoAllergies, or HasAllergies (a type that contains a non-empty list of allergies). Client code is now clear and type-safe, other cases could easily be added in the future (e.g. AllergyTestPending), and the compiler checks for cases we might have forgotten. The code looks like this:
switch ( person.allergies() ) {
    case AllergiesNotTested notTested -> {
        // not yet tested for allergies
    }
    case NoAllergies noAllergies -> {
        // no allergies
    }
    case HasAllergies hasAllergies -> {
        // has allergies
    }
}
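The three result types are not defined in the text, so here is one possible way to model them in Java 21, using a sealed interface and records. This is only a sketch under assumptions: the enclosing class AllergyDemo, the interface name AllergyStatus, the describe method, and the use of plain strings instead of an Allergy type are all chosen for illustration.

```java
import java.util.List;

public class AllergyDemo {

    // Closed set of cases; the compiler rejects a switch that misses one of them.
    sealed interface AllergyStatus permits AllergiesNotTested, NoAllergies, HasAllergies {}

    record AllergiesNotTested() implements AllergyStatus {}

    record NoAllergies() implements AllergyStatus {}

    record HasAllergies(List<String> allergies) implements AllergyStatus {
        HasAllergies {
            // "Has allergies" only makes sense with at least one entry.
            if (allergies.isEmpty()) {
                throw new IllegalArgumentException("allergies must not be empty");
            }
            allergies = List.copyOf(allergies); // defensive, immutable copy
        }
    }

    static String describe(AllergyStatus status) {
        return switch (status) {
            case AllergiesNotTested notTested -> "not yet tested for allergies";
            case NoAllergies noAllergies -> "no allergies";
            case HasAllergies hasAllergies -> "allergic to " + String.join(", ", hasAllergies.allergies());
        };
    }

    public static void main(String[] args) {
        System.out.println(describe(new AllergiesNotTested()));
        System.out.println(describe(new NoAllergies()));
        System.out.println(describe(new HasAllergies(List.of("peanuts", "pollen"))));
    }
}
```

Because the interface is sealed, adding a fourth case such as AllergyTestPending makes every switch without a matching branch fail to compile, which is the compiler check described above.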
In a nutshell, code like this:
if ( collection == null ) {
    // one meaning
} else if ( collection.isEmpty() ) {
    // another meaning
} else {
    // there are elements
}
... is a code smell. More precisely, it's a data design smell. It means that different semantics (meanings) have been assigned to the cases "the collection is null" and "the collection is empty," instead of using different types for these different cases.
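We can conclude: null and an empty string/collection have the same meaning, unless we explicitly assign them different meanings.
If we don't do what we shouldn't do, the conclusion can be simplified: null and an empty string/collection always have the same meaning.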
Microsoft puts it like this in Guidelines for Collections: "The general rule is that null and empty (0 item) collections or arrays should be treated the same."
Hence, we never need null and empty collections/strings to represent semantically different cases of the absence of a value.
Note
Can we hastily conclude that the integer zero and null (or the boolean false and null) also have the same meaning, in the same sense that an empty string and null mean the same? No, that would be a terrible fallacy. Zero and null, as well as false and null, have very different meanings. For example, accountBalance = 0 means that there's no money in the account, while accountBalance = null means that we don't know how much money there is in the account.
Argument #4: Sometimes we need mutable collections, and they must be allowed to be empty — for example to implement stacks, queues, deques, etc.
Yes, that's a valid argument. The short answer (in PTS) is that immutable collections cannot be empty, but the standard library also provides mutable collections that can be empty. This will be covered in a later section.
Argument #5: We need empty collections and null whenever we work with libraries and frameworks (possibly written in different languages) that use both.
Working with third-party APIs is not a problem, because we can simply convert data between the "null and empty" and "only null" worlds. Examples will be shown later.
Conclusion
As seen in this section we can eliminate empty strings and empty collections, and use null instead.
However, that doesn't mean yet that we should do it. If Bob can write a big application using only Windows Notepad, it doesn't mean that he should do it.
Should We Do It?
In this section we'll look at pros and cons of eliminating empty strings/collections, starting with the pros.
Potentially Troublesome Values Eliminated
The first PTS article introduced the following PTS Coding Rule: "All data types in a software project should have the lowest possible cardinality."
Reminder
The cardinality of a type is the number of allowed values in the type. For example, type boolean has a cardinality of two, because two values are allowed: true and false.
By eliminating the empty string and empty collections, the cardinality of all strings and all collections in every application has been reduced by one.
That's nice.
Even better, we've eliminated the most troublesome values in these types.
As every experienced developer knows, empty strings and empty collections are often invalid values, or they must be handled differently. For example: each name has at least one character; there's at least one student in each class; every online retailer sells at least one product, etc. Eliminating empty values by design eliminates potential bugs related to these values.
Simpler Code
Remember the source code examples from section What's the Problem?, where six programmers wrote different code, and the only correct version was this one:
if ( email == null || email.isEmpty() ) {
    writeError ( id + ": no email" );
} else {
    writeInfo ( id + ": " + email );
}
By eliminating empty strings, the risk for wrong code in similar cases (common in many projects) has been eliminated. The correct code becomes simpler, and there's only one right way to do it. In a null-safe language, the check for null is even required by the compiler. We have to write:
if ( email == null ) {
    writeError ( id + ": no email" );
} else {
    writeInfo ( id + ": " + email );
}
You don't need to wonder anymore: "Should I check for empty, or null, or both?" We always just have to check for null, and if we forget to do so (on a bad day), the compiler will remind us to do it.
More Reliable Code
The most important point to remember from this article is this: Using null instead of empty strings/collections increases software reliability. Let's see why.
When working with collections, more often than not we need to distinguish between the following two semantically different cases, and we have to handle them individually:
- There are elements in the collection.
- There are no elements in the collection.
Some pseudo-code examples:
if directory_is_not_empty then
    copy_files_in_directory
else
    report_missing_files_error
.

if no_students_in_the_classroom then
    close_the_windows
    switch_off_the_lights
else
    if its_hot then
        open_the_windows
    .
    switch_on_the_lights
.

// real-life example
if there_are_bugs_in_the_code then
    fix_bugs
else
    work_on_new_features
.
Sometimes, we forget to distinguish between these two cases — typically, we forget to write specific code for the "no elements" edge case. This can result in bugs that remain undetected during the development and test phases, especially for edge cases that occur rarely.
The good news is this: If we use null (in a null-safe language) to represent "there are no elements" then we can't ignore these edge cases anymore, because the compiler gently reminds us to handle them. In other words, we are always required to handle both cases (i.e. "there are elements" and "there are no elements"). This eliminates many potential bugs.
If we use empty collections, we are allowed to write code like this:
for ( Object element : collection ) {
    // do something with element
}
An empty collection simply results in "do nothing" behavior. Sometimes this is the right thing to do, but usually it isn't. It's easy to overlook the "there are no elements" edge case. Moreover, if the code was written by somebody else, we don't know if the author intended to do nothing in case of an empty collection.
On the other hand, in a null-safe language that enforces null instead of empty collections, the above code wouldn't compile anymore. We'd have to write:
if ( collection != null ) {
    for ( Object element : collection ) {
        // do something with element
    }
}
... or:
if ( collection != null ) {
    for ( Object element : collection ) {
        // do something with element
    }
} else {
    // handle the "no elements" case
}
Yes, the code is a bit more verbose if no special handling is required for the "no elements" case, but there are two advantages:
- We can't accidentally forget the "there are no elements" edge case.
  On a bad day we could still forget to add an else branch (whenever needed to handle the edge case), but the risk of this bug is largely mitigated since the test for collection != null is required.
- The programmer's intention to handle or ignore the edge case is clearly stated in the code.
You'll later see a practical PTS example demonstrating the benefits.
Note
To see some practical Java examples, you can also read my article Is it Really Better to 'Return an Empty List Instead of null'? - Part 2. That article illustrates wrong outcomes caused by empty collections. For example: a majestic house you can get for free, and a runner-up declared as winner in an election.
Simpler and More Reliable APIs
If strings and collections can't be empty, their APIs become simpler and less error-prone.
For example:
- There's no need for an isEmpty method.
- Method size (aka length, which returns the number of elements in the collection) never returns zero. This eliminates the risk of some bugs, such as a division by zero when computing the average value in a list of numbers.
- Methods first and last always return an element (instead of returning null or throwing an exception if the collection is empty).
- Methods like allMatch, noneMatch, and anyMatch can't return unexpected and debatable results that can lead to subtle bugs in edge cases.
  People have different opinions of what these methods should return if the collection is empty, as can be seen in the following long discussion in the CodeProject lounge: .NET's Sometimes Nonsensical Logic.
- Methods that compute an aggregate value (e.g. sum, average, min, max) also become more straightforward.
  For example, we don't need to debate questions like "What should function average do if the list of numbers is empty? Return zero? Return null? Throw an exception?"
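The points above can be approximated in Java with a small wrapper type. The following is a sketch under stated assumptions, not an existing library class: NonEmptyList, ofOrNull, first, last, and average are names invented for this illustration, and unlike PTS the Java compiler does not force callers to check the result of ofOrNull for null.

```java
import java.util.List;

public final class NonEmptyList<T> {

    private final List<T> elements; // invariant: never empty

    private NonEmptyList(List<T> elements) {
        this.elements = elements;
    }

    // Returns null (instead of an empty list) when there are no elements.
    public static <T> NonEmptyList<T> ofOrNull(List<T> source) {
        return source == null || source.isEmpty()
            ? null
            : new NonEmptyList<>(List.copyOf(source));
    }

    public int size() { return elements.size(); }              // always >= 1
    public T first() { return elements.get(0); }               // always succeeds
    public T last() { return elements.get(elements.size() - 1); }

    // No "empty list" debate: the divisor can never be zero.
    public static double average(NonEmptyList<Integer> numbers) {
        long sum = 0;
        for (int number : numbers.elements) {
            sum += number;
        }
        return (double) sum / numbers.size();
    }
}
```

With this shape there is no isEmpty method to forget, and first, last, and average need no special case for "no elements", because that case is represented by null before the list ever exists.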
More Efficient Code
When it comes to time- and space-efficiency, nothing beats null.
In most languages, null is super fast.
Assigning null to an object reference is typically implemented by simply assigning zero (all bits at 0) to a pointer, and checking for null is done by comparing the pointer to zero. Both are CPU operations that are extremely fast.
Better Code Analysis
If collections can't be empty then loops and iterations involving them are guaranteed to execute at least once. This certainty can be leveraged by advanced compilers (and complementary tools such as static code analyzers) to make assumptions that can't be made if the body of the loop might not be executed. For example, a compiler might be able to optimize target code for loops that are guaranteed to execute at least once.
Disadvantages
So far we've talked about advantages. Are there disadvantages too?
The only disadvantage I can think of is the occasional added verbosity. For example, instead of writing:
int elementCount = collection.size();
boolean weHaveCheese = foodsInFridge.contains ( "cheese" );
... we have to write:
int elementCount = collection == null ? 0 : collection.size();
boolean weHaveCheese = foodsInFridge == null ? false : foodsInFridge.contains ( "cheese" );
Yes, sometimes the code is more verbose — but that's a small price to pay for all the benefits we've seen so far. We can't have our cake and eat it too. On the plus side, the code is also more expressive and (in some cases) less error-prone because, again, the "no elements" edge case is handled explicitly in the code.
Collections in the Physical World
Whenever I struggle to come up with the best way to design data or write code, I often find it useful to look at how things work in the physical world.
So, how do we use collections in the real world?
Consider Bob who collects colorful stones, stored in a box labeled "Stones".
Alice doesn't collect stones. Does this mean she has an empty box labeled "Stones"? No, of course not. Alice simply doesn't have a box.
It's easy to translate this to the digital world: An empty box in the physical world is like an empty list in the digital world; no box at all is like null.
We could come up with many more examples, but after pondering them for a while we would conclude:
- Most collections (lists) in the physical world are immutable and non-empty.
  For example: the set of components in your computer model; the list of ingredients in grandma's Christmas cake; the list of the 2023 Nobel Prize winners, etc.
- Sometimes we use mutable collections which are empty at their outset, get populated over time, and might finally be discarded or kept as immutable, non-empty lists.
  For example: Bob's "Stones" box; the list of students enrolled so far in a language course; the applications installed on your computer, etc.
As you'll see in the next section, PTS collections are designed to work like collections in the physical world.
How Does It Work?
In this section I'm going to show briefly how the concept of no empty strings/collections works in PTS. It's important to note that the approach described here is not the only viable one — it's the one I used in my proof-of-concept implementation of PTS. To keep this section short, many implementation details are left out.
Source code examples in the following sections use the PTS syntax, introduced in previous PTS articles.
Note
Please be aware that PTS is a new paradigm and still a work-in-progress. As explained in the History section of Essence and Foundation of the Practical Type System (PTS), I created a proof-of-concept implementation which is now a bit outdated — therefore you won’t be able to try out the PTS code examples shown in this article.
Non-empty Immutable Collections
PTS strings and collections are immutable and non-empty. The following types are defined in the standard library:
- Type string: an immutable string that cannot be empty (i.e. the string must contain at least one character).
- Types list, set, map, etc.: immutable collections that cannot be empty (i.e. the collection must contain at least one element).
These are the types predominantly used in function signatures (i.e. immutable and non-empty). For example, the following function takes a non-empty, immutable list of non-empty, immutable strings as input, and returns a non-empty, immutable list of integers:
fn foo ( strings list<string> ) -> list<integer>
// body
.
If input and output are allowed to be null (i.e. there might be "no elements"), the signature will contain union types (t1 or t2), like this:
fn foo ( strings list<string> or null ) -> list<integer> or null
// body
.
If the input and output lists are also allowed to contain null elements, the signature is as follows:
fn foo ( strings list<string or null> or null ) -> list<integer or null> or null
// body
.
No Empty Literals
Because strings can't be empty, there is no empty string literal, as shown below:
const name = "Bob" // OK
const name = "" // compile-time error
const name string or null = null // OK
There aren't any empty collection literals either:
const numbers = [list 1 2 3] // OK
const numbers = [list ] // compile-time error
const numbers list<number> or null = null // OK
Immutable Collection Builders
While literals are convenient for hard-coding predefined values, builder types allow us to programmatically create strings and collections. For example, we use a list_builder to build a list.
Builders apply the Builder pattern, commonly used in object-oriented languages. Internally, a builder uses a mutable data structure (e.g. a mutable list) to build the collection. An immutable collection is built in three steps:
- Create a builder object (e.g. list_builder.create)
- Add elements (e.g. builder.append ( ... ))
- Create a non-empty, immutable collection by calling builder.build (or builder.build_or_null if there might be no elements)
Here's an example of a function that creates a range of integers:
fn int_range ( start integer, end integer ) -> list<integer>
    in_check: end >= start

    const builder = list_builder<integer>.create
    repeat from i = start to end
        builder.append ( i )
    .
    return builder.build
.
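Since the PTS code can't be run, here is a rough Java analogue of the three builder steps. It is a sketch under assumptions: ListBuilder, build, buildOrNull, and intRange are hypothetical names mirroring the PTS ones, the range is assumed to include both start and end, and the Java result type is an ordinary unmodifiable List, so non-emptiness is guaranteed only by the builder and not by the type.

```java
import java.util.ArrayList;
import java.util.List;

public final class ListBuilder<T> {

    // Internal mutable structure used while building.
    private final List<T> elements = new ArrayList<>();

    public ListBuilder<T> append(T element) {
        elements.add(element);
        return this;
    }

    // Use when at least one element is expected; fails loudly otherwise.
    public List<T> build() {
        if (elements.isEmpty()) {
            throw new IllegalStateException("cannot build an empty list");
        }
        return List.copyOf(elements);
    }

    // Use when there might be no elements; "no elements" is reported as null.
    public List<T> buildOrNull() {
        return elements.isEmpty() ? null : List.copyOf(elements);
    }

    // Counterpart of int_range: the precondition guarantees at least one element.
    static List<Integer> intRange(int start, int end) {
        if (end < start) {
            throw new IllegalArgumentException("end must be >= start");
        }
        ListBuilder<Integer> builder = new ListBuilder<>();
        for (int i = start; i <= end; i++) {
            builder.append(i);
        }
        return builder.build();
    }
}
```

The precondition check plays the role of in_check: because end >= start, the loop runs at least once, so build can never hit its empty case in intRange.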
Mutable Collections That Can Be Empty
Sometimes we need mutable collections that can be empty. For example: stacks, queues, deques, collections that are populated by several functions, etc.
To keep these data structures efficient and practical, I opted to provide dedicated mutable collections in PTS.
The name of a mutable collection type always starts with the mutable_ prefix, followed by the name of its immutable counterpart. Thus, the standard PTS library provides:
- Type mutable_string: a mutable string that can be empty (i.e. its character length might be zero).
- Types mutable_list, mutable_set, mutable_map, etc.: mutable collections that can be empty (i.e. their size might be zero).
Here's a trivial example of a function that appends one or two elements to a mutable list passed as argument:
fn append_elements ( strings mutable_list<string> )
    if strings.is_empty
        strings.append ( "first" )
    .
    strings.append ( "foo" )
.
We can convert a mutable collection into its immutable counterpart (or null if the collection is empty) by calling method to_immutable_or_null (e.g. return customers_found.to_immutable_or_null).
If the mutable collection is assumed to contain at least one element in a given context, the method to_immutable_or_throw should be used: instead of returning null whenever the mutable collection is empty, this method throws an error (exception).
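As an illustration of the "collect in a mutable list, return immutable or null" pattern, here is a short Java sketch. The class CustomerSearch and the method findByPrefix are hypothetical, and customers are represented by plain name strings to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;

public class CustomerSearch {

    // Returns an immutable, non-empty list of matching names, or null if nothing matches.
    static List<String> findByPrefix(List<String> names, String prefix) {
        List<String> found = new ArrayList<>(); // mutable, may stay empty
        for (String name : names) {
            if (name.startsWith(prefix)) {
                found.add(name);
            }
        }
        // Counterpart of to_immutable_or_null: "nothing found" becomes null.
        return found.isEmpty() ? null : List.copyOf(found);
    }

    public static void main(String[] args) {
        List<String> names = List.of("Alice", "Albert", "Bob");
        System.out.println(findByPrefix(names, "Al"));
        System.out.println(findByPrefix(names, "Z"));
    }
}
```

The mutable list never leaves the method, so callers only ever see either a non-empty immutable list or null.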
Loops Syntax
We can iterate over collections via a classical loop (imperative style), streams (functional style), or recursion. This section covers only the classical loop construct.
Here's a simple example of using a repeat statement to iterate over a collection:
repeat for each number in [list 1 2 3]
    out.write_line ( number.to_string )
.
Output:
1
2
3
In cases where there might be no elements, the compiler requires a check for null:
const commands list<command> or null = get_commands()
if commands is not null
repeat for each command in commands
log_info ( """Executing command {{command.to_string}}.""" )
// code to execute command
.
else
log_warning ( "There are no commands to execute." )
.
The else branch is optional:
const commands list<command> or null = get_commands()
if commands is not null
repeat for each command in commands
log_info ( """Executing command {{command.to_string}}.""" )
// more code
.
.
We can shorten the above code by using the if_null clause in the repeat statement:
repeat for each command in get_commands() if_null: skip
log_info ( """Executing command {{command.to_string}}.""" )
// more code
.
If a collection is declared to be nullable (either explicitly or via type inference), but is supposed to be non-null in a given context, we can use the if_null: throw clause to abort program execution whenever the collection is null, despite our assumption to the contrary:
repeat for each command in get_commands() if_null: throw "'commands' is not supposed to be 'null'."
log_info ( """Executing command {{command.to_string}}.""" )
// more code
.
The above code is a shorthand for the following one; both throw an error if the collection is null:
const commands list<command> or null = get_commands()
if commands is not null
repeat for each command in commands
log_info ( """Executing command {{command.to_string}}.""" )
// more code
.
else
throw null_iterator_in_loop_error.create (
message = "'commands' is not supposed to be 'null'.",
id = "NULL_ITERATOR_IN_LOOP" )
.
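Java has no if_null clause, but the two variants can be imitated with small helper methods. The following is a rough sketch; NullableLoops, forEachOrSkip, and forEachOrThrow are hypothetical names, and IllegalStateException merely stands in for the PTS error type.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical Java helpers imitating the if_null clause of the repeat statement.
final class NullableLoops {

    // Counterpart of "if_null: skip": a null list means there is nothing to iterate over.
    static <T> void forEachOrSkip(List<T> listOrNull, Consumer<? super T> body) {
        if (listOrNull == null) {
            return;
        }
        for (T element : listOrNull) {
            body.accept(element);
        }
    }

    // Counterpart of "if_null: throw": a null list violates the caller's assumption.
    static <T> void forEachOrThrow(List<T> listOrNull, String message, Consumer<? super T> body) {
        if (listOrNull == null) {
            throw new IllegalStateException(message);
        }
        for (T element : listOrNull) {
            body.accept(element);
        }
    }
}
```

The important difference is that the Java compiler does not force callers to use such helpers, whereas the PTS compiler rejects a loop over a nullable collection that lacks a null check or an if_null clause.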
Working with Non-PTS Libraries
Empty strings and collections are ubiquitous in non-PTS libraries. How can we use these libraries in a language where strings and collections can't be empty?
This depends largely on the PTS implementation, but let's consider a PTS implementation that generates Java target code and allows Java source code to be embedded between java and end java statements. Then there are at least three ways to use Java libraries from within a PTS application:
- Convert input/output arguments
Before calling a Java library function that doesn't allow null as input, but requires an empty collection, we need to convert null into an empty collection. Here's an example of PTS code using embedded Java:
java
sendCommands ( commands == null ? Collections.emptyList() : commands );
end java
After calling an external Java function that might return an empty collection, we need to convert the empty collection into null, e.g.:
java
List<Command> commands = getCommands();
if ( commands.isEmpty() ) {
commands = null;
}
end java
If these transformations are needed often, we can create utility functions that serve as wrappers, so that client code remains idiomatic and succinct.
- Use mutable collections that can be empty
Instead of converting collections, an alternative solution is to use mutable PTS collections (covered previously in section Mutable Collections That Can Be Empty) to work with non-PTS libraries.
However, this solution is not recommended, because we lose the advantages of immutable, non-empty collections.
- Use dedicated types to work with non-PTS libraries
A standard PTS library can provide immutable collections that can be empty, intended only for use when working with non-PTS libraries.
For example, in my proof-of-concept PTS implementation, I created type emptyable_string (in addition to string and mutable_string). This specific type was sometimes handy: it simplified the task of working with external Java libraries, especially in rare cases where null and an empty string had different meanings.
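The wrapper functions mentioned under the first solution (converting input/output arguments) could look roughly like this on the Java side. This is a sketch only; the class CollectionBoundary and its method names are hypothetical.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical boundary helpers for calling Java libraries from PTS-style code.
final class CollectionBoundary {

    // For inputs: the library expects an empty list where PTS code uses null.
    static <T> List<T> nullToEmptyList(List<T> listOrNull) {
        return listOrNull == null ? Collections.<T>emptyList() : listOrNull;
    }

    // For outputs: an empty list returned by the library becomes null.
    static <T> List<T> emptyListToNull(List<T> list) {
        return (list == null || list.isEmpty()) ? null : list;
    }

    // Same idea for strings returned by a library.
    static String emptyStringToNull(String string) {
        return (string == null || string.isEmpty()) ? null : string;
    }
}
```

With helpers like these, the conversion happens once at the library boundary, and the rest of the code only ever sees null or a non-empty value.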
Example
Mistakes are a fact of life. It is the response to the error that counts.
— Nikki Giovanni
In this section you'll see a practical example demonstrating the benefits of using a null-safe language that doesn't permit strings and collections to be empty. We'll investigate a simple function under two different paradigms: unsafe and safe.
Unsafe Paradigm
PTS is null-safe and immutable strings and collections can't be empty. However, for the purpose of this exercise, let's first suppose that PTS was designed like many other languages: it is not null-safe, and strings and collections can be empty.
Now imagine we want to compute the average length of remarks entered by users in a data entry form. Consider the following PTS function to achieve this:
fn average_length_of_remarks ( remarks list<string> ) -> decimal
variable sum = 0.0
repeat for each remark in remarks
sum = sum + remark.length
.
return sum / remarks.size
.
The code would look similar in many other languages. Here's an example in Java:
static double averageLengthOfRemarks ( List<String> remarks ) {
double sum = 0.0;
for ( String remark : remarks ) {
sum+= remark.length();
}
return sum / remarks.size();
}
Note
We could also employ a functional style in PTS and Java. Using Java streams, for example, the code would be:
static double averageLengthOfRemarks2 ( List<String> remarks ) {
return (double) remarks
.stream()
.mapToInt ( String::length )
.sum() / remarks.size();
}
However, whether we use an imperative or functional style is irrelevant for the topic at hand. We'll continue this exercise using a classical loop.
If we call the above PTS function with [list "f" "fo" "foo"] as input, it returns the correct result: 2.0.
Unfortunately, although the function is small and simple, it has three problems:
- If the function is called with remarks = null, a null pointer run-time error occurs in the repeat statement (for statement in the Java version).
- If the function is called with an empty list as input, a division by zero error occurs in the last statement (because remarks.size is zero).
- Assuming that empty remarks should be ignored, the result will be wrong if input argument remarks contains empty strings. For example, calling the function with [list "foo" "" "foo"] returns 2.0 (=6/3) instead of 3.0 (=6/2).
A silently ignored error occurs (worst-case).
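To make the three problems concrete, here is a small sketch that repeats the Java version of the function shown earlier (the class name UnsafeAverage is only illustrative). One Java-specific detail is worth noting: because sum is a double, dividing by zero does not throw in Java but yields NaN, so the second problem also turns into a silent error there.

```java
import java.util.List;

// Same logic as the Java version above, wrapped in an illustrative class.
final class UnsafeAverage {

    static double averageLengthOfRemarks(List<String> remarks) {
        double sum = 0.0;
        for (String remark : remarks) { // throws NullPointerException if remarks is null
            sum += remark.length();
        }
        return sum / remarks.size();    // 0.0 / 0 evaluates to NaN for an empty list
    }
}
```

Calling averageLengthOfRemarks(null) throws a NullPointerException, an empty list produces NaN, and a list containing "foo", "" and "foo" returns 2.0, although 3.0 would be expected if empty remarks are meant to be ignored.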
Safe Paradigm
Now let's see what actually happens in PTS (which is null-safe and doesn't permit empty strings and collections):
- A function call with remarks = null is not permitted and results in a compile-time error, because all types are non-nullable by default.
- We are also not permitted to call the function with an empty list as input, because collections can't be empty. Hence, a division by zero error cannot occur.
- Input argument remarks cannot contain empty strings, because PTS strings are non-empty too. Therefore, a silently ignored error due to empty remarks cannot occur.
As you can see, all three problems have disappeared. The above function doesn't support the "no remarks" and "some empty remarks" cases, and the compiler ensures that nobody calls the function with invalid input. The function requires a non-null, non-empty list of remarks, and empty remarks are not allowed in the list.
Now suppose that the function should handle empty remarks. Here's the new version:
fn average_length_of_remarks ( remarks list<string or null> ) -> decimal or null
variable sum = 0.0
variable remarks_count = 0
repeat for each remark in remarks
if remark is not null
sum = sum + remark.length
remarks_count = remarks_count + 1
.
.
return if remarks_count =v 0 then null else sum / remarks_count
.
Note that:
- The type of elements contained in parameter remarks changed from string to the union type string or null, which explicitly states that empty remark fields are now supported.
- The check if remark is not null is required by the compiler because PTS is null-safe, and remark.length would result in a null pointer error if remark were null at run-time.
- The expression for the return-value if remarks_count =v 0 then null else sum / remarks_count is also required by the compiler. If we simply wrote return sum / remarks_count, the code would not compile, because remarks_count might be zero and thus cause a division by zero (note that this is an advanced compiler feature beyond the scope of this article).
Because the function might return null, we are also required to change the output type from decimal to decimal or null.
This, in turn, means that the caller of the function can't forget to handle the edge case where all remark fields are empty, and therefore an average length of remarks could not be computed. We're always safe.
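For comparison, the same logic can be written in Java as sketched below (SafeAverage is a hypothetical class name). The sketch mirrors only the run-time behavior: the Java compiler enforces neither the null check on each remark nor the handling of a null result by the caller.

```java
import java.util.List;

// Java rendering of the PTS function that ignores empty (null) remarks.
final class SafeAverage {

    // Returns null if every remark is null, i.e. if no average can be computed.
    static Double averageLengthOfRemarks(List<String> remarks) {
        double sum = 0.0;
        int remarksCount = 0;
        for (String remark : remarks) {
            if (remark != null) {
                sum += remark.length();
                remarksCount++;
            }
        }
        return remarksCount == 0 ? null : sum / remarksCount;
    }
}
```

Called with "foo", null and "foo", this version returns 3.0; called with a list that contains only null, it returns null.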
What about variable remarks_count, which is needed to compute the correct result if there are empty remark fields? Suppose we incorrectly wrote:
fn average_length_of_remarks ( remarks list<string or null> ) -> decimal
variable sum = 0.0
repeat for each remark in remarks
if remark is not null
sum = sum + remark.length
.
.
return sum / remarks.size
.
Does the compiler report a bug?
No, it doesn't, because the above code would be correct if we actually wanted empty remarks to be included in the computation of their average length.
As said already in a previous PTS article, the slogan "if it compiles, it works" is just wishful thinking. We still need to write unit tests to detect logical bugs.
Now let's consider the "there are no remarks" edge case. The function doesn't currently handle this case. Therefore, the caller of the function is required to explicitly handle this case (if it occurs). However, if we want the function itself to also handle this case, we can easily do this:
fn average_length_of_remarks ( remarks list<string or null> or null ) -> decimal or null
if remarks is null then return null
// rest of code
.
Now the function returns null when called with remarks = null.
Conclusion
Hopefully, these simple examples demonstrate the benefits of a null-safe language that doesn't permit strings and collections to be empty.
Imagine a big application with hundreds or even thousands of edge cases similar to the above ones. Having a compiler that spots all edge cases, including the most elusive ones, is like having a helpful, reliable companion, always by our side, tapping on our shoulder from time to time and telling us: "Look! Here's an edge case that needs to be handled or ignored explicitly."
Nikki Giovanni wisely said:
"Mistakes are a fact of life. It is the response to the error that counts."
We can easily apply this insight to the field of software development:
"Coding mistakes are a fact of life. It is the compiler response to the error that counts."
Fail fast!
Other Languages
This article focuses on languages that support null and ensure null-safety. What about other programming languages? Would it still make sense to apply the ideas presented in this article?
Non-null-Safe Languages
Many popular programming languages support null, but aren't null-safe. Should new languages applying this paradigm also restrict immutable strings/collections to be non-empty?
Yes, if software reliability matters.
The difference between a null-safe and a non-null-safe language is this: In a null-safe language we get a compile-time error if null is handled incorrectly in the code; in a non-null-safe language we get a run-time error instead (the dreaded null pointer error) whenever a pointer to null is dereferenced.
Note
In some languages, dereferencing a null pointer results in undefined behavior (instead of throwing a null pointer error). In this section we'll only consider languages that throw null pointer errors.
A null pointer error is "bad". But a silently ignored error due to an empty string or collection is "very bad". "Bad" is better than "very bad". It's generally much better for an application to immediately throw a null pointer error and abort program execution than to ignore the problem and silently continue execution with polluted or corrupted data that crawls through the system and, sooner or later (maybe much later), risks causing a mysterious bug that's difficult to identify and fix.
Many people hate null pointer errors, but these errors are in fact very useful, because:
-
They support the Fail-Fast! principle (at run-time) and the bug is therefore more likely to be discovered early in the development process. The program crashes immediately and noisily, instead of ignoring the problem and silently continuing execution with wrong/corrupted data.
The final outcome of a null pointer error is usually less severe and more predictable.
Imagine a nightmarish scenario like this one: Application A writes wrong data into a database. Later on, this data is read and handled by applications B, C, and D. Although the bug originated in application A, it manifests in applications B, C, and D.
-
Null pointer errors are usually easy to identify and fix, because their cause and effect are close together.
-
Helpful tools capable of identifying potential null pointer errors in the codebase are available for most popular programming languages.
Languages That Use an Optional/Maybe Type
Some languages don't support null. Instead, they provide a dedicated type for cases where a value might be absent. For example, Rust provides the Option type, F# the Option monad, and Haskell the Maybe monad.
The idea of non-empty strings and collections can also be applied in new languages using this approach. Instead of empty strings/collections, the None/Nothing values of the Option/Maybe types are used, respectively, and pattern matching ensures that we distinguish between the "no elements" and "one or more elements" cases, thereby reaping benefits akin to those demonstrated in this article.
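Java's Optional type can give a rough feel for this approach, even though Java lacks the exhaustive pattern matching of Rust, F#, or Haskell. In the following sketch, OptionalRemarks, nonEmpty, and describe are hypothetical names, and the "list inside the Optional is never empty" rule holds only by convention.

```java
import java.util.List;
import java.util.Optional;

// Sketch: Optional.empty() plays the role of None/Nothing ("no elements").
final class OptionalRemarks {

    // Wraps a list so that "no elements" is always Optional.empty(),
    // never an empty list inside the Optional.
    static <T> Optional<List<T>> nonEmpty(List<T> list) {
        return (list == null || list.isEmpty()) ? Optional.empty() : Optional.of(list);
    }

    // Both cases must be addressed to obtain a String from the Optional.
    static String describe(Optional<List<String>> remarks) {
        return remarks
            .map(list -> list.size() + " remark(s)")
            .orElse("no remarks");
    }
}
```

For an empty input list, describe returns "no remarks"; for a list with two elements, it returns "2 remark(s)".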
Summary
Preventing immutable strings and collections from being empty provides the following advantages:
- More reliable code because more bugs can be found at compile-time. The compiler reminds us to handle edge cases we might easily overlook.
- Simpler and less error-prone code because there is no need to test for "null or empty".
- Simpler and less error-prone APIs for non-empty, immutable string and collection types (e.g. element_count never returns zero, hence no risk of a division by zero).
These advantages are particularly valuable when working on large codebases.
The following approach is used in PTS:
- Strings and collections are immutable and can't be empty.
- Instead of empty strings/collections, null is used to represent "no elements".
- Builders are used to build (create) immutable strings and collections.
- PTS also provides mutable strings and collections that can be empty, but they are used rarely.
Acknowledgments
Many thanks to Tristano Ajmone for his useful feedback to improve this article.
History
- 2024-06-28: initial version
- 2024-07-02: fixed 'Developer E' example