
We Should Eradicate the Empty String — Here's Why

27 Jun 2024 · CPOL · 34 min read
Eradicating empty strings (and empty collections) increases software reliability and leads to other advantages.
This article explains why immutable strings and collections shouldn't be permitted to be empty. It also shows how this concept is implemented in the Practical Type System (PTS). Several source code examples demonstrate the benefits.


[Image: Empty String vs Null]

Introduction

Have you ever wondered what the difference is between an empty string and null?

For example, what's the difference between:

Java
email = "";

... and:

Java
email = null;

Do both statements mean the same thing?

What are the pros and cons of using an empty string vs using null?

Do we need both versions, or could we get rid of one to simplify things?

Let's see.

What's the Problem?

Before revealing the answers to the questions raised in the previous section, it's helpful to first investigate the infamous "test for empty and/or null?" problem, encountered in many popular programming languages (C-family languages, Java, JavaScript, Python, etc.). Chances are you have encountered this problem firsthand, in various situations.

Consider a service provider company that needs to send an important email to inform all customers about an upcoming change of conditions. Before sending the email, the staff wants to check the customers' email addresses, and ensure that each customer has an email address defined in the database.

Here's an excerpt of code that does the job, written in Java (the code would be similar in other languages):

Java
for ( Customer customer : customers ) {
    String id = customer.getId();
    String email = customer.getEmail();
    if ( /* TODO: no email address defined */ ) {
        writeError ( id + ": no email" );
    } else {
        writeInfo ( id + ": " + email );
    }
}

The interesting part in this code is the if statement that checks if an email address exists for the customer.

Practice shows that different developers are likely to check for the absence of an email address in different ways. Let's look at six possible variations.

Developer A follows the prevalent advice that functions should return empty strings (and empty collections) instead of null, in order to simplify code and avoid the dreaded null pointer error. Therefore he assumes that customer.getEmail() returns an empty string if there's no email address. The code looks like this:

Java
if ( email.isEmpty() ) {
    writeError ( id + ": no email" );
} else {
    writeInfo ( id + ": " + email );
}

Developer B is a member of a growing tribe of developers who embrace null: only null should be used to represent the absence of a value. Therefore she assumes that customer.getEmail() returns null if there's no email address:

Java
if ( email == null ) {
    writeError ( id + ": no email" );
} else {
    writeInfo ( id + ": " + email );
}

Developer C wants to be on the safe side and tests for null or an empty string:

Java
if ( email == null || email.isEmpty() ) {
    writeError ( id + ": no email" );
} else {
    writeInfo ( id + ": " + email );
}

Developer D also wants to be on the safe side, but he's having a bad day and gets the code wrong:

Java
if ( email.isEmpty() || email == null ) {
    writeError ( id + ": no email" );
} else {
    writeInfo ( id + ": " + email );
}

Note

The above code is wrong because it first tests email.isEmpty(), before testing for null, which means that a null pointer error is thrown if email points to null.

Developer E also gets it wrong, but not because the order of the operands is wrong. Instead of using the short-circuiting logical OR operator ||, she uses the bitwise inclusive OR operator |, which has the effect of a non-short-circuiting logical OR operator if both operands are of type boolean. Therefore a null pointer error is thrown if email points to null:

Java
if ( email == null | email.isEmpty() ) {
    writeError ( id + ": no email" );
} else {
    writeInfo ( id + ": " + email );
}

Developer F is having a very bad day and forgets to check if there's an email address:

Java
writeInfo ( id + ": " + email );

Note

If email points to null, then the outcome of the above code is language-dependent.

  • In Java, appending null to a string appends the string "null" — hence a message like C123: null is written.

  • Other languages (e.g. C#) append nothing, which results in C123: .

  • Some languages might throw an exception at run-time.

  • The safest approach (in a null-safe language) is this one: the compiler generates an error, requiring us to decide what to do whenever we try to append a nullable value to a string.

Now let's see what happens at run-time.

Besides considering the above six code variations from the consumer side of customer.getEmail(), we also need to take into account the possible variations on the supplier side.

If there is no email address defined, then customer.getEmail() might:

  • return null

  • return an empty string

As there are six variations on the consumer side, and two on the supplier side, the (surprisingly high) number of combinations is: 6 x 2 = 12.

The following table shows the outcome for all combinations of supplier/consumer code. You don't need to scrutinize this table — you can just skim over it, because the goal of this example is to provide an idea of the complexity and error-proneness involved in this simple example.

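| Supplier Returns | Consumer Checks     | Correct | Runtime Error | Silently Ignored Error |
|------------------|---------------------|---------|---------------|------------------------|
| empty string     | A: empty            | ✓       |               |                        |
| empty string     | B: null             |         |               | ✓                      |
| empty string     | C: null, then empty | ✓       |               |                        |
| empty string     | D: empty, then null | ✓       |               |                        |
| empty string     | E: wrong operator   | ✓       |               |                        |
| empty string     | F: nothing          |         |               | ✓                      |
| null             | A: empty            |         | ✓             |                        |
| null             | B: null             | ✓       |               |                        |
| null             | C: null, then empty | ✓       |               |                        |
| null             | D: empty, then null |         | ✓             |                        |
| null             | E: wrong operator   |         | ✓             |                        |
| null             | F: nothing          |         |               | ✓                      |
| Count            |                     | 6       | 3             | 3                      |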

Points of interest:

  • The outcome is correct in 50% of the cases.

    There's a 25% chance for the worst outcome: a silently ignored error.

  • Only the code of developer C, which checks for null, then for an empty string, and uses the right operator (||), works correctly in all cases.

  • If the code on the supplier- or consumer-side is changed later on, then the outcome might change too. Code that worked correctly might become buggy, or vice versa.

  • If customer.getEmail() sometimes returns null, and sometimes an empty string (depending on the value stored in the database), then the code might work correctly for some customers, but not for others.
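To double-check the table, the twelve combinations can be replayed in a small Java program. This is a sketch: each consumer variant is modeled as a lambda that returns the line it would write, the customer id C123 is hypothetical, and an outcome counts as correct only when the "no email" line is produced (in every combination the email address is absent, so any other line is a silently ignored error).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

public class EmptyVsNullMatrix {

    static final String NO_EMAIL = "C123: no email"; // hypothetical customer id

    // Consumer variants A..F, each returning the line it would write.
    static Map<String, Function<String, String>> consumers() {
        Map<String, Function<String, String>> variants = new LinkedHashMap<>();
        variants.put("A: empty", email -> email.isEmpty() ? NO_EMAIL : "C123: " + email);
        variants.put("B: null", email -> email == null ? NO_EMAIL : "C123: " + email);
        variants.put("C: null, then empty",
            email -> (email == null || email.isEmpty()) ? NO_EMAIL : "C123: " + email);
        variants.put("D: empty, then null",
            email -> (email.isEmpty() || email == null) ? NO_EMAIL : "C123: " + email);
        variants.put("E: wrong operator",
            email -> (email == null | email.isEmpty()) ? NO_EMAIL : "C123: " + email);
        variants.put("F: nothing", email -> "C123: " + email);
        return variants;
    }

    // The email is always absent here, so only the "no email" line is correct.
    static String classify(Function<String, String> consumer, String email) {
        try {
            return NO_EMAIL.equals(consumer.apply(email)) ? "Correct" : "Silently Ignored Error";
        } catch (NullPointerException e) {
            return "Runtime Error";
        }
    }

    // Runs every supplier/consumer combination, prints one row each, and counts outcomes.
    static Map<String, Integer> countOutcomes() {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String email : new String[] { "", null }) { // the two supplier variants
            for (Map.Entry<String, Function<String, String>> c : consumers().entrySet()) {
                String outcome = classify(c.getValue(), email);
                counts.merge(outcome, 1, Integer::sum);
                System.out.printf("%-12s | %-19s | %s%n",
                    email == null ? "null" : "empty string", c.getKey(), outcome);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Last line printed: {Correct=6, Silently Ignored Error=3, Runtime Error=3}
        System.out.println(countOutcomes());
    }
}
```

The counts match the table: 6 correct, 3 runtime errors, and 3 silently ignored errors.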

Note

Some languages support additional values besides null and empty strings. For example, JavaScript also has undefined. VBScript has four values: Nothing, Empty, Null, and empty strings. When I wrote this article, I initially intended to provide an even more problematic example including undefined, along with null and an empty string. However, because of the rapid growth in the number of combinations, I swiftly abandoned this idea.

The above example demonstrates what you probably knew already: Checking for null and an empty string is cumbersome and error-prone.

Wouldn't it be great if we could get rid of this recurring annoyance?

Testing for the absence of an email address should be straightforward, and there should be only one right way to do it, ideally enforced by the compiler.

Is There a Solution?

"Delete," he said. "Delete, delete, delete."

— Isaacson, Walter. Elon Musk, 2023, pp. 402

Because checking for both null and an empty string is a common pattern, C# provides a static String method named IsNullOrEmpty.

Instead of writing:

C#
if ( email == null || email == "" )

... you can simply write:

C#
if ( String.IsNullOrEmpty ( email ) )

Java doesn't provide such a method in its standard library, but some third-party libraries do. For example, Google Guava provides Strings.isNullOrEmpty(), and Apache Commons provides StringUtils.isEmpty().

Such utilities are useful, but no compiler in the world can force us to use them. We are not protected from writing wrong code — all variations shown in the previous section are still allowed. We need a better solution.

Could we eradicate null and only use empty strings to represent the absence of a string value? If you have read my previous articles (especially Union Types in the Practical Type System (PTS) and Null-Safety in the Practical Type System (PTS)), then you already know that this is not an option.

We need null!

What if we eradicated the empty string?

Can we do that?

Should we do that?

Yes and yes!

What might at first look like an unforgivable, barbaric act of destruction will turn out to be a wonderful simplification that increases reliability and helps us sleep better at night!

We can even go a step further.

A string is a sequence/collection of characters or Unicode code points (e.g. "foo" is a collection of the characters 'f', 'o', and 'o'). Hence, if we decide to eradicate the empty string, it's reasonable to ask: Does that mean that we should also eradicate empty collections (list, set, map, etc.)?

Again, the answer is a wholehearted yes, we can and should!

However, as we'll see later, we need to do it properly and keep everything practical.

Important Remark

This article is part of the How to Design a Practical Type System to Maximize Reliability, Maintainability, and Productivity in Software Development Projects series. My suggestion to eradicate empty strings and empty collections concerns only new languages that implement the Practical Type System (PTS) or a similar paradigm designed for reliability.

I do NOT suggest removing empty strings/collections from existing mainstream languages such as C, C++, C#, Java, JavaScript, Python, and Rust.

In the next section you'll see why we can remove empty strings and empty collections, even though this contradicts established practices and guidelines. After evaluating the pros and cons, it will become clear why we should also remove them. Finally, you'll see how it all works in PTS, and a practical example will demonstrate the benefits.

Can We Do It?

If you think that eliminating empty strings and collections is a bad idea, know that you're not alone. We're so used to them that we take them for granted and can't imagine living without them. In this section we'll have a look at some counter-arguments to my suggestion.

Note

Source code examples in this section and the next one are shown in Java, but the concepts discussed are applicable to other programming languages as well.


Argument #1: Empty strings and empty collections are supported in all popular languages and they are used in pretty much all kinds of applications. There must be a good reason for this. We can't eliminate them.

Arguments like "Everybody does it, so it must be right" or "It has always been done like this, so we should do the same" can be flawed.

Staying open to novel ideas, and daring to challenge entrenched concepts (including those that may appear unassailable), is crucial to driving progress.

Note

When I decided to eliminate empty strings and collections in PPL (a proof-of-concept implementation of PTS, now hibernating), I anticipated that I would later regret my idea, after encountering cases that would show me clearly why empty strings and collections are needed (in addition to null). Nevertheless, I decided to just try it out and see what would happen. What did happen was that I never regretted the decision. In the following sections, I'll explain why this turned out to be a beneficial idea (unlike several other ideas that I ultimately had to discard after experimenting with them).


Argument #2: Using empty strings/collections instead of null simplifies code and eliminates some null pointer errors.

Common wisdom dictates that functions ought to return empty strings and collections, instead of null. Lots of articles have been written about this topic, and the advice is supported by many prominent and influential voices. For example, Microsoft states (in Guidelines for Collections): "DO NOT return null values from collection properties or from methods returning collections. Return an empty collection or an empty array instead."

The rationale for this guideline is easy to understand.

Suppose we want to iterate over food in the fridge. If fridge.getFoods() returns an empty collection to represent "no food in the fridge", we can simply write:

Java
for ( Food food : fridge.getFoods() ) {
    System.out.println ( food.toString() );
}

If there's no food in the fridge, the body of the loop won't be executed. We don't need to write:

Java
List<Food> foods = fridge.getFoods();
if ( ! foods.isEmpty() ) {
    for ( Food food : foods ) {
        System.out.println ( food.toString() );
    }
}

On the other hand, if fridge.getFoods() returns null, then a simple loop:

Java
for ( Food food : fridge.getFoods() ) {
    System.out.println ( food.toString() );
}

... results in a null pointer error if there's no food in the fridge (i.e. whenever fridge.getFoods() returns null).

We have to write:

Java
List<Food> foods = fridge.getFoods();
if ( foods != null ) {
    for ( Food food : foods ) {
        System.out.println ( food.toString() );
    }
}

Obviously, it looks like using null (instead of an empty collection) does indeed add unnecessary complexity and increases error-proneness, doesn't it?

Yes, but ... this is not the whole story — we have to look at it from a different perspective. We must reconsider this argument in the context of more reliable software development, which is the primary goal of PTS.

For now, suffice it to say that we can use null instead of an empty collection, even if it seems like we shouldn't.

In the next section we'll come back to this essential point.


Argument #3: Sometimes, an empty string/collection has a different meaning than null, and in such cases they must be handled differently in the code. Therefore we need both.

In our introductory example, the meaning of an empty string was the same as for null, because we handled both cases in the same way:

Java
if ( email == null || email.isEmpty() ) {

Whether email is null or empty, the same code is executed: the "then" branch of the if statement. The meaning is the same in both cases: there is no email address defined for the customer.

It turns out that, in practice, the meaning of an empty string/collection and null is always the same, unless we assign them different meanings in specific cases.

For example, we could specify that an empty string means "the customer doesn't have an email address", whereas null means that "we don't know yet whether the customer has an email address or not".

However, this is bad practice, so we shouldn't do it. Assigning different meanings to null and an empty string/collection would just be a convention that must be documented and applied everywhere in the code by everybody working on the codebase. This is error-prone, and the compiler can't enforce such conventions.

If we need to differentiate between two (or more) cases then a safe approach is to use different types for the different cases.

For example, consider a function that returns the allergies of a given person. Obviously, it's crucial to differentiate between "this person has no allergies" and "this person has not yet been tested for allergies". It could be tempting to keep it simple and specify that an empty list returned by the function means that the person has no allergies, whereas null means that the person has not yet been tested for allergies. However, this would turn out to be a terrible idea, because every developer needs to be aware of this convention and apply it correctly. Client code would look like this:

Java
List<Allergy> allergies = person.allergies();
if ( allergies == null ) {
    // the person has not yet been tested for allergies
} else if ( allergies.isEmpty() ) {
    // the person has no allergies
} else {
    // the person has allergies
}

Extremely error-prone — we shouldn't do this.

Instead, person.allergies() should return one of three types: AllergiesNotTested, NoAllergies, or HasAllergies (a type that contains a non-empty list of allergies). Client code is now clear and type-safe, other cases could easily be added in the future (e.g. AllergyTestPending), and the compiler checks for cases we might have forgotten. The code looks like this:

Java
switch ( person.allergies() ) {
    case AllergiesNotTested notTested -> {
        // the person has not yet been tested for allergies
    }
    case NoAllergies noAllergies -> {
        // the person has no allergies
    }
    case HasAllergies hasAllergies -> {
        // the person has allergies
    }
}
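For a switch like the one above to compile and be checked for exhaustiveness, the three result types must form a closed hierarchy. The following is a minimal sketch assuming Java 21 or later (sealed interfaces, records, and pattern matching for switch); the names AllergyStatus and Allergy, and the describe method, are hypothetical. Because AllergyStatus is sealed, omitting one of the three cases in describe is a compile-time error, and adding a new permitted type such as AllergyTestPending would make the compiler flag every switch that doesn't handle it yet.

```java
import java.util.List;

public class AllergyExample {

    // Hypothetical allergy type, for illustration only.
    record Allergy(String name) {}

    // Sealed: the compiler knows every permitted subtype.
    sealed interface AllergyStatus permits AllergiesNotTested, NoAllergies, HasAllergies {}

    record AllergiesNotTested() implements AllergyStatus {}

    record NoAllergies() implements AllergyStatus {}

    // Holds a list that must contain at least one allergy.
    record HasAllergies(List<Allergy> allergies) implements AllergyStatus {
        HasAllergies {
            if (allergies == null || allergies.isEmpty()) {
                throw new IllegalArgumentException("HasAllergies requires at least one allergy");
            }
            allergies = List.copyOf(allergies); // defensive, immutable copy
        }
    }

    // Exhaustive switch: no default branch is needed (or wanted).
    static String describe(AllergyStatus status) {
        return switch (status) {
            case AllergiesNotTested notTested -> "not yet tested";
            case NoAllergies noAllergies -> "no allergies";
            case HasAllergies has -> "allergies: " + has.allergies().size();
        };
    }

    public static void main(String[] args) {
        System.out.println(describe(new AllergiesNotTested()));
        System.out.println(describe(new NoAllergies()));
        System.out.println(describe(new HasAllergies(List.of(new Allergy("peanuts")))));
    }
}
```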

In a nutshell, code like this:

Java
if ( collection == null ) {
    // do this
} else if ( collection.isEmpty() ) {
    // do something else 
} else {
    // ...
}

... is a code smell. More precisely, it's a data design smell. It means that different semantics (meanings) have been assigned to the cases "the collection is null" and "the collection is empty," instead of using different types for these different cases.

We can conclude:

  • The meaning of an empty collection (or an empty string) and null is the same, unless we assign different meanings in specific cases, but we shouldn't do this.

If we don't do what we shouldn't do, the conclusion can be simplified:

  • The meaning of an empty collection (or an empty string) and null is the same. They are handled the same way in the code.

Microsoft puts it like this in Guidelines for Collections: "The general rule is that null and empty (0 item) collections or arrays should be treated the same."

Hence, we never need null and empty collections/strings to represent semantically different cases of the absence of a value.

Note

Can we hastily conclude that the integer zero and null (or the boolean false and null) also have the same meaning, in the same sense that an empty string and null mean the same? No, that would be a terrible fallacy. Zero and null, as well as false and null, have very different meanings. For example, accountBalance = 0 means that there's no money in the account, while accountBalance = null means that we don't know how much money there is in the account.


Argument #4: Sometimes we need mutable collections, and they must be allowed to be empty — for example to implement stacks, queues, deques, etc.

Yes, that's a valid argument. The short answer (in PTS) is that immutable collections cannot be empty, but the standard library also provides mutable collections that can be empty. This will be covered in a later section.


Argument #5: We need empty collections and null whenever we work with libraries and frameworks (possibly written in different languages) that use both.

Working with third-party APIs is not a problem, because we can simply convert data between the "null and empty" and "only null" worlds. Examples will be shown later.
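A minimal sketch of such conversions in Java, applied at the boundary to a third-party API. The class NullEmptyBridge and its method names are hypothetical; for strings, Guava's Strings.emptyToNull and Strings.nullToEmpty perform the same conversions.

```java
import java.util.Collections;
import java.util.List;

public class NullEmptyBridge {

    // Incoming data: an API that uses "" for "no value" is normalized to null.
    static String emptyToNull(String s) {
        return (s == null || s.isEmpty()) ? null : s;
    }

    // Incoming data: an empty list from an API is normalized to null.
    static <T> List<T> emptyToNull(List<T> list) {
        return (list == null || list.isEmpty()) ? null : list;
    }

    // Outgoing data: an API that expects "" instead of null.
    static String nullToEmpty(String s) {
        return s == null ? "" : s;
    }

    // Outgoing data: an API that expects an empty list instead of null.
    static <T> List<T> nullToEmpty(List<T> list) {
        return list == null ? Collections.emptyList() : list;
    }
}
```

If every value crossing the boundary goes through these conversions, code inside the application only ever has to test for null.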


Conclusion

As seen in this section, we can eliminate empty strings and empty collections, and use null instead.

However, that doesn't yet mean that we should do it. If Bob can write a big application using only Windows Notepad, that doesn't mean he should.

Should We Do It?

In this section we'll look at the pros and cons of eliminating empty strings/collections, starting with the pros.

Potentially Troublesome Values Eliminated

The first PTS article introduced the following PTS Coding Rule: "All data types in a software project should have the lowest possible cardinality."

Reminder

The cardinality of a type is the number of allowed values in the type. For example, type boolean has a cardinality of two, because two values are allowed: true and false.

By eliminating the empty string and empty collections, the cardinality of all strings and all collections in every application has been reduced by one.

That's nice.

Even better, we've eliminated the most troublesome values in these types.

As every experienced developer knows, empty strings and empty collections are often invalid values, or they must be handled differently. For example: each name has at least one character; there's at least one student in each class; every online retailer sells at least one product, etc. Eliminating empty values by design eliminates potential bugs related to these values.

Simpler Code

Remember the source code examples from section What's the Problem?, where six programmers wrote different code, and the only correct version was this one:

Java
if ( email == null || email.isEmpty() ) {
    writeError ( id + ": no email" );
} else {
    writeInfo ( id + ": " + email );
}

By eliminating empty strings, we eliminate the risk of wrong code in similar cases (which are common in many projects). The correct code becomes simpler, and there's only one right way to write it. In a null-safe language, the compiler even requires the check for null. We have to write:

Java
if ( email == null ) {
    writeError ( id + ": no email" );
} else {
    writeInfo ( id + ": " + email );
}

You don't need to wonder anymore: "Should I check for empty, or null, or both?" We always just have to check for null, and if we forget to do so (on a bad day), the compiler will remind us to do it.

More Reliable Code

The most important point to remember from this article is this: Using null instead of empty strings/collections increases software reliability. Let's see why.

When working with collections, more often than not we need to distinguish between the following two semantically different cases, and we have to handle them individually:

  • There are no elements.

  • There are one or more elements.

Some pseudo-code examples:

if directory_is_not_empty then
    copy_files_in_directory
else
    report_missing_files_error
.

if no_students_in_the_classroom then
    close_the_windows
    switch_off_the_lights
else
    if its_hot then
        open_the_windows
    .
    switch_on_the_lights
.

// real-life example
if there_are_bugs_in_the_code then
    fix_bugs
else
    work_on_new_features
.

Sometimes, we forget to distinguish between these two cases — typically, we forget to write specific code for the "no elements" edge case. This can result in bugs that remain undetected during the development and test phases, especially for edge cases that occur rarely.

The good news is this: If we use null (in a null-safe language) to represent "there are no elements" then we can't ignore these edge cases anymore, because the compiler gently reminds us to handle them. In other words, we are always required to handle both cases (i.e. "there are elements" and "there are no elements"). This eliminates many potential bugs.

If we use empty collections, we are allowed to write code like this:

Java
for ( Object element : collection ) {
    // handle element
}

An empty collection simply results in "do nothing" behavior. Sometimes this is the right thing to do, but usually it isn't. It's easy to overlook the "there are no elements" edge case. Moreover, if the code was written by somebody else, we don't know if the author intended to do nothing in case of an empty collection.

On the other hand, in a null-safe language that enforces null instead of empty collections, the above code wouldn't compile anymore. We'd have to write:

Java
if ( collection != null ) {
    for ( Object element : collection ) {
        // handle element
    }
}

... or:

Java
if ( collection != null ) {
    for ( Object element : collection ) {
        // handle element
    }
} else {
    // handle edge case
}

Yes, the code is a bit more verbose if no special handling is required for the "no elements" case, but there are two advantages:

  • We can't accidentally forget the "there are no elements" edge case.

    On a bad day we could still forget to add an else branch (whenever needed to handle the edge case), but the risk of this bug is largely mitigated since the test for collection != null is required.

  • The programmer's intention to handle or ignore the edge case is clearly stated in the code.

You'll later see a practical PTS example demonstrating the benefits.

Note

To see some practical Java examples, you can also read my article Is it Really Better to 'Return an Empty List Instead of null'? - Part 2. That article illustrates wrong outcomes caused by empty collections. For example: a majestic house you can get for free, and a runner-up declared as winner in an election.

Simpler and More Reliable APIs

If strings and collections can't be empty, their APIs become simpler and less error-prone.

For example:

  • There's no need for an isEmpty method.

  • Method size (aka length, which returns the number of elements in the collection) never returns zero. This eliminates some potential bugs, such as a division by zero when computing the average of a list of numbers.

  • Methods first and last always return an element (instead of returning null or throwing an exception if the collection is empty).

  • Methods like allMatch, noneMatch, and anyMatch can't return unexpected and debatable results that can lead to subtle bugs in edge cases.

    People have different opinions on what these methods should return if the collection is empty, as can be seen in the following long discussion in the CodeProject lounge: .NET's Sometimes Nonsensical Logic.

  • Methods that compute an aggregate value (e.g. sum, average, min, max) also become more straightforward.

    For example, we don't need to debate questions like "What should function average do if the list of numbers is empty? Return zero? Return null? Throw an exception?"
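The API points above can be made concrete with a small Java class. This is a hypothetical NonEmptyList sketch, not a class from the JDK or from PTS: its factory method returns null for empty (or null) input, so every instance holds at least one element, and size, first, last, and average need no empty-case handling. Since Java's type system can't express non-emptiness directly, the guarantee rests on the private constructor.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

// A minimal immutable list that can never be empty (illustrative sketch).
public final class NonEmptyList<T> {

    private final List<T> elements;

    private NonEmptyList(List<T> elements) {
        this.elements = elements;
    }

    // "No elements" is expressed as null, never as an empty NonEmptyList.
    public static <T> NonEmptyList<T> ofOrNull(List<T> source) {
        if (source == null || source.isEmpty()) {
            return null;
        }
        // Copy into an unmodifiable list; null elements are still allowed.
        return new NonEmptyList<>(Collections.unmodifiableList(new ArrayList<>(source)));
    }

    public int size() { return elements.size(); }                 // always >= 1
    public T first() { return elements.get(0); }                  // never out of bounds
    public T last() { return elements.get(elements.size() - 1); } // never out of bounds

    // No debatable "vacuous truth" result for an empty list.
    public boolean allMatch(Predicate<? super T> p) { return elements.stream().allMatch(p); }

    public List<T> asList() { return elements; }

    // Average of a non-empty list: the divisor is never zero.
    public static double average(NonEmptyList<? extends Number> numbers) {
        double sum = 0;
        for (Number n : numbers.elements) {
            sum += n.doubleValue();
        }
        return sum / numbers.size();
    }
}
```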

More Efficient Code

When it comes to time- and space-efficiency, nothing beats null.

In most languages, null is super fast.

Assigning null to an object reference is typically implemented by simply assigning zero (all bits at 0) to a pointer, and checking for null is done by comparing the pointer to zero. Both are CPU operations that are extremely fast.

Better Code Analysis

If collections can't be empty, then loops and iterations over them are guaranteed to execute at least once. This certainty can be leveraged by advanced compilers (and complementary tools such as static code analyzers) to make assumptions that can't be made if the body of the loop might never execute. For example, a compiler might be able to optimize target code for loops that are guaranteed to execute at least once.

Disadvantages

So far we've talked about advantages. Are there disadvantages too?

The only disadvantage I can think of is the occasional added verbosity. For example, instead of writing:

Java
int elementCount = collection.size();
boolean weHaveCheese = foodsInFridge.contains ( "cheese" );

... we have to write:

Java
int elementCount = collection == null ? 0 : collection.size();
boolean weHaveCheese = foodsInFridge != null && foodsInFridge.contains ( "cheese" );

Yes, sometimes the code is more verbose — but that's a small price to pay for all the benefits we've seen so far. We can't have our cake and eat it too. On the plus side, the code is also more expressive and (in some cases) less error-prone because, again, the "no elements" edge case is handled explicitly in the code.

Collections in the Physical World

Whenever I struggle to come up with the best way to design data or write code, I often find it useful to look at how things work in the physical world.

So, how do we use collections in the real world?

Consider Bob who collects colorful stones, stored in a box labeled "Stones".

Alice doesn't collect stones. Does this mean she has an empty box labeled "Stones"? No, of course not. Alice simply doesn't have a box.

It's easy to translate this to the digital world: An empty box in the physical world is like an empty list in the digital world; no box at all is like null.

We could come up with many more examples, but after pondering them for a while we would conclude:

  • Most collections (lists) in the physical world are immutable and non-empty.

    For example: the set of components in your computer model; the list of ingredients in grandma's Christmas cake; the list of the 2023 Nobel prize winners, etc.

  • Sometimes we use mutable collections which are empty at their outset, get populated over time, and might finally be discarded or kept as immutable, non-empty lists.

    For example: Bob's "Stones" box; the list of students enrolled so far in a language course; the applications installed on your computer, etc.

As you'll see in the next section, PTS collections are designed to work like collections in the physical world.

How Does It Work?

In this section I'm going to show briefly how the concept of no empty strings/collections works in PTS. It's important to note that the approach described here is not the only viable one — it's the one I used in my proof-of-concept implementation of PTS. To keep this section short, many implementation details are left out.

Source code examples in the following sections use the PTS syntax, introduced in previous PTS articles.

Note

Please be aware that PTS is a new paradigm and still a work-in-progress. As explained in the History section of Essence and Foundation of the Practical Type System (PTS), I created a proof-of-concept implementation which is now a bit outdated — therefore you won’t be able to try out the PTS code examples shown in this article.

Non-empty Immutable Collections

PTS strings and collections are immutable and non-empty. The following types are defined in the standard library:

  • Type string: an immutable string that cannot be empty (i.e. the string must contain at least one character).

  • Types list, set, map, etc.: immutable collections that cannot be empty (i.e. the collection must contain at least one element).

These are the types predominantly used in function signatures (i.e. immutable and non-empty). For example, the following function takes a non-empty, immutable list of non-empty, immutable strings as input, and returns a non-empty, immutable list of integers:

fn foo ( strings list<string> ) -> list<integer>
    // body
.

If input and output are allowed to be null (i.e. there might be "no elements"), the signature will contain union types (t1 or t2), like this:

fn foo ( strings list<string> or null ) -> list<integer> or null
    // body
.

If the input and output lists are also allowed to contain null elements, the signature is as follows:

fn foo ( strings list<string or null> or null ) -> list<integer or null> or null
    // body
.

No Empty Literals

Because strings can't be empty, there is no empty string literal, as shown below:

const name = "Bob"               // OK
const name = ""                  // compile-time error
const name string or null = null // OK

There aren't any empty collection literals either:

const numbers = [list 1 2 3]              // OK
const numbers = [list ]                   // compile-time error
const numbers list<number> or null = null // OK

Immutable Collection Builders

While literals are convenient for hard-coding predefined values, builder types allow us to programmatically create strings and collections. For example, we use a list_builder to build a list.

Builders apply the Builder pattern, commonly used in object-oriented languages. Internally, a builder uses a mutable data structure (e.g. a mutable list) to build the collection. An immutable collection is built in three steps:

  • Create a builder object (e.g. list_builder.create)

  • Add elements (e.g. builder.append ( ... ))

  • Create a non-empty, immutable collection by calling builder.build (or builder.build_or_null if there might be no elements)

Here's an example of a function that creates a range of integers:

fn int_range ( start integer, end integer ) -> list<integer>
    in_check: end >= start
    
    const builder = list_builder<integer>.create
    repeat from i = start to end
        builder.append ( i )
    .
    return builder.build
.
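For comparison, here is a rough Java counterpart of `list_builder` and `int_range`. `ListBuilder`, `buildOrNull` and `Ranges.intRange` are illustrative names, not a real library API. The range is assumed to include `end`, as the PTS loop suggests.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical builder: accumulates elements in a mutable list, then returns an immutable copy.
final class ListBuilder<T> {
    private final List<T> elements = new ArrayList<>();

    static <T> ListBuilder<T> create() {
        return new ListBuilder<>();
    }

    ListBuilder<T> append(T element) {
        elements.add(element);
        return this;
    }

    // Non-empty, immutable list; throws if nothing was appended.
    List<T> build() {
        if (elements.isEmpty()) {
            throw new NoSuchElementException("cannot build an empty list");
        }
        return List.copyOf(elements);
    }

    // Non-empty, immutable list, or null if nothing was appended.
    List<T> buildOrNull() {
        return elements.isEmpty() ? null : List.copyOf(elements);
    }
}

final class Ranges {
    // Java analogue of the PTS int_range function (inclusive range).
    static List<Integer> intRange(int start, int end) {
        if (end < start) {
            throw new IllegalArgumentException("end must be >= start"); // mirrors in_check
        }
        ListBuilder<Integer> builder = ListBuilder.create();
        for (int i = start; i <= end; i++) {
            builder.append(i);
        }
        return builder.build(); // safe: at least one element was appended
    }
}
```

`build` is meant for contexts where at least one element is guaranteed, as in `intRange`. `buildOrNull` is the variant for code paths that may legitimately produce no elements.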

Mutable Collections That Can Be Empty

Sometimes we need mutable collections that can be empty. For example: stacks, queues, deques, collections that are populated by several functions, etc.

To keep these data structures efficient and practical, I opted to provide dedicated mutable collections in PTS.

The name of a mutable collection type always starts with the mutable_ prefix, followed by the name of its immutable counterpart. Thus, the standard PTS library provides:

  • Type mutable_string: a mutable string that can be empty (i.e. its character length might be zero).

  • Types mutable_list, mutable_set, mutable_map, etc.: mutable collections that can be empty (i.e. their size might be zero).

Here's a trivial example of a function that appends one or two elements to a mutable list passed as argument:

fn append_elements ( strings mutable_list<string> )
    
    if strings.is_empty
        strings.append ( "first" )
    .
    strings.append ( "foo" )
.

We can convert a mutable collection into its immutable counterpart (or null if the collection is empty) by calling method to_immutable_or_null (e.g. return customers_found.to_immutable_or_null).

If the mutable collection is assumed to contain at least one element in a given context, the method to_immutable_or_throw should be used: instead of returning null whenever the mutable collection is empty, this method throws an error (exception).
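The same ideas can be sketched in Java, with `ArrayList` in the role of `mutable_list`. `appendElements`, `toImmutableOrNull` and `toImmutableOrThrow` are hypothetical helpers modeled on the PTS names. The standard `List.copyOf` (Java 10+) provides the immutable copy.

```java
import java.util.ArrayList;
import java.util.List;

final class MutableLists {
    // Java analogue of append_elements: the caller's mutable list may be empty.
    static void appendElements(List<String> strings) {
        if (strings.isEmpty()) {
            strings.add("first");
        }
        strings.add("foo");
    }

    // Immutable copy, or null if the mutable list is empty (cf. to_immutable_or_null).
    static <T> List<T> toImmutableOrNull(List<T> mutable) {
        return mutable.isEmpty() ? null : List.copyOf(mutable);
    }

    // Immutable copy of a list assumed to be non-empty (cf. to_immutable_or_throw).
    static <T> List<T> toImmutableOrThrow(List<T> mutable) {
        if (mutable.isEmpty()) {
            throw new IllegalStateException("list is not supposed to be empty");
        }
        return List.copyOf(mutable);
    }

    public static void main(String[] args) {
        List<String> strings = new ArrayList<>();
        appendElements(strings); // [first, foo]
        appendElements(strings); // [first, foo, foo]
        System.out.println(toImmutableOrNull(strings));                  // prints [first, foo, foo]
        System.out.println(toImmutableOrNull(new ArrayList<String>()));  // prints null
    }
}
```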

Loop Syntax

We can iterate over collections via a classical loop (imperative style), streams (functional style), or recursion. This section covers only the classical loop construct.

Here's a simple example of using a repeat statement to iterate over a collection:

repeat for each number in [list 1 2 3]
    out.write_line ( number.to_string )
.

Output:

1
2
3

In cases where there might be no elements, the compiler requires a check for null:

const commands list<command> or null = get_commands()
if commands is not null
    repeat for each command in commands
        log_info ( """Executing command {{command.to_string}}.""" )
        // code to execute command
    .
else
    log_warning ( "There are no commands to execute." )
.

The else branch is optional:

const commands list<command> or null = get_commands()
if commands is not null
    repeat for each command in commands
        log_info ( """Executing command {{command.to_string}}.""" )
        // more code
    .
.

We can shorten the above code by using the if_null clause in the repeat statement:

repeat for each command in get_commands() if_null: skip
    log_info ( """Executing command {{command.to_string}}.""" )
    // more code
.

If a collection is declared to be nullable (either explicitly or via type inference) but is supposed to be non-null in a given context, we can use the if_null: throw clause to abort program execution whenever the collection is null, contrary to our assumption:

repeat for each command in get_commands() if_null: throw "'commands' is not supposed to be 'null'."
    log_info ( """Executing command {{command.to_string}}.""" )
    // more code
.

The above code is a shorthand for the following one; both throw an error if the collection is null:

const commands list<command> or null = get_commands()
if commands is not null
    repeat for each command in commands
        log_info ( """Executing command {{command.to_string}}.""" )
        // more code
    .
else
    throw null_iterator_in_loop_error.create (
        message = "'commands' is not supposed to be 'null'.",
        id = "NULL_ITERATOR_IN_LOOP" )
.
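A plain Java rendering of the if_null: skip and if_null: throw variants might look like this. Commands are modeled as strings and the method names are hypothetical. `Objects.requireNonNull(obj, message)` is the standard JDK method that throws a `NullPointerException` carrying the given message.

```java
import java.util.List;
import java.util.Objects;

final class CommandLoops {
    // Analogue of: repeat for each command in get_commands() if_null: skip
    static int runOrSkip(List<String> commands) {
        int executed = 0;
        if (commands != null) {
            for (String command : commands) {
                System.out.println("Executing command " + command + ".");
                executed++;
            }
        }
        return executed;
    }

    // Analogue of: ... if_null: throw "'commands' is not supposed to be 'null'."
    // Objects.requireNonNull throws a NullPointerException with this message.
    static int runOrThrow(List<String> commands) {
        int executed = 0;
        for (String command : Objects.requireNonNull(commands, "'commands' is not supposed to be 'null'.")) {
            System.out.println("Executing command " + command + ".");
            executed++;
        }
        return executed;
    }
}
```

Unlike the PTS compiler, javac accepts an unguarded `for (String command : commands)` even when `commands` may be null, so in Java the null check remains a matter of discipline.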

Working with Non-PTS Libraries

Empty strings and collections are ubiquitous in non-PTS libraries. How can we use these libraries in a language where strings and collections can't be empty?

This depends largely on the PTS implementation, but let's consider a PTS implementation that generates Java target code and allows Java source code to be embedded between java and end java statements. Then there are at least three solutions to use Java libraries from within a PTS application:

  • Convert input/output arguments

    Before calling a Java library function that doesn't allow null as input, but requires an empty collection, we need to convert null into an empty collection. Here's an example of PTS code using embedded Java:

    java
        sendCommands ( commands == null ? Collections.emptyList() : commands );
    end java

    After calling an external Java function that might return an empty collection, we need to convert the empty collection into null, e.g.:

    java
        List<Command> commands = getCommands();
        if ( commands.isEmpty() ) {
            commands = null;
        }
    end java

    If these transformations are needed often, we can create utility functions that serve as wrappers, so that client code remains idiomatic and succinct.

  • Use mutable collections that can be empty

    Instead of converting collections, an alternative solution is to use mutable PTS collections (covered previously in section Mutable Collections That Can Be Empty) to work with non-PTS libraries.

    However, this solution is not recommended, because we lose the advantages of immutable, non-empty collections.

  • Use dedicated types to work with non-PTS libraries

    A standard PTS library can provide immutable collections that can be empty, intended exclusively for working with non-PTS libraries.

    For example, in my proof-of-concept PTS implementation, I created type emptyable_string (in addition to string and mutable_string). This specific type was sometimes handy — it simplified the task of working with external Java libraries, especially in rare cases where null and an empty string had different meanings.
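The wrapper utilities mentioned in the first option could look like the following Java sketch. The class and method names are hypothetical. For strings specifically, Guava's `com.google.common.base.Strings` already provides `nullToEmpty` and `emptyToNull`.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical helpers for the boundary between PTS-style code (null = "no elements")
// and Java libraries that expect or return empty values.
final class Interop {
    // Before a call: the library rejects null but accepts an empty list.
    static <T> List<T> nullToEmpty(List<T> list) {
        return list == null ? Collections.emptyList() : list;
    }

    // After a call: the library may return an empty list, but PTS-style code expects null.
    static <T> List<T> emptyToNull(List<T> list) {
        return (list == null || list.isEmpty()) ? null : list;
    }

    // The same idea for strings, which Java APIs often return as "".
    static String emptyStringToNull(String text) {
        return (text == null || text.isEmpty()) ? null : text;
    }
}
```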

Example

Mistakes are a fact of life. It is the response to the error that counts.

Nikki Giovanni

In this section you'll see a practical example demonstrating the benefits of using a null-safe language that doesn't permit strings and collections to be empty. We'll investigate a simple function under two different paradigms: unsafe and safe.

Unsafe Paradigm

PTS is null-safe, and its immutable strings and collections can't be empty. However, for the purpose of this exercise, let's first suppose that PTS were designed like many other languages:

  • support for null, but not null-safe

  • strings and collections can be empty

Now imagine we want to compute the average length of remarks entered by users in a data entry form. Consider the following PTS function to achieve this:

fn average_length_of_remarks ( remarks list<string> ) -> decimal
    
    variable sum = 0.0
    repeat for each remark in remarks
        sum = sum + remark.length
    .
    return sum / remarks.size
.

The code would look similar in many other languages. Here's an example in Java:

Java
static double averageLengthOfRemarks ( List<String> remarks ) {

    double sum = 0.0;
    for ( String remark : remarks ) {
        sum += remark.length();
    }
    return sum / remarks.size();
}

Note

We could also employ a functional style in PTS and Java. Using Java streams, for example, the code would be:

Java
static double averageLengthOfRemarks2 ( List<String> remarks ) {

    return (double) remarks
        .stream()
        .mapToInt ( String::length )
        .sum() / remarks.size();
}

However, whether we use an imperative or functional style is irrelevant for the topic at hand. We'll continue this exercise using a classical loop.

If we call the above PTS function with [list "f" "fo" "foo"] as input, it returns the correct result: 2.0.

Unfortunately, although the function is small and simple, it has three problems:

  • If the function is called with remarks = null, a null pointer error occurs at run time in the repeat statement (the for statement in the Java version).

  • If the function is called with an empty list as input, a division by zero error occurs in the last statement (because remarks.size is zero).

  • Assuming that empty remarks should be ignored, the result will be wrong if input argument remarks contains empty strings. For example, calling the function with [list "foo" "" "foo" ] returns 2.0 (=6/3) instead of 3.0 (=6/2).

    The result is a silently ignored error, which is the worst case.
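The three problems can be reproduced with the Java version. One detail differs from the PTS description: because `sum` is a `double`, Java's floating-point division `0.0 / 0` does not throw; it yields `NaN`. The empty-list case therefore becomes a silent wrong result too. The class name `UnsafeAverage` is chosen for this sketch.

```java
import java.util.List;

final class UnsafeAverage {
    // Same logic as averageLengthOfRemarks above.
    static double averageLengthOfRemarks(List<String> remarks) {
        double sum = 0.0;
        for (String remark : remarks) { // NullPointerException if remarks is null
            sum += remark.length();
        }
        return sum / remarks.size();    // 0.0 / 0 yields NaN for an empty list
    }

    public static void main(String[] args) {
        System.out.println(averageLengthOfRemarks(List.of("f", "fo", "foo"))); // 2.0
        System.out.println(averageLengthOfRemarks(List.of("foo", "", "foo"))); // 2.0, not 3.0
        System.out.println(averageLengthOfRemarks(List.of()));                  // NaN
        try {
            averageLengthOfRemarks(null);
        } catch (NullPointerException e) {
            System.out.println("NullPointerException");
        }
    }
}
```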

Safe Paradigm

Now let's see what actually happens in PTS (which is null-safe and doesn't permit empty strings and collections):

  • A function call with remarks = null is not permitted and results in a compile-time error, because all types are non-nullable by default.

  • We are also not permitted to call the function with an empty list as input, because collections can't be empty. Hence, a division by zero error cannot occur.

  • Input argument remarks cannot contain empty strings, because PTS strings are non-empty too. Therefore, a silently ignored error due to empty remarks cannot occur.

As you can see, all three problems have disappeared. The above function doesn't support the "no remarks" and "some empty remarks" cases, and the compiler ensures that nobody calls the function with invalid input. The function requires a non-null, non-empty list of remarks, and empty remarks are not allowed in the list.

Now suppose that the function should handle empty remarks. Here's the new version:

fn average_length_of_remarks ( remarks list<string or null> ) -> decimal or null
    
    variable sum = 0.0
    variable remarks_count = 0
    repeat for each remark in remarks
        if remark is not null
            sum = sum + remark.length
            remarks_count = remarks_count + 1
        .
    .
    return if remarks_count =v 0 then null else sum / remarks_count
.

Note that:

  • The type of elements contained in parameter remarks changed from string to the union type string or null, which explicitly states that empty remark fields are now supported.

  • The check if remark is not null is required by the compiler because PTS is null-safe, and remark.length would result in a null pointer error if remark were null at run-time.

  • The expression for the return-value if remarks_count =v 0 then null else sum / remarks_count is also required by the compiler. If we simply wrote return sum / remarks_count, the code would not compile, because remarks_count might be zero and thus cause a division by zero (note that this is an advanced compiler feature beyond the scope of this article).

    Because the function might return null, we are also required to change the output type from decimal to decimal or null.

    This, in turn, means that the caller of the function can't forget to handle the edge case where all remark fields are empty, and therefore an average length of remarks could not be computed. We're always safe.
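A Java approximation of the null-tolerant version is shown below. Java cannot express `decimal or null` in a way the compiler enforces, so this sketch returns `OptionalDouble` to make the absent result explicit to the caller. Because Java strings can be empty, the sketch also treats `""` like `null`; that is an assumption of this example. The class name `SafeAverage` is hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalDouble;

final class SafeAverage {
    // Analogue of the PTS version whose elements are "string or null".
    // Assumption: "" is treated like null (no remark), since Java strings can be empty.
    static OptionalDouble averageLengthOfRemarks(List<String> remarks) {
        double sum = 0.0;
        int remarksCount = 0;
        for (String remark : remarks) {
            if (remark != null && !remark.isEmpty()) {
                sum += remark.length();
                remarksCount++;
            }
        }
        // No division when every remark is absent; the caller must handle the empty result.
        return remarksCount == 0 ? OptionalDouble.empty() : OptionalDouble.of(sum / remarksCount);
    }

    public static void main(String[] args) {
        averageLengthOfRemarks(Arrays.asList("foo", null, "foo"))
            .ifPresentOrElse(
                avg -> System.out.println("average: " + avg), // prints "average: 3.0"
                () -> System.out.println("no remarks"));
    }
}
```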

What about variable remarks_count, which is needed to compute the correct result if there are empty remark fields? Suppose we incorrectly wrote:

fn average_length_of_remarks ( remarks list<string or null> ) -> decimal
    
    variable sum = 0.0
    repeat for each remark in remarks
        if remark is not null
            sum = sum + remark.length
        .
    .
    return sum / remarks.size
.

Does the compiler report a bug?

No, it doesn't, because the above code would be correct if we actually wanted empty remarks to be included in the computation of their average length.

As mentioned in a previous PTS article, the slogan "if it compiles, it works" is just wishful thinking. We still need to write unit tests to detect logical bugs.
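A unit test is what catches the flawed variant. The following sketch uses plain Java rather than a test framework (with JUnit, the check would be an `assertEquals`). The names are hypothetical, and the expected value encodes the requirement that empty remarks are excluded from the average.

```java
import java.util.Arrays;
import java.util.List;

final class RemarksTest {
    // The flawed variant: null remarks are skipped but still counted by size().
    static double flawedAverage(List<String> remarks) {
        double sum = 0.0;
        for (String remark : remarks) {
            if (remark != null) {
                sum += remark.length();
            }
        }
        return sum / remarks.size();
    }

    // Requirement under test: empty (null) remarks are excluded from the average.
    static boolean emptyRemarksAreIgnored() {
        double result = flawedAverage(Arrays.asList("foo", null, "foo"));
        return result == 3.0; // flawedAverage returns 2.0 (= 6/3), so this check fails
    }

    public static void main(String[] args) {
        System.out.println(emptyRemarksAreIgnored() ? "PASS" : "FAIL"); // prints FAIL
    }
}
```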

Now let's consider the "there are no remarks" edge case. The function doesn't currently handle this case. Therefore, the caller of the function is required to explicitly handle this case (if it occurs). However, if we want the function itself to also handle this case, we can easily do this:

fn average_length_of_remarks ( remarks list<string or null> or null ) -> decimal or null

    if remarks is null then return null

    // rest of code
.

Now the function returns null when called with remarks = null.

Conclusion

Hopefully, these simple examples demonstrate the benefits of a null-safe language that doesn't permit strings and collections to be empty.

Imagine a big application with hundreds or even thousands of edge cases similar to the above ones. Having a compiler that spots all edge cases, including the most elusive ones, is like having a helpful, reliable companion, always by our side, tapping on our shoulder from time to time and telling us: "Look! Here's an edge case that needs to be handled or ignored explicitly."

Nikki Giovanni wisely said:

"Mistakes are a fact of life. It is the response to the error that counts."

We can easily apply this insight to the field of software development:

"Coding mistakes are a fact of life. It is the compiler response to the error that counts."

Fail fast!

Other Languages

This article focuses on languages that support null and ensure null-safety. What about other programming languages? Would it still make sense to apply the ideas presented in this article?

Non-null-Safe Languages

Many popular programming languages support null, but aren't null-safe. Should new languages applying this paradigm also restrict immutable strings/collections to be non-empty?

Yes, if software reliability matters.

The difference between a null-safe and a non-null-safe language is this: In a null-safe language we get a compile-time error if null is handled incorrectly in the code; in a non-null-safe language we get a run-time error instead — the dreaded null pointer error, whenever a pointer to null is dereferenced.

Note

In some languages, dereferencing a null pointer results in undefined behavior (instead of throwing a null pointer error). In this section we'll only consider languages that throw null pointer errors.

A null pointer error is "bad". But a silently ignored error due to an empty string or collection is "very bad", and "bad" is better than "very bad". It's generally much better for an application to throw a null pointer error immediately and abort program execution than to ignore the problem and silently continue with polluted or corrupted data that spreads through the system and, sooner or later (maybe much later), may cause a mysterious bug that's difficult to identify and fix.

Many people hate null pointer errors, but these errors are in fact very useful, because:

  • They support the Fail-Fast! principle (at run-time) and the bug is therefore more likely to be discovered early in the development process. The program crashes immediately and noisily, instead of ignoring the problem and silently continuing execution with wrong/corrupted data.

    The final outcome of a null pointer error is usually less severe and more predictable.

    Imagine a nightmarish scenario like this one: Application A writes wrong data into a database. Later on, this data is read and handled by applications B, C, and D. Although the bug originated in application A, it manifests in applications B, C, and D.

  • Null pointer errors are usually easy to identify and fix, because cause and effect tend to be close together in the code.

  • Helpful tools capable of identifying potential null pointer errors in the codebase are available for most popular programming languages.
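The difference between "bad" and "very bad" fits in a few lines of Java. In this hypothetical email-storing method, `null` fails immediately, while an empty string passes every step and ends up stored as data. `storeEmail` and `storedEmails` are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class FailFastDemo {
    // Hypothetical stand-in for a customer database column.
    static final List<String> storedEmails = new ArrayList<>();

    static void storeEmail(String email) {
        // null: trim() throws a NullPointerException here, before anything is stored (fail fast).
        // "":   no error at all; an empty address is silently stored and travels on.
        storedEmails.add(email.trim().toLowerCase(Locale.ROOT));
    }
}
```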

Languages That Use an Optional/Maybe Type

Some languages don't support null. Instead, they provide a dedicated type for cases where a value might be absent. For example, Rust provides the Option type, F# the Option monad, and Haskell the Maybe monad.

The idea of non-empty strings and collections can also be applied in new languages using this approach. Instead of empty strings/collections, the None/Nothing values of the Option/Maybe types are used, respectively, and pattern matching ensures that we distinguish between the "no elements" and "one or more elements" cases, thereby reaping benefits akin to those demonstrated in this article.
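Java's `Optional` gives a rough feel for this approach, with the caveat that Java does not force exhaustive handling the way pattern matching on Rust's `Option` does. In this sketch (hypothetical names), `Optional.empty()` stands for "no remarks", and a present list is, by convention, never empty.

```java
import java.util.List;
import java.util.Optional;

final class OptionalRemarks {
    // "No elements" is Optional.empty(); a present list is never empty (by convention here).
    static Optional<List<String>> findRemarks(List<String> raw) {
        return raw.isEmpty() ? Optional.empty() : Optional.of(List.copyOf(raw));
    }

    static String describe(List<String> raw) {
        // Both cases are spelled out; there is no "present but empty" case to forget.
        return findRemarks(raw)
            .map(remarks -> remarks.size() + " remark(s)")
            .orElse("no remarks");
    }
}
```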

Summary

Preventing immutable strings and collections from being empty provides the following advantages:

  • More reliable code because more bugs can be found at compile-time. The compiler reminds us to handle edge cases we might easily overlook.

  • Simpler and less error-prone code because there is no need to test for "null or empty".

  • Simpler and less error-prone APIs for non-empty, immutable string and collection types (e.g. element_count never returns zero, hence no risk of a division by zero).

These advantages are particularly valuable when working on large codebases.

The following approach is used in PTS:

  • Strings and collections are immutable and can't be empty.

  • Instead of empty strings/collections, null is used to represent "no elements".

  • Builders are used to build (create) immutable strings and collections.

  • PTS also provides mutable strings and collections that can be empty, but they are used rarely.

Acknowledgments

Many thanks to Tristano Ajmone for his useful feedback to improve this article.

History

  • 2024-06-28: initial version
  • 2024-07-02: fixed 'Developer E' example

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)