Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / Languages / Java / Java8

Collectors Part 1 – Reductions and Short-Circuiting Operations‏

5.00/5 (2 votes)
14 Aug 2014CPOL6 min read 9.7K  
Reductions and Short-Circuiting Operations

In the first couple of articles, we looked at streams. We saw that we could take something simple such as a list of countries, filter or map their names and then print them via a foreach. We then looked at ranges/loops and generators as a way of supplying values as an alternative to a predefined list.

Although we didn’t explicitly mention this, a stream can be divided into 3 distinct parts:

  1. A source operation such as a supplier or a generator which pushes elements into our stream via a spliterator.
  2. Optional intermediate steps: these can filter values, sort values, map values, affect the stream’s processing (such as go parallel) and so on.
  3. Finally, a terminal operation either consumes the values, reduces the values, short-circuits the values or collects them. Short-circuiting a terminal operation means that the stream may terminate before all values are processed. This is useful if the stream is infinite.

We’ve covered the first two parts reasonably well and also used forEach to do consuming, so let’s now look at collecting. Why collect instead of consume? There are several reasons including:

  • Since it returns nothing, consuming must involve a side-effect (else it wouldn’t do anything) which when running in parallel might not be in the order we expect or to put it in order cause unnecessary synchronisation
  • We want to use the results again later
  • We want to reduce the values into a single result
  • We want to be able to inspect/return the values, such as for unit tests or to build in reusability
  • Side-effects can make testing hard and often require mocking
  • Side-effects break the concept of pure-functions (values in, results out only; same values in give same results out) which make it harder to prove code works

We’ll start by looking at reduction. This is a form of collecting where instead of returning all the results which come out of the stream, we condense them down into [usually] a single result. A common example would be summing all the values. Let’s look at the built in reduction operations using a list of Integers as the source:

Java
public class ListReduction
{
  public static void main(String[] args)
  {
	List<Integer> numbersList = Arrays.asList(1, 2, 5, 4, 3);

	System.out.println(numbersList.stream().count());
		
	System.out.println(numbersList.stream().mapToInt(x -> x).sum());

	System.out.println(numbersList.stream().mapToInt(x -> x).average()
				.getAsDouble());

	System.out.println(numbersList.stream().mapToInt(x -> x).max()
				.getAsInt());

	System.out.println(numbersList.stream().mapToInt(x -> x).min()
				.getAsInt());

	System.out.println(numbersList.stream().mapToInt(x -> x)
				.summaryStatistics());
  }
}

Note:

  • the summaryStatistics() operation calculates all the values
  • average() returns an OptionalDouble – we need to use getAsDouble() to get the value
  • max() and min() return OptionalInt – we need to use getAsInt() to get the value

As already discussed in the article on Optional, if the Optional value happens to be the special empty() value [when we didn't pass any values through or filtered all of them out], we will get a NullPointerException if we try to use get or getAs<type> – we might wish to consider getOrElse for example to supply a default to avoid this.

Also note because we were streaming a list, we had to use mapToInt(x -> x) to change the stream shape from Object to int as IntStream works with int not Integer.

If we used an array of int instead, we could dispense with the map:

Java
public class ArrayReduction
{
  public static void main(String[] args)
  {
	int[] numbersArray = new int[] { 1, 2, 5, 4, 3 };

	System.out.println(Arrays.stream(numbersArray).count());

	System.out.println(Arrays.stream(numbersArray).sum());

	System.out.println(Arrays.stream(numbersArray).average().getAsDouble());

	System.out.println(Arrays.stream(numbersArray).min().getAsInt());

	System.out.println(Arrays.stream(numbersArray).max().getAsInt());

	System.out.println(Arrays.stream(numbersArray).summaryStatistics());
  }
}

This looks a bit tidier. We can’t do anything about having to create the stream each time. If we tried to save a reference to Arrays.stream(numbersArray), it would only be able to be used once. This is why summaryStatistics can be very useful.

What if we want to write our own reductions? There are two ways. The first is to use the reduce operation which we’ll look at here. The other way is to use collect which we’ll look at in the next article.

To do reduction, we need one or two things:

  • a binary function which takes two values and returns a single one
  • we may also need an initial value (termed the identity)

Let’s imagine the stream as a queue of values [assume the stream is sequential]. If an identity value is given, we’ll put that in queue first. All the values from the stream in turn are then added to the queue. Once we have our queue, we remove the first value and assign it to the accumulator. While there are more values in the queue, we remove the next first value from the queue, and then perform the binary function on the accumulator and the value removed. We then assign the result back to the accumulator. This is repeated until the queue is empty.

It’s easy to see why it’s useful to have an identity value. In the case of sum, for example, the identity is zero, and thus zero is assigned to the accumulator before values are taken from the stream. If there are no values in the stream, the final result is just zero.

What if both the streams are empty and there was no identity value? To solve this problem, the version of the API without an identity value returns an appropriate Optional. You can now see why we took a detour to discuss Optional in the last article.

Let’s replace the built in operations above with explicit reductions using reduce:

Java
public class ExplicitReductions
{
  public static void main(String[] args)
  {
    int[] numbersArray = new int[] { 1, 2, 3, 4, 5 };

    System.out.println(Arrays.stream(numbersArray).map(x -> 1)
                             .reduce(0, Integer::sum));

    System.out.println(Arrays.stream(numbersArray)
                             .reduce(0, Integer::sum));

    System.out.println(Arrays.stream(numbersArray)
                             .reduce(Integer::min).getAsInt());

    System.out.println(Arrays.stream(numbersArray)
                             .reduce(Integer::max).getAsInt());
   }
}

A few things to note:

  • To perform count, we have to map the values to a 1 and then do a sum. It might seem that it would be far easier to just use length on the array to get the count, however remember in a stream we might have other operations first such as to filter some of the values. An example use might be to count how many values are even.
  • Average is missing since it’s a bit more complicated. We have to keep both a tally and a sum so the simple call to reduce is not enough to implement it.
  • The reduction operation is also called ‘fold left’ since if we drew a tree, it would be leaning left.

For example with 4 values:

foldleft

This reduces to (((Val1 Op1 Val2) Op2 Val3) Op3 Val4)

We can use our own functions in reduce. For example, to do a factorial, we just need a function which multiplies the accumulator by the next value:

Java
public class Factorial
{
  public static void main(String[] args)
  {
    int n = 6;
		
    System.out.println(IntStream.rangeClosed(1, n)
                                .reduce((x, y) -> x * y).getAsInt());
  }
}

Let’s finish off by looking at the short-circuit operators:

Java
public class ShortCircuit
{
  public static void main(String[] args)
  {
    List<String> countries = Arrays.asList("France", "India", "China",
                                           "USA", "Germany");

    System.out.println(countries.stream()
                       .filter(country -> country.contains("i"))
                       .findFirst().get());

    System.out.println(countries.stream()
		       .filter(country -> country.contains("i"))
                       .findAny().get());

    System.out.println(countries.stream()
                       .allMatch(country -> country.contains("i")));
		
    System.out.println(countries.stream()
                       .allMatch(country -> !country.contains("z")));

    System.out.println(countries.stream()
                       .noneMatch(country -> country.contains("z")));

    System.out.println(countries.stream()
                       .anyMatch(country -> country.contains("i")));

    System.out.println(countries.stream()
                       .anyMatch(country -> country.contains("z")));
  }
}

As said earlier, terminal short-circuit operations may mean we don’t process all the values in the stream. There are built in operations to find the first value that matches [findFirst], any one value that matches [findAny] and to find out if all, any or none match [allMatch, anyMatch, noneMatch].

Note in the case of findFirst or findAny, we only need the first value which matches the predicate (although findAny is not guaranteed to return the first). However, if the stream has no ordering then we’d expect findFirst to behave like findAny. The operations allMatch, noneMatch and anyMatch may not short-circuit the stream at all since it may take evaluating all the values to determine whether the operator is true or false. Thus, an infinite stream using these may not terminate.

We’ve still got collectors to look at, so that will be the focus of the next article.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)