Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / programming / performance

Iteration Over Java Collections with High Performance

2.08/5 (4 votes)
6 Jul 2018CPOL2 min read 33.9K  
Travelling over Java collections is just a piece of cake, but when the size of the collections increases you have to choose wisely

Introduction

Java developers usually deal with Collections such as ArrayList, HashSet, Java 8 come with lambda and streaming API helping us to work easily with Collections. In most cases, we work with a few thousands of items and performance isn't a concern. But in strict code, when we have to travel over millions of items several times, performance will become a pain.

forEach vs C style vs Stream API

Iteration is a basic feature so that all programming languages have simple syntax to allow programmers to run through collections. Stream API can iterate over Collections in a very straightforward manner.

public List<Integer> streamSingleThread(BenchMarkState state){
    List<Integer> result = new ArrayList<>(state.testData.size());
    state.testData.stream().forEach(item -> {
        result.add(item);
    });
    return result;
}
public List<Integer> streamMultiThread(BenchMarkState state){
    List<Integer> result = new ArrayList<>(state.testData.size());
    state.testData.stream().parallel().forEach(item -> {
        result.add(item);
    });
    return result;
}

forEach loop is also simple:

public List<Integer> forEach(BenchMarkState state){
    List<Integer> result = new ArrayList<>(state.testData.size());
    for(Integer item : state.testData){
        result.add(item);
    }
    return result;
}

C style is more verbose, but still very compact:

public List<Integer> forCStyle(BenchMarkState state){
    int size = state.testData.size();
    List<Integer> result = new ArrayList<>(size);
    for(int j = 0; j < size; j ++){
        result.add(state.testData.get(j));
    }
    return result;
}

Then the performance:

Benchmark                               Mode  Cnt   Score   Error  Units
TestLoopPerformance.forCStyle           avgt  200  18.068 ± 0.074  ms/op
TestLoopPerformance.forEach             avgt  200  30.566 ± 0.165  ms/op
TestLoopPerformance.streamMultiThread   avgt  200  79.433 ± 0.747  ms/op
TestLoopPerformance.streamSingleThread  avgt  200  37.779 ± 0.485  ms/op

With C style, JVM just simply increases an integer, then reads value directly from memory, so that it is very fast. But forEach is very different. According to  answer on StackOverFlow and document from Oracle, JVM has to convert forEach to Iterator and calls hasNext() with every item, that's why forEach is much slower than C style.

Which is a High Performance Way to Travelling Over Set?

With data:

Java
@State(Scope.Benchmark)
public static class BenchMarkState {
    @Setup(Level.Trial)
    public void doSetup() {
        for(int i = 0; i < 500000; i++){
            testData.add(Integer.valueOf(i));
        }
    }
    @TearDown(Level.Trial)
    public void doTearDown() {
        testData = new HashSet<>(500000);
    }
    public Set<Integer> testData = new HashSet<>();
}

Java Set also supports Stream API and forEach loop, according to the previous test, should we wrap Set to ArrayList, then travel over ArrayList?

Java
    public List<Integer> forCStyle(BenchMarkState state){
        int size = state.testData.size();
        List<Integer> result = new ArrayList<>(size);        
        Integer[] temp = (Integer[]) state.testData.toArray(new Integer[size]);
        for(int j = 0; j < size; j ++){
            result.add(temp[j]);
        }
        return result;
    }

How about combine iterator with C style for loop?

Java
    public List<Integer> forCStyleWithIteration(BenchMarkState state){
        int size = state.testData.size();
        List<Integer> result = new ArrayList<>(size);
        Iterator<Integer> iteration = state.testData.iterator();
        for(int j = 0; j < size; j ++){
            result.add(iteration.next());
        }
        return result;
    }

Or just simple travel?

Java
    public List<Integer> forEach(BenchMarkState state){
        List<Integer> result = new ArrayList<>(state.testData.size());
        for(Integer item : state.testData) {
            result.add(item);
        }
        return result;
    }

Nice idea, but it doesn't work because initializzing new ArrayList also consumes resources.

    Benchmark                                   Mode  Cnt  Score   Error  Units
    TestLoopPerformance.forCStyle               avgt  200  6.013 ± 0.108  ms/op
    TestLoopPerformance.forCStyleWithIteration  avgt  200  4.281 ± 0.049  ms/op
    TestLoopPerformance.forEach                 avgt  200  4.498 ± 0.026  ms/op

HashMap (HashSet uses HashMap<E,Object>) isn't designed for iterating all items, the fastest way to iterate over HashMap is a combination of Iterator and C style for loop, because JVM doesn't have to call hasNext().

Conclusion

Foreach and Stream API are convenient to work with Collections, you can write code faster. However when your system is stable and performance is a major concern, you should think about rewriting your loop.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)