Bart Heupers Software

How to filter a list

on Sept. 7, 2017, 2:35 p.m.

In Java job interviews I often ask how candidates would filter a list of items. Let’s suppose you have list of Integers and you only want to keep the even values. This seems like a simple task but there are some pitfalls.

The first thing to be noticed is that if the list is iterated with a for loop it is not possible to change the list while iterating.

for(Integer i : list) {        
    if( ! isEven(i) )         
        list.remove(i);  // Will throw a exception
}

Some candidates will start using indexes in the loop. Then they have to be clever to either iterate backwards, or decrease the index by one if a element is removed.

for(int index = 0; index < list.size() ;i++ ) {    
    Integer i = list.get(index);    
    if( ! isEven(i) ) {       
        list.remove(index--);    
    }
}

Some candidates just avoid this problem, and they will create a new list, and add only the even Integers. Actually that is the solution that has the best run-time performance. But it needs some extra memory for the new list.

Of course most Java programmers know that it is possible to remove items from a collection by using a Iterator.

Iterator it = list.iterator(;while( it.hasNext()) {    
    Integer i = it.next();    
    if( ! isEven(i) )        
        it.remove();
}

This certainly looks more elegant. Things start to become complicated when I ask for which kind of lists this is a good solution. Usually they know ArrayList and LinkedList, although most people only use ArrayLists.

The solution using indexes has a very bad performance when using LinkedLists. In a LinkedList you do list.get(index) you have to iterate the whole list up to the indexed element. There is no fast indexed access to a LinkedList.

The solution using the iterator works pretty well for LinkedLists. But when we use an ArrayList then both solutions have the drawback, that because in a ArrayList an array is used for storage, when a element is removed, all the following elements have to be moved to make the array a contiguous array again. Again this will have a O(N^2) complexity.

So as I mentioned before, if we need to filter a ArrayList, the solution that creates a new (Array)List and does not modify the original List has the best runtime performance.

AbstractList result = new ArrayList(in.size()); 
Iterator it = in.iterator(); 
while(it.hasNext()) {     
    Integer i = it.next();     
    if( (isEven(i) )         
        result.add(i); 
}

Now this all looks fine in theory, but how does this all work out in reality. After asking this so many times I wanted to see if reality conforms to theory. Therefore I wrote a small program to test this. The program is attached below, but I will share some results here (time in nanoseconds, smaller numbers are better) :

Use array size 100 (repeated 100000 times)
Filter ArrayList with remove   :317578183
Filter LinkedList with remove  :260825358
Filter ArrayList with new list :231603703
----------------------------------------
....
----------------------------------------
Use array size 100000 (repeated 100 times)
Filter ArrayList with remove   :32495193334
Filter LinkedList with remove  :265319754
Filter ArrayList with new list :215448592

Here we see that for a small array of 100 elements there is a small performance improvement, but if the array has 100000 elements, then the performance improvement by either using a LinkedList, or create a new ArrayList instead is more than a factor 100.

If the lists are bigger and the array does not fit in the processor cache anymore the differences become more dramatic.

So this is really a good demonstration of basic computer theory in practice, and it does make sense to think a bit the next time you filter a list.

TestFilter.java