Java 8 Project Lambda

by
Rad Widmer, Senior Software Engineer
Object Computing, Inc. (OCI)

Introduction

Java language designer Daniel Smith describes the forthcoming enhancements in Java 8 as "dramatic and necessary". He isn't exaggerating. On the language side, major new features include

And on the libraries side, there will be major enhancements to the Collections libraries, with the introduction of the Stream framework and related interfaces. All of this will enable us to write better Java code, using a more "fluent", functional, and declarative style, with much less boilerplate code for many common use cases. And the Java language designers have managed to do all this in a way that extends naturally from the existing language. Even so, it will bring a significant paradigm shift in how we program and how application code and the libraries interact. In a nutshell, it will enable applications to focus more on the "what", and let the libraries take care of the "how".

In this article we discuss the new language features coming in Java 8, as well as the most important enhancements to the standard libraries, specifically the new Stream interface. Note that as of this writing (January 2013), the Project Lambda features have not been finalized, and the final version will likely differ somewhat from what is described here.

Lambda Expressions

Lambdas, also known as closures, are simply anonymous functions, i.e., functions with no name. The syntax is (parameters) -> body.

A lambda expression has the same elements as a Java method: a list of parameters, a body, and a return value. The types of the parameters may be given explicitly, or they can be inferred from the target type (more on target types later). The body can be either an expression or a block of statements. A lambda can also throw exceptions.

Let's look at some examples.

      Runnable r = () -> System.out.println("hello world");
      r.run();  // prints "hello world"
    

Here a lambda expression is assigned to a Runnable and implements the Runnable.run() method. The compiler determines what type the lambda should be from the type of the target (Runnable in this case). This is a simple instance of target typing. Contrast this with how you would have to code the above example in Java 7 using an anonymous inner class, where most of the code is boilerplate.

        Runnable r = new Runnable() {
            @Override
            public void run() {
                System.out.println("hello world");
            }
        };
      

Here is an example with a single parameter. Note that the parentheses around the parameter list are optional when there is exactly one parameter.

      FileFilter filter = f -> f.getAbsolutePath().endsWith(".txt");
      return filter.accept(new File("myfile.txt"));  // returns true
    

The lambda expression is assigned to a FileFilter instance. Again, the compiler knows the type of the lambda based on the type of the target, and it also knows the parameter f is a File because of the signature of the FileFilter.accept(File f) method. It doesn't hurt to add the parameter types in the lambda expression, but it usually is not needed.
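To put the forms described so far side by side, here is a minimal sketch. The class name LambdaForms and its constants are made up for illustration; only File, FileFilter, and Comparator come from the JDK.

```java
import java.io.File;
import java.io.FileFilter;
import java.util.Comparator;

// Hypothetical demo class; not part of the JDK.
class LambdaForms {
    // Expression body; the parameter type is inferred from FileFilter.accept(File).
    static final FileFilter INFERRED = f -> f.getName().endsWith(".txt");

    // The same filter with an explicit parameter type and a block body.
    static final FileFilter EXPLICIT = (File f) -> {
        String name = f.getName();
        return name.endsWith(".txt");
    };

    // With two parameters the parentheses are required.
    static final Comparator<String> BY_LENGTH =
            (a, b) -> Integer.compare(a.length(), b.length());

    public static void main(String[] args) {
        File file = new File("myfile.txt");
        System.out.println(INFERRED.accept(file));
        System.out.println(EXPLICIT.accept(file));
        System.out.println(BY_LENGTH.compare("ab", "abc"));
    }
}
```

Both filters behave identically; the explicit form is mostly useful when the compiler cannot infer the parameter types or when the body needs more than one statement.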

In the next example a lambda expression implements a Comparator.

        // sort a list by lastName
        List<Person> persons = ...;
        persons.sort((p1, p2) -> p1.getLastName().compareTo(p2.getLastName()));
    

Variable Capture

A lambda expression may reference variables which are visible in the current scope -- this is what makes lambdas closures instead of simply functions. The referenced variables must be either final or "effectively final". An effectively final variable meets the same restrictions as final variables -- it's only assigned to once, but it doesn't have the final keyword.

        void doRun(String msg) {
            // msg is never reassigned, so it is effectively final and can be captured
            Runnable r = () -> System.out.println(msg);
            r.run();
        }
    

A reference to this inside a lambda expression refers to the enclosing class instance. The lambda instance itself has no this.
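The difference in the meaning of this can be seen by comparing a lambda with an anonymous inner class. The sketch below is only an illustration: ThisDemo is a hypothetical class, and Supplier is taken from java.util.function, the package name used in the released Java 8 API.

```java
import java.util.function.Supplier;

// Hypothetical demo class; not part of the JDK.
class ThisDemo {
    @Override
    public String toString() {
        return "ThisDemo";
    }

    // Inside a lambda, `this` is the enclosing ThisDemo instance.
    Supplier<String> fromLambda() {
        return () -> this.toString();
    }

    // Inside an anonymous inner class, `this` is the anonymous instance itself.
    Supplier<String> fromInnerClass() {
        return new Supplier<String>() {
            @Override
            public String get() {
                return this.toString();
            }

            @Override
            public String toString() {
                return "anonymous Supplier";
            }
        };
    }

    public static void main(String[] args) {
        ThisDemo demo = new ThisDemo();
        System.out.println(demo.fromLambda().get());
        System.out.println(demo.fromInnerClass().get());
    }
}
```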

In contrast to anonymous inner classes, lambdas don't introduce another level of scope. Lambdas reside at the same level as the enclosing context. One consequence of this is that the lambda's parameter names can't shadow any local variable names in the enclosing method. The following example would not compile.

        String a, b;

        List<String> strings = ...;
        // compile error -- lambda parameter names shadow local variables
        strings.sort((a, b) -> a.compareTo(b));
    

Functional Interfaces

In the above examples, we saw lambda expressions assigned to common existing JDK interfaces. So what is the type of a lambda expression? The Java language designers chose not to introduce a new "function type" to support lambdas. Instead, they decided to build on what was already in the language by defining a special category of interfaces called Functional Interfaces. A functional interface is any interface that has exactly one abstract method. It is this abstract method which is implemented by the lambda expression, and the type of the lambda expression is that functional interface. By this definition, many existing interfaces in the JDK are functional interfaces, and can be implemented by lambda expressions. These include Runnable, FileFilter, and Comparator, all of which appear in the examples above.

Java 8 introduces several new functional interfaces, designed specifically to work with enhancements to the Collections and related libraries, but useful in their own right. These include the following from the java.util.functions package. Note that the new JDK8 function APIs are still evolving and the released version will likely differ somewhat from what is described here.

| interface | method | purpose |
| --- | --- | --- |
| Predicate<T> | boolean test(T t); | Returns true if the input matches some criteria. |
| Supplier<T> | T get(); | A supplier of objects. The result objects are either created during the invocation of get or by some prior action. |
| Block<T> | void accept(T t); | Performs operations on an object. Can change the state of this object or other objects. |
| Function<T, R> | R apply(T t); | Maps an input object of type T to an appropriate output object of type R. |
| MultiFunction<T,U> | void apply(Collector<U> collector, T element); | Maps a T value to multiple U values. |
| BinaryOperator<T> | T operate(T left, T right); | Performs an operation with two operands and returns a result of the same type. |
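As a rough illustration of how these interfaces are used, the sketch below assigns a lambda or method reference to several of them. It assumes the names from the released Java 8 API rather than the snapshot in the table: the package is java.util.function, Block is called Consumer, and BinaryOperator's method is apply rather than operate. The class FunctionalInterfaceTour is hypothetical.

```java
import java.util.function.BinaryOperator;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical demo class; not part of the JDK.
class FunctionalInterfaceTour {
    static String run() {
        Predicate<String> isShort = s -> s.length() < 5;         // boolean test(T)
        Supplier<StringBuilder> newBuffer = StringBuilder::new;  // T get()
        Function<String, Integer> length = String::length;       // R apply(T)
        BinaryOperator<Integer> add = (a, b) -> a + b;           // T apply(T, T)

        StringBuilder out = newBuffer.get();
        // Consumer is the released name for the interface the table calls Block.
        Consumer<String> append = s -> out.append(s).append(';'); // void accept(T)

        append.accept(String.valueOf(isShort.test("abc")));
        append.accept(String.valueOf(length.apply("lambda")));
        append.accept(String.valueOf(add.apply(2, 3)));
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "true;6;5;"
    }
}
```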

Target Typing

The syntax for lambda expressions doesn't include the functional interface type. So how does the compiler know the type of the lambda? It is determined by the context in which the lambda is used. The context must have a target type, T, and that is the type of the lambda expression. The compiler determines whether the lambda expression is compatible with the target type by checking the following conditions:

These rules imply that the same lambda expression can have a different type in different contexts.

    interface Worker {
        void doWork();
    }

    // Lambda type is Runnable
    Runnable r = () -> System.out.println("no params lambda");
    // Lambda type is Worker
    Worker w = () -> System.out.println("no params lambda");
      

The target type can be specified with a type cast.

    // compile error -- target type is Object
    Object r = () -> System.out.println("hello"); // Compile Error

    // Casting can be used to specify a target type.
    Object r = (Runnable) () -> System.out.println("hello");
      

Since the compiler "knows" the parameter types, they usually don't need to be specified in the lambda.

    // s is type String because the target function's signature is test(String s).
    Predicate<String> p = s -> s.length() < 80;

    // f is type File because the target function's signature is accept(File f).
    FileFilter filter = f -> f.getAbsolutePath().endsWith(".txt");
      

The lambda must not throw any exceptions that are not declared by the target type.

    DateFormat fmt = new SimpleDateFormat("yyyyMMdd");
    // compile error -- DateFormat.parse(String s) throws ParseException
    Function<String, Date> stringToDate = s -> fmt.parse(s);
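One common way around this restriction is to catch the checked exception inside the lambda body and rethrow it as an unchecked one. The following is a sketch of that workaround, not something prescribed by Project Lambda. ParsingLambda and its parser method are hypothetical names, and Function comes from java.util.function as in the released Java 8 API.

```java
import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.function.Function;

// Hypothetical helper class; not part of the JDK.
class ParsingLambda {
    static Function<String, Date> parser(String pattern) {
        DateFormat fmt = new SimpleDateFormat(pattern);
        return s -> {
            try {
                return fmt.parse(s);
            } catch (ParseException e) {
                // Function.apply declares no checked exceptions, so wrap it.
                throw new IllegalArgumentException("not a date: " + s, e);
            }
        };
    }

    public static void main(String[] args) {
        Function<String, Date> stringToDate = parser("yyyyMMdd");
        System.out.println(stringToDate.apply("20130115"));
    }
}
```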
      

Method References

Frequently a lambda expression will simply make a call to a method, passing along the lambda's parameters. In cases like this, there is a new abbreviated syntax called method reference that omits the parameters. There are actually several types of method references, each with similar syntax and slightly different semantics. The following table summarizes the different types. In the table, C is a class name, m is a method name, v is a variable, and the lambda's parameters are (a, b, c).

| type of method called | syntax | invocation |
| --- | --- | --- |
| static method | C::m | C.m(a, b, c) |
| instance method | C::m | a.m(b, c) |
| particular instance method | v::m | v.m(a, b, c) |
| constructor | C::new | new C(a, b, c) |

Note that when referencing an instance method, the first parameter becomes the receiver in the invocation.

The following code shows examples of pairs of equivalent lambdas implemented first without and then with a method reference.


    // invoke a static method
    Block<String[]> b1 = s -> Arrays.sort(s);
    Block<String[]> b2 = Arrays::sort;  // method reference

    // invoke an instance method
    Predicate<String> p1 = s -> s.isEmpty();
    Predicate<String> p2 = String::isEmpty;  // method reference

    // Comparator calls an instance method
    listOfStrings.sort((s1, s2) -> s1.compareTo(s2));
    listOfStrings.sort(String::compareTo);  // method reference

    // use an instance of SimpleDateFormat to convert a Date to a String
    SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMdd");
    Function<Date, String> dateToStr1 = d -> fmt.format(d);
    Function<Date, String> dateToStr2 = fmt::format; // method reference

    // Supplier.get() returns a new JPanel instance
    Supplier<JPanel> panelMaker1 = () -> new JPanel();
    Supplier<JPanel> panelMaker2 = JPanel::new; // method reference
    JPanel p = panelMaker2.get();  // assigns a new JPanel to p.
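The four kinds of method reference can also be tried out in one small program. This sketch uses Function and BiFunction from java.util.function as they appear in the released Java 8 API; the class MethodRefKinds is hypothetical.

```java
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical demo class; not part of the JDK.
class MethodRefKinds {
    static String run() {
        // static method: Integer.parseInt(a)
        Function<String, Integer> parse = Integer::parseInt;
        // instance method: the first parameter becomes the receiver, a.startsWith(b)
        BiFunction<String, String, Boolean> starts = String::startsWith;
        // particular instance method: "Hello, ".concat(a)
        Function<String, String> greet = "Hello, "::concat;
        // constructor: new StringBuilder(a)
        Function<String, StringBuilder> builder = StringBuilder::new;

        return parse.apply("42") + ","
                + starts.apply("lambda", "lam") + ","
                + greet.apply("world") + ","
                + builder.apply("x").append("y");
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "42,true,Hello, world,xy"
    }
}
```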
      

Default Methods

Up until now, once an interface was added to the JDK, its specification was frozen. Adding new methods to an existing interface would break compatibility with older programs which implemented the interface. The old program would throw a nasty error if the new method were called, since that method was not implemented. This made it difficult to evolve APIs, and led to misplaced methods and "garbage" classes. Consider the Collections.sort(List list) method. It would have made more sense to add a sort method to the List interface, but that couldn't be done. To improve this situation, Java 8 introduces default methods. The primary motivation is to support API evolution, and more generally to allow building better libraries.

A default method is an interface method which includes an implementation. Default methods are used extensively in Java 8, both in new interfaces, and as additions to existing library interfaces. For example, here is the definition of Iterable in JDK8.

public interface Iterable<T> {
    Iterator<T> iterator();

    public default void forEach(Block<? super T> block) {
        for (T t : this) {
            block.accept(t);
        }
    }
}
      

The forEach method is a default method. Note the use of the new default keyword and the addition of a method implementation. Concrete implementations of Iterable do not need to provide an implementation for the forEach method. If none is provided, then the default implementation is used. If one is provided, then it overrides the default implementation. This means that existing programs will continue to compile and run.
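The same mechanism is available to application code. In the following minimal sketch, the Greeter interface and its two implementations are hypothetical names chosen for illustration: one class inherits the default method and the other overrides it.

```java
// Hypothetical interface with one abstract method and one default method.
interface Greeter {
    String name();

    // Implementors inherit this body unless they override it.
    default String greet() {
        return "Hello, " + name();
    }
}

// Uses the default implementation of greet().
class PlainGreeter implements Greeter {
    @Override
    public String name() {
        return "world";
    }
}

// Overrides the default implementation of greet().
class LoudGreeter implements Greeter {
    @Override
    public String name() {
        return "world";
    }

    @Override
    public String greet() {
        return "HELLO, " + name().toUpperCase() + "!";
    }
}

class DefaultMethodDemo {
    public static void main(String[] args) {
        System.out.println(new PlainGreeter().greet()); // prints "Hello, world"
        System.out.println(new LoudGreeter().greet());  // prints "HELLO, WORLD!"
    }
}
```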

While the primary motivation for adding default methods to Java is to make it easier to evolve APIs, there are other benefits as well, which contribute to the creation of better libraries. Consider the Iterator interface. If you've implemented an Iterator, you probably didn't support the remove method. But it still needed to be implemented, and usually it would just throw an exception:

    @Override
    public void remove() {
        throw new UnsupportedOperationException("remove");
    }
        

In Java 8 the Iterator interface contains a default implementation of remove (it's the same as the above implementation), freeing applications from having to implement their own, except in the rare case when a non-default implementation is required.

Default methods are also used in the new functional interfaces. While functional interfaces must contain only one abstract method, they may contain any number of default methods. These come in handy. For example, the Predicate interface contains the default methods and, negate, or, and xor. Here's the implementation of negate:

    public default Predicate<T> negate() {
        return (T t) -> !test(t);
    }
        

These default methods enable new predicates to be composed from existing ones.

        Predicate<String> p1 = s -> s.endsWith(".htm");
        Predicate<String> p2 = s -> s.endsWith(".html");
        Predicate<String> p3 = p1.or(p2);
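Composition can be chained further. The sketch below combines or, and, and negate, which are the Predicate default methods that exist in the released Java 8 API (java.util.function.Predicate). The class and constant names are hypothetical.

```java
import java.util.function.Predicate;

// Hypothetical demo class; not part of the JDK.
class PredicateComposition {
    static final Predicate<String> HTM = s -> s.endsWith(".htm");
    static final Predicate<String> HTML = s -> s.endsWith(".html");
    static final Predicate<String> HIDDEN = s -> s.startsWith(".");

    // True for either extension.
    static final Predicate<String> WEB_PAGE = HTM.or(HTML);

    // True for web pages whose name does not start with a dot.
    static final Predicate<String> VISIBLE_WEB_PAGE = WEB_PAGE.and(HIDDEN.negate());

    public static void main(String[] args) {
        System.out.println(WEB_PAGE.test("index.html"));         // prints "true"
        System.out.println(VISIBLE_WEB_PAGE.test(".draft.htm")); // prints "false"
    }
}
```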
        

Default methods are virtual methods and can be overridden. What about multiple inheritance and the "diamond problem"? First, Java has always had multiple inheritance of type; now it also has multiple inheritance of behavior. Second, although interfaces can now include code (behavior), they cannot include state. This simplifies the rules for multiple inheritance when there are diamond inheritance hierarchies. The new rules for inheritance are straightforward and intuitive:

This example shows a case where the class beats the interface.

class A {
    public void m() {System.out.println("A's m()");}
}

interface B {
    default void m() {System.out.println("B's m()");}
}

class C extends A implements B {
    void x() {m();}  // prints "A's m()"
}
        

In this example class C must implement method m because otherwise the compiler could not choose between the default methods in interfaces A and B. Class C's implementation of m uses the new super syntax for choosing which default method to execute: Interface.super.method().

interface A {
    default void m() {System.out.println("A's m()");}
}

interface B {
    default void m() {System.out.println("B's m()");}
}

class C implements A, B {
    public void m() {B.super.m();}  // prints "B's m()"
}
        

Streams

Two of the primary goals for Java 8 are to modernize the Collections library and make parallelism easier. To see how these goals are being met, the place to start is to look at the new Stream interface. Note that the Stream and related interfaces are not yet finalized as of this writing, so the released version will likely be different than the one described here, which is based on the Jan 7, 2013 binary snapshot.

Some key characteristics of a Stream are:

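The Stream interface contains two types of methods: intermediate operations, which are lazy and return a new Stream, and terminal operations, which are eager and return something other than a Stream.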

Stream operations are typically chained together in a pipeline of one or more intermediate operations followed by a single terminal operation. A typical pattern is generally some combination of filter, map, and reduce operations.
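The following sketch shows such a pipeline and makes the laziness of the intermediate operations visible by counting how often the filter runs. It is written against the released Java 8 Stream API, which may differ from the snapshot described here. LazyPipeline is a hypothetical class name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

// Hypothetical demo class; not part of the JDK.
class LazyPipeline {
    // Returns {filter calls before the terminal op, filter calls after it, total length}.
    static int[] run() {
        List<String> words = Arrays.asList("stream", "", "lambda", "default");
        AtomicInteger filterCalls = new AtomicInteger();

        // Intermediate operations only describe the pipeline; nothing runs yet.
        Stream<Integer> lengths = words.stream()
                .filter(w -> {
                    filterCalls.incrementAndGet();
                    return !w.isEmpty();
                })
                .map(String::length);
        int callsBeforeTerminal = filterCalls.get();

        // The terminal operation pulls every element through filter and map.
        int total = lengths.reduce(0, Integer::sum);
        return new int[] {callsBeforeTerminal, filterCalls.get(), total};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run())); // prints "[0, 4, 19]"
    }
}
```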

The main methods of the Stream interface are listed below.

| method | return type | operation type | category |
| --- | --- | --- | --- |
| filter(Predicate<? super T> predicate) | Stream<T> | intermediate/lazy | filter |
| map(Function<? super T, ? extends R> mapper) | Stream<R> | intermediate/lazy | map |
| map(IntFunction<? super T> mapper) | IntStream | intermediate/lazy | map |
| mapMulti(MultiFunction<? super T, R> mapper) | Stream<R> | intermediate/lazy | map |
| uniqueElements() | Stream<T> | intermediate/lazy | filter |
| sorted(Comparator<? super T> comparator) | Stream<T> | intermediate/lazy | filter |
| forEach(Block<? super T> block) | void | terminal/eager | |
| tee(Block<? super T> block) | Stream<T> | intermediate/lazy | |
| limit(long sizeLimit) | Stream<T> | intermediate/lazy | filter |
| substream(long startIndex) | Stream<T> | intermediate/lazy | filter |
| substream(long startIndex, long endIndex) | Stream<T> | intermediate/lazy | filter |
| into(A target) | <A extends Destination<? super T>> A | terminal/eager | |
| toArray() | Object[] | terminal/eager | |
| reduce(T zero, BinaryOperator<T> reducer) | T | terminal/eager | reduce |
| reduce(BinaryOperator<T> reducer) | Optional<T> | terminal/eager | reduce |
| reduce(U zero, BiFunction<U, T, U> accumulator, BinaryOperator<U> reducer) | U | terminal/eager | reduce |
| accumulate(Accumulator<T, R> reducer) | R | terminal/eager | reduce |
| accumulate(Supplier<R> resultFactory, BiBlock<R, T> accumulator, BiBlock<R, R> reducer) | R | terminal/eager | reduce |
| accumulateConcurrent(ConcurrentTabulator<T, R> tabulator) | R | terminal/eager | reduce |
| max(Comparator<? super T> comparator) | Optional<T> | terminal/eager | reduce |
| min(Comparator<? super T> comparator) | Optional<T> | terminal/eager | reduce |
| anyMatch(Predicate<? super T> predicate) | boolean | terminal/eager | reduce |
| allMatch(Predicate<? super T> predicate) | boolean | terminal/eager | reduce |
| noneMatch(Predicate<? super T> predicate) | boolean | terminal/eager | reduce |
| findFirst() | Optional<T> | terminal/eager | reduce |
| findAny() | Optional<T> | terminal/eager | reduce |
| sequential() | Stream<T> | intermediate/lazy | Stream conversion |
| parallel() | Stream<T> | intermediate/lazy | Stream conversion |
| unordered() | Stream<T> | intermediate/lazy | Stream conversion |

Obtaining a Stream

For Collection classes, two new methods were added to the Collection interface for creating Streams which iterate over the Collection.

A Stream can also be created given an Iterator or Supplier function (for infinite Streams). To create a Stream that supports parallel operations, a Spliterator is required. Spliterator is a new interface which provides the ability to decompose (split) an aggregate data structure and to iterate over the elements of the aggregate. To perform parallel operations, a Stream recursively splits the original aggregate into smaller pieces, each with its own Spliterator, until a threshold size is reached beyond which any further splitting would just generate additional cost. The resulting Spliterators can then be iterated over in parallel.
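A single level of this splitting can be tried by hand. The sketch below assumes the Spliterator methods of the released Java 8 API (trySplit and forEachRemaining), which may not match the January 2013 snapshot. SplitDemo is a hypothetical class name. Here the two pieces are simply traversed one after the other, whereas the Stream framework would hand them to different threads.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;

// Hypothetical demo class; not part of the JDK.
class SplitDemo {
    // Splits the numbers 1..n once and returns the sum of each piece.
    static long[] splitAndSum(int n) {
        List<Integer> numbers = new ArrayList<>();
        for (int i = 1; i <= n; i++) {
            numbers.add(i);
        }

        Spliterator<Integer> rest = numbers.spliterator();
        // trySplit hands off part of the elements, or returns null if it cannot split.
        Spliterator<Integer> piece = rest.trySplit();

        long[] sums = new long[2];
        if (piece != null) {
            piece.forEachRemaining(i -> sums[0] += i);
        }
        rest.forEachRemaining(i -> sums[1] += i);
        return sums;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitAndSum(100)));
    }
}
```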

Regardless of whether the Stream is sequential or parallel, it can be used in basically the same way. All the details of iterating in parallel are hidden from the application. Here we will focus on sequential streams.
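To illustrate that the calling code looks the same either way, the sketch below runs one pipeline over both a sequential and a parallel stream. It assumes the released Java 8 method names stream() and parallelStream() on Collection. SameCodeBothWays is a hypothetical class name.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; not part of the JDK.
class SameCodeBothWays {
    static int sumOfSquares(List<Integer> xs, boolean parallel) {
        // Only the way the stream is obtained differs; the pipeline is identical.
        return (parallel ? xs.parallelStream() : xs.stream())
                .mapToInt(x -> x * x)
                .sum();
    }

    public static void main(String[] args) {
        List<Integer> xs = Arrays.asList(1, 2, 3, 4);
        System.out.println(sumOfSquares(xs, false)); // prints "30"
        System.out.println(sumOfSquares(xs, true));  // prints "30"
    }
}
```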

Let's look at how we can handle some common use cases using Streams.

Filter a collection based on some criteria.

    // Given a list of Strings, return a new list with null and empty strings removed
    List<String> strings = ...;
    List<String> filtered = strings.stream().filter(s -> s != null && s.length() > 0)
            .into(new ArrayList<String>());
        

A stream is obtained from the strings list using the stream() method. The filter method returns a new stream containing only those elements which satisfy some condition (a non-empty string in this example). The into() method is a terminal method which populates a destination collection with the elements of a stream. This shows the typical pattern for stream operations - a series of one or more intermediate operations (filter in this case) which return a new Stream, followed by a single terminal operation (into in this example), which returns something other than a stream (an ArrayList in this example).

To extend the above example, let's convert the strings to uppercase and sort them.

    List<String> filtered = strings.stream().filter(s -> s != null && s.length() > 0)
            .map(s -> s.toUpperCase()).sorted(String::compareTo).into(new ArrayList<String>());
        

The map method returns a new Stream with the original strings converted to upper case. The sorted method returns a new stream sorted using the given Comparator. Note the use of the method reference String::compareTo. The sorted call could also be written sorted((s1, s2) -> s1.compareTo(s2)).
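For readers trying this on a released JDK, here is a sketch of the same filter, map, and sort pipeline. It assumes the released Java 8 API, in which the terminal operation is collect(Collectors.toList()) rather than the into() method of the snapshot. CleanUpStrings is a hypothetical class name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class; not part of the JDK.
class CleanUpStrings {
    static List<String> clean(List<String> strings) {
        return strings.stream()
                .filter(s -> s != null && !s.isEmpty()) // drop null and empty strings
                .map(String::toUpperCase)               // convert to upper case
                .sorted()                               // natural (String.compareTo) order
                .collect(Collectors.toList());          // terminal operation
    }

    public static void main(String[] args) {
        System.out.println(clean(Arrays.asList("pear", null, "", "apple"))); // prints "[APPLE, PEAR]"
    }
}
```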

The map method can map its input to a different type, as in the next example, which takes a list of albums and compiles a set of artists.

        List<Album> albums = ...;
        Set<String> artists = albums.stream().map(a -> a.getArtist()).into(new HashSet<String>());
		

It's easy to pull multiple values from each element of a Stream and "flatten" them into a new Stream. The next example finds all the tracks by a given composer given a Collection of Albums.

    Collection<Album> albums = ...;

    // Find all tracks by composer
    List<Track> tracks = albums.stream()
            .<Track>mapMulti((collector, album) -> collector.yield(album.getTracks()))
            .filter(track -> track.getComposer().equals("Billy Strayhorn"))
            .into(new ArrayList<Track>());
    // Note: the <Track> cast before the mapMulti call shouldn't really be needed, but the compiler
    // complains if it's not included.
        

The key here is the mapMulti method. As the name implies, it is similar to the map method but yields multiple output values for each input. The type of the lambda expression is MultiFunction<T,U>, which maps a single T to multiple U's. Its function is apply(MultiFunction.Collector<U> collector, T element), where element is the upstream element, and collector is an extra argument supplied by the Stream framework for collecting multiple values from the element. In this example, each Album has one or more tracks, which are yielded to the collector, resulting in a new flattened stream of tracks. This example works whether Album.getTracks() returns a Collection or an array. The yield method is overloaded for a Collection<U>, array of U, Stream<U>, and a single U element, so values can be yielded singly or in aggregates. The results are flattened, so yielding an array containing [1, 2] is the same as calling yield(1) and then yield(2). The ability to yield values one at a time is important because it avoids the need to create temporary Collections if the aggregate values are not already in a Collection.
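In the released Java 8 API the same flattening is done with flatMap, which takes a function from each element to a Stream of results. The sketch below assumes that API. The Album and Track classes are minimal hypothetical stand-ins for the ones used in the example above, and the track titles are placeholders.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Minimal hypothetical stand-ins for the Album and Track classes in the text.
class Track {
    private final String title;
    private final String composer;

    Track(String title, String composer) {
        this.title = title;
        this.composer = composer;
    }

    String getTitle() { return title; }
    String getComposer() { return composer; }
}

class Album {
    private final List<Track> tracks;

    Album(Track... tracks) {
        this.tracks = Arrays.asList(tracks);
    }

    List<Track> getTracks() { return tracks; }
}

class TracksByComposer {
    static List<String> titlesBy(List<Album> albums, String composer) {
        return albums.stream()
                .flatMap(album -> album.getTracks().stream()) // one stream of all tracks
                .filter(track -> track.getComposer().equals(composer))
                .map(Track::getTitle)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Album> albums = Arrays.asList(
                new Album(new Track("T1", "Billy Strayhorn"), new Track("T2", "Someone Else")),
                new Album(new Track("T3", "Billy Strayhorn")));
        System.out.println(titlesBy(albums, "Billy Strayhorn")); // prints "[T1, T3]"
    }
}
```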

The reduce methods perform a binary operation on the previous result and the next Stream element to produce a new result, then repeat this process until there are no more elements.

        int[] intAry = {3, 5, 9};
        // add up the elements in intAry
        int sum = Arrays.stream(intAry).reduce(0, Integer::sum);
        assertEquals(17, sum);

        String[] strings = {"a", "b", "c"};
        // join the strings with commas; skip the separator while the result is still the empty zero value
        String result = Arrays.stream(strings).reduce("", (s1, s2) -> s1.isEmpty() ? s2 : s1 + "," + s2);
        assertEquals("a,b,c", result);
        

The first parameter in the reduce method is the zero parameter. It is used as the first result, thus ensuring there is always a final result, even if the stream contains no elements.
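The effect of the zero parameter is easiest to see on an empty stream. The sketch below contrasts the two forms using the released Java 8 API, where the form without a zero value returns an Optional. ReduceIdentity is a hypothetical class name.

```java
import java.util.Optional;
import java.util.stream.Stream;

// Hypothetical demo class; not part of the JDK.
class ReduceIdentity {
    // With a zero value there is always a result, even for an empty stream.
    static int sum(Stream<Integer> s) {
        return s.reduce(0, Integer::sum);
    }

    // Without a zero value the result is empty when the stream is empty.
    static Optional<Integer> max(Stream<Integer> s) {
        return s.reduce(Integer::max);
    }

    public static void main(String[] args) {
        System.out.println(sum(Stream.of(3, 5, 9)));                // prints "17"
        System.out.println(sum(Stream.<Integer>empty()));           // prints "0"
        System.out.println(max(Stream.of(3, 9, 5)).get());          // prints "9"
        System.out.println(max(Stream.<Integer>empty()).isPresent()); // prints "false"
    }
}
```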

To conclude, we show two examples of obtaining Streams, one from an Iterator and one from a generator function. The first example creates an Iterator to generate the Fibonacci numbers. The second example does the same thing using the new Supplier interface, which is simpler to use when the sequence is infinite (we ignore integer overflow for these examples). Both examples use methods in the new java.util.stream.Streams class to obtain a Stream given some input source.

    @Test
    public void streamFromIterator() {
        Iterator<Integer> fibo = new Iterator<Integer>() {
            private int f1 = 0;
            private int f2 = 1;

            @Override
            public boolean hasNext() {
                return true; // infinite iterator
            }

            @Override
            public Integer next() {
                int f = f1;
                int nextf = f1 + f2;
                f1 = f2;
                f2 = nextf;
                return f;
            }
        };
        Stream<Integer> fiboStream = Streams.stream(Streams.spliteratorUnknownSize(fibo),
                StreamOpFlag.NOT_SIZED | StreamOpFlag.IS_ORDERED | StreamOpFlag.IS_SORTED);

        List<Integer> fibos10 = fiboStream.limit(10).into(new ArrayList<Integer>());

        assertArrayEquals(new Integer[]{0, 1, 1, 2, 3, 5, 8, 13, 21, 34}, fibos10.toArray(new Integer[10]));
    }
        

A simpler approach can be used for infinite streams. Instead of an Iterator, a Supplier interface is implemented. It has a single method, get(), which returns the next value in the sequence.

    @Test
    public void streamFromSupplier() {
        Supplier<Integer> fibo = new Supplier<Integer>() {
            private int f1 = 0;
            private int f2 = 1;

            @Override
            public Integer get() {
                int f = f1;
                int nextf = f1 + f2;
                f1 = f2;
                f2 = nextf;
                return f;
            }
        };

        Stream<Integer> fiboStream = Streams.generate(fibo);
        List<Integer> fibos10 = fiboStream.limit(10).into(new ArrayList<Integer>());

        assertArrayEquals(new Integer[]{0, 1, 1, 2, 3, 5, 8, 13, 21, 34}, fibos10.toArray(new Integer[10]));
    }
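For comparison, here is a sketch of the Supplier-based version against the released Java 8 API, where the factory method is Stream.generate and the result is gathered with collect(Collectors.toList()). FibonacciStream and firstFibonacci are hypothetical names. The stateful Supplier is only safe here because the stream is sequential.

```java
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class; not part of the JDK.
class FibonacciStream {
    static List<Integer> firstFibonacci(int n) {
        // Stateful supplier: each call to get() returns the next Fibonacci number.
        Supplier<Integer> fibo = new Supplier<Integer>() {
            private int f1 = 0;
            private int f2 = 1;

            @Override
            public Integer get() {
                int f = f1;
                int next = f1 + f2;
                f1 = f2;
                f2 = next;
                return f;
            }
        };

        // The stream is infinite; limit(n) truncates it before collecting.
        return Stream.generate(fibo).limit(n).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(firstFibonacci(10)); // prints "[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]"
    }
}
```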
        

Summary

Project Lambda is a big win for Java developers. It will enable us to write better code -- shorter, more declarative, easier to understand and maintain. And the cost in terms of additional language complexity is kept to a minimum. If you want to try out the new lambda features, you can download the latest JDK8 binary snapshot from jdk8.java.net/lambda (see the references below). For IDE support, IntelliJ IDEA version 12 has full support for jdk8.

References

