In our last SETT article on Clojure we introduced the immutable style of programming favored by Clojure: Immutability in Clojure - Part 1, Programming Without Variables. We showed how recursion can be used to do things like looping over a collection that would ordinarily require setting mutable variables. Traditionally, the functional programming languages Scheme and LISP that Clojure inherits from relied heavily on recursion. However we also said that recursion is not used as quite as extensively in Clojure: instead Clojure provides 'lazy sequences' which can provide a more efficient alternative to recursion in many cases. This SETT will examine how to use lazy sequences in Clojure. We will also show how immutable collection data structures are efficiently shared via Clojure's persistent collections. First however we will introduce Clojure's normal (non-lazy) sequences.

In Clojure, every collection like data structure can be treated like a sequence:

- lists
- vectors
- sets
- maps

In addition many other things can behave like sequences:

- strings
- regular expression matches
- directories
- I/O streams
- XML documents
- etc....

A sequence supports three core functions:

- first
- gets the first item in a sequence
- rest
- gets everything after the first item in a sequence.
- cons
- constructs a new sequence

In the following code examples the result of an expression is shown with the comment character ';' followed by '->' and the result, like this:

(+ 2 2) ; -> 4

`(+ 2 2)`

is the actual Clojure expression, and `4`

is the result of the expression.

The following code creates a vector of the three integers 1, 2, 3, and then retrieves the first element ('1'):

(first [1 2 3]) ; -> 1

The `first`

function returns nil if passed an empty sequence or nil. The `rest`

function returns another sequence every element following the first element:

(rest [1 2 3]) ; -> (2 3)

`rest`

will return an empty sequence when passed a sequence
with less than two elements. The `cons`

function
*cons*tructs a new sequence from a single element and another
sequence:

(cons 4 [1 2 3]) ; -> (4 1 2 3)

Building on these three core three sequence functions, `first`

,
`rest`

and `cons`

, Clojure's standard library
provides functionality for filtering, querying and manipulating sequences
similar to what one finds in the Java collections API, Guava collections,
etc. The full functionality of Clojure's sequences is outside the scope
of this article, but let's take a peek at a few key functions to get the
flavor of things. Note that all of the examples that follow can be
applied to any sequence like data structure in Clojure, including I/O
streams, regular expression matches, XML documents, etc.

(filter even? [1 2 3 4]) ; -> (2 4)

The `filter`

function takes a test function, in this case the
`even?`

function, calls the function on each element in the
original sequence, and returns a new (lazy) sequence containing all the
elements in the original sequence for which the test function returned
true.

(every? even? [1 2 3 4]) ; -> false (every? even? [2 4 6 8]) ; -> true

The `every?`

function takes a test function, in this case
`even?`

, and returns true if the test function returns true for
every element in the sequence.

(map #(* 2 %) [1 2 3]) ; -> (2 4 6)

The `map`

function takes a transformation function, calls the
transformation function on each element on the sequence, and returns a new
sequence of transformed elements. In this case we are using Clojure's
abbreviated syntax for an anonymous function:

#(* 2 %0) ; -> creates an anonymous function that multiplies its argument by 2

The `#`

character marks the beginning of an anonymous function
definition, surrounded by parenthesis. The implicit single argument is
referred to by the `%`

symbol.

Clojure sequences such as lists, vectors, sets and maps are immutable. So how do we 'add' items to a collection, if we can't modify the collection? Instead of modifying the original collection, we create a new collection that contains all the elements of the original collection, plus the new element that we want to add. Here is an example using maps:

(def m1 {"a" 1 "b" 2}) ; -> create map m1 with two key/value pairs (def m2 (assoc m1 "c" 3)) ; -> 'add' new key/value pair to m1, assign to m2

First we create map `m1`

using the Clojure curly brace syntax
for map literals. The `m1`

has two key value pairs, "a" -> 1
and "b" -> 2. Next, we call the `assoc`

function which takes
three arguments: a key ("c"), a value (2), and an existing map
(`m1`

). The `assoc`

call returns a new map
containing all three key / value pairs.

A natural question arises here: isn't this horribly inefficient? Surely we
can't afford to make copies of entire collections every time we want to
'add' one element. The answer is that Clojure's collection use a
technique called 'structural sharing' to only compute the differences
between two different versions of a collection, and share the common bits
in memory. Version control systems like Subversion or GIT use similar
techniques. *Logically*, a version control system stores separate
copies of every version of a file. *Physically*, however, it only
stores the diffs, and computes the full version of a file on demand.
Clojure's 'persistent' collections work the same way. Here is another
example using lists. First let's create an initial list `l1`

with two elements:

(def l1 (list 1 2))

Next let's create new two lists, `l2`

and `l3`

from
contents of `l1`

and 'adding' one extra element:

(def l2 (cons 3 l1)) (def l3 (cons 4 l1))

Logically, we have three separate lists. In memory however, lists
`l2`

and `l3`

share the contents of the first list
`l1`

. We can demonstrate this as follows. First let's verify
that the contents of the 'rest' of `l2`

and `l3`

are
equivalent to the contents of `l1`

:

(= (rest l2) l1) ; -> returns true (= (rest l3) l1) ; -> returns true

That just proves that the `cons`

function worked correctly. But let's
see if the contents are *identical* and therefore shared in memory:

(identical? (rest l2) l1) ; -> returns true (identical? (rest l3) l1) ; -> returns true

The `identical?`

tests to see if two objects point to the same
region in memory. It is similar to the `==`

operator in Java.
As we can see, the contents of list `l1`

are shared between all
three lists, `l1`

, `l2`

and `l3`

.

With the introduction to basic sequences out of the way let's look at lazy
sequences. A lazy sequence in Clojure is a virtual sequence that behaves
just like a 'real' sequence except that its elements are computed on
demand, lazily. The `filter`

and `map`

functions
demonstrated above actually return lazy sequences.

Since each element is realized on demand, we can even create infinite lazy
sequences (as long as we don't try to realize every element in the infinite
sequence). An example of an infinite sequence would be the set of all whole
numbers. Let's create a lazy sequence that produces whole
numbers. To create a lazy sequence we use the `lazy-seq`

macro.
The `lazy-seq`

macro takes an expression that returns a sequence,
and only invokes that expression as it is needed. Our first version
creates a lazy sequence that increments from a starting number:

(defn whole-numbers-from [n] (cons n (lazy-seq (whole-numbers-from (inc n)))))

We use the `cons`

function to create a sequence whose first
element is `n`

, and the rest is the *lazy sequence*
returned by recursively calling `whole-numbers-from`

again,
incrementing the value of `n`

by one. The key is the call to
`lazy-seq`

. Without this the function would recur forever,
until the end of time or we run out of whole numbers, whichever comes
first. (Actually we would get a stack overflow error first). With
the `lazy-seq`

call, the information needed to generate
each subsequent element is remembered, or memoized, to be produced on
demand, as needed. We can retrieve a finite set of elements from our
logically infinite sequence using the `take`

function:

(take 3 (whole-numbers-from 0)) ; -> (0 1 2)

As each element is produced, `whole-numbers-from`

is called
again with the incremented value of `n`

, and the information
for the previous call to `whole-numbers-from`

is discarded and
eligible for garbage collection. Thus nothing is placed on the stack; the
stack does not nest more than one level and we do not get a stack
overflow, even for large values of n.

We can provide a simpler version that starts at 0 using the built in `iterate`

function:

(def whole-numbers (iterate inc 1))

The `iterate`

function takes a function and starting value as arguments, and produces
a lazy sequence of `n, (f n), (f (f n))`

etc. Note that we do not need to create
`whole-numbers`

as a function definition
using `defn`

. Instead we assigned the lazy sequence returned
by the iterate call directly to the symbol definition
`whole-numbers`

using `def`

. We can call our `whole-numbers`

definition like this:

(take 3 whole-numbers) ; -> (1 2 3)

We can call standard sequence functions
including `first`

, `filter`

,
`map`

, etc on our lazy sequence:

(first whole-numbers) ; -> 1 (take 3 (map #(* 2 %) whole-numbers)) ; -> (2 4 6)

The call to `take`

is very important - we can't transform the
entire set of whole numbers without running out of memory first!

To review what we have covered so far, let's look at four implementations for calculating the numbers of the Fibonacci sequence. The Fibonacci numbers are defined as follows: the first two Fibonacci numbers are 0 and 1, and each of the following numbers is the sum of the previous two. So the third Fibonacci number is 1, because 0 + 1 = 1. The fourth Fibonacci number is 2, because 1 + 1 = 2. The first ten Fibonacci numbers are 0, 1, 1, 2, 3, 5, 8, 13, 21, 34.

We will create four implementations for calculating the Fibonacci sequence: the first naive implementation will use normal, non-tail recursion. It should be easy enough to follow, but will not be efficient. We will refactor this implementation to use tail recursion. This version will be more efficient but harder to follow. Next we will implement the sequence using lazy sequences. Finally we will look at an elegant implementation that uses some of the built in functions in Clojure's standard sequence library.

Our first, naive implementation matches the problem definition well:

; naive implementation: (defn fib [n] (condp = n 0 0 1 1 (+ (fib (- n 1)) (fib (- n 2))))) ; call like this: (fib 9) ; -> 34

The `condp`

macro is a kind of switch statement. It takes
the name of a two argument test function, `=`

in this case,
a value to use as the first argument to the test function, and a set
of clauses where the first element of the clause is passed to the test
function and the second part of the clause is the code to execute if
the test evaluates to true.

So, our `fib`

function returns 0 if `n`

is 0, 1
if `n`

is 1, and otherwise returns the sum of the previous
two Fibonacci numbers. This fits our English language description of
the Fibonacci sequence above well: the first Fibonacci number (0) is
0, the second (1) is 1, and each following number is the sum of the
previous two. However, this implementation will not work for large
values of `n`

:

(fib 10000000) ; -> java.lang.StackOverflowError :(

Each call to `fib`

recursively places two more calls on the
stack. You will quickly overflow the JVM stack with this approach.

In the previous
article in this series we saw how to use the `recur`

function to replace simple recursion with tail call recursion. Tail
call recursion allows the compiler to unroll the recursion into a
non-recursive loop that is more efficient. In Clojure this is
achieved by replacing the recursive function call with
`recur`

. The call to `recur`

must be last
expression evaluated in the function (thus the term 'tail call').
However, in our `fib`

function the `+`

function is the last
expression. The two recursive calls to `fib`

happen before
the `+`

call. If we attempt to replace the two calls to
`fib`

with `recur`

our code will throw a
`java.lang.UnsupportedOperationException`

. We will have to
rearrange our function to allow for just one recursive call, and that
call will need to be the last expression evaluated. One approach is
to introduce two accumulator values to 'remember' the two previous
Fibonacci numbers on the stack:

(defn tail-fib ([n] (tail-fib n 0 1)) ([n prev2 prev1] (condp = n 0 prev2 1 prev1 2 (+ prev1 prev2) (recur (dec n) prev1 (+ prev1 prev2)))))

Our new function is overloaded based on the number of arguments based. If
passed a single argument `n`

, it calls itself with three
arguments: `(tail-fib n 0 1)`

. The the last two arguments,
`prev1`

and `prev2`

are the accumulator values that
'remember' the last two Fibonacci numbers calculated. Since the first two
numbers of the sequence are 0 and 1 we can hard code those two and then
start accumulating, with the call to `recur`

. The
`recur`

is the last expression of the function.. This
implementation works, even for large values of `n`

, but is
harder to read. It does not match the problem definition as well.

The phrase 'Fibonacci *sequence*' suggests an alternative
implementation: a lazy sequence that calculates each succeeding Fibonacci
number on demand. This approach turns out to be fairly straightforward
(once one is comfortable with lazy sequences), and efficient:

(def fib-seq ((fn fib [prev1 prev2] (lazy-seq (cons prev1 (fib prev2 (+ prev1 prev2))))) 0 1)) (take 10 fib-seq) ; -> (0 1 1 2 3 5 8 13 21 34) (nth fib-seq 9) ; -> 34

The definition of `fib-seq`

is similar to
`whole-numbers-from`

above. The `fn`

call
creates a little helper function, `fib`

, that takes the two
previously accumulated numbers in the sequence, `prev1`

and
`prev2`

. It returns a lazy sequence whose first element is
`prev1`

. The following elements are the result of the
`fib`

helper function recursively calling itself, passing
`prev2`

and the sum of the previous two numbers. After
creating the helper function `fib`

, our sequence definition
invokes `fib`

with the first two values of the sequence,
`0`

and `1`

. This is the reason for the double
level of parenthesis around the `fn`

call: first the
function `fib`

is defined, and then it is called with the
arguments `0`

and `1`

to see our sequence.
Since this recursive call is placed inside the call to
`lazy-seq`

, it is only invoked on demand, as needed, one
element at a time. The stack does not nest and we do not run out of
stack space, even for large values of `n`

.

The Clojure standard library comes with many useful functions that create
lazy sequences. We have already seen two of them, `iterate`

and `map`

. We can use these functions to create a simpler version
of the Fibonacci sequence:

(def fibs (map first (iterate (fn [[prev1 prev2]] [prev2 (+ prev1 prev2)]) [0 1]))) (take 10 fibs) ; -> (0 1 1 2 3 5 8 13 21 34) (nth fibs 9) ; -> 34

The call to `iterate`

loops over pairs of the two
previously accumulated Fibonacci numbers. Recall
that `iterate`

returns a lazy sequence. The Fibonacci
sequence itself is simply the first element of each pair, so the call
to `(map first...`

simply translates the nested sequence
of pairs into single elements, the elements of the Fibonacci sequence.
The `map`

function also returns a lazy sequence. This
version was suggested by Christophe Grand and described in the book Programming
Clojure by Stuart Halloway. This version reuses existing
functions, is the simplest version, and is efficient.

In this SETT we demonstrated how Clojure's persistent collection library uses structural sharing to allow for efficient use of immutable collections. We also showed how Clojure's lazy sequences can create efficient, virtual collections over large (even infinite) sequences. We implemented different methods of calculating the Fibonacci sequence to show how to replace simple recursion with more efficient tail recursion, how to replace tail (or simple) recursion with lazy sequences, and how the lazy sequence functions in the Clojure standard library often remove the need to implement your own lazy sequences.

- Clojure Home Page - http://clojure.org
- Programming Clojure - http://pragprog.com/titles/shcloj/programming-clojure
- InfoQ article on Clojure Collections - http://www.infoq.com/articles/in-depth-look-clojure-collections