likely initial reaction...

5.1.1 “You keep using that word. I do not think it means what you think it means.”

Although storage to disk may be the more common meaning of persistent today, Clojure uses an older meaning of the word having to do with immutable in-memory collections with specific properties. In particular, a persistent collection in Clojure allows you to preserve historical versions (Okasaki 1999) of its state, and promises that all versions will have the same update and lookup complexity guarantees. The specific guarantees depend on the collection type, and we’ll cover those details along with each kind of collection.

Here you can see the difference between a persistent data structure and one that’s not by using a Java array:

(def ds (into-array [:willie :barnabas :adam]))
(seq ds)
;=> (:willie :barnabas :adam)

What we’ve done is create a three-element array of keywords and used seq to produce an object that displays nicely in the REPL. Any change to the array ds happens in-place, thus obliterating any historical version:

(aset ds 1 :quentin)
;=> :quentin

(seq ds)
;=> (:willie :quentin :adam)

But using one of Clojure’s persistent data structures paints a different picture:

CHAPTER 5 Composite data types

(def ds [:willie :barnabas :adam])

ds
;=> [:willie :barnabas :adam]

(def ds1 (replace {:barnabas :quentin} ds))

ds
;=> [:willie :barnabas :adam]

ds1
;=> [:willie :quentin :adam]

Replacing the keyword :barnabas didn’t change the original vector ds; instead, replace produced another vector with the changed value. A natural concern when confronted with this picture of persistence is that a naive implementation would copy the whole collection on each change, leading to slow operations and poor use of memory. Clojure’s implementations (Bagwell 2001) are instead made efficient by sharing structural elements from one version of a persistent structure to the next. This may seem magical, but we’ll demystify it in the next chapter. For now it’s sufficient to understand that each instance of a collection is immutable and efficient. This fact opens numerous possibilities that wouldn’t work for standard mutable collections. One of these is the sequence abstraction.
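Before moving on, here’s a quick REPL sketch (the names v, v4, and v5 are ours, not the book’s) showing version preservation in action: every “update” yields a new version, and all of the old versions remain fully usable.

```clojure
;; Each conj produces a new vector; the original v is untouched,
;; and the versions share most of their internal structure.
(def v  [1 2 3])
(def v4 (conj v 4))
(def v5 (conj v 5))

v   ;=> [1 2 3]
v4  ;=> [1 2 3 4]
v5  ;=> [1 2 3 5]
```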

5.1.2 Sequence terms and what they mean

It is better to have 100 functions operate on one data abstraction than 10 functions on 10 data structures.
—Rich Hickey

The words sequential, sequence, and seq don’t sound very different from each other, but they mean specific things in Clojure. We’ll start with specific definitions of each term to help you tell them apart, and then go into a bit of detail about how they relate to equality partitions and the sequence abstraction.

TERMS

A sequential collection is one that holds a series of values without reordering them. As such it’s one of three broad categories of collection types, which we discuss in the next subsection.

A sequence is a sequential collection that represents a series of values that may or may not exist yet. They may be values from a concrete collection or values that are computed as necessary. A sequence may also be empty.

Clojure has a simple API called seq for navigating collections. It consists of two functions: first and rest. If the collection has anything in it, (first coll) returns the first element; otherwise it returns nil. (rest coll) returns a sequence of the items other than the first. If there are no other items, rest returns an empty sequence and never nil. Functions that promise to return sequences, such as map and filter, work the same way as rest. A seq is any object that implements the seq API, thereby supporting the functions first and rest. You might consider it an immutable variant of an enumerator or iterator.
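A few REPL checks make these rules concrete:

```clojure
(first [1 2 3]) ;=> 1
(rest  [1 2 3]) ;=> (2 3)
(first [])      ;=> nil  first of an empty collection is nil
(rest  [])      ;=> ()   rest is an empty sequence, never nil
```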

Persistence, sequences, and complexity
There’s also a function called seq that accepts a wide variety of collection-like objects. Some collections, such as lists, implement the seq API directly, so calling seq on them returns the collection itself. More often, calling seq on a collection returns a new seq object for navigating that collection. In either case, if the collection is empty, seq returns nil and never an empty sequence. Functions that promise to return seqs (not sequences), such as next, work the same way.

Clojure’s sequence library manipulates collections, strings, arrays, and so on as if they were sequences, using the seq function and seq API.
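The contrast between the two conventions, rest returning an empty sequence versus seq and next returning nil, shows up clearly at the REPL:

```clojure
(seq '(1 2 3))  ;=> (1 2 3)  a list is already a seq
(seq [1 2 3])   ;=> (1 2 3)  a new seq object over the vector
(seq [])        ;=> nil      never an empty sequence
(next [1])      ;=> nil      seq semantics
(rest [1])      ;=> ()       sequence semantics
```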

BEWARE TYPE-BASED PREDICATES Clojure includes a few predicates with names like the words just defined. Though they’re not frequently used, it seems worth mentioning that they may not mean exactly what the definitions here might suggest. For example, every object for which sequential? returns true is a sequential collection, but the predicate returns false for some objects that are nevertheless sequential. This is because of implementation details that may be improved sometime after Clojure 1.2.

EQUALITY PARTITIONS

Clojure classifies each composite data type into one of three logical categories or partitions: sequentials, maps, and sets. These divisions draw clear distinctions between the types and help define equality semantics. Specifically, two objects will never be equal if they belong to different partitions. Few composite types are actually sequences, though several, such as vectors, are sequential.

If two sequentials have the same values in the same order, = will return true for them, even if their concrete types are different, as shown:

(= [1 2 3] '(1 2 3))
;=> true

Conversely, even if two collections have the same values in the same order, if one is a sequential collection and the other isn’t, = will return false, as shown here:

(= [1 2 3] #{1 2 3})
;=> false

Examples of things that are sequential include Clojure lists and vectors, and Java lists such as java.util.ArrayList. In fact, everything that implements java.util.List is included in the sequential partition. Generally, things that fall into the other partitions include the word set or map in their name and so are easy to identify.
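A few checks illustrate the partitions, including the point about java.util.List: a mutable Java ArrayList compares equal to a vector because both fall in the sequential partition.

```clojure
(sequential? [1 2 3])   ;=> true
(sequential? '(1 2 3))  ;=> true
(sequential? #{1 2 3})  ;=> false

;; An ArrayList is in the sequential partition, so = compares by contents:
(= [1 2 3] (java.util.ArrayList. [1 2 3])) ;=> true

;; Different partitions are never equal:
(= #{1 2} [1 2]) ;=> false
```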

THE SEQUENCE ABSTRACTION

Many Lisps build their data types (McCarthy 1962) on the cons-cell abstraction, an elegant two-element structure illustrated in figure 5.1.

Figure 5.1 Each cons-cell is a simple pair, a car and a cdr. A. A list with two cells, each of which has a value (X and Y) as the head (the car in Lisp terminology) and a list as the tail (the cdr). This is very similar to first and rest in Clojure sequences. B. A cons-cell with a simple value for both the head and tail. This is called a dotted pair but is not supported by any of Clojure’s built-in types.

Clojure also has a couple of cons-cell-like structures that are covered in section 5.4, but they’re not central to Clojure’s design. Instead, the conceptual interface fulfilled by the cons-cell has been lifted off the concrete structure illustrated previously and been named sequence. All an object needs to do to be a sequence is to support the two core functions: first and rest. This isn’t much, but it’s all that’s required for the bulk of Clojure’s powerful library of sequence functions and macros to be able to operate on the collection: filter, map, for, doseq, take, partition, the list goes on.

At the same time, a wide variety of objects satisfy this interface. Every Clojure collection provides at least one kind of seq object for walking through its contents, exposed via the seq function. Some collections provide more than one; for example, vectors support rseq, and maps support the functions keys and vals. All of these functions return a seq, or if the collection is empty, nil.
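For instance (note that key order here relies on small map literals preserving insertion order):

```clojure
(rseq [1 2 3])      ;=> (3 2 1)
(keys {:a 1, :b 2}) ;=> (:a :b)
(vals {:a 1, :b 2}) ;=> (1 2)
(keys {})           ;=> nil  empty collection, so nil rather than ()
(rseq [])           ;=> nil
```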

You can see examples of this by looking at the types of objects returned by various expressions. Here’s the map class:

(class (hash-map :a 1))
;=> clojure.lang.PersistentHashMap

Unsurprisingly, the hash-map function returns an object of type PersistentHashMap. Passing that map object to seq returns an entirely new kind of object:

(seq (hash-map :a 1))
;=> ([:a 1])

(class (seq (hash-map :a 1)))
;=> clojure.lang.PersistentHashMap$NodeSeq

This class name suggests it’s a seq of nodes on a hash map. Similarly, we can get a seq of keys on the same map:

(seq (keys (hash-map :a 1)))
;=> (:a)

(class (keys (hash-map :a 1)))
;=> clojure.lang.APersistentMap$KeySeq

Note that these specific class names are an implementation detail that may change in the future, but the concepts they embody are central to Clojure and unlikely to change.

Having laid the foundation for a deeper dive into the sequence abstraction, we now must quickly diverge into a simplified discussion of asymptotic complexity and Big-O notation. If you’re already comfortable with these topics, then by all means skip forward to section 5.2. If you need a refresher or an overview, then the next section is a minimalist introduction (Cormen 2009) to the topic.

5.1.3 Big-O

This book isn’t heavily focused on asymptotic complexity, but we do mention it a handful of times throughout, so here we’ll cover the minimum required for understanding these few mentions. You may have gone your entire career without having to understand Big-O notation, and you may likely go the remainder similarly. But that’s no reason not to learn more, and a bit of understanding about Big-O and its implications will go a long way toward helping you choose between Clojure collections, as well as design and analyze algorithms in general.

Algorithmic complexity is a system for describing the relative space and time costs of algorithms. Typically the complexity of an algorithm is described using what’s known as Big-O notation. For example, you may have heard that finding an element in a linked list is O(n), which is read as “order n.” What this means is that if you have a list (:a :b :c) of length 3, then verifying that the keyword :c is in that list requires three comparisons. This highlights the worst case of list access, because :c is at the end of the list, but we don’t worry too much about the worst-case scenario unless that’s the only difference between two algorithms. On the other hand, verifying that :a is in the same list is O(1), which is read as “constant time.” Finding :a represents the best case for list access because it’s at the front of the list. Rarely do your lists always look exactly like our example, and therefore you shouldn’t build your hopes that elements will always be at the front. Therefore, in analyzing algorithms you rarely care about the best-case scenario, because it’s too rare to matter much. What you really care about when analyzing algorithms is the expected case, or what you’d likely see in practice. Looking at a few million runs of verifying that some value is contained in a million different lists, you’d inevitably see that the average number of comparisons required approaches the length of a list divided by two. But because doubling the length of the list would also double the number of comparisons done in both the expected and worst case, they’re all grouped into the same Big-O category: O(n), also known as linear time.
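The comparison counts described above can be made concrete with a small helper. Note that scan-count is a hypothetical function written for this illustration, not part of Clojure’s core library: it walks the collection left to right and reports how many comparisons were needed to find the value.

```clojure
;; Hypothetical helper: counts comparisons needed to find x in coll,
;; or returns nil when x isn't present.
(defn scan-count [x coll]
  (loop [c (seq coll), n 1]
    (when c
      (if (= (first c) x)
        n
        (recur (next c) (inc n))))))

(scan-count :a '(:a :b :c)) ;=> 1  the O(1) best case
(scan-count :c '(:a :b :c)) ;=> 3  the O(n) worst case
```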

Thus two algorithms that are in the same Big-O category may perform very differently, especially on small work loads. This makes the most difference when there’s a large constant factor: work that the algorithm has to do up front regardless of the size of the work load. When the work load is small, an O(1) algorithm with a large constant factor may be more costly than an O(n) algorithm without that extra cost. But as the work load increases, the O(1) algorithm will always overtake the O(n) algorithm, as shown in figure 5.2. Big-O doesn’t concern itself with these constant factors or small work loads.

Figure 5.2 Overtaking the smaller. In Big-O, regardless of the other ancillary costs, the higher order of magnitude will always overtake the lower eventually.

When learning about Clojure’s persistent data structures, you’re likely to hear the term O(log32 n) for those based on the persistent hash trie, and O(log2 n) for the sorted structures. Accessing an element in a Clojure persistent structure by index is O(log n), or logarithmic. Logarithmic complexity describes a class of algorithms that are effectively immune to large changes in the size of their data. In the case of Clojure’s persistent structures, this means there’s little difference in “hops” (such as comparisons) between locating an element in a structure containing 100 elements or 1 million elements. In practice you may notice some difference, because for a billion objects O(log2 n) would require approximately 30 comparisons for a lookup, whereas O(log32 n) would require only about 6. Given the smaller number of operations required, the O(log32 n) data structures can be viewed as providing nearly O(1) lookup and update.
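The 30-versus-6 figure follows from the fact that 32 = 2^5, so log32 n is simply (log2 n) / 5. A quick check for n of one billion (log2 is our own helper here, not a core function):

```clojure
;; log32 n = (log2 n) / 5, because 32 = 2^5.
(defn log2 [n] (/ (Math/log n) (Math/log 2)))

(log2 1e9)        ;=> roughly 29.9, about 30 comparisons
(/ (log2 1e9) 5)  ;=> roughly 6.0, about 6 comparisons
```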

We’ve covered the basic ideas behind persistence and the sequence abstraction, and even touched on the basics of Big-O notation. Now we’ll discuss all of Clojure’s primary collection types and how these concepts apply to each, starting with vectors.

5.2 Vectors: creating and using them in all their varieties

Vectors store zero or more values sequentially indexed by number, a bit like arrays, but are immutable and persistent. They’re versatile and make efficient use of memory and processor resources at both small and large sizes.

Vectors are probably the most frequently used collection type in Clojure code. They’re used as literals for argument lists and let bindings, for holding large amounts of application data, as stacks, and as map entries. We’ll also address efficiency considerations, including growing on the right end, subvectors, and reversals, and finally discuss where vectors aren’t an optimal solution.

5.2.1 Building vectors

The vector’s literal square-bracket syntax is one reason you might choose to use a vector over a list. For example, the let form would work perfectly well, and with a nearly identical implementation, if it took a literal list of bindings instead of a literal vector. But the square brackets are visually different from the round parentheses surrounding the let form itself as well as the likely function calls in the body of the let form, and this is useful for humans (so we hear). Using vectors to indicate bindings for let, with-open, fn, and such is idiomatic in Clojure and is a pattern you’re encouraged to follow in any similar macros you write.

The most common way to create a vector is with the literal syntax described earlier. But in many cases you’ll want to create a vector out of the contents of some other kind of collection. For this there’s the function vec:

(vec (range 10))
;=> [0 1 2 3 4 5 6 7 8 9]

If you already have a vector but want to “pour” several values into it, then into is your friend:

(let [my-vector [:a :b :c]]
  (into my-vector (range 10)))
;=> [:a :b :c 0 1 2 3 4 5 6 7 8 9]

If you want into to return a vector, its first argument must be a vector. The second arg can be any sequence, such as what range returns, or anything else that works with the seq function. You can view the operation of into as similar to an O(n) concatenation, where n is the size of the second argument.1 Clojure also provides a vector function to build a vector from its arguments, which is handy for constructs like (map vector a b).
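The (map vector a b) idiom pairs up corresponding elements from two sequences:

```clojure
(vector 1 2 3)                    ;=> [1 2 3]
(map vector [:a :b :c] (range 3)) ;=> ([:a 0] [:b 1] [:c 2])
```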

PRIMITIVE VECTORS

Clojure can store primitive types inside of vectors using the vector-of function, which takes any of :int, :long, :float, :double, :byte, :short, :boolean, or :char as its argument and returns an empty vector. This returned vector acts just like any other vector, except that it stores its contents as primitives internally. All of the normal vector operations still apply, and the new vector will attempt to coerce any additions into its internal type when they’re added:

(into (vector-of :int) [Math/PI 2 1.3])
;=> [3 2 1]

(into (vector-of :char) [100 101 102])
;=> [\d \e \f]

(into (vector-of :int) [1 2 623876371267813267326786327863])
; java.lang.IllegalArgumentException: Value out of range for int:
; -8359803716404783817

In addition, all caveats mentioned in section 4.1 regarding overflow, underflow, and so forth also apply to vectors of primitives.

Using vec and into, it’s easy to build vectors much larger than can conveniently be built using vector literals. But once you have a large vector like that, what are you going to do with it?

5.2.2 Large vectors

When collections are small, the performance differences between vectors and lists hardly matter at all. But as both get larger, each becomes dramatically slower at operations the other can still perform efficiently. Vectors are particularly efficient, relative to lists, at three things: adding or removing things from the right end of the collection, accessing or changing items in the interior of the collection by numeric index, and walking in reverse order. Adding and removing from the end is done by treating the vector as a stack—we’ll cover that later.

1 Vectors can’t be concatenated any more efficiently than O(n).

Any item in a vector can be accessed by its index number, from 0 up to but not including (count my-vector), in essentially constant time.2 You can do this using the function nth; the function get, essentially treating the vector like a map; or by invoking the vector itself as a function. Look at each of these as applied to this example vector:

(def a-to-j (vec (map char (range 65 75))))

a-to-j
;=> [\A \B \C \D \E \F \G \H \I \J]

All three of these do the same work, and each returns \E:

(nth a-to-j 4)
(get a-to-j 4)
(a-to-j 4)

Which to use is a judgment call, but table 5.1 highlights some points you might consider when choosing.

Table 5.1 Vector lookup options: the three ways to look up an item in a vector and how each responds to different exceptional circumstances

                                nth                       get                   Vector as a function
If the vector is nil            Returns nil               Returns nil           Throws an exception
If the index is out of range    Returns “not found” or    Returns nil           Throws an exception
                                throws exception
Supports a “not found” arg      Yes                       Yes                   No
                                (nth [] 9 :whoops)        (get [] 9 :whoops)
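The table’s corner cases are easy to confirm at the REPL:

```clojure
(get a-to-j 20)    ;=> nil      out of range: get is forgiving
(get nil 4)        ;=> nil      nil vector: get is forgiving
(nth nil 4)        ;=> nil      nil vector: nth is forgiving too
(nth [] 9 :whoops) ;=> :whoops  the "not found" argument
;; (nth a-to-j 20) throws IndexOutOfBoundsException
;; (a-to-j 20)     throws IndexOutOfBoundsException
```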

Because vectors are indexed, they can be efficiently walked in either direction, left-to-right or right-to-left. The seq and rseq functions return sequences that do exactly that:

(seq a-to-j)
;=> (\A \B \C \D \E \F \G \H \I \J)

(rseq a-to-j)
;=> (\J \I \H \G \F \E \D \C \B \A)

Any item in a vector can be “changed” using the assoc function. Clojure does this in essentially constant time using structural sharing between the old and new vectors, as described at the beginning of this chapter:

(assoc a-to-j 4 "no longer E")
;=> [\A \B \C \D "no longer E" \F \G \H \I \J]

The assoc function for vectors only works on indices that already exist in the vector or, as a special case, exactly one step past the end. In this case, the returned vector will be one item larger than the input vector. More frequently, vectors are “grown” using the conj function, as you’ll see in the next section.

2 Several operations on Clojure’s persistent data structures are described in this book as “essentially constant time.” In all cases these are O(log32 n).

There are a few higher-powered functions provided that use assoc internally. For example, the replace function works on both seqs and vectors, but when given a vector, it uses assoc to fix up and return a new vector:

(replace {2 :a, 4 :b} [1 2 3 2 3 4])
;=> [1 :a 3 :a 3 :b]

The functions assoc-in and update-in are for working with nested structures of vectors and/or maps, like this one:3

(def matrix
  [[1 2 3]
   [4 5 6]
   [7 8 9]])

All of assoc-in, get-in, and update-in take a series of indices to pick items from each more deeply nested level. For a vector arranged like the earlier matrix example, this amounts to row and column coordinates:

(get-in matrix [1 2])
;=> 6

(assoc-in matrix [1 2] 'x)
;=> [[1 2 3] [4 5 x] [7 8 9]]

The update-in function works the same way, but instead of taking a value to overwrite an existing value, it takes a function to apply to an existing value. It’ll replace the value at the given coordinates with the return value of the function given:

(update-in matrix [1 2] * 100)
;=> [[1 2 3] [4 5 600] [7 8 9]]

The coordinates refer to the value 6, and the function given here is * taking an argument 100, so the slot becomes the return value of (* 6 100). To see get-in in action on a more interesting example, we’ll create a function neighbors in listing 5.1 that, given a y-x location in an equilateral 2D matrix, returns a sequence of the locations surrounding it.

Listing 5.1 A function for finding the neighbors of a spot on a 2D matrix

(defn neighbors
  ([size yx] (neighbors [[-1 0] [1 0] [0 -1] [0 1]] size yx))
  ([deltas size yx]
   (filter (fn [new-yx]
             (every? #(< -1 % size) new-yx))
           (map #(map + yx %) deltas))))

3 Nested vectors are far from the most efficient way to store or process matrices, but they’re convenient to manipulate in Clojure and so make a good example here. More efficient options include a single vector, arrays, or a library for matrix processing such as Colt or Incanter at http://incanter.org.

The operation of neighbors is fairly straightforward. The deltas local describes how a neighbor can be one spot away, but only along the x or y axis (not diagonally). The function first walks through deltas and builds a sequence of each delta added to the yx point provided. This operation will of course generate some illegal point coordinates, so those are then removed using filter, which checks to ensure that each index lies between -1 and the provided size. You can test this function using get-in as follows:

(map #(get-in matrix %) (neighbors 3 [0 0]))
;=> (4 2)

For each neighbor coordinate returned from neighbors, we use get-in to retrieve the value at that point. Indeed, the position [0 0], corresponding to the value 1, has the neighboring values 4 and 2. We’ll use neighbors again before this book comes to an end, but next we’ll look at growing and shrinking vectors—treating them like stacks.

5.2.3 Vectors as stacks

Classic stacks have at least two operations, push and pop; with respect to Clojure vectors, these operations are called conj and pop, respectively. The conj function adds elements to, and pop removes elements from, the right side of the stack. Because vectors are immutable, pop returns a new vector with the rightmost item dropped—this is different from many mutable stack APIs, which generally return the dropped item. Consequently, peek becomes more important as the primary way to get an item from the top of the stack:

(def my-stack [1 2 3])

(peek my-stack)
;=> 3

(pop my-stack)
;=> [1 2]

(conj my-stack 4)
;=> [1 2 3 4]

(+ (peek my-stack) (peek (pop my-stack)))
;=> 5

Each of these operations completes in essentially constant time. Most of the time, a vector that’s used as a stack is used that way throughout its life. It’s helpful to future readers of your code to keep this in mind and use the stack operations consistently, even when other functions might work. For example, last on a vector returns the same thing as peek, but besides being slower, it leads to unnecessary confusion about how the collection is being used. If the algorithm involved calls for a stack, use conj not assoc for growing the vector, peek not last, and pop not dissoc for shrinking it.

The functions conj, pop, and peek work on any object that implements clojure.lang.IPersistentStack.4 Besides vectors, Clojure lists also implement this interface, but the functions operate on the left side of lists instead of the right side as with vectors. When operating on either via the stack discipline, it’s best to ignore the ordering, because it tends to just add confusion.

4 The conj function also works with all of Clojure’s other persistent collection types, even if they don’t implement clojure.lang.IPersistentStack.
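The left-side behavior of lists under the same stack operations looks like this:

```clojure
(def my-list-stack '(1 2 3))

(peek my-list-stack)   ;=> 1        the top is the left end
(pop my-list-stack)    ;=> (2 3)
(conj my-list-stack 0) ;=> (0 1 2 3)
```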

5.2.4 Using vectors instead of reverse

The ability of vectors to grow efficiently on the right side and then be walked left-to-right produces a noteworthy emergent behavior: idiomatic Clojure code rarely uses the reverse function. This is different from most Lisps and Schemes. When processing a list, it’s pretty common to want to produce a new list in the same order. But if all you have are classic Lisp lists, often the most natural algorithm5 leaves you with a backward list that needs to be reversed. Here’s an example of a function similar to Clojure’s map:

(defn strict-map1 [f coll]
  (loop [coll coll, acc nil]
    (if (empty? coll)
      (reverse acc)
      (recur (next coll) (cons (f (first coll)) acc)))))

(strict-map1 - (range 5))
;=> (0 -1 -2 -3 -4)

This is perfectly good, idiomatic Clojure code, except for that glaring reverse of the final return value. After the entire list has been walked once to produce the desired values, reverse walks it again to get them in the right order. This is both inefficient and nonidiomatic. One way to get rid of the reverse is to use a vector instead of a list as the accumulator:

(defn strict-map2 [f coll]
  (loop [coll coll, acc []]
    (if (empty? coll)
      acc
      (recur (next coll) (conj acc (f (first coll)))))))

(strict-map2 - (range 5))
;=> [0 -1 -2 -3 -4]

A small change, but the code is now a touch cleaner and a bit faster. It does return a vector instead of a list, but this is rarely a problem, because any client code that wants to treat the result as a seq can usually do so automatically.6

The examples we’ve shown so far have all been plain vectors, but we’ll turn now to the special features of some other vector types, starting with subvectors.

5 ...the most natural tail-recursive algorithm, anyway.

6 Another way to get rid of a reverse is to build a lazy sequence instead of a strict collection; this is how Clojure’s own map function is implemented.

5.2.5 Subvectors

Although items can’t be removed efficiently from a vector (except the rightmost item), subvectors provide a fast way to take a slice of an existing vector based on start and end indices. They’re created using the subvec function:

(subvec a-to-j 3 6)
;=> [\D \E \F]

The first index given to subvec is inclusive (starts at index 3), but the second is exclusive (ends before index 6). The new subvector internally hangs onto the entire original a-to-j vector, making each lookup performed on the new vector cause the subvector to do a little offset math and then look it up in the original. This makes creating a subvector fast. You can use subvec on any kind of vector and it’ll work fine. But there’s special logic for taking a subvec of a subvec, in which case the newest subvector keeps a reference to the original vector, not the intermediate subvector. This prevents subvectors-of-subvectors from stacking up needlessly, and keeps both the creation and use of the sub-subvecs fast and efficient.
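A quick sketch of nested slicing (the names v, sv, and ssv are ours): both slices behave as ordinary vectors, and the inner one is backed directly by the original v rather than by sv.

```clojure
(def v   (vec (range 10)))
(def sv  (subvec v 2 8))  ;=> [2 3 4 5 6 7]
(def ssv (subvec sv 1 4)) ;=> [3 4 5]  backed by v, not by sv
```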

5.2.6 Vectors as MapEntries

Clojure’s hash map, just like hash tables or dictionaries in many other languages, has a mechanism to iterate through the entire collection. Clojure’s solution for this iterator is, unsurprisingly, a seq. Each item of this seq needs to include both the key and the value, so they’re wrapped in a MapEntry. When printed, each entry looks like a vector:

(first {:width 10, :height 20, :depth 15})
;=> [:width 10]

But not only does a MapEntry look like a vector, it really is one:

(vector? (first {:width 10, :height 20, :depth 15}))
;=> true

This means you can use all the regular vector functions on it: conj, get, and so on. It even supports destructuring, which can be handy. For example, the following locals dimension and amount will take on the value of each key/value pair in turn:

(doseq [[dimension amount] {:width 10, :height 20, :depth 15}]
  (println (str (name dimension) ":") amount "inches"))
; width: 10 inches
; height: 20 inches
; depth: 15 inches
;=> nil

A MapEntry is its own type and has two functions for retrieving its contents: key and val, which do exactly the same thing as (nth my-map-entry 0) and (nth my-map-entry 1), respectively. These are sometimes useful for the clarity they can bring to your code, but frequently destructuring is used instead, because it’s so darned handy.
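The equivalence between key/val and indexed lookup on an entry is easy to verify (the local e is ours):

```clojure
(def e (first {:width 10, :height 20, :depth 15}))

(key e)   ;=> :width
(val e)   ;=> 10
(nth e 0) ;=> :width
(nth e 1) ;=> 10
```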

So now you know what vectors are, what specific kinds of vectors are included in Clojure, and some of the things that they’re good at doing. To round out your understanding of vectors, we’ll conclude with a brief look at things that vectors are bad at doing.

5.2.7 What vectors aren’t

Vectors are versatile, but there are some commonly desired patterns for which they might seem like a good solution but in fact aren’t. Though we prefer to focus on the positive, we hope a few negative examples will help you escape from using the wrong tool for the job.

VECTORS AREN’T SPARSE

If you have a vector of length n, the only position where you can insert a value is at index n—appending to the far right end. You can’t skip some indices and insert at a higher index number. If you want a collection indexed by nonsequential numbers, consider a hash map or sorted map. Although you can replace values within a vector, you can’t insert or delete items such that indices for the subsequent items would have to be adjusted. Clojure doesn’t currently have a native persistent collection that supports this kind of operation, but a possible future addition, finger trees, may help for these use cases.

VECTORS AREN’T QUEUES

Some people have tried to use vectors as queues. One approach would be to push onto the right end of the vector using conj and then to pop items off the left using rest or next. The problem with this is that rest and next return seqs, not vectors, so subsequent conj operations wouldn’t behave as desired. Using into to convert the seq back into a vector is O(n), which is less than ideal for every pop.

Another approach is to use subvec as a “pop,” leaving off the leftmost item. Because subvec does return a vector, subsequent conj operations will push onto the right end as desired. But as described earlier, subvec maintains a reference to the entire underlying vector, so none of the items being popped this way will ever be garbage collected. Also less than ideal.

So what would be the ideal way to do queue operations on a persistent collection? Why, use a PersistentQueue, of course. See section 5.5 for details.

VECTORS AREN’T SETS

If you want to find out whether a vector contains a particular value, you might be tempted to use the contains? function, but you’d be disappointed by the results. Clojure’s contains? asks whether a particular key, not value, is in a collection, which is rarely useful for a vector.
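The gotcha, and a common idiomatic workaround using a set as a predicate:

```clojure
(contains? [:a :b :c] :b) ;=> false (:b is a value, not a key)
(contains? [:a :b :c] 2)  ;=> true  (2 is a valid index, which is a key)
(some #{:b} [:a :b :c])   ;=> :b    (an idiomatic way to search for a value)
```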

In this section we showed how to create vectors using literal syntax or by building

them up programmatically. We looked at how to push them, pop them, and slice

them. We also looked at some of the things vectors can’t do well. One of these was

adding and removing items from the left side; though vectors can’t do this, lists can,

which we’ll discuss next.


5.3 Lists: Clojure’s code form data structure

Clojure’s PersistentLists are by far the simplest of Clojure’s persistent collection types. A PersistentList is a singly linked list in which each node knows its distance from the end. List elements can only be found by starting with the first element and walking each successive node in order, and elements can be added or removed only at the left end.

In idiomatic Clojure code, lists are used almost exclusively to represent code

forms. They’re used literally in code to call functions, macros, and so forth as we’ll dis-

cuss shortly. Code forms are also built programmatically to then be evaled or used as

the return value for a macro. If the final usage of a collection isn’t as Clojure code,

lists rarely offer any value over vectors and are thus rarely used. But lists have a rich heritage in Lisps, so we’ll discuss when they should be used in Clojure, and also when they

shouldn’t—situations in which there are now better options.

5.3.1 Lists like Lisps like

All flavors of Lisp have lists that they like to use, and Clojure lists, already introduced

in chapter 2, are similar enough to be familiar. The functions have different names,

but what other Lisps call car is the same as first on a Clojure list. Similarly cdr is the

same as next. But there are substantial differences as well. Perhaps the most surprising is the behavior of cons. Both cons and conj add something to the front of a list, but their arguments are in a different order from each other:

(cons 1 '(2 3))

;=> (1 2 3)

(conj '(2 3) 1)

;=> (1 2 3)

In a departure from classic Lisps, the “right” way to add to the front of a list is with

conj. For each concrete type, conj will add elements in the most efficient way, and for

lists this means at the left side. Additionally, a list built using conj is homogeneous—

all the objects on its next chain are guaranteed to be lists, whereas sequences built

with cons only promise that the result will be some kind of seq. So you can use cons to

add to the front of a lazy seq, a range, or any other type of seq, but the only way to get

a bigger list is to use conj.7 Either way, the next part has to be some kind of sequence,

which points out another difference from other Lisps: Clojure has no “dotted pair.” If

you don’t know what that is, don’t worry about it. All you need to know is that if you

want a simple pair in a Clojure program, use a vector of two items.

All seqs print with rounded parentheses, but this does not mean they’re the same

type or will behave the same way. For example many of these seq types don’t know their

own size the way lists do, so calling count on them may be O(n) instead of O(1).8 An

7 Or to conj or cons onto nil. This is a special case, because nil isn’t the same as an empty collection of any specific type. Clojure could have just left this unsupported, perhaps throwing an exception if you did (cons 1 nil), but instead it provides a reasonable default behavior: building a list one item long.

8 You can test for this property of being countable in constant time using the counted? function. For example, (counted? (range 10)) returns true in Clojure 1.0, but false in 1.1 because the implementation of range changed between those versions and no longer provided O(1) counting.


unsurprising difference between lists in Clojure versus other Lisps is that they’re

immutable. At least that had better not be surprising anymore. Changing values within

a list is generally discouraged in other Lisps anyway, but in Clojure it’s impossible.
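The counted? function from footnote 8 can be probed directly at the REPL (results here assume a recent Clojure; as the footnote notes, they’ve shifted across versions):

```clojure
(counted? '(1 2 3))           ;=> true, lists cache their count
(counted? [1 2 3])            ;=> true, vectors do too
(counted? (map inc [1 2 3]))  ;=> false, lazy seqs must walk to count
```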

5.3.2 Lists as stacks

Lists in all Lisps can be used as stacks, but Clojure goes further by supporting the

IPersistentStack interface. This means you can use the functions peek and pop to do

roughly the same thing as first and next. Two details are worth noting. One is that

next and rest are legal on an empty list, but pop throws an exception. The other is that

next on a one-item list returns nil, whereas rest and pop both return an empty list.
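Both details can be verified at the REPL:

```clojure
(next '(1))  ;=> nil
(rest '(1))  ;=> ()
(pop '(1))   ;=> ()

(rest '())   ;=> ()
;; (pop '()) throws an exception rather than returning an empty list
```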

When you want a stack, the choice between using a list versus a vector is a some-

what subtle decision. Their memory organization is quite different, so it may be worth

testing your usage to see which performs better. Also, the order of values returned by

seq on a list is backward compared to seq on a vector, and in rare cases this can point

to one or the other as the best solution. In the end, it may come down primarily to

personal taste.

5.3.3 What lists aren’t

Probably the most common misuse of lists is to hold items that will be looked up by

index. Though you can use nth to get the 42nd (or any other) item from a list, Clo-

jure will have to walk the list from the beginning to find it. Don’t do that. In fact, this

is a practical reason why lists can’t be used as functions, as in ((list :a) 0). Vectors

are good at looking things up by index, so use one of those instead.
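The contrast in a few lines:

```clojure
(nth '(:a :b :c) 2)  ;=> :c, but only after walking from the head
([:a :b :c] 2)       ;=> :c, vectors are functions of their indices
;; ('(:a :b :c) 2)   ; throws an exception: lists aren't functions
```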

Lists are also not sets. All the reasons we gave in the previous section for why it’s a

bad idea to frequently search a vector looking for a particular value apply to lists as

well. Even more so, since contains? will always return false for a list. See the section on

sets later in this chapter instead.

Finally, lists aren’t queues. You can add items to one end of a list, but you can’t

remove things from the other end. So what should you use when you need a queue?

Funny you should ask...

5.4 How to use persistent queues

We mentioned in section 5.2 that new Clojure developers often attempt to implement

simple queues using vectors. Though this is possible, such an implementation leaves

much to be desired. Instead, Clojure provides a persistent immutable queue that will

serve all your queueing needs. In this section we’ll touch on the usage of the

PersistentQueue class, where its first-in-first-out (FIFO) queueing discipline (Knuth

1997) is described by conj adding to the rear, pop removing from the front, and peek

returning the front element without removal.

Before going further, it’s important to point out that Clojure’s PersistentQueue is

a collection, not a workflow mechanism. Java has classes deriving from the

java.util.concurrent.BlockingQueue interface for workflow, which often are use-

ful in Clojure programs, and those aren’t these. If you find yourself wanting to


repeatedly check a work queue to see if there’s an item of work to be popped off, or if

you want to use a queue to send a task to another thread, you do not want the

PersistentQueue discussed in this section.

5.4.1 A queue about nothing

Search all you like, but the current implementation of Clojure doesn’t provide9 a core

construction function for creating persistent queues. That being the case, how would

you go about creating a queue? The answer is that there’s a readily available empty

queue instance to use, clojure.lang.PersistentQueue/EMPTY. The printed represen-

tation for Clojure’s queues isn’t incredibly informative, but you can change that by

providing a method for them on the print-method multimethod, as shown:

(defmethod print-method clojure.lang.PersistentQueue
  [q, w]
  (print-method '<- w) (print-method (seq q) w) (print-method '-< w))

clojure.lang.PersistentQueue/EMPTY

;=> <-nil-<

Using print-method in this way is a convenient mechanism for printing types in logical ways, as we’ve done here with the queue-fish, which is not only fun but also indicates the direction of flow for conj and pop.

You might think that popping an empty queue would raise an exception, but the

fact is that this action results in just another empty queue. Likewise, peeking an empty

queue will return nil. Not breathtaking for sure, but this behavior helps to ensure

that queues work in place of other sequences. In fact, the functions first, rest, and

next also work on queues and give the results that you might expect, though rest and

next return seqs not queues. Therefore, if you’re using a queue as a queue, it’s best to

use the functions designed for this purpose: peek, pop, and conj.

5.4.2 Putting things on

The mechanism for adding elements to a queue is conj:

(def schedule (conj clojure.lang.PersistentQueue/EMPTY
                    :wake-up :shower :brush-teeth))

schedule
;=> <-(:wake-up :shower :brush-teeth)-<

Clojure’s persistent queue is implemented internally using two separate collections,

the front being a seq and the rear being a vector, as shown in figure 5.3.

All insertions occur in the rear vector and all removals occur in the front seq, taking advantage of each collection’s strength. When all the items from the front list have

9 The Clojure core language grows carefully, tending to incorporate only features that have proven useful. Queues currently stand at the edge of this growth, meaning that there might be more support for them in the future. Unlike the other collections in this chapter, the code you write with queues might be rendered nonidiomatic by future improvements.


Figure 5.3 The two collections used internally in a single queue. peek returns the front item of the seq, pop returns a new queue with the front of the seq left off, and conj adds a new item to the back of the vector.

been popped, the back vector is wrapped in a seq to become the new front, and an

empty vector is used as the new back. Typically, an immutable queue such as this is

implemented with the rear as a list in reverse order, because insertion to the front of a

list is an efficient operation. But using a Clojure vector eliminates the need for a

reversed list.
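As an illustrative sketch only (this is not Clojure’s actual implementation, and the names empty-q, q-conj, q-peek, and q-pop are our own), the front/rear mechanics might look like this:

```clojure
(defn- normalize [{:keys [front rear] :as q}]
  (if (seq front)
    q
    {:front (seq rear), :rear []}))   ; front exhausted: rear becomes the new front

(def empty-q {:front (), :rear []})

(defn q-conj [q x] (normalize (update q :rear conj x)))  ; insert into the rear vector
(defn q-peek [q] (first (:front q)))                     ; read the front seq
(defn q-pop  [q] (normalize (update q :front rest)))     ; drop the front item

(-> empty-q (q-conj 1) (q-conj 2) q-pop q-peek)
;=> 2
```

Note how popping the last front item forces a normalize, wrapping the rear vector in a seq exactly as the text describes.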

5.4.3 Getting things

Clojure provides the peek function to get the front element in a queue:

(peek schedule)

;=> :wake-up

The fact that performing peek doesn’t modify the contents of a persistent queue

should be no surprise by now.

5.4.4 Taking things off

To “remove” elements from the front of a queue, use the pop function and not rest:

(pop schedule)

;=> <-(:shower :brush-teeth)-<

(rest schedule)

;=> (:shower :brush-teeth)

Although rest returns something with the same values, and even prints the same as what pop returns, the former is a seq, not a queue. This is potentially the source of subtle bugs, because subsequent attempts to use conj on it won’t preserve the speed guarantees of the queue type, and the queue functions pop, peek, and conj won’t behave as expected.
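The bug is easy to demonstrate (we show seq views here so the output doesn’t depend on the custom print-method from earlier):

```clojure
(def q (into clojure.lang.PersistentQueue/EMPTY [1 2 3]))

(seq (conj (pop q) 4))   ;=> (2 3 4), still a queue: conj added at the rear
(conj (rest q) 4)        ;=> (4 2 3), a seq now: conj added at the front
```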

We’ve talked numerous times in this chapter about the sequence abstraction, and

though it’s an important consideration, it shouldn’t always be used. Instead, it’s

important to know your data structures, their sweet spots, and idiomatic operations.

By doing so, you can write code that’s specialized in ways that leverage the perfor-

mance characteristics you need for a given problem space. Clojure’s persistent queues

illustrate this fact perfectly. To further highlight this point, we’ll now explore Clojure’s

set type.


5.5 Persistent sets

Clojure sets work the same as mathematical sets, in that they’re collections of

unsorted unique elements. In this section we’ll cover sets by explaining their strong

points, weaknesses, and idioms. We’ll also cover some of the functions from the

clojure.set namespace.

5.5.1 Basic properties of Clojure sets

Sets are functions of their elements that return the matched element or nil:

(#{:a :b :c :d} :c)
;=> :c

(#{:a :b :c :d} :e)
;=> nil

Set elements can be accessed via the get function, which will return the queried value if it exists in the given set:

(get #{:a 1 :b 2} :b)
;=> :b

(get #{:a 1 :b 2} :nothing-doing)
;=> nil

As a final point, sets, like all of Clojure’s collections, support heterogeneous values.

Finding items in a sequence using a set and some

This property of sets combines with the some function to provide an extremely useful idiom for searching a seq for any of multiple items. The some function takes a predicate and a sequence. It applies said predicate to each element in turn, returning the first truthy value returned by the predicate, or else nil:

(some #{:b} [:a 1 :b 2])
;=> :b

(some #{1 :b} [:a 1 :b 2])
;=> 1

Using a set as the predicate supplied to some allows you to check whether any of the truthy values in the set are contained within the given sequence. This is a frequently used Clojure idiom for searching for containment within a sequence.

HOW CLOJURE POPULATES SETS

The key to understanding how Clojure sets determine which elements are discrete lies in one simple statement. Given two elements evaluating as equal, a set will contain only one, independent of concrete types:

#{[] ()}
;=> #{[]}

#{[1 2] (1 2)}
;=> #{[1 2]}

#{[] () #{} {}}
;=> #{#{} {} []}

From the first two examples, even though [] and () are of differing types, they’re considered equal because their elements are equal (or, in the first case, because both are empty). But the last

example illustrates nicely that collections within an equality partition will always be

equal if their elements are equal, but never across partitions.
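Returning to sets as predicates for a moment: the set-and-some idiom extends naturally to filter and remove as well. One caveat worth noting: a set containing nil or false can’t be used as a predicate with some, because those members return falsey values.

```clojure
(filter #{:b :c} [:a :b :c :d])  ;=> (:b :c)
(remove #{:b :c} [:a :b :c :d])  ;=> (:a :d)

;; Caveat: falsey members defeat the idiom
(some #{nil} [nil 1])
;=> nil, even though nil IS present in the sequence
```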

5.5.2 Keeping your sets in order with sorted-set

There’s not much to say about creating sorted sets with the sorted-set function. But

there’s a simple rule that you should bear in mind:


(sorted-set :b :c :a)

;=> #{:a :b :c}

(sorted-set [3 4] [1 2])

;=> #{[1 2] [3 4]}

(sorted-set :b 2 :c :a 3 1)

; java.lang.ClassCastException: clojure.lang.Keyword cannot be cast to

java.lang.Number

As long as the arguments to the sorted-set function are mutually comparable, you’ll

receive a sorted set; otherwise an exception is thrown. This can manifest itself when

dealing with sorted sets downstream from their point of creation, leading to potential

confusion:

(def my-set (sorted-set :a :b))

;; ... some time later

(conj my-set "a")

;=> java.lang.ClassCastException: clojure.lang.Keyword cannot be cast to

java.lang.String

The difficulty in finding the reason for this exception will increase as the distance

between the creation of my-set and the call to conj increases. You can adjust this rule

a bit by using sorted-set-by instead, and providing your own comparator. This works

exactly like the comparator for sorted-map-by, which we’ll cover in section 5.6.2.

Sorted maps and sorted sets are also similar in their support of subseq, which allows efficiently jumping to a particular key in the collection and walking through it from there. This is covered in section 5.6.
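A brief preview on sorted sets (subseq and rsubseq take a test such as > or >= and a key):

```clojure
(def ss (sorted-set 1 3 5 7 9))

(subseq ss > 4)    ;=> (5 7 9), start past 4, even though 4 isn't a member
(rsubseq ss < 6)   ;=> (5 3 1), walk backward from just below 6
```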

5.5.3 contains?

As we touched on in section 5.2, there’s sometimes confusion regarding the usage of

Clojure’s contains? function. Many newcomers to Clojure expect this function to

work the same as Java’s java.util.Collection#contains method; this assumption is

false, as shown:

(contains? #{1 2 4 3} 4)

;=> true

(contains? [1 2 4 3] 4)

;=> false

If you were to draw a false analogy between Java’s .contains methods and contains?,

then both of the function calls noted here should’ve returned true. The official

documentation for contains? describes it as a function that returns true if a given key

exists within a collection. When reading the word key, the notion of a map springs to

mind, but the fact that this function also works on sets hints at their implementation

details. Sets are implemented as maps with the same element as the key and value,10

but there’s an additional check for containment before insertion.

10 All implementation caveats apply.


5.5.4 clojure.set

Mathematical sets form the basis of much of modern mathematical thought, and Clo-

jure’s basic set functions in the clojure.set namespace are a clear reflection of the

classical set operations. In this subsection we’ll briefly cover each function and talk

about how, when applicable, they differ from the mathematical model. First, we’ll start

with a simple picture.

Figure 5.4 describes the nature of Clojure’s set functions, each of which will be

shown presently. Note that Clojure’s set functions take an arbitrary number of sets and

apply the operation incrementally.

INTERSECTION

Clojure’s clojure.set/intersection function works as you might expect. Given two

sets, intersection returns a set of the common elements. Given n sets, it’ll incrementally return the intersection of resulting sets and the next set, as seen in the following

code:

(clojure.set/intersection #{:humans :fruit-bats :zombies}
                          #{:chupacabra :zombies :humans})
;=> #{:zombies :humans}

(clojure.set/intersection #{:pez :gum :dots :skor}
                          #{:pez :skor :pocky}
                          #{:pocky :gum :skor})
;=> #{:skor}

In the first example, the resulting set is simply the common elements between the

given sets. The second example is the result of the intersection of the first two sets

then intersected with the final set.

UNION

There’s also likely no surprise when using the clojure.set/union function:

(clojure.set/union #{:humans :fruit-bats :zombies}
                   #{:chupacabra :zombies :humans})
;=> #{:chupacabra :fruit-bats :zombies :humans}

(clojure.set/union #{:pez :gum :dots :skor}
                   #{:pez :skor :pocky}
                   #{:pocky :gum :skor})
;=> #{:pez :pocky :gum :skor :dots}

Figure 5.4 Basic set operations. The three Venn diagrams show a graphical

representation of Clojure’s set functions: intersection, union, and difference.


Given two sets, the resulting set will contain all of the distinct elements from both. In

the first example this means :zombies and :humans only show up once each in the

return value. Note in the second example that more than two sets may be given to

union, but as expected each value given in any of the input sets is included exactly

once in the output set.

DIFFERENCE

The only set function that could potentially cause confusion at first glance is clojure.set/difference, which by name implies some sort of opposition to a union operation. Working under this false assumption, you might expect difference to operate thusly:

(clojure.set/difference #{1 2 3 4} #{3 4 5 6})

;=> #{1 2 5 6}

But if you were to evaluate this expression in your REPL, you’d receive a very different

result:

(clojure.set/difference #{1 2 3 4} #{3 4 5 6})

;=> #{1 2}

The reason for this result is that Clojure’s difference function calculates what’s

known as a relative complement (Stewart 1995) between two sets. In other words,

difference can be viewed as a set subtraction function “removing” all elements in a

set A that are also in another set B.
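If you did want the union-minus-intersection behavior from the first (incorrect) guess, that operation is the symmetric difference. Although clojure.set doesn’t provide it directly, it composes easily from the primitives (symmetric-difference here is our own helper name, not a core function):

```clojure
(require '[clojure.set :as set])

(defn symmetric-difference [a b]
  (set/union (set/difference a b)     ; in a but not b
             (set/difference b a)))   ; in b but not a

(symmetric-difference #{1 2 3 4} #{3 4 5 6})
;=> #{1 2 5 6}
```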

5.6 Thinking in maps

It’s difficult to write a program of any significant size without the need for a map of

some sort. The use of maps is ubiquitous in writing software because frankly it’s diffi-

cult to imagine a more robust data structure. But we as programmers tend to view

maps as a special case structure outside of the normal realm of data objects and

classes. The object-oriented school of thought has relegated the map as a supporting

player in favor of the class. We’re not going to talk about the merits, or lack thereof,

for this relegation here, but in upcoming sections we’ll discuss moving away from

thinking in classes and instead thinking in the sequence abstraction, maps, protocols,

and types. Having said all of that, it need hardly be mentioned that maps should be

used to store named values. In this section we talk about the different types of maps

and the tradeoffs surrounding each.

5.6.1 Hash maps

Arguably, the most ubiquitous11 form of map found in Clojure programs is the hash

map, which provides an unsorted key/value associative structure. In addition to the

literal syntax touched on in chapter 2, hash maps can be created using the hash-map

function, which likewise takes alternating key/value pairs, with or without commas:

11 Although with the pervasiveness of the map literal, the ubiquity may instead fall to the array map.


(hash-map :a 1, :b 2, :c 3, :d 4, :e 5)

;=> {:a 1, :c 3, :b 2, :d 4, :e 5}

Clojure hash maps support heterogeneous keys, meaning that they can be of any type

and each key can be of a differing type, as this code shows:

(let [m {:a 1, 1 :b, [1 2 3] "4 5 6"}]
  [(get m :a) (get m [1 2 3])])

;=> [1 "4 5 6"]

As we previously mentioned at the beginning of this chapter, many of Clojure’s com-

posite types can be used as functions, and in the case of maps they’re functions of

their keys. Using maps in this way will act the same as the use of the get function in

the previous code sample, as shown when building a vector of two elements:

(let [m {:a 1, 1 :b, [1 2 3] "4 5 6"}]
  [(m :a) (m [1 2 3])])

;=> [1 "4 5 6"]

Providing a map to the seq function will return a sequence of map entries:

(seq {:a 1, :b 2})

;=> ([:a 1] [:b 2])

Of course, this sequence appears to be composed of the key/value pairs contained in vectors, and for all practical purposes should be treated as such. In fact, a

new hash map can be created idiomatically using this precise structure:

(into {} [[:a 1] [:b 2]])

;=> {:a 1, :b 2}

Even if your embedded pairs aren’t vectors, they can be made to be for building a new

map:

(into {} (map vec '[(:a 1) (:b 2)]))

;=> {:a 1, :b 2}

In fact, your pairs don’t have to be explicitly grouped, because you can use apply to cre-

ate a hash map given that the key/value pairs are laid out in a sequence consecutively:

(apply hash-map [:a 1 :b 2])

;=> {:a 1, :b 2}

You can also use apply in this way with sorted-map and array-map. Another idiomatic

way to build a map is to use zipmap to “zip” together two sequences, the first of which

contains the desired keys and the second their corresponding values:

(zipmap [:a :b] [1 2])

;=> {:b 2, :a 1}

The use of zipmap illustrates nicely the final property of map collections. Hash maps

in Clojure have no order guarantees. If you do require ordering, then you should use

sorted maps, discussed next.


5.6.2 Keeping your keys in order with sorted maps

It’s impossible to rely on a specific ordering of the key/value pairs for a standard Clo-

jure map, because there are no order guarantees at all. Using the sorted-map and

sorted-map-by functions, you can construct maps with order assurances. By default,

the function sorted-map will build a map sorted by the comparison of its keys:

(sorted-map :thx 1138 :r2d 2)

;=> {:r2d 2, :thx 1138}

You may require an alternative key ordering, or perhaps an ordering for keys that aren’t easily comparable. In these cases you must use sorted-map-by, which takes an additional comparison function:12

(sorted-map "bac" 2 "abc" 9)

;=> {"abc" 9, "bac" 2}

(sorted-map-by #(compare (subs %1 1) (subs %2 1)) "bac" 2 "abc" 9)

;=> {"bac" 2, "abc" 9}

This means that sorted maps don’t generally support heterogeneous keys the same as

hash maps, although it depends on the comparison function provided. For example,

the preceding one assumes all keys are strings. The default sorted-map comparison

function compare supports maps whose keys are all mutually comparable with each

other. Attempts to use keys that aren’t supported by whichever comparison function

you’re using will generally result in a cast exception:

(sorted-map :a 1, "b" 2)

;=> java.lang.ClassCastException: clojure.lang.Keyword cannot be cast to

java.lang.String

One remarkable feature supported by sorted maps (and also sorted sets) is the ability

to jump efficiently to a particular key and walk forward or backward from there

through the collection. This is done with the subseq and rsubseq functions for for-

ward and backward respectively. Even if you don’t know the exact key you want, these

functions can be used to “round up” the next closest key that exists.
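A brief sketch of the “round up” behavior:

```clojure
(def sm (sorted-map :a 1, :c 3, :e 5))

(subseq sm > :b)    ;=> ([:c 3] [:e 5]), :b needn't be present
(subseq sm >= :c)   ;=> ([:c 3] [:e 5])
(rsubseq sm < :e)   ;=> ([:c 3] [:a 1])
```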

Another way that sorted maps and hash maps differ is in their handling of numeric

keys. A number of a given magnitude can be represented by many different types; for

example 42 can be a long, int, float, and so on. Hash maps would treat each of these

different objects as different, whereas a sorted map would treat them as the same. You

can see the contrast in this example, where the hash map keeps both keys while the

sorted map keeps just one:

(assoc {1 :int} 1.0 :float)

;=> {1.0 :float, 1 :int}

(assoc (sorted-map 1 :int) 1.0 :float)

;=> {1 :float}

12 Note that simple boolean functions like > can be used as comparison functions.


This is because the comparison function used by a sorted map determines not only order but also equality: if two keys compare as equal, only one will be kept. This applies to comparison functions provided to sorted-map-by as well as the default comparator shown previously.

Sorted maps will otherwise work just like hash maps and can be used interchange-

ably. You should use sorted maps if you need to specify or guarantee a specific key

ordering. On the other hand, if you need to maintain insertion ordering, then the use

of array maps is required as you’ll see.

5.6.3 Keeping your insertions in order with array maps

If you hope to perform an action under the assumption that a given map is insertion-

ordered, then you’re setting yourself up for disappointment. But you might already

know that Clojure provides a special map that ensures insertion ordering called an

array map:

(seq (hash-map :a 1, :b 2, :c 3))

;=> ([:a 1] [:c 3] [:b 2])

(seq (array-map :a 1, :b 2, :c 3))

;=> ([:a 1] [:b 2] [:c 3])

So when insertion order is important, you should explicitly use an array map. Array

maps can be populated quickly by ignoring the form of the key/value pairs and blindly

copying them into place. For structures below a certain size, the cost of a map lookup is comparable to that of a linear search through an equally sized array or list. That’s not to say that the map will be slower; rather, at these sizes the map and linear implementations perform comparably.

a map at all, and like most things in life there are tradeoffs. Thankfully, Clojure takes

care of these considerations for you by adjusting the concrete implementations behind

the scenes as the size of the map increases. The precise types in play aren’t important,

because Clojure is careful to document its promises and to leave undefined aspects

subject to change and/or improvement. It’s usually a bad idea to build your programs

around concrete types, and always bad to build around undocumented behaviors. Clo-

jure handles the underlying efficiency considerations so you don’t have to. But be aware that if

ordering is important, you should avoid operations that inadvertently change the

underlying map implementation from an array map.
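One way to watch this promotion happen (the exact threshold is an implementation detail and may vary by version, so this sketch deliberately grows the map well past any plausible cutoff):

```clojure
;; Small array maps stay array maps:
(type (array-map :a 1, :b 2))
;=> clojure.lang.PersistentArrayMap

;; Growing past the threshold silently promotes to a hash map,
;; and insertion ordering is lost:
(def grown (into (array-map) (map vector (range 20) (range 20))))

(type grown)
;=> clojure.lang.PersistentHashMap
```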

We’ve covered the basics of Clojure maps in this section, including common usage

and construction techniques. Clojure maps, minus some implementation details,

shouldn’t be surprising to anyone. It’ll take a while to grow accustomed to dealing

with immutable maps, but in time even this nuance will become second nature.

Now that we’ve looked at Clojure’s primary collection types and their differences

in detail, we’ll take some time to work through a simple case study. This case study,

creating a function named pos, will illustrate the thought processes you might con-

sider on your way toward designing an API built on the principles of the sequence

abstraction.


5.7 Putting it all together: finding the position of items in a sequence

We sometimes underestimate the influence of little things.

—Charles W. Chesnutt

The case study for this chapter will be to design and implement a simple function to

locate the positional index13 of an element within a sequence. We’re going to pool

together much of the knowledge that you’ve gained in this chapter in order to illustrate the steps you might take in designing, writing, and ultimately optimizing a Clojure collection function. Of course, we’re going to work against the sequence

abstraction and will therefore design the solution accordingly.

The function, named pos, must

 Work on any composite type returning indices corresponding to some value

 Return a numerical index for sequential collections or associated key for maps

and sets

 Otherwise return nil

5.7.1 Implementation

If we were to address each of the requirements for pos literally and directly, we might

come up with a function that looks like the following listing.

Listing 5.2 First cut at our position function

(defn pos [e coll]
  (let [cmp (if (map? coll)
              #(= (second %1) %2)   ; map compare
              #(= %1 %2))]          ; default compare
    (loop [s coll, idx 0]           ; start at beginning
      (when (seq s)
        (if (cmp (first s) e)       ; compare
          (if (map? coll)
            (first (first s))       ; map returns key
            idx)                    ; ...else index
          (recur (next s) (inc idx)))))))

(pos 3 [:a 1 :b 2 :c 3 :d 4])

;=> 5

(pos :foo [:a 1 :b 2 :c 3 :d 4])

;=> nil

(pos 3 {:a 1 :b 2 :c 3 :d 4})

;=> :c

(pos 3 '(:a 1 :b 2 :c 3 :d 4))

;=> 5

(pos \3 ":a 1 :b 2 :c 3 :d 4")

;=> 13

13 Stuart Halloway describes a similar function index-of-any in his book Programming Clojure that views the

problem largely through the lens of reduced complexity. We like his example and this one because it’s simple

yet powerful and nicely illustrative of the way that Clojure functions should be written.


Pretty hideous, right? We think so too. Apart from being overly complicated, it’d likely

be more useful if we instead returned a sequence of all the indices matching the item,

so we’ll add that to the requirements. But we’ve built a heavy load with the first cut at

pos and should probably step back a moment to reflect. First of all, it’s probably the

wrong approach to handle map types and other sequence types differently. The use of

the predicate map? to detect the type of the passed collection is incredibly constraining, in that it forces different collections to be processed differently. That’s not to say

that the use of type-based predicates is strictly prohibited, only that you should try to

favor more generic algorithms or at least to minimize their usage.

As chance has it, the exact nature of the problem demands that we view collections

as a set of values paired with a given index, be it explicit in the case of maps or implicit

in the case of other sequences’ positional information. Therefore, imagine how easy

this problem would be if all collections were laid out as a sequence of pairs ([index1

value1] [index2 value2] ... [indexn valuen]). Well, there’s no reason why they

couldn’t, as shown next.

Listing 5.3 An index function

(defn index [coll]
  (cond
    (map? coll) (seq coll)
    (set? coll) (map vector coll coll)
    :else (map vector (iterate inc 0) coll)))

This simple function[14] can generate a uniform representation for indexed collections:

(index [:a 1 :b 2 :c 3 :d 4])
;=> ([0 :a] [1 1] [2 :b] [3 2] [4 :c] [5 3] [6 :d] [7 4])

(index {:a 1 :b 2 :c 3 :d 4})
;=> ([:a 1] [:b 2] [:c 3] [:d 4])

(index #{:a 1 :b 2 :c 3 :d 4})
;=> ([1 1] [2 2] [3 3] [4 4] [:a :a] [:c :c] [:b :b] [:d :d])

As shown, we’re still using type-based predicates, but we’ve raised the level of abstrac-

tion to the equality partitions in order to build contextually relevant indices. Now, the

function for finding the positional indices for the desired value is trivial:

(defn pos [e coll]
  (for [[i v] (index coll) :when (= e v)] i))

(pos 3 [:a 1 :b 2 :c 3 :d 4])
;=> (5)

(pos 3 {:a 1, :b 2, :c 3, :d 4})
;=> (:c)

(pos 3 [:a 3 :b 3 :c 3 :d 4])
;=> (1 3 5)

(pos 3 {:a 3, :b 3, :c 3, :d 4})
;=> (:a :c :b)

[14] Clojure has a core function keep-indexed that works similarly but doesn't implicitly build indices along equality partitions. For a vector, you could build the index as (keep-indexed #(-> [% %2]) [:a :b :c :d]).


Much better! But there's one more deficiency with the pos function from a Clojure perspective. Typically in Clojure it's more useful to pass a predicate function in cases such as these, so that instead of pos determining raw equality, it can build its result along any dimension, as shown:

(pos #{3 4} {:a 1 :b 2 :c 3 :d 4})
;=> (:c :d)

(pos even? [2 3 6 7])
;=> (0 2)

We can modify pos only slightly to achieve the ideal level of flexibility, as shown next.

Listing 5.4 Our final version of pos

(defn pos [pred coll]
  (for [[i v] (index coll) :when (pred v)] i))
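As a quick sanity check, here's a self-contained sketch (repeating index and pos from listings 5.3 and 5.4) showing that the predicate version subsumes the original equality-based behavior: wrapping the target value in a set turns it into a membership predicate.

```clojure
;; index and pos exactly as defined in listings 5.3 and 5.4
(defn index [coll]
  (cond
    (map? coll) (seq coll)
    (set? coll) (map vector coll coll)
    :else (map vector (iterate inc 0) coll)))

(defn pos [pred coll]
  (for [[i v] (index coll) :when (pred v)] i))

;; A set acts as a membership predicate, recovering the old
;; "find this value" behavior:
(pos #{3} {:a 1, :b 2, :c 3, :d 4})
;=> (:c)

;; And an arbitrary predicate works across collection kinds:
(pos even? [2 3 6 7])
;=> (0 2)
```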

We’ve vastly simplified the original solution and generated two potentially useful func-

tions (Martin 2002) in the process. By following some simple Clojure principles, we

were able to solve the original problem statement in a concise and elegant manner.

5.8 Summary

Clojure favors simplicity in the face of growing software complexity. If problems are easily solved by collection abstractions, then those abstractions should be used. Most problems can be modeled on such simple types, yet we continue to build monolithic class hierarchies in a fruitless race toward mirroring the "real world"—whatever that means. Perhaps it's time to realize that we no longer need to layer self-imposed complexities on top of software solutions that are already inherently complex. Not only does Clojure provide the sequential, set, and map types useful for pulling ourselves from the doldrums of software complexity, but it's also optimized for dealing with them.

Now that we've discussed each of these types in detail, we're going to take a step back and talk about three important properties of Clojure's collection types that until now we've only touched on lightly: immutability, persistence, and laziness.


Part 3

Functional programming

In this part of the book, we'll expose some of the underpinnings of Clojure's approach to functional programming, as well as some practical uses of it. Clojure provides mechanisms for immutability, deferred execution, closures, and recursion. We'll show examples of how these can work together to let you create data structures of your own, and find routes through a weighted graph.


6 Being lazy and set in your ways

This chapter covers
- Immutability
- Designing a persistent toy
- Laziness
- Putting it all together: a lazy quicksort

We’ve now reached the apex of imperative knowledge and stand at the precipice

leading toward functional programming. We mentioned in section 2.3 that the def-

initions of functional programming are widely disparate, and unfortunately this

book won’t work to unify them. Instead, we’ll start in this chapter to build a basis

for Clojure’s style of functional programming by digging into its core supporting

maxims. In addition, this chapter covers in greater depth the parts of Clojure’s

composite types that we only touched on.

6.1 On immutability

We’ve touched on immutability throughout this book, but we’ve avoided discussing

why Clojure has chosen it as a cornerstone principle. Though no panacea, fostering

107

Download from Wow! eBook <www.wowebook.com>

108

CHAPTER 6 Being lazy and set in your ways

immutability at the language level solves many difficult problems right out of the box,

while simplifying many others. Coming from a language background where mutability

interwoven with imperative programming methods reign, it often requires a signifi-

cant conceptual leap to twist your mind to accept and utilize immutability and func-

tional programming. In this section, we’ll build a conceptual basis for immutability as

it relates to Clojure’s underlying philosophy as well as why you should work to foster

immutability even when outside the warming confines of Clojure proper.

6.1.1 Defining immutability

In many cases, when talking specifically about Clojure's immutable data structures, we could be talking about the broader category of immutable objects without loss of meaning. But we should probably set down some conditions defining just what's meant by immutability.

EVERY DAY IS LIKE SUNDAY

An entire branch of philosophy named predestination is devoted to exploring the notion that there's no such thing as free will, but instead, everything that we are or ever will be is determined beforehand. Though this possibility for our own lives may seem bleak, the notion does nicely encapsulate the first principle of immutability: all of the possible properties of immutable objects are defined at the time of their construction and can't be changed thereafter.

IMMUTABILITY THROUGH CONVENTION

Computer systems are in many ways open systems, providing the keys to the vault if you're so inclined to grab them. But in order to foster an air of immutability in your own systems, it's important to create a facade of immutability. Creating immutable classes in Java requires a few steps (Goetz 2006). First, a class itself and all of its fields should be labeled as final. Next, in no way should an object's this reference escape during construction. And finally, any internal mutable objects should originate, either whole-cloth or through a copy, within the class itself and thus never escape. Obviously we're simplifying, because there are finer details to this recipe for Java immutability, but for now these simplified highlights serve to show that by observing convention, even an inherently mutable language such as Java can be made to be immutable. Clojure directly supports immutability as a language feature[1] with its core data structures. By providing immutable data structures as a primary language feature, Clojure separates (Braithwaite 2007) the complexity of working with immutable structures from the complexities of their implementation. By providing immutability either as a core language feature or through convention, you can reap enormous benefit.

1 We’re intentionally glossing over Clojure’s features that support mutability such as reference types and tran-

sients in order to keep this section focused.


6.1.2 Being set in your ways—immutability

Clojure’s immutable data structures aren’t bolted onto the language as an after-

thought or as a choice in an a-la-carte menu. Instead, their inclusion in the language

runs deep to its philosophical core.

INVARIANTS

Invariant-based programming involves the definition of constraints on classes and functions in order to provide assurances that if instances enter into certain states, assertion errors will arise. Providing invariants within a mutable system requires a fair amount of assertion weaving within the methods of any given class. But by observing a practice of immutability, invariants are defined solely within the construction mechanism and can never be violated thereafter.
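Clojure's pre-conditions offer a lightweight way to express such construction-time invariants. The make-interval constructor below is our own hypothetical example, not from the chapter: the :pre assertion runs once at construction, and because the resulting map is immutable, the invariant can never be violated afterward.

```clojure
;; Hypothetical constructor with a construction-time invariant:
;; an interval's low end must never exceed its high end.
(defn make-interval [lo hi]
  {:pre [(<= lo hi)]}
  {:lo lo, :hi hi})

(make-interval 1 5)
;=> {:lo 1, :hi 5}

(make-interval 5 1)
;=> throws AssertionError: Assert failed: (<= lo hi)
```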

REASONING

Because the life of an immutable object is one of predestiny, the matter of reasoning about its possible states is simplified. It follows that the act of testing such a system is simplified, in that the set of possible states and transitions is constrained.

EQUALITY HAS MEANING

Equality in the presence of mutability has no meaning. Equality in the face of mutability and concurrency is utter lunacy. If any two objects resolve as being equal now, then there's no guarantee that they will be a moment from now. And if two objects aren't equal forever, then they're technically never equal (Baker 1993). Providing immutable objects once again assigns meaning to equality, in that if two objects are equal now, then they'll always be so.

SHARING IS CHEAP

If you're certain that an object will never change, then sharing said object becomes a simple matter of providing a reference to it. In Java, to do so often requires a lot of defensive copying. Along this vein, because we can freely share references for immutable objects, we can likewise intern them for free.
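Interning is easy to observe at the REPL (a small sketch of our own): every occurrence of a given keyword is literally the same object, so "copying" one is just copying a reference.

```clojure
;; Keywords are interned: every :foo in a program is the same object.
(identical? :foo :foo)
;=> true

;; Since keywords never change, a map can hand out its keys freely
;; with no defensive copying.
(identical? (first (keys {:foo 1})) :foo)
;=> true
```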

FLATTENING THE LEVELS OF INDIRECTION

There's a marked difference between a mutable object and a mutable reference. The default in Java is that references might point to mutable data. But in Clojure, only the references are mutable; the data they point to never is. This may seem like a minor detail, but it certainly works to reduce unnecessary complexities.

IMMUTABILITY FOSTERS CONCURRENT PROGRAMMING

Immutable objects are always thread safe.
—Brian Goetz, Java Concurrency in Practice

If an object can’t change, it can be shared freely between different threads of execu-

tion without fear of concurrent modification errors. There can be little debate about

this particular point, but that fact doesn’t answer the question of how mutation

occurs. Without delving into the specifics, you likely already know that Clojure

Download from Wow! eBook <www.wowebook.com>

110

CHAPTER 6 Being lazy and set in your ways

isolates mutation to its reference types while the data wrapped with them is left

unchanged. We’ll leave this alone for now, becuase we’ll devote chapter 11 to this

and related topics.
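To make the thread-safety point concrete, here's a sketch of our own (not from the book): several threads read the same vector simultaneously with no locks and no defensive copies, because nothing can modify it out from under them.

```clojure
;; One immutable vector shared by four threads; no locks needed.
(def data (vec (range 1000)))

(def sums
  (doall
   (for [_ (range 4)]
     (future (reduce + data)))))

;; Every thread sees the same, unchanging data.
(map deref sums)
;=> (499500 499500 499500 499500)
```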

6.2 Designing a persistent toy

We won’t go into terrible detail about the internals of Clojure’s persistent data struc-

tures—we’ll leave that to others (Krukow 2009). But we do want to explore the notion

of structural sharing. Our example will be highly simplified compared to Clojure’s

implementations, but it should help clarify some of the techniques used.

The simplest shared-structure type is the list. Two different items can be added to

the front of the same list, producing two new lists that share their next parts. We’ll try

this out by creating a base list and then two new lists from that same base:

(def baselist (list :barnabas :adam))
(def lst1 (cons :willie baselist))
(def lst2 (cons :phoenix baselist))

lst1
;=> (:willie :barnabas :adam)

lst2
;=> (:phoenix :barnabas :adam)

You can think of baselist as a historical version of both lst1 and lst2. But it's also the shared part of both lists. More than being equal, the next parts of both lists are identical—the same instance:

(= (next lst1) (next lst2))
;=> true

(identical? (next lst1) (next lst2))
;=> true

So that’s not too complicated, right? But the features supported by lists are also lim-

ited. Clojure’s vectors and maps also provide structural sharing, while allowing you to

change values anywhere in the collection, not just on one end. The key is the struc-

ture each of these datatypes uses internally. We’ll now build a simple tree to help dem-

onstrate how a tree can allow interior changes and maintain shared structure at the

same time.

Each node of our tree will have three fields: a value, a left branch, and a right

branch. We’ll put them in a map, like this:

{:val 5, :L nil, :R nil}

That’s the simplest possible tree—a single node holding the value 5, with empty left

and right branches. This is exactly the kind of tree we want to return when a single

item is added to an empty tree. To represent an empty tree, we’ll use nil. With the

structure decision made, we can write our own conj function xconj to build up our

tree, starting with just the code for this initial case:


(defn xconj [t v]
  (cond
    (nil? t) {:val v, :L nil, :R nil}))

(xconj nil 5)
;=> {:val 5, :L nil, :R nil}

Hey, it works! Not too impressive yet though, so we need to handle the case where an item is being added to a nonempty tree. We keep our tree in order by putting values less than a node's :val in the left branch, and other values in the right branch. That means we need a test like this:

(< v (:val t))

When that’s true, we need the new value v to go into the left branch, (:L t). If this

were a mutable tree, we’d change the value of :L to be the new node. Instead, we

should build a new node, copying in the parts of the old node that don’t need to

change. Something like this:

{:val (:val t),

:L (insert-new-val-here),

:R (:R t)}

This will be the new root node. Now we just need to figure out what to put for insert-new-val-here. If the old value of :L is nil, we simply need a new single-node tree—we even have code for that already, so we could use (xconj nil v). But what if :L isn't nil? In that case, we want to insert v in its proper place within whatever tree :L is pointing to—so (:L t) instead of nil:

(defn xconj [t v]
  (cond
    (nil? t) {:val v, :L nil, :R nil}
    (< v (:val t)) {:val (:val t),
                    :L (xconj (:L t) v),
                    :R (:R t)}))

(def tree1 (xconj nil 5))
tree1
;=> {:val 5, :L nil, :R nil}

(def tree1 (xconj tree1 3))
tree1
;=> {:val 5, :L {:val 3, :L nil, :R nil}, :R nil}

(def tree1 (xconj tree1 2))
tree1
;=> {:val 5, :L {:val 3, :L {:val 2, :L nil, :R nil}, :R nil}, :R nil}

There, it’s working. At least it seems to be—there’s a lot of noise in that output, mak-

ing it difficult to read. Here’s a function to traverse the tree in sorted order, convert-

ing it to a seq that will print more succinctly:

(defn xseq [t]

(when t

(concat (xseq (:L t)) [(:val t)] (xseq (:R t)))))


(xseq tree1)
;=> (2 3 5)

Now we just need a final condition for handling the insertion of values that are not less than the node value:

(defn xconj [t v]
  (cond
    (nil? t) {:val v, :L nil, :R nil}
    (< v (:val t)) {:val (:val t),
                    :L (xconj (:L t) v),
                    :R (:R t)}
    :else {:val (:val t),
           :L (:L t),
           :R (xconj (:R t) v)}))

Now that we have the thing built, we hope you understand well enough how it's put together that this demonstration of the shared structure will be unsurprising:

(def tree2 (xconj tree1 7))

(xseq tree2)
;=> (2 3 5 7)

(identical? (:L tree1) (:L tree2))
;=> true

Both tree1 and tree2 share a common structure, which is more easily visualized in figure 6.1. This example demonstrates several features that it has in common with all of Clojure's persistent collections:

- Every "change" creates at least a new root node, plus new nodes as needed in the path through the tree to where the new value is being inserted.
- Values and unchanged branches are never copied, but references to them are copied from nodes in the old tree to nodes in the new one.
- This implementation is completely thread-safe in a way that's easy to check—no object that existed before a call to xconj is changed in any way, and newly created nodes are in their final state before being returned. There's no way for any other thread, or even any other functions in the same thread, to see anything in an inconsistent state.

[Figure 6.1 Shared structure tree: no matter how big the left side of a tree's root node is, something can be inserted on the right side without copying, changing, or even examining the left side. All those values will be included in the new tree, along with the inserted value.]
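To drive the thread-safety point home, here's a small experiment of our own (repeating xconj and xseq from above so it's self-contained): several threads can extend the same tree concurrently, each getting its own new root, while the shared original never changes.

```clojure
;; xconj and xseq as defined earlier in this section
(defn xconj [t v]
  (cond
    (nil? t) {:val v, :L nil, :R nil}
    (< v (:val t)) {:val (:val t), :L (xconj (:L t) v), :R (:R t)}
    :else {:val (:val t), :L (:L t), :R (xconj (:R t) v)}))

(defn xseq [t]
  (when t
    (concat (xseq (:L t)) [(:val t)] (xseq (:R t)))))

;; Four threads each "add" to the same tree; none can disturb it.
(def shared-tree (reduce xconj nil [5 3 2]))

(def results
  (doall
   (for [v [1 4 6 7]]
     (future (xseq (xconj shared-tree v))))))

(map deref results)
;=> ((1 2 3 5) (2 3 4 5) (2 3 5 6) (2 3 5 7))

(xseq shared-tree)
;=> (2 3 5)
```

Each future sees a consistent tree because every node it touches is in its final state before being shared.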


Our example fails, though, when compared to Clojure's production-quality code:

- It's just a binary tree.[2]
- It can only store numbers.
- It'll overflow the stack if the tree gets too deep.
- It produces (via xseq) a non-lazy seq that will contain a whole copy of the tree.
- It can create unbalanced trees that'll have bad "worst case" algorithmic complexity.[3]

Though structural sharing as described using xconj as a basis example can reduce the memory footprint of persistent data structures, it alone is insufficient. Instead, Clojure leans heavily on the notion of lazy sequences to further reduce its memory footprint, as we'll explore further in the next section.

6.3 Laziness

Through all the windows I only see infinity.
—House of Leaves by Mark Z. Danielewski

Clojure is partially a lazy language. This isn't to say that Clojure vectors lie around the house every day after school playing video games and refusing to get a job. Instead, Clojure is lazy in the way it handles its sequence types—but what does that mean? First, we'll start by defining what it means for a language to be eager, or in other words, not lazy. Many programming languages are eager in that arguments to functions are immediately evaluated when passed, and Clojure in most cases follows this pattern as well. Observe the following:

(- 13 (+ 2 2))
;=> 9

The expression (+ 2 2) is eagerly evaluated, in that its result 4 is passed on to the subtraction function during the actual call, and not at the point of need. But a lazy programming language such as Haskell (Hudak 2000) will evaluate a function argument only if that argument is needed in an overarching computation.

In this section we'll discuss how laziness can be used to avoid nontermination, unnecessary calculations, and even combinatorially exploding computations. We'll also discuss the matter of utilizing infinite sequences, a surprisingly powerful technique. Finally, we'll use Clojure's delay and force to build a simple lazy structure. First, we'll start with a simple example of laziness that you may be familiar with from Java.

6.3.1 Familiar laziness with logical-and

Laziness isn’t limited to the case of the evaluation of function arguments; a common

example can be found even in eager programming languages. Take the case of Java’s

2

Clojure’s sorted collections are binary trees, but its hash maps, hash sets, and vectors all have up to 32

branches per node. This results in dramatically shallower trees, and therefore faster lookups and updates.

3

Clojure’s sorted map and sorted set do use a binary tree internally, but they implement red-black trees to keep

the left and right sides nicely balanced.

Download from Wow! eBook <www.wowebook.com>

114

CHAPTER 6 Being lazy and set in your ways

logical-and operator &&. Java implementations optimize this particular operator to

avoid performing unnecessary operations should an early subexpression evaluate to

false. This lazy evaluation in Java allows the following idiom:

if (obj != null && obj.isWhatiz()) {

...

}

For those of you unfamiliar with Java, the preceding code says: “if the object obj isn’t

null, then call the method isWhatiz.” Without a short-circuiting (or lazy, if you will)

&& operator, the preceding operation would always throw a java.lang.NullPointer-

Exception whenever obj was set to null. Though this simple example doesn’t qualify

Java as a lazy language, it does illustrate the first advantage of lazy evaluation— laziness

allows the avoidance of errors in the evaluation of compound structures.

Clojure’s and operator also works this way, as do a number of other operators, but

we won’t discuss this type of short-circuiting laziness too deeply. Listing 6.1 illustrates

what we mean using the case of a series of nested if expressions.

Listing 6.1 Short-circuiting if expression

(defn if-chain [x y z]
  (if x
    (if y
      (if z
        (do
          (println "Made it!")
          :all-truthy)))))

(if-chain () 42 true)
; Made it!
;=> :all-truthy

(if-chain true true false)
;=> nil

The call to println is evaluated only in the case of three truthy arguments. But we can perform the equivalent action given only the and macro:

(defn and-chain [x y z]
  (and x y z (do (println "Made it!") :all-truthy)))

(and-chain () 42 true)
; Made it!
;=> :all-truthy

(and-chain true false true)
;=> false

You may see tricks like this from time to time, but they're not widespread in idiomatic Clojure code. Regardless, we've presented them as a launching point for the rest of the discussion in the section. We'll now proceed to discussing how your own Clojure programs can be made more generally lazy by following an important recipe.


6.3.2 Understanding the lazy-seq recipe

Here’s a seemingly simple function steps that takes a sequence and makes a deeply

nested structure from it:

(steps [1 2 3 4])

;=> [1 [2 [3 [4 []]]]]

Seems simple enough, no? Your first instinct might be to tackle this problem recursively, as suggested by the form of the desired result:

(defn rec-step [[x & xs]]
  (if x
    [x (rec-step xs)]
    []))

(rec-step [1 2 3 4])
;=> [1 [2 [3 [4 []]]]]

Things look beautiful at this point; we've created a simple solution to a simple problem. But therein bugbears lurk. What would happen if we ran this same function on a large set?

(rec-step (range 200000))
;=> java.lang.StackOverflowError

Observing the example, running the same function over a sequence of 200,000 elements[4] causes a stack overflow. How can we fix this problem? Perhaps it's fine to say that you'll never encounter such a large input set in your own programs; such tradeoffs are made all of the time. But Clojure provides lazy sequences to help tackle such problems without significantly complicating your source code. Additionally, idiomatic Clojure code will always strive to deal with, and produce, lazy sequences.

Stepping back a bit, we should examine the lazy-seq recipe for applying laziness to your own functions:

1. Use the lazy-seq macro at the outermost level of your lazy sequence producing expression(s).
2. If you happen to be consuming another sequence during your operations, then use rest instead of next.
3. Prefer higher-order functions when processing sequences.
4. Don't hold onto your head.

These rules of thumb are simple, but they take some practice to utilize to their fullest. For example, #4 is especially subtle in that the trivial case is easy to conceptualize, but it's more complex to implement in large cases. For now we'll gloss over #3, because we'll talk about that approach separately in section 7.1.

So how can you leverage these rules of thumb to ensure laziness?

[4] On our machines, 200,000 elements is enough to cause a stack overflow, but your machine may require more or fewer depending on your JVM configuration.


rest versus next

The difference between rest and next can be seen in the following example:

(def very-lazy (-> (iterate #(do (print \.) (inc %)) 1)
                   rest rest rest))
;=> ..#'user/very-lazy

(def less-lazy (-> (iterate #(do (print \.) (inc %)) 1)
                   next next next))
;=> ...#'user/less-lazy

When building a lazy seq from another, rest doesn't realize any more elements than it needs to; next does. In order to determine whether a seq is empty, next needs to check whether there's at least one thing in it, thus potentially causing one extra realization. Here's an example:

(println (first very-lazy)) ; .4
(println (first less-lazy)) ; 4

Grabbing the first element in a lazy seq built with rest causes a realization as expected. But the same doesn't happen for a seq built with next, because it's already been previously realized. Using next causes a lazy seq to be one element less lazy, which might not be desired if the cost of realization is expensive. In general, we recommend that you use next unless you're specifically trying to write code to be as lazy as possible.

UTILIZING LAZY-SEQ AND REST

In order to be a proper lazy citizen, you should produce lazy sequences using the lazy-seq macro:

(defn lz-rec-step [s]
  (lazy-seq
    (if (seq s)
      [(first s) (lz-rec-step (rest s))]
      [])))

(lz-rec-step [1 2 3 4])
;=> (1 (2 (3 (4 ()))))

(class (lz-rec-step [1 2 3 4]))
;=> clojure.lang.LazySeq

(dorun (lz-rec-step (range 200000)))
;=> nil

There are a few points of note for our new implementation. First, we've eliminated destructuring on the function arguments, because the & arguments within are implicitly destructured via the nthnext function. As we mentioned in our rules of thumb, when consuming a sequence within the body of a lazy-seq you'll want to use rest, which we did in lz-rec-step. Second, we're no longer producing nested vectors as the output of the function, but instead a lazy sequence LazySeq, which is the by-product of the lazy-seq macro.


With only minor adjustments, we've created a lazy version of the step function while also maintaining simplicity. The first two rules of the lazy sequence recipe can be used in all cases when producing lazy sequences. You'll see this pattern over and over in idiomatic Clojure code.

If what's going on here still doesn't quite make sense to you, consider this even simpler example:

(defn simple-range [i limit]
  (lazy-seq
    (when (< i limit)
      (cons i (simple-range (inc i) limit)))))

This behaves similarly to Clojure's built-in function range, but it's simpler in that it doesn't accept a step argument and has no support for producing chunked seqs:[5]

(simple-range 0 9)
;=> (0 1 2 3 4 5 6 7 8)

Note that it follows all the lazy-seq recipe rules you've seen so far. Figure 6.2 is a representation of what's in memory when the REPL has printed the first two items in a simple-range seq but hasn't yet printed any more than that.

[Figure 6.2 Each step of a lazy seq may be in one of two states. If the step is unrealized, it'll contain a function or closure of no arguments (a thunk) that can be called later to realize the step. When this happens, the thunk's return value is cached instead, and the thunk itself is released, transitioning the step to the realized state. Note that although not shown here, a realized lazy seq may simply contain nothing at all (nil), indicating the end of the seq.]

One way in which complications may arise is by accidentally holding onto the head of a lazy sequence. This is addressed by the fourth rule of lazy sequences.

6.3.3 Losing your head

The primary advantage of laziness in Clojure is that it prevents the full realization of interim results during a calculation. If you manage to hold onto the head of a sequence somewhere within a function, then that sequence will be prevented from being garbage collected. The simplest way to retain the head of a sequence is to bind it to a local. This condition can occur with any type of value bind, be it to a reference type or through the usage of let or binding:

(let [r (range 1e9)] [(first r) (last r)])
;=> [0 999999999]

(let [r (range 1e9)] [(last r) (first r)])
; java.lang.OutOfMemoryError: GC overhead limit exceeded

Clojure’s compiler can deduce that in the first example, the retention of r is no longer

needed when the computation of (last r) occurs, and therefore aggressively clears it.

5 Chunked seqs are a technique for improving performance that we cover in chapter 12.

Download from Wow! eBook <www.wowebook.com>

118

CHAPTER 6 Being lazy and set in your ways

But in the second example, the head is needed later in the overall computation and

can no longer be safely cleared. Of course, the compiler could perform some rearrang-

ing with the order of operations for this case, but it doesn’t because in order to do so

safely it would have to guarantee that all of the composite functions were pure. It’s

okay if you’re not clear on what a pure function is right now—we’ll cover it in section

7.1. In a nutshell, take to heart that Clojure can’t rearrange operations, because there’s

no way to guarantee that order is unimportant. This is one area where a purely func-

tional lazy language such as Haskell (Thompson 1999) really shines by comparison.
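One practical workaround, sketched below with a much smaller range than the 1e9 above so it runs quickly: if you never bind the sequence to a local, there's no head to retain. Generating the range twice costs CPU, but each pass can be garbage-collected as it's consumed.

```clojure
;; No local binding means no retained head; each (range ...) call
;; produces a fresh seq that is collectable as it's traversed.
(defn first-and-last []
  [(first (range 1e6)) (last (range 1e6))])

(first-and-last)
;=> [0 999999]
```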

6.3.4 Employing infinite sequences

Because Clojure’s sequences are lazy, they have the potential to be infinitely long. Clo-

jure provides a number of functions for generating and working with infinite

sequences:

; Run at your own risk

(iterate (fn [n] (/ n 2)) 1)

;=> (1 1/2 1/4 1/8 ...)

It sure is a nice trick (although you might not think so had you chosen to ignore our warning), but what could you possibly use infinite sequences for? Working with infinite sequences often fosters more declarative solutions. Take a simple example as a start. Imagine that we have a function that calculates a triangle number for a given integer:

(defn triangle [n]
  (/ (* n (+ n 1)) 2))

(triangle 10)
;=> 55

The function triangle can then be used to build a sequence of the first 10 triangle numbers:

(map triangle (range 1 11))
;=> (1 3 6 10 15 21 28 36 45 55)

There’s nothing wrong with the preceding solution, but it suffers from a lack of flexi-

bility in that it does what it does and that’s all. By defining a sequence of all of the tri-

angle numbers as in the following listing, you can perform more interesting “queries”

in order to retrieve the desired elements.

Listing 6.2 Infinite sequences foster declarative solutions

(def tri-nums (map triangle (iterate inc 1)))

(take 10 tri-nums)                                ; Get first 10
;=> (1 3 6 10 15 21 28 36 45 55)

(take 10 (filter even? tri-nums))                 ; Get first 10 even
;=> (6 10 28 36 66 78 120 136 190 210)

(nth tri-nums 99)                                 ; What Gauss found
;=> 5050

(double (reduce + (take 1000 (map / tri-nums))))  ; Converge on 2
;=> 1.998001998001998

(take 2 (drop-while #(< % 10000) tri-nums))       ; First 2 greater than 10,000
;=> (10011 10153)

The queries used three ubiquitous Clojure functions: map, reduce, and filter. The map function applies a function to each element in a sequence and returns the resulting sequence. The reduce function applies a function to each value in the sequence and the running result to accumulate a final value. Finally, the filter function applies a function to each element in a sequence and returns a new sequence of those elements for which said function returned a truthy value. All three of these functions retain the laziness of a given sequence.

Defining the infinite sequence of triangle numbers allows you to take elements from it as needed, only calculating those particular items.

6.3.5 The delay and force macros

Although Clojure sequences are largely lazy, Clojure itself isn't. In most cases, expressions in Clojure are evaluated once prior to their being passed into a function, rather than at the time of need. But Clojure does provide mechanisms for implementing what are known as call-by-need semantics. The most obvious of these mechanisms is its macro facilities, but we'll defer that discussion until chapter 8. The other mechanism for providing what we'll call explicit laziness is Clojure's delay and force. In short, the delay macro is used to defer the evaluation of an expression until it's explicitly forced using the force function. Using these laziness primitives, we can wrap an expression in a call to delay and use it only if necessary on the callee's side:

(defn defer-expensive [cheap expensive]
  (if-let [good-enough (force cheap)]
    good-enough
    (force expensive)))

(defer-expensive (delay :cheap)
                 (delay (do (Thread/sleep 5000) :expensive)))
;=> :cheap

(defer-expensive (delay false)
                 (delay (do (Thread/sleep 5000) :expensive)))
;=> :expensive

You can simulate this behavior with the use of anonymous functions, where delay is

replaced by (fn [] expr) and force by (delayed-fn), but using delay/force allows

you to explicitly check for delayed computations using delay?. Additionally, delay

caches its calculation, therefore allowing its wrapped expression to be calculated only

once. Of course, you could simulate the same behavior using memoization,6 but why

would you in this case when delay and force solve the problem more succinctly?
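A sketch of that simulation follows; the names (thunk, dly, the counter atoms) are illustrative, not from the original:

```clojure
;; A thunk, (fn [] expr), defers evaluation like delay, but
;; recomputes on every call and can't be detected with delay?.
(def fn-calls (atom 0))
(def thunk (fn [] (swap! fn-calls inc) :val))

(thunk)
(thunk)
@fn-calls            ; the wrapped expression ran twice
;=> 2

;; delay caches: its wrapped expression is calculated only once.
(def delay-calls (atom 0))
(def dly (delay (swap! delay-calls inc) :val))

(force dly)
(force dly)
@delay-calls
;=> 1

(delay? dly)         ; explicit check for a delayed computation
;=> true
(delay? thunk)
;=> false
```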

6 We’ll cover memoization in section 12.4.


CHAPTER 6 Being lazy and set in your ways

if-let and when-let

The if-let and when-let macros are useful when you’d like to bind the results of

an expression based on whether it returns a truthy value. This helps to avoid the need to

nest if/when and let as shown:

(if :truthy-thing
  (let [res :truthy-thing] (println res)))
; :truthy-thing

(if-let [res :truthy-thing] (println res))
; :truthy-thing

The latter is much more succinct.
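A parallel sketch for when-let, which expands to when and therefore allows multiple body forms:

```clojure
;; when-let binds the test result and runs the body only when truthy.
(when-let [res (first [:a :b])]
  (println res)
  res)
; :a
;=> :a

;; On a falsey test the body is skipped and nil is returned.
(when-let [res (first [])]
  (println "never printed"))
;=> nil
```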

There are more complicated usage patterns for delay and force besides the simple

scheme outlined previously. For example, we can implement a version of the lazy

sequence of triangular numbers from a few sections prior using delay and force:

(defn inf-triangles [n]
  {:head (triangle n)
   :tail (delay (inf-triangles (inc n)))})

(defn head [l] (:head l))

(defn tail [l] (force (:tail l)))

The function inf-triangles creates a lazy linked-list of nodes. Each node is a map

containing a value mapped to :head and a link to the remainder of the list keyed as

:tail. The head of the list is the result of applying the function triangle to the incre-

menting counter passed recursively within the body of delay. As you can imagine, the

head of a node is always calculated as we walk down the linked-list, even if it’s never

accessed. This type of lazy structure is known as head strict but differs from Clojure's

lazy-seq, which delays both the head and tail and then realizes them at the same time.
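You can observe this head strictness with a side-effecting sketch; triangle* and the calls atom are illustrative instrumentation, not part of the original code:

```clojure
(def calls (atom []))

(defn triangle* [n]
  (swap! calls conj n)              ; record every head calculation
  (/ (* n (inc n)) 2))

(defn inf-triangles* [n]
  {:head (triangle* n)
   :tail (delay (inf-triangles* (inc n)))})

;; Walking two tails forces three heads to be calculated, even
;; though we never looked at the values of the later nodes.
(def nums (inf-triangles* 1))
(force (:tail (force (:tail nums))))
@calls
;=> [1 2 3]
```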

We can now create a structure similar to the original tri-nums and start getting at

its contained elements:

(def tri-nums (inf-triangles 1))

(head tri-nums)

;=> 1

(head (tail tri-nums))

;=> 3

(head (tail (tail tri-nums)))

;=> 6

One thing to note about the preceding code is that accessing the values 3 and 6

triggered deferred calculations that occurred only on demand. The structure of the

example is shown in figure 6.3.

[Figure 6.3 Lazy linked-list example. Each node of this linked list contains a value (the head) and a delay (the tail). The creation of the next part is forced by a call to tail—it doesn't exist until then.]

Though we can navigate the entire chain of triangular numbers using only head and

tail, it’s probably a better idea7 to use them as primitives for more complicated

functions:

(defn taker [n l]
  (loop [t n, src l, ret []]
    (if (zero? t)
      ret
      (recur (dec t) (tail src) (conj ret (head src))))))

(defn nthr [l n]
  (if (zero? n)
    (head l)
    (recur (tail l) (dec n))))

(taker 10 tri-nums)

;=> [1 3 6 10 15 21 28 36 45 55]

(nthr tri-nums 99)

;=> 5050

Of course, writing programs using delay and force is an onerous way to go about the

problem of laziness, and you’d be better served by using Clojure’s lazy sequences to

full effect rather than building your own from these basic blocks. But the preceding

code, in addition to being simple to understand, harkens back to chapter 5 and the

entire sequence “protocol” being built entirely on the functions first and rest.

Pretty cool, right?

6.4

Putting it all together: a lazy quicksort

In a time when the landscape of programming languages is rife with new program-

ming languages and pregnant with more, it seems inconceivable that the world would

need another quicksort implementation. Inconceivable or not, we won’t be deterred

from adding yet another to the rich ecosystem of pet problems. Our implementation

7 And as we’ll cover in section 9.3, participating in the ISeq protocol is even better.


of quicksort differs from many in a few key ways. First, we’ll implement a lazy, tail-

recursive version. Second, we’ll split the problem such that it can be executed incre-

mentally. Only the calculations required to obtain the part of a sequence desired will

be calculated. This will illustrate the fundamental reason for laziness in Clojure: the

avoidance of full realization of interim results.

THE IMPLEMENTATION

Without further ado, we present our quicksort implementation.8

Listing 6.3 A lazy, tail-recursive quicksort implementation

(ns joy.q)

(defn nom [n] (take n (repeatedly #(rand-int n))))

(defn sort-parts
  "Lazy, tail-recursive, incremental quicksort. Works against
   and creates partitions based on the pivot, defined as 'work'."
  [work]
  (lazy-seq
   (loop [[part & parts] work]              ; Pull apart work
     (if-let [[pivot & xs] (seq part)]      ; Define pivot
       (let [smaller? #(< % pivot)]         ; Comparison fn
         (recur (list*
                 (filter smaller? xs)       ; Work all < pivot
                 pivot                      ; Work pivot itself
                 (remove smaller? xs)       ; Work all > pivot
                 parts)))                   ; Concat parts
       (when-let [[x & parts] parts]        ; Sort rest if more parts
         (cons x (sort-parts parts)))))))

(defn qsort [xs]
  (sort-parts (list xs)))

The key detail in the code above is that sort-parts works not on a plain sequence of

elements but on a carefully constructed list that alternates between lazy seqs and piv-

ots. Every element before each pivot is guaranteed to be less than the pivot and every-

thing after will be greater, but the sequences between the pivots are as yet unsorted.

When qsort is given an input sequence of numbers to sort, it creates a new work list

consisting of just that input sequence and passes this work to sort-parts. The loop

inside sort-parts pulls apart the work, always assuming that the first item, which it

binds to part, is an unsorted sequence. It also assumes that if there is a second item,

which will be at the head of parts, it is a pivot. It recurs on the sequence at the head

of work, splitting out pivots and lazy seqs until the sequence of items less than the

most recent pivot is empty, in which case the if-let test is false, and that most recent

pivot is returned as the first item in the sorted seq. The rest of the built-up list

of work is held by the returned lazy sequence, to be passed into sort-parts again

when subsequent sorted items are needed.

8 This listing uses the list* function, which for some reason is somewhat rarely seen. In cases like this, however, it is exactly what is needed. list* is like list except it expects its last argument to be a list on which to prepend its other arguments. We’ll use it again in chapter 8.

You can see a snapshot of the work list for the function call (qsort [2 1 4 3])

in figure 6.4, at an intermediate point in its process.

[Figure 6.4 A snapshot of the work list, (1 (2 4 3)), with its parts labeled: the partition less than the pivot, the pivot itself, and the partition greater than the pivot. The qsort function shown earlier would use a structure like this for its work list when sorting the vector [2 1 4 3]. Note that all the parts described by a standard quicksort implementation are represented here.]

The figure includes the characteristics of a standard quicksort implementation,

and you can run it to see that the final sequence is sorted:

(qsort [2 1 4 3])

;=> (1 2 3 4)

(qsort (nom 20))

;=> (0 2 3 5 6 7 7 8 9 10 11 11 11 12 12 13 14 16 17 19)
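To make the work-list shuffling concrete, here is a one-step helper; split-on-pivot is a hypothetical function extracted for illustration, performing a single iteration of the loop inside sort-parts:

```clojure
;; One iteration of sort-parts' loop: split the first partition
;; around its pivot and splice the pieces back onto the work list.
(defn split-on-pivot [[part & parts]]
  (let [[pivot & xs] (seq part)
        smaller?     #(< % pivot)]
    (list* (filter smaller? xs)
           pivot
           (remove smaller? xs)
           parts)))

(split-on-pivot '([2 1 4 3]))
;=> ((1) 2 (4 3))

(split-on-pivot '((1) 2 (4 3)))
;=> (() 1 () 2 (4 3))
```

Once the first partition is empty, the loop pops the pivot behind it as the next sorted element.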

The implementation of the sort-parts function works to provide an incremental

solution for lazy quicksort. This incremental approach stands in opposition to a

monolithic approach (Okasaki 1996) defined by its performance of the entire calcula-

tion when any segment of the sequence is accessed. For example, grabbing the first

element in a lazy sequence returned from qsort will perform only the necessary calcu-

lations required to get that first item:

(first (qsort (nom 100)))

;=> 1

Of course, the number returned here will likely be different in your REPL, but the

underlying structure of the lazy sequence used internally by sort-parts will be similar

to that shown in figure 6.5.
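The nesting in figure 6.5 can be hand-expanded as plain filter and remove calls; the following sketch uses the sample input from the figure, and the names are ours:

```clojure
;; The items following the first pivot, 5, from (5 3 1 7 4 2 8 6):
(def xs '(3 1 7 4 2 8 6))

;; Level A: partitions around the pivot 5.
(def left-of-5  (filter #(< % 5) xs))                 ; (3 1 4 2)
(def right-of-5 (remove #(< % 5) xs))                 ; (7 8 6)

;; Level B: partitions around the next pivots, 3 and 7.
(def left-of-3  (filter #(< % 3) (rest left-of-5)))   ; (1 2)
(def right-of-3 (remove #(< % 3) (rest left-of-5)))   ; (4)
(def left-of-7  (filter #(< % 7) (rest right-of-5)))  ; (6)
(def right-of-7 (remove #(< % 7) (rest right-of-5)))  ; (8)
```

Because each filter and remove is lazy, none of these partitions does any work until its items are demanded.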

The lazy qsort will be able to gather the first element because it only takes some

small subset of comparisons to gather the numbers into left-side smaller and right-side

larger partitions and sort those smaller pieces only.

[Figure 6.5 Internal structure of qsort, sorting (5 3 1 7 4 2 8 6): level A splits it into (3 1 4 2) 5 (7 8 6) via (filter #(< % 5) xs) and (remove #(< % 5) xs), and level B splits those into (1 2) 3 (4) 5 (6) 7 (8). Each filter and remove lazily returns items from its parent sequence only as required. So to return the first two items of the seq returned by qsort, no remove steps are required from either level A or B. To generate the sequence (4), a single remove step at level B would be needed to eliminate everything less than 3. As more items are forced from the seq returned by qsort, more of the internal filter and remove steps will be run.]

The characteristic of the quicksort

algorithm is especially conducive to laziness, because it’s fairly cheap to make and

shuffle partitions where those with a smaller magnitude can be shuffled first. What

then are the benefits of a lazy, tail-recursive, incremental quicksort? The answer is that

you can take sorted portions of a large sequence without having to pay the cost of sort-

ing its entirety, as the following command hints:

(take 10 (qsort (nom 10000)))

;=> (0 0 0 4 4 7 7 8 9 9)
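One way to see this incrementality is to instrument the comparison predicate. The following variant is ours, for illustration only; sort-parts* and the comparisons counter are not part of the original listing:

```clojure
;; Illustrative variant of sort-parts that counts predicate calls.
(def comparisons (atom 0))

(defn sort-parts* [work]
  (lazy-seq
   (loop [[part & parts] work]
     (if-let [[pivot & xs] (seq part)]
       (let [smaller? (fn [x]
                        (swap! comparisons inc)  ; count each comparison
                        (< x pivot))]
         (recur (list* (filter smaller? xs)
                       pivot
                       (remove smaller? xs)
                       parts)))
       (when-let [[x & parts] parts]
         (cons x (sort-parts* parts)))))))

(let [xs (repeatedly 10000 #(rand-int 10000))]
  (reset! comparisons 0)
  (doall (take 10 (sort-parts* (list xs))))
  @comparisons)
;; far fewer predicate calls than realizing the entire sorted seq
```

Because the filter and remove steps are lazy, only the comparisons forced by the requested prefix are ever counted.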

On our machines, this command required roughly 11,000 comparisons, which for all

intents and purposes is an O(n) operation—an order of magnitude less than quick-

sort’s best case. Bear in mind that as the take value gets closer to the number of