likely initial reaction...
5.1.1
“You keep using that word. I do not think it means what you think it means.”
Although storage to disk may be the more common meaning of persistent today, Clojure uses an older meaning of the word having to do with immutable in-memory collections with specific properties. In particular, a persistent collection in Clojure allows you to preserve historical versions (Okasaki 1999) of its state, and promises that all versions will have the same update and lookup complexity guarantees. The specific guarantees depend on the collection type, and we’ll cover those details along with each kind of collection.
Here you can see the difference between a persistent data structure and one that’s
not by using a Java array:
(def ds (into-array [:willie :barnabas :adam]))
(seq ds)
;=> (:willie :barnabas :adam)
What we’ve done is create a three-element array of keywords and used seq to produce
an object that displays nicely in the REPL. Any change to the array ds happens in-
place, thus obliterating any historical version:
(aset ds 1 :quentin)
;=> :quentin
(seq ds)
;=> (:willie :quentin :adam)
But using one of Clojure’s persistent data structures paints a different picture:
(def ds [:willie :barnabas :adam])
ds
;=> [:willie :barnabas :adam]
(def ds1 (replace {:barnabas :quentin} ds))
ds
;=> [:willie :barnabas :adam]
ds1
;=> [:willie :quentin :adam]
The original vector ds didn’t change when the keyword :barnabas was replaced; instead, replace returned a new vector with the changed value. A natural concern when confronted with this picture of persistence is that a naive implementation would copy the whole collection on each change, leading to slow operations and poor use of memory. Clojure’s implementations (Bagwell 2001) are instead efficient by sharing
structural elements from one version of a persistent structure to another. This may
seem magical, but we’ll demystify it in the next chapter. For now it’s sufficient to
understand that each instance of a collection is immutable and efficient. This fact
opens numerous possibilities that wouldn’t work for standard mutable collections.
One of these is the sequence abstraction.
5.1.2
Sequence terms and what they mean
It is better to have 100 functions operate on one data abstraction than 10
functions on 10 data structures.
—Rich Hickey
The words sequential, sequence, and seq don’t sound very different from each other, but
they mean specific things in Clojure. We’ll start with specific definitions of each term
to help you tell them apart, and then go into a bit of detail about how they relate to
equality partitions and the sequence abstraction.
TERMS
A sequential collection is one that holds a series of values without reordering them. As
such it’s one of three broad categories of collection types, which we discuss in the next
subsection.
A sequence is a sequential collection that represents a series of values that may or
may not exist yet. They may be values from a concrete collection or values that are
computed as necessary. A sequence may also be empty.
Clojure has a simple API called seq for navigating collections. It consists of two functions: first and rest. If the collection has anything in it, (first coll) returns the first element; otherwise it returns nil. (rest coll) returns a sequence of the items other than the first. If there are no other items, rest returns an empty sequence and never nil. Functions that promise to return sequences, such as map and filter, work the same way as rest. A seq is any object that implements the seq API, thereby supporting the functions first and rest. You might consider it an immutable variant of an enumerator or iterator.
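These promises are easy to confirm at the REPL (the collections here are arbitrary examples):

```clojure
;; first returns the first element, or nil when there's nothing:
(first [:a :b :c])  ;=> :a
(first [])          ;=> nil

;; rest always returns a sequence, never nil, even when nothing remains:
(rest [:a :b :c])   ;=> (:b :c)
(rest [])           ;=> ()
```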
There’s also a function called seq that accepts a wide variety of collection-like
objects. Some collections, such as lists, implement the seq API directly, so calling seq
on them returns the collection itself. More often, calling seq on a collection returns a
new seq object for navigating that collection. In either case, if the collection is empty,
seq returns nil and never an empty sequence. Functions that promise to return seqs
(not sequences), such as next, work the same way.
Clojure’s sequence library manipulates collections, strings, arrays, and so on as if
they were sequences, using the seq function and seq API.
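The seq/rest distinction is easiest to see side by side:

```clojure
(seq [1 2 3])  ;=> (1 2 3)
(seq [])       ;=> nil -- nil, never an empty sequence

;; next acts like (seq (rest coll)), so it also returns nil at the end:
(rest [1])     ;=> ()
(next [1])     ;=> nil
```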
BEWARE TYPE-BASED PREDICATES Clojure includes a few predicates with names like the words just defined. Though they’re not frequently used, it seems worth mentioning that they may not mean exactly what the definitions here might suggest. For example, every object for which sequential? returns true is a sequential collection, but it returns false for some collections that are also sequential. This is because of implementation details that may be improved sometime after Clojure 1.2.
EQUALITY PARTITIONS
Clojure classifies each composite data type into one of three logical categories or partitions: sequentials, maps, and sets. These divisions draw clear distinctions between the types and help define equality semantics. Specifically, two objects will never be equal if they belong to different partitions. Few composite types are actually sequences, though several, such as vectors, are sequential.
If two sequentials have the same values in the same order, = will return true for
them, even if their concrete types are different, as shown:
(= [1 2 3] '(1 2 3))
;=> true
Conversely, even if two collections have the same values in the same order, if one is a
sequential collection and the other isn’t, = will return false, as shown here:
(= [1 2 3] #{1 2 3})
;=> false
Examples of things that are sequential include Clojure lists and vectors, and Java lists
such as java.util.ArrayList. In fact everything that implements java.util.List is
included in the sequential partition.
Generally things that fall into the other partitions include set or map in their name
and so are easy to identify.
THE SEQUENCE ABSTRACTION
Many Lisps build their data types (McCarthy 1962) on the cons-cell abstraction, an
elegant two-element structure illustrated in figure 5.1.
Figure 5.1 Each cons-cell is a simple pair, a car and a cdr. A: A list with two cells, each of which has a value (X and Y) as the head (the car in Lisp terminology) and a list as the tail (the cdr). This is very similar to first and rest in Clojure sequences. B: A cons-cell with a simple value for both the head and tail. This is called a dotted pair, but isn’t supported by any of Clojure’s built-in types.

Clojure also has a couple of cons-cell-like structures that are covered in section 5.4, but they’re not central to Clojure’s design. Instead, the conceptual interface fulfilled by the cons-cell has been lifted off the concrete structure illustrated previously and been named sequence. All an object needs to do to be a sequence is to support the two core functions: first and rest. This isn’t much, but it’s all that’s required for the bulk of Clojure’s powerful library of sequence functions and macros to be able to operate on the collection: filter, map, for, doseq, take, partition; the list goes on.
At the same time, a wide variety of objects satisfy this interface. Every Clojure collection provides at least one kind of seq object for walking through its contents, exposed via the seq function. Some collections provide more than one; for example, vectors support rseq and maps support the functions keys and vals. All of these functions return a seq, or if the collection is empty, nil.
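For example, using small literal collections:

```clojure
(rseq [1 2 3])       ;=> (3 2 1)
(keys {:a 1, :b 2})  ;=> (:a :b)
(vals {:a 1, :b 2})  ;=> (1 2)

;; On empty collections, these return nil rather than empty seqs:
(rseq [])            ;=> nil
(keys {})            ;=> nil
```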
You can see examples of this by looking at the types of objects returned by various
expressions. Here’s the map class:
(class (hash-map :a 1))
;=> clojure.lang.PersistentHashMap
Unsurprisingly, the hash-map function returns an object of type PersistentHashMap.
Passing that map object to seq returns an entirely new kind of object:
(seq (hash-map :a 1))
;=> ([:a 1])
(class (seq (hash-map :a 1)))
;=> clojure.lang.PersistentHashMap$NodeSeq
This class name suggests it’s a seq of nodes on a hash map. Similarly we can get a seq
of keys on the same map:
(seq (keys (hash-map :a 1)))
;=> (:a)
(class (keys (hash-map :a 1)))
;=> clojure.lang.APersistentMap$KeySeq
Note that these specific class names are an implementation detail that may change in
the future, but the concepts they embody are central to Clojure and unlikely to
change.
Having laid the foundation for a deeper dive into the sequence abstraction, we
now must quickly diverge into a simplified discussion of asymptotic complexity and
Big-O notation. If you’re already comfortable with these topics then by all means skip
forward to section 5.2. If you need a refresher or an overview, then the next section is
a minimalist introduction (Cormen 2009) to the topic.
5.1.3
Big-O
This book isn’t heavily focused on asymptotic complexity, but we do mention it a handful of times throughout, so here we’ll cover the minimum required for understanding these few mentions. You may have gone your entire career without having to understand Big-O notation, and you may well go the remainder of it the same way. But that’s no reason not to learn more, and a bit of understanding about Big-O and its implications will go a long way toward helping you choose between Clojure collections, as well as design and analyze algorithms in general.
Algorithmic complexity is a system for describing the relative space and time costs
for algorithms. Typically the complexity of an algorithm is described using what’s
known as Big-O notation. For example, you may have heard that finding an element
in a linked list is O(n), which is read as “order n.” What this means is that if you have a
list (:a :b :c) of length 3, then to verify that the keyword :c is in that list requires
three comparisons. This highlights the worst case of list access because :c is at the end
of the list, but we don’t worry too much about the worst-case scenario unless that’s the
only difference between two algorithms. On the other hand, to verify that :a is in the
same list is O(1), which is read as constant time. Finding :a represents the best case for
list access because it’s at the front of the list. Rarely do your lists always look exactly
like our example, and therefore you shouldn’t build your hopes that elements will
always be at the front. Therefore, in analyzing algorithms you rarely care about the
best-case scenario because it’s too rare to matter much. What you really care about
when analyzing algorithms is the expected case, or what you’d likely see in practice.
When looking at a few million runs of verifying that some value is contained in a million different lists, you’d inevitably see that the average number of comparisons required approaches whatever the length of a list was, divided by two. But because
doubling the length of the list would also double the number of comparisons done in
both the expected and worst case, they’re all grouped into the same Big-O category:
O(n) also known as linear time.
Thus two algorithms that are in the same Big-O category may perform very differently, especially on small work loads. This makes the most difference when there’s a large constant factor, work that the algorithm has to do up front regardless of the size of the work load.
When the work load is small, an O(1) algorithm with a large constant factor may be more costly than an O(n) algorithm that’s without extra costs. But as the work load increases, an O(1) algorithm will always overtake the O(n) algorithm as shown in figure 5.2. Big-O doesn’t concern itself with these constant factors or small work loads.
When learning about Clojure’s persistent data structures, you’re likely to hear the term O(log32 n) for those based on the persistent hash trie and O(log2 n) for the sorted structures. Accessing an element in a Clojure persistent structure by index is O(log n), or logarithmic. Logarithmic complexity describes a class of algorithms that are effectively immune from large changes in the size of their data. In the case of Clojure’s persistent structures, what this means is that there’s little difference in “hops” (such as comparisons) between locating an element in a structure containing 100
elements or 1 million elements. In practice you may notice some difference because for a billion objects O(log2 n) would require approximately 30 comparisons for a lookup, whereas O(log32 n) would require only about 6. Given the smaller number of operations required for the O(log32 n) data structures, they can be viewed as providing a nearly O(1) lookup and update.

Figure 5.2 Overtaking the smaller. In Big-O, regardless of the other ancillary costs, the higher order of magnitude will always overtake the lower eventually.
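You can check these counts with a quick calculation. This helper isn’t from the book’s text; it just applies the change-of-base rule, log base b of n = ln n / ln b:

```clojure
(defn log-base
  "Logarithm of n in base b."
  [b n]
  (/ (Math/log n) (Math/log b)))

(log-base 2 1e9)   ;=> roughly 29.9
(log-base 32 1e9)  ;=> roughly 6.0
```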
We’ve covered the basic ideas behind persistence and the sequence abstraction,
and even touched on the basics of Big-O notation. Now we’ll discuss all of Clojure’s
primary collection types and how these concepts apply to each, starting with vectors.
5.2
Vectors: creating and using them in all their varieties
Vectors store zero or more values sequentially indexed by number, a bit like arrays,
but are immutable and persistent. They’re versatile and make efficient use of memory
and processor resources at both small and large sizes.
Vectors are probably the most frequently used collection type in Clojure code.
They’re used as literals for argument lists and let bindings, for holding large amounts
of application data, and as stacks and as map entries. We’ll also address the efficiency
considerations including growing on the right end, subvectors, and reversals, and
finally discuss where vectors aren’t an optimal solution.
5.2.1
Building vectors
The vector’s literal square-bracket syntax is one reason you might choose to use a vector over a list. For example, the let form would work perfectly well, and with a nearly identical implementation, if it took a literal list of bindings instead of a literal vector. But the square brackets are visually different from the round parentheses surrounding the let form itself as well as the likely function calls in the body of the let form, and this is useful for humans (so we hear). Using vectors to indicate bindings for let, with-open, fn, and such is idiomatic in Clojure and is a pattern you’re encouraged to follow in any similar macros you write.
The most common way to create a vector is with the literal syntax described earlier.
But in many cases you’ll want to create a vector out of the contents of some other kind
of collection. For this there’s the function vec:
(vec (range 10))
;=> [0 1 2 3 4 5 6 7 8 9]
If you already have a vector but want to “pour” several values into it, then into is your
friend:
(let [my-vector [:a :b :c]]
(into my-vector (range 10)))
;=> [:a :b :c 0 1 2 3 4 5 6 7 8 9]
If you want it to return a vector, the first argument to into must be a vector. The second argument can be any sequence, such as what range returns, or anything else that works with the seq function. You can view the operation of into as similar to an O(n) concatenation based on the size of the second argument.1 Clojure also provides a vector function to build a vector from its arguments, which is handy for constructs like (map vector a b).
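For example, the hypothetical a and b in (map vector a b) could be any two sequences to zip together:

```clojure
(vector 1 2 3)                     ;=> [1 2 3]

;; (map vector a b) pairs up corresponding elements:
(map vector [:a :b :c] (range 3))  ;=> ([:a 0] [:b 1] [:c 2])
```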
PRIMITIVE VECTORS
Clojure can store primitive types inside of vectors using the vector-of function,
which takes any of :int, :long, :float, :double, :byte, :short, :boolean, or :char
as its argument and returns an empty vector. This returned vector will act just like any
other vector, except that it’ll store its contents as primitives internally. All of the nor-
mal vector operations still apply, and the new vector will attempt to coerce any addi-
tions into its internal type when being added:
(into (vector-of :int) [Math/PI 2 1.3])
;=> [3 2 1]
(into (vector-of :char) [100 101 102])
;=> [\d \e \f]
(into (vector-of :int) [1 2 623876371267813267326786327863])
; java.lang.IllegalArgumentException: Value out of range for int:
-8359803716404783817
In addition, all caveats mentioned in section 4.1 regarding overflow, underflow, and
so forth also apply to vectors of primitives.
Using vec and into, it’s easy to build vectors much larger than are conveniently
built using vector literals. But once you have a large vector like that, what are you
going to do with it?
5.2.2
Large vectors
When collections are small, the performance differences between vectors and lists hardly matter at all. But as both get larger, each becomes dramatically slower at operations the other can still perform efficiently. Vectors are particularly efficient at three things relative to lists: adding or removing things from the right end of the collection, accessing or changing items in the interior of the collection by numeric index, and walking in reverse order. Adding and removing from the end is done by treating the vector as a stack—we’ll cover that later.
1 Vectors can’t be concatenated any more efficiently than O(n).
Any item in a vector can be accessed by its index number from 0 up to but not
including (count my-vector) in essentially constant time.2 You can do this using the
function nth; the function get, essentially treating the vector like a map; or by invoking
the vector itself as a function. Look at each of these as applied to this example vector:
(def a-to-j (vec (map char (range 65 75))))
a-to-j
;=> [\A \B \C \D \E \F \G \H \I \J]
All three of these do the same work and each returns \E:
(nth a-to-j 4)
(get a-to-j 4)
(a-to-j 4)
Which to use is a judgment call, but table 5.1 highlights some points you might consider when choosing.

Table 5.1 Vector lookup options: the three ways to look up an item in a vector and how each responds to different exceptional circumstances

                               nth                        get                        Vector as a function
If the vector is nil           Returns nil                Returns nil                Throws an exception
If the index is out of range   Returns “not found” or     Returns nil                Throws an exception
                               throws an exception
Supports a “not found” arg     Yes:                       Yes:                       No
                               (nth [] 9 :whoops)         (get [] 9 :whoops)
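The table’s entries are straightforward to verify at the REPL:

```clojure
(nth nil 2)                 ;=> nil
(get nil 2)                 ;=> nil

(get [:a :b :c] 9)          ;=> nil
(nth [:a :b :c] 9 :whoops)  ;=> :whoops
(get [:a :b :c] 9 :whoops)  ;=> :whoops

;; Invoking the vector itself throws on a bad index:
;; ([:a :b :c] 9)  ; IndexOutOfBoundsException
```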
Because vectors are indexed, they can be efficiently walked in either direction, left-to-right or right-to-left. The seq and rseq functions return sequences that do exactly that:
(seq a-to-j)
;=> (\A \B \C \D \E \F \G \H \I \J)
(rseq a-to-j)
;=> (\J \I \H \G \F \E \D \C \B \A)
Any item in a vector can be “changed” using the assoc function. Clojure does this in
essentially constant time using structural sharing between the old and new vectors as
described at the beginning of this chapter:
(assoc a-to-j 4 "no longer E")
;=> [\A \B \C \D "no longer E" \F \G \H \I \J]
The assoc function for vectors only works on indices that already exist in the vector, or as a special case, exactly one step past the end. In this case, the returned vector will be one item larger than the input vector. More frequently vectors are “grown” using the conj function as you’ll see in the next section.
2 Several operations on Clojure’s persistent data structures are described in this book as “essentially constant time.” In all cases these are O(log32 n).
There are a few higher-powered functions provided that use assoc internally. For example, the replace function works on both seqs and vectors, but when given a vector, it uses assoc to fix up and return a new vector:
(replace {2 :a, 4 :b} [1 2 3 2 3 4])
;=> [1 :a 3 :a 3 :b]
The functions assoc-in and update-in are for working with nested structures of vectors and/or maps, like this one:3
(def matrix
[[1 2 3]
[4 5 6]
[7 8 9]])
All of assoc-in, get-in, and update-in take a series of indices to pick items from
each more deeply nested level. For a vector arranged like the earlier matrix example,
this amounts to row and column coordinates:
(get-in matrix [1 2])
;=> 6
(assoc-in matrix [1 2] 'x)
;=> [[1 2 3] [4 5 x] [7 8 9]]
The update-in function works the same way, but instead of taking a value to overwrite
an existing value, it takes a function to apply to an existing value. It’ll replace the value
at the given coordinates with the return value of the function given:
(update-in matrix [1 2] * 100)
;=> [[1 2 3] [4 5 600] [7 8 9]]
The coordinates refer to the value 6, and the function given here is * taking an argument 100, so the slot becomes the return value of (* 6 100). You can also use get-in, shown earlier, to retrieve a value in a nested vector. Before exploring its operation further, we’ll create a function neighbors in listing 5.1 that, given a y-x location in an equilateral 2D matrix, returns a sequence of the locations surrounding it.
Listing 5.1 A function for finding the neighbors of a spot on a 2D matrix
(defn neighbors
  ([size yx] (neighbors [[-1 0] [1 0] [0 -1] [0 1]] size yx))
  ([deltas size yx]
     (filter (fn [new-yx]
               (every? #(< -1 % size) new-yx))
             (map #(map + yx %) deltas))))
3 Nested vectors are far from the most efficient way to store or process matrices, but they’re convenient to manipulate in Clojure and so make a good example here. More efficient options include a single vector, arrays, or a library for matrix processing such as Colt or Incanter at http://incanter.org.
The operation of neighbors is fairly straightforward. The deltas local describes that a
neighbor can be one spot away, but only along the x or y axis (not diagonal). The
function first walks through deltas and builds a vector of each added to the yx point
provided. This operation will of course generate illegal point coordinates, so those are
then removed using filter, which checks to ensure that the indices lie between -1
and the provided size. You can test this function using get-in as follows:
(map #(get-in matrix %) (neighbors 3 [0 0]))
;=> (4 2)
For each neighbor coordinate returned from neighbors, we use get-in to retrieve
the value at that point. Indeed the position [0 0] corresponding to the value 1 has
the neighboring values 4 and 2. We’ll use neighbors again before this book comes to
an end, but next we’ll look at growing and shrinking vectors—treating them like
stacks.
5.2.3
Vectors as stacks
Classic stacks have at least two operations, push and pop, and with respect to Clojure vectors these operations are called conj and pop respectively. The conj function adds elements to, and pop removes elements from, the right side of the stack. Because vectors are immutable, pop returns a new vector with the rightmost item dropped—this is different from many mutable stack APIs, which generally return the dropped item. Consequently, peek becomes more important as the primary way to get an item from the top of the stack:
(def my-stack [1 2 3])
(peek my-stack)
;=> 3
(pop my-stack)
;=> [1 2]
(conj my-stack 4)
;=> [1 2 3 4]
(+ (peek my-stack) (peek (pop my-stack)))
;=> 5
Each of these operations completes in essentially constant time. Most of the time, a vector that’s used as a stack is used that way throughout its life. It’s helpful to future readers of your code to keep this in mind and use the stack operations consistently, even when other functions might work. For example, last on a vector returns the same thing as peek, but besides being slower, it leads to unnecessary confusion about how the collection is being used. If the algorithm involved calls for a stack, use conj not assoc for growing the vector, peek not last, and pop not dissoc for shrinking it.
The functions conj, pop, and peek work on any object that implements clojure.lang.IPersistentStack.4 Besides vectors, Clojure lists also implement this interface, but the functions operate on the left side of lists instead of the right side as with vectors. When operating on either via the stack discipline, it’s best to ignore the ordering, because it tends to just add confusion.
4 The conj function also works with all of Clojure’s other persistent collection types, even if they don’t implement clojure.lang.IPersistentStack.
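A quick comparison makes the left-side behavior concrete:

```clojure
(def list-stack '(1 2 3))

(peek list-stack)    ;=> 1
(pop list-stack)     ;=> (2 3)
(conj list-stack 4)  ;=> (4 1 2 3)
```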
5.2.4
Using vectors instead of reverse
The ability of vectors to grow efficiently on the right side and then be walked left-to-right produces a noteworthy emergent behavior: idiomatic Clojure code rarely uses the reverse function. This is different from most Lisps and Schemes. When processing a list, it’s pretty common to want to produce a new list in the same order. But if all you have are classic Lisp lists, often the most natural algorithm5 leaves you with a backward list that needs to be reversed. Here’s an example of a function similar to Clojure’s map:
(defn strict-map1 [f coll]
  (loop [coll coll, acc nil]
    (if (empty? coll)
      (reverse acc)
      (recur (next coll)
             (cons (f (first coll)) acc)))))
(strict-map1 - (range 5))
;=> (0 -1 -2 -3 -4)
This is perfectly good, idiomatic Clojure code, except for that glaring reverse of the
final return value. After the entire list has been walked once to produce the desired
values, reverse walks it again to get them in the right order. This is both inefficient
and nonidiomatic. One way to get rid of the reverse is to use a vector instead of a list
as the accumulator:
(defn strict-map2 [f coll]
  (loop [coll coll, acc []]
    (if (empty? coll)
      acc
      (recur (next coll)
             (conj acc (f (first coll)))))))
(strict-map2 - (range 5))
;=> [0 -1 -2 -3 -4]
A small change, but the code is now a touch cleaner and a bit faster. It does return a
vector instead of a list, but this is rarely a problem, because any client code that wants
to treat this as a seq can usually do so automatically.6
The examples we’ve shown so far have all been plain vectors, but we’ll turn now to
the special features of some other vector types, starting with subvectors.
5 ...the most natural tail-recursive algorithm anyway.
6 Another way to get rid of a reverse is to build a lazy sequence instead of a strict collection; this is how Clojure’s own map function is implemented.
5.2.5
Subvectors
Although items can’t be removed efficiently from a vector (except the rightmost item), subvectors provide a fast way to take a slice of an existing vector based on start and end indices, using the subvec function:
(subvec a-to-j 3 6)
;=> [\D \E \F]
The first index given to subvec is inclusive (starts at index 3), but the second is exclusive (ends before index 6). The new subvector internally hangs onto the entire original a-to-j vector, making each lookup performed on the new vector cause the subvector to do a little offset math and then look it up in the original. This makes creating a subvector fast. You can use subvec on any kind of vector and it’ll work fine. But there’s special logic for taking a subvec of a subvec, in which case the newest subvector keeps a reference to the original vector, not the intermediate subvector. This prevents subvectors-of-subvectors from stacking up needlessly, and keeps both the creation and use of the sub-subvecs fast and efficient.
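Here’s a sketch of that behavior, redefining a-to-j so the example stands alone:

```clojure
(def a-to-j (vec (map char (range 65 75))))

(def mid (subvec a-to-j 2 8))  ;=> value is [\C \D \E \F \G \H]

;; A subvec of a subvec offsets against the original a-to-j directly:
(subvec mid 1 3)               ;=> [\D \E]
```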
5.2.6
Vectors as MapEntries
Clojure’s hash map, just like hash tables or dictionaries in many other languages, has a
mechanism to iterate through the entire collection. Clojure’s solution for this iterator
is, unsurprisingly, a seq. Each item of this seq needs to include both the key and the
value, so they’re wrapped in a MapEntry. When printed, each entry looks like a vector:
(first {:width 10, :height 20, :depth 15})
;=> [:width 10]
But not only does a MapEntry look like a vector, it really is one:
(vector? (first {:width 10, :height 20, :depth 15}))
;=> true
This means you can use all the regular vector functions on it: conj, get, and so on. It
even supports destructuring, which can be handy. For example, the following locals
dimension and amount will take on the value of each key/value pair in turn:
(doseq [[dimension amount] {:width 10, :height 20, :depth 15}]
(println (str (name dimension) ":") amount "inches"))
; width: 10 inches
; height: 20 inches
; depth: 15 inches
;=> nil
A MapEntry is its own type and has two functions for retrieving its contents: key and val, which do exactly the same thing as (nth my-map-entry 0) and (nth my-map-entry 1), respectively. These are sometimes useful for the clarity they can bring to your code, but frequently destructuring is used instead, because it’s so darned handy.
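For example:

```clojure
(def entry (first {:width 10, :height 20, :depth 15}))

(key entry)    ;=> :width
(val entry)    ;=> 10
(nth entry 0)  ;=> :width
(nth entry 1)  ;=> 10
```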
So now you know what vectors are, what specific kinds of vectors are included in Clojure, and some of the things that they’re good at doing. To round out your understanding of vectors, we’ll conclude with a brief look at things that vectors are bad at doing.
5.2.7
What vectors aren’t
Vectors are versatile, but there are some commonly desired patterns where they might
seem like a good solution but in fact aren’t. Though we prefer to focus on the positive,
we hope a few negative examples will help you escape from using the wrong tool for
the job.
VECTORS AREN’T SPARSE
If you have a vector of length n, the only position where you can insert a value is at index n—appending to the far right end. You can’t skip some indices and insert at a higher index number. If you want a collection indexed by nonsequential numbers, consider a hash map or sorted map. Although you can replace values within a vector, you can’t insert or delete items such that indices for the subsequent items would have to be adjusted. Clojure doesn’t currently have a native persistent collection that supports this kind of operation, but a possible future addition, finger trees, may help for these use cases.
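A quick sketch of the boundary (the exception class shown is what current Clojure throws; treat it as illustrative):

```clojure
(assoc [1 2 3] 3 :ok)  ;=> [1 2 3 :ok] -- exactly one past the end works

;; (assoc [1 2 3] 5 :boom)  ; IndexOutOfBoundsException -- can't skip index 4

;; A sorted map is happy with sparse numeric "indices":
(assoc (sorted-map 0 :a 99 :b) 50 :c)  ;=> {0 :a, 50 :c, 99 :b}
```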
VECTORS AREN’T QUEUES
Some people have tried to use vectors as queues. One approach would be to push
onto the right end of the vector using conj and then to pop items off the left using
rest or next. The problem with this is that rest and next return seqs, not vectors, so
subsequent conj operations wouldn’t behave as desired. Using into to convert the seq
back into a vector is O(n), which is less than ideal for every pop.
Another approach is to use subvec as a “pop,” leaving off the leftmost item. Because subvec does return a vector, subsequent conj operations will push onto the right end as desired. But as described earlier, subvec maintains a reference to the entire underlying vector, so none of the items being popped this way will ever be garbage collected. Also less than ideal.
So what would be the ideal way to do queue operations on a persistent collection?
Why, use a PersistentQueue, of course. See section 5.5 for details.
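As a tiny preview (section 5.5 has the details), a queue conjes on the right and pops from the left:

```clojure
(def q (into clojure.lang.PersistentQueue/EMPTY [1 2 3]))

(peek q)          ;=> 1
(seq (pop q))     ;=> (2 3)
(seq (conj q 4))  ;=> (1 2 3 4)
```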
VECTORS AREN’T SETS
If you want to find out whether a vector contains a particular value, you might be tempted to use the contains? function, but you’d be disappointed by the results. Clojure’s contains? is for asking whether a particular key, not value, is in a collection, which is rarely useful for a vector.
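The disappointment, and one common workaround, look like this (using some with a set as the predicate is a standard idiom for value searches):

```clojure
(contains? [:a :b :c] :b)  ;=> false -- :b is a value here, not a key
(contains? [:a :b :c] 2)   ;=> true  -- 2 is a valid index, and indices are the keys

;; To search a vector by value, some with a set works well:
(some #{:b} [:a :b :c])    ;=> :b
```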
In this section we showed how to create vectors using literal syntax or by building
them up programmatically. We looked at how to push them, pop them, and slice
them. We also looked at some of the things vectors can’t do well. One of these was
adding and removing items from the left side; though vectors can’t do this, lists can,
which we’ll discuss next.
5.3
Lists: Clojure’s code form data structure
Clojure’s PersistentLists are by far the simplest of Clojure’s persistent collection types.
A PersistentList is a singly linked list where each node knows its distance from the end.
List elements can be found only by starting with the first element and walking forward through each successive node, and items can be added or removed only at the left end.
In idiomatic Clojure code, lists are used almost exclusively to represent code
forms. They’re used literally in code to call functions, macros, and so forth as we’ll dis-
cuss shortly. Code forms are also built programmatically to then be evaled or used as
the return value for a macro. If the final usage of a collection isn’t as Clojure code,
lists rarely offer any value over vectors and are thus rarely used. But lists have rich her-
itage in Lisps so we’ll discuss when they should be used in Clojure, and also when they
shouldn’t—situations in which there are now better options.
5.3.1 Lists like Lisps like
All flavors of Lisp have lists that they like to use, and Clojure lists, already introduced
in chapter 2, are similar enough to be familiar. The functions have different names,
but what other Lisps call car is the same as first on a Clojure list. Similarly cdr is the
same as next. But there are substantial differences as well. Perhaps the most surprising is the behavior of cons. Both cons and conj add something to the front of a list, but their arguments are in a different order from each other:
(cons 1 '(2 3))
;=> (1 2 3)
(conj '(2 3) 1)
;=> (1 2 3)
In a departure from classic Lisps, the “right” way to add to the front of a list is with
conj. For each concrete type, conj will add elements in the most efficient way, and for
lists this means at the left side. Additionally, a list built using conj is homogeneous—
all the objects on its next chain are guaranteed to be lists, whereas sequences built
with cons only promise that the result will be some kind of seq. So you can use cons to
add to the front of a lazy seq, a range, or any other type of seq, but the only way to get
a bigger list is to use conj.7 Either way, the next part has to be some kind of sequence,
which points out another difference from other Lisps: Clojure has no “dotted pair.” If
you don’t know what that is, don’t worry about it. All you need to know is that if you
want a simple pair in a Clojure program, use a vector of two items.
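The homogeneity claim above can be checked directly by inspecting the concrete types involved; this sketch also exercises the nil special case from the footnote:

```clojure
;; conj on a list yields another list...
(class (conj '(2 3) 1))
;=> clojure.lang.PersistentList

;; ...while cons yields a generic Cons seq:
(class (cons 1 '(2 3)))
;=> clojure.lang.Cons

;; cons works on the front of any seq, such as a range:
(cons 0 (range 1 4))
;=> (0 1 2 3)

;; and conj onto nil builds a one-item list (the footnoted special case):
(conj nil 1)
;=> (1)
```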
All seqs print with rounded parentheses, but this does not mean they’re the same
type or will behave the same way. For example many of these seq types don’t know their
own size the way lists do, so calling count on them may be O(n) instead of O(1).8 An
7 Or to conj or cons onto nil. This is a special case, because nil isn't the same as an empty collection of any specific type. Clojure could have just left this unsupported, perhaps throwing an exception if you did (cons 1 nil), but instead it provides a reasonable default behavior: building a list one item long.
8 You can test for this property of being countable in constant time using the counted? function. For example, (counted? (range 10)) returns true in Clojure 1.0, but false in 1.1 because the implementation of range changed between those versions and no longer provided O(1) counting.
unsurprising difference between lists in Clojure versus other Lisps is that they’re
immutable. At least that had better not be surprising anymore. Changing values within
a list is generally discouraged in other Lisps anyway, but in Clojure it’s impossible.
5.3.2 Lists as stacks
Lists in all Lisps can be used as stacks, but Clojure goes further by supporting the
IPersistentStack interface. This means you can use the functions peek and pop to do
roughly the same thing as first and next. Two details are worth noting. One is that
next and rest are legal on an empty list, but pop throws an exception. The other is that
next on a one-item list returns nil, whereas rest and pop both return an empty list.
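Both details are quick to verify at the REPL; a short sketch with arbitrary values:

```clojure
;; peek/pop parallel first/next on a list...
(peek '(1 2 3))   ;=> 1
(pop '(1 2 3))    ;=> (2 3)

;; ...but the edge cases differ on a one-item list:
(next '(1))       ;=> nil
(rest '(1))       ;=> ()
(pop '(1))        ;=> ()

;; and on an empty list:
(rest '())        ;=> ()
;; (pop '()) throws an exception -- Can't pop empty list
```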
When you want a stack, the choice between using a list versus a vector is a some-
what subtle decision. Their memory organization is quite different, so it may be worth
testing your usage to see which performs better. Also, the order of values returned by
seq on a list is backward compared to seq on a vector, and in rare cases this can point
to one or the other as the best solution. In the end, it may come down primarily to
personal taste.
5.3.3 What lists aren't
Probably the most common misuse of lists is to hold items that will be looked up by
index. Though you can use nth to get the 42nd (or any other) item from a list, Clo-
jure will have to walk the list from the beginning to find it. Don’t do that. In fact, this
is a practical reason why lists can’t be used as functions, as in ((list :a) 0). Vectors
are good at looking things up by index, so use one of those instead.
Lists are also not sets. All the reasons we gave in the previous section for why it’s a
bad idea to frequently search a vector looking for a particular value apply to lists as
well. Even more so, since contains? will always return false for a list. See the section on
sets later in this chapter instead.
Finally, lists aren’t queues. You can add items to one end of a list, but you can’t
remove things from the other end. So what should you use when you need a queue?
Funny you should ask...
5.4 How to use persistent queues
We mentioned in section 5.2 that new Clojure developers often attempt to implement
simple queues using vectors. Though this is possible, such an implementation leaves
much to be desired. Instead, Clojure provides a persistent immutable queue that will
serve all your queueing needs. In this section we’ll touch on the usage of the
PersistentQueue class, where its first-in-first-out (FIFO) queueing discipline (Knuth
1997) is described by conj adding to the rear, pop removing from the front, and peek
returning the front element without removal.
Before going further, it’s important to point out that Clojure’s PersistentQueue is
a collection, not a workflow mechanism. Java has classes deriving from the
java.util.concurrent.BlockingQueue interface for workflow, which often are use-
ful in Clojure programs, and those aren’t these. If you find yourself wanting to
repeatedly check a work queue to see if there’s an item of work to be popped off, or if
you want to use a queue to send a task to another thread, you do not want the
PersistentQueue discussed in this section.
5.4.1 A queue about nothing
Search all you like, but the current implementation of Clojure doesn’t provide9 a core
construction function for creating persistent queues. That being the case, how would
you go about creating a queue? The answer is that there’s a readily available empty
queue instance to use, clojure.lang.PersistentQueue/EMPTY. The printed represen-
tation for Clojure’s queues isn’t incredibly informative, but you can change that by
providing a method for them on the print-method multimethod, as shown:
(defmethod print-method clojure.lang.PersistentQueue
  [q, w]
  (print-method '<- w)
  (print-method (seq q) w)
  (print-method '-< w))
clojure.lang.PersistentQueue/EMPTY
;=> <-nil-<
Using print-method in this way is a convenient mechanism for printing types in logical ways; here the queue-fish isn't only fun, but also indicates a direction of flow for conj and pop.
You might think that popping an empty queue would raise an exception, but the
fact is that this action results in just another empty queue. Likewise, peeking an empty
queue will return nil. Not breathtaking for sure, but this behavior helps to ensure
that queues work in place of other sequences. In fact, the functions first, rest, and
next also work on queues and give the results that you might expect, though rest and
next return seqs not queues. Therefore, if you’re using a queue as a queue, it’s best to
use the functions designed for this purpose: peek, pop, and conj.
5.4.2 Putting things on
The mechanism for adding elements to a queue is conj:
(def schedule
  (conj clojure.lang.PersistentQueue/EMPTY
        :wake-up :shower :brush-teeth))

schedule
;=> <-(:wake-up :shower :brush-teeth)-<
Clojure’s persistent queue is implemented internally using two separate collections,
the front being a seq and the rear being a vector, as shown in figure 5.3.
All insertions occur in the rear vector and all removals occur in the front seq, tak-
ing advantage of each collection’s strength. When all the items from the front list have
9 The Clojure core language grows carefully, tending to incorporate only features that have proven useful. Queues currently stand at the edge of this growth, meaning that there might be more support for them in the future. Unlike the other collections in this chapter, the code you write with queues might be rendered non-idiomatic by future improvements.
Figure 5.3 The two
collections used internally
in a single queue. peek
returns the front item of
the seq, pop returns a new
queue with the front of the
seq left off, and conj adds
a new item to the back of
the vector.
been popped, the back vector is wrapped in a seq to become the new front, and an
empty vector is used as the new back. Typically, an immutable queue such as this is
implemented with the rear as a list in reverse order, because insertion to the front of a
list is an efficient operation. But using a Clojure vector eliminates the need for a
reversed list.
5.4.3 Getting things
Clojure provides the peek function to get the front element in a queue:
(peek schedule)
;=> :wake-up
The fact that performing peek doesn’t modify the contents of a persistent queue
should be no surprise by now.
5.4.4 Taking things off
To “remove” elements from the front of a queue, use the pop function and not rest:
(pop schedule)
;=> <-(:shower :brush-teeth)-<
(rest schedule)
;=> (:shower :brush-teeth)
Although rest returns something with the same values, and even prints the same as what pop returns, the former is a seq, not a queue. This is a potential source of subtle bugs, because subsequent conj operations on it won't preserve the speed guarantees of the queue type, and the queue functions peek, pop, and conj won't behave as expected.
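A minimal sketch of that subtle bug, using the schedule queue from earlier (redefined here so the snippet stands alone):

```clojure
(def schedule
  (conj clojure.lang.PersistentQueue/EMPTY
        :wake-up :shower :brush-teeth))

;; pop returns a queue, so conj still adds to the rear:
(seq (conj (pop schedule) :work))
;=> (:shower :brush-teeth :work)

;; rest returns a seq, so conj now adds to the front instead:
(conj (rest schedule) :work)
;=> (:work :shower :brush-teeth)
```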
We’ve talked numerous times in this chapter about the sequence abstraction, and
though it’s an important consideration, it shouldn’t always be used. Instead, it’s
important to know your data structures, their sweet spots, and idiomatic operations.
By doing so, you can write code that’s specialized in ways that leverage the perfor-
mance characteristics you need for a given problem space. Clojure’s persistent queues
illustrate this fact perfectly. To further highlight this point, we’ll now explore Clojure’s
set type.
5.5 Persistent sets
Clojure sets work the same as mathematical sets, in that they’re collections of
unsorted unique elements. In this section we’ll cover sets by explaining their strong
points, weaknesses, and idioms. We’ll also cover some of the functions from the
clojure.set namespace.
5.5.1 Basic properties of Clojure sets
Sets are functions of their elements that return the matched element or nil:

(#{:a :b :c :d} :c)
;=> :c

(#{:a :b :c :d} :e)
;=> nil

Set elements can also be accessed via the get function, which will return the queried value if it exists in the given set:

(get #{:a 1 :b 2} :b)
;=> :b

(get #{:a 1 :b 2} :nothing-doing)
;=> nil

As a final point, sets, like all of Clojure's collections, support heterogeneous values.

Finding items in a sequence using a set and some
This property of sets combines with the some function to provide an extremely useful idiom for searching a seq for any of multiple items. The some function takes a predicate and a sequence. It applies said predicate to each element in turn, returning the first truthy value returned by the predicate, or else nil:

(some #{:b} [:a 1 :b 2])
;=> :b

(some #{1 :b} [:a 1 :b 2])
;=> 1

Using a set as the predicate supplied to some allows you to check whether any of the truthy values in the set are contained within the given sequence. This is a frequently used Clojure idiom for searching for containment within a sequence.

HOW CLOJURE POPULATES SETS
The key to understanding how Clojure sets determine which elements are discrete lies in one simple statement. Given two elements evaluating as equal, a set will contain only one, independent of concrete types:

#{[] ()}
;=> #{[]}

#{[1 2] (1 2)}
;=> #{[1 2]}

#{[] () #{} {}}
;=> #{#{} {} []}
From the first two examples, even though [] and () are of differing types, they're considered equal because their contents are equal (in this case, both are empty). The last example nicely illustrates that collections within an equality partition will be equal if their elements are equal, but collections in different partitions are never equal.
5.5.2 Keeping your sets in order with sorted-set
There’s not much to say about creating sorted sets with the sorted-set function. But
there’s a simple rule that you should bear in mind:
(sorted-set :b :c :a)
;=> #{:a :b :c}
(sorted-set [3 4] [1 2])
;=> #{[1 2] [3 4]}
(sorted-set :b 2 :c :a 3 1)
; java.lang.ClassCastException: clojure.lang.Keyword cannot be cast to java.lang.Number
As long as the arguments to the sorted-set function are mutually comparable, you’ll
receive a sorted set; otherwise an exception is thrown. This can manifest itself when dealing with sorted sets downstream from their point of creation, leading to potential confusion:
(def my-set (sorted-set :a :b))
;; ... some time later
(conj my-set "a")
;=> java.lang.ClassCastException: clojure.lang.Keyword cannot be cast to java.lang.String
The difficulty in finding the reason for this exception will increase as the distance
between the creation of my-set and the call to conj increases. You can adjust this rule
a bit by using sorted-set-by instead, and providing your own comparator. This works
exactly like the comparator for sorted-map-by, which we’ll cover in section 6.6.2.
Sorted maps and sorted sets are also similar in their support of subseq to allow effi-
ciently jumping to a particular key in the collection, and walking through it from
there. This is covered in section 5.6.
5.5.3 contains?
As we touched on in section 5.2, there’s sometimes confusion regarding the usage of
Clojure’s contains? function. Many newcomers to Clojure expect this function to
work the same as Java’s java.util.Collection#contains method; this assumption is
false, as shown:
(contains? #{1 2 4 3} 4)
;=> true
(contains? [1 2 4 3] 4)
;=> false
If you were to draw a false analogy between Java’s .contains methods and contains?,
then both of the function calls noted here should’ve returned true. The official
documentation for contains? describes it as a function that returns true if a given key
exists within a collection. When reading the word key, the notion of a map springs to
mind, but the fact that this function also works on sets hints at their implementation
details. Sets are implemented as maps with the same element as the key and value,10
but there’s an additional check for containment before insertion.
10 All implementation caveats apply.
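A short sketch of the key-versus-value distinction. For a vector, the "keys" are its indices, so contains? can even return true for a lookup you didn't intend; to search a vector for a value, the set-and-some idiom from section 5.5.1 is the usual tool:

```clojure
;; For a vector, contains? checks indices, not values:
(contains? [1 2 4 3] 4)   ;=> false (no index 4)
(contains? [1 2 4 3] 3)   ;=> true  (index 3 exists!)

;; For a set, each element is its own key, so contains? checks membership:
(contains? #{1 2 4 3} 4)  ;=> true

;; To search a vector for a value, use a set as a predicate with some:
(some #{4} [1 2 4 3])     ;=> 4
```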
5.5.4 clojure.set
Mathematical sets form the basis of much of modern mathematical thought, and Clo-
jure’s basic set functions in the clojure.set namespace are a clear reflection of the
classical set operations. In this subsection we’ll briefly cover each function and talk
about how, when applicable, they differ from the mathematical model. First, we’ll start
with a simple picture.
Figure 5.4 describes the nature of Clojure’s set functions, each of which will be
shown presently. Note that Clojure’s set functions take an arbitrary number of sets and
apply the operation incrementally.
INTERSECTION
Clojure’s clojure.set/intersection function works as you might expect. Given two
sets, intersection returns a set of the common elements. Given n sets, it’ll incremen-
tally return the intersection of resulting sets and the next set, as seen in the following
code:
(clojure.set/intersection #{:humans :fruit-bats :zombies}
#{:chupacabra :zombies :humans})
;=> #{:zombies :humans}
(clojure.set/intersection #{:pez :gum :dots :skor}
#{:pez :skor :pocky}
#{:pocky :gum :skor})
;=> #{:skor}
In the first example, the resulting set is simply the common elements between the
given sets. The second example is the result of the intersection of the first two sets
then intersected with the final set.
UNION
There’s also likely no surprise when using the clojure.set/union function:
(clojure.set/union #{:humans :fruit-bats :zombies}
#{:chupacabra :zombies :humans})
;=> #{:chupacabra :fruit-bats :zombies :humans}
(clojure.set/union #{:pez :gum :dots :skor}
#{:pez :skor :pocky}
#{:pocky :gum :skor})
;=> #{:pez :pocky :gum :skor :dots}
Figure 5.4 Basic set operations. The three Venn diagrams show a graphical
representation of Clojure’s set functions: intersection, union, and difference.
Given two sets, the resulting set will contain all of the distinct elements from both. In
the first example this means :zombies and :humans only show up once each in the
return value. Note in the second example that more than two sets may be given to
union, but as expected each value given in any of the input sets is included exactly
once in the output set.
DIFFERENCE
The only set function that could potentially cause confusion at first glance is clojure.set/difference, which by name implies some sort of opposition to a union operation. Working under this false impression, you might expect difference to operate thusly:
(clojure.set/difference #{1 2 3 4} #{3 4 5 6})
;=> #{1 2 5 6}
But if you were to evaluate this expression in your REPL, you’d receive a very different
result:
(clojure.set/difference #{1 2 3 4} #{3 4 5 6})
;=> #{1 2}
The reason for this result is that Clojure’s difference function calculates what’s
known as a relative complement (Stewart 1995) between two sets. In other words,
difference can be viewed as a set subtraction function “removing” all elements in a
set A that are also in another set B.
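If you actually want the "union-opposite" you might have expected, that operation is the symmetric difference, and it composes readily from the functions above. Note that sym-diff is our own hypothetical name; clojure.set provides no such function:

```clojure
(require '[clojure.set :as s])

;; symmetric difference: elements in exactly one of the two sets,
;; built from two relative complements joined by union
(defn sym-diff [a b]
  (s/union (s/difference a b) (s/difference b a)))

(sym-diff #{1 2 3 4} #{3 4 5 6})
;=> #{1 2 5 6}
```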
5.6 Thinking in maps
It’s difficult to write a program of any significant size without the need for a map of
some sort. The use of maps is ubiquitous in writing software because frankly it’s diffi-
cult to imagine a more robust data structure. But we as programmers tend to view
maps as a special case structure outside of the normal realm of data objects and
classes. The object-oriented school of thought has relegated the map as a supporting
player in favor of the class. We’re not going to talk about the merits, or lack thereof,
for this relegation here, but in upcoming sections we’ll discuss moving away from
thinking in classes and instead thinking in the sequence abstraction, maps, protocols,
and types. Having said all of that, it need hardly be mentioned that maps should be
used to store named values. In this section we talk about the different types of maps
and the tradeoffs surrounding each.
5.6.1 Hash maps
Arguably, the most ubiquitous11 form of map found in Clojure programs is the hash
map, which provides an unsorted key/value associative structure. In addition to the
literal syntax touched on in chapter 2, hash maps can be created using the hash-map
function, which likewise takes alternating key/value pairs, with or without commas:
11 Although with the pervasiveness of the map literal, the ubiquity may instead fall to the array map.
(hash-map :a 1, :b 2, :c 3, :d 4, :e 5)
;=> {:a 1, :c 3, :b 2, :d 4, :e 5}
Clojure hash maps support heterogeneous keys, meaning that they can be of any type
and each key can be of a differing type, as this code shows:
(let [m {:a 1, 1 :b, [1 2 3] "4 5 6"}]
[(get m :a) (get m [1 2 3])])
;=> [1 "4 5 6"]
As we previously mentioned at the beginning of this chapter, many of Clojure’s com-
posite types can be used as functions, and in the case of maps they’re functions of
their keys. Using maps in this way will act the same as the use of the get function in
the previous code sample, as shown when building a vector of two elements:
(let [m {:a 1, 1 :b, [1 2 3] "4 5 6"}]
[(m :a) (m [1 2 3])])
;=> [1 "4 5 6"]
Providing a map to the seq function will return a sequence of map entries:
(seq {:a 1, :b 2})
;=> ([:a 1] [:b 2])
Of course, this sequence is composed of the key/value pairs contained in vectors, and for all practical purposes it should be treated as such. In fact, a
new hash map can be created idiomatically using this precise structure:
(into {} [[:a 1] [:b 2]])
;=> {:a 1, :b 2}
Even if your embedded pairs aren’t vectors, they can be made to be for building a new
map:
(into {} (map vec '[(:a 1) (:b 2)]))
;=> {:a 1, :b 2}
In fact, your pairs don’t have to be explicitly grouped, because you can use apply to cre-
ate a hash map given that the key/value pairs are laid out in a sequence consecutively:
(apply hash-map [:a 1 :b 2])
;=> {:a 1, :b 2}
You can also use apply in this way with sorted-map and array-map. Another idiomatic
way to build a map is to use zipmap to “zip” together two sequences, the first of which
contains the desired keys and the second their corresponding values:
(zipmap [:a :b] [1 2])
;=> {:b 2, :a 1}
The use of zipmap illustrates nicely the final property of map collections. Hash maps
in Clojure have no order guarantees. If you do require ordering, then you should use
sorted maps, discussed next.
5.6.2 Keeping your keys in order with sorted maps
It’s impossible to rely on a specific ordering of the key/value pairs for a standard Clo-
jure map, because there are no order guarantees at all. Using the sorted-map and
sorted-map-by functions, you can construct maps with order assurances. By default,
the function sorted-map will build a map sorted by the comparison of its keys:
(sorted-map :thx 1138 :r2d 2)
;=> {:r2d 2, :thx 1138}
You may require an alternative key ordering, or perhaps an ordering for keys that isn’t
easily comparable. In these cases you must use sorted-map-by, which takes an addi-
tional comparison function:12
(sorted-map "bac" 2 "abc" 9)
;=> {"abc" 9, "bac" 2}
(sorted-map-by #(compare (subs %1 1) (subs %2 1)) "bac" 2 "abc" 9)
;=> {"bac" 2, "abc" 9}
This means that sorted maps don't generally support heterogeneous keys the way hash maps do, although it depends on the comparison function provided. For example, the preceding one assumes all keys are strings. The default sorted-map comparison
function compare supports maps whose keys are all mutually comparable with each
other. Attempts to use keys that aren’t supported by whichever comparison function
you’re using will generally result in a cast exception:
(sorted-map :a 1, "b" 2)
;=> java.lang.ClassCastException: clojure.lang.Keyword cannot be cast to java.lang.String
One remarkable feature supported by sorted maps (and also sorted sets) is the ability
to jump efficiently to a particular key and walk forward or backward from there
through the collection. This is done with the subseq and rsubseq functions for for-
ward and backward respectively. Even if you don’t know the exact key you want, these
functions can be used to “round up” the next closest key that exists.
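A sketch of both behaviors on a small sorted map (the keys and values here are arbitrary):

```clojure
(def sm (sorted-map 1 :a 3 :c 5 :e 7 :g))

;; jump to a key and walk forward:
(subseq sm >= 3)
;=> ([3 :c] [5 :e] [7 :g])

;; "round up": 4 isn't a key, so the walk starts at the next key that exists
(subseq sm > 4)
;=> ([5 :e] [7 :g])

;; walk backward with rsubseq:
(rsubseq sm < 5)
;=> ([3 :c] [1 :a])
```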
Another way that sorted maps and hash maps differ is in their handling of numeric
keys. A number of a given magnitude can be represented by many different types; for
example 42 can be a long, int, float, and so on. Hash maps would treat each of these
different objects as different, whereas a sorted map would treat them as the same. You
can see the contrast in this example, where the hash map keeps both keys while the
sorted map keeps just one:
(assoc {1 :int} 1.0 :float)
;=> {1.0 :float, 1 :int}
(assoc (sorted-map 1 :int) 1.0 :float)
;=> {1 :float}
12 Note that simple boolean functions like > can be used as comparison functions.
This is because the comparison function used by the sorted map determines not only order, but also equality: if two keys compare as equal, only one will be kept. This applies to comparison functions provided to sorted-map-by as well as to the default comparator shown previously.
Sorted maps will otherwise work just like hash maps and can be used interchange-
ably. You should use sorted maps if you need to specify or guarantee a specific key
ordering. On the other hand, if you need to maintain insertion ordering, then the use
of array maps is required as you’ll see.
5.6.3 Keeping your insertions in order with array maps
If you hope to perform an action under the assumption that a given map is insertion-
ordered, then you’re setting yourself up for disappointment. But you might already
know that Clojure provides a special map that ensures insertion ordering called an
array map:
(seq (hash-map :a 1, :b 2, :c 3))
;=> ([:a 1] [:c 3] [:b 2])
(seq (array-map :a 1, :b 2, :c 3))
;=> ([:a 1] [:b 2] [:c 3])
So when insertion order is important, you should explicitly use an array map. Array maps can be populated quickly by ignoring the form of the key/value pairs and blindly copying them into place. For structures below a certain size, the cost of a map lookup is close to that of a linear search through an equally sized array or list. That's not to say that the map will be slower; rather, the map and linear implementations perform comparably. Sometimes your best choice for a map is not
a map at all, and like most things in life there are tradeoffs. Thankfully, Clojure takes
care of these considerations for you by adjusting the concrete implementations behind
the scenes as the size of the map increases. The precise types in play aren’t important,
because Clojure is careful to document its promises and to leave undefined aspects
subject to change and/or improvement. It’s usually a bad idea to build your programs
around concrete types, and always bad to build around undocumented behaviors. Clo-
jure handles the underlying efficiency considerations so you don’t have to. But be aware that if
ordering is important, you should avoid operations that inadvertently change the
underlying map implementation from an array map.
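You can observe the implementation switch directly, though the exact threshold is an undocumented, version-dependent detail and shouldn't be relied on; this sketch only checks that the concrete type does change:

```clojure
;; A small array map starts out as a PersistentArrayMap:
(class (array-map :a 1, :b 2))
;=> clojure.lang.PersistentArrayMap

;; Keep assoc-ing, and past an (undocumented, version-dependent)
;; threshold the concrete type silently changes -- and with it,
;; any reliance on insertion ordering:
(def grown
  (reduce #(assoc %1 %2 %2) (array-map) (range 32)))

(class grown)  ; no longer a PersistentArrayMap
```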
We’ve covered the basics of Clojure maps in this section, including common usage
and construction techniques. Clojure maps, minus some implementation details,
shouldn’t be surprising to anyone. It’ll take a while to grow accustomed to dealing
with immutable maps, but in time even this nuance will become second nature.
Now that we’ve looked at Clojure’s primary collection types and their differences
in detail, we’ll take some time to work through a simple case study. This case study,
creating a function named pos, will illustrate the thought processes you might con-
sider on your way toward designing an API built on the principles of the sequence
abstraction.
5.7 Putting it all together: finding the position of items in a sequence
We sometimes underestimate the influence of little things.
—Charles W. Chesnutt
The case study for this chapter will be to design and implement a simple function to
locate the positional index13 of an element within a sequence. We’re going to pool
together much of the knowledge that you’ve gained in this chapter in order to illus-
trate the steps you might take in designing, writing, and ultimately optimizing a Clo-
jure collection function. Of course, we’re going to work against the sequence
abstraction and will therefore design the solution accordingly.
The function, named pos, must:
- Work on any composite type, returning indices corresponding to some value
- Return a numerical index for sequential collections, or the associated key for maps and sets
- Otherwise return nil
5.7.1 Implementation
If we were to address each of the requirements for pos literally and directly, we might
come up with a function that looks like the following listing.
Listing 5.2 First cut at our position function
(defn pos [e coll]
  (let [cmp (if (map? coll)
              #(= (second %1) %2)        ; map compare
              #(= %1 %2))]               ; default compare
    (loop [s coll idx 0]                 ; start at beginning
      (when (seq s)
        (if (cmp (first s) e)            ; compare
          (if (map? coll)
            (first (first s))            ; map returns key
            idx)                         ; ...else index
          (recur (next s) (inc idx)))))))
(pos 3 [:a 1 :b 2 :c 3 :d 4])
;=> 5
(pos :foo [:a 1 :b 2 :c 3 :d 4])
;=> nil
(pos 3 {:a 1 :b 2 :c 3 :d 4})
;=> :c
(pos 3 '(:a 1 :b 2 :c 3 :d 4))
;=> 5
(pos \3 ":a 1 :b 2 :c 3 :d 4")
;=> 13
13 Stuart Halloway describes a similar function index-of-any in his book Programming Clojure that views the
problem largely through the lens of reduced complexity. We like his example and this one because it’s simple
yet powerful and nicely illustrative of the way that Clojure functions should be written.
Pretty hideous, right? We think so too. Apart from being overly complicated, it'd likely be more useful if we instead returned a sequence of all the indices matching the item,
so we’ll add that to the requirements. But we’ve built a heavy load with the first cut at
pos and should probably step back a moment to reflect. First of all, it’s probably the
wrong approach to handle map types and other sequence types differently. The use of
the predicate map? to detect the type of the passed collection is incredibly constrain-
ing, in that it forces different collections to be processed differently. That’s not to say
that the use of type-based predicates is strictly prohibited, only that you should try to
favor more generic algorithms or at least to minimize their usage.
As it happens, the exact nature of the problem demands that we view collections as a set of values paired with a given index, be it explicit in the case of maps or implicit in the positional information of other sequences. Therefore, imagine how easy this problem would be if all collections were laid out as a sequence of pairs ([index1 value1] [index2 value2] ... [indexN valueN]). Well, there's no reason why they couldn't be, as shown next.
Listing 5.3 An index function

(defn index [coll]
  (cond
    (map? coll) (seq coll)
    (set? coll) (map vector coll coll)
    :else       (map vector (iterate inc 0) coll)))
This simple function14 can generate a uniform representation for indexed collections:
(index [:a 1 :b 2 :c 3 :d 4])
;=> ([0 :a] [1 1] [2 :b] [3 2] [4 :c] [5 3] [6 :d] [7 4])
(index {:a 1 :b 2 :c 3 :d 4})
;=> ([:a 1] [:b 2] [:c 3] [:d 4])
(index #{:a 1 :b 2 :c 3 :d 4})
;=> ([1 1] [2 2] [3 3] [4 4] [:a :a] [:c :c] [:b :b] [:d :d])
As shown, we’re still using type-based predicates, but we’ve raised the level of abstrac-
tion to the equality partitions in order to build contextually relevant indices. Now, the
function for finding the positional indices for the desired value is trivial:
(defn pos [e coll]
  (for [[i v] (index coll) :when (= e v)] i))
(pos 3 [:a 1 :b 2 :c 3 :d 4])
;=> (5)
(pos 3 {:a 1, :b 2, :c 3, :d 4})
;=> (:c)
(pos 3 [:a 3 :b 3 :c 3 :d 4])
;=> (1 3 5)
(pos 3 {:a 3, :b 3, :c 3, :d 4})
;=> (:a :c :b)
14 Clojure has a core function keep-indexed that works similarly but doesn’t implicitly build indices along
equality partitions. For a vector, you could build the index as (keep-indexed #(-> [% %2]) [:a :b :c :d]).
Much better! But there’s one more deficiency with the pos function from a Clojure
perspective. Typically in Clojure it’s more useful to pass a predicate function in cases
such as these, so that instead of pos determining raw equality, it can build its result
along any dimension, as shown:
(pos #{3 4} {:a 1 :b 2 :c 3 :d 4})
;=> (:c :d)
(pos even? [2 3 6 7])
;=> (0 2)
We can modify pos only slightly to achieve the ideal level of flexibility, as shown next.
Listing 5.4 Our final version of pos
(defn pos [pred coll]
  (for [[i v] (index coll) :when (pred v)] i))
We’ve vastly simplified the original solution and generated two potentially useful func-
tions (Martin 2002) in the process. By following some simple Clojure principles, we
were able to solve the original problem statement in a concise and elegant manner.
5.8 Summary
Clojure favors simplicity in the face of growing software complexity. If problems are
easily solved by collection abstractions then those abstractions should be used. Most
problems can be modeled on such simple types, yet we continue to build monolithic
class hierarchies in a fruitless race toward mirroring the “real world”—whatever that
means. Perhaps it’s time to realize that we no longer need to layer self-imposed com-
plexities on top of software solutions that are already inherently complex. Not only
does Clojure provide the sequential, set, and map types useful for pulling ourselves
from the doldrums of software complexity, but it’s also optimized for dealing with
them.
Now that we’ve discussed each of these types in detail, we’re going to take a step
back and talk about three important properties of Clojure’s collection types that until
now we’ve only touched on lightly: immutability, persistence, and laziness.
Part 3
Functional programming
In this part of the book, we’ll expose some of the underpinnings of Clojure’s
approach to functional programming, as well as some practical uses of it. Clo-
jure provides mechanisms for immutability, deferred execution, closures, and
recursion. We’ll show examples of how these can work together to let you create
data structures of your own, and find routes through a weighted graph.
Being lazy and set in your ways
This chapter covers
- Immutability
- Designing a persistent toy
- Laziness
- Putting it all together: a lazy quicksort
We’ve now reached the apex of imperative knowledge and stand at the precipice
leading toward functional programming. We mentioned in section 2.3 that the def-
initions of functional programming are widely disparate, and unfortunately this
book won’t work to unify them. Instead, we’ll start in this chapter to build a basis
for Clojure’s style of functional programming by digging into its core supporting
maxims. In addition, this chapter covers in greater depth the parts of Clojure’s
composite types that we only touched on.
6.1 On immutability
We’ve touched on immutability throughout this book, but we’ve avoided discussing
why Clojure has chosen it as a cornerstone principle. Though no panacea, fostering
immutability at the language level solves many difficult problems right out of the box,
while simplifying many others. Coming from a language background where mutability
interwoven with imperative programming methods reigns, it often requires a
significant conceptual leap to twist your mind to accept and utilize immutability and
functional programming. In this section, we’ll build a conceptual basis for immutability as
it relates to Clojure’s underlying philosophy as well as why you should work to foster
immutability even when outside the warming confines of Clojure proper.
6.1.1 Defining immutability
In many cases, when talking specifically about Clojure’s immutable data structures, we
could be talking about the broader category of immutable objects without loss of
meaning. But we should probably set down some conditions defining just what’s
meant by immutability.
EVERY DAY IS LIKE SUNDAY
An entire theological doctrine, predestination, is devoted to exploring the
notion that there’s no such thing as free will, but instead, everything that we are or
ever will be is determined beforehand. Though this possibility for our own lives may
seem bleak, the notion does nicely encapsulate the first principle of immutability: all
of the possible properties of immutable objects are defined at the time of their con-
struction and can’t be changed thereafter.
IMMUTABILITY THROUGH CONVENTION
Computer systems are in many ways open systems, providing the keys to the vault if
you’re so inclined to grab them. But in order to foster an air of immutability in your
own systems, it’s important to create a facade of immutability. Creating immutable
classes in Java requires a few steps (Goetz 2006). First, a class itself and all of its fields
should be labeled as final. Next, in no way should an object’s this reference escape
during construction. And finally, any internal mutable objects should originate, either
whole-cloth or through a copy, within the class itself and thus never escape. Obviously
we’re simplifying, because there are finer details to this recipe for Java immutability,
but for now these simplified highlights serve to show that by observing convention,
even an inherently mutable language such as Java can be made immutable. Clojure
directly supports immutability as a language feature¹ with its core data structures.
By providing immutable data structures as a primary language feature, Clojure sepa-
rates (Braithwaite 2007) the complexity of working with immutable structures from
the complexities of their implementation. By providing immutability either as a core
language feature or through convention, you can reap enormous benefit.
¹ We’re intentionally glossing over Clojure’s features that support mutability, such as reference types and transients, in order to keep this section focused.
6.1.2 Being set in your ways—immutability
Clojure’s immutable data structures aren’t bolted onto the language as an after-
thought or as a choice in an a-la-carte menu. Instead, their inclusion in the language
runs deep to its philosophical core.
INVARIANTS
Invariant-based programming involves the definition of constraints on classes and
functions in order to provide assurances that if instances enter into certain states,
assertion errors will arise. Providing invariants within a mutable system requires a fair
amount of assertion weaving within the methods of any given class. But by observing a
practice of immutability, invariants are defined solely within the construction mecha-
nism and can never be violated thereafter.
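In Clojure, this construction-time style of invariant is often expressed with a :pre condition map on a constructor function. The following is a small illustrative sketch; the make-point name and shape are our own, not from the text:

```clojure
(defn make-point [x y]
  {:pre [(number? x) (number? y)]}   ; invariants checked only at construction
  {:x x, :y y})

(make-point 1 2)
;=> {:x 1, :y 2}

;; (make-point 1 :oops) throws an AssertionError. Because the resulting
;; map is immutable, the invariant can never be violated afterward.
```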
REASONING
Because the life of an immutable object is one of predestiny, the matter of reasoning
about its possible states is simplified. It follows that the act of testing such a system is
simplified, in that the set of possible states and transitions is constrained.
EQUALITY HAS MEANING
Equality in the presence of mutability has no meaning. Equality in the face of mutabil-
ity and concurrency is utter lunacy. If any two objects resolve as being equal now, then
there’s no guarantee that they will a moment from now. And if two objects aren’t
equal forever, then they’re technically never equal (Baker 1993). Providing immuta-
ble objects once again assigns meaning to equality, in that if two objects are equal now,
then they’ll always be so.
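The permanence of equality is easy to demonstrate at the REPL (a trivial sketch with arbitrary values):

```clojure
(def a [1 2 3])
(def b (conj a 4))   ; a "change" produces a new vector

a
;=> [1 2 3] — a is untouched

(= a [1 2 3])
;=> true, and because a can never change, it will remain true forever
```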
SHARING IS CHEAP
If you’re certain that an object will never change, then sharing said object becomes a
simple matter of providing a reference to it. In Java, to do so often requires a lot of
defensive copying. Along this vein, because we can freely share references for immuta-
ble objects, we can likewise intern them for free.
FLATTENING THE LEVELS OF INDIRECTION
There’s a marked difference between a mutable object and a mutable reference. The
default in Java is that there are references that might point to mutable data. But in
Clojure, there are only mutable references. This may seem like a minor detail, but it
certainly works to reduce unnecessary complexities.
IMMUTABILITY FOSTERS CONCURRENT PROGRAMMING
Immutable objects are always thread safe.
—Brian Goetz,
Java Concurrency in Practice
If an object can’t change, it can be shared freely between different threads of execu-
tion without fear of concurrent modification errors. There can be little debate about
this particular point, but that fact doesn’t answer the question of how mutation
occurs. Without delving into the specifics, you likely already know that Clojure
isolates mutation to its reference types while the data wrapped with them is left
unchanged. We’ll leave this alone for now, because we’ll devote chapter 11 to this
and related topics.
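As a small sketch of that thread-safety point (the shape of the example is ours, not from the text), an immutable vector can be read by several threads at once with no locking at all:

```clojure
(def shared-data [1 2 3 4])   ; immutable, so freely shareable

;; Four futures read the same structure concurrently; no thread can
;; ever observe shared-data in a partially updated state.
(def results
  (map deref (doall (repeatedly 4 #(future (reduce + shared-data))))))

results
;=> (10 10 10 10)
```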
6.2 Designing a persistent toy
We won’t go into terrible detail about the internals of Clojure’s persistent data struc-
tures—we’ll leave that to others (Krukow 2009). But we do want to explore the notion
of structural sharing. Our example will be highly simplified compared to Clojure’s
implementations, but it should help clarify some of the techniques used.
The simplest shared-structure type is the list. Two different items can be added to
the front of the same list, producing two new lists that share their next parts. We’ll try
this out by creating a base list and then two new lists from that same base:
(def baselist (list :barnabas :adam))
(def lst1 (cons :willie baselist))
(def lst2 (cons :phoenix baselist))
lst1
;=> (:willie :barnabas :adam)
lst2
;=> (:phoenix :barnabas :adam)
You can think of baselist as a historical version of both lst1 and lst2. But it’s also
the shared part of both lists. More than being equal, the next parts of both lists are
identical —the same instance:
(= (next lst1) (next lst2))
;=> true
(identical? (next lst1) (next lst2))
;=> true
So that’s not too complicated, right? But the features supported by lists are also lim-
ited. Clojure’s vectors and maps also provide structural sharing, while allowing you to
change values anywhere in the collection, not just on one end. The key is the struc-
ture each of these datatypes uses internally. We’ll now build a simple tree to help dem-
onstrate how a tree can allow interior changes and maintain shared structure at the
same time.
Each node of our tree will have three fields: a value, a left branch, and a right
branch. We’ll put them in a map, like this:
{:val 5, :L nil, :R nil}
That’s the simplest possible tree—a single node holding the value 5, with empty left
and right branches. This is exactly the kind of tree we want to return when a single
item is added to an empty tree. To represent an empty tree, we’ll use nil. With the
structure decision made, we can write our own conj function xconj to build up our
tree, starting with just the code for this initial case:
(defn xconj [t v]
  (cond
    (nil? t) {:val v, :L nil, :R nil}))
(xconj nil 5)
;=> {:val 5, :L nil, :R nil}
Hey, it works! Not too impressive yet though, so we need to handle the case where an
item is being added to a nonempty tree. We keep our tree in order by putting values
less than a node’s :val in the left branch, and other values in the right branch. That
means we need a test like this:
(< v (:val t))
When that’s true, we need the new value v to go into the left branch, (:L t). If this
were a mutable tree, we’d change the value of :L to be the new node. Instead, we
should build a new node, copying in the parts of the old node that don’t need to
change. Something like this:
{:val (:val t),
:L (insert-new-val-here),
:R (:R t)}
This will be the new root node. Now we just need to figure out what to put for insert-
new-val-here. If the old value of :L is nil, we simply need a new single-node tree—
we even have code for that already, so we could use (xconj nil v). But what if :L isn’t
nil? In that case, we want to insert v in its proper place within whatever tree :L is
pointing to—so (:L t) instead of nil:
(defn xconj [t v]
  (cond
    (nil? t)       {:val v, :L nil, :R nil}
    (< v (:val t)) {:val (:val t),
                    :L (xconj (:L t) v),
                    :R (:R t)}))
(def tree1 (xconj nil 5))
tree1
;=> {:val 5, :L nil, :R nil}
(def tree1 (xconj tree1 3))
tree1
;=> {:val 5, :L {:val 3, :L nil, :R nil}, :R nil}
(def tree1 (xconj tree1 2))
tree1
;=> {:val 5, :L {:val 3, :L {:val 2, :L nil, :R nil}, :R nil}, :R nil}
There, it’s working. At least it seems to be—there’s a lot of noise in that output, mak-
ing it difficult to read. Here’s a function to traverse the tree in sorted order, convert-
ing it to a seq that will print more succinctly:
(defn xseq [t]
  (when t
    (concat (xseq (:L t)) [(:val t)] (xseq (:R t)))))
(xseq tree1)
;=> (2 3 5)
Now we just need a final condition for handling the insertion of values that are not less
than the node value:
(defn xconj [t v]
  (cond
    (nil? t)       {:val v, :L nil, :R nil}
    (< v (:val t)) {:val (:val t),
                    :L (xconj (:L t) v),
                    :R (:R t)}
    :else          {:val (:val t),
                    :L (:L t),
                    :R (xconj (:R t) v)}))
Now that we have the thing built, we hope you understand well enough how it’s put
together that this demonstration of the shared structure will be unsurprising:
(def tree2 (xconj tree1 7))
(xseq tree2)
;=> (2 3 5 7)
(identical? (:L tree1) (:L tree2))
;=> true
Both tree1 and tree2 share a common structure, which is more easily visualized in
figure 6.1.
This example demonstrates several features that it has in common with all of Clojure’s persistent collections:
- Every “change” creates at least a new root node, plus new nodes as needed in the path through the tree to where the new value is being inserted.
- Values and unchanged branches are never copied, but references to them are copied from nodes in the old tree to nodes in the new one.
- This implementation is completely thread-safe in a way that’s easy to check—no object that existed before a call to xconj is changed in any way, and newly created nodes are in their final state before being returned. There’s no way for any other thread, or even any other functions in the same thread, to see anything in an inconsistent state.
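These sharing properties are easy to check directly. Repeating the finished xconj and xseq from this section so the snippet stands alone:

```clojure
(defn xconj [t v]
  (cond
    (nil? t)       {:val v, :L nil, :R nil}
    (< v (:val t)) {:val (:val t), :L (xconj (:L t) v), :R (:R t)}
    :else          {:val (:val t), :L (:L t), :R (xconj (:R t) v)}))

(defn xseq [t]
  (when t
    (concat (xseq (:L t)) [(:val t)] (xseq (:R t)))))

(def tree1 (reduce xconj nil [5 3 2]))
(def tree2 (xconj tree1 7))

(xseq tree1)                         ;=> (2 3 5) — the old version persists
(xseq tree2)                         ;=> (2 3 5 7)
(identical? (:L tree1) (:L tree2))   ;=> true — the left branch is shared
```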
Figure 6.1 Shared structure tree: no matter how big the left side of a tree’s root node is, something can be inserted on the right side without copying, changing, or even examining the left side. All those values will be included in the new tree, along with the inserted value.
Our example fails, though, when compared to Clojure’s production-quality code:
- It’s just a binary tree.²
- It can only store numbers.
- It’ll overflow the stack if the tree gets too deep.
- It produces (via xseq) a non-lazy seq that will contain a whole copy of the tree.
- It can create unbalanced trees that’ll have bad “worst case” algorithmic complexity.³
Though structural sharing as described using xconj as a basis example can reduce the
memory footprint of persistent data structures, it alone is insufficient. Instead, Clojure
leans heavily on the notion of lazy sequences to further reduce its memory footprint,
as we’ll explore further in the next section.
6.3 Laziness
Through all the windows I only see infinity.
— House of Leaves
by Mark Z. Danielewski
Clojure is partially a lazy language. This isn’t to say that Clojure vectors lie around the
house every day after school playing video games and refusing to get a job. Instead,
Clojure is lazy in the way it handles its sequence types—but what does that mean?
First, we’ll start by defining what it means for a language to be eager, or in other words,
not lazy. Many programming languages are eager in that arguments to functions are
immediately evaluated when passed, and Clojure in most cases follows this pattern as
well. Observe the following:
(- 13 (+ 2 2))
;=> 9
The expression (+ 2 2) is eagerly evaluated, in that its result 4 is passed on to the sub-
traction function during the actual call, and not at the point of need. But a lazy pro-
gramming language such as Haskell (Hudak 2000) will evaluate a function argument
only if that argument is needed in an overarching computation.
In this section we’ll discuss how laziness can be used to avoid nontermination,
unnecessary calculations, and even combinatorially exploding computations. We’ll
also discuss the matter of utilizing infinite sequences, a surprisingly powerful tech-
nique. Finally, we’ll use Clojure’s delay and force to build a simple lazy structure. First,
we’ll start with a simple example of laziness that you may be familiar with from Java.
6.3.1 Familiar laziness with logical-and
Laziness isn’t limited to the case of the evaluation of function arguments; a common
example can be found even in eager programming languages. Take the case of Java’s
² Clojure’s sorted collections are binary trees, but its hash maps, hash sets, and vectors all have up to 32 branches per node. This results in dramatically shallower trees, and therefore faster lookups and updates.
³ Clojure’s sorted map and sorted set do use a binary tree internally, but they implement red-black trees to keep the left and right sides nicely balanced.
logical-and operator &&. Java implementations optimize this particular operator to
avoid performing unnecessary operations should an early subexpression evaluate to
false. This lazy evaluation in Java allows the following idiom:
if (obj != null && obj.isWhatiz()) {
...
}
For those of you unfamiliar with Java, the preceding code says: “if the object obj isn’t
null, then call the method isWhatiz.” Without a short-circuiting (or lazy, if you will)
&& operator, the preceding operation would always throw a java.lang.NullPointerException
whenever obj was set to null. Though this simple example doesn’t qualify
Java as a lazy language, it does illustrate the first advantage of lazy evaluation— laziness
allows the avoidance of errors in the evaluation of compound structures.
Clojure’s and operator also works this way, as do a number of other operators, but
we won’t discuss this type of short-circuiting laziness too deeply. Listing 6.1 illustrates
what we mean using the case of a series of nested if expressions.
Listing 6.1 Short-circuiting if expression
(defn if-chain [x y z]
  (if x
    (if y
      (if z
        (do
          (println "Made it!")
          :all-truthy)))))
(if-chain () 42 true)
; Made it!
;=> :all-truthy
(if-chain true true false)
;=> nil
The call to println is evaluated only in the case of three truthy arguments. But we can
perform the equivalent action given only the and macro:
(defn and-chain [x y z]
  (and x y z (do (println "Made it!") :all-truthy)))
(and-chain () 42 true)
; Made it!
;=> :all-truthy
(and-chain true false true)
;=> false
You may see tricks like this from time to time, but they’re not widespread in idiomatic
Clojure code. Regardless, we’ve presented them as a launching point for the rest of
the discussion in the section. We’ll now proceed to discussing how your own Clojure
programs can be made more generally lazy by following an important recipe.
6.3.2 Understanding the lazy-seq recipe
Here’s a seemingly simple function steps that takes a sequence and makes a deeply
nested structure from it:
(steps [1 2 3 4])
;=> [1 [2 [3 [4 []]]]]
Seems simple enough, no? Your first instinct might be to tackle this problem recur-
sively, as suggested by the form of the desired result:
(defn rec-step [[x & xs]]
  (if x
    [x (rec-step xs)]
    []))
(rec-step [1 2 3 4])
;=> [1 [2 [3 [4 []]]]]
Things look beautiful at this point; we’ve created a simple solution to a simple prob-
lem. But therein bugbears lurk. What would happen if we ran this same function on a
large set?
(rec-step (range 200000))
;=> java.lang.StackOverflowError
Observing the example, running the same function over a sequence of 200,000 elements⁴
causes a stack overflow. How can we fix this problem? Perhaps it’s fine to say
that you’ll never encounter such a large input set in your own programs; such
tradeoffs are made all of the time. But Clojure provides lazy sequences to help tackle
such problems without significantly complicating your source code. Additionally, idi-
omatic Clojure code will always strive to deal with, and produce, lazy sequences.
Stepping back a bit, we should examine the lazy-seq recipe for applying laziness to
your own functions:
1. Use the lazy-seq macro at the outermost level of your lazy sequence-producing expression(s).
2. If you happen to be consuming another sequence during your operations, then use rest instead of next.
3. Prefer higher-order functions when processing sequences.
4. Don’t hold onto your head.
These rules of thumb are simple, but they take some practice to utilize to their fullest.
For example, #4 is especially subtle in that the trivial case is easy to conceptualize, but
it’s more complex to implement in large cases. For now we’ll gloss over #3, because
we’ll talk about that approach separately in section 7.1.
So how can you leverage these rules of thumb to ensure laziness?
⁴ On our machines, 200,000 elements is enough to cause a stack overflow, but your machine may require more or fewer depending on your JVM configuration.
rest versus next
The difference between rest and next can be seen in the following example:
(def very-lazy (-> (iterate #(do (print \.) (inc %)) 1)
                   rest rest rest))
;=> ..#'user/very-lazy

(def less-lazy (-> (iterate #(do (print \.) (inc %)) 1)
                   next next next))
;=> ...#'user/less-lazy
When building a lazy seq from another, rest doesn’t realize any more elements than
it needs to; next does. In order to determine whether a seq is empty, next needs to
check whether there’s at least one thing in it, thus potentially causing one extra real-
ization. Here’s an example:
(println (first very-lazy)) ; .4
(println (first less-lazy)) ; 4
Grabbing the first element in a lazy seq built with rest causes a realization as
expected. But the same doesn’t happen for a seq built with next because it’s already
been previously realized. Using next causes a lazy seq to be one element less lazy,
which might not be desired if the cost of realization is expensive. In general, we rec-
ommend that you use next unless you’re specifically trying to write code to be as lazy
as possible.
UTILIZING LAZY-SEQ AND REST
In order to be a proper lazy citizen, you should produce lazy sequences using the
lazy-seq macro:
(defn lz-rec-step [s]
  (lazy-seq
    (if (seq s)
      [(first s) (lz-rec-step (rest s))]
      [])))
(lz-rec-step [1 2 3 4])
;=> (1 (2 (3 (4 ()))))
(class (lz-rec-step [1 2 3 4]))
;=> clojure.lang.LazySeq
(dorun (lz-rec-step (range 200000)))
;=> nil
There are a few points of note for our new implementation. First, we’ve eliminated
destructuring on the function arguments because the & arguments within are implic-
itly destructured via the nthnext function. As we mentioned in our rules of thumb,
when consuming a sequence within the body of a lazy-seq you’ll want to use rest,
which we did in lz-rec-step. Second, we’re no longer producing nested vectors as the
output of the function, but instead a lazy sequence LazySeq, which is the by-product of
the lazy-seq macro.
With only minor adjustments, we’ve created a lazy version of the step function
while also maintaining simplicity. The first two rules of the lazy sequence recipe can
be used in all cases when producing lazy sequences. You’ll see this pattern over and
over in idiomatic Clojure code.
If what’s going on here still doesn’t quite make sense to you, consider this even
simpler example:
(defn simple-range [i limit]
  (lazy-seq
    (when (< i limit)
      (cons i (simple-range (inc i) limit)))))
This behaves similarly to Clojure’s built-in function range, but it’s simpler in that it
doesn’t accept a step argument and has no support for producing chunked seqs:⁵

(simple-range 0 9)
;=> (0 1 2 3 4 5 6 7 8)

Note that it follows all the lazy-seq recipe rules you’ve seen so far. Figure 6.2 is a
representation of what’s in memory when the REPL has printed the first two items in a
simple-range seq but hasn’t yet printed any more than that.

Figure 6.2 Each step of a lazy seq may be in one of two states. If the step is unrealized, it’ll contain a function or closure of no arguments (a thunk) that can be called later to realize the step. When this happens, the thunk’s return value is cached instead, and the thunk itself is released as pictured in the first two lazy seq boxes, transitioning the step to the realized state. Note that although not shown here, a realized lazy seq may simply contain nothing at all, called nil, indicating the end of the seq.

One way in which complications may arise is by accidentally holding onto the head
of a lazy sequence. This is addressed by the third rule of lazy sequences.
6.3.3 Losing your head
The primary advantage of laziness in Clojure is that it prevents the full realization of
interim results during a calculation. If you manage to hold onto the head of a
sequence somewhere within a function, then that sequence will be prevented from
being garbage collected. The simplest way to retain the head of a sequence is to bind
it to a local. This condition can occur with any type of value bind, be it to a reference
type or through the usage of let or binding:
(let [r (range 1e9)] [(first r) (last r)])
;=> [0 999999999]
(let [r (range 1e9)] [(last r) (first r)])
; java.lang.OutOfMemoryError: GC overhead limit exceeded
Clojure’s compiler can deduce that in the first example, the retention of r is no longer
needed when the computation of (last r) occurs, and therefore aggressively clears it.
⁵ Chunked seqs are a technique for improving performance that we cover in chapter 12.
But in the second example, the head is needed later in the overall computation and
can no longer be safely cleared. Of course, the compiler could perform some rearrang-
ing with the order of operations for this case, but it doesn’t because in order to do so
safely it would have to guarantee that all of the composite functions were pure. It’s
okay if you’re not clear on what a pure function is right now—we’ll cover it in section
7.1. In a nutshell, take to heart that Clojure can’t rearrange operations, because there’s
no way to guarantee that order is unimportant. This is one area where a purely func-
tional lazy language such as Haskell (Thompson 1999) really shines by comparison.
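One common way to honor rule 4 is to avoid naming the sequence at all: pass a function that produces the seq and call it once per traversal, so no local ever pins the head. This sketch is our own (the ends name is hypothetical, not from the text) and trades recomputation for memory:

```clojure
;; Instead of binding the seq to a local (which retains its head),
;; take a thunk and realize a fresh seq for each pass over the data.
(defn ends [make-seq]
  [(first (make-seq)) (last (make-seq))])

(ends #(range 1e6))
;=> [0 999999]
```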
6.3.4 Employing infinite sequences
Because Clojure’s sequences are lazy, they have the potential to be infinitely long. Clo-
jure provides a number of functions for generating and working with infinite
sequences:
; Run at your own risk
(iterate (fn [n] (/ n 2)) 1)
;=> (1 1/2 1/4 1/8 ...)
It sure is a nice trick (although you might not think so had you chosen to ignore our
warning), but what could you possibly use infinite sequences for? Working with infi-
nite sequences often fosters more declarative solutions. Take a simple example as a
start. Imagine that we have a function that calculates a triangle number for a given
integer:
(defn triangle [n]
  (/ (* n (+ n 1)) 2))
(triangle 10)
;=> 55
The function triangle can then be used to build a sequence of the first 10 triangle
numbers:
(map triangle (range 1 11))
;=> (1 3 6 10 15 21 28 36 45 55)
There’s nothing wrong with the preceding solution, but it suffers from a lack of flexi-
bility in that it does what it does and that’s all. By defining a sequence of all of the tri-
angle numbers as in the following listing, you can perform more interesting “queries”
in order to retrieve the desired elements.
Listing 6.2 Infinite sequences foster declarative solutions

(def tri-nums (map triangle (iterate inc 1)))

(take 10 tri-nums)                                   ; get the first 10
;=> (1 3 6 10 15 21 28 36 45 55)

(take 10 (filter even? tri-nums))                    ; get the first 10 even
;=> (6 10 28 36 66 78 120 136 190 210)

(nth tri-nums 99)                                    ; what Gauss found
;=> 5050

(double (reduce + (take 1000 (map / tri-nums))))     ; converge on 2
;=> 1.998001998001998

(take 2 (drop-while #(< % 10000) tri-nums))          ; first 2 greater than 10,000
;=> (10011 10153)
The queries used three ubiquitous Clojure functions: map, reduce, and filter. The
map function applies a function to each element in a sequence and returns the result-
ing sequence. The reduce function applies a function to each value in the sequence
and the running result to accumulate a final value. Finally, the filter function applies
a function to each element in a sequence and returns a new sequence of those ele-
ments where said function returned a truthy value. All three of these functions retain
the laziness of a given sequence.
Defining the infinite sequence of triangle numbers allows you to take elements
from it as needed, only calculating those particular items.
6.3.5 The delay and force macros
Although Clojure sequences are largely lazy, Clojure itself isn’t. In most cases, expres-
sions in Clojure are evaluated once prior to their being passed into a function rather
than at the time of need. But Clojure does provide mechanisms for implementing
what are known as call-by-need semantics. The most obvious of these mechanisms is its
macro facilities, but we’ll defer that discussion until chapter 8. The other mechanism
for providing what we’ll call explicit laziness are Clojure’s delay and force. In short,
the delay macro is used to defer the evaluation of an expression until explicitly forced
using the force function. Using these laziness primitives, we can wrap an expression
in a call to delay and use it only if necessary on the callee’s side:
(defn defer-expensive [cheap expensive]
  (if-let [good-enough (force cheap)]
    good-enough
    (force expensive)))

(defer-expensive (delay :cheap)
                 (delay (do (Thread/sleep 5000) :expensive)))
;=> :cheap

(defer-expensive (delay false)
                 (delay (do (Thread/sleep 5000) :expensive)))
;=> :expensive
You can simulate this behavior with the use of anonymous functions, where delay is
replaced by (fn [] expr) and force by (delayed-fn), but using delay/force allows
you to explicitly check for delayed computations using delay?. Additionally, delay
caches its calculation, therefore allowing its wrapped expression to be calculated only
once. Of course, you could simulate the same behavior using memoization,⁶ but why
would you in this case when delay and force solve the problem more succinctly?

⁶ We’ll cover memoization in section 12.4.
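The anonymous-function simulation mentioned above might look like the following sketch (defer-expensive* is our own name); note that unlike delay, a bare function neither caches its result nor answers delay?:

```clojure
(defn defer-expensive* [cheap expensive]
  (if-let [good-enough (cheap)]   ; "forcing" is just calling the thunk
    good-enough
    (expensive)))

(defer-expensive* (fn [] :cheap)
                  (fn [] (do (Thread/sleep 50) :expensive)))
;=> :cheap

(defer-expensive* (fn [] false)
                  (fn [] (do (Thread/sleep 50) :expensive)))
;=> :expensive
```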
if-let and when-let
The if-let and when-let macros are useful when you’d like to bind the result of
an expression only if it returns a truthy value. This helps to avoid the need to
nest if/when and let as shown:
(if :truthy-thing
  (let [res :truthy-thing] (println res)))
; :truthy-thing

(if-let [res :truthy-thing] (println res))
; :truthy-thing
The latter is much more succinct.
There are more complicated usage patterns for delay and force besides the simple
scheme outlined previously. For example, we can implement a version of the lazy
sequence of triangular numbers from a few sections prior using delay and force:
(defn inf-triangles [n]
  {:head (triangle n)
   :tail (delay (inf-triangles (inc n)))})

(defn head [l] (:head l))
(defn tail [l] (force (:tail l)))
The function inf-triangles creates a lazy linked-list of nodes. Each node is a map
containing a value mapped to :head and a link to the remainder of the list keyed as
:tail. The head of the list is the result of applying the function triangle to the incre-
menting counter passed recursively within the body of delay. As you can imagine, the
head of a node is always calculated as we walk down the linked-list, even if it’s never
accessed. This type of lazy structure is known as head strict but differs from Clojure's
lazy-seq, which delays both the head and tail and then realizes them at the same time.
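To make that contrast concrete, here's a sketch of the same triangular numbers built with lazy-seq instead, so that even each head is deferred until the sequence is realized. The closed-form triangle shown here stands in for the one defined a few sections prior:

```clojure
(defn triangle [n]
  (/ (* n (inc n)) 2))   ; nth triangular number

;; Fully lazy: neither head nor tail is computed until the seq is realized.
(defn lazy-triangles [n]
  (lazy-seq (cons (triangle n) (lazy-triangles (inc n)))))

(take 5 (lazy-triangles 1))
;=> (1 3 6 10 15)
```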
We can now create a structure similar to the original tri-nums and start getting at
its contained elements:
(def tri-nums (inf-triangles 1))
(head tri-nums)
;=> 1
(head (tail tri-nums))
;=> 3
(head (tail (tail tri-nums)))
;=> 6
One thing to note about the preceding code is that accessing the values 3 and 6
involved deferred calculations that occurred only on demand. The structure of the
example is shown in figure 6.3.
Figure 6.3 Lazy linked-list example, here tri-nums = (inf-triangles 1), whose nodes
have heads 1, 3, 6, ... and delayed tails. Each node of this linked list contains a
value (the head) and a delay (the tail). The creation of the next part is forced by a
call to tail—it doesn't exist until then.
Though we can navigate the entire chain of triangular numbers using only head and
tail, it’s probably a better idea7 to use them as primitives for more complicated
functions:
(defn taker [n l]
  (loop [t n, src l, ret []]
    (if (zero? t)
      ret
      (recur (dec t) (tail src) (conj ret (head src))))))

(defn nthr [l n]
  (if (zero? n)
    (head l)
    (recur (tail l) (dec n))))
(taker 10 tri-nums)
;=> [1 3 6 10 15 21 28 36 45 55]
(nthr tri-nums 99)
;=> 5050
Of course, writing programs using delay and force is an onerous way to go about the
problem of laziness, and you’d be better served by using Clojure’s lazy sequences to
full effect rather than building your own from these basic blocks. But the preceding
code, in addition to being simple to understand, harkens back to chapter 5 and the
entire sequence “protocol” being built entirely on the functions first and rest.
Pretty cool, right?
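In fact, a small adapter is enough to view the delay-based nodes through that same seq machinery. In this sketch, node-seq is our own name (not a core function), and the earlier definitions are repeated so it stands alone:

```clojure
;; Definitions from earlier, repeated so this sketch is self-contained.
(defn triangle [n] (/ (* n (inc n)) 2))

(defn inf-triangles [n]
  {:head (triangle n)
   :tail (delay (inf-triangles (inc n)))})

(defn head [l] (:head l))
(defn tail [l] (force (:tail l)))

;; Adapter: walk the delay-based nodes as a standard lazy seq.
(defn node-seq [l]
  (lazy-seq (cons (head l) (node-seq (tail l)))))

(take 4 (node-seq (inf-triangles 1)))
;=> (1 3 6 10)
```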
6.4 Putting it all together: a lazy quicksort
In a time when the landscape of programming languages is rife with new program-
ming languages and pregnant with more, it seems inconceivable that the world would
need another quicksort implementation. Inconceivable or not, we won’t be deterred
from adding yet another to the rich ecosystem of pet problems. Our implementation
7 And as we’ll cover in section 9.3, participating in the ISeq protocol is even better.
of quicksort differs from many in a few key ways. First, we'll implement a lazy,
tail-recursive version. Second, we'll split the problem such that it can be executed
incrementally. Only the calculations required to obtain the part of a sequence desired will
be calculated. This will illustrate the fundamental reason for laziness in Clojure: the
avoidance of full realization of interim results.
THE IMPLEMENTATION
Without further ado, we present our quicksort implementation.8
Listing 6.3 A lazy, tail-recursive quicksort implementation
(ns joy.q)
(defn nom [n] (take n (repeatedly #(rand-int n))))
(defn sort-parts
  "Lazy, tail-recursive, incremental quicksort. Works against
  and creates partitions based on the pivot, defined as 'work'."
  [work]
  (lazy-seq
    (loop [[part & parts] work]                ; pull apart work
      (if-let [[pivot & xs] (seq part)]
        (let [smaller? #(< % pivot)]           ; define pivot comparison fn
          (recur (list*
                  (filter smaller? xs)         ; work all < pivot
                  pivot                        ; work pivot itself
                  (remove smaller? xs)         ; work all > pivot
                  parts)))                     ; concat parts
        (when-let [[x & parts] parts]
          (cons x (sort-parts parts)))))))     ; sort rest if more parts

(defn qsort [xs]
  (sort-parts (list xs)))
The key detail in the code above is that sort-parts works not on a plain sequence of
elements but on a carefully constructed list that alternates between lazy seqs and
pivots. Every element before each pivot is guaranteed to be less than the pivot, and
everything after will be greater, but the sequences between the pivots are as yet unsorted.
When qsort is given an input sequence of numbers to sort, it creates a new work list
consisting of just that input sequence and passes this work to sort-parts. The loop
inside sort-parts pulls apart the work, always assuming that the first item, which it
binds to part, is an unsorted sequence. It also assumes that if there is a second item,
which will be at the head of parts, it is a pivot. It recurs on the sequence at the head
of work, splitting out pivots and lazy seqs until the sequence of items less than the
most recent pivot is empty, in which case the if-let test is false, and that most recent
pivot is returned as the first item in the sorted seq. The rest of the built-up list
of work is held by the returned lazy sequence, to be passed into sort-parts again
when subsequent sorted items are needed.

8 This listing uses the list* function, which for some reason is somewhat rarely seen. In cases like this, however, it is exactly what is needed. list* is like list except it expects its last argument to be a list on which to prepend its other arguments. We'll use it again in chapter 8.

You can see a snapshot of the work list for the function call (qsort [2 1 4 3]) in
figure 6.4, at an intermediate point in its process: after the first loop iteration, the
work list is ((1) 2 (4 3)).

Figure 6.4 The qsort function shown earlier would use a structure like this for its
work list when sorting the vector [2 1 4 3]: the partition of elements less than the
pivot, the pivot itself, and the partition of elements greater than the pivot. Note
that all the parts described by a standard quicksort implementation are
represented here.

The figure includes the characteristics of a standard quicksort implementation, and
you can run it to see that the final sequence is sorted:

(qsort [2 1 4 3])
;=> (1 2 3 4)
(qsort (nom 20))
;=> (0 2 3 5 6 7 7 8 9 10 11 11 11 12 12 13 14 16 17 19)
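To get a rough feel for how much work laziness avoids, you could instrument the pivot comparison with a counter. The comparisons atom and counted< helper below are our own additions, not part of listing 6.3:

```clojure
;; Instrumented sketch: listing 6.3's algorithm, but every pivot
;; comparison bumps an atom so the amount of work is observable.
;; comparisons and counted< are our own names, not core functions.
(def comparisons (atom 0))

(defn counted< [a b]
  (swap! comparisons inc)
  (< a b))

(defn sort-parts [work]
  (lazy-seq
   (loop [[part & parts] work]
     (if-let [[pivot & xs] (seq part)]
       (let [smaller? #(counted< % pivot)]
         (recur (list* (filter smaller? xs)
                       pivot
                       (remove smaller? xs)
                       parts)))
       (when-let [[x & parts] parts]
         (cons x (sort-parts parts)))))))

(defn qsort [xs] (sort-parts (list xs)))

(reset! comparisons 0)
(first (qsort (shuffle (range 1000))))
@comparisons   ; far fewer comparisons than a full sort needs

(reset! comparisons 0)
(dorun (qsort (shuffle (range 1000))))
@comparisons   ; many more; realizing everything pays the full cost
```

Because only the partitions along the path to the requested element are ever realized, the first count stays well below the second.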
The implementation of the sort-parts function works to provide an incremental
solution for lazy quicksort. This incremental approach stands in opposition to a
monolithic approach (Okasaki 1996) defined by its performance of the entire calcula-
tion when any segment of the sequence is accessed. For example, grabbing the first
element in a lazy sequence returned from qsort will perform only the necessary calcu-
lations required to get that first item:
(first (qsort (nom 100)))
;=> 1
Of course, the number returned here will likely be different in your REPL, but the
underlying structure of the lazy sequence used internally by sort-parts will be similar
to that shown in figure 6.5.
The lazy qsort will be able to gather the first element because it takes only a
small subset of comparisons to gather the numbers into left-side smaller and
right-side larger partitions, and sort those smaller pieces only.

Figure 6.5 Internal structure of qsort, shown sorting (5 3 1 7 4 2 8 6): level A
partitions it into (3 1 4 2) 5 (7 8 6), and level B into (1 2) 3 (4) 5 (6) 7 (8),
built from nested forms such as (filter #(< % 5) xs) and (remove #(< % 5) xs).
Each filter and remove lazily returns items from its parent sequence only as
required. So to return the first two items of the seq returned by qsort, no remove
steps are required from either level A or B. To generate the sequence (4), a single
remove step at level B would be needed to eliminate everything less than 3. As more
items are forced from the seq returned by qsort, more of the internal filter and
remove steps will be run.

The characteristic of the quicksort
algorithm is especially conducive to laziness, because it’s fairly cheap to make and
shuffle partitions where those with a smaller magnitude can be shuffled first. What
then are the benefits of a lazy, tail-recursive, incremental quicksort? The answer is that
you can take sorted portions of a large sequence without having to pay the cost of sort-
ing its entirety, as the following command hints:
(take 10 (qsort (nom 10000)))
;=> (0 0 0 4 4 7 7 8 9 9)
On our machines, this command required roughly 11,000 comparisons, which for all
intents and purposes is an O(n) operation—an order of magnitude less than
quicksort's best case. Bear in mind that as the take value gets closer to the number of