If you’ll notice, we used the :refer-clojure namespace directive to :exclude the
array and sequence functions that the SafeArray protocol overrides. We did this not
only because it’s important to know how to use :refer-clojure, but also because
we’re changing the semantics of aset to take a mutating function as its last argument
instead of a raw value. We then used the :require directive to alias the Clojure
namespace as clj, thus avoiding the need to use the fully qualified function names a
la clojure.core/aget.
The dumb array created by make-dumb-array is stored in a closure created by
reify, and unguarded access is provided without concern for concurrent matters.
Using this implementation across threads is disastrous, as shown:
(defn pummel [a]
  (dothreads! #(dotimes [i (count a)] (aset a i inc)) :threads 100))
(def D (make-dumb-array Integer/TYPE 8))
(pummel D)
;; wait for pummel to terminate
(seq D)
;=> (82 84 65 63 83 65 83 87)
This is very wrong—100 threads incrementing concurrently should result in 100 for
each array slot. To add insult to injury, Clojure didn’t throw a ConcurrentModificationException as you might’ve expected, but instead just silently went
along doing very bad things. Next, we’ll talk a little about locking and provide an
alternate implementation for SafeArray using locking primitives.
11.5.1 Safe mutation through locking
Currently, the only5 way to safely modify and see consistent values for a mutable object
across threads in Clojure is through locking.
REFERENCES AROUND EVIL MUTABLE THINGS
Wrapping a mutable object in a Clojure reference type provides absolutely no guarantees
for safe concurrent modification. Doing this will at best explode immediately or, worse,
provide inaccurate results.
If at all possible, locking should be avoided; but for those times when it’s unavoidable, the locking macro will help. The locking macro takes a single parameter acting
as the locking monitor and a body that executes in the monitor context. Any writes
and reads to the monitor object are thread safe, and as a bonus the monitor is always
released at the end of the block. One of the major complexities in concurrent programming using locks is that all errors must be handled fully and appropriately; otherwise you risk orphaned locks, and they spell deadlock. But the locking macro will
always release the lock, even in the face of exceptions.
Listing 11.5 An implementation of the SafeArray protocol using the locking macro
(defn make-safe-array [t sz]
  (let [a (make-array t sz)]                     ; array creation is the same
    (reify
      SafeArray
      (count [_] (clj/count a))
      (seq [_] (clj/seq a))
      (aget [_ i]                                ; aget is locked
        (locking a
          (clj/aget a i)))
      (aset [this i f]                           ; aset is locked
        (locking a
          (clj/aset a i (f (aget this i))))))))  ; aset uses aget
(def A (make-safe-array Integer/TYPE 8))
(pummel A)
5 Although a potential future addition to Clojure named pods may provide another.
CHAPTER 11 Mutation
;; wait for pummel to terminate
(seq A)
;=> (100 100 100 100 100 100 100 100)
We used the locking macro on both the aget and aset functions so that they can
both maintain consistency concurrently. Because aset calls aget, the locking macro
is called twice. This isn’t a problem because locking is reentrant, or able to be called
multiple times in the same thread. Typically, you’d have to manage the release of a reentrant locking mechanism yourself, matching each acquisition with a release, but fortunately locking manages that for us.
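As a quick illustration of that reentrancy (using a hypothetical monitor object of our own, not one of the book’s listings), nesting two locking blocks on the same monitor in a single thread is perfectly safe:

```clojure
(def monitor (Object.))        ; hypothetical monitor object

(defn inner []
  (locking monitor             ; second acquisition by the same thread
    :done))

(defn outer []
  (locking monitor             ; first acquisition
    (inner)))

(outer)
;=> :done
```

No deadlock occurs because the thread already owns the monitor when inner runs, and locking releases it the correct number of times on the way out.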
The locking macro is the simplest way to perform primitive locking in Clojure.
But the implementation of make-safe-array is coarse in that the locks used are guarding the entire array. Any readers or writers wishing to access or update any slot in the array must wait their turn, a bottleneck known as contention. If you need finer-grained locking, the locking facilities provided by Java will help you gain more control, a topic we cover next.
11.5.2 Using Java’s explicit locks
Java provides a set of explicit locks in the java.util.concurrent.locks package that
can also be used as shown in the following listing. One such lock is provided by the
java.util.concurrent.locks.ReentrantLock class.
Listing 11.6 An implementation of the SafeArray protocol using ReentrantLock
(defn lock-i [target-index num-locks]
  (mod target-index num-locks))

(import 'java.util.concurrent.locks.ReentrantLock)
(defn make-smart-array [t sz]
  (let [a (make-array t sz)                              ; array
        Lsz (quot sz 2)
        L (into-array (take Lsz                          ; locks
                            (repeatedly #(ReentrantLock.))))]
    (reify
      SafeArray
      (count [_] (clojure.core/count a))
      (seq [_] (clojure.core/seq a))
      (aget [_ i]
        (let [lk (clojure.core/aget L (lock-i (inc i) Lsz))]
          (.lock lk)                                     ; explicit locking
          (try
            (clojure.core/aget a i)
            (finally (.unlock lk)))))                    ; explicit unlocking
      (aset [this i f]
        (let [lk (clojure.core/aget L (lock-i (inc i) Lsz))]
          (.lock lk)
          (try
            (clojure.core/aset a i (f (aget this i)))    ; reentrant locking
            (finally (.unlock lk))))))))
(def S (make-smart-array Integer/TYPE 8))
(pummel S)
;; wait for pummel to terminate
(seq S)
;=> (100 100 100 100 100 100 100 100)
The first point of note is that we use a technique (simplified for clarity) called lock
striping (Herlihy 2008) to reduce the contention of guarding the array as a whole
using locking. The target array a’s slots are guarded by half the number of locks, each
chosen using the simple formula (mod target-index num-locks). This scheme
allows readers and writers to (potentially) act independently when accessing different
array slots. It’s crucial that we closed over the lock instance array L because for explicit
locks to work, each access must lock and unlock the same instance. Additionally, we’re
calling the .unlock method in the body of a finally expression, because failing to do
so is a recipe for disaster. Unlike the locking macro, the ReentrantLock class doesn’t
manage lock release automatically. Finally, you can also use the ReentrantLock in a
way equivalent to using the locking macro, but using ReentrantLock gives you the
choice of using proxy to provide more complex semantics than locking can provide.
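A minimal sketch of that equivalence (the with-explicit-lock helper is our own name, not part of Clojure or the book’s listings) pairs .lock with .unlock in a finally, just as locking does:

```clojure
(import 'java.util.concurrent.locks.ReentrantLock)

(def lk (ReentrantLock.))

(defn with-explicit-lock [f]
  (.lock lk)                 ; acquire, as (locking lk ...) would
  (try
    (f)
    (finally
      (.unlock lk))))        ; always release, even on exception

(with-explicit-lock #(+ 1 2))
;=> 3
```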
One flaw of the make-smart-array function is that it uses the same locks for read-
ers and writers. But you can allow for more concurrency if you enable some number
of readers to access array slots without blocking at all by using the java.util.
concurrent.locks.ReentrantReadWriteLock class. The ReentrantReadWriteLock
class holds two lock instances, one for reads and one for writes, and by adding another
lock array you can take advantage of this fact. We won’t get into that exercise here, but
if you choose to do so then you can use the implementation of make-smart-array as a
guide.
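To give a flavor of the approach (a sketch under our own naming, not a full SafeArray implementation), each access would go through the appropriate half of a read/write pair:

```clojure
(import 'java.util.concurrent.locks.ReentrantReadWriteLock)

(def rw (ReentrantReadWriteLock.))

(defn guarded-read [f]
  (let [rl (.readLock rw)]   ; many readers may hold the read lock at once
    (.lock rl)
    (try (f)
         (finally (.unlock rl)))))

(defn guarded-write [f]
  (let [wl (.writeLock rw)]  ; the write lock is exclusive
    (.lock wl)
    (try (f)
         (finally (.unlock wl)))))
```

Readers calling guarded-read proceed concurrently with one another; a writer in guarded-write excludes everyone until it unlocks.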
Using the various locking mechanisms, you can guarantee consistency across
threads for mutable objects. But as we showed with explicit locks, there’s an expected
incantation to unlocking that must be strictly observed. Though not necessarily complex in the SafeArray implementations, the conceptual baggage incurred by the semantics of an explicit locking scheme doesn’t scale well. The java.util.concurrent
package contains a cacophony of concurrency primitives above and beyond simple
locks, but it’s not our goal to provide a comprehensive survey herein.
Now that we’ve covered the matter of guaranteeing coordinated state across dispa-
rate threads, we turn our attention to a different topic: parallelization.
11.6 When to use futures
Clojure includes two reference types supporting parallelism: futures and promises. Futures, the subject of this section, are simple yet elegant constructs useful for partitioning a typically sequential operation into discrete parts, which can then be processed asynchronously across numerous threads. Dereferencing a future blocks if the enclosed expression hasn’t finished; all subsequent dereferencing returns the calculated value immediately. The simplest example of the use of a future is as shown:
(time (let [x (future (do (Thread/sleep 5000) (+ 41 1)))]
[@x @x]))
; "Elapsed time: 5001.682 msecs"
;=> [42 42]
The processing time of the do block is only paid for on the first dereference of the
future x. Futures represent expressions that have yet to be computed.
11.6.1 Futures as callbacks
One nice use case for futures is in the context of a callback mechanism. Normally you might make a remote procedure call (RPC), wait for it to complete, and then proceed with some task depending on the return value. But what happens if you need to make multiple RPC calls? Should you be forced to wait for them all serially? Thanks to
futures, the answer is no. In this section, we’ll use futures to create an aggregate task that finds the total number of occurrences of a string within a given set of Twitter (http://twitter.com) feeds. This aggregate task will be split into numerous parallel subtasks via futures.
COUNTING WORD OCCURRENCES IN A SET OF TWITTER FEEDS
Upon going to a personal Twitter page such as http://twitter.com/fogus, you can find
a link to the RSS 2.0 feed for that user. We’ll use this feed as the input to our functions.
An RSS 2.0 feed is an XML document used to represent a piece of data that’s constantly changing. The layout of a Twitter RSS entry is straightforward:
<rss version="2.0">
  <channel>
    <title>Twitter / fogus</title>
    <link>http://twitter.com/fogus</link>
    <item>
      <title>fogus: Thinking about #Clojure futures.</title>
      <link>http://twitter.com/fogus/statuses/12180102647/</link>
    </item>
  </channel>
</rss>
There’s more to the content of a typical RSS feed, but for our purposes we wish to only
retrieve the title element of the item elements (there can be more than one). To do
this, we need to first parse the XML and put it into a convenient format. If you recall
from section 8.4, we created a domain DSL to create a tree built on a simple node
structure of tables with the keys :tag, :attrs, and :content. As mentioned, that structure is leveraged in many Clojure libraries, and we’ll take advantage of this fact. Clojure provides some core functions in the clojure.xml and clojure.zip namespaces
to help make sense of the feed:
(require '(clojure [xml :as xml]))
(require '(clojure [zip :as zip]))
(defmulti rss-children class)
(defmethod rss-children String [uri-str]
  (-> (xml/parse uri-str)
      zip/xml-zip
      zip/down
      zip/children))
Using the function clojure.xml/parse, we can retrieve the XML for a Twitter RSS
feed and convert it into the familiar tree format. That tree is then passed into a function clojure.zip/xml-zip that converts that structure into another data structure
called a zipper. The form and semantics of the zipper are beyond the scope of this book
(Huet 1997), but using it in this case allows us to easily navigate down from the root
rss XML node to the channel node, where we then retrieve its children. The child
nodes returned from rss-children contain other items besides item nodes (title,
link, and so forth) that need to be filtered out. Once we have those item nodes, we
then want to retrieve the title text and count the number of occurrences of the tar-
get text (case-insensitive). We perform all of these tasks using the function count-
tweet-text-task, defined in the following listing.
Listing 11.7 Creating a future task to count word occurrences in a tweet
(import '(java.util.regex Pattern))
(defn count-tweet-text-task [txt feed]
  (let [items (rss-children feed)                ; get children
        re (Pattern/compile (Pattern/quote txt))]
    (count
     (mapcat #(re-seq re (first %))              ; get matches
             (for [item (filter (comp #{:item} :tag) items)]  ; filter non-items
               (-> item :content first :content))))))         ; get title
We’ll now try to count some text in a Twitter feed to see what happens:
(count-tweet-text-task
"#clojure"
"http://twitter.com/statuses/user_timeline/46130870.rss")
;=> 7
The result you see is highly dependent on when you run this function, because the
RSS feeds are ever-changing. But using the count-tweet-text-task function, we can
build a sequence of tasks to be performed over some number of Twitter feeds. Before
we do that, we’ll create a convenience macro as-futures to take said sequence and
dispatch the enclosed actions across some futures.
Listing 11.8 A macro to dispatch a sequence of futures
(defmacro as-futures [[a args] & body]
  (let [parts (partition-by #{'=>} body)
        [acts _ [res]] (partition-by #{:as} (first parts))
        [_ _ task] parts]
    `(let [~res (for [~a ~args] (future ~@acts))]
       ~@task)))
The as-futures macro implemented in listing 11.8 names a binding corresponding
to the arguments for a given action, which is then dispatched across a number of
futures, after which a task is run against the futures sequence. The body of as-futures is segmented so that we can clearly specify the needed parts—the action
arguments, the action to be performed for each argument, and the tasks to be run
against the resulting sequence of futures:
(as-futures [<arg-name> <all-args>]
  <actions-using-args>
  :as <results-name>
  =>
  <actions-using-results>)
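A minimal, self-contained use of the macro (repeating the definition from listing 11.8 so the snippet stands alone) looks like this:

```clojure
(defmacro as-futures [[a args] & body]      ; as defined in listing 11.8
  (let [parts (partition-by #{'=>} body)
        [acts _ [res]] (partition-by #{:as} (first parts))
        [_ _ task] parts]
    `(let [~res (for [~a ~args] (future ~@acts))]
       ~@task)))

(as-futures [x [1 2 3 4]]
  (* x x)                                   ; action run in a future per x
  :as squares
  =>
  (reduce + (map deref squares)))           ; task over the futures seq
;=> 30
```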
To simplify the macro implementation, we use the :as keyword and => symbol to
clearly delineate its segments. The as-futures body only exits after the task body finishes—as determined by the execution of the futures. We can use as-futures to perform the original task with a new function tweet-occurrences, implemented in the following listing.
Listing 11.9 Counting string occurrences in Twitter feeds fetched in parallel
(defn tweet-occurrences [tag & feeds]
  (as-futures [feed feeds]
    (count-tweet-text-task tag feed)
    :as results
    =>
    (reduce (fn [total res] (+ total @res))
            0
            results)))
The as-futures macro builds a sequence of futures named results, enclosing the
call to count-tweet-text-task across the unique set of Twitter feeds provided. We
then sum the counts returned from the dereferencing of the individual futures, as
shown:
(tweet-occurrences "#Clojure"
"http://twitter.com/statuses/user_timeline/46130870.rss"
"http://twitter.com/statuses/user_timeline/14375110.rss"
"http://twitter.com/statuses/user_timeline/5156041.rss"
"http://twitter.com/statuses/user_timeline/21439272.rss")
;=> 22
And that’s that. Using only a handful of functions and macros, plus using the built-in
core facilities for XML parsing and navigation, we’ve created a simple Twitter occur-
rences counter. Our implementation has some trade-offs made in the name of page
count. First, we blindly dereference the future in tweet-occurrences when calculating the sum. If the future’s computation freezes, then the dereference would likewise freeze. Using some combination of future-done?, future-cancel, and future-cancelled? in your own programs, you can skip, retry, or eliminate ornery feeds from the calculation. Futures are only one way to perform parallel computation in Clojure, and
in the next section we’ll talk about another—promises.
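As a quick sketch of those guard functions (using a contrived slow future rather than one of the Twitter feeds):

```clojure
(def slow (future (Thread/sleep 60000) :answer))

(future-done? slow)       ; false while the sleep is still running
(future-cancel slow)      ; interrupt the computation
(future-cancelled? slow)
;=> true
```

Note that dereferencing a cancelled future throws a CancellationException, so check future-cancelled? before blindly using @ on futures you may have cancelled.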
11.7 When to use promises
Another tool that Clojure provides for parallel computation is the promise and deliver mechanism. Promises are similar to futures, in that they represent a unit of computation to be performed on a separate thread. Likewise, the blocking semantics when dereferencing an unfinished promise are the same. But whereas futures encapsulate an arbitrary expression that caches its value in the future upon completion, promises are placeholders for values whose construction is fulfilled by another thread via the deliver function. A simple example is as follows:
(def x (promise))
(def y (promise))
(def z (promise))

(dothreads! #(deliver z (+ @x @y)))
(dothreads!
 #(do (Thread/sleep 2000) (deliver x 52)))
(dothreads!
 #(do (Thread/sleep 4000) (deliver y 86)))
(time @z)
; "Elapsed time: 3995.414 msecs"
;=> 138
Promises are write-once; any further attempt to deliver will throw an exception.
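A quick REPL sketch of the write-once behavior (the exact failure mode of a repeated deliver varies by Clojure version, so we only show the first delivery):

```clojure
(def p (promise))

(deliver p 5)
@p
;=> 5
;; A second (deliver p 6) is rejected; @p remains 5.
```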
11.7.1 Parallel tasks with promises
We can create a macro similar to as-futures for handling promises, but because of the more involved value semantics, the implementation is more complicated. We again wish to provide a named set of tasks, but we’d additionally like to name the corresponding promises so that we can then execute over the eventual results, which we do next.
Listing 11.10 A macro to dispatch a sequence of promises across a number of threads
(defmacro with-promises [[n tasks _ as] & body]
  (when as
    `(let [tasks# ~tasks
           n# (count tasks#)
           promises# (take n# (repeatedly promise))]
       (dotimes [i# n#]
         (dothreads!
          (fn []
            (deliver (nth promises# i#)
                     ((nth tasks# i#))))))
       (let [~n tasks#
             ~as promises#]
         ~@body))))
We could then build a rudimentary parallel testing facility, dispatching tests across disparate threads and summing the results when all of the tests are done:
(defrecord TestRun [run passed failed])

(defn pass [] true)
(defn fail [] false)

(defn run-tests [& all-tests]
  (with-promises
    [tests all-tests :as results]
    (into (TestRun. 0 0 0)
          (reduce #(merge-with + %1 %2) {}
                  (for [r results]
                    (if @r
                      {:run 1 :passed 1}
                      {:run 1 :failed 1}))))))
(run-tests pass fail fail fail pass)
;=> #:user.TestRun{:run 5, :passed 2, :failed 3}
This unit-testing model is simplistic by design in order to illustrate parallelization
using promises and not to provide a comprehensive testing framework.
11.7.2 Callback API to blocking API
Promises, much like futures, are useful for executing RPC on separate threads. This
can be useful if you need to parallelize a group of calls to an RPC service, but there’s a
converse use case also. Often, RPC APIs take arguments to the service calls and also a
callback function to be executed when the call completes. Using the rss-children
function from the previous section, we can construct an archetypal RPC function:
(defn tweet-items [k feed]
  (k
   (for [item (filter (comp #{:item} :tag) (rss-children feed))]
     (-> item :content first :content))))
The tweet-items function is a distillation of the count-tweet-text-task function from section 11.6.1, as shown:
(tweet-items
count
"http://twitter.com/statuses/user_timeline/46130870.rss")
;=> 16
The argument k to tweet-items is the callback, or continuation, that’s called with the
filtered RPC results. This API is fine, but there are times when a blocking call is more appropriate than a callback-based call. We can use a promise to achieve this blocking behavior with the following:
(let [p (promise)]
  (tweet-items #(deliver p (count %))
               "http://twitter.com/statuses/user_timeline/46130870.rss")
  @p)
;=> 16
And as you see, the call blocks until the deliver occurs. This is a fine way to transform
the callback into a blocking call, but we’d like a way to do this generically. Fortunately,
most well-written RPC APIs follow the same form for their callback functions/methods,
so we can create a macro to wrap this up nicely in the following listing.
Listing 11.11 A macro for transforming a callback-based function to a blocking call
(defmacro cps->fn [f k]
  `(fn [& args#]
     (let [p# (promise)]
       (apply ~f (fn [x#] (deliver p# (~k x#))) args#)
       @p#)))

(def count-items (cps->fn tweet-items count))

(count-items "http://twitter.com/statuses/user_timeline/46130870.rss")
;=> 16
;=> 16
This is a simple solution to a common problem that you may have already encountered in your own applications.
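To see cps->fn working without a network dependency, here’s a sketch using a contrived callback-style API of our own invention (async-sum is hypothetical, not a real service; the macro is repeated from listing 11.11 so the snippet stands alone):

```clojure
(defmacro cps->fn [f k]                  ; as defined in listing 11.11
  `(fn [& args#]
     (let [p# (promise)]
       (apply ~f (fn [x#] (deliver p# (~k x#))) args#)
       @p#)))

;; A hypothetical callback-based API: invokes the callback k with
;; the sum of nums, computed on another thread.
(defn async-sum [k nums]
  (future (k (reduce + nums))))

(def blocking-sum (cps->fn async-sum identity))

(blocking-sum [1 2 3 4])
;=> 10
```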
11.7.3 Deterministic deadlocks
You can cause a deadlock in your applications by never delivering to a promise. One
possibly surprising advantage of using promises is that if a promise can deadlock, it’ll
deadlock deterministically. Because only a single thread can ever deliver on a promise,
only that thread will ever cause a deadlock. We can create a cycle in the dependencies
between two promises to observe a deadlock using the following code:
(def kant (promise))
(def hume (promise))

(dothreads!
 #(do (println "Kant has" @kant) (deliver hume :thinking)))
(dothreads!
 #(do (println "Hume is" @hume) (deliver kant :fork)))
The Kant thread is waiting for the delivery of the value for kant from the Hume thread, which in turn is waiting for the value for hume from the Kant thread. Attempting either @kant or @hume in the REPL will cause an immediate deadlock. Furthermore, this deadlock will happen every time; it’s deterministic rather than dependent on odd thread timings or the like. Deadlocks are never nice, but deterministic deadlocks are better than nondeterministic.7
We’ve only touched the surface for the potential that promises represent. In fact,
the pieces that we’ve assembled in this section represent some of the basic building
blocks of dataflow (Van Roy 2004) concurrency. But any attempt to serve justice to
dataflow concurrency in a single section would be a futile effort. At its essence,
7 There are experts in concurrent programming who will say that naïve locking schemes are also deterministic. Our simple example is illustrative, but alas it isn’t representative of a scheme that you may devise for your own code. In complex designs where promises are created in one place and delivered in a remote locale, determining deadlock will naturally be more complex. Therefore, we’d like to use this space to coin a new phrase: “determinism is relative.”
dataflow deals with the process of dynamic changes in values causing dynamic changes
in dependent “formulas.” This type of processing finds a nice analogy in the way that
spreadsheet cells operate, some representing values and others dependent formulas
that change as the former also change.
Continuing our survey of Clojure’s parallelization primitives, we’ll next discuss
some of the functions provided in the core library.
11.8 Parallelism
In the previous two sections we built two useful macros, as-futures and with-promises, allowing you to parallelize a set of operations across numerous threads. But Clojure has functions in its core library providing similar functionality, named pmap, pvalues, and pcalls, which we’ll cover briefly in this section.
11.8.1 pvalues
The pvalues macro is analogous to the as-futures macro, in that it executes an arbitrary number of expressions in parallel. Where it differs is that it returns a lazy sequence of the results of all the enclosed expressions, as shown:
(pvalues 1 2 (+ 1 2))
;=> (1 2 3)
The important point to remember when using pvalues is that the return type is a lazy
sequence, meaning that your access costs might not always present themselves as
expected:
(defn sleeper [s thing] (Thread/sleep (* 1000 s)) thing)

(defn pvs [] (pvalues
              (sleeper 2 :1st)
              (sleeper 3 :2nd)
              (keyword "3rd")))
(-> (pvs) first time)
; "Elapsed time: 2000.309 msecs"
;=> :1st
The total time cost of accessing the first value in the result of pvs is only the cost of its
own calculation. But accessing any subsequent element costs as much as the most
expensive element before it, which you can verify by accessing the last element:
(-> (pvs) last time)
; "Elapsed time: 4001.435 msecs"
;=> :3rd
This may prove a disadvantage if you want to access the result of a relatively cheap
expression that happens to be placed after a more costly expression. More accurately,
all seq values within a sliding window8 are forced, so processing time is limited by the
most costly element therein.
8 Currently, the window size is N+2, where N is the number of CPU cores. But this is an implementation detail, so it’s enough to know only that the sliding window exists.
11.8.2 pmap
The pmap function is the parallel version of the core map function. Given a function
and a set of sequences, the application of the function to each matching element hap-
pens in parallel:
(->> [1 2 3]
     (pmap (comp inc (partial sleeper 2)))
     doall
     time)
; "Elapsed time: 2000.811 msecs"
;=> (2 3 4)
The total cost of realizing the result of mapping a costly increment function is again
limited by the most costly execution time within the aforementioned sliding window.
Clearly, in this contrived case, using pmap provides a benefit, so why not just replace
every call to map in your programs with a call to pmap? Surely this would lead to faster
execution times if the map functions were all applied in parallel, no? The answer is a
resounding: it depends. A definite cost is associated with keeping the resulting sequence coordinated, and to indiscriminately use pmap might actually incur that cost unnecessarily, leading to a performance penalty. But if you’re certain that the
cost of the function application outweighs the cost of the coordination, then pmap
might help to realize performance gains. Only through experimentation will you be
able to determine whether pmap is the right choice.
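A rough way to run that experiment yourself (timings are illustrative; actual numbers depend on your machine and core count):

```clojure
;; For a cheap function, coordination overhead can make pmap the loser.
(time (doall (map inc (range 1000000))))
(time (doall (pmap inc (range 1000000))))

;; For an expensive function, pmap tends to win.
(defn slow-inc [n] (Thread/sleep 10) (inc n))
(time (doall (map slow-inc (range 100))))   ; roughly 100 sequential sleeps
(time (doall (pmap slow-inc (range 100))))  ; sleeps overlap across threads
```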
11.8.3 pcalls
Finally, Clojure provides a pcalls function that takes an arbitrary number of functions taking no arguments and calls them in parallel, returning a lazy sequence of the results. Its use shouldn’t be a surprise by now:
(-> (pcalls
     #(sleeper 2 :1st)
     #(sleeper 3 :2nd)
     #(keyword "3rd"))
    doall
    time)
; "Elapsed time: 3001.039 msecs"
;=> (:1st :2nd :3rd)
The same benefits and trade-offs associated with pvalues and pmap also apply to
pcalls and should be considered before use.
Executing costly operations in parallel can be a great boon when used properly,
but should by no means be considered a magic potion guaranteeing speed gains.
There’s currently no magical formula for determining which parts of an application
can be parallelized—the onus is on you to determine your application’s parallel
potential. What Clojure provides is a set of primitives, including futures, promises,
pmap, pvalues, and pcalls, as the building blocks for your own personalized parallelization needs.
In the next section, we’ll cover the ubiquitous Var, but from a different perspective
than we have thus far.
11.9 Vars and dynamic binding
The last reference type we’ll explore is perhaps the most commonly used—the Var.
Vars are most often used because of two main features:

 - Vars can be named and interned in a namespace.
 - Vars can provide thread-local state.
It’s through the second feature that Vars contribute most usefully to the reference-type landscape. The thread-local value of a Var can by definition be read or written by only a single thread, and thus provides the thread-safe semantics you’ve come to expect from a Clojure reference type.
But before you can start experimenting with Vars at the REPL, we need to address some consequences of the first feature. The other reference objects you’ve looked at
aren’t themselves named and so are generally stored in something with a name. This
means that when the name is evaluated, you get the reference object, not the value.
To get the object’s value, you have to use deref. Named Vars flip this around—evaluating their name gives the value, so if you want the Var object, you need to pass the
name to the special operator var.
With this knowledge in hand, you can experiment with an existing Var. Clojure
provides a Var named *read-eval*,9 so you can get its current value by evaluating its
name:
*read-eval*
;=> true
No deref needed, because *read-eval* is a named Var. Now for the Var object itself:
(var *read-eval*)
;=> #'clojure.core/*read-eval*
That’s interesting—when a named Var object is printed, it starts with #' and is then
followed by the fully qualified name of the Var. The #' reader feature expands to the
Var operator—it means the same thing:
#'*read-eval*
;=> #'clojure.core/*read-eval*
Now that you’ve seen how to refer to Var objects, you can look at how they behave.
The Var *read-eval* is one of those provided by Clojure that’s specifically meant to
be given thread-local bindings but by default has only a root binding. You should’ve
seen its root binding when you evaluated it earlier—by default, *read-eval* is bound
to true.
9 *read-eval* happens to be a Var that has a default configuration useful for this discussion about Vars —its
actual purpose is unimportant here.
11.9.1 The binding macro
The root binding of a Var can act as the base of a stack, with each thread’s local bind-
ings pushing onto that stack and popping off of it as requested. The most common
mechanism for pushing and popping thread-local bindings is the macro binding. It
takes one or more Var names and a value for each that will initialize the new binding
when it’s pushed. These bindings remain in effect until control passes out of the
binding macro, at which point they’re popped off the stack.
Here’s a simple example of a function that prints the current value of the Var
*read-eval*, either the root or thread-local value, whichever is currently in effect:
(defn print-read-eval []
  (println "*read-eval* is currently" *read-eval*))
The binding-play function that follows calls print-read-eval three times, the first and last of which will print the root binding. The middle time, a thread-local binding is in effect:
(defn binding-play []
  (print-read-eval)
  (binding [*read-eval* false]
    (print-read-eval))
  (print-read-eval))

This results in the Var temporarily having a thread-local value of false:

(binding-play)
; *read-eval* is currently true
; *read-eval* is currently false
; *read-eval* is currently true

This is like thread B in figure 11.9, which also shows a simpler scenario than thread A and a more complex one than thread C.
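Because bindings pushed with binding live on a per-thread stack, other threads never see them. Here’s a sketch (the Var and helper names are our own, and the ^:dynamic metadata anticipates the Clojure 1.3 convention mentioned in footnote 10) showing that a freshly started thread sees only the root binding:

```clojure
(def ^:dynamic *level* :root)

(defn observed-on-new-thread []
  (let [p (promise)]
    (.start (Thread. #(deliver p *level*)))  ; raw thread: no bindings carried over
    @p))

(binding [*level* :local]
  [*level* (observed-on-new-thread)])
;=> [:local :root]
```

The current thread sees its thread-local :local binding, while the new thread falls through to the root binding :root.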
11.9.2 Creating a named Var
Vars are most commonly created with the special operator def or one of the many macros that expand to a form that has a def inside:

 - defn—For putting a function in a Var
 - defmacro—For putting a macro in a Var
 - defonce—For setting the value of an unbound Var
 - defmulti—For putting a multimethod in a Var

Figure 11.9 Thread-local Var bindings. This illustration depicts a single Var being used from three different threads. Each rounded box is a Var binding, either thread-local or root. Each star is the Var being deref'ed, with the solid arrow pointing to the binding used. The dotted lines point from a thread-local binding to the next binding on the stack.

10 It's likely that starting with Clojure 1.3 Vars will only have the ability to take on thread-local values when defined using defdynamic or marked with metadata like ^{:dynamic true}. Throughout this book, we will take the latter approach with high confidence that it will just work in 1.3.

There are a few others in clojure.core10 and many more in contrib. What they have in common is that
each of these will intern a Var in the current namespace. Clojure will search for the
named Var in the current namespace. If one is found, it’s used; otherwise, a new Var is
created and added to the namespace, and that one is used.11 The Var (specifically the
root binding of the Var) is bound to whatever value, function, or macro (and so on)
was given. The Var itself is returned:
(def favorite-color :green)
;=> #'user/favorite-color
When a Var is printed, its fully qualified name is given, along with the namespace where
the Var is interned (user) and the Var’s name itself (favorite-color). These are preceded by #' because unlike the other reference types, a named Var is automatically
dereferenced when its name is evaluated—no explicit @ or call to deref is required:
favorite-color
;=> :green
So in order to refer to a Var instead of the value it’s bound to, you need to use #' or
the special form var, which are equivalent:
(var favorite-color)
;=> #'user/favorite-color
A Var can exist (or not exist) in any of four states. The precise state a Var is in can be
determined using the functions resolve, bound?, and thread-bound? as shown in
Table 11.1.
Table 11.1 Var states

Initialization mechanism       (resolve 'x)   (bound? #'x)      (thread-bound? #'x)
(def x)                        #'user/x       false             false
(def x 5)                      #'user/x       true              false
(binding [x 7] ...)            #'user/x       true              true
(with-local-vars [x 9] ...)    nil            true (bound? x)   true (thread-bound? x)
The first row of the table shows the results of resolve, bound?, and thread-bound?
when a var x is unbound. The remaining rows show how to change x to cause those
functions to return the values shown.
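Under the book's ^{:dynamic true} convention, the rows of table 11.1 can be sketched as a REPL session in the user namespace (a sketch, not from the book's own listings):

```clojure
(def ^{:dynamic true} x)    ; interned but unbound
(resolve 'x)                ;=> #'user/x
(bound? #'x)                ;=> false
(thread-bound? #'x)         ;=> false

(def ^{:dynamic true} x 5)  ; root binding supplied
(bound? #'x)                ;=> true
(thread-bound? #'x)         ;=> false

(binding [x 7]              ; thread-local binding on top of the root
  [(bound? #'x) (thread-bound? #'x)])
;=> [true true]
```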
11.9.3 Creating anonymous Vars
Vars don’t always have names, nor do they need to be interned in a namespace. The
with-local-vars macro creates Vars and gives them thread-local bindings all at once,
but it won’t intern them. Instead, they’re bound to locals, which means that the
associated Var isn’t implicitly looked up by symbolic name. You need to use deref or
var-get to get the current value of the Var. Here's an example of a Var x created and interned with def, and then a local x that shadows it and is bound to a new Var via with-local-vars:

11 Not all macros starting with def necessarily create or intern Vars. Some that don't: defmethod, defrecord, and deftype.
(def x 42)
{:outer-var-value x
:with-locals (with-local-vars [x 9]
{:local-var x
:local-var-value (var-get x)})}
;=> {:outer-var-value 42,
:with-locals {:local-var #<Var: --unnamed-->,
:local-var-value 9}}
Within the body of the with-local-vars macro, the bound value can be set using
(var-set <var> <value>), which will of course only affect the thread-local value. It’s
almost stunning how rarely with-local-vars is useful.
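Still, a short sketch of var-set in action (our own example) shows the mechanics:

```clojure
;; var-set mutates the thread-local binding created by with-local-vars
(with-local-vars [tally 0]
  (dotimes [_ 5]
    (var-set tally (inc (var-get tally))))
  (var-get tally))
;=> 5
```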
11.9.4 Dynamic scope
Vars have dynamic scope, which contrasts with the lexical scope of let locals. The most
obvious difference is that with a lexical local, you can easily see where it was initialized
by looking at the nested structure of the code. A Var, on the other hand, may have been
initialized by a binding anywhere earlier in the call stack, not necessarily nearby in the
code at all. This difference can create unexpectedly complex interactions and is one of
the few areas where Clojure does little to help you address such complexity.
An example of this complexity is shown by using the binding macro or any macro
built on top of it, such as with-precision and with-out-str. For example, we can
use the with-precision macro to conveniently set up the *math-context* Var:
(with-precision 4
(/ 1M 3))
;=> 0.3333M
We need to use with-precision here because if we don’t tell BigDecimal we’re okay
with it rounding off the result, it’ll refuse to return anything in this case:
(/ 1M 3)
; java.lang.ArithmeticException: Non-terminating decimal expansion;
; no exact representable decimal result.
With that in mind, can you see why with-precision isn’t doing its job in the next
snippet? The only thing that makes it different from the example that worked earlier
is we’re using map to produce a sequence of three numbers instead of just one:
(with-precision 4
(map (fn [x] (/ x 3)) (range 1M 4M)))
; java.lang.ArithmeticException: Non-terminating decimal expansion;
; no exact representable decimal result.
The problem is that map is lazy and therefore doesn’t call the function given to it
immediately. Instead, it waits until the REPL tries to print it, and then does the division. Although the map and the function it calls are within the lexical scope of with-precision, and with-precision itself uses a thread-local binding internally, it doesn't care about lexical scope. When the division operation is performed, we've already left the
dynamic scope of the with-precision, and it no longer has any effect. The BigDecimal
behavior drops back to its default, and it throws an exception.
One way to solve this is to make sure that all the division is done before leaving the
dynamic scope. Clojure’s doall function is perfect for this:
(with-precision 4
(doall (map (fn [x] (/ x 3)) (range 1M 4M))))
;=> (0.3333M 0.6667M 1M)
One drawback is that it completely defeats map’s laziness. An alternate solution is to
have the function provided to map re-create, when it’s run, the dynamic scope in which
the function was created. Clojure provides a handy macro bound-fn to do exactly that:
(with-precision 4
(map (bound-fn [x] (/ x 3)) (range 1M 4M)))
;=> (0.3333M 0.6667M 1M)
Now the sequence being returned is still lazy, but before each item is computed, the
dynamic scope of *math-context* is re-created and the exception is avoided.
This kind of mismatch between a function definition that appears lexically inside a
form like with-precision or binding and yet has a different dynamic scope when
called doesn’t cause problems with lazy sequences alone. You may also see problems
with functions sent to Agents as actions or with the body of a future, because these are
executed in other threads outside the dynamic scope where they’re set up.
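The same mismatch can be sketched with a raw Java thread, which knows nothing of the dynamic scope in which it was created. The Var *greeting* and the function report here are our own illustrative names:

```clojure
(def ^{:dynamic true} *greeting* "hello")

(defn report [] (println *greeting*))

(binding [*greeting* "bonjour"]
  (doto (Thread. report) .start .join)               ; prints "hello": the raw thread sees only the root binding
  (doto (Thread. (bound-fn* report)) .start .join))  ; prints "bonjour": bound-fn* re-creates the dynamic scope
```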
Problems related to dynamic scope aren’t even exclusive to Vars. The scope of a
try/catch is also dynamic and can have similarly unexpected behavior. For example,
with-open uses try/finally to close a file automatically when execution leaves its
dynamic scope. Failing to account for this can lead to an error when trying to write to
a closed file, because the dynamic scope of with-open has been left. Though bound-
fn can help make the dynamic scope of a Var borrow from its lexical scope, the only
way to deal with try/catch is to make sure everything is executed before leaving its
dynamic scope.
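The with-open pitfall can be sketched as follows; the function names are our own, and any readable file path would do:

```clojure
(require '[clojure.java.io :as io])

;; Broken: the lazy seq escapes with-open's dynamic scope, so realizing
;; it later throws an exception because the reader is already closed.
(defn lazy-lines [path]
  (with-open [rdr (io/reader path)]
    (line-seq rdr)))

;; Fixed: realize all the lines before leaving the dynamic scope.
(defn eager-lines [path]
  (with-open [rdr (io/reader path)]
    (doall (line-seq rdr))))
```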
11.10 Summary
This has been the most complex chapter of the book. State management is a compli-
cated process that can quickly lose all semblance of sanity in the face of concurrent
modifications. Clojure’s main tenet is not to foster concurrency, but instead to pro-
vide the tools for the sane management of state. As a result of this focus, sane concur-
rency follows. Clojure also provides the building blocks for you to parallelize
computations across disparate threads of execution. From the expression-centric
future, to the function-centric set-once “variable” promise, to the core functions
pcalls, pvalues, and pmap, Clojure gives you the raw materials for your specialized
needs. Finally, we talked in depth about Clojure’s Var, dynamic binding, and the
mechanics of thread-locals.
The next chapter deals with performance considerations and how to make your
Clojure programs much faster.
Part 5
Tangential
considerations
Some topics are so interesting and important that we must include them,
even if they don’t fit well in another chapter or warrant a chapter to themselves.
In this part, we’ll cover several such topics, including transient collections,
domain-specific languages, and testing.
Performance
This chapter covers
Type hints
Transients
Chunked sequences
Memoization
Understanding coercion
Now that we’ve spent a book’s worth of material learning the why and how of Clo-
jure, it’s high time we turned our attention to the subject of performance. There’s
a meme in programming that can be summarized as follows: make it work first,
then make it fast. Throughout this book, we’ve taught you the ways that Clojure
allows you to “make it work,” and now we're going to tell you how to make it fast.
In many cases, Clojure’s compiler will be able to highly optimize idiomatic Clo-
jure source code. But there are times when the form of your functions, especially in
interoperability scenarios, will prove to be ambiguous or even outright counter to
compiler optimizations. Therefore, we’ll lead you through optimization techniques
such as type hints, transients, chunked sequences, memoization, and coercion.
Using some combination of these techniques will help you approach, and some-
times exceed, the performance of Java itself.
The most obvious place to start, and the one you're most likely to encounter, is
type hinting—so this is where we’ll begin.
12.1 Type hints
The path of least resistance in Clojure often produces the fastest and most efficient
compiled code, but not always. The beauty of Clojure is that this path of least resis-
tance allows simple techniques for gaining speed via type hints. The first thing to
know about type hints is that they’re used to indicate that an object is an instance of
some class—never a primitive.
THE RULE OF TYPE HINTING Write your code so that it’s first and foremost cor-
rect; then add type-hint adornment to gain speed. Don’t trade the efficiency
of the program for the efficiency of the programmer.
12.1.1 Advantages of type adornment
There are epic debates about the virtues of static versus dynamic type systems; we
won’t engage in those arguments here. But there are a few advantages to a dynamic
type system like Clojure’s that also allows type hinting to occur after the bulk of devel-
opment. One such advantage is that in a static type system, the cost of changing argu-
ment lists is extended to all of the callers, whereas in Clojure the cost is deferred until
adornment time or even outright avoided.1 This scenario isn't limited to function arguments in Clojure, nor to statically typed languages; it applies to any typed element. Clojure's dynamic type system provides an agile experience in general, and programs can later be optimized when there's a need.
12.1.2 Type-hinting arguments and returns
If you recall from section 10.3, we created a function asum-sq that took an array of
floats and performed a sum of squares on its contents. Unfortunately, asum-sq wasn’t
as fast as it could’ve been. We can illuminate the cause of its inefficiency using a REPL
flag named *warn-on-reflection*, which by default is set to false:
(set! *warn-on-reflection* true)
;=> true
What this seemingly innocuous statement does is to signal to the REPL to report when
the compiler encounters a condition where it can’t infer the type of an object and
must use reflection to garner it at runtime. You’ll see a reflection warning by entering
asum-sq into the REPL:
(defn asum-sq [xs]
(let [dbl (amap xs i ret
(* (aget xs i)
(aget xs i)))]
(areduce dbl i ret 0
(+ ret (aget dbl i)))))
; Reflection warning - call to aclone can't be resolved.
; ...
1 Aside from the case where type hints don’t require client changes, the use of keyword arguments as seen in
section 7.1 can help to localize additional function requirements to only the callers needing them.
Though not terribly informative in and of itself, the fact that a reflection warning
occurs is portentous. Running the call to asum-sq in a tight loop verifies that some-
thing is amiss:
(time (dotimes [_ 10000] (asum-sq (float-array [1 2 3 4 5]))))
; "Elapsed time: 410.539 msecs"
;=> nil
Though the reflection warning didn’t point to the precise inefficiency, you can infer
where it could be given that Clojure deals with the java.lang.Object class across
function boundaries. Therefore, you can assume that the problem lies in the argu-
ment xs coming into the function as something unexpected. Adding two type hints to
xs and dbl (because it’s built from xs) might do the trick:
(defn asum-sq [ ^floats xs]
(let [^floats dbl (amap xs i ret
...
Rerunning the tight loop verifies that the assumption was correct:
(time (dotimes [_ 10000] (asum-sq (float-array [1 2 3 4 5]))))
; "Elapsed time: 17.087 msecs"
;=> nil
This is a dramatic increase in speed using a simple type hint that casts the incoming
array xs to one containing primitive floats. The whole range of array type hints is
shown next:
objects
floats
shorts
ints
doubles
bytes
longs
chars
booleans
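For instance, the doubles hint plays the same role for an array of primitive doubles; asum here is our own example, not one of the book's earlier listings:

```clojure
(defn asum [^doubles ds]
  (areduce ds i ret 0.0
    (+ ret (aget ds i))))

(asum (double-array [1.0 2.0 3.0]))
;=> 6.0
```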
The problems might still not be solved, especially if you want to do something with the
return value of asum-sq, as shown:
(.intValue (asum-sq (float-array [1 2 3 4 5])))
; Reflection warning, reference to field intValue can't be resolved.
;=> 55
This is because the compiler can’t garner the type of the return value and must there-
fore use reflection to do so. By hinting the return type of asum-sq, the problem goes
away:
(defn ^Float asum-sq [ ^floats xs]
...
(.intValue (asum-sq (float-array [1 2 3 4 5])))
;=> 55
With minor decoration on the asum-sq function, we’ve managed to increase its speed
as well as potentially increasing the speed of expressions downstream.
12.1.3 Type-hinting objects
In addition to allowing for the hinting of function arguments and return values, you
can also hint arbitrary objects. If you didn’t have control over the source to asum-sq,
then this reflection problem would be insurmountable when executing (.intValue
(asum-sq (float-array [1 2 3 4 5]))). But you can instead hint at the point of usage
and gain the same advantage as if asum-sq had been hinted all along:
(.intValue ^Float (asum-sq (float-array [1 2 3 4 5])))
;=> 55
All isn’t lost when you don’t own a piece of code causing performance problems,
because Clojure is flexible in the placement of type hints.
12.2 Transients
We’ve harped on you for this entire book about the virtues of persistent data struc-
tures and how wonderful they are. In this section, we’ll present an optimization tech-
nique provided by Clojure called transients, which offer a mutable view of a collection.
It seems like blasphemy, but we assure you there’s a good reason for their existence,
which we’ll discuss presently.
12.2.1 Ephemeral garbage
The design of Clojure is such that it presumes that the JVM is extremely efficient at
garbage collection of ephemeral (or short-lived) objects, and in fact it is. But as you
can imagine based on what you’ve learned so far, Clojure does create a lot of young
objects that are never again accessed, shown (in spirit) here:
(reduce merge [{1 3} {1 2} {3 4} {3 5}])
;=> {1 2, 3 5}
A naive implementation2 of reduce would build intermediate maps corresponding to
the different phases of accumulation. The accumulation of these short-lived instances
can in some circumstances cause inefficiencies, which transients are meant to address.
THE RULE OF TRANSIENTS Write your code so that it’s first and foremost cor-
rect using the immutable collections and operations; then, make changes to
use transients for gaining speed. But you might be better served by writing idi-
omatic and correct code and letting the natural progression of speed
enhancements introduced in new versions of Clojure take over. Spot optimi-
zations often quickly become counter-optimizations by preventing the lan-
guage/libraries from doing something better.
We’ll explore how you can use transients in the next section.
2 The actual implementation of reduce follows a reduce protocol that delegates to a smart “internal reduce”
mechanism that’s meant for data structures that know the most efficient way to reduce themselves.
12.2.2 Transients compare in efficiency to mutable collections
Mutable objects generally don’t make new allocations during the intermediate phases of an operation on a single collection, and measured against that standard, persistent data structures appear less memory efficient. But you can use transients to
provide not only efficiency of allocation, but often of execution as well. Take a function
zencat, intended to work similarly to Clojure’s concat, but with vectors exclusively:
(defn zencat1 [x y]
(loop [src y, ret x]
(if (seq src)
(recur (next src) (conj ret (first src)))
ret)))
(zencat1 [1 2 3] [4 5 6])
;=> [1 2 3 4 5 6]
(time (dotimes [_ 1000000] (zencat1 [1 2 3] [4 5 6])))
; "Elapsed time: 486.408 msecs"
;=> nil
The implementation is simple enough, but it’s not all that it could be. The effects of
using transients is shown next.
Listing 12.1 A concatenation function using transients
(defn zencat2 [x y]
  (loop [src y, ret (transient x)]                 ; create a transient from x
    (if (seq src)
      (recur (next src) (conj! ret (first src)))   ; use transient conj!
      (persistent! ret))))                         ; return a persistent vector
(zencat2 [1 2 3] [4 5 6])
;=> [1 2 3 4 5 6]
(time (dotimes [_ 1000000] (zencat2 [1 2 3] [4 5 6])))
; "Elapsed time: 846.258 msecs"
;=> nil
Wait, what? It seems that by using transients, we’ve actually made things worse—but
have we? The answer lies in the question, “what am I actually measuring?” The timing
code is executing zencat2 in a tight loop. This type of timing isn’t likely representa-
tive of actual use, and instead highlights an important consideration: the use of
persistent! and transient, though constant time, aren’t free. By measuring the use
of transients in a tight loop, we’ve introduced a confounding measure, with the dispa-
rate cost of using transients compared to the cost of concatenating two small vectors.
A better benchmark would instead be to measure the concatenation of larger vectors,
therefore minimizing the size-relative cost of transients:
(def bv (vec (range 1e6)))
(first (time (zencat1 bv bv)))
; "Elapsed time: 181.988 msecs"
;=> 0
(first (time (zencat2 bv bv)))
; "Elapsed time: 39.353 msecs"
;=> 0
In the case of concatenating large vectors, the use of transients is ~4.5 times faster
than the purely functional approach. Be careful how you use transients in your own
applications, because as you saw, they’re an incredible boon in some cases, and quite
the opposite in others. Likewise, be careful designing performance measurements,
because they might not always measure what you think.
Because transients are a mutable view of a collection, you should take care when exposing them outside of localized contexts. Fortunately, Clojure doesn’t allow a transient
to be modified across threads and will throw an exception if attempted. But it’s easy
enough to forget that you’re dealing with a transient and return it from a function.
That’s not to say that you couldn’t return a transient from a function—it can be use-
ful to build a pipeline of functions that work in concert against a transient structure.
Instead, we ask that you remain mindful when doing so.
The use of transients can help to gain speed in many circumstances. But be
mindful of the trade-offs when using them, because they’re not cost-free operations.
12.3 Chunked sequences
With the release of Clojure 1.1, the granularity of Clojure’s laziness was changed from
a one-at-a-time model to a chunk-at-a-time model. Instead of walking through a
sequence one node at a time, chunked sequences provide a windowed view (Boncz
2005) on sequences some number of elements wide, as illustrated here:
(def gimme #(do (print \.) %))
(take 1 (map gimme (range 32)))
You might expect that this snippet would print (.0) because we’re only grabbing the
first element, and if you’re running Clojure 1.0, that’s exactly what you’d see. But in
later versions, the picture is different:
;=> (................................0)
If you count the dots, you’ll see exactly 32, which is what you’d expect given the state-
ment from the first paragraph. Expanding a bit further, if you increase the argument
to range to be 33 instead, you’ll see the following:
(take 1 (map gimme (range 33)))
;=> (................................0)
Again you can count 32 dots. Moving the chunky window to the right is as simple as
obtaining the 33rd element:
(take 1 (drop 32 (map gimme (range 64))))
;=> (................................................................32)
Figure 12.1 Clojure’s chunked
sequences allow a windowed view
of a sequence. This model is more
efficient, in that it allows for larger
swaths of memory to be reclaimed
by the garbage collector and better
cache locality in general. There’s a
cost to total laziness, but often the
benefit gained is worth the cost.
As we showed in chapter 5, Clojure’s sequences are implemented as trees fanning out
at increments of 32 elements per node. Therefore, chunks of size 32 are a natural fit,
allowing for the garbage collection of larger chunks of memory as seen in figure 12.1.
You might be worried that chunked sequences have squashed the entire point of
lazy sequences, and for small sequences this would be correct. But the benefits of lazy
sequences are striking when dealing with cyclopean magnitudes or sequences larger
than memory. Chunked sequences in the extreme cases are an incredible boon
because not only do they make sequence functions more efficient overall, they still ful-
fill the promise of lazy sequences: avoiding full realization of interim results.
12.3.1 Regaining one-at-a-time laziness
There are legitimate concerns about this chunked model, and one such concern is
the desire for a one-at-a-time model to avoid exploding computations. Assuming that
you have such a requirement, one counterpoint against chunked sequences is that of
building an infinite sequence of Mersenne primes.3 Implicit realization of the first 32
Mersenne primes through chunked sequences will finish long after the Sun has died.
But you can use lazy-seq to create a function seq1 that can be used to restrict (or
dechunkify, if you will) a lazy sequence and enforce the one-at-a-time model, as in the
following listing.
Listing 12.2 A dechunkifying seq1 function
(defn seq1 [s]
(lazy-seq
(when-let [[x] (seq s)]
(cons x (seq1 (rest s))))))
(take 1 (map gimme (seq1 (range 32))))
;=> (.0)
(take 1 (drop 32 (map gimme (seq1 (range 64)))))
;=> (.................................32)
3 See http://en.wikipedia.org/wiki/Mersenne_Primes.
You can again safely generate your lazy, infinite sequence of Mersenne primes. The world rejoices. But seq1 eliminates the garbage-collection efficiencies of the chunked model and regresses back to the one-at-a-time model shown in figure 12.2. Clojure may one day provide an official API for one-at-a-time lazy sequence granularity, but for now seq1 will suffice. We advise that you instead stick to the chunked model, because you’ll probably never notice its effects during normal usage.

Figure 12.2 Using seq1, you can again reclaim the one-at-a-time sequence model. Though not as efficient as the chunked model, it does again provide total sequence laziness.
12.4 Memoization
As we mentioned briefly in section 11.4, memoization (Michie 1968) refers to storing
a cache of values local to a function so that its arguments can be retrieved rather than
calculated on every call. The cache is a simple mapping of a given set of arguments to
a previously calculated result. In order for this to work, the memoized function must
be referentially transparent, which we discussed in section 7.1. Clojure comes with a
memoize function that can be used to build a memoized version of any referentially
transparent function, as shown:
(def gcd (memoize
(fn [x y]
(cond
(> x y) (recur (- x y) y)
(< x y) (recur x (- y x))
:else x))))
(gcd 1000645475 56130776629010010)
;=> 215
Defining a “greatest common divisor” function using memoize helps to speed
subsequent calculations using the arguments 1000645475 and 56130776629010010.
The function memoize wraps another function4 in a cache lookup pass-through func-
tion and returns it. This allows you to use memoize on literally any referentially trans-
parent function. The operation of memoize is analogous to, but not exactly the same as, the operation of Clojure’s lazy sequences, which cache the results of their realized portions.
This general technique can be useful, but the indiscriminate storage provided by
memoize might not always be appropriate. Therefore, we’ll take a step back and devise
a way to generalize the operation of memoization into useful abstractions and build a
framework for employing caching strategies more appropriate to the domain at hand.
4
You might’ve noticed that we explicitly bound the Var gcd to the memoization of an anonymous function but
then used recur for implementing the function body. This approach suffers from the inability to cache the
intermediate results (Norvig 1991) of gcd. We leave the solution to this shortcoming as an exercise for the
reader.
Similar to Haskell’s typeclasses, Clojure’s protocols define a set of signatures pro-
viding a framework of adherence to a given set of features. This section serves a three-
fold goal:
Discussion of memoization
Discussion of protocol design
Discussion of abstraction-oriented programming
12.4.1 Re-examining memoization
As mentioned in section 11.4, memoization is a personal affair, requiring a certain
domain knowledge to perform efficiently and correctly. That’s not to say that the core
memoize function is useless, only that the base case doesn’t cover all cases. In this sec-
tion, we’ll define a memoization protocol in terms of the primitive operations:
lookup, has?, hit, and miss. Instead of providing a memoization facility that allows
the removal of individual cache items, it’s a better idea to provide one that allows for
dynamic cache-handling strategies.5
12.4.2 A memoization protocol
The protocol for a general-purpose cache feature is provided in the following listing.
Listing 12.3 A protocol for caching
(defprotocol CacheProtocol
(lookup [cache e])
(has? [cache e])
(hit [cache e])
(miss [cache e ret]))
The function lookup retrieves the item in the cache if it exists. The function has? will
check for a cached value. The function hit is called when an item is found in the
cache, and miss is called when it’s not. If you’re familiar with creating Java interfaces,
the process of creating a protocol should be familiar. Moving on, we next implement
the core memoize functionality.
Listing 12.4 A basic cache type
(deftype BasicCache [cache]
CacheProtocol
(lookup [_ item]
(get cache item))
(has? [_ item]
(contains? cache item))
(hit [this item] this)
(miss [_ item result]
(BasicCache. (assoc cache item result))))
5 This section is motivated by the fantastic work of the brilliant Clojurians Meikel Brandmeyer, Christophe
Grand, and Eugen Dück summarized at http://kotka.de/blog/2010/03/memoize_done_right.html.
The BasicCache takes a cache on construction used for its internal operations. Test-
ing the basic caching protocol in isolation shows:
(def cache (BasicCache. {}))
(lookup (miss cache '(servo) :robot) '(servo))
;=> :robot
In the case of a miss, the item to be cached is added and a new instance of BasicCache
(with the cached entry added) is returned for retrieval using lookup. This is a simple
model for a basic caching protocol, but not terribly useful in isolation. We can go fur-
ther by creating an auxiliary function through, meaning in effect, “pass an element
through the cache and return its value”:
(defn through [cache f item]
(if (has? cache item)
(hit cache item)
(miss cache item (delay (apply f item)))))
With through, the value corresponding to a cache item (function arguments in this
case) would either be retrieved from the cache via the hit function, or calculated and
stored via miss. You’ll notice that the calculation (apply f item) is wrapped in a
delay call instead of performed outright or lazily through an ad hoc initialization
mechanism. The use of an explicit delay in this way helps to ensure that the value is
calculated only on first retrieval. With these pieces in place, we can then create a
PluggableMemoization type, as shown next.
Listing 12.5 A type implementing pluggable memoization
(deftype PluggableMemoization [f cache]
CacheProtocol
(has? [_ item] (has? cache item))
(hit [this item] this)
(miss [_ item result]
(PluggableMemoization. f (miss cache item result)))
(lookup [_ item]
(lookup cache item)))
The purpose of the PluggableMemoization type is to act as a delegate to an underlying
implementation of a CacheProtocol occurring in the implementations for hit, miss,
and lookup. Likewise, the PluggableMemoization delegation is interposed at the
protocol points to ensure that when utilizing the CacheProtocol, the Pluggable-
Memoization type is used and not the BasicCache. We’ve made a clear distinction
between a caching protocol fulfilled by BasicCache and a concretized memoization
fulfilled by PluggableMemoization and through. With the creation of separate abstrac-
tions, you can use the appropriate concrete realization in its proper context. Clojure
programs will be composed of various abstractions. In fact, the term abstraction-oriented
programming is used to describe Clojure’s specific philosophy of design.
The original manipulable-memoize function from section 11.4 is modified in the
following listing to conform to our memoization cache realization.
Listing 12.6 A function for applying pluggable memoization to a function
(defn memoization-impl [cache-impl]
(let [cache (atom cache-impl)]
(with-meta
(fn [& args]
(let [cs (swap! cache through (.f cache-impl) args)]
@(lookup cs args)))
{:cache cache})))
If you’ll recall from the implementation of the through function, we stored delay
objects in the cache, requiring that they be dereferenced when looked up. Returning to our
old friend the slowly function, we can exercise the new memoization technique as
shown:
(def slowly (fn [x] (Thread/sleep 3000) x))
(def sometimes-slowly (memoization-impl
(PluggableMemoization.
slowly
(BasicCache. {}))))
(time [(sometimes-slowly 108) (sometimes-slowly 108)])
; "Elapsed time: 3001.611 msecs"
;=> [108 108]
(time [(sometimes-slowly 108) (sometimes-slowly 108)])
; "Elapsed time: 0.049 msecs"
;=> [108 108]
You can now fulfill your personalized memoization needs by implementing pointed real-
izations of CacheProtocol, plugging them into instances of PluggableMemoization,
and applying them as needed via function redefinition, higher-order functions, or
dynamic binding. Countless caching strategies can be used to better support your needs,
each displaying different characteristics, or if needed your problem may call for some-
thing wholly new.
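As one sketch of such a strategy, here's a hypothetical bounded FIFO cache (the type name and eviction policy are our own invention, not part of the book's code) that could be plugged in wherever BasicCache is used:

```clojure
;; A hypothetical bounded cache: when full, evict the oldest entry (FIFO).
;; The queue is a vector of cached keys in insertion order.
(deftype FIFOCache [cache queue limit]
  CacheProtocol
  (lookup [_ item] (get cache item))
  (has? [_ item] (contains? cache item))
  (hit [this item] this)
  (miss [_ item result]
    (let [[oldest] queue
          [c q] (if (>= (count cache) limit)
                  [(dissoc cache oldest) (subvec queue 1)]
                  [cache queue])]
      (FIFOCache. (assoc c item result) (conj q item) limit))))

;; Usage, assuming memoization-impl, PluggableMemoization, and slowly from above:
;; (memoization-impl (PluggableMemoization. slowly (FIFOCache. {} [] 32)))
```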
We’ve only scratched the surface of memoization in this section in favor of provid-
ing a more generic substrate on which to build your own memoization strategies.
Using Clojure’s abstraction-oriented programming techniques, your own programs
will likewise be more generic and be built largely from reusable parts.
12.5 Understanding coercion
Although Clojure is a dynamically typed language, it does provide mechanisms for
specifying value types. The first of these mechanisms, type hints, was covered in sec-
tion 12.1. The second, coercion, is the subject of this section. Although the nature of
type hints and coercion are similar, their intended purposes are quite different. In the
case of coercion, its purpose is to get at the primitive data type for a value, which we’ll
show next.
12.5.1 First rule of coercion: don’t
Clojure’s compiler is sophisticated enough that in many ways it’ll be unnecessary to
coerce values into primitives. It’s often better to start with a function or code block
devoid of coercions. Unless your specific application requires the utmost speed in exe-
cution, it’s better to stick with the version that favors simplicity over the alternative.
But should you decide that coercion might be the choice for you, then this section will
provide guidance.
12.5.2 Corollary: you’re probably not doing it right
If you’ve determined that coercion can help, then it’s worth stressing that you have to
be careful when going down that road. In many cases with coercion, the act of adding
it can actually slow your functions. The reason lies in the nature of Clojure. Functional
composition leads to passing arguments back and forth between pieces, and in the cir-
cumstance of coercion you’re just boxing and unboxing6 from one call to the next.
This particular circumstance is especially devious within the body of a loop and it
suffers the same performance degradation observed in Java. Clojure’s unboxing is an
explicit7 operation performed using the coercion functions, so there’s a speck of light
there. Unfortunately, autoboxing is still a danger and should be avoided if speed is a
concern, as we’ll explore now:
(defn occur-count [words]
  (let [res (atom {})]
    (doseq [w words]
      (swap! res assoc w (+ 1 (@res w 0))))
    @res))
(defn roll [n d]
  (reduce + (take n (repeatedly #(inc (rand-int d))))))
(time (dorun (occur-count (take 1000000 (repeatedly #(roll 3 6))))))
; "Elapsed time: 4055.505 msecs"
The function occur-count will return a map of the occurrence counts8 found in a
given sequence. This fairly straightforward implementation uses the function roll to
populate a sequence with a million simulated rolls of three six-sided dice. But four sec-
onds seems like a long time to wait, so perhaps we can speed things up by using coer-
cions. An initial attempt to gain speed may be to pull out the stored count from the
table and coerce it into an int:
(defn occur-count [words]
  (let [res (atom {})]
    (doseq [w words]
      (let [v (int (@res w 0))]
        (swap! res assoc w (+ 1 v))))
    @res))
6 Autoboxing is the automatic conversion the Java compiler makes between the primitive types and their
corresponding object wrapper classes.
7 Except when directly or indirectly (via inlining or a macro body) calling a method.
8 Clojure has a function frequencies that does this, so we provide occur-count for illustrative purposes only.
(time (dorun (occur-count (take 1000000 (repeatedly #(roll 3 6))))))
; "Elapsed time: 4385.8 msecs"
Well, that didn’t work. The reason for a decrease in speed is that although we’re spec-
ifying the type at the outer loop, we haven’t reduced the need to box and unbox that
value further downstream in the roll function. We might then be led to try and opti-
mize the roll function too:
(defn roll [n d]
  (let [p (int d)]
    (reduce + (take n (repeatedly #(inc (rand-int p)))))))
(time (dorun (occur-count (take 1000000 (repeatedly #(roll 3 6))))))
; "Elapsed time: 4456.393 msecs"
;=> nil
Again we’ve made matters worse and have spread the problems over the surface of the
entire code. Being adventurous, we grasp for straws and attempt to force integer arith-
metic with roll by using the unchecked-inc function:
(defn roll [n d]
  (let [p (int d)]
    (reduce + (take n (repeatedly #(unchecked-inc (rand-int p)))))))
(time (dorun (occur-count (take 1000000 (repeatedly #(roll 3 6))))))
Go ahead and run that in the REPL, then go get some coffee and a bagel. Toast the
bagel. Eat the bagel. Drink the coffee. By that time, you might’ve received a result.
So what happened? In an attempt to be clever, we’ve confused the Clojure com-
piler into near unconsciousness. Instead of making direct calls to Clojure’s math func-
tions, we’re now making calls indirectly via Java reflection! You can observe this by
setting *warn-on-reflection* to true and reentering roll.
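To see the reflection warning for yourself at the REPL, something like the following should work; the exact warning text varies by Clojure version, so treat the comment as approximate:

```clojure
;; Turn on reflection warnings, then recompile the unchecked-inc version of roll.
(set! *warn-on-reflection* true)

(defn roll [n d]
  (let [p (int d)]
    (reduce + (take n (repeatedly #(unchecked-inc (rand-int p)))))))
;; The compiler emits a warning along these lines:
;; Reflection warning - call to unchecked-inc can't be resolved.
```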
How can we speed things up? The problem isn’t with coercion itself, but instead
with the implementations of roll and occur-count. You can observe significant
speed-ups by rethinking your original implementations first and then resorting to
coercion second. The use of coercion should always be preceded by a reevaluation of
your implementation, because often by doing so you can eliminate the need for coer-
cion altogether, as shown next.
Listing 12.7 Using coercion to gain speed
(defn roll [n d]
  (loop [n (int n), sum 0]   ; coerce n to int
    (if (zero? n)
      sum
      (recur (dec n) (+ sum (inc (rand-int d)))))))
(defn occur-count [words]
  (reduce #(assoc %1 %2 (inc (%1 %2 0))) {} words))
(time (dorun (occur-count (take 1000000 (repeatedly #(roll 3 6))))))
; "Elapsed time: 937.346 msecs"
;=> nil
By refactoring the original functions, we’ve gained a five-fold increase in speed and
yet used only a single coercion. Additionally, we’ve managed to make the new imple-
mentation faster while also maintaining clarity. This should be a general goal when
writing your Clojure code, and when forced to make a trade between the two, it might
be a good idea to favor clarity.
12.5.3 Second rule of coercion: don’t
In the previous example, there’s too much noise in collection and sequence operations
for primitive coercion to help much. This goes to show that it’s important to remember
that the Clojure compiler will often do a better job at optimization than you.
12.5.4 Third rule of coercion: coerce a stable local
When coercing a local to a primitive type, it’s tempting to do so at the point of use,
but this practice should be avoided. A good rule of thumb for coercion is to coerce
only within a local context via let, binding, or loop. This provides a stable value point
for the primitive, allowing you to reuse that same local elsewhere in the same function
without having to again coerce at different points of use. This can be illustrated by the
following:
(defn mean
  "Takes a sequence of integers and returns their mean value"
  [sq]
  (let [length (int (count sq))]
    (if (zero? length)
      0
      (/ (int (reduce + sq)) length))))
The length value has been bound in the let, allowing it to be reused twice within the
body of the function. This allows for a cleaner implementation than the alternative,
which coerces the result of (count sq) in multiple places. Using this advice and the
fact that Clojure provides lexical scope by default, you can also avoid the need to
define a name-mangled local by instead using let to rebind the original argument
names to coerced values: (defn f [x] (let [x (int x)] ...)).
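As a minimal sketch of that rebinding idiom (square-int is a hypothetical name used only for illustration):

```clojure
(defn square-int [x]
  (let [x (int x)]   ; rebind x to its coerced value once...
    (* x x)))        ; ...and reuse the stable local thereafter

(square-int 12)
;=> 144
```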
12.5.5 Fourth rule of coercion: watch your sizes
Primitive type coercions in Clojure act the same as type truncation in Java. If a given
value is coerced into a type that can’t hold a value of its magnitude, then data loss will
occur, and in the case of integer overflow, exceptions will be thrown.
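For example (behavior as observed on recent Clojure versions; exact exception text may differ):

```clojure
;; A long-sized magnitude can't fit in an int, so the coercion fails loudly:
(int 100000000000)
; java.lang.IllegalArgumentException: Value out of range for int

;; Floating-point coercion, by contrast, silently discards precision:
(float 1.23897398798929929872987890030893796768727987138M)
; truncated to roughly 1.238974
```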
12.5.6 Fifth rule of coercion: truncate only as a goal
By default, Clojure doesn’t limit the accuracy of mathematical operations, but this can
occur when using coercion. There will be many instances in your own projects when
speed is more important than accuracy in mathematical operations. Likewise, there
will also be times when truncation is necessary, especially when dealing with Java
library methods that take primitive types:
(Math/round 1.23897398798929929872987890030893796768727987138M)
; java.lang.IllegalArgumentException:
; No matching method found: round
When a method or function isn’t overloaded, the Clojure compiler can determine
whether an argument can be coerced to a primitive type and will do so if able. The
preceding exception arises from the fact that Math/round is overloaded, taking
either a float or a double argument. Therefore, you have to use coercion explicitly
to truncate the argument:
(Math/round (float 1.23897398798929929872987890030893796768727987138M))
;=> 1
Our goal in using the truncating operation float was to get a result that we knew
wouldn’t be affected by truncation. But many instances will arise when truncation
affects your results, often to your detriment. Be wary when using coercion, because it
propagates inaccuracies: limit its use to cases where truncation is the goal, and
document vigorously when it’s absolutely needed for speed.
Coercion can be an effective tool in your Clojure applications, but take care to be
sure you understand the caveats. If you take away one lesson from this section, let it be
this: do not rush to coercion.
12.6 Summary
Clojure provides numerous ways to gain speed in your applications. Using some com-
bination of type hints, transients, chunked sequences, memoization, and coercion,
you should be able to achieve noticeable performance gains. Like any powerful tool,
these performance techniques should be used cautiously and thoughtfully. But once
you’ve determined that performance can be gained, their use is minimally intrusive
and often natural to the unadorned implementation.
In the final chapter, we’ll cover a number of ways that the Clojure way of thinking
might be different from what you’re accustomed to. The discussion therein, when
explored with an open mind, will change the way that you write software.
Clojure changes
the way you think
This chapter covers
- DSLs
- Testing
- A lack of design patterns
- Error handling and debugging
- Fare thee well
In this final chapter, we cover some tangential topics that you might already be
familiar with, but perhaps not from a Clojure perspective. Our discussion will start
with domain-specific languages (DSLs) and the unique way that Clojure applica-
tions are built from a layering of unique application-specific DSLs. Next, you’re
unlikely to be ignorant of the general push toward a test-driven development
(TDD) philosophy, with a special focus on unit testing. We’ll explore why Clojure is
especially conducive to unit testing and why it’s often unnecessary. Next, whether
you agree with the cult of design patterns or not, it’s inarguable that patterns have
changed the way that object-oriented software is designed and developed. The clas-
sical design patterns are often invisible, or at times outright nonexistent in Clojure
code, which we’ll discuss in this chapter. As we’ll then show, error handling in Clojure
flows in two directions: from inner functions to outer via exceptions, and from outer
functions in via dynamic bindings. Finally, we’ll explore how having the entire lan-
guage at your disposal can help to change the way that your debugging occurs. We
hope that by the time you’ve finished this chapter, you’ll agree—Clojure changes the
way you think about programming.
13.1 DSLs
Lisp is not the right language for any particular problem. Rather, Lisp encourages
one to attack a new problem by implementing new languages tailored to that
problem.
—“Lisp: A Language for Stratified Design” (Abelson 1988)
In chapter 8, we explored the notion of a domain-specific language for describing
domains. This meta-circularity, while playful, was meant to make a subtle point: Clo-
jure blurs, and often obliterates, the line between DSL and API. When a language is
built from the same data structures that the language itself manipulates, it’s known as
homoiconic (Mooers 1965). When a programming language is homoiconic, it’s simple
to mold the language into a form that bridges the gap between the problem and solu-
tion domains. When designing DSLs in Clojure, it’s important to determine when the
existing language facilities will suffice (Raymond 2003) and when it’s appropriate to
create one from whole cloth (Ghosh 2010). In this section we’ll do both and provide a
little discussion about each.
13.1.1 A ubiquitous DSL
The declarative language SQL is among the most widespread DSLs in use today. In sec-
tion 1.2 we showed a simple Clojure DSL, which provided a simple subset of the SELECT
statement that created a representational SQL string. Though that particular example
was meant to be instructive, Clojure provides a comprehensive library for relational
algebra, on which SQL is based (Date 2009). Imagine a dataset of the following:
(def artists
  #{{:artist "Burial"  :genre-id 1}
    {:artist "Magma"   :genre-id 2}
    {:artist "Can"     :genre-id 3}
    {:artist "Faust"   :genre-id 3}
    {:artist "Ikonika" :genre-id 1}
    {:artist "Grouper"}})

(def genres
  #{{:genre-id 1 :genre-name "Dubstep"}
    {:genre-id 2 :genre-name "Zeuhl"}
    {:genre-id 3 :genre-name "Prog"}
    {:genre-id 4 :genre-name "Drone"}})
You can try Clojure’s relational functions by entering the examples shown in the fol-
lowing listing.
Listing 13.1 Examples of Clojure’s relational algebra functions
(require '[clojure.set :as ra])

(def ALL identity)

(ra/select ALL genres)
;=> #{{:genre-id 4, :genre-name "Drone"}
;     {:genre-id 3, :genre-name "Prog"}
;     {:genre-id 2, :genre-name "Zeuhl"}
;     {:genre-id 1, :genre-name "Dubstep"}}

(ra/select #(#{1 3} (:genre-id %)) genres)
;=> #{{:genre-id 3, :genre-name "Prog"}
;     {:genre-id 1, :genre-name "Dubstep"}}

(take 2 (ra/select ALL (ra/join artists genres)))
;=> ({:artist "Burial", :genre-id 1, :genre-name "Dubstep"}
;    {:artist "Magma", :genre-id 2, :genre-name "Zeuhl"})
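Beyond select and join, clojure.set also supplies project and rename, which play the roles of SQL’s column list and AS clause. A brief sketch using the genres set defined earlier (set output order may vary):

```clojure
(require '[clojure.set :as ra])

;; project keeps only the named keys, like SELECT genre-name FROM genres:
(ra/project genres [:genre-name])
;=> #{{:genre-name "Drone"} {:genre-name "Prog"}
;     {:genre-name "Zeuhl"} {:genre-name "Dubstep"}}

;; rename maps old keys to new ones, like SELECT genre-name AS name:
(ra/rename genres {:genre-name :name})
```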
The relational functions in clojure.set are a perfect example of the way that Clojure
blurs the line between API and DSL. No macro tricks are involved, but through the
process of functional composition, the library provides a highly expressive syntax
matching closely (Abiteboul 1995) that of SQL itself. Though you might be tempted to
create a custom query language for your own application(s), there are times when the
relational functions are exactly what you need. Your time might be better spent solv-
ing actual problems, one of which we’ll cover in the following section.
13.1.2 Putting parentheses around the specification
Many applications deal in measurements of differing units. For example, it’s widely
known that the U.S. works almost exclusively in English units of measure, whereas
most of the rest of the planet works in SI, or metric units. To convert1 from one to the
other isn’t an arduous task and can be handled easily with a set of functions of this
general form:
(defn meters->feet [m] (* m 3.28083989501312))
(defn meters->miles [m] (* m 0.000621))
(meters->feet 1609.344)
;=> 5279.9999999999945
(meters->miles 1609.344)
;=> 0.999402624
This approach will certainly work if only a few functions define the extent of your
conversion needs. But if your applications are like ours, then you probably need to
convert to and from differing units of measure of many different magnitudes. You
may also need to convert back and forth between units of time, dimension, orienta-
tion, and a host of others. Therefore it’d be nice to be able to write a specification of
unit conversions (Hoyte 2008) as a Clojure DSL and use its results as a low-level layer
1 A spectacular general-purpose JVM language named Frink excels at conversions of many different units. We
highly advocate exploring Frink at your next available opportunity: http://futureboy.us/frinkdocs/.
for high-layer application specifics. This is precisely the nature of Lisp development in
general—each level in an application provides the primitive abstractions for the levels
above it.
In this section, we’re going to create a small specification and then convert it into a
Clojure DSL using a technique coined by Rainer Joswig as “putting parentheses
around the specification.”
DEFUNITS
An ideal representation for a unit-conversion specification language would be simple:
Our base unit of distance is the meter. There are 1,000 meters in a kilometer. There
are 100 centimeters in a meter. There are 10 millimeters in a centimeter. There are
3.28083 feet in a meter. And finally, there are 5,280 feet in a mile.
Of course, to make sense of free text is a huge task in any language, so it behooves us
to change it so that it’s easier to reason about programmatically, but not so much that
it’s cumbersome for someone attempting to describe unit conversions. As a first pass,
we’ll try to group the most obvious parts using some Clojure syntactical elements:
(Our base unit of distance is the :meter
[There are 1000 :meters in a :kilometer]
[There are 100 :centimeters in a :meter]
[There are 10 :millimeters in a :centimeter]
[There are 3.28083 :feet in a :meter]
[There are 5280 :feet in a :mile])
This specification is starting to look a little like Clojure code, but it would still be diffi-
cult to parse this into a usable form. Likewise, it’ll be difficult for the person writing
the specification to use the correct terms, avoid spelling mistakes, properly punctuate,
and so forth. In a word, this form is still not useful. It’d be ideal if we could make this
into a form that’s still recognizable to both Clojure and a conversion expert. We’ll try
one more time:
(define unit of distance
{:m 1,
:km 1000,
:cm 1/100,
:mm [1/10 of a :cm],
:ft 0.3048,
:mile [is 5280 :ft]})
This almost looks like Clojure source code, except for a few minor details. We’ve
changed the measure of feet from an “in a” relationship to a relative one with regard
to the meter base unit. Also, a vector indicates the use of a different relative unit,
keeping the DSL regular in its meaning between one conversion and the next and
providing a way to describe intermediate relative units of measure. Those definitions
look like a map, so we should write a utility function that takes a unit and a map like
the preceding one and returns the number of units it takes to compose the base unit.
Listing 13.2 A function for calculating compositional units of a base unit
(defn relative-units [u units]
  (let [spec (u units)]                    ; look up unit spec
    (if (nil? spec)
      (throw (Exception. (str "Undefined unit " u)))
      (if (vector? spec)
        (let [[conv to] spec]              ; deconstruct relative spec
          (* conv
             (relative-units to units)))   ; multiply relative units
        spec))))
The function relative-units goes through the map units looking up units and mul-
tiplying their compositional values. When it finds an indirect specification (such as
millimeters defined in terms of centimeters), it traverses the chain of indirect refer-
ences, multiplying the factors along the way, as shown:
(relative-units :m {:m 1 :cm 100 :mm [10 :cm]})
;=> 1
(relative-units :cm {:m 1 :cm 100 :mm [10 :cm]})
;=> 100
(relative-units :mm {:m 1 :cm 100 :mm [10 :cm]})
;=> 1000
We changed the unit conversions map to remove the natural language phrase “in a,”
because English isn’t good for a DSL. Natural language often lacks the precision that a
simple yet regular form has. Now that we have the auxiliary function created, we’d like
to create a macro to interpret the unit specification as shown:
(defunits-of distance :m
  :km 1000
  :cm 1/100
  :mm [1/10 :cm]
  :ft 0.3048
  :mile [5280 :ft])
This is a simplification versus the original verbal form of the conversion specification.
This final form is indubitably more conducive to parsing, yet doesn’t appreciably sacri-
fice readability. The implementation of the defunits-of macro is presented in the
following listing.
Listing 13.3 A defunits-of macro
(defmacro defunits-of [name base-unit & conversions]
  (let [magnitude (gensym)
        unit (gensym)
        units-map (into `{~base-unit 1}                   ; create units map
                        (map vec (partition 2 conversions)))]
    `(defmacro ~(symbol (str "unit-of-" name))
       [~magnitude ~unit]
       `(* ~~magnitude                                    ; multiply magnitude
           ~(case ~unit                                   ;   by target unit
              ~@(mapcat                                   ; unroll unit conversions
                  (fn [[u# & r#]]                         ;   into case lookup
                    `[~u# ~(relative-units u# units-map)])
                  units-map))))))
The macro defunits-of is different than any macro that you’ve seen thus far, but it’s
typical for macros that expand into another macro definition. In this book you’ve yet
to see a macro that builds another macro and uses multiple levels of nested2 syntax-
quotes. You won’t likely see macros of this complexity often, but in this case we use
nested syntax-quotes so that we can feed structures from the inner layers of the nested
macros to the outer layers, processing each fully before proceeding. At this point, we
can now run a call to the defunits-of macro with the simplified metric to English
units conversion specification to define a new macro named unit-of-distance:
(unit-of-distance 1 :m)
;=> 1
(unit-of-distance 1 :mm)
;=> 1/1000
(unit-of-distance 1 :ft)
;=> 0.3048
(unit-of-distance 1 :mile)
;=> 1609.344
Perfect! Everything is relative to the base unit :m, just as we’d like (read as “how many
meters are in a _”). The generated macro unit-of-distance allows you to work in
your given system of measures relative to a standard system without loss of precision or
the need for a bevy of awkward conversion functions. To calculate the distance a home
run hit by the Orioles’ Matt Wieters travels in Canada is a simple call to (unit-of-
distance 441 :ft) away. The expansion of the distance specification given as
(defunits-of distance :m ...) looks approximately like the following:
(defmacro unit-of-distance [G__43 G__44]
  `(* ~G__43
      (case ~G__44
        :mile 1609.344
        :km 1000
        :cm 1/100
        :m 1
        :mm 1/1000
        :ft 0.3048)))
The defunits-of macro is an interpreter of the unit-conversion DSL, which generates
another macro unit-of-distance that performs a straightforward lookup of relative
unit values. Amazingly, the expansion given by (macroexpand '(unit-of-distance 1
:cm)) is that of a simple multiplication (* 1 1/100). This is an awe-inspiring revela-
tion. What we’ve managed to achieve is to fuse the notion of compilation and evalua-
tion by writing a relative units of measure “mini-language” that’s interpreted into a
simple multiplication at compile time!

2 We talked briefly about making sense out of nested syntax-quotes in section 8.1. However, you’re not likely
to need them very often.
This is nothing new; Lisp programmers have known about this technique for
decades, but it never ceases to amaze. There’s one downside to our implementation—
it allows for circular conversion specifications (seconds defined in terms of minutes,
which are then defined in terms of seconds), but this can be identified and handled in
relative-units if you’re so inclined.
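One way to add that safeguard is to thread a set of already-visited units through the recursion. This sketch keeps the original two-argument interface intact; the seen accumulator and the error message are our additions, not part of the original implementation:

```clojure
(defn relative-units
  ([u units] (relative-units u units #{}))
  ([u units seen]
   (when (seen u)                                     ; revisiting a unit means a cycle
     (throw (Exception. (str "Circular unit definition involving " u))))
   (let [spec (u units)]
     (if (nil? spec)
       (throw (Exception. (str "Undefined unit " u)))
       (if (vector? spec)
         (let [[conv to] spec]
           (* conv (relative-units to units (conj seen u))))
         spec)))))

(relative-units :s {:s [60 :min], :min [60 :s]})
; java.lang.Exception: Circular unit definition involving :s
```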
13.1.3 A note about Clojure’s approach to DSLs
DSLs and control structures implemented as macros in Common Lisp tend to be
written in a style more conducive to macro writers. But Clojure macros such as
defunits-of, cond, and case are idiomatic in their minimalism: their component
parts are paired and meant to be grouped through proper spacing. Clojure macro
writers should understand that the proliferation and placement of parentheses are
legitimate concerns for some, and as a result you should strive to reduce their number
whenever possible. Why would you explicitly group your expressions when their
groupings are only a call to partition away?
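To make the point concrete, here is that partition call applied to the kind of flat pair list defunits-of receives:

```clojure
;; The caller writes flat pairs; the macro recovers the grouping itself:
(partition 2 [:km 1000 :cm 1/100 :mm [1/10 :cm]])
;=> ((:km 1000) (:cm 1/100) (:mm [1/10 :cm]))
```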
CLOJURE APHORISM If a project elicits a sense of being lost, then start from
the bottom up.
DSLs are an important part of a Clojure programmer’s toolset and stem from a long
Lisp tradition. When Paul Graham talks about “bottom-up programming” in his
perennial work On Lisp, this is what he’s referring to. In Clojure, it’s common practice
to start by defining and implementing a low-level language specifically for the levels
above. Creating complex software systems is hard, but using this approach, you can
build the complicated parts out of smaller, simpler pieces.
Clojure changes the way that you think.
13.2 Testing
Object-oriented programs can be highly complicated beasts to test properly, thanks to
mutating state coupled with the need to test across class hierarchies. Programs are a
vast tapestry of interweaving execution paths, and to test each path comprehensively is
difficult, if not impossible. In the face of unrestrained mutation, the execution paths
are overlaid with mutation paths, further adding to the chaos. Conversely, Clojure
programs tend to be compositions of pure functions with isolated pools of mutation.
The result of this approach helps to foster an environment conducive to unit testing.
But though the layers of an application are composed of numerous functions, each
individually and compositionally tested, the layers themselves and the wiring between
them must also be tested.
Test-driven development (Beck 2002) has conquered the software world, and at its
core it preaches that test development should drive the architecture of the overall
application. Unfortunately, this approach isn’t likely to bear fruit in your Clojure
programs. Instead, Clojure provides the foundation for a contracts-based program
specification that’s more amenable for writing correct programs. But before we dis-
cuss contracts, we’ll touch on the ways Clojure facilitates one part of TDD, unit testing.
13.2.1 Some useful techniques
We don’t want to disparage test-driven development, because its goals are virtuous and
testing in general is essential. Because Clojure programs are organized using
namespaces, and they are themselves aggregations of functions, often pure, the act of
devising a unit-test suite at the namespace boundary is often mechanical in its direct-
ness. From a larger perspective, devising comprehensive test strategies is the subject of
numerous volumes and therefore outside of the scope of this book; but there are a few
Clojure-specific techniques that we wish to discuss.
USING WITH-REDEFS TO STUB
Stubbing (Fowler 2007) is the act of supplying an imitation implementation of a func-
tion for testing purposes. One mechanism that can perform this stubbing is the
with-redefs macro implemented in the following listing. Though this exact macro
will likely be included in future versions of Clojure, it’s not in Clojure 1.2, so a defini-
tion is provided.
Listing 13.4 Macro to aid in mocking
(defn with-redefs-fn [binding-map func & args]
  (let [root-bind (fn [m]
                    (doseq [[a-var a-val] m]
                      (.bindRoot a-var a-val)))
        old-vals (zipmap (keys binding-map)
                         (map deref (keys binding-map)))]
    (try
      (root-bind binding-map)
      (apply func args)
      (finally
        (root-bind old-vals)))))

(defmacro with-redefs [bindings & body]
  `(with-redefs-fn ~(zipmap (map #(list `var %) (take-nth 2 bindings))
                            (take-nth 2 (next bindings)))
     (fn [] ~@body)))
The function rss-children from section 11.6 parses a Twitter RSS2 feed, returning a
sequence of the top-level feed elements. Testing functions that rely on rss-children
is futile against live Twitter feeds, so a stubbed implementation returning a known
sequence would be more prudent, as shown next.
Listing 13.5 Using with-redefs to create stubs
(defn tweetless-rss-children [s]                      ; create stub
  '({:tag :title, :attrs nil, :content ["Stub"]}))

(defn count-rss2-children [s]
  (count (rss-children s)))

(with-redefs [rss-children tweetless-rss-children]    ; dynamically bind with stub
  (count-rss2-children "dummy"))
;=> 1
The tweetless-rss-children function returns a sequence of some canned data.
Therefore, when testing the count-rss2-children function we temporarily change
the value of rss-children so that it resolves to tweetless-rss-children instead. This
change is made at the root of the rss-children Var and so is visible to all threads. As
long as all the test calls to it are made before control leaves the with-redefs form, the
stub will be invoked every time. Because tweet-occurrences doesn’t return until it col-
lects results from all the futures it creates, it will use the redef given by with-redefs:
(with-redefs [rss-children tweetless-rss-children]
  (tweet-occurrences "dummy" "test-url"))
;=> 0
Another option that is sometimes suggested is to use binding in place of with-redefs.
This would push a thread-local binding for rss-children, which might seem attrac-
tive in that it could allow other threads to bind the same Var to a different stub func-
tion, potentially for simultaneously running different tests. But because tweet-
occurrences uses futures, the other threads will be calling rss-children and will see
the root binding rather than the stub,3 causing an error:
(binding [rss-children tweetless-rss-children]
  (tweet-occurrences "dummy" "test-url"))
; java.util.concurrent.ExecutionException:
;   java.io.FileNotFoundException: test-url
When the root binding of rss-children runs, it tries to actually load “test-url” and
fails, instead of calling our stub and succeeding. The with-redefs macro is a better
solution for mocking.
CLOJURE.TEST AS SPECIFICATION
Clojure ships with a testing library in the clojure.test namespace used to create test
suites that can further serve as partial system specifications. We won’t provide a com-
prehensive survey of the clojure.test functionality, but you should get a feel for how
it works. Unit-test specifications in Clojure are declarative in nature, as shown next.
Listing 13.6 clojure.test as a partial specification
(require '[clojure.test :as test])

(test/deftest feed-tests
  (with-redefs [rss-children tweetless-rss-children]
    (test/testing "RSS2 Child Counting"
      (test/is (= 1000 (count-rss2-children "dummy"))))
    (test/testing "Twitter Occurrence Counting"
      (test/is (= 0 (count-tweet-text-task "#clojure" ""))))))

(defn test-ns-hook []
  (feed-tests))

3 Alpha versions of Clojure 1.3 handle binding’s interaction with future and agent send differently, passing
dynamic bindings through to code executed in these other thread contexts. But because these are not the
only kinds of threads that can be spawned, with-redefs (which may be included in Clojure 1.3) is still
recommended for mocking out functions during tests.
Clojure’s test library provides a DSL for describing unit test cases. If you’ll notice, we
added a failing test to the RSS2 Child Counting test so that when run, the test will fail
as expected:
(test/run-tests 'user)
; Testing user
;
; FAIL in (feed-tests) (NO_SOURCE_FILE:101)
; RSS2 Child Counting
; expected: (= 1000 (count-rss2-children "dummy"))
; actual: (not (= 1000 1))
;
; Ran 1 tests containing 2 assertions.
; 1 failures, 0 errors.
;=> {:type :summary, :test 1, :pass 1, :fail 1, :error 0}
Though tests are a good way to find some errors, they make few guarantees that the sys-
tem works properly. The ideal approach is the design and implementation of a frame-
work corresponding closely with the domain of the application itself. This framework
would ideally take the literal form of a domain DSL built incrementally through an
interaction with domain experts and must come before testing begins. No amount of
testing can substitute for thoroughly thinking through the fundamental design
details. That’s not to say that the domain DSLs should be fully realized from the start;
instead, the form of the DSL and its constituent parts should be reflective of the actual
domain. In our experience, there are no languages comparable to Clojure for this
kind of domain modeling, save for perhaps Haskell, Factor, and Scala. Having said
that, the domain isn’t simply defined by the shape of its language; it also includes its
expectations, which we’ll discuss presently.
13.2.2 Contracts programming
Test-driven development is in many ways a heuristic affair. People tend to test only the
error conditions and expectations that they can conceptualize. Surely there’s no such
thing as an exhaustive test suite, and in many cases test suites tend toward a local
maximum. There’s a better way to define semantic expectations within your application:
using Clojure pre- and postconditions.
REVISITING PRE- AND POSTCONDITIONS
In section 7.1, we explored Clojure’s pre- and postcondition facility. Function con-
straint specification is a conceptually simple model for declaring the expectations for
any given function. Function constraints can cover the full range of expected condi-
tions imposed on the function’s inputs, its outputs, and their relative natures. The
beauty of specifying constraints is that they can augment a testing regimen with the
application of random values. The reason this works is that you can effectively throw out
the values that fail the preconditions and instead focus on the values that cause errors in the postconditions. We'll try this approach for a simple function to square a number:
(def sqr (partial
           (contract sqr-contract
             [n]
             (require (number? n))
             (ensure (pos? %)))
           #(* % %)))
[(sqr 10) (sqr -9)]
;=> [100 81]
The contract for sqr states simply: require a number and ensure that its return is positive. Now we can create a simple test driver⁴ that throws many random values at it to see if it breaks:
(doseq [n (range Short/MIN_VALUE Short/MAX_VALUE)]
  (try
    (sqr n)
    (catch AssertionError e
      (println "Error on input" n)
      (throw e))))
; Error on input 0
;=> java.lang.AssertionError: Assert failed: (pos? %)
Even when adhering to the tenets of the preconditions, we've uncovered an error in the sqr function at the postcondition end. Postconditions should be viewed as the guarantee of the return value, given that the preconditions are met. The reason for the postcondition error is that the function's contract doesn't specify that the number n should be nonzero. By adding a check for zero (not= 0 n) to the preconditions, we can guarantee that the sqr function acts as expected. Performing this same verification using unit testing is trivial in this case, but what if the edge condition weren't as obvious? In that case, it's probable that the error wouldn't be caught until it's too late. Of course, there's no guarantee that your contracts are comprehensive, but that's why domain expertise is often critical when defining them.
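To make the fix concrete, here's a minimal sketch of the repaired constraint, expressed with Clojure's built-in :pre/:post condition maps from section 7.1 rather than the contract macro used above:

```clojure
;; sqr with the missing nonzero precondition added, using raw
;; :pre/:post maps instead of the contract macro.
(defn sqr [n]
  {:pre  [(number? n) (not= 0 n)]
   :post [(pos? %)]}
  (* n n))

(sqr -9)
;=> 81

;; (sqr 0) now fails fast with an AssertionError at the precondition,
;; rather than slipping through to the postcondition.
```

Running the same test driver over this version reports the offending input at the precondition boundary, which is exactly where the contract's author should be told about it.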
ADVANTAGES OF PRE- AND POSTCONDITIONS
Function constraints aren't code. They take the form of code, but that's only a matter of representation. Instead, constraints should be viewed as a specification language describing expectations and result assurances. Unit tests, by contrast, are code, and code has bugs. Contracts are essential semantic coupling.
Another potential advantage of contracts over tests is that in some cases, tests can be generated from the contracts themselves. Also, pre- and postconditions are amenable to being expressed as an overall description of the system itself, which can thus be fed into a rule base for query and verification. Both of these cases are outside the scope of this book, but you shouldn't be surprised if they make their way into future versions of Clojure. There's tremendous potential in Clojure's pre- and postconditions. Though they're currently low-level constructs, they can be used to express full-blown design-by-contract facilities for your own applications.

⁴ For the sake of highlighting this technique, we've simplified our test driver. Testing a limited range of input values might not be an appropriate approach in all circumstances.
Clojure changes the way that you think.
13.3 A lack of design patterns
Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp.
—Greenspun's Tenth Rule
The book Design Patterns: Elements of Reusable Object-Oriented Software (Gamma et al. 1995) was a seminal work of software design and development. You'd be hard-pressed to find a software programmer in this day and age who's not familiar with this work. The book describes 23 software design patterns distilled over the course of experience in developing software projects of varying sizes.
Design patterns have obtained a bad reputation in some circles, whereas in others
they’re considered indispensable. From our perspective, design patterns are a way to
express software best practices in a language-neutral way. But where patterns fall short
is that they don’t represent pure abstraction. Instead, design patterns have come to be
viewed as goals in and of themselves, which is likely the source of the antagonism
aimed at them. The ability to think in abstractions is an invaluable skill for a software
programmer to strengthen. In this section, we'll attempt to dissuade you from viewing Clojure features as design patterns (Norvig 1998) and encourage you to see them instead as an inherent, nameless quality.
13.3.1 Clojure’s first-class design patterns
Most, if not all, of the patterns listed in the book are applicable to functional programming languages in general, and to Clojure in particular. But at their most pragmatic, the patterns described are aimed at patching deficiencies in popular object-oriented programming languages. This practical view of design patterns isn't directly relevant to Clojure, because in many ways the patterns are ever-present and are first-class citizens of the language itself. We won't provide a comprehensive survey of the ways that Clojure implements or eliminates popular design patterns, but we'll provide enough to make our point.
OBSERVER
Clojure’s add-watch and remove-watch functions provide the underpinnings of an
observer (publisher/subscriber) capability based on reference types. We can illustrate
this through the implementation of the simple defformula macro shown in listing 13.7.
Download from Wow! eBook <www.wowebook.com>
304
CHAPTER 13 Clojure changes the way you think
Listing 13.7 A macro to create spreadsheet-cell-like formulas

(defmacro defformula [nm bindings & formula]
  `(let ~bindings
     (let [formula# (agent ~@formula)                ; create formula as Agent
           update-fn# (fn [key# ref# o# n#]
                        (send formula# (fn [_#] ~@formula)))]
       (doseq [r# ~(vec (take-nth 2 bindings))]
         (add-watch r# :update-formula update-fn#))  ; add watch to each reference
       (def ~nm formula#))))

(def h (ref 25))
(def ab (ref 100))

(defformula avg [at-bats ab hits h]                  ; create baseball formula
  (float (/ @hits @at-bats)))

@avg
;=> 0.25

(dosync (ref-set h 33))                              ; update dependent Ref
;=> 33

@avg
;=> 0.33                                             ; observe formula change
By using watchers on references, you can use defformula to provide an abstract value that changes when any of its parts change. A more traditional Lisp approach is to provide predefined hooks (Glickstein 1997) that are called at certain times within the execution cycle. In addition, using proxy or gen-class to extend java.util.Observable is the most straightforward way to wire into existing source code that uses the Observer pattern.
STRATEGY
Algorithm strategies selected at runtime are common practice in Clojure, and there are a number of ways to implement them. One such way is via continuation-passing style, as we explored in section 7.3. A more general solution is to pass the desired function as an argument to a higher-order function, as you'd see in the ubiquitous map, reduce, and filter functions. Further, in the next section we'll provide a case of dynamic error functions illustrating how Clojure's multimethods are a more powerful substitute for the classic Strategy pattern.
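As a small illustration of the higher-order-function route (our own sketch, not from the original text), the ranking "strategy" below is selected at runtime simply by passing a different comparator to sort-by:

```clojure
;; The ranking algorithm is parameterized by a comparator "strategy";
;; callers choose the strategy at runtime.
(defn rank [strategy players]
  (sort-by :score strategy players))

(def players [{:name "Ed" :score 3} {:name "Flo" :score 1}])

(rank compare players)             ; ascending by :score
(rank (comp - compare) players)    ; descending, by flipping the comparator
```

No Strategy classes or interfaces are needed; any two-argument comparison function is a valid strategy.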
VISITOR
The Visitor pattern describes a way to decouple operations on a structure from the structure itself. Even casual observers will see the parallel to Clojure's multimethods, protocols, types, proxies, and reify features.
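For instance, a multimethod keyed on a node's :type tag keeps the operation separate from the tree it walks. This is our own minimal sketch, not an example from the original text:

```clojure
;; The "visitor" is a multimethod dispatching on each node's :type,
;; so new operations can be added without touching the node structures.
(defmulti visit :type)
(defmethod visit :leaf [node] (:value node))
(defmethod visit :node [node] (reduce + (map visit (:children node))))

(visit {:type :node
        :children [{:type :leaf :value 1}
                   {:type :leaf :value 2}]})
;=> 3
```

A second operation (say, counting leaves) would be a second defmulti, with no change to the node maps themselves.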
ABSTRACT FACTORY
The Abstract Factory pattern describes a way to create related objects without having to name explicit types at the point of creation. Clojure's types avoid the creation of explicit hierarchies (although ad hoc hierarchies can be created, as seen in section 9.2). Therefore, in Clojure this particular usage scenario is relegated to Java interoperability contexts. But the use of factory functions to abstract the
call to the constructors of types and records is idiomatic and in fact actively promoted. The reasons for a Clojure-style factory are to simplify the importing requirements of a type or record, and to add project-specific functionality to the constructor (keyword arguments, default values, and so on).
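A factory function of the kind described might look like the following sketch (the record and its fields are hypothetical):

```clojure
(defrecord Player [name score])

;; A factory fn hiding the positional ->Player constructor behind
;; keyword arguments, with a default value for :score.
(defn make-player [& {:keys [name score] :or {score 0}}]
  (->Player name score))

(make-player :name "Sal")
;; returns a Player with :name "Sal" and the default :score 0
```

Callers import only make-player; the record type itself stays an implementation detail that can change without touching call sites.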
INTERPRETER
The Interpreter pattern is in every way Greenspun's Tenth Rule formalized. Many projects of sufficient size can be well served by the inclusion of a specialized grammar describing parts of the system itself. Clojure macros make the creation of specialized grammars a first-class part of the language.
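Even a tiny macro defines a "grammar" of sorts. This sketch (ours, not the book's) rewrites infix arithmetic into the prefix form that Clojure evaluates:

```clojure
;; A one-rule interpreter: (infix (a op b)) is rewritten at macro
;; expansion time into the prefix call (op a b).
(defmacro infix [[a op b]]
  (list op a b))

(infix (1 + 2))
;=> 3
```

The destructuring in the macro's argument vector is the entire parser; a fuller grammar would recurse on nested forms.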
BUILDER
The creation of complex structures from a data representation is central to Clojure programming, although it's viewed differently than the similar object-oriented approach, the Builder pattern. In section 8.4, we used a simple data representation as the input to Clojure's clojure.xml/emit function to produce an analogous XML representation. If you preferred a different output representation, you could write another conversion function. If you preferred finer control over the constituent parts, you could write functions or multimethods for each and specialize at runtime.
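As a reminder of the shape of that approach (a minimal sketch; the element data here is invented), the "builder" input is just a plain map handed to clojure.xml/emit:

```clojure
(require '[clojure.xml :as xml])

;; The XML "builder" input is ordinary data: nested maps of
;; :tag, :attrs, and :content.
(def person {:tag :person
             :attrs {:id "1"}
             :content [{:tag :name, :content ["Sal"]}]})

;; emit prints the XML serialization to *out*.
(print (with-out-str (xml/emit person)))
```

Because the intermediate form is plain data, swapping emit for a JSON or EDN writer changes the output format without touching the construction code.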
FAÇADE
The use of Clojure namespaces, as seen in section 9.1, is the most obvious way to provide a simplified façade over a more complex API. You can also use the varying levels of encapsulation (as outlined in section 2.4) for more localized façades.
ITERATOR
Iteration in Clojure is defined through adherence to the seq protocol, as outlined in section 5.1 and later elaborated on in section 9.3 on types and protocols.
DEPENDENCY INJECTION
Though not a classical pattern in the Design Patterns sense, dependency injection has
become a de facto pattern for object-oriented languages that don’t allow overridable
class constructors. This condition requires that separate factory methods and/or
classes create concrete instances conforming to a given interface. In forsaking the
ability to define classes, Clojure completely avoids the problem that DI solves. Instead,
Clojure's closest analogue to this "pattern" is the use of functions returning closures that are specialized on their original arguments. You could likewise use partial application and composition.
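Such a closure-based analogue might look like this sketch (the service and fetch functions are hypothetical, invented for illustration):

```clojure
;; "Inject" a collaborator by passing it to a function that returns
;; a closure specialized on it.
(defn make-lookup-service [fetch]
  (fn [id] (str "result: " (fetch id))))

;; Production wiring and test wiring differ only in the argument.
(def live-service (make-lookup-service #(str "db-row-" %)))
(def test-service (make-lookup-service (constantly "stub")))

(live-service 42)
;=> "result: db-row-42"
(test-service 42)
;=> "result: stub"
```

No container or configuration framework is involved; the "injection point" is an ordinary function parameter.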
We could go further with this survey, but to do so would belabor the point: most of what are known as design patterns are either invisible or trivial to implement in Clojure. But what about the Prototype pattern, you ask? We implemented the UDP in section 9.2. Decorator or Chain of Responsibility? Why not use a macro that returns a function built from a list of forms spliced into the -> or ->> macro? Proxies would likely be implemented as closures, and so would commands. The list goes on and on, and in the end you must face the inevitable: Clojure changes the way that you think.
13.4 Error handling and debugging
Our goal throughout this book has been to show the proper way to write Clojure code, while mostly deferring, with some hand-waving, the topics of error handling and debugging. In this section, we'll cover these topics with what you might view as a unique twist, depending on your programming background.
13.4.1 Error handling
As we showed in figure 10.7, there are two directions for handling errors. The first,
and likely most familiar, refers to the passive handling of exceptions bubbling outward