If you’ll notice, we used the :refer-clojure namespace directive to :exclude the array and sequence functions that the SafeArray protocol overrides. We did this not only because it’s important to know how to use :refer-clojure, but also because we’re changing the semantics of aset to take a mutating function as its last argument instead of a raw value. We then used the :require directive to alias the Clojure namespace as clj, thus avoiding the need to use fully qualified function names à la clojure.core/aget.

The dumb array created by make-dumb-array is stored in a closure created by reify, and unguarded access is provided without concern for concurrent matters. Using this implementation across threads is disastrous, as shown:

(defn pummel [a]
  (dothreads! #(dotimes [i (count a)] (aset a i inc)) :threads 100))


(def D (make-dumb-array Integer/TYPE 8))

(pummel D)

;; wait for pummel to terminate

(seq D)

;=> (82 84 65 63 83 65 83 87)

This is very wrong—100 threads incrementing concurrently should result in 100 for each array slot. To add insult to injury, Clojure didn’t throw a ConcurrentModificationException as you might’ve expected, but instead just silently went along doing very bad things. Next, we’ll talk a little about locking and provide an alternate implementation for SafeArray using locking primitives.

11.5.1 Safe mutation through locking

Currently, the only[5] way to safely modify and see consistent values for a mutable object across threads in Clojure is through locking.

REFERENCES AROUND EVIL MUTABLE THINGS

Wrapping a mutable object in a Clojure reference type provides absolutely no guarantees for safe concurrent modification. Doing this will at best explode immediately or, worse, provide inaccurate results.

If at all possible, locking should be avoided; but for those times when it’s unavoidable, the locking macro will help. The locking macro takes a single parameter acting as the locking monitor and a body that executes in the monitor context. Any writes and reads to the monitor object are thread safe, and as a bonus the monitor is always released at the end of the block. One of the major complexities in concurrent programming using locks is that all errors must be handled fully and appropriately; otherwise you risk orphaned locks, and orphaned locks spell deadlock. But the locking macro will always release the lock, even in the face of exceptions.
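For the curious, that release guarantee comes straight from the macro’s expansion: locking is built on Clojure’s monitor-enter and monitor-exit special forms wrapped in try/finally. The sketch below mirrors that expansion in simplified form (locking-sketch is our own name for it, not part of Clojure):

```clojure
;; A simplified mirror of clojure.core/locking's expansion:
(defmacro locking-sketch [x & body]
  `(let [lockee# ~x]
     (try
       (monitor-enter lockee#)      ; acquire the object's monitor
       ~@body
       (finally
         (monitor-exit lockee#))))) ; released even if the body throws

;; Example use against an evil mutable thing:
(def stats (java.util.HashMap.))
(locking-sketch stats (.put stats :hits 1))
(locking-sketch stats (.get stats :hits))
;=> 1
```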

Listing 11.5 An implementation of the SafeArray protocol using the locking macro

(defn make-safe-array [t sz]
  (let [a (make-array t sz)]                    ; Array creation is the same
    (reify
      SafeArray
      (count [_] (clj/count a))
      (seq [_] (clj/seq a))
      (aget [_ i]
        (locking a                              ; aget is locked
          (clj/aget a i)))
      (aset [this i f]
        (locking a                              ; aset is locked...
          (clj/aset a i (f (aget this i)))))))) ; ...and uses aget

(def A (make-safe-array Integer/TYPE 8))

(pummel A)

[5] Although a potential future addition to Clojure named pods may provide another.


CHAPTER 11 Mutation

;; wait for pummel to terminate

(seq A)

;=> (100 100 100 100 100 100 100 100)

We used the locking macro in both the aget and aset functions so that they can both maintain consistency concurrently. Because aset calls aget, the locking macro is called twice. This isn’t a problem, because locking is reentrant—it can be acquired multiple times by the same thread. Typically, you’d have to match the number of releases of a reentrant locking mechanism to the number of acquisitions, but fortunately locking manages that for us.
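Reentrancy is easy to verify directly at the REPL—nesting locking on the same monitor in a single thread acquires it twice and releases it twice, with no deadlock:

```clojure
(def o (Object.))

(locking o
  (locking o          ; re-acquiring a monitor this thread already holds
    :still-alive))
;=> :still-alive
```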

The locking macro is the simplest way to perform primitive locking in Clojure. But the implementation of make-safe-array is coarse in that the locks used guard the entire array. Any readers or writers wishing to access or update any slot in the array must wait their turn—a bottleneck known as contention. If you need finer-grained locking, the locking facilities provided by Java will help you gain more control, a topic we cover next.

11.5.2 Using Java’s explicit locks

Java provides a set of explicit locks in the java.util.concurrent.locks package that can also be used, as shown in the following listing. One such lock is provided by the java.util.concurrent.locks.ReentrantLock class.

Listing 11.6 An implementation of the SafeArray protocol using ReentrantLock

(defn lock-i [target-index num-locks]
  (mod target-index num-locks))

(import 'java.util.concurrent.locks.ReentrantLock)

(defn make-smart-array [t sz]
  (let [a (make-array t sz)                           ; Array
        Lsz (quot sz 2)
        L (into-array (take Lsz                       ; Locks
                            (repeatedly #(ReentrantLock.))))]
    (reify
      SafeArray
      (count [_] (clojure.core/count a))
      (seq [_] (clojure.core/seq a))
      (aget [_ i]
        (let [lk (clojure.core/aget L (lock-i (inc i) Lsz))]
          (.lock lk)                                  ; Explicit locking
          (try
            (clojure.core/aget a i)
            (finally (.unlock lk)))))                 ; Explicit unlocking
      (aset [this i f]
        (let [lk (clojure.core/aget L (lock-i (inc i) Lsz))]
          (.lock lk)
          (try
            (clojure.core/aset a i (f (aget this i))) ; Reentrant locking
            (finally (.unlock lk))))))))


(def S (make-smart-array Integer/TYPE 8))

(pummel S)

;; wait for pummel to terminate

(seq S)

;=> (100 100 100 100 100 100 100 100)

The first point of note is that we use a technique (simplified for clarity) called lock striping (Herlihy 2008) to reduce the contention of guarding the array as a whole using locking. The target array a’s slots are guarded by half the number of locks, each chosen using the simple formula (mod target-index num-locks). This scheme allows readers and writers to (potentially) act independently when accessing different array slots. It’s crucial that we closed over the lock instance array L, because for explicit locks to work, each access must lock and unlock the same instance. Additionally, we’re calling the .unlock method in the body of a finally expression, because failing to do so is a recipe for disaster. Unlike the locking macro, the ReentrantLock class doesn’t manage lock release automatically. Finally, you can also use ReentrantLock in a way equivalent to using the locking macro, but using ReentrantLock gives you the choice of using proxy to provide more complex semantics than locking can provide.

One flaw of the make-smart-array function is that it uses the same locks for readers and writers. But you can allow for more concurrency if you enable some number of readers to access array slots without blocking at all, by using the java.util.concurrent.locks.ReentrantReadWriteLock class. The ReentrantReadWriteLock class holds two lock instances, one for reads and one for writes, and by adding another lock array you can take advantage of this fact. We won’t get into that exercise here, but if you choose to do so then you can use the implementation of make-smart-array as a guide.
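As a starting point for that exercise, here’s a minimal sketch of our own (read-slot and write-slot are hypothetical names, and a single lock is used rather than a striped lock array) showing how ReentrantReadWriteLock pairs a shared read lock with an exclusive write lock:

```clojure
(import 'java.util.concurrent.locks.ReentrantReadWriteLock)

(def rwl (ReentrantReadWriteLock.))

(defn read-slot [a i]
  (let [rl (.readLock rwl)]    ; any number of readers may hold this at once
    (.lock rl)
    (try
      (clojure.core/aget a i)
      (finally (.unlock rl)))))

(defn write-slot [a i v]
  (let [wl (.writeLock rwl)]   ; writers get exclusive access
    (.lock wl)
    (try
      (clojure.core/aset a i v)
      (finally (.unlock wl)))))
```

Replacing the single rwl with an array of read/write locks, indexed via lock-i as in make-smart-array, would yield the striped scheme described above.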

Using the various locking mechanisms, you can guarantee consistency across threads for mutable objects. But as we showed with explicit locks, there’s an expected incantation to unlocking that must be strictly observed. Though not necessarily complex in the SafeArray implementations, the conceptual baggage incurred in the semantics of an explicit locking scheme doesn’t scale well. The java.util.concurrent package contains a cacophony of concurrency primitives above and beyond simple locks, but it’s not our goal to provide a comprehensive survey herein.

Now that we’ve covered the matter of guaranteeing coordinated state across disparate threads, we turn our attention to a different topic: parallelization.

11.6 When to use futures

Clojure includes two reference types supporting parallelism: futures and promises. Futures, the subject of this section, are simple yet elegant constructs useful for partitioning a typically sequential operation into discrete parts. These parts can then be processed asynchronously across numerous threads. Dereferencing a future blocks if the enclosed expression hasn’t finished; all subsequent dereferences will return the calculated value. The simplest example of the use of a future is as shown:


(time (let [x (future (do (Thread/sleep 5000) (+ 41 1)))]
        [@x @x]))
; "Elapsed time: 5001.682 msecs"
;=> [42 42]

The processing time of the do block is only paid for on the first dereference of the future x. Futures represent expressions that have yet to be computed.
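You can also inspect a future without blocking by using future-done?, which reports whether the enclosed expression has finished:

```clojure
(let [f (future (Thread/sleep 200) :done)]
  (println (future-done? f))   ; false here, since the sleep is still running
  @f                           ; block until the expression completes
  (future-done? f))
;=> true
```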

11.6.1 Futures as callbacks

One nice use case for futures is in the context of a callback mechanism. Normally you might make a remote procedure call (RPC), wait for it to complete, and then proceed with some task depending on the return value. But what happens if you need to make multiple RPC calls? Should you be forced to wait for them all serially? Thanks to futures, the answer is no. In this section, we’ll use futures to create an aggregate task that finds the total number of occurrences of a string within a given set of Twitter[6] feeds. This aggregate task will be split into numerous parallel subtasks via futures.

COUNTING WORD OCCURRENCES IN A SET OF TWITTER FEEDS

Upon going to a personal Twitter page such as http://twitter.com/fogus, you can find a link to the RSS 2.0 feed for that user. We’ll use this feed as the input to our functions. An RSS 2.0 feed is an XML document used to represent a piece of data that’s constantly changing. The layout of a Twitter RSS entry is straightforward:

<rss version="2.0">
  <channel>
    <title>Twitter / fogus</title>
    <link>http://twitter.com/fogus</link>
    <item>
      <title>fogus: Thinking about #Clojure futures.</title>
      <link>http://twitter.com/fogus/statuses/12180102647/</link>
    </item>
  </channel>
</rss>

There’s more to the content of a typical RSS feed, but for our purposes we wish to retrieve only the title element of the item elements (there can be more than one). To do this, we need to first parse the XML and put it into a convenient format. If you recall from section 8.4, we created a domain DSL to create a tree built on a simple node structure of tables with the keys :tag, :attrs, and :content. As mentioned, that structure is leveraged in many Clojure libraries, and we’ll take advantage of this fact. Clojure provides some core functions in the clojure.xml and clojure.zip namespaces to help make sense of the feed:

(require '(clojure [xml :as xml]))
(require '(clojure [zip :as zip]))

(defmulti rss-children class)

(defmethod rss-children String [uri-str]
  (-> (xml/parse uri-str)
      zip/xml-zip
      zip/down
      zip/children))

[6] Twitter is online at http://twitter.com.

Using the function clojure.xml/parse, we can retrieve the XML for a Twitter RSS feed and convert it into the familiar tree format. That tree is then passed into a function clojure.zip/xml-zip that converts that structure into another data structure called a zipper. The form and semantics of the zipper are beyond the scope of this book (Huet 1997), but using it in this case allows us to easily navigate down from the root rss XML node to the channel node, where we then retrieve its children. The child nodes returned from rss-children contain other items besides item nodes (title, link, and so forth) that need to be filtered out. Once we have those item nodes, we then want to retrieve the title text and count the number of occurrences of the target text (case-insensitive). We perform all of these tasks using the function count-tweet-text-task, defined in the following listing.

Listing 11.7 Creating a future task to count word occurrences in a tweet

(import '(java.util.regex Pattern))

(defn count-tweet-text-task [txt feed]
  (let [items (rss-children feed)                            ; Get children
        re (Pattern/compile (Pattern/quote txt)
                            Pattern/CASE_INSENSITIVE)]       ; Match case-insensitively
    (count
     (mapcat #(re-seq re (first %))                          ; Get matches
             (for [item (filter (comp #{:item} :tag) items)] ; Filter non-items
               (-> item :content first :content))))))        ; Get title text

We’ll now try to count some text in a Twitter feed to see what happens:

(count-tweet-text-task
 "#clojure"
 "http://twitter.com/statuses/user_timeline/46130870.rss")
;=> 7

The result you see is highly dependent on when you run this function, because the RSS feeds are ever-changing. But using the count-tweet-text-task function, we can build a sequence of tasks to be performed over some number of Twitter feeds. Before we do that, we’ll create a convenience macro as-futures to take said sequence and dispatch the enclosed actions across some futures.

Listing 11.8 A macro to dispatch a sequence of futures

(defmacro as-futures [[a args] & body]
  (let [parts (partition-by #{'=>} body)
        [acts _ [res]] (partition-by #{:as} (first parts))
        [_ _ task] parts]
    `(let [~res (for [~a ~args] (future ~@acts))]
       ~@task)))


The as-futures macro implemented in listing 11.8 names a binding corresponding to the arguments for a given action, which is then dispatched across a number of futures, after which a task is run against the futures sequence. The body of as-futures is segmented so that we can clearly specify the needed parts—the action arguments, the action to be performed for each argument, and the tasks to be run against the resulting sequence of futures:

(as-futures [<arg-name> <all-args>]
  <actions-using-args>
  :as <results-name>
  =>
  <actions-using-results>)

To simplify the macro implementation, we use the :as keyword and => symbol to clearly delineate its segments. The as-futures body only exits after the task body finishes—as determined by the execution of the futures. We can use as-futures to perform the original task with a new function tweet-occurrences, implemented in the following listing.

Listing 11.9 Counting string occurrences in Twitter feeds fetched in parallel

(defn tweet-occurrences [tag & feeds]
  (as-futures [feed feeds]
    (count-tweet-text-task tag feed)
    :as results
    =>
    (reduce (fn [total res] (+ total @res))
            0
            results)))

The as-futures macro builds a sequence of futures named results, enclosing the call to count-tweet-text-task across the unique set of Twitter feeds provided. We then sum the counts returned from the dereferencing of the individual futures, as shown:

(tweet-occurrences "#Clojure"
  "http://twitter.com/statuses/user_timeline/46130870.rss"
  "http://twitter.com/statuses/user_timeline/14375110.rss"
  "http://twitter.com/statuses/user_timeline/5156041.rss"
  "http://twitter.com/statuses/user_timeline/21439272.rss")
;=> 22

And that’s that. Using only a handful of functions and macros, plus the built-in core facilities for XML parsing and navigation, we’ve created a simple Twitter occurrences counter. Our implementation has some trade-offs made in the name of page count. First, we blindly dereference the future in tweet-occurrences when calculating the sum. If the future’s computation freezes, then the dereference would likewise freeze. Using some combination of future-done?, future-cancel, and future-cancelled? in your own programs, you can skip, retry, or eliminate ornery feeds from the calculation. Futures are only one way to perform parallel computation in Clojure, and in the next section we’ll talk about another—promises.
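As a sketch of those safeguards: cancelling a future that hasn’t completed marks it as both cancelled and done, and any later attempt to dereference it throws a java.util.concurrent.CancellationException:

```clojure
(let [f (future (Thread/sleep 10000) :never)]
  (future-cancel f)                         ; true if cancellation succeeded
  [(future-cancelled? f) (future-done? f)]) ; a cancelled future is also "done"
;=> [true true]
```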


11.7 When to use promises

Another tool that Clojure provides for parallel computation is the promise and deliver mechanism. Promises are similar to futures, in that they represent a unit of computation to be performed on a separate thread. Likewise, the blocking semantics when dereferencing an unfinished promise are also the same. But whereas futures encapsulate an arbitrary expression that caches its value in the future upon completion, promises are placeholders for values whose construction is fulfilled by another thread via the deliver function. A simple example is as follows:

(def x (promise))
(def y (promise))
(def z (promise))

(dothreads! #(deliver z (+ @x @y)))
(dothreads!
 #(do (Thread/sleep 2000) (deliver x 52)))
(dothreads!
 #(do (Thread/sleep 4000) (deliver y 86)))

(time @z)
; "Elapsed time: 3995.414 msecs"
;=> 138

Promises are write-once; any further attempt to deliver will throw an exception.

11.7.1 Parallel tasks with promises

We can create a macro similar to as-futures for handling promises, but because of the more advanced value semantics, the implementation is more complicated. We again wish to provide a named set of tasks, but we’d additionally like to name the corresponding promises so that we can then execute over the eventual results, which we do next.

Listing 11.10 A macro to dispatch a sequence of promises across a number of threads

(defmacro with-promises [[n tasks _ as] & body]
  (when as
    `(let [tasks# ~tasks
           n# (count tasks#)
           promises# (take n# (repeatedly promise))]
       (dotimes [i# n#]
         (dothreads!
          (fn []
            (deliver (nth promises# i#)
                     ((nth tasks# i#))))))
       (let [~n tasks#
             ~as promises#]
         ~@body))))

We could then build a rudimentary parallel testing facility, dispatching tests across disparate threads and summing the results when all of the tests are done:


(defrecord TestRun [run passed failed])

(defn pass [] true)
(defn fail [] false)

(defn run-tests [& all-tests]
  (with-promises
    [tests all-tests :as results]
    (into (TestRun. 0 0 0)
          (reduce #(merge-with + %1 %2) {}
                  (for [r results]
                    (if @r
                      {:run 1 :passed 1}
                      {:run 1 :failed 1}))))))

(run-tests pass fail fail fail pass)
;=> #:user.TestRun{:run 5, :passed 2, :failed 3}

This unit-testing model is simplistic by design, in order to illustrate parallelization using promises and not to provide a comprehensive testing framework.

11.7.2 Callback API to blocking API

Promises, much like futures, are useful for executing RPC calls on separate threads. This can be useful if you need to parallelize a group of calls to an RPC service, but there’s a converse use case also. Often, RPC APIs take arguments to the service calls and also a callback function to be executed when the call completes. Using the rss-children function from the previous section, we can construct an archetypal RPC function:

(defn tweet-items [k feed]
  (k (for [item (filter (comp #{:item} :tag) (rss-children feed))]
       (-> item :content first :content))))

The tweet-items function is a distillation of the count-tweet-text-task function from earlier in this chapter, as shown:

(tweet-items
 count
 "http://twitter.com/statuses/user_timeline/46130870.rss")
;=> 16

The argument k to tweet-items is the callback, or continuation, that’s called with the filtered RPC results. This API is fine, but there are times when a blocking call is more appropriate than a callback-based call. We can use a promise to achieve this blocking behavior with the following:

(let [p (promise)]
  (tweet-items #(deliver p (count %))
               "http://twitter.com/statuses/user_timeline/46130870.rss")
  @p)
;=> 16

And as you see, the call blocks until the deliver occurs. This is a fine way to transform the callback into a blocking call, but we’d like a way to do this generically. Fortunately,


most well-written RPC APIs follow the same form for their callback functions/methods, so we can create a macro to wrap this up nicely in the following listing.

Listing 11.11 A macro for transforming a callback-based function to a blocking call

(defmacro cps->fn [f k]
  `(fn [& args#]
     (let [p# (promise)]
       (apply ~f (fn [x#] (deliver p# (~k x#))) args#)
       @p#)))

(def count-items (cps->fn tweet-items count))

(count-items "http://twitter.com/statuses/user_timeline/46130870.rss")
;=> 16

This is a simple solution to a common problem that you may have already encountered in your own applications.
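To see cps->fn at work without a network connection, here’s a toy callback-first function of our own invention—async-add is a hypothetical stand-in for a real RPC call that invokes its callback k with the result (this assumes cps->fn from listing 11.11 is already loaded):

```clojure
;; A hypothetical callback-first API in the same shape as tweet-items:
(defn async-add [k x y]
  (k (+ x y)))

;; cps->fn turns it into a direct blocking call:
(def blocking-add (cps->fn async-add identity))

(blocking-add 1 2)
;=> 3
```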

11.7.3 Deterministic deadlocks

You can cause a deadlock in your applications by never delivering to a promise. One possibly surprising advantage of using promises is that if a promise can deadlock, it’ll deadlock deterministically. Because only a single thread can ever deliver on a promise, only that thread will ever cause a deadlock. We can create a cycle in the dependencies between two promises to observe a deadlock using the following code:

(def kant (promise))
(def hume (promise))

(dothreads!
 #(do (println "Kant has" @kant) (deliver hume :thinking)))
(dothreads!
 #(do (println "Hume is" @hume) (deliver kant :fork)))

The Kant thread is waiting for the delivery of the value for kant from the Hume thread, which in turn is waiting for the value for hume from the Kant thread. Attempting either @kant or @hume in the REPL will cause an immediate deadlock. Furthermore, this deadlock will happen every time; it’s deterministic rather than dependent on odd thread timings or the like. Deadlocks are never nice, but deterministic deadlocks are better than nondeterministic.[7]

We’ve only touched the surface of the potential that promises represent. In fact, the pieces that we’ve assembled in this section represent some of the basic building blocks of dataflow (Van Roy 2004) concurrency. But any attempt to do justice to dataflow concurrency in a single section would be a futile effort. At its essence,

[7] There are experts in concurrent programming who will say that naïve locking schemes are also deterministic. Our simple example is illustrative, but alas it isn’t representative of a scheme that you may devise for your own code. In complex designs where promises are created in one place and delivered in a remote locale, determining deadlock will naturally be more complex. Therefore, we’d like to use this space to coin a new phrase: “determinism is relative.”


dataflow deals with the process of dynamic changes in values causing dynamic changes in dependent “formulas.” This type of processing finds a nice analogy in the way that spreadsheet cells operate, some representing values and others dependent formulas that change as the former also change.

Continuing our survey of Clojure’s parallelization primitives, we’ll next discuss some of the functions provided in the core library.

11.8 Parallelism

In the previous two sections we built two useful macros, as-futures and with-promises, allowing you to parallelize a set of operations across numerous threads. But Clojure has functions in its core library providing similar functionality, named pmap, pvalues, and pcalls, which we’ll cover briefly in this section.

11.8.1 pvalues

The pvalues macro is analogous to the as-futures macro, in that it executes an arbitrary number of expressions in parallel. Where it differs is that it returns a lazy sequence of the results of all the enclosed expressions, as shown:

(pvalues 1 2 (+ 1 2))
;=> (1 2 3)

The important point to remember when using pvalues is that the return type is a lazy sequence, meaning that your access costs might not always present themselves as expected:

(defn sleeper [s thing] (Thread/sleep (* 1000 s)) thing)

(defn pvs []
  (pvalues
   (sleeper 2 :1st)
   (sleeper 3 :2nd)
   (keyword "3rd")))

(-> (pvs) first time)
; "Elapsed time: 2000.309 msecs"
;=> :1st

The total time cost of accessing the first value in the result of pvs is only the cost of its own calculation. But accessing any subsequent element costs as much as the most expensive element before it, which you can verify by accessing the last element:

(-> (pvs) last time)
; "Elapsed time: 4001.435 msecs"
;=> :3rd

This may prove a disadvantage if you want to access the result of a relatively cheap expression that happens to be placed after a more costly expression. More accurately, all seq values within a sliding window[8] are forced, so processing time is limited by the most costly element therein.

[8] Currently, the window size is N+2, where N is the number of CPU cores. But this is an implementation detail, so it’s enough to know only that the sliding window exists.


11.8.2 pmap

The pmap function is the parallel version of the core map function. Given a function and a set of sequences, the application of the function to each matching element happens in parallel:

(->> [1 2 3]
     (pmap (comp inc (partial sleeper 2)))
     doall
     time)
; "Elapsed time: 2000.811 msecs"
;=> (2 3 4)

The total cost of realizing the result of mapping a costly increment function is again limited by the most costly execution time within the aforementioned sliding window. Clearly, in this contrived case, using pmap provides a benefit, so why not just replace every call to map in your programs with a call to pmap? Surely this would lead to faster execution times if the map functions were all applied in parallel, no? The answer is a resounding: it depends. A definite cost is associated with keeping the resulting sequence coordinated, and to indiscriminately use pmap might actually incur that cost unnecessarily, leading to a performance penalty. But if you’re certain that the cost of the function application outweighs the cost of the coordination, then pmap might help to realize performance gains. Only through experimentation will you be able to determine whether pmap is the right choice.
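One quick, unscientific way to feel the coordination cost is to time map against pmap over a function that’s too cheap to benefit; exact numbers vary by machine, so none are shown:

```clojure
;; doall forces the lazy results so the timings measure real work:
(time (doall (map inc (range 100000))))
(time (doall (pmap inc (range 100000))))
;; On most machines the pmap version is slower here, because inc is far
;; too cheap to amortize the overhead of coordinating threads.
```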

11.8.3 pcalls

Finally, Clojure provides a pcalls function that takes an arbitrary number of functions of no arguments and calls them in parallel, returning a lazy sequence of the results. Its use shouldn’t be a surprise by now:

(-> (pcalls
     #(sleeper 2 :1st)
     #(sleeper 3 :2nd)
     #(keyword "3rd"))
    doall
    time)
; "Elapsed time: 3001.039 msecs"
;=> (:1st :2nd :3rd)

The same benefits and trade-offs associated with pvalues and pmap also apply to pcalls and should be considered before use.

Executing costly operations in parallel can be a great boon when used properly, but should by no means be considered a magic potion guaranteeing speed gains. There’s currently no magical formula for determining which parts of an application can be parallelized—the onus is on you to determine your application’s parallel potential. What Clojure provides is a set of primitives, including futures, promises, pmap, pvalues, and pcalls, as the building blocks for your own personalized parallelization needs.


In the next section, we’ll cover the ubiquitous Var, but from a different perspective than we have thus far.

11.9 Vars and dynamic binding

The last reference type we’ll explore is perhaps the most commonly used—the Var. Vars are most often used because of two main features:

- Vars can be named and interned in a namespace.
- Vars can provide thread-local state.

It’s through the second feature that Vars contribute most usefully to the reference type landscape. The thread-local value of a Var by definition can only be read from or written to by a single thread, and thus provides the thread-safe semantics you’ve come to expect from a Clojure reference type.

But before you can start experimenting with Vars at the REPL, we need to address some consequences of the first feature. The other reference objects you’ve looked at aren’t themselves named and so are generally stored in something with a name. This means that when the name is evaluated, you get the reference object, not the value. To get the object’s value, you have to use deref. Named Vars flip this around—evaluating their name gives the value, so if you want the Var object, you need to pass the name to the special operator var.

With this knowledge in hand, you can experiment with an existing Var. Clojure provides a Var named *read-eval*,[9] so you can get its current value by evaluating its name:

*read-eval*

;=> true

No deref needed, because *read-eval* is a named Var. Now for the Var object itself:

(var *read-eval*)

;=> #'clojure.core/*read-eval*

That’s interesting—when a named Var object is printed, it starts with #' and is then followed by the fully qualified name of the Var. The #' reader feature expands to the var operator—it means the same thing:

#'*read-eval*

;=> #'clojure.core/*read-eval*

Now that you’ve seen how to refer to Var objects, you can look at how they behave. The Var *read-eval* is one of those provided by Clojure that’s specifically meant to be given thread-local bindings but by default has only a root binding. You should’ve seen its root binding when you evaluated it earlier—by default, *read-eval* is bound to true.

[9] *read-eval* happens to be a Var that has a default configuration useful for this discussion about Vars—its actual purpose is unimportant here.


11.9.1 The binding macro

The root binding of a Var can act as the base of a stack, with each thread’s local bindings pushing onto that stack and popping off of it as requested. The most common mechanism for pushing and popping thread-local bindings is the macro binding. It takes one or more Var names and a value for each that will initialize the new binding when it’s pushed. These bindings remain in effect until control passes out of the binding macro, at which point they’re popped off the stack.

Here’s a simple example of a function that prints the current value of the Var *read-eval*, either the root or thread-local value, whichever is currently in effect:

(defn print-read-eval []
  (println "*read-eval* is currently" *read-eval*))

The next function calls print-read-eval three times; the first and last calls will print the root binding. The middle time, binding is in effect:

(defn binding-play []
  (print-read-eval)
  (binding [*read-eval* false]
    (print-read-eval))
  (print-read-eval))

This results in the Var temporarily having a thread-local value of false:

(binding-play)
; *read-eval* is currently true
; *read-eval* is currently false
; *read-eval* is currently true

This is like thread B in figure 11.9, which also shows a simpler scenario (thread A) and a more complex one (thread C).

11.9.2 Creating a named Var

Vars are most commonly created with the special operator def or one of the many macros that expands to a form that has a def inside:

- defn—For putting a function in a Var
- defmacro—For putting a macro in a Var
- defonce—For setting the value of an unbound Var
- defmulti—For putting a multimethod in a Var

[Figure 11.9 Thread-local Var bindings. This illustration depicts a single Var being used from three different threads. Each rounded box is a Var binding, either thread-local or root. Each star is the Var being deref’ed, with the solid arrow pointing to the binding used. The dotted lines point from a thread-local binding to the next binding on the stack.]

There are a few others in clojure.core[10] and many more in contrib. What they have in common is that

[10] It’s likely that starting with Clojure 1.3, Vars will only have the ability to take on thread-local values when defined using defdynamic or marked with metadata like ^{:dynamic true}. Throughout this book, we will take the latter approach with high confidence that it will just work in 1.3.


each of these will intern a Var in the current namespace. Clojure will search for the

named Var in the current namespace. If one is found, it’s used; otherwise, a new Var is

created and added to the namespace, and that one is used.11 The Var (specifically the

root binding of the Var) is bound to whatever value, function, or macro (and so on)

was given. The Var itself is returned:

(def favorite-color :green)
;=> #'user/favorite-color

When a Var is printed, its fully qualified name is given, along with the namespace where

the Var is interned (user) and the Var’s name itself (favorite-color). These are pre-

ceded by #' because unlike the other reference types, a named Var is automatically

dereferenced when its name is evaluated—no explicit @ or call to deref is required:

favorite-color

;=> :green

So in order to refer to a Var instead of the value it’s bound to, you need to use #' or

the special form var, which are equivalent:

(var favorite-color)

;=> #'user/favorite-color

A Var can exist (or not exist) in any of four states. The precise state a Var is in can be

determined using the functions resolve, bound?, and thread-bound? as shown in

Table 11.1.

Table 11.1 Var states

Initialization mechanism      (resolve 'x)   (bound? #'x)      (thread-bound? #'x)
(def x)                       #'user/x       false             false
(def x 5)                     #'user/x       true              false
(binding [x 7] ...)           #'user/x       true              true
(with-local-vars [x 9] ...)   nil            true (bound? x)   true (thread-bound? x)

The first row of the table shows the results of resolve, bound?, and thread-bound?

when a var x is unbound. The remaining rows show how to change x to cause those

functions to return the values shown.
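The rows of table 11.1 can be replayed at the REPL. Here's our own sketch (not the book's code); it adds ^:dynamic metadata so that the binding row also works on Clojure 1.3 and later, and it assumes evaluation in the user namespace:

```clojure
(def ^:dynamic x)      ; interned but unbound
(resolve 'x)           ;=> #'user/x
(bound? #'x)           ;=> false
(thread-bound? #'x)    ;=> false

(def ^:dynamic x 5)    ; root binding
(bound? #'x)           ;=> true
(thread-bound? #'x)    ;=> false

(binding [x 7]         ; thread-local binding
  [(bound? #'x) (thread-bound? #'x)])
;=> [true true]

(with-local-vars [y 9] ; anonymous Var, never interned
  [(resolve 'y) (bound? y) (thread-bound? y)])
;=> [nil true true]
```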

11.9.3 Creating anonymous Vars

Vars don’t always have names, nor do they need to be interned in a namespace. The

with-local-vars macro creates Vars and gives them thread-local bindings all at once,

but it won’t intern them. Instead, they’re bound to locals, which means that the

associated Var isn’t implicitly looked up by symbolic name. You need to use deref or

var-get to get the current value of the Var.

11 Not all macros starting with def necessarily create or intern Vars. Some that don't: defmethod, defrecord, and deftype.

Here's an example of a Var x created and interned with def, and then a local x that shadows it and is bound to a new Var via with-local-vars:

(def x 42)

{:outer-var-value x
 :with-locals (with-local-vars [x 9]
                {:local-var x
                 :local-var-value (var-get x)})}
;=> {:outer-var-value 42,
;    :with-locals {:local-var #<Var: --unnamed-->,
;                  :local-var-value 9}}

Within the body of the with-local-vars macro, the bound value can be set using

(var-set <var> <value>), which will of course only affect the thread-local value. It’s

almost stunning how rarely with-local-vars is useful.
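For instance (our own small sketch), a Var created by with-local-vars is read with var-get and updated with var-set, and the update is visible only on the current thread:

```clojure
(with-local-vars [counter 0]
  (var-set counter (inc (var-get counter)))  ; thread-local update
  (var-set counter (inc (var-get counter)))
  (var-get counter))
;=> 2
```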

11.9.4 Dynamic scope

Vars have dynamic scope, which contrasts with the lexical scope of let locals. The most

obvious difference is that with a lexical local, you can easily see where it was initialized

by looking at the nested structure of the code. A Var, on the other hand, may have been

initialized by a binding anywhere earlier in the call stack, not necessarily nearby in the

code at all. This difference can create unexpectedly complex interactions and is one of

the few areas where Clojure does little to help you address such complexity.

An example of this complexity is shown by using the binding macro or any macro

built on top of it, such as with-precision and with-out-str. For example, we can

use the with-precision macro to conveniently set up the *math-context* Var:

(with-precision 4

(/ 1M 3))

;=> 0.3333M

We need to use with-precision here because if we don’t tell BigDecimal we’re okay

with it rounding off the result, it’ll refuse to return anything in this case:

(/ 1M 3)

; java.lang.ArithmeticException: Non-terminating decimal expansion;

; no exact representable decimal result.

With that in mind, can you see why with-precision isn’t doing its job in the next

snippet? The only thing that makes it different from the example that worked earlier

is we’re using map to produce a sequence of three numbers instead of just one:

(with-precision 4

(map (fn [x] (/ x 3)) (range 1M 4M)))

; java.lang.ArithmeticException: Non-terminating decimal expansion;

; no exact representable decimal result.

The problem is that map is lazy and therefore doesn’t call the function given to it

immediately. Instead, it waits until the REPL tries to print it, and then does the division. Although the map and the function it calls are within the lexical scope of with-precision, and with-precision itself uses a thread-local binding internally, it doesn't care


about lexical scope. When the division operation is performed, we’ve already left the

dynamic scope of the with-precision, and it no longer has any effect. The BigDecimal

behavior drops back to its default, and it throws an exception.

One way to solve this is to make sure that all the division is done before leaving the

dynamic scope. Clojure’s doall function is perfect for this:

(with-precision 4

(doall (map (fn [x] (/ x 3)) (range 1M 4M))))

;=> (0.3333M 0.6667M 1M)

One drawback is that it completely defeats map’s laziness. An alternate solution is to

have the function provided to map re-create, when it’s run, the dynamic scope in which

the function was created. Clojure provides a handy macro bound-fn to do exactly that:

(with-precision 4

(map (bound-fn [x] (/ x 3)) (range 1M 4M)))

;=> (0.3333M 0.6667M 1M)

Now the sequence being returned is still lazy, but before each item is computed, the

dynamic scope of *math-context* is re-created and the exception is avoided.

This kind of mismatch between a function definition that appears lexically inside a

form like with-precision or binding and yet has a different dynamic scope when

called doesn’t cause problems with lazy sequences alone. You may also see problems

with functions sent to Agents as actions or with the body of a future, because these are

executed in other threads outside the dynamic scope where they’re set up.

Problems related to dynamic scope aren’t even exclusive to Vars. The scope of a

try/catch is also dynamic and can have similarly unexpected behavior. For example,

with-open uses try/finally to close a file automatically when execution leaves its

dynamic scope. Failing to account for this can lead to an error when trying to write to

a closed file, because the dynamic scope of with-open has been left. Though bound-

fn can help make the dynamic scope of a Var borrow from its lexical scope, the only

way to deal with try/catch is to make sure everything is executed before leaving its

dynamic scope.
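The with-open case can be sketched as follows (our own example; the function names are ours). The broken version returns a lazy sequence that escapes with-open's dynamic scope, so realizing it after the reader is closed throws an IOException; the safe version realizes everything with doall before leaving the scope:

```clojure
(require '[clojure.java.io :as io])

(defn lines-broken [file]
  (with-open [rdr (io/reader file)]
    (line-seq rdr)))           ; lazy; realized only after rdr is closed

(defn lines-safe [file]
  (with-open [rdr (io/reader file)]
    (doall (line-seq rdr))))   ; fully realized inside the dynamic scope
```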

11.10 Summary

This has been the most complex chapter of the book. State management is a compli-

cated process that can quickly lose all semblance of sanity in the face of concurrent

modifications. Clojure’s main tenet is not to foster concurrency, but instead to pro-

vide the tools for the sane management of state. As a result of this focus, sane concur-

rency follows. Clojure also provides the building blocks for you to parallelize

computations across disparate threads of execution. From the expression-centric

future, to the function-centric set-once “variable” promise, to the core functions

pcalls, pvalues, and pmap, Clojure gives you the raw materials for your specialized

needs. Finally, we talked in depth about Clojure’s Var, dynamic binding, and the

mechanics of thread-locals.

The next chapter deals with performance considerations and how to make your

Clojure programs much faster.


Part 5

Tangential

considerations

Some topics are so interesting and important that we must include them,

even if they don’t fit well in another chapter or warrant a chapter to themselves.

In this part, we’ll cover several such topics, including transient collections,

domain-specific languages, and testing.


Performance

This chapter covers

 Type hints

 Transients

 Chunked sequences

 Memoization

 Understanding coercion

Now that we’ve spent a book’s worth of material learning the why and how of Clo-

jure, it’s high time we turned our attention to the subject of performance. There’s

a meme in programming that can be summarized as follows: make it work first,

then make it fast. Throughout this book, we’ve taught you the ways that Clojure

allows you to “make it work,” and now we’re going to tell you how to make it fast.

In many cases, Clojure’s compiler will be able to highly optimize idiomatic Clo-

jure source code. But there are times when the form of your functions, especially in

interoperability scenarios, will prove to be ambiguous or even outright counter to

compiler optimizations. Therefore, we’ll lead you through optimization techniques

such as type hints, transients, chunked sequences, memoization, and coercion.

Using some combination of these techniques will help you approach, and some-

times exceed, the performance of Java itself.

The most obvious place to start, and the one you’re most likely to encounter, is

type hinting—so this is where we’ll begin.


12.1

Type hints

The path of least resistance in Clojure often produces the fastest and most efficient

compiled code, but not always. The beauty of Clojure is that this path of least resis-

tance allows simple techniques for gaining speed via type hints. The first thing to

know about type hints is that they’re used to indicate that an object is an instance of

some class—never a primitive.

THE RULE OF TYPE HINTING Write your code so that it’s first and foremost cor-

rect; then add type-hint adornment to gain speed. Don’t trade the efficiency

of the program for the efficiency of the programmer.
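To make this concrete, here's a small sketch of our own (not from the book) showing the difference a class hint makes on an interop call:

```clojure
(set! *warn-on-reflection* true)

(defn len-no-hint [x]
  (.length x))             ; Reflection warning: .length can't be resolved

(defn len-hinted [^String x]
  (.length x))             ; resolved at compile time, no warning

(len-hinted "clojure")
;=> 7
```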

12.1.1 Advantages of type adornment

There are epic debates about the virtues of static versus dynamic type systems; we

won’t engage in those arguments here. But there are a few advantages to a dynamic

type system like Clojure’s that also allows type hinting to occur after the bulk of devel-

opment. One such advantage is that in a static type system, the cost of changing argu-

ment lists is extended to all of the callers, whereas in Clojure the cost is deferred until

adornment time or even outright avoided.1 This scenario isn’t limited to function arguments in Clojure, nor to statically typed languages, but applies to any typed element. Clojure’s dynamic type system provides an agile experience in general, and programs can later be optimized where there’s a need.

12.1.2 Type-hinting arguments and returns

If you recall from section 10.3, we created a function asum-sq that took an array of

floats and performed a sum of squares on its contents. Unfortunately, asum-sq wasn’t

as fast as it could’ve been. We can illuminate the cause of its inefficiency using a REPL

flag named *warn-on-reflection*, which by default is set to false:

(set! *warn-on-reflection* true)

;=> true

What this seemingly innocuous statement does is to signal to the REPL to report when

the compiler encounters a condition where it can’t infer the type of an object and

must use reflection to garner it at runtime. You’ll see a reflection warning by entering

asum-sq into the REPL:

(defn asum-sq [xs]
  (let [dbl (amap xs i ret
              (* (aget xs i)
                 (aget xs i)))]
    (areduce dbl i ret 0
      (+ ret (aget dbl i)))))

; Reflection warning - call to aclone can't be resolved.

; ...

1 Aside from the case where type hints don’t require client changes, the use of keyword arguments as seen in

section 7.1 can help to localize additional function requirements to only the callers needing them.


Though not terribly informative in and of itself, the fact that a reflection warning

occurs is portentous. Running the call to asum-sq in a tight loop verifies that some-

thing is amiss:

(time (dotimes [_ 10000] (asum-sq (float-array [1 2 3 4 5]))))

; "Elapsed time: 410.539 msecs"

;=> nil

Though the reflection warning didn’t point to the precise inefficiency, you can infer

where it could be given that Clojure deals with the java.lang.Object class across

function boundaries. Therefore, you can assume that the problem lies in the argu-

ment xs coming into the function as something unexpected. Adding two type hints to

xs and dbl (because it’s built from xs) might do the trick:

(defn asum-sq [^floats xs]
  (let [^floats dbl (amap xs i ret
  ...

Rerunning the tight loop verifies that the assumption was correct:

(time (dotimes [_ 10000] (asum-sq (float-array [1 2 3 4 5]))))

; "Elapsed time: 17.087 msecs"

;=> nil

This is a dramatic increase in speed using a simple type hint that casts the incoming

array xs to one containing primitive floats. The whole range of array type hints is

shown next:

 objects

 floats

 shorts

 ints

 doubles

 bytes

 longs

 chars

 booleans
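For example (our own sketch), a sum over an array of primitive longs can be hinted with ^longs; on Clojure 1.3 and later the return value can be hinted as a primitive long as well:

```clojure
(defn asum ^long [^longs xs]
  (areduce xs i ret 0
    (+ ret (aget xs i))))   ; aget on a hinted ^longs array avoids reflection

(asum (long-array [1 2 3 4 5]))
;=> 15
```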

The problems might still not be solved, especially if you want to do something with the

return value of asum-sq, as shown:

(.intValue (asum-sq (float-array [1 2 3 4 5])))

; Reflection warning, reference to field intValue can't be resolved.

;=> 55

This is because the compiler can’t garner the type of the return value and must there-

fore use reflection to do so. By hinting the return type of asum-sq, the problem goes

away:

(defn ^Float asum-sq [^floats xs]
  ...

(.intValue (asum-sq (float-array [1 2 3 4 5])))

;=> 55

With minor decoration on the asum-sq function, we’ve managed to increase its speed

as well as potentially increasing the speed of expressions downstream.


12.1.3 Type-hinting objects

In addition to allowing for the hinting of function arguments and return values, you

can also hint arbitrary objects. If you didn’t have control over the source to asum-sq,

then this reflection problem would seem insurmountable when executing (.intValue

(asum-sq (float-array [1 2 3 4 5]))). But you can instead hint at the point of usage

and gain the same advantage as if asum-sq had been hinted all along:

(.intValue ^Float (asum-sq (float-array [1 2 3 4 5])))

;=> 55

All isn’t lost when you don’t own a piece of code causing performance problems,

because Clojure is flexible in the placement of type hints.

12.2

Transients

We’ve harped on you for this entire book about the virtues of persistent data struc-

tures and how wonderful they are. In this section, we’ll present an optimization tech-

nique provided by Clojure called transients, which offer a mutable view of a collection.

It seems like blasphemy, but we assure you there’s a good reason for their existence,

which we’ll discuss presently.

12.2.1 Ephemeral garbage

The design of Clojure is such that it presumes that the JVM is extremely efficient at

garbage collection of ephemeral (or short-lived) objects, and in fact it is. But as you

can imagine based on what you’ve learned so far, Clojure does create a lot of young

objects that are never again accessed, shown (in spirit) here:

(reduce merge [{1 3} {1 2} {3 4} {3 5}])

;=> {1 2, 3 5}

A naive implementation2 of reduce would build intermediate maps corresponding to

the different phases of accumulation. The accumulation of these short-lived instances

can in some circumstances cause inefficiencies, which transients are meant to address.

THE RULE OF TRANSIENTS Write your code so that it’s first and foremost cor-

rect using the immutable collections and operations; then, make changes to

use transients for gaining speed. But you might be better served by writing idi-

omatic and correct code and letting the natural progression of speed

enhancements introduced in new versions of Clojure take over. Spot optimi-

zations often quickly become counter-optimizations by preventing the lan-

guage/libraries from doing something better.

We’ll explore how you can use transients in the next section.

2 The actual implementation of reduce follows a reduce protocol that delegates to a smart “internal reduce”

mechanism that’s meant for data structures that know the most efficient way to reduce themselves.


12.2.2 Transients compare in efficiency to mutable collections

Mutable objects generally don’t make new allocations during the intermediate phases of an operation on a single collection, and measured against that standard, persistent data structures look less memory efficient. But you can use transients to gain not only efficiency of allocation, but often of execution as well. Take a function

zencat, intended to work similarly to Clojure’s concat, but with vectors exclusively:

(defn zencat1 [x y]
  (loop [src y, ret x]
    (if (seq src)
      (recur (next src) (conj ret (first src)))
      ret)))

(zencat1 [1 2 3] [4 5 6])

;=> [1 2 3 4 5 6]

(time (dotimes [_ 1000000] (zencat1 [1 2 3] [4 5 6])))

; "Elapsed time: 486.408 msecs"

;=> nil

The implementation is simple enough, but it’s not all that it could be. The effects of

using transients is shown next.

Listing 12.1 A concatenation function using transients

(defn zencat2 [x y]
  (loop [src y, ret (transient x)]                 ; create a transient
    (if (seq src)
      (recur (next src) (conj! ret (first src)))   ; use transient conj!
      (persistent! ret))))                         ; return a persistent

(zencat2 [1 2 3] [4 5 6])

;=> [1 2 3 4 5 6]

(time (dotimes [_ 1000000] (zencat2 [1 2 3] [4 5 6])))

; "Elapsed time: 846.258 msecs"

;=> nil

Wait, what? It seems that by using transients, we’ve actually made things worse—but

have we? The answer lies in the question, “what am I actually measuring?” The timing

code is executing zencat2 in a tight loop. This type of timing isn’t likely representative of actual use; instead, it highlights an important consideration: the calls to persistent! and transient, though constant time, aren’t free. By measuring the use of transients in a tight loop, we’ve introduced a confounding factor, because the fixed cost of using transients dominates the cost of concatenating two small vectors.

A better benchmark would instead be to measure the concatenation of larger vectors,

therefore minimizing the size-relative cost of transients:

(def bv (vec (range 1e6)))

(first (time (zencat1 bv bv)))

; "Elapsed time: 181.988 msecs"


;=> 0

(first (time (zencat2 bv bv)))

; "Elapsed time: 39.353 msecs"

;=> 0

In the case of concatenating large vectors, the use of transients is ~4.5 times faster

than the purely functional approach. Be careful how you use transients in your own

applications, because as you saw, they’re an incredible boon in some cases, and quite

the opposite in others. Likewise, be careful designing performance measurements,

because they might not always measure what you think.

Because transients are a mutable view of a collection, you should take care when

exposing them outside of localized contexts. Fortunately, Clojure doesn’t allow a transient

to be modified across threads and will throw an exception if attempted. But it’s easy

enough to forget that you’re dealing with a transient and return it from a function.

That’s not to say that you couldn’t return a transient from a function—it can be use-

ful to build a pipeline of functions that work in concert against a transient structure.

Instead, we ask that you remain mindful when doing so.
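For example, a helper can accept and return a transient so that several such helpers can be composed, with a single persistent! call at the boundary. This is our own sketch:

```clojure
(defn keep-evens! [t xs]
  ;; t is a transient vector; note that we always use conj!'s return value.
  (reduce (fn [acc x] (if (even? x) (conj! acc x) acc)) t xs))

(persistent! (keep-evens! (transient []) (range 10)))
;=> [0 2 4 6 8]
```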

The use of transients can help to gain speed in many circumstances. But be

mindful of the trade-offs when using them, because they’re not cost-free operations.

12.3

Chunked sequences

With the release of Clojure 1.1, the granularity of Clojure’s laziness was changed from

a one-at-a-time model to a chunk-at-a-time model. Instead of walking through a

sequence one node at a time, chunked sequences provide a windowed view (Boncz

2005) on sequences some number of elements wide, as illustrated here:

(def gimme #(do (print \.) %))

(take 1 (map gimme (range 32)))

You might expect that this snippet would print (.0) because we’re only grabbing the

first element, and if you’re running Clojure 1.0, that’s exactly what you’d see. But in

later versions, the picture is different:

;=> (................................0)

If you count the dots, you’ll see exactly 32, which is what you’d expect given the state-

ment from the first paragraph. Expanding a bit further, if you increase the argument

to range to be 33 instead, you’ll see the following:

(take 1 (map gimme (range 33)))

;=> (................................0)

Again you can count 32 dots. Moving the chunky window to the right is as simple as

obtaining the 33rd element:

(take 1 (drop 32 (map gimme (range 64))))

;=> (................................................................32)


Figure 12.1 Clojure’s chunked

sequences allow a windowed view

of a sequence. This model is more

efficient, in that it allows for larger

swaths of memory to be reclaimed

by the garbage collector and better

cache locality in general. There’s a

cost to total laziness, but often the

benefit gained is worth the cost.

As we showed in chapter 5, Clojure’s sequences are implemented as trees fanning out

at increments of 32 elements per node. Therefore, chunks of size 32 are a natural fit,

allowing for the garbage collection of larger chunks of memory as seen in figure 12.1.

You might be worried that chunked sequences have squashed the entire point of

lazy sequences, and for small sequences this would be correct. But the benefits of lazy

sequences are striking when dealing with cyclopean magnitudes or sequences larger

than memory. Chunked sequences in the extreme cases are an incredible boon

because not only do they make sequence functions more efficient overall, they still ful-

fill the promise of lazy sequences: avoiding full realization of interim results.

12.3.1 Regaining one-at-a-time laziness

There are legitimate concerns about this chunked model, and one such concern is

the desire for a one-at-a-time model to avoid exploding computations. Assuming that

you have such a requirement, one counterpoint against chunked sequences is that of

building an infinite sequence of Mersenne primes.3 Implicit realization of the first 32

Mersenne primes through chunked sequences will finish long after the Sun has died.

But you can use lazy-seq to create a function seq1 that can be used to restrict (or

dechunkify, if you will) a lazy sequence and enforce the one-at-a-time model, as in the

following listing.

Listing 12.2 A dechunkifying seq1 function

(defn seq1 [s]
  (lazy-seq
    (when-let [[x] (seq s)]
      (cons x (seq1 (rest s))))))

(take 1 (map gimme (seq1 (range 32))))

;=> (.0)

(take 1 (drop 32 (map gimme (seq1 (range 64)))))

;=> (.................................32)

3 See http://en.wikipedia.org/wiki/Mersenne_Primes.


You can again safely generate your lazy, infinite sequence of Mersenne primes. The world rejoices. But seq1 eliminates the garbage-collection efficiencies of the chunked model, regressing to the model shown in figure 12.2.

Figure 12.2 Using seq1, you can again reclaim the one-at-a-time sequence model. Though not as efficient as the chunked model, it does again provide total sequence laziness.

Clojure may one day provide an official API for one-at-a-time lazy sequence granularity, but for now seq1 will suffice. We advise that you instead stick to the chunked model, because you’ll probably never notice its effects during normal usage.

12.4 Memoization

As we mentioned briefly in section 11.4, memoization (Michie 1968) refers to storing a cache of values local to a function so that previously computed results can be retrieved rather than recalculated on every call. The cache is a simple mapping of a given set of arguments to

a previously calculated result. In order for this to work, the memoized function must

be referentially transparent, which we discussed in section 7.1. Clojure comes with a

memoize function that can be used to build a memoized version of any referentially

transparent function, as shown:

(def gcd (memoize
           (fn [x y]
             (cond
               (> x y) (recur (- x y) y)
               (< x y) (recur x (- y x))
               :else x))))

(gcd 1000645475 56130776629010010)

;=> 215

Defining a “greatest common divisor” function using memoize helps to speed

subsequent calculations using the arguments 1000645475 and 56130776629010010.

The function memoize wraps another function4 in a cache lookup pass-through function and returns it. This allows you to use memoize on literally any referentially transparent function. The operation of memoize is analogous to, but not exactly the same as, that of Clojure’s lazy sequences, which cache the results of their realized portions.

This general technique can be useful, but the indiscriminate storage provided by

memoize might not always be appropriate. Therefore, we’ll take a step back and devise

a way to generalize the operation of memoization into useful abstractions and build a

framework for employing caching strategies more appropriate to the domain at hand.

4

You might’ve noticed that we explicitly bound the Var gcd to the memoization of an anonymous function but

then used recur for implementing the function body. This approach suffers from the inability to cache the

intermediate results (Norvig 1991) of gcd. We leave the solution to this shortcoming as an exercise for the reader.


Similar to Haskell’s typeclasses, Clojure’s protocols define a set of signatures pro-

viding a framework of adherence to a given set of features. This section serves a three-

fold goal:

 Discussion of memoization

 Discussion of protocol design

 Discussion of abstraction-oriented programming

12.4.1 Re-examining memoization

As mentioned in section 11.4, memoization is a personal affair, requiring a certain

domain knowledge to perform efficiently and correctly. That’s not to say that the core

memoize function is useless, only that the base case doesn’t cover all cases. In this sec-

tion, we’ll define a memoization protocol in terms of the primitive operations:

lookup, has?, hit, and miss. Instead of providing a memoization facility that allows

the removal of individual cache items, it’s a better idea to provide one that allows for

dynamic cache-handling strategies.5

12.4.2 A memoization protocol

The protocol for a general-purpose cache feature is provided in the following listing.

Listing 12.3 A protocol for caching

(defprotocol CacheProtocol
  (lookup [cache e])
  (has? [cache e])
  (hit [cache e])
  (miss [cache e ret]))

The function lookup retrieves the item in the cache if it exists. The function has? will

check for a cached value. The function hit is called when an item is found in the

cache, and miss is called when it’s not. If you’re familiar with creating Java interfaces,

the process of creating a protocol should be familiar. Moving on, we next implement

the core memoize functionality.

Listing 12.4 A basic cache type

(deftype BasicCache [cache]
  CacheProtocol
  (lookup [_ item]
    (get cache item))
  (has? [_ item]
    (contains? cache item))
  (hit [this item] this)
  (miss [_ item result]
    (BasicCache. (assoc cache item result))))

5 This section is motivated by the fantastic work of the brilliant Clojurians Meikel Brandmeyer, Christophe

Grand, and Eugen Dück summarized at http://kotka.de/blog/2010/03/memoize_done_right.html.


The BasicCache takes a cache on construction used for its internal operations. Test-

ing the basic caching protocol in isolation shows:

(def cache (BasicCache. {}))

(lookup (miss cache '(servo) :robot) '(servo))

;=> :robot

In the case of a miss, the item to be cached is added and a new instance of BasicCache

(with the cached entry added) is returned for retrieval using lookup. This is a simple

model for a basic caching protocol, but not terribly useful in isolation. We can go fur-

ther by creating an auxiliary function through, meaning in effect, “pass an element

through the cache and return its value”:

(defn through [cache f item]
  (if (has? cache item)
    (hit cache item)
    (miss cache item (delay (apply f item)))))
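Before moving on, the delay in miss is worth seeing in isolation (our own sketch): a Delay runs its body once, on first deref, and caches the result thereafter:

```clojure
(def calls (atom 0))
(def d (delay (swap! calls inc) :computed))

@calls  ;=> 0  (nothing computed yet)
@d      ;=> :computed
@d      ;=> :computed
@calls  ;=> 1  (the body ran exactly once)
```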

With through, the value corresponding to a cache item (function arguments in this

case) would either be retrieved from the cache via the hit function, or calculated and

stored via miss. You’ll notice that the calculation (apply f item) is wrapped in a

delay call instead of performed outright or lazily through an ad hoc initialization

mechanism. The use of an explicit delay in this way helps to ensure that the value is

calculated only on first retrieval. With these pieces in place, we can then create a

PluggableMemoization type, as shown next.

Listing 12.5 A type implementing pluggable memoization

(deftype PluggableMemoization [f cache]
  CacheProtocol
  (has? [_ item] (has? cache item))
  (hit [this item] this)
  (miss [_ item result]
    (PluggableMemoization. f (miss cache item result)))
  (lookup [_ item]
    (lookup cache item)))

The purpose of the PluggableMemoization type is to act as a delegate to an underlying

implementation of a CacheProtocol occurring in the implementations for hit, miss,

and lookup. Likewise, the PluggableMemoization delegation is interposed at the

protocol points to ensure that when utilizing the CacheProtocol, the Pluggable-

Memoization type is used and not the BasicCache. We’ve made a clear distinction

between a caching protocol fulfilled by BasicCache and a concretized memoization

fulfilled by PluggableMemoization and through. With the creation of separate abstrac-

tions, you can use the appropriate concrete realization in its proper context. Clojure

programs will be composed of various abstractions. In fact, the term abstraction-oriented

programming is used to describe Clojure’s specific philosophy of design.

The original manipulable-memoize function from section 11.4 is modified in the

following listing to conform to our memoization cache realization.


Listing 12.6 A function for applying pluggable memoization to a function

(defn memoization-impl [cache-impl]
  (let [cache (atom cache-impl)]
    (with-meta
      (fn [& args]
        (let [cs (swap! cache through (.f cache-impl) args)]
          @(lookup cs args)))
      {:cache cache})))

If you’ll recall from the implementation of the through function, we stored delay

objects in the cache, requiring that they be dereferenced when looked up. Returning to our

old friend the slowly function, we can exercise the new memoization technique as

shown:

(def slowly (fn [x] (Thread/sleep 3000) x))

(def sometimes-slowly (memoization-impl
                        (PluggableMemoization.
                          slowly
                          (BasicCache. {}))))

(time [(sometimes-slowly 108) (sometimes-slowly 108)])

; "Elapsed time: 3001.611 msecs"

;=> [108 108]

(time [(sometimes-slowly 108) (sometimes-slowly 108)])

; "Elapsed time: 0.049 msecs"

;=> [108 108]

You can now fulfill your personalized memoization needs by implementing pointed real-

izations of CacheProtocol, plugging them into instances of PluggableMemoization,

and applying them as needed via function redefinition, higher-order functions, or

dynamic binding. Countless caching strategies can be used to better support your needs,

each displaying different characteristics; and if none fits, your problem may call for something wholly new.

We’ve only scratched the surface of memoization in this section in favor of provid-

ing a more generic substrate on which to build your own memoization strategies.

Using Clojure’s abstraction-oriented programming techniques, your own programs

will likewise be more generic and be built largely from reusable parts.

12.5

Understanding coercion

Although Clojure is a dynamically typed language, it does provide mechanisms for specifying value types. The first of these mechanisms, type hints, was covered in section 12.1. The second, coercion, is the subject of this section. Although type hints and coercion are similar in nature, their intended purposes are quite different. In the case of coercion, its purpose is to get at the primitive data type for a value, which we'll show next.


12.5.1 First rule of coercion: don’t

Clojure's compiler is sophisticated enough that in many cases it'll be unnecessary to coerce values into primitives. It's often better to start with a function or code block devoid of coercions. Unless your specific application requires the utmost speed in execution, it's better to stick with the version that favors simplicity over the alternative. But should you decide that coercion might be the choice for you, then this section will provide guidance.

12.5.2 Corollary: you’re probably not doing it right

If you've determined that coercion can help, then it's worth stressing that you have to be careful when going down that road. In many cases, the act of adding coercion can actually slow your functions. The reason lies in the nature of Clojure: functional composition leads to passing arguments back and forth between pieces, and in the circumstance of coercion you're just boxing and unboxing6 from one call to the next. This particular circumstance is especially devious within the body of a loop, and follows the same performance degradations observed with Java. Clojure's unboxing is an explicit7 operation performed using the coercion functions, so there's a speck of light there. Unfortunately, autoboxing is still a danger and should be avoided if speed is a concern, as we'll explore now:

(defn occur-count [words]
  (let [res (atom {})]
    (doseq [w words]
      (swap! res assoc w (+ 1 (@res w 0))))
    @res))

(defn roll [n d]
  (reduce + (take n (repeatedly #(inc (rand-int d))))))

(time (dorun (occur-count (take 1000000 (repeatedly #(roll 3 6))))))
; "Elapsed time: 4055.505 msecs"

The function occur-count will return a map of the occurrence counts8 found in a given sequence. This fairly straightforward implementation uses the function roll to populate a sequence with a million simulated rolls of three six-sided dice. But four seconds seems like a long time to wait, so perhaps we can speed things up by using coercions. An initial attempt to gain speed may be to pull out the stored count from the table and coerce it into an int:

(defn occur-count [words]
  (let [res (atom {})]
    (doseq [w words]
      (let [v (int (@res w 0))]
        (swap! res assoc w (+ 1 v))))
    @res))

6 Autoboxing is the automatic conversion the Java compiler makes between the primitive types and their corresponding object wrapper classes.
7 Except when directly or indirectly (via inlining or a macro body) calling a method.
8 Clojure has a function frequencies that does this, so we provide occur-count for illustrative purposes only.


(time (dorun (occur-count (take 1000000 (repeatedly #(roll 3 6))))))

; "Elapsed time: 4385.8 msecs"

Well, that didn't work. The reason for the decrease in speed is that although we're specifying the type at the outer loop, we haven't reduced the need to box and unbox that value further downstream in the roll function. We might then be led to try to optimize the roll function too:

(defn roll [n d]
  (let [p (int d)]
    (reduce + (take n (repeatedly #(inc (rand-int p)))))))

(time (dorun (occur-count (take 1000000 (repeatedly #(roll 3 6))))))

; "Elapsed time: 4456.393 msecs"

;=> nil

Again we've made matters worse and have spread the problems over the surface of the entire code. Being adventurous, we grasp for straws and attempt to force integer arithmetic in roll by using the unchecked-inc function:

(defn roll [n d]
  (let [p (int d)]
    (reduce + (take n (repeatedly #(unchecked-inc (rand-int p)))))))

(time (dorun (occur-count (take 1000000 (repeatedly #(roll 3 6))))))

Go ahead and run that in the REPL, then go get some coffee and a bagel. Toast the bagel. Eat the bagel. Drink the coffee. By that time, you might've received a result.

So what happened? In an attempt to be clever, we've confused the Clojure compiler into near unconsciousness. Instead of making direct calls to Clojure's math functions, we're now making calls indirectly via Java reflection! You can observe this by setting *warn-on-reflection* to true and reentering roll.
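The reflection check itself can be sketched as follows (the function names here are our own illustration):

```clojure
;; At the REPL, enable compiler warnings for reflective calls
(set! *warn-on-reflection* true)

;; The compiler can't resolve .toUpperCase on an untyped argument,
;; so compiling this prints a reflection warning
(defn shout [s] (.toUpperCase s))

;; A type hint resolves the call directly; no warning is printed
(defn shout-fast [^String s] (.toUpperCase s))
```

Both functions return the same results; only the hinted one avoids the reflective call.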

How can we speed things up? The problem isn't with coercion itself, but instead with the implementations of roll and occur-count. You can observe significant speed-ups by rethinking your original implementations first and then resorting to coercion second. The use of coercion should always be preceded by a reevaluation of your implementation, because often by doing so you can eliminate the need for coercion altogether, as shown next.

Listing 12.7 Using coercion to gain speed

(defn roll [n d]
  (loop [n (int n), sum 0]                  ; coerce n to int
    (if (zero? n)
      sum
      (recur (dec n) (+ sum (inc (rand-int d)))))))

(defn occur-count [words]
  (reduce #(assoc %1 %2 (inc (%1 %2 0))) {} words))

(time (dorun (occur-count (take 1000000 (repeatedly #(roll 3 6))))))
; "Elapsed time: 937.346 msecs"
;=> nil


By refactoring the original functions, we've gained a five-fold increase in speed and yet used only a single coercion. Additionally, we've managed to make the new implementation faster while also maintaining clarity. This should be a general goal when writing your Clojure code, and when forced to make a trade between the two, it might be a good idea to favor clarity.

12.5.3 Second rule of coercion: don’t

In the previous example, there’s too much noise in collection and sequence operations

for primitive coercion to help much. This goes to show that it’s important to remember

that the Clojure compiler will often do a better job at optimization than you.

12.5.4 Third rule of coercion: coerce a stable local

When coercing a local to a primitive type, it’s tempting to do so at the point of use,

but this practice should be avoided. A good rule of thumb for coercion is to coerce

only within a local context via let, binding, or loop. This provides a stable value point

for the primitive, allowing you to reuse that same local elsewhere in the same function

without having to again coerce at different points of use. This can be illustrated by the

following:

(defn mean
  "Takes a sequence of integers and returns their mean value"
  [sq]
  (let [length (int (count sq))]
    (if (zero? length)
      0
      (/ (int (reduce + sq)) length))))

The length value has been bound in the let, allowing it to be reused twice within the

body of the function. This allows for a cleaner implementation than the alternative,

which coerces the results of (count sq) in multiple places. Using this advice and the fact that Clojure provides lexical scope by default, you can also avoid the need to define a name-mangled local by instead using let to rebind original argument names to coerced values: (defn f [x] (let [x (int x)] ...)).
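For example, the rebinding idiom looks like this (the function name is our own):

```clojure
(defn sum-to [n]
  (let [n (int n)]                ; coerce once, at a stable local
    (loop [i (int 1), sum 0]
      (if (> i n)
        sum
        (recur (inc i) (+ sum i))))))

(sum-to 100)
;=> 5050
```

Every later use of n in the body sees the coerced value, with no further coercions at the points of use.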

12.5.5 Fourth rule of coercion: watch your sizes

Primitive type coercions in Clojure act the same as type truncation in Java. If a given

value is coerced into a type that can’t hold a value of its magnitude, then data loss will

occur, and in the case of integer overflow, exceptions will be thrown.
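A quick REPL sketch of both failure modes:

```clojure
;; An out-of-range coercion throws rather than wrapping silently
(byte 300)
; java.lang.IllegalArgumentException: Value out of range for byte: 300

;; A fractional value is silently truncated toward zero
(int 1/3)
;=> 0
```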

12.5.6 Fifth rule of coercion: truncate only as a goal

By default, Clojure doesn’t limit the accuracy of mathematical operations, but this can

occur when using coercion. There will be many instances in your own projects when

speed is more important than accuracy in mathematical operations. Likewise, there

will also be times when truncation is necessary, especially when dealing with Java

library methods that take primitive types:


(Math/round 1.23897398798929929872987890030893796768727987138M)

; java.lang.IllegalArgumentException:

; No matching method found: round

When a method or function isn't overloaded, the Clojure compiler can determine whether an argument can be coerced to a primitive type and will do so if able. The preceding exception arises from the fact that Math/round is overloaded, taking either a float or a double typed argument. Therefore, you have to explicitly use coercion to truncate the argument:

(Math/round (float 1.23897398798929929872987890030893796768727987138M))

;=> 1

Our goal in using the truncating operation float was to get a result that we knew wouldn't be affected by truncation. But many instances will arise when truncation will affect your results, and will often do so to your detriment. Therefore, it's best to be wary when using coercion, because it propagates inaccuracies. It's best to limit its use to cases where truncation is desired, and to document vigorously when it's absolutely needed for speed.

Coercion can be an effective tool in your Clojure applications, but take care to be

sure you understand the caveats. If you take away one lesson from this section, let it be

this: do not rush to coercion.

12.6 Summary

Clojure provides numerous ways to gain speed in your applications. Using some combination of type hints, transients, chunked sequences, memoization, and coercion, you should be able to achieve noticeable performance gains. Like any powerful tool, these performance techniques should be used cautiously and thoughtfully. But once you've determined that performance can be gained, their use is minimally intrusive and often natural to the unadorned implementation.

In the final chapter, we'll cover a number of ways that the Clojure way of thinking might be different from what you're accustomed to. The discussion therein, when explored with an open mind, will change the way that you write software.


Clojure changes the way you think

This chapter covers
■ DSLs
■ Testing
■ A lack of design patterns
■ Error handling and debugging
■ Fare thee well

In this final chapter, we cover some tangential topics that you might already be familiar with, but perhaps not from a Clojure perspective. Our discussion will start with domain-specific languages (DSLs) and the unique way that Clojure applications are built from a layering of unique application-specific DSLs. Next, you're unlikely to be ignorant of the general push toward a test-driven development (TDD) philosophy, with a special focus on unit testing. We'll explore why Clojure is especially conducive to unit testing and why it's often unnecessary. Next, whether you agree with the cult of design patterns or not, it's inarguable that patterns have changed the way that object-oriented software is designed and developed. The classical design patterns are often invisible, or at times outright nonexistent, in Clojure code, which we'll discuss in this chapter. As we'll then show, error handling in Clojure flows in two directions: from inner functions to outer via exceptions, and from outer functions in via dynamic bindings. Finally, we'll explore how having the entire language at your disposal can help to change the way that your debugging occurs. We hope that by the time you've finished this chapter, you'll agree—Clojure changes the way you think about programming.

13.1 DSLs

Lisp is not the right language for any particular problem. Rather, Lisp encourages

one to attack a new problem by implementing new languages tailored to that

problem.

—“Lisp: A Language for Stratified Design” (Abelson 1988)

In chapter 8, we explored the notion of a domain-specific language for describing domains. This meta-circularity, while playful, was meant to make a subtle point: Clojure blurs, and often obliterates, the line between DSL and API. When a language is built from the same data structures that the language itself manipulates, it's known as homoiconic (Mooers 1965). When a programming language is homoiconic, it's simple to mold the language into a form that bridges the gap between the problem and solution domains. When designing DSLs in Clojure, it's important to determine when the existing language facilities will suffice (Raymond 2003) and when it's appropriate to create one from whole cloth (Ghosh 2010). In this section we'll do both and provide a little discussion about each.

13.1.1 A ubiquitous DSL

The declarative language SQL is among the most widespread DSLs in use today. In section 1.2 we showed a simple Clojure DSL, which provided a simple subset of the SELECT statement that created a representational SQL string. Though that particular example was meant to be instructive, Clojure provides a comprehensive library for relational algebra, on which SQL is based (Date 2009). Imagine a dataset of the following:

(def artists
  #{{:artist "Burial" :genre-id 1}
    {:artist "Magma" :genre-id 2}
    {:artist "Can" :genre-id 3}
    {:artist "Faust" :genre-id 3}
    {:artist "Ikonika" :genre-id 1}
    {:artist "Grouper"}})

(def genres
  #{{:genre-id 1 :genre-name "Dubstep"}
    {:genre-id 2 :genre-name "Zeuhl"}
    {:genre-id 3 :genre-name "Prog"}
    {:genre-id 4 :genre-name "Drone"}})

You can try Clojure's relational functions by entering the examples shown in the following listing.


Listing 13.1 Examples of Clojure’s relational algebra functions

(require '[clojure.set :as ra])

(def ALL identity)

(ra/select ALL genres)
;=> #{{:genre-id 4, :genre-name "Drone"}
;     {:genre-id 3, :genre-name "Prog"}
;     {:genre-id 2, :genre-name "Zeuhl"}
;     {:genre-id 1, :genre-name "Dubstep"}}

(ra/select #(#{1 3} (:genre-id %)) genres)
;=> #{{:genre-id 3, :genre-name "Prog"}
;     {:genre-id 1, :genre-name "Dubstep"}}

(take 2 (ra/select ALL (ra/join artists genres)))
;=> #{{:artist "Burial", :genre-id 1, :genre-name "Dubstep"}
;     {:artist "Magma", :genre-id 2, :genre-name "Zeuhl"}}

The relational functions in clojure.set are a perfect example of the way that Clojure blurs the line between API and DSL. No macro tricks are involved, but through the process of functional composition, the library provides a highly expressive syntax matching closely (Abiteboul 1995) that of SQL itself. Though you might be tempted to create a custom query language for your own application(s), there are times when the relational functions are exactly what you need. Your time might be better spent solving actual problems, one of which we'll cover in the following section.

13.1.2 Putting parentheses around the specification

Many applications deal in measurements of differing units. For example, it’s widely

known that the U.S. works almost exclusively in English units of measure, whereas

most of the rest of the planet works in SI, or metric units. To convert1 from one to the

other isn’t an arduous task and can be handled easily with a set of functions of this

general form:

(defn meters->feet [m] (* m 3.28083989501312))
(defn meters->miles [m] (* m 0.000621))

(meters->feet 1609.344)
;=> 5279.9999999999945

(meters->miles 1609.344)
;=> 0.999402624

This approach will certainly work if only a few functions define the extent of your conversion needs. But if your applications are like ours, then you probably need to convert to and from differing units of measure of many different magnitudes. You may also need to convert back and forth between units of time, dimension, orientation, and a host of others. Therefore it'd be nice to be able to write a specification of unit conversions (Hoyte 2008) as a Clojure DSL and use its results as a low-level layer for high-layer application specifics. This is precisely the nature of Lisp development in general—each level in an application provides the primitive abstractions for the levels above it.

1 A spectacular general-purpose JVM language named Frink excels at conversions of many different units. We highly advocate exploring Frink at your next available opportunity: http://futureboy.us/frinkdocs/.

In this section, we're going to create a small specification and then convert it into a Clojure DSL using a technique coined by Rainer Joswig as "putting parentheses around the specification."

DEFUNITS

An ideal representation for a unit-conversion specification language would be simple:

Our base unit of distance is the meter. There are 1,000 meters in a kilometer. There

are 100 centimeters in a meter. There are 10 millimeters in a centimeter. There are

3.28083 feet in a meter. And finally, there are 5,280 feet in a mile.

Of course, to make sense of free text is a huge task in any language, so it behooves us

to change it so that it’s easier to reason about programmatically, but not so much that

it’s cumbersome for someone attempting to describe unit conversions. As a first pass,

we’ll try to group the most obvious parts using some Clojure syntactical elements:

(Our base unit of distance is the :meter
  [There are 1000 :meters in a :kilometer]
  [There are 100 :centimeters in a :meter]
  [There are 10 :millimeters in a :centimeter]
  [There are 3.28083 :feet in a :meter]
  [There are 5280 :feet in a :mile])

This specification is starting to look a little like Clojure code, but it would still be difficult to parse this into a usable form. Likewise, it'll be difficult for the person writing the specification to use the correct terms, avoid spelling mistakes, properly punctuate, and so forth. In a word, this form is still not useful. It'd be ideal if we could make this into a form that's still recognizable to both Clojure and a conversion expert. We'll try one more time:

(define unit of distance
  {:m 1,
   :km 1000,
   :cm 1/100,
   :mm [1/10 of a :cm],
   :ft 0.3048,
   :mile [is 5280 :ft]})

This almost looks like Clojure source code, except for a few minor details. We’ve

changed the measure of feet from an “in a” relationship to a relative one with regard

to the meter base unit. Also, a vector indicates the use of a different relative unit,

keeping the DSL regular in its meaning between one conversion and the next and

providing a way to describe intermediate relative units of measure. Those definitions

look like a map, so we should write a utility function that takes a unit and a map like

the preceding one and returns the number of units it takes to compose the base unit.


Listing 13.2 A function for calculating compositional units of a base unit

(defn relative-units [u units]
  (let [spec (u units)]                       ; look up unit spec
    (if (nil? spec)
      (throw (Exception. (str "Undefined unit " u)))
      (if (vector? spec)
        (let [[conv to] spec]                 ; deconstruct relative spec
          (* conv
             (relative-units to units)))      ; multiply relative units
        spec))))

The function relative-units goes through the map units looking up units and multiplying their compositional values. When it finds an indirect specification (such as millimeters defined in terms of centimeters), it traverses the chain of indirect references, multiplying the factors along the way, as shown:

(relative-units :m {:m 1 :cm 100 :mm [10 :cm]})

;=> 1

(relative-units :cm {:m 1 :cm 100 :mm [10 :cm]})

;=> 100

(relative-units :mm {:m 1 :cm 100 :mm [10 :cm]})

;=> 1000

We changed the unit conversions map to remove the natural language phrase “in a,”

because English isn’t good for a DSL. Natural language often lacks the precision that a

simple yet regular form has. Now that we have the auxiliary function created, we’d like

to create a macro to interpret the unit specification as shown:

(defunits-of distance :m
  :km 1000
  :cm 1/100
  :mm [1/10 :cm]
  :ft 0.3048
  :mile [5280 :ft])

This is a simplification versus the original verbal form of the conversion specification. This final form is indubitably more conducive to parsing, yet doesn't appreciably sacrifice readability. The implementation of the defunits-of macro is presented in the following listing.

Listing 13.3 A defunits-of macro

(defmacro defunits-of [name base-unit & conversions]
  (let [magnitude (gensym)
        unit (gensym)
        units-map (into `{~base-unit 1}          ; create units map
                        (map vec (partition 2 conversions)))]
    `(defmacro ~(symbol (str "unit-of-" name))
       [~magnitude ~unit]
       `(* ~~magnitude                           ; multiply magnitude by target unit
           ~(case ~unit
              ~@(mapcat                          ; unroll unit conversions into case lookup
                 (fn [[u# & r#]]
                   `[~u# ~(relative-units u# units-map)])
                 units-map))))))

The macro defunits-of is different than any macro that you've seen thus far, but it's typical for macros that expand into another macro definition. In this book you've yet to see a macro that builds another macro and uses multiple levels of nested2 syntax-quotes. You won't likely see macros of this complexity often, but in this case we use nested syntax-quotes so that we can feed structures from the inner layers of the nested macros to the outer layers, processing each fully before proceeding. At this point, we can now run a call to the defunits-of macro with the simplified metric to English units conversion specification to define a new macro named unit-of-distance:

(unit-of-distance 1 :m)

;=> 1

(unit-of-distance 1 :mm)

;=> 1/1000

(unit-of-distance 1 :ft)

;=> 0.3048

(unit-of-distance 1 :mile)

;=> 1609.344

Perfect! Everything is relative to the base unit :m, just as we'd like (read as "how many meters are in a _"). The generated macro unit-of-distance allows you to work in your given system of measures relative to a standard system without loss of precision or the need for a bevy of awkward conversion functions. To calculate the distance a home run hit by the Orioles' Matt Wieters travels in Canada is a simple call to (unit-of-distance 441 :ft) away. The expansion of the distance specification given as (defunits-of distance :m ...) looks approximately like the following:

(defmacro unit-of-distance [G__43 G__44]
  `(* ~G__43
      (case ~G__44
        :mile 1609.344
        :km 1000
        :cm 1/100
        :m 1
        :mm 1/1000
        :ft 0.3048)))

The defunits-of macro is an interpreter of the unit-conversion DSL, which generates another macro unit-of-distance that performs a straightforward lookup of relative unit values. Amazingly, the expansion given by (macroexpand '(unit-of-distance 1 :cm)) is that of a simple multiplication (* 1 1/100). This is an awe-inspiring revelation. What we've managed to achieve is to fuse the notion of compilation and evaluation by writing a relative units of measure "mini-language" that's interpreted into a simple multiplication at compile time!

2 We talked briefly about making sense out of nested syntax-quotes in section 8.1. However, you're not likely to need them very often.

This is nothing new; Lisp programmers have known about this technique for decades, but it never ceases to amaze. There's one downside to our implementation—it allows for circular conversion specifications (seconds defined in terms of minutes, which are then defined in terms of seconds), but this can be identified and handled in relative-units if you're so inclined.
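One way to identify such cycles is to thread a set of already-visited units through the recursion. Here's a sketch extending the relative-units shape from listing 13.2; the seen argument is our own addition:

```clojure
(defn relative-units
  ([u units] (relative-units u units #{}))
  ([u units seen]
   (when (seen u)                              ; revisiting a unit means a cycle
     (throw (Exception. (str "Circular unit definition involving " u))))
   (let [spec (u units)]
     (if (nil? spec)
       (throw (Exception. (str "Undefined unit " u)))
       (if (vector? spec)
         (let [[conv to] spec]
           (* conv (relative-units to units (conj seen u))))
         spec)))))
```

With this in place, a specification such as {:s [1/60 :min], :min [60 :s]} throws instead of recursing forever, while well-formed specifications behave as before.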

13.1.3 A note about Clojure’s approach to DSLs

DSLs and control structures implemented as macros in Common Lisp tend to be written in a style more conducive to macro writers. But Clojure macros such as defunits-of, cond, and case are idiomatic in their minimalism; their component parts are paired and meant to be grouped through proper spacing. Clojure macro writers should understand that the proliferation and placement of parentheses are legitimate concerns for some, and as a result you should strive to reduce their number whenever possible. Why would you explicitly group your expressions when their groupings are only a call to partition away?

CLOJURE APHORISM If a project elicits a sense of being lost, then start from

the bottom up.

DSLs are an important part of a Clojure programmer’s toolset and stem from a long

Lisp tradition. When Paul Graham talks about “bottom-up programming” in his

perennial work On Lisp, this is what he’s referring to. In Clojure, it’s common practice

to start by defining and implementing a low-level language specifically for the levels

above. Creating complex software systems is hard, but using this approach, you can

build the complicated parts out of smaller, simpler pieces.

Clojure changes the way that you think.

13.2 Testing

Object-oriented programs can be highly complicated beasts to test properly, thanks to

mutating state coupled with the need to test across class hierarchies. Programs are a

vast tapestry of interweaving execution paths, and to test each path comprehensively is

difficult, if not impossible. In the face of unrestrained mutation, the execution paths

are overlaid with mutation paths, further adding to the chaos. Conversely, Clojure

programs tend to be compositions of pure functions with isolated pools of mutation.

The result of this approach helps to foster an environment conducive to unit testing.

But though the layers of an application are composed of numerous functions, each

individually and compositionally tested, the layers themselves and the wiring between

them must also be tested.

Test-driven development (Beck 2002) has conquered the software world, and at its core it preaches that test development should drive the architecture of the overall application. Unfortunately, this approach isn't likely to bear fruit in your Clojure programs. Instead, Clojure provides the foundation for a contracts-based program specification that's more amenable for writing correct programs. But before we discuss contracts, we'll touch on the ways Clojure facilitates one part of TDD, unit testing.

13.2.1 Some useful techniques

We don't want to disparage test-driven development, because its goals are virtuous and testing in general is essential. Because Clojure programs are organized using namespaces, and they are themselves aggregations of functions, often pure, the act of devising a unit-test suite at the namespace boundary is often mechanical in its directness. From a larger perspective, devising comprehensive test strategies is the subject of numerous volumes and therefore outside of the scope of this book; but there are a few Clojure-specific techniques that we wish to discuss.

USING WITH-REDEFS TO STUB

Stubbing (Fowler 2007) is the act of supplying an imitation implementation of a function for testing purposes. One mechanism that can perform this stubbing is the with-redefs macro implemented in the following listing. Though this exact macro will likely be included in future versions of Clojure, it's not in Clojure 1.2, so a definition is provided.

Listing 13.4 Macro to aid in mocking

(defn with-redefs-fn [binding-map func & args]
  (let [root-bind (fn [m]
                    (doseq [[a-var a-val] m]
                      (.bindRoot a-var a-val)))
        old-vals (zipmap (keys binding-map)
                         (map deref (keys binding-map)))]
    (try
      (root-bind binding-map)
      (apply func args)
      (finally
        (root-bind old-vals)))))

(defmacro with-redefs [bindings & body]
  `(with-redefs-fn ~(zipmap (map #(list `var %) (take-nth 2 bindings))
                            (take-nth 2 (next bindings)))
     (fn [] ~@body)))

The function rss-children from section 11.6 parses a Twitter RSS2 feed, returning a

sequence of the top-level feed elements. Testing functions that rely on rss-children

is futile against live Twitter feeds, so a stubbed implementation returning a known

sequence would be more prudent, as shown next.

Listing 13.5 Using with-redefs to create stubs

(defn tweetless-rss-children [s]                      ; create stub
  '({:tag :title, :attrs nil, :content ["Stub"]}))

(defn count-rss2-children [s]
  (count (rss-children s)))

(with-redefs [rss-children tweetless-rss-children]    ; dynamically bind with stub
  (count-rss2-children "dummy"))
;=> 1

The tweetless-rss-children function returns a sequence of some canned data. Therefore, when testing the count-rss2-children function we temporarily change the value of rss-children so that it resolves to tweetless-rss-children instead. This change is made at the root of the rss-children Var and so is visible to all threads. As long as all the test calls to it are made before control leaves the with-redefs form, the stub will be invoked every time. Because tweet-occurrences doesn't return until it collects results from all the futures it creates, it will use the redef given by with-redefs:

(with-redefs [rss-children tweetless-rss-children]
  (tweet-occurrences "dummy" "test-url"))
;=> 0

Another option that is sometimes suggested is to use binding in place of with-redefs. This would push a thread-local binding for rss-children, which might seem attractive in that it could allow other threads to bind the same Var to a different stub function, potentially for simultaneously running different tests. But because tweet-occurrences uses futures, the other threads will be calling rss-children and will see the root binding rather than the stub,3 causing an error:

(binding [rss-children tweetless-rss-children]
  (tweet-occurrences "dummy" "test-url"))
; java.util.concurrent.ExecutionException:
;   java.io.FileNotFoundException: test-url

When the root binding of rss-children runs, it tries to actually load “test-url” and

fails, instead of calling our stub and succeeding. The with-redefs macro is a better

solution for mocking.
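Because with-redefs swaps the Var's root binding, even threads it never created see the stub. A small sketch, with names of our own invention:

```clojure
(defn fetch-feed [] :real)

;; The future's thread sees the rebound root while the form is in flight
(with-redefs [fetch-feed (fn [] :stub)]
  @(future (fetch-feed)))
;=> :stub

;; The root binding is restored once the form exits
(fetch-feed)
;=> :real
```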

CLOJURE.TEST AS SPECIFICATION

Clojure ships with a testing library in the clojure.test namespace, used to create test suites that can further serve as partial system specifications. We won't provide a comprehensive survey of the clojure.test functionality, but you should get a feel for how it works. Unit-test specifications in Clojure are declarative in nature, as shown next.

Listing 13.6 clojure.test as a partial specification

(require '[clojure.test :as test])

(test/deftest feed-tests
  (with-redefs [rss-children tweetless-rss-children]
    (test/testing "RSS2 Child Counting"
      (test/is (= 1000 (count-rss2-children "dummy"))))
    (test/testing "Twitter Occurrence Counting"
      (test/is (= 0 (count-tweet-text-task "#clojure" ""))))))

(defn test-ns-hook []
  (feed-tests))

3 Alpha versions for Clojure 1.3 handle binding's interaction with future and Agent send differently, passing dynamic bindings through to code executed in these other thread contexts. But because these are not the only kinds of threads that can be spawned, with-redefs (which may be included in Clojure 1.3) is still recommended for mocking out functions during tests.

Clojure’s test library provides a DSL for describing unit test cases. If you’ll notice, we

added a failing test to the RSS2 Child Counting test so that when run, the test will fail

as expected:

(test/run-tests 'user)

; Testing user

;

; FAIL in (feed-tests) (NO_SOURCE_FILE:101)

; RSS2 Child Counting

; expected: (= 1000 (count-rss2-children "dummy"))

; actual: (not (= 1000 1))

;

; Ran 1 tests containing 2 assertions.

; 1 failures, 0 errors.

;=> {:type :summary, :test 1, :pass 1, :fail 1, :error 0}

Though tests are a good way to find some errors, they make few guarantees that the system works properly. The ideal approach is the design and implementation of a framework corresponding closely with the domain of the application itself. This framework would ideally take the literal form of a domain DSL built incrementally through an interaction with domain experts and must come before testing begins. No amount of testing can substitute for thoroughly thinking through the fundamental design details. That's not to say that the domain DSLs should be fully realized from the start; instead, the form of the DSL and its constituent parts should be reflective of the actual domain. In our experience, there are no languages comparable to Clojure for this kind of domain modeling, save for perhaps Haskell, Factor, and Scala. Having said that, the domain isn't simply defined by the shape of its language; it also includes its expectations, which we'll discuss presently.

13.2.2 Contracts programming

Test-driven development is in many ways a heuristic affair. People tend to test only the error conditions and expectations that they can conceptualize. Surely there’s no such thing as an exhaustive test suite, but in many cases test suites tend toward a local maximum. There’s a better way to define semantic expectations within your application: using Clojure pre- and postconditions.

REVISITING PRE- AND POSTCONDITIONS

In section 7.1, we explored Clojure’s pre- and postcondition facility. Function constraint specification is a conceptually simple model for declaring the expectations for any given function. Function constraints can cover the full range of expected conditions imposed on the function’s inputs, its outputs, and their relative natures. The beauty of specifying constraints is that they can augment a testing regimen with the application of random values. The reason this works is that you can effectively throw out the values that fail the preconditions and instead focus on the values that cause errors in the postconditions. We’ll try this approach for a simple function to square a number:

(def sqr
  (partial
    (contract sqr-contract
      [n]
      (require (number? n))
      (ensure (pos? %)))
    #(* % %)))

[(sqr 10) (sqr -9)]
;=> [100 81]

The contract for sqr states simply: require a number and ensure that its return is positive. Now we can create a simple test driver⁴ that throws many random values at it to see if it breaks:

(doseq [n (range Short/MIN_VALUE Short/MAX_VALUE)]
  (try
    (sqr n)
    (catch AssertionError e
      (println "Error on input" n)
      (throw e))))
; Error on input 0
;=> java.lang.AssertionError: Assert failed: (pos? %)

Even when adhering to the tenets of the preconditions, we’ve uncovered an error in the sqr function at the postcondition end. Postconditions should be viewed as the guarantee of the return value, given that the preconditions are met. The reason for the postcondition error is that the function’s contract doesn’t specify that the number n should be nonzero. By adding a check for zero (not= 0 n) in the preconditions, we can guarantee that the sqr function acts as expected. Performing this same verification using unit testing is trivial in this case, but what if the edge condition weren’t as obvious? In such a case, it’s probable that the error might not be caught until it’s too late. Of course, there’s no guarantee that your contracts are comprehensive, but that’s why domain expertise is often critical when defining them.
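The repaired version can also be sketched directly with Clojure’s built-in :pre/:post condition maps, without the contract macro from the listing above (the name sqr-checked is hypothetical, not from the text):

```clojure
;; A minimal sketch using Clojure's built-in :pre/:post maps;
;; sqr-checked is a hypothetical name, not from the text.
(defn sqr-checked [n]
  {:pre  [(number? n) (not= 0 n)]  ; reject non-numbers and zero
   :post [(pos? %)]}               ; guarantee a positive result
  (* n n))

(sqr-checked -9)
;=> 81
```

Calling (sqr-checked 0) now fails the precondition with an AssertionError, instead of slipping through to a postcondition failure.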

ADVANTAGES OF PRE- AND POSTCONDITIONS

Function constraints aren’t code. They take the form of code, but that fact is only a matter of representation. Instead, constraints should be viewed as a specification language describing expectations and result assurances. Unit tests, on the other hand, are code, and code has bugs. Contracts, by contrast, are essential semantic coupling.

Another potential advantage of contracts over tests is that in some cases, tests can be generated from the contracts themselves. Also, pre- and postconditions are amenable to being expressed as an overall description of the system itself, which can thus be fed into a rule base for query and verification. Both of these cases are outside the scope of this book, but you shouldn’t be surprised if they make their way into future versions of Clojure. There’s tremendous potential in Clojure’s pre- and postconditions. Though they’re currently low-level constructs, they can be used to express full-blown design-by-contract facilities for your own applications.

Clojure changes the way that you think.

4 For the sake of highlighting this technique, we’ve simplified our test driver. Testing a limited range of input values might not be an appropriate approach in all circumstances.

13.3 A lack of design patterns

Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp.

—Greenspun’s Tenth Rule

The book Design Patterns: Elements of Reusable Object-Oriented Software (Gamma et al. 1995) was a seminal work of software design and development. You’d be hard-pressed to find a software programmer in this day and age who’s not familiar with this work. The book describes 23 software best practices encountered throughout the course of experience in developing software projects of varying sizes.

Design patterns have obtained a bad reputation in some circles, whereas in others they’re considered indispensable. From our perspective, design patterns are a way to express software best practices in a language-neutral way. But where patterns fall short is that they don’t represent pure abstraction. Instead, design patterns have come to be viewed as goals in and of themselves, which is likely the source of the antagonism aimed at them. The ability to think in abstractions is an invaluable skill for a software programmer to strengthen. In this section, we’ll attempt to dissuade you from viewing Clojure features as design patterns (Norvig 1998) and encourage you to see them instead as an inherent nameless quality.

13.3.1 Clojure’s first-class design patterns

Most if not all of the patterns listed in the book are applicable to functional programming languages in general, and to Clojure in particular. But at its most pragmatic, the patterns described are aimed at patching deficiencies in popular object-oriented programming languages. This practical view of design patterns isn’t directly relevant to Clojure, because in many ways the patterns are ever-present and are first-class citizens of the language itself. We won’t provide a comprehensive survey of the ways that Clojure implements or eliminates popular design patterns but will provide enough to make our point.

OBSERVER

Clojure’s add-watch and remove-watch functions provide the underpinnings of an observer (publisher/subscriber) capability based on reference types. We can illustrate this through the implementation of the simple defformula macro shown in listing 13.7.


Listing 13.7 A macro to create spreadsheet-cell-like formulas

(defmacro defformula [nm bindings & formula]
  `(let ~bindings
     (let [formula# (agent ~@formula)               ; create formula as agent
           update-fn# (fn [key# ref# o# n#]
                        (send formula# (fn [_#] ~@formula)))]
       (doseq [r# ~(vec (take-nth 2 bindings))]
         (add-watch r# :update-formula update-fn#)) ; add watch to each reference
       (def ~nm formula#))))

(def h (ref 25))
(def ab (ref 100))

(defformula avg [at-bats ab hits h]                 ; create baseball formula
  (float (/ @hits @at-bats)))

@avg
;=> 0.25

(dosync (ref-set h 33))                             ; update dependent Ref
;=> 33

@avg                                                ; observe formula change
;=> 0.33

By using watchers on references, you can use defformula to provide an abstract value that changes when any of its parts change. A more traditional Lisp approach is to provide predefined hooks (Glickstein 1997) that are called at certain times within the execution cycle. In addition, using proxy or gen-class to extend java.util.Observable is the most straightforward way to wire into existing source code using the Observer pattern.

STRATEGY

Algorithm strategies selected at runtime are common practice in Clojure, and there are a number of ways to implement them. One such way is via continuation-passing style, as we explored in section 7.3. A more general solution is to pass the desired function as an argument to a higher-order function, such as you’d see in the ubiquitous map, reduce, and filter functions. Further, we’ll provide a case of dynamic error functions in the next section illustrating how Clojure’s multimethods are a more powerful substitute for the classic Strategy pattern.
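The higher-order-function form can be sketched minimally as follows (all names here are illustrative, not from the book):

```clojure
;; The "strategy" is just a function chosen at runtime and handed
;; to a higher-order function; no class hierarchy is needed.
(defn top-players [ranking players]
  (take 2 (sort-by ranking > players)))

(map :name (top-players :score [{:name "a" :score 7}
                                {:name "b" :score 9}
                                {:name "c" :score 3}]))
;=> ("b" "a")
```

Swapping in a different ranking function changes the algorithm without touching top-players.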

VISITOR

The Visitor pattern is designed to describe a way to decouple operations on a structure from the structure itself. Even casual observers will see the parallel to Clojure’s multimethods, protocols, types, proxies, and reify features.
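For example, a multimethod keeps an operation separate from the structures it visits (a hypothetical sketch; the shapes and the area operation are invented for illustration):

```clojure
;; Dispatch on a :shape key; new shapes and new operations can be
;; added independently of one another. All names are illustrative.
(defmulti area :shape)
(defmethod area :rect   [{:keys [w h]}] (* w h))
(defmethod area :circle [{:keys [r]}]   (* Math/PI r r))

(map area [{:shape :rect :w 2 :h 3} {:shape :circle :r 1}])
;=> (6 3.141592653589793)
```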

ABSTRACT FACTORY

The Abstract Factory pattern describes a way to create related objects without having to name explicit types at the point of creation. Clojure’s types avoid the creation of explicit hierarchies (although ad hoc hierarchies can be created, as seen in section 9.2). Therefore, in Clojure this particular usage scenario is relegated to Java interoperability contexts. But the use of factory functions to abstract the call to the constructors of types and records is idiomatic and in fact actively promoted. The reasons for a Clojure-style factory are to simplify the importing requirements of a type or record, and also to add project-specific functionality to the constructor (keyword arguments, default values, and so on).
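A factory of this style might look like the following sketch (the record and its defaults are invented for illustration):

```clojure
(defrecord Config [host port])

;; The factory hides the positional constructor and supplies
;; defaults via keyword arguments; callers never import Config.
(defn make-config [& {:keys [host port]
                      :or   {host "localhost" port 8080}}]
  (Config. host port))

(:host (make-config :port 9090))
;=> "localhost"
```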

INTERPRETER

The Interpreter pattern is in every way Greenspun’s Tenth Rule formalized. Many projects of sufficient size can be well served by the inclusion of a specialized grammar describing parts of the system itself. Clojure macros make the creation of specialized grammars a first-class feature of the language.
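As a tiny illustration, a macro can extend the language’s grammar directly (unless is a toy form here; clojure.core already provides when-not for this purpose):

```clojure
;; A one-form "specialized grammar": the macro rewrites its own
;; syntax into ordinary Clojure before evaluation.
(defmacro unless [test & body]
  `(if (not ~test)
     (do ~@body)))

(unless false :reached)
;=> :reached
```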

BUILDER

The creation of complex structures from a representation is central to Clojure programming, although it’s viewed differently from the similar object-oriented approach, the Builder pattern. In section 8.4, we used a simple data representation as the input to Clojure’s clojure.xml/emit function to produce an analogous XML representation. If you preferred a different output representation, you could write another conversion function. If you preferred finer control over the constituent parts, you could write functions or multimethods for each and specialize at runtime.
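That approach can be sketched as follows, assuming clojure.xml is on the classpath (it ships with Clojure); the person data here is invented:

```clojure
(require '[clojure.xml :as xml])

;; The "plan" is plain data; emit builds the XML text from it.
(def person {:tag :person
             :attrs {:id "1"}
             :content [{:tag :name :content ["Fogus"]}]})

(with-out-str (xml/emit person))   ; emit prints to *out*
```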

FAÇADE

The use of Clojure namespaces, as seen in section 9.1, is the most obvious way to provide a simplified façade over a more complex API. You can also use the varying levels of encapsulation (as outlined in section 2.4) for more localized façades.

ITERATOR

Iteration in Clojure is defined through adherence to the seq protocol, as outlined in section 5.1 and later elaborated on in section 9.3 on types and protocols.

DEPENDENCY INJECTION

Though not a classical pattern in the Design Patterns sense, dependency injection has become a de facto pattern for object-oriented languages that don’t allow overridable class constructors. This condition requires that separate factory methods and/or classes create concrete instances conforming to a given interface. In forsaking the ability to define classes, Clojure completely avoids the problem that DI solves. Instead, Clojure’s closest analogue to this “pattern” is the use of functions returning closures that are specialized based on the original arguments. Likewise, you could use partial application and composition similarly.
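A minimal sketch of that closure-returning style (the names are illustrative):

```clojure
;; The dependency (a formatting function) is supplied once and
;; closed over; callers never name a concrete implementation.
(defn make-notifier [format-fn]
  (fn [msg] (format-fn msg)))

(def loud (make-notifier #(.toUpperCase %)))
(loud "deploy finished")
;=> "DEPLOY FINISHED"

;; Partial application specializes similarly:
(def prefixed (partial str "[app] "))
(prefixed "ready")
;=> "[app] ready"
```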

We could go further with this survey, but to do so would belabor the point: most of what are known as design patterns are either invisible or trivial to implement in Clojure. But what about the Prototype pattern, you ask? We implemented the UDP in section 9.2. Decorators or chain of responsibility? Why not use a macro that returns a function built from a list of forms spliced into the -> or ->> macro? Proxies would likely be implemented as closures, and so would commands. The list goes on and on, and in the end you must face the inevitable: Clojure changes the way that you think.


13.4 Error handling and debugging

Our goal throughout this book was to show the proper way to write Clojure code, with mostly deferral and hand-waving regarding error handling and debugging. In this section, we’ll cover these topics with what you might view as a unique twist, depending on your programming background.

13.4.1 Error handling

As we showed in figure 10.7, there are two directions for handling errors. The first, and likely most familiar, refers to the passive handling of exceptions bubbling outward