7.3 Day 2: Yoda and the Force

As a Jedi master, Yoda trained apprentices to use and understand the Force, the unifying presence between all living things. In this section, we get to the concepts fundamental to Clojure. We’ll talk about sequences, the abstraction layer that unifies all Clojure collections and ties them to Java collections. We’ll also look at lazy evaluation, the just-in-time strategy that computes sequence members only when you need them. And then we’ll look at that mystical language feature that is the Force for all Lisps, the macro.

Recursion with loop and recur

As you’ve learned in other languages in this book, functional languages depend on recursion rather than iteration. Here’s a recursive program to evaluate the size of a vector:

  (defn size [v]
  (if (empty? v)
  0
  (inc (size (rest v)))))
 
  (size [1 2 3])

It’s not hard to understand. The size of an empty list is zero; the size of another list is one more than the size of the tail. We’ve seen similar solutions for other languages throughout this book.

You’ve also learned that stacks can grow, so recursive algorithms will continue to consume memory until all memory is exhausted. Functional languages work around this limitation with tail recursion optimization. Clojure does not support implicit tail recursion optimization because of limitations of the JVM. You must explicitly recur through the use of loop and recur. Think of a loop as a let statement.

  (loop [x x-initial-value, y y-initial-value] (do-something-with x y))

Initially, given a vector, loop binds the variables in the even positions to the values in the odd positions. In fact, if you don’t specify a recur, loop works exactly like a let:

  user=> (loop [x 1] x)
  1

The function recur will invoke the loop again but this time pass new values.

Let’s refactor the size function to use recur:

  (defn size [v]
  (loop [l v, c 0]
  (if (empty? l)
  c
  (recur (rest l) (inc c)))))

In the second version of size, we’ll use the tail-recursion-optimized loop and recur. Since we won’t actually return a value, we’ll maintain the result in a variable called an accumulator. In this case, c will maintain a count.

This version works like a tail-optimized call, but we’re stuck with more kludgy lines of code. Sometimes, the JVM is a double-edged sword. If you want the community, you need to deal with the problems. But since this function is built into some basic collection APIs, you won’t often need to use recur. Also, Clojure gives you some excellent alternatives to recursion, including lazy sequences that we’ll get to later in this chapter.

With day 2’s bad news out of the way, we’re free to shift to more pleasant matters. Sequences will start to take us into some of the features that make Clojure special.

Sequences

A sequence is an implementation-independent abstraction around all the various containers in the Clojure ecosystem. Sequences wrap all Clojure collections (sets, maps, vectors, and the like), strings, and even file system structures (streams, directories). They also provide a common abstraction for Java containers, including Java collections, arrays, and strings. In general, if it supports the functions first, rest, and cons, you can wrap it in a sequence.

Earlier, when you were working with vectors, Clojure sometimes responded with a list in the console like this:

  user=> [1 2 3]
  [1 2 3]
  user=> (rest [1 2 3])
  (2 3)

Notice that we started with a vector. The result is not a list. repl actually responded with a sequence. That means we can treat all collections in a generic way. Let’s look at the common sequence library. It’s too rich and powerful to cover entirely in one section of a chapter, but I’ll try to give you a flavor of what’s available. I’m going to cover sequence functions that change, test, and create sequences, but I’m going to touch on them only briefly.

Tests

When you want to test a sequence, you will use a function called a predicate. These take a test function and a sequence and return a boolean. every?returnstrue if the test function is true for all items in the sequence:

  user=> (every? number? [1 2 3 :four])
  false

So, one of the items is not a number. some is true if the test is true for any of the items in the sequence:[21]

  (some nil? [1 2 nil])
  true

One of the items is nil. not-every? and not-any? are the inverses:

  user=> (not-every? odd? [1 3 5])
  false
  user=> (not-any? number? [:one :two :three])
  true

These behave exactly as you would expect. Let’s shift to functions that change sequences.

Changing a Sequence

The sequence library has a number of functions that transform sequences in various ways. You’ve already seen filter. To grab only the words with a length greater than four, use this:

  user=> (def words ["luke" "chewie" "han" "lando"])
  #'user/words
  user=> (filter (fn [word] (> (count word) 4)) words)
  ("chewie" "lando")

And you’ve also seen map, which calls a function on all the items in a collection and returns the results. You can build a sequence of squares of all items in a vector:

  user=> (map (fn [x] (* x x)) [1 1 2 3 5])
  (1 1 4 9 25)

The list comprehension combines maps and filters, as you saw in Erlang and Scala. Recall that a list comprehension combines multiple lists and filters, taking the possible combinations of the lists and applying the filters. First, let’s take a simple case. We have a couple of lists, colors and toys:

  user=> (def colors ["red" "blue"])
  #'user/colors
  user=> (def toys ["block" "car"])
  #'user/toys

We can apply a function to all the colors with a list comprehension, similar to the way a map works:

  user=> (for [x colors] (str "I like " x))
  ("I like red" "I like blue")

[x colors] binds x to an element from the colors list. (str "I like " x) is an arbitrary function that’s applied to every x from colors. It gets more interesting when you bind to more than one list:

  user=> (for [x colors, y toys] (str "I like " x " " y "s"))
  ("I like red blocks" "I like red cars"
  "I like blue blocks" "I like blue cars")

The comprehension created every possible combination from the two lists. We can also apply a filter with the :when keyword in the bindings:

  user=> (defn small-word? [w] (< (count w) 4))
  #'user/small-word?
  user=> (for [x colors, y toys, :when (small-word? y)]
  (str "I like " x " " y "s"))
  ("I like red cars" "I like blue cars")

We wrote a filter called small-word?. Any word that is less than four characters is small. We applied the small-word? filter to y with :when (small-word? y). We got all possibilities of (x, y), where x is a member of colors, y is a member of toys, and the size of y is less than four characters. The code is dense but expressive. That’s an ideal combination. Let’s move on.

You’ve seen foldl, foldleft, and inject in Erlang, Scala, and Ruby. In Lisp, the equivalent is reduce. To compute a quick total or factorial, use this:

  user=> (reduce + [1 2 3 4])
  10
  user=> (reduce * [1 2 3 4 5])
  120

You can sort a list:

  user=> (sort [3 1 2 4])
  (1 2 3 4)

and sort on the result of a function:

  user=> (defn abs [x] (if (< x 0) (- x) x))
  #'user/abs
  user=> (sort-by abs [-1 -4 3 2])
  (-1 2 3 -4)

We define a function called abs to compute an absolute value and then use that function in our sort. These are some of the most important sequence transformation functions in Clojure. Next, we’ll move on to functions that create sequences, but to do that, you’re going to have to get a little lazier.

Lazy Evaluation

In mathematics, infinite sequences of numbers are often easier to describe. In functional languages, you’d often like to have the same benefits, but you can’t actually compute an infinite sequence. The answer is lazy evaluation. Using this strategy, Clojure’s sequence library computes values only when they are actually consumed. In fact, most sequences are lazy. Let’s walk through creating some finite sequences and move into lazy sequences.

Finite Sequences with range

Unlike Ruby, Clojure supports ranges as functions. A range creates a sequence:

  user=> (range 1 10)
  (1 2 3 4 5 6 7 8 9)

Note that the upper bound is not inclusive. The sequence did not include 10. You can specify any increment:

  user=> (range 1 10 3)
  (1 4 7)

You don’t have to specify the lower bound if there is no increment:

  user=> (range 10)
  (0 1 2 3 4 5 6 7 8 9)

Zero is the default lower bound. The sequences created with range are finite. What if you wanted to create a sequence with no upper bound? That would be an infinite sequence. Let’s find out how.

Infinite Sequences and take

Let’s start with the most basic of infinite sequences, an infinite sequence of one repeated element. We can specify (repeat 1). If you try it in the repl, you’ll get 1s until you kill the process. Clearly, we need some way to grab only a finite subset. That function is take:

  user=> (take 3 (repeat "Use the Force, Luke"))
  ("Use the Force, Luke" "Use the Force, Luke" "Use the Force, Luke")

So, we created an infinite sequence of the repeated string "Use the Force, Luke", and then we took the first three. You can also repeat the elements in a list with cycle:

  user=> (take 5 (cycle [:lather :rinse :repeat]))
  (:lather :rinse :repeat :lather :rinse)

We’re taking the first five elements of the cycle from the vector [:lather :rinse :repeat]. Fair enough. We can drop the first few elements of a sequence as well:

  user=> (take 5 (drop 2 (cycle [:lather :rinse :repeat])))
  (:repeat :lather :rinse :repeat :lather)

Working from the inside out, we again build a cycle, drop the first two, and take the first five after that. But you don’t have to work inside out. You can use the new left-to-right operator (->>) to apply each function to a result:

  user=> (->> [:lather :rinse :repeat] (cycle) (drop 2) (take 5))
  (:repeat :lather :rinse :repeat :lather)

So, we take a vector, build a sequence with cycle, drop 2, and then take 5. Sometimes, left-to-right code is easier to read. What if you wanted to add some kind of separator between words? You’d use interpose:

  user=> (take 5 (interpose :and (cycle [:lather :rinse :repeat])))
  (:lather :and :rinse :and :repeat)

We’re taking the keyword :and and placing it between all the elements of an infinite sequence. Think of this function like a generalized version of Ruby’s join. What if you wanted an interpose that took interposing members from a sequence? That’s interleave:

  user=> (take 20 (interleave (cycle (range 2)) (cycle (range 3))))
  (0 0 1 1 0 2 1 0 0 1 1 2 0 0 1 1 0 2 1 0)

We’re interleaving two infinite sequences, (cycle (range 2)) and (cycle (range 3)). Then, we take the first 20. As you read the result, even numbers are (0 1 0 1 0 1 0 1 0 1), and odd numbers are (0 1 2 0 1 2 0 1 2 0). Beautiful.

The iterate function provides another way of creating sequences. Check out these examples:

  user=> (take 5 (iterate inc 1))
  (1 2 3 4 5)
  user=> (take 5 (iterate dec 0))
  (0 -1 -2 -3 -4)

iterate takes a function and a starting value. iterate then applies the function to the starting value, and then successive return values repeatedly. In these two examples, we called inc and dec.

Here’s an example that calculates consecutive pairs in the Fibonacci sequence. Remember, each number of a sequence is the sum of the last two. Given a pair, [a b], we can generate the next with [b, a + b]. We can generate an anonymous function to generate one pair, given the previous value, like this:

  user=> (defn fib-pair [[a b]] [b (+ a b)])
  #'user/fib-pair
  user=> (fib-pair [3 5])
  [5 8]

Next, we’ll use iterate to build an infinite sequence. Don’t execute this yet:

  (iterate fib-pair [1 1])

We’ll use map to grab the first element from all of those pairs:

  (map
  first
  (iterate fib-pair [1 1]))

That’s an infinite sequence. Now, we can take the first 5:

  user=> (take 5
  (map
  first
  (iterate fib-pair [1 1])))
  (1 1 2 3 5)

Or we can grab the number at index 500, like this:

  (nth (map first (iterate fib-pair [1 1])) 500)
  (225... more numbers ...626)

The performance is excellent. Using lazy sequences, you can often describe recursive problems like Fibonacci. Factorial is another example:

  user=> (defn factorial [n] (apply * (take n (iterate inc 1))))
  #'user/factorial
  user=> (factorial 5)
  120

We grab n elements from the infinite sequence (iterate inc 1). Then we take n of them and multiply them together with apply *. The solution is dead simple. Now that we’ve spent some time with lazy sequences, it’s time to explore new Clojure functions called defrecord and protocol.

defrecord and protocols

So far, we’ve talked about Java integration at a high level, but you haven’t seen much of the JVM bleed through to the Clojure language. When all is said and done, the JVM is about types and interfaces. (For you non-Java programmers, think of types as Java classes. Think of interfaces as Java classes without implementations.) To make Clojure integrate well with the JVM, the original implementation has a significant amount of Java in it.

As Clojure picked up more speed and began to prove itself as an effective JVM language, there was a strong thrust to implement more of Clojure in Clojure itself. To do so, Clojure developers needed a way to build platform-fast open extensions by programming to abstractions rather than to implementations. The results are defrecord for types and defprotocol, which groups functions together around a type. From a Clojure perspective, the best parts of OO are types and protocols (such as interfaces), and the worst parts are implementation inheritance. Clojure’s defrecord and defprotocol preserve the good parts of OO and leave the rest.

As this book is being written, these language features are important, but they are evolving. I’m going to lean hard on Stuart Halloway, cofounder of Relevance and author of Programming Clojure [Hal09], to help walk through a practical implementation. We’re going to go back to another functional language on the JVM, Scala. We’ll rewrite the Compass program in Clojure. Let’s get started.

First, we’ll define a protocol. A Clojure protocol is like a contract. Types of this protocol will support a specific set of functions, fields, and arguments. Here’s a protocol describing a Compass:

clojure/compass.clj
  (defprotocol Compass
  (direction [c])
  (left [c])
  (right [c]))

This protocol defines an abstraction called Compass and enumerates the functions that a Compass must support—direction, left, and right with the specified number of arguments. We are now free to implement the protocol with defrecord. Next, we’ll need the four directions:

  (def directions [:north :east :south :west])

We’ll need a function to handle a turn. Recall that our base direction is an integer, 0, 1, 2, and 3 represent :north, :east, :south, and :west, respectively. Every 1 you add to the base will move the compass ninety degrees to the right. We’ll take the remainder of the base/4 (more precisely, base/number-of-directions) so that we’ll wrap around correctly from :west to :north, like this:

  (defn turn
  [base amount]
  (rem (+ base amount) (count directions)))

The turn works, just as you’d expect. I’ll load the compass file and then use the turn functions:

  user=> (turn 1 1)
  2
  user=> (turn 3 1)
  0
  user=> (turn 2 3)
  1

That is, turning right once from :east gives :south, turning right once from :west gives :north, and turning right three times from :south gives :east.

It’s time to implement the protocol. We do that with defrecord. We’ll do that in pieces. First, we use defrecord to declare we’re implementing the protocol, like this:

  (defrecord SimpleCompass [bearing]
  Compass

We’re defining a new record called SimpleCompass. It has one field called bearing. Next, we will implement the Compass protocol, beginning with the direction function:

  (direction [_] (directions bearing))

The direction function looks up the element of directions at the bearing index. For example, (directions 3) returns :west. Each argument list has a reference to the instance (e.g., self in Ruby or this in Java), but we’re not using it, so we add _ to our argument list. Next, on to left and right:

  (left [_] (SimpleCompass. (turn bearing 3)))
  (right [_] (SimpleCompass. (turn bearing 1)))

Remember, in Clojure, we’re using immutable values. That means that turning will return a new, modified compass rather than changing the existing compass in place. Both left and right use syntax that you have not seen before. (SomeType. arg) means fire the constructor for SimpleCompass, binding arg to the first parameter. You can verify that entering (String. "new string") into the repl returns the new string "new string".

So, the left and right functions are easy. Each returns a new compass with the appropriate bearing, configured for the new bearing, using the turn function we defined earlier. right turns right ninety degrees once, and left turns right ninety degrees three times. So far, we have a type SimpleCompass that implements the Compass protocol. We just need a function that returns a string representation, but toString is a method on java.lang.Object. That’s easy enough to add to our type.

  Object
  (toString [this] (str "[" (direction this) "]")))

We then implement part of the Object protocol with the toString method, returning a string that looks like SimpleCompass [:north].

Now, the type is complete. Create a new compass:

  user=> (def c (SimpleCompass. 0))
  #'user/c

Turns return a new compass:

  user=> (left c) ; returns a new compass
  #:SimpleCompass{:bearing 3}
 
  user=> c ; original compass unchanged
  #:SimpleCompass{:bearing 0}

Notice that the old compass is unchanged. Since we’re defining a JVM type, you can access all fields as Java fields. But you can also access the fields in the type as Clojure map keywords:

  user=> (:bearing c)
  0

Because these types work like maps, you can easily prototype new types as maps and iteratively convert them to types as your design stabilizes. You can also replace types as maps in your tests as stubs or mocks. There are other benefits as well:

With defrecord and protocol, Clojure offers the ability to build native code for the JVM, without Java. This code can fully interact with other types on the JVM, including Java classes or interfaces. You can use them to subclass Java types or implement interfaces. Java classes can also build on your Clojure types. Of course, this is not the entire Java interop story, but it’s an important part. Now that you’ve learned to extend Java, let’s learn how to extend the Clojure language itself with macros.

Macros

For this section, we’re going to refer to the Io chapter. We implemented the Ruby unless in Messages. The form is (unless test form1). The function will execute form1 if the test is false. We can’t simply design a function, because each parameter will execute:

  user=> ; Broken unless
  user=> (defn unless [test body] (if (not test) body))
  #'user/unless
  user=> (unless true (println "Danger, danger Will Robinson"))
  Danger, danger Will Robinson
  nil

We discussed this problem in Io. Most languages execute parameters first and then put the results on the call stack. In this case, we don’t want to evaluate the block unless the condition is false. In Io, the language circumvented this problem by delaying the execution of the unless message. In Lisp, we can use macros. When we type (unless test body), we want Lisp to translate that to (if (not test) body). Macros to the rescue.

A Clojure program executes in two stages. Macro expansion translates all macros in the language to their expanded form. You can see what’s happening with a command called macroexpand. We’ve already used a couple of macros, called reader macros. A semicolon (;) is a comment, a single quote mark () is a quote, and a number sign (#) is an anonymous function. To prevent premature execution, we’ll put a quote in front of the expression we want to expand:

  user=> (macroexpand ''something-we-do-not-want-interpreted)
  (quote something-we-do-not-want-interpreted)
  user=> (macroexpand '#(count %))
  (fn* [p1__97] (count p1__97))

These are macros. In general, macro expansion will let you treat code like lists. If you don’t want a function to execute right away, quote it. Clojure will replace the arguments intact. Our unless will look like this:

  user=> (defmacro unless [test body]
  (list 'if (list 'not test) body))
  #'user/unless

Note that Clojure substitutes test and body without evaluating them, but we have to quote if and not. We also have to package them in lists. We’re building a list of code in the exact form that Clojure will execute. We can macroexpand it:

  user=> (macroexpand '(unless condition body))
  (if (not condition) body)

And we can execute it:

  user=> (unless true (println "No more danger, Will."))
  nil
  user=> (unless false (println "Now, THIS is The FORCE."))
  Now, THIS is The FORCE.
  nil

What we’ve done is change the base definition of the language. We are adding our own control structure, without requiring the language designers to add their own keywords. Macro expansion is perhaps the most powerful feature of Lisp, and few languages can do it. The secret sauce is the expression of data as code, not just a string. The code is already in a higher-order data structure.

Let’s wrap up day 2. There’s a lot of meat here. We should pause to use what we’ve learned.

What We Learned in Day 2

It’s been another packed day. You’ve added a tremendous set of abstractions to your expanding bag of tricks. Let’s review.

First, we learned to use recursion. Since the JVM doesn’t support tail-recursion optimization, we had to use loop and recur. That looping construct allowed us to implement many algorithms you would usually implement with recursive function calls, though the syntax was more invasive.

We also used sequences. With them, Clojure encapsulates access to all of its collections. With a common library, we could apply common strategies for dealing with collections. We used different functions to mutate, transform, and search sequences. Higher-order functions added power and simplicity to the sequence libraries.

With lazy sequences, we were able to add another powerful layer to sequences. Lazy sequences simplified algorithms. They also offered delayed execution, potentially significantly improving performance and loosening coupling.

Next, we spent some time implementing types. With defrecord and protocols, we were able to implement types that were full citizens on the JVM.

Finally, we used macros to add features to the language. We learned that there is a step, called macro expansion, that occurs before Clojure implements or interprets code. We implemented unless by using an if function within macro expansion.

There’s a lot to digest. Take some time to use what you’ve learned.

Day 2 Self-Study

This day was packed with some of the most sophisticated and powerful elements of the Clojure language. Take some time to explore and understand those features.

Find:

Do: