Seven Languages in Seven Weeks

7.4 Day 3: An Eye for Evil

In Star Wars, Yoda was the first to detect the evil in Darth Vader. With Clojure, Rich Hickey has identified the core problems that plague the development of concurrent object-oriented systems. We’ve said frequently that mutable state is the evil that lurks in the hearts of object-oriented programs. We’ve shown several different approaches to handling mutable state. Io and Scala used the actor-based model and provided immutable constructs that gave the programmer the power to solve those problems without mutable state. Erlang provided actors with lightweight processes, and a virtual machine that allowed effective monitoring and communication, allowing unprecedented reliability. The Clojure approach to concurrency is different. It uses software transactional memory (STM). In this section, we’ll look at STM and also several tools to share state across threaded applications.

References and Transactional Memory

Databases use transactions to ensure data integrity. Modern databases use at least two types of concurrency control. Locks prevent two competing transactions from accessing the same row at the same time. Versioning uses multiple versions to allow each transaction to have a private copy of its data. If any transaction interferes with another, the database engine simply reruns that transaction.

Languages like Java use locking to protect the resources of one thread from competing threads that might corrupt them. Locking basically puts the burden of concurrency control on the programmer. We are rapidly learning that this burden is too much to bear.

Languages like Clojure use software transactional memory (STM). This strategy uses multiple versions to maintain consistency and integrity. Unlike Scala, Ruby, or Io, when you want to change the state of a reference in Clojure, you must do so within the scope of a transaction. Let’s see how it works.

References

In Clojure, a ref (short for reference) is a wrapped piece of data. All access must conform to specified rules. In this case, the rules are to support STM. You cannot change a reference outside of a transaction. To see how it works, let’s create a reference:

	`user=> (ref "Attack of the Clones")`
	`#<Ref@ffdadcd: "Attack of the Clones">`

That’s not too interesting. We should assign the reference to a value, like this:

	`user=> (def movie (ref "Star Wars"))`
	`#'user/movie`

You can get the reference back, like this:

	`user=> movie`
	`#<Ref@579d75ee: "Star Wars">`

But we’re really worried about the value in the reference. Use deref:

	`user=> (deref movie)`
	`"Star Wars"`

Or, you can use the short form of deref:

	`user=> @movie`
	`"Star Wars"`

That’s better. Now, we can easily access the value within our reference. We haven’t tried to change the state of the reference yet. Let’s try. With Clojure, we’ll send a function that will mutate the value. The dereferenced ref will be passed as the first argument of the function:

	`user=> (alter movie str ": The Empire Strikes Back")`
	`java.lang.IllegalStateException: No transaction running (NO_SOURCE_FILE:0)`

As you can see, you can mutate state only within a transaction. Do so with the dosync function. The preferred way to modify a reference is to alter it with some transforming function, like this:

	`user=> (dosync (alter movie str ": The Empire Strikes Back"))`
	`"Star Wars: The Empire Strikes Back"`

We could also set some initial value with ref-set:

	`user=> (dosync (ref-set movie "Star Wars: The Revenge of the Sith"))`
	`"Star Wars: The Revenge of the Sith"`

You can see that the reference changed:

	`user=> @movie`
	`"Star Wars: The Revenge of the Sith"`

That’s what we expected. The reference is different. It may seem painful to modify mutable variables in this way, but Clojure is enforcing a little policy now to save a lot of pain later. We know that programs that behave in this manner will absolutely execute correctly, with respect to race conditions and deadlock. Most of our code will use functional paradigms, and we’ll save STM for the problems that could benefit the most from mutability.

Working with Atoms

If you want thread safety for a single reference, uncoordinated with any other activity, then you can use atoms. These data elements allow change outside the context of a transaction. Like a reference, a Clojure atom is an encapsulated bit of state. Let’s try it. Create an atom:

	`user=> (atom "Split at your own risk.")`
	`#<Atom@53f64158: "Split at your own risk.">`

Now, bind an atom:

	`user=> (def danger (atom "Split at your own risk."))`
	`#'user/danger`
	`user=> danger`
	`#<Atom@3a56860b: "Split at your own risk.">`
	`user=> @danger`
	`"Split at your own risk."`

You can bind danger to a new value with reset!:

	`user=> (reset! danger "Split with impunity")`
	`"Split with impunity"`
	`user=> danger`
	`#<Atom@455fc40c: "Split with impunity">`
	`user=> @danger`
	`"Split with impunity"`

reset! replaces the entire atom, but the preferred usage is to provide a function to transform the atom. If you’re changing a large vector, you can modify an atom in place with swap! like this:

	`user=> (def top-sellers (atom []))`
	`#'user/top-sellers`
	`user=> (swap! top-sellers conj {:title "Seven Languages", :author "Tate"})`
	`[{:title "Seven Languages in Seven Weeks", :author "Tate"}]`
	`user=> (swap! top-sellers conj {:title "Programming Clojure" :author "Halloway"})`
	`[{:title "Seven Languages in Seven Weeks", :author "Tate"}`
	`{:title "Programming Clojure", :author "Halloway"}]`

As with a reference, you’ll want to create a value once and then change that value with swap!. Let’s look at a practical example.

Building an Atom Cache

Now, you’ve seen both references and atoms. You’ll see the same general philosophy when we work with Haskell. You wrap a bit of state in a package that you can later mutate with functions. While references required transactions, atoms do not. Let’s build a simple atom cache. It’s a perfect problem for an atom. We’ll simply use hashes to associate names with values. This example is provided courtesy of Stuart Halloway of Relevance,[22] a consultancy that provides Clojure training and consulting.

We’ll need to create the cache, and then we’ll need functions to add elements to the cache and remove elements from the cache. First, we’ll create the cache:

clojure/atomcache.clj
	`(defn create`
	`[]`
	`(atom {}))`

We’re simply creating an atom. We’ll let the client of this class bind it. Next, we need to be able to get a cache key:

	`(defn get`
	`[cache key]`
	`(@cache key))`

We take the cache and a key as arguments. The cache is an atom, so we dereference it and return the item associated with the key. Finally, we need to put an item in the cache:

	`(defn put`
	`([cache value-map]`
	`(swap! cache merge value-map))`
	`([cache key value]`
	`(swap! cache assoc key value)))`

We defined two different functions called put. The first version uses merge to allow us to add all of the associations in a map to our cache. The second version uses assoc to add a key and value. Here’s the cache in use. We add an item to the cache and then return it:

	`(def ac (create))`
	`(put ac :quote "I'm your father, Luke.")`
	`(println (str "Cached item: " (get ac :quote)))`

And the output:

Cached item: I'm your father, Luke.

Atoms and refs are simple and safe ways to handle mutable state, synchronously. In the next few sections, we’ll look at a couple of asynchronous examples.

Working with Agents

Like an atom, an agent is a wrapped piece of data. Like an Io future, the state of a dereferenced agent will block until a value is available. Users can mutate the data asynchronously using functions, and the updates will occur in another thread. Only one function can mutate the state of an agent at a time.

Give it a try. Let’s define a function called twice that doubles the value of whatever you pass in:

	`user=> (defn twice [x] (* 2 x))`
	`#'user/twice`

Next, we’ll define an agent called tribbles that has an initial value of one:

	`user=> (def tribbles (agent 1))`
	`#'user/tribbles`

Now, we can mutate tribbles by sending the agent a value:

	`user=> (send tribbles twice)`
	`#<Agent@554d7745: 1>`

This function will run in another thread. Let’s get the value of the agent:

	`user=> @tribbles`
	`2`

Reading a value from a ref, agent, or atom will never lock and never block. Reads should be fast, and with the right abstractions around them, they can be. With this function, you can see the difference in the values that you read from each agent:

	`user=> (defn slow-twice [x]`
	`(do`
	`(Thread/sleep 5000)`
	`(* 2 x)))`
	`#'user/slow-twice`
	`user=> @tribbles`
	`2`
	`user=> (send tribbles slow-twice)`
	`#<Agent@554d7745: 16>`
	`user=> @tribbles`
	`2`
	`user=> ; do this five seconds later`
	`user=> @tribbles`
	`4`

Don’t get hung up in the syntax. (Thread/sleep 5000) simply invokes Java’s sleep method on Thread. For now, focus on the value of the agent. We defined a slower version of twice that took five seconds. That was enough time to see the differences in @tribbles over time in the repl.

So, you will get a value of tribbles. You might not get the latest changes from your own thread. If you want to be sure to get the latest value with respect to your own thread, you can call (await tribbles) or (await-for timeout tribbles), where timeout is a timeout in milliseconds. Keep in mind that await and await-for block only until actions from your thread are dispatched. This says nothing about what other threads may have asked the thread to do. If you think you want the latest value of something, you have already failed. Clojure’s tools involve working with a snapshot whose value is instantaneous and potentially out-of-date immediately. That’s exactly how versioning databases work for fast concurrency control.

Futures

In Java, you would start threads directly to solve a specific task. Certainly, you can use Java integration to start a thread in this way, but there’s often a better way. Say you wanted to create a thread to handle a complex computation around a bit of encapsulated state. You could use an agent. Or say you wanted to start the computation of a value, but you did not want to await the result. As with Io, you could use a future. Let’s take a look.

First, let’s create a future. The future returns a reference immediately:

	`user=> (def finer-things (future (Thread/sleep 5000) "take time"))`
	`#'user/finer-things`
	`user=> @finer-things`
	`"take time"`

Depending on how fast you type, you may have had to wait for the result. A future takes a body of one or more expressions, returning the value of the last expression. The future starts in another thread. If you dereference it, the future will block until the value becomes available.

So, a future is a concurrency construct that allows an asynchronous return before computation is complete. We can use futures to allow several long-running functions to run in parallel.

What We’ve Missed

Clojure is a Lisp, which is an extremely rich language in its own right. It’s based on the JVM, which has more than a decade of development. The language also mixes in some new and powerful concepts. It would be impossible to cover Clojure in one chapter of a book. There are some pieces that you should know about.

Metadata

Sometimes, it’s nice to associate metadata to a type. Clojure allows you to attach and access metadata on both symbols and collections. (with-meta value metadata) gives you a new value associated with the metadata, usually implemented as a map.

Java Integration

Clojure has excellent Java integration. We touched on Java integration very loosely, and we also built a type on the JVM. We did not use the existing Java libraries at all. We also did not extensively cover the Java compatibility forms. For example, (.toUpperCase "Fred") calls the .toUpperCase member function on the string "Fred".

Multimethods

Object-oriented languages allow one style of organization for behavior and data. Clojure allows you to build your own code organization with multimethods. You can associate a library of functions with a type. You can also implement polymorphism by using multimethods to do method dispatch based on type, metadata, arguments, and even attributes. The concept is powerful and extremely flexible. You could implement, for example, Java-style inheritance, prototype inheritance, or something entirely different.

Thread State

Clojure offers atoms, refs, and agents for various concurrency models. Sometimes, you need to store data per thread instance. Clojure allows you to do so quite simply with vars. For example, (binding [name "value"] ...) would bind name to "value" only for the current thread.

What We Learned in Day 3

Today, we walked through the concurrency structures. We encountered several interesting concurrency constructs along the way.

Refs allowed us to implement mutable state while maintaining consistency across threads. We used STM, or software transactional memory. For our part, we placed all mutations to refs within transactions, expressed using a dosync function.

Next, we used atoms, lightweight concurrency constructs with less protection but a simpler usage model. We modified an atom outside of a transaction.

Finally, we used agents to implement a pool that could be used to do long-running computations. Agents were different from Io actors, because we could mutate the value of the agent with an arbitrary function. Agents also returned a snapshot in time, a value that may be changed at any time.

Day 3 Self-Study

On day 2, your focus was on advanced programming abstractions. Day 3 brought the concurrency constructs of Clojure. In these exercises, you’ll put some of what you’ve learned to the test.

Find:

A queue implementation that blocks when the queue is empty and waits for a new item in the queue

Do:

Use refs to create a vector of accounts in memory. Create debit and credit functions to change the balance of an account.

In this section, I’m going to outline a single problem called sleeping barber. It was created by Edsger Dijkstra in 1965. It has these characteristics:

A barber shop takes customers.
Customers arrive at random intervals, from ten to thirty milliseconds.
The barber shop has three chairs in the waiting room.
The barber shop has one barber and one barber chair.
When the barber’s chair is empty, a customer sits in the chair, wakes up the barber, and gets a haircut.
If the chairs are occupied, all new customers will turn away.
Haircuts take twenty milliseconds.
After a customer receives a haircut, he gets up and leaves.

Write a multithreaded program to determine how many haircuts a barber can give in ten seconds.