Just before the climax of Edward Scissorhands, Edward learns to wield his scissors as an artist in everyday life. He molds shrubs into dinosaurs, crafts spectacular hair with the effortless skill of Vidal Sassoon, and even carves the family roast. With Scala, we’ve encountered some awkward moments, but when this language feels right, it’s borderline spectacular. Hard things, like XML and concurrency, become almost routine. Let’s take a look.
Modern programming problems meet Extensible Markup Language (XML) with increasing regularity. Scala takes the dramatic step of elevating XML to a first-class programming construct of the language. You can express XML just as easily as you do any string:
scala> val movies =
     | <movies>
     |   <movie genre="action">Pirates of the Caribbean</movie>
     |   <movie genre="fairytale">Edward Scissorhands</movie>
     | </movies>
movies: scala.xml.Elem =
<movies>
  <movie genre="action">Pirates of the Caribbean</movie>
  <movie genre="fairytale">Edward Scissorhands</movie>
</movies>
After you’ve defined the movies variable with XML, you can access different elements directly.
For example, to see all the inner text, you would simply type this:
scala> movies.text
res1: String =

  Pirates of the Caribbean
  Edward Scissorhands
You see all the inner text from the previous example. But we’re not limited to working with the whole block at once. We can be more selective. Scala builds in a query language that’s much like XPath, an XML search language. But since // begins a comment in Scala, Scala uses \ and \\ instead of XPath’s / and //. To search the top-level nodes, you’d use one backslash, like this:
scala> val movieNodes = movies \ "movie"
movieNodes: scala.xml.NodeSeq =
<movie genre="action">Pirates of the Caribbean</movie>
<movie genre="fairytale">Edward Scissorhands</movie>
In that search, we looked for XML movie elements. You can find individual nodes by index:
scala> movieNodes(0)
res3: scala.xml.Node = <movie genre="action">Pirates of the Caribbean</movie>
We just found element number zero, or Pirates of the Caribbean. You can also look for attributes of individual XML nodes by using the @ symbol. For example, to find the genre attribute of the first element in the document, we’d do this search:
scala> movieNodes(0) \ "@genre"
res4: scala.xml.NodeSeq = action
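As a minimal sketch of how these queries combine, the snippet below walks every movie node and pairs its genre attribute with its inner text. It assumes the scala.xml library is on the classpath (it ships with older Scala releases and is a separate scala-xml module in newer ones). The names catalog and listing are ours, chosen for illustration.

```scala
// Assumes scala.xml is available (a separate scala-xml module on newer Scala versions).
val catalog =
  <movies>
    <movie genre="action">Pirates of the Caribbean</movie>
    <movie genre="fairytale">Edward Scissorhands</movie>
  </movies>

// \ "movie" selects the direct children named movie;
// \ "@genre" selects the genre attribute of each one.
val listing = (catalog \ "movie").map { movie =>
  (movie \ "@genre").text + ": " + movie.text
}

listing.foreach(println)
```

Each query returns a NodeSeq, so the usual collection methods such as map and foreach work on the results.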
This example just scratches the surface of what you can do, but you get the idea. If we mix in Prolog-style pattern matching, things get a little more exciting. Next, we’ll walk through an example of pattern matching with simple strings.
Pattern matching lets you conditionally execute code based on some piece of data. Scala uses pattern matching often, such as when you parse XML or pass messages between threads.
Here’s the simplest form of pattern matching:
scala/chores.scala
def doChore(chore: String): String = chore match {
  case "clean dishes" => "scrub, dry"
  case "cook dinner" => "chop, sizzle"
  case _ => "whine, complain"
}
println(doChore("clean dishes"))
println(doChore("mow lawn"))
We define two chores, clean dishes and cook dinner. Next to each chore, we have a code block. In this case, the code blocks simply return strings. The last chore we define is _, a wildcard. Scala executes the code block associated with the first matching chore, returning “whine, complain” if neither chore matches, like this:
>> scala chores.scala
scrub, dry
whine, complain
Pattern matching has some embellishments too. In Prolog, pattern matching often had associated conditions. To implement a factorial in Scala, we attach a condition, called a guard, to one of the cases in the match expression:
scala/factorial.scala
def factorial(n: Int): Int = n match {
  case 0 => 1
  case x if x > 0 => factorial(n - 1) * n
}

println(factorial(3))
println(factorial(0))
The first pattern matches a literal 0, but the second has the form case x if x > 0. It matches any x for which x > 0. You can specify a wide variety of conditions in this way. Pattern matching can also match regular expressions and types. Later, in the concurrency examples, you’ll see code that defines empty classes and uses them as messages.
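To make the point about matching on types a little more concrete, here is a small sketch that dispatches on the runtime type of a value and combines one of the cases with a guard. The describe function is a hypothetical helper of ours, not part of any library.

```scala
// Hypothetical helper: picks a case based on the runtime type of its argument.
// The first case also carries a guard, so it only fires for negative Ints.
def describe(value: Any): String = value match {
  case n: Int if n < 0 => "negative integer"
  case n: Int          => "integer " + n
  case s: String       => "string of length " + s.length
  case _               => "something else"
}

println(describe(-4))          // negative integer
println(describe(3))           // integer 3
println(describe("scissors"))  // string of length 8
println(describe(2.5))         // something else
```

The wildcard at the end matters. The factorial listing has no case for negative numbers, so calling it with one would throw a scala.MatchError at runtime.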
Scala has first-class regular expressions. The .r method on a string can translate any string to a regular expression. Here is an example of a regular expression that can match uppercase or lowercase F at the beginning of a string.
scala> val reg = """^(F|f)\w*""".r
reg: scala.util.matching.Regex = ^(F|f)\w*

scala> println(reg.findFirstIn("Fantastic"))
Some(Fantastic)

scala> println(reg.findFirstIn("not Fantastic"))
None
We start with a simple string. We use the """ delimited form of a string, which allows multiline strings and turns off escape processing, so the backslash in \w needs no doubling. The .r method converts the string to a regular expression. We then use the method findFirstIn to find the first occurrence.
scala> val reg = "the".r
reg: scala.util.matching.Regex = the

scala> reg.findAllIn("the way the scissors trim the hair and the shrubs")
res9: scala.util.matching.Regex.MatchIterator = non-empty iterator
In this example, we build a regular expression and use the findAllIn method to find all occurrences of the word the in the string "the way the scissors trim the hair and the shrubs". If we wanted, we could iterate through the entire list of matches with foreach. That’s really all there is to it. You can match with regular expressions just as you would use a string.
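Here is a short sketch of both ideas: walking the matches with foreach, and using a regular expression as a pattern in a match expression. The TitleYear pattern and the year function are hypothetical, and the "Title (year)" format is just an assumption made for the example.

```scala
val sentence = "the way the scissors trim the hair and the shrubs"
val article = "the".r

// findAllIn returns an iterator, so foreach visits every match in turn.
article.findAllIn(sentence).foreach(println)

// The iterator is used up after one pass, so we search again to count.
val count = article.findAllIn(sentence).toList.length

// A regex with groups can act as a pattern: each group binds to a variable.
// The whole string must match for the case to fire.
val TitleYear = """(.+) \((\d{4})\)""".r

def year(entry: String): Option[Int] = entry match {
  case TitleYear(_, digits) => Some(digits.toInt)
  case _                    => None
}

println(count)
println(year("Edward Scissorhands (1990)"))
println(year("no year here"))
```

Returning an Option keeps the failure case explicit, in the same spirit as findFirstIn.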
An interesting combination in Scala is the XML syntax in combination with pattern matching. You can go through an XML file and conditionally execute code based on the various XML elements that come back. For example, consider the following XML movies file:
scala/movies.scala
val movies = <movies>
  <movie>The Incredibles</movie>
  <movie>WALL E</movie>
  <short>Jack Jack Attack</short>
  <short>Geri's Game</short>
</movies>

(movies \ "_").foreach { movie =>
  movie match {
    case <movie>{movieName}</movie> => println(movieName)
    case <short>{shortName}</short> => println(shortName + " (short)")
  }
}
It queries for all of the child elements under movies. Then, it uses pattern matching to match shorts and movies. I like the way Scala makes the most common tasks trivial by working in XML syntax, pattern matching, and the XPath-like query language. The result is almost effortless.
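One thing the listing leaves out is a fallback case, so an unexpected element would fail the match at runtime. The sketch below adds one. It assumes scala.xml is on the classpath, and the shelf value, the trailer element, and the label function are all invented for illustration.

```scala
// Assumes scala.xml is available (a separate scala-xml module on newer Scala versions).
val shelf =
  <movies>
    <movie>The Incredibles</movie>
    <short>Jack Jack Attack</short>
    <trailer>Up</trailer>
  </movies>

// Hypothetical helper: returns a label instead of printing,
// and falls back to a catch-all case for unknown elements.
def label(node: scala.xml.Node): String = node match {
  case <movie>{title}</movie> => title.text
  case <short>{title}</short> => title.text + " (short)"
  case other                  => "unrecognized <" + other.label + ">"
}

// \ "_" selects the child elements and skips the whitespace between them.
val labels = (shelf \ "_").map(label)
labels.foreach(println)
```

Returning strings rather than printing inside the match makes the function easy to reuse with map.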
So, that’s a basic tour of pattern matching. You’ll see it in practice in the concurrency section next.
One of the most important aspects of Scala is the way it handles concurrency. The primary constructs are actors and message passing. Actors have pools of threads and queues. When you send a message to an actor (using the ! operator), you place an object on its queue. The actor reads the message and takes action. Often, the actor uses a pattern matcher to detect the message and perform the appropriate action. Consider the kids program:
scala/kids.scala
import scala.actors._
import scala.actors.Actor._

case object Poke
case object Feed

class Kid() extends Actor {
  def act() {
    loop {
      react {
        case Poke => {
          println("Ow...")
          println("Quit it...")
        }
        case Feed => {
          println("Gurgle...")
          println("Burp...")
        }
      }
    }
  }
}

val bart = new Kid().start
val lisa = new Kid().start
println("Ready to poke and feed...")
bart ! Poke
lisa ! Poke
bart ! Feed
lisa ! Feed
In this program, we create two empty, trivial singletons called Poke and Feed. They don’t do anything. They simply serve as messages. The meat of the program is the Kid class. Kid is an actor, meaning it will run from a pool of threads and get messages in a queue. It will process each message and move on to the next. We start a simple loop. Within that is a react construct. react receives an actor’s messages. The pattern match lets us match the appropriate message, which will always be Poke or Feed.
The rest of the script creates a couple of kids and manipulates them by sending them Poke or Feed messages. You can run it like this:
batate$ scala code/scala/kids.scala
Ready to poke and feed...
Ow...
Quit it...
Ow...
Quit it...
Gurgle...
Burp...
Gurgle...
Burp...

batate$ scala code/scala/kids.scala
Ready to poke and feed...
Ow...
Quit it...
Gurgle...
Burp...
Ow...
Quit it...
Gurgle...
Burp...
I run the application a couple of times to show that it is actually concurrent. Notice that the order is different. With actors, you can also react with a timeout (reactWithin), which will time out if you don’t receive the message within the specified time. Additionally, you can use receive (which blocks a thread) and receiveWithin (which blocks a thread with a timeout).
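To get a feel for what a queue with a timed receive looks like, here is a rough stand-in built on java.util.concurrent.LinkedBlockingQueue. It is only an analogy and not the scala.actors API: the Mailbox class, its receiveWithin method, the respond function, and maggie are all hypothetical names of ours, and there is no thread pool behind it.

```scala
import java.util.concurrent.{LinkedBlockingQueue, TimeUnit}

sealed trait Message
case object Poke extends Message
case object Feed extends Message

// Hypothetical stand-in for an actor's mailbox, not the scala.actors API.
class Mailbox {
  private val queue = new LinkedBlockingQueue[Message]()

  // Sending only enqueues the message; it never waits for the receiver.
  def !(message: Message): Unit = queue.put(message)

  // Blocks the calling thread for up to timeoutMillis.
  // None means nothing arrived in time.
  def receiveWithin(timeoutMillis: Long): Option[Message] =
    Option(queue.poll(timeoutMillis, TimeUnit.MILLISECONDS))
}

// The pattern match picks a response, including one for the timeout.
def respond(message: Option[Message]): String = message match {
  case Some(Poke) => "Ow..."
  case Some(Feed) => "Gurgle..."
  case None       => "Zzz..."
}

val maggie = new Mailbox
maggie ! Poke
maggie ! Feed
println(respond(maggie.receiveWithin(100)))  // Ow...
println(respond(maggie.receiveWithin(100)))  // Gurgle...
println(respond(maggie.receiveWithin(100)))  // Zzz... (after the timeout)
```

Messages come out in the order they went in, and the third receive gives up after the timeout because the queue is empty.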
Since there’s only a limited market for simulated Simpsons, let’s do something a little more robust. In this application called sizer, we’re computing the size of web pages. We hit a few pages and then compute the size. Since there’s a lot of waiting time, we would like to get all of the pages concurrently using actors. Take a look at the overall program, and then we’ll look at some individual sections:
scala/sizer.scala
import scala.io._
import scala.actors._
import Actor._

object PageLoader {
  def getPageSize(url : String) = Source.fromURL(url).mkString.length
}

val urls = List("http://www.amazon.com/",
                "http://www.twitter.com/",
                "http://www.google.com/",
                "http://www.cnn.com/" )

def timeMethod(method: () => Unit) = {
  val start = System.nanoTime
  method()
  val end = System.nanoTime
  println("Method took " + (end - start)/1000000000.0 + " seconds.")
}

def getPageSizeSequentially() = {
  for(url <- urls) {
    println("Size for " + url + ": " + PageLoader.getPageSize(url))
  }
}

def getPageSizeConcurrently() = {
  val caller = self

  for(url <- urls) {
    actor { caller ! (url, PageLoader.getPageSize(url)) }
  }

  for(i <- 1 to urls.size) {
    receive {
      case (url, size) =>
        println("Size for " + url + ": " + size)
    }
  }
}

println("Sequential run:")
timeMethod { getPageSizeSequentially }

println("Concurrent run")
timeMethod { getPageSizeConcurrently }
So, let’s start at the top. We do a few basic imports to load the libraries for actors and io so we can do concurrency and HTTP requests. Next, we will compute the size of a page, given a URL:
object PageLoader {
  def getPageSize(url : String) = Source.fromURL(url).mkString.length
}
Next, we create a val with a few URLs. After that, we build a method to time a whole run of web requests:
def timeMethod(method: () => Unit) = {
  val start = System.nanoTime
  method()
  val end = System.nanoTime
  println("Method took " + (end - start)/1000000000.0 + " seconds.")
}
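If you would rather get the elapsed time back as a value instead of printing it, one possible variation looks like the sketch below. The timed function is a hypothetical alternative of ours, not part of sizer. It takes a by-name block and returns the block's result together with the seconds it took.

```scala
// Hypothetical variant of timeMethod: the block is passed by name,
// so it runs only when we evaluate it between the two clock readings.
def timed[A](block: => A): (A, Double) = {
  val start = System.nanoTime
  val result = block
  val seconds = (System.nanoTime - start) / 1000000000.0
  (result, seconds)
}

val (total, seconds) = timed { (1 to 1000).sum }
println("Sum " + total + " took " + seconds + " seconds.")
```

Because the result comes back as a tuple, the caller decides whether to print it, log it, or compare two runs.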
Then, we do the web requests with two different methods. The first is sequential, where we iterate through each request in a for loop.
def getPageSizeSequentially() = {
  for(url <- urls) {
    println("Size for " + url + ": " + PageLoader.getPageSize(url))
  }
}
Here’s the method to do things asynchronously:
def getPageSizeConcurrently() = {
  val caller = self

  for(url <- urls) {
    actor { caller ! (url, PageLoader.getPageSize(url)) }
  }

  for(i <- 1 to urls.size) {
    receive {
      case (url, size) =>
        println("Size for " + url + ": " + size)
    }
  }
}
In this actor, we know we’ll be receiving a fixed set of messages. Within a for loop, we send four asynchronous requests. This happens more or less instantly. Next, we simply receive four messages with receive. This method is where the real work happens. Finally, we’re ready to run the script that invokes the test:
println("Sequential run:")
timeMethod { getPageSizeSequentially }

println("Concurrent run")
timeMethod { getPageSizeConcurrently }
And here’s the output:
>> scala sizer.scala
Sequential run:
Size for http://www.amazon.com/: 81002
Size for http://www.twitter.com/: 43640
Size for http://www.google.com/: 8076
Size for http://www.cnn.com/: 100739
Method took 6.707612 seconds.
Concurrent run
Size for http://www.google.com/: 8076
Size for http://www.cnn.com/: 100739
Size for http://www.amazon.com/: 84600
Size for http://www.twitter.com/: 44158
Method took 3.969936 seconds.
As expected, the concurrent loop is faster. That’s an overview of an interesting problem in Scala. Let’s review what we learned.
What day 3 lacked in size, it made up in intensity. We built a couple of different concurrent programs and worked in direct XML processing, distributed message passing with actors, pattern matching, and regular expressions.
Over the course of the chapter, we learned four fundamental constructs that built on one another. First, we learned to use XML directly in Scala. We could query for individual elements or attributes using an XPath-like syntax.
We then introduced Scala’s version of pattern matching. At first, it looked like a simple case statement, but as we introduced guards, types, and regular expressions, their power became readily apparent.
Next, we shifted to concurrency. We used the actor concept. Actors are objects built for concurrency. They usually have a loop statement wrapped around a react or receive method, which does the dirty work of receiving queued messages to the object. Finally, we had an inner pattern match. We used raw classes as messages. They are small, light, robust, and easy to manipulate. If we needed parameters within the message, we could just add naked attributes to our class definitions, as we did with the URL within the sizer application.
Like all of the languages in this book, Scala is far more robust than you’ve seen here. The interaction with Java classes is far deeper than I’ve shown you here, and I’ve merely scratched the surface on complex concepts such as currying. But you have a good foundation should you choose to explore further.
So, now you’ve seen some of the advanced features Scala has to offer. Now, you can try to put Scala through its paces yourself. As always, these exercises are more demanding.
Find:
For the sizer program, what would happen if you did not create a new actor for each link you wanted to follow? What would happen to the performance of the application?
Do:
Take the sizer application and add a message to count the number of links on the page.
Bonus problem: Make the sizer follow the links on a given page, and load them as well. For example, a sizer for “google.com” would compute the size for Google and all of the pages it links to.