Scala offers a a few powerful abstractions for working with sequences of values. Functional programming style encourages moving from mutable variables and state changes to declarative lists of all known states. Higher order functions like map, flatMap, and filter can very succinctly represent complex requirements.
For example, suppose you were writing a search engine and a user asked for the 10th document about Scala in a database of 10,0000 HTML documents. In a functional style, you might declare the list of all documents, compute the score for each one, and then pick the 10th item. This might look like something like this:
documents.filter(isAboutScala)(9)
The problem is that if you've got 10,000 big HTML documents, you'll need to first ask each one of 10,000 if they're about Scala to build the filtered list -- only to take the 10th document. Clearly this overhead won't scale well!
One solution is to use Scala's Stream construct; which is like a List which lazily evaluates the next item only when needed. Using a Stream would only evaluate enough documents to find the 10th match - instead of first evaluating the 10,000 documents.
For a full introduction to Streams, see the chapter "Computing with Streams" in "Scala By Example".
So how do Streams work? How is the lazy evaluation defined? Looking in to the implementation of Stream reveals many nice things about the Scala language and some of the philosophies adopted for building libraries.
The syntax for constructing a Stream requires using Stream.cons and Stream.empty:
Stream.cons(3, Stream.cons(4, Stream.empty))
Upon initial inspection, it looks like cons is a method call on the Stream object. It turns out cons is actually a nested object in the Stream object.
Stream.cons is an object, not a method on the Stream object
Looking at Stream.scala in the Scala libraries source shows this structure:
object Stream { object cons { // ... definition of Stream.cons object } // ... rest of Stream }
This nesting of objects is also possible in Java. However, Scala's use of the apply() function (which we'll look at below) means that a method invocation on the nested cons object looks like a method call on the Stream object.
Syntactic sugar and the apply() method
Recall that a Scala object may define a method apply() like so:
object List { def apply[A](xs: A*): List[A] = xs.toList // ... }
This means you can call the List object to create the list of items 1, 2, and 3 with this syntax:
val usingApply = List.apply(1, 2, 3)
or using Scala syntactic sugar:
val usingSugar = List(1, 2, 3)
For me, the combination of apply()'s syntactic sugar and nested singleton objects begin to describe the 'scalable language' idea that the Scala literature talks about so frequently. What looks like a method call is really a call to a nested object's apply method, but the syntax reads identically. The end user doesn't need to know if you're talking to a nested singleton object in a library class; it is no different than a standard function call.
But how does Stream's lazy evaluation work?
Call by name evaluation
Let's look again at Stream.cons' apply method signature:
def apply[A](head : A, tail : => Stream[A]) : Stream[A]
Note the => operator ; this indicates call-by-name evaluation. This is a form of lazy argument evaluation; the language will only evaluate tail if it needs to.
Thus, Scala allows you to construct an expensive list of items worry-free; the runtime will evaluate the Stream's items one-by-one until the specified conditions are met -- and will go no further.
To see more examples of Stream in action, check out this article about solving Project Euler problems in Scala.
The simpler way to write
Stream.cons(3, Stream.cons(4, Stream.empty))
is
3 #:: 4 #:: Stream.empty
Which is a fact that's very handy to remember when you're trying to build infinite lists:
val fibs: Stream[Int] = 1 #:: 1 #:: (fibs zip fibs.tail).map{ case (a,b) => a + b }
Lists can be built in the same way by dropping the #:
1 :: 2 :: 3 :: Nil
Posted by: Benabik | November 15, 2010 at 12:51 AM