A small benchmark util for Scala continued

Extending the benchmark

Note: This is a follow-up to a previous post. You might want to read that one first for a simpler view of the code extended here.

One improvement I wanted to add to the previous benchmark util was monitors that generate statistics for the task being benchmarked. I chose traits for implementing them, so that the monitor can be adapted through inheritance. A Monitor is simply defined as having a begin() and an end() method, where end() returns a sequence of statistical data. The type of data returned is given by the type parameter of Monitor.

In addition to the Monitor trait, the previous code gets extended with a newMonitor() argument for instantiating monitors:

case class TaskDone[Id,Stats](id: Id, run: Int, stats: Seq[Stats])
case class TaskFail[Id,Stats](id: Id, run: Int, stats: Seq[Stats], error: Throwable)
case class BenchDone[Id,Stats](warmups: Seq[Task[Id,Stats]], runs: Seq[Task[Id,Stats]])
 
trait Monitor[Stats] {
  private[this] var step = 0
  def begin(): Unit = {
    assert(step == 0, "Improperly handled monitor.")
    step = 1
  }
  def end(): Seq[Stats] = {
    assert(step == 1, "Improperly handled monitor.")
    step = 2
    Nil
  }
}
 
type Report[Result]  = Result => Unit
type Make[Value]     = () => Value
type Task[Id,Stats]  = Either[TaskFail[Id,Stats],TaskDone[Id,Stats]]
type Bench[Id,Stats] = BenchDone[Id,Stats]
type Batch[Id,Stats] = Seq[Bench[Id,Stats]]
 
def task[Id,Stats]
    (id: Id, fn: () => Unit)
    (run: Int)
    (newMonitor: Make[Monitor[Stats]])
    (report: Report[Task[Id,Stats]]): Task[Id,Stats] = {
  val monitor = newMonitor()
  monitor.begin()
  val result = try {
    fn()
    Right(TaskDone(id=id,run=run,stats=monitor.end()))
  } catch {
    case exp: Throwable =>
      Left(TaskFail(id=id,run=run,stats=monitor.end(),error=exp))
  }
  report(result)
  result
}
 
def benchmark[Id,Stats]
    (task: Int => Make[Monitor[Stats]] => Report[Task[Id,Stats]] => Task[Id,Stats])
    (runs: Int, warmups: Int)
    (newMonitor: Make[Monitor[Stats]])
    (warmupReport: Report[Task[Id,Stats]], runReport: Report[Task[Id,Stats]])
    (benchReport: Report[Bench[Id,Stats]]): Bench[Id,Stats] = {
  assert(runs > 0, "Number of runs must be greater than zero.")
  assert(warmups >= 0, "Number of warmups must be zero or greater.")
 
  val (warmupsResults,runsResults) = (
    (1 to warmups) map (run => task(run)(newMonitor)(warmupReport)),
    (1 to runs) map (run => task(run)(newMonitor)(runReport))
  )
  val result = BenchDone(warmups=warmupsResults,runs=runsResults)
  benchReport(result)
  result
}
 
def batch[Id,Stats]
    (benchs: Seq[Report[Bench[Id,Stats]] => Bench[Id,Stats]])
    (benchReport: Report[Bench[Id,Stats]])
    (batchReport: Report[Batch[Id,Stats]]) {
  val result = benchs map (bench => bench(benchReport))
  batchReport(result)
}
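
To see the call shape in isolation, here is a minimal sketch, assuming a hypothetical no-op task, a monitor that collects nothing, and a reporter that discards its input:

val noopMonitor: Make[Monitor[String]] = () => new Monitor[String] {}
val noopTask = task[String,String](id = "noop", fn = () => ()) _
noopTask(1)(noopMonitor)(_ => ()) // Right(TaskDone(noop,1,List()))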

The revised diagram sketch now visualizes how the statistics generated by the monitor are fed back to the reporters.

Example usage

In this revised example we need to define some statistical datatypes. These will be TaskTime and TaskMem, which collect time and memory usage. They are case classes extending the sealed trait TaskStats, which allows us to pattern match on them.

type Id = String
 
case class MemStats(free: Long, total: Long, max: Long) {
  def used(): Long = total - free
}
 
sealed trait TaskStats
 
case class TaskTime(duration: Long) extends TaskStats
 
case class TaskMem(samples: Seq[MemStats]) extends TaskStats {
  def avgUsed(): Long = {
    val totals = samples map (mem => mem.used)
    if (totals.nonEmpty) totals.sum / totals.length else 0
  }
  def minUsed(): Long = {
    samples.map(mem => mem.used).min
  }
  def maxUsed(): Long = {
    samples.map(mem => mem.used).max
  }
}
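
Because TaskStats is sealed, reporters can match exhaustively on whichever statistics a monitor produced. A tiny sketch with hypothetical values:

val someStats: Seq[TaskStats] = List(TaskTime(duration = 1000000L), TaskMem(samples = Nil))
someStats foreach {
  case TaskTime(duration) => println("duration: " + duration + "ns")
  case TaskMem(samples)   => println("memory samples: " + samples.length)
}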

We also add some convenience methods for text conversion and for calculating statistics of memory usage.

def timeToText(value: Long): String = {
  def pad(n: Int, value: Long): String =
    value.toString.reverse.padTo(n,'0').reverse.mkString
  // utility for turning nanoseconds into text
  val ms      = value / 1000000L
  val msPart  = ms % 1000L
  val sec     = (ms - msPart) / 1000L
  val secPart = sec % 60L
  val min     = (sec - secPart) / 60L
  if (min > 0) min + "m" + pad(2,secPart) + "s" + pad(3,msPart) + "ms"
  else if (secPart > 0) secPart + "s" + pad(3,msPart) + "ms"
  else msPart + "ms"
}
 
def memToText(value: Long): String = {
  def pad(n: Int, value: Long): String =
    value.toString.reverse.padTo(n,'0').reverse.mkString
  // utility for turning bytes into text
  val mb     = value / (1024L * 1024L)
  val mbPart = mb % 1024L
  val gb     = (mb - mbPart) / 1024L
  if (gb > 0) gb + "gb" + pad(3,mbPart) + "mb"
  else mbPart + "mb"
}
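 
// Quick sanity check with hypothetical values (not from the original post):
//   timeToText(83123000000L) produces "1m23s123ms"  (nanoseconds in)
//   memToText(3672113152L)   produces "3gb430mb"    (bytes in)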
 
def taskId[Id](bench: Bench[Id,TaskStats]): Id = {
  // utility for extracting task id
  val firstDone = bench.runs find (result => result.isRight)
  val firstFail = bench.runs find (result => result.isLeft)
  if (firstDone.nonEmpty) firstDone.get.right.get.id
  else firstFail.get.left.get.id
}
 
def isDone[Id](bench: Bench[Id,TaskStats]): Boolean = {
  // utility for detecting a bench without errors
  val warmupsFail = bench.warmups collect { case Left(task) => task }
  val runsFail    = bench.runs collect { case Left(task) => task }
  warmupsFail.isEmpty && runsFail.isEmpty
}
 
def avgTime(bench: Bench[Id,TaskStats]): Long = {
  // utility for calculating average time of runs in a benchmark
  val totals = bench.runs.collect{
    case Right(task) => task.stats.collect{ case TaskTime(value) => value }.sum
    case Left(task)  => task.stats.collect{ case TaskTime(value) => value }.sum
  }
  if (totals.nonEmpty) totals.sum / totals.length else 0
}
 
def minTime(bench: Bench[Id,TaskStats]): Long = {
  // utility for calculating minimum time of runs in a benchmark
  val totals = bench.runs.collect{
    case Right(task) => task.stats.collect{ case TaskTime(value) => value }.min
    case Left(task)  => task.stats.collect{ case TaskTime(value) => value }.min
  }
  totals.min
}
 
def noTime(bench: Bench[Id,TaskStats]): Long = {
  // utility for calculating number of runs in a benchmark
  val totals = bench.runs.collect{
    case Right(task) => task.stats.collect{ case TaskTime(value) => 1 }.sum
    case Left(task)  => task.stats.collect{ case TaskTime(value) => 1 }.sum
  }
  totals.sum
}
 
def avgMem(bench: Bench[Id,TaskStats]): Long = {
  // utility for calculating average memory usage in benchmark
  val totals = bench.runs.collect{
    case Right(task) => task.stats.collect{ case mem: TaskMem => mem.avgUsed }.sum
    case Left(task)  => task.stats.collect{ case mem: TaskMem => mem.avgUsed }.sum
  }
  if (totals.nonEmpty) totals.sum / totals.length else 0
}
 
def maxMem(bench: Bench[Id,TaskStats]): Long = {
  // utility for calculating max memory usage in benchmark
  val totals = bench.runs.collect{
    case Right(task) => task.stats.collect{ case mem: TaskMem => mem.maxUsed }.max
    case Left(task)  => task.stats.collect{ case mem: TaskMem => mem.maxUsed }.max
  }
  totals.max
}
 
def noMem(bench: Bench[Id,TaskStats]): Long = {
  // utility for calculating number of memory samples taken in benchmark
  val totals = bench.runs.collect{
    case Right(task) => task.stats.collect{ case TaskMem(samples) => samples.length }.sum
    case Left(task)  => task.stats.collect{ case TaskMem(samples) => samples.length }.sum
  }
  totals.sum
}
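
As a quick sanity check of these helpers, here is a hypothetical bench with two successful runs, built by hand, and what the helpers report for it:

val demoBench: Bench[Id,TaskStats] = BenchDone(
  warmups = Nil,
  runs = List(
    Right(TaskDone(id = "demo", run = 1, stats = List(TaskTime(100L)))),
    Right(TaskDone(id = "demo", run = 2, stats = List(TaskTime(300L))))
  )
)
avgTime(demoBench) // 200 nanoseconds
minTime(demoBench) // 100 nanoseconds
noTime(demoBench)  // 2 time samples, one per run
isDone(demoBench)  // true, since no run failed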

Next up, we need to refine the reporters to account for the new datatypes and output relevant statistics on memory usage.

def taskReport(title: String)(result: Task[Id,TaskStats]) {
  result match {
    case Right(task) =>
      val time = task.stats.collect{ case TaskTime(value) => value }.sum
      println(
        "%s %s %s completed in %s"
          .format(task.id.toString, title, task.run, timeToText(time))
      )
    case Left(task) =>
      val time = task.stats.collect{ case TaskTime(value) => value }.sum
      val error = task.error.getClass.getName
      println(
        "%s %s %s failed after %s from %s"
          .format(task.id.toString, title, task.run, timeToText(time), error)
      )
  }
}
 
val warmupReport = taskReport("warmup") _
 
val runReport = taskReport("run") _
 
def benchReport[Id](bench: Bench[Id,TaskStats]) {
  val id = taskId(bench)
  if (isDone(bench)) {
    val totalTime = bench.runs.collect{
      case Right(task) =>
        task.stats.collect{ case TaskTime(value) => value }.sum
    }.sum
    println(id + " finished in " + timeToText(totalTime) + "\n")
  }
  else
    println(id + " failed\n")
}
 
def batchReport(batch: Batch[Id,TaskStats]) {
  def propsToText(): String = {
    import scala.util.{Properties => Props}
    val os = java.lang.management.ManagementFactory.getOperatingSystemMXBean
    val props = scala.collection.immutable.SortedMap(
      "os" -> (os.getName + " (" + os.getVersion + ")"),
      "arch" -> os.getArch,
      "cpus" -> os.getAvailableProcessors.toString,
      "java" -> (Props.javaVmName + " " + Props.javaVersion),
      "scala" -> Props.versionString.replace("version","").trim
    )
    props.map{ case (k,v) => k + ":" + "\"" + v + "\"" }.mkString(" ")
  }
  def statsToText(status: String, benchs: Seq[Bench[Id,TaskStats]]): String = {
    benchs
      .map(bench => (
        avgTime(bench),noTime(bench),minTime(bench),
        avgMem(bench),noMem(bench),maxMem(bench),taskId(bench)
      ))
      .sortWith((a,b) => a._1 < b._1) // sort on durations
      .map(item =>
        "%2s %12s/%-6s %12s %10s/%-6s %10s %s"
         .format(
           status,timeToText(item._1),item._2,timeToText(item._3),
           memToText(item._4),item._5,memToText(item._6),item._7)
      )
      .mkString("\n")
  }
  val (doneBenchs,failedBenchs) = batch partition isDone
  println(
    "Batch of benchmarks finished with %s completed and %s failed.\n"
      .format(doneBenchs.length,failedBenchs.length)
  )
  println("Statistics for benchmarks:\n")
  println(
    "   %12s/%-6s %12s %10s/%-6s %10s %s"
      .format("avgtime","no","mintime","avgmem","no","maxmem","id")
  )
  println(statsToText("ok", doneBenchs))
  println(statsToText("na", failedBenchs))
  println("\nSystem properties:")
  println(propsToText)
}

For implementing the Monitor trait I use mixin inheritance in this example: one trait for TimeUsage and one for MemUsage. These traits can be stacked together and accumulate the statistics they generate. TimeUsage is just an abstraction of the previous post's task() code. MemUsage, on the other hand, is a little more involved: it stays active during benchmarking and samples memory usage at regular intervals.

For multithreaded sampling I chose actors, since they are fairly simple to use. A MemUsage object starts the sampler actor by sending it a NextSample message, which triggers the sampler to record memory usage at regular intervals. When the benchmark is done, the MemUsage object sends a LastSample message and waits for the reply. The sampler replies with the sequence of MemStats samples collected and terminates. Finally, the samples are stored in a TaskMem object and accumulated with the other TaskStats results.

trait TimeUsage extends Monitor[TaskStats] {
  var startTime: Long = 0L
 
  override def begin() {
    super.begin()
    startTime = System.nanoTime  
  }
 
  override def end(): Seq[TaskStats] = {
    List(TaskTime(duration=System.nanoTime - startTime)) ++ super.end()
  }
}
 
trait MemUsage extends Monitor[TaskStats] {
  import MemUsage.{NextSample,LastSample}
 
  val sampleRate: Long
  private val rt = java.lang.Runtime.getRuntime
  private val sampler = new actors.Actor {
    var samples = List.empty[MemStats]
    var running = true
    def act() {
      while(running) {
        try { receive {
          case NextSample =>
            samples ::=
              MemStats(free=rt.freeMemory, total=rt.totalMemory, max=rt.maxMemory)
            java.lang.Thread.sleep(sampleRate)
            this ! NextSample
          case LastSample =>
            running = false
            samples ::=
              MemStats(free=rt.freeMemory, total=rt.totalMemory, max=rt.maxMemory)
            reply(samples)
        } }
        catch {
          case error: java.lang.OutOfMemoryError =>
            // skip this sample then wait for LastSample
            java.lang.Thread.sleep(sampleRate)
        }
      }
    }
  }
 
  override def begin() {
    rt.gc()
    sampler.start()
    sampler ! NextSample
    super.begin()
  }
 
  override def end(): Seq[TaskStats] = {
    val stats = super.end()
    val samples = (sampler !? LastSample).asInstanceOf[List[MemStats]]
    List(TaskMem(samples)) ++ stats
  }
}
 
object MemUsage {
  private case object NextSample
  private case object LastSample
}

In the new benchmark code we need to make some adjustments to accommodate the extensions. First we add a function for generating new monitors; here we use inheritance to select which statistics we are interested in collecting and to set the relevant configuration. Finally, we also need to update the task() calls: since task() is partially applied and its Stats type parameter is only fixed by a later argument, we now have to specify the type parameters ahead of time.

val newMonitor = () => {
  new Monitor[TaskStats] with MemUsage with TimeUsage {
    val sampleRate = 50L
  }
}
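// Note on stacking order: begin() dispatches to TimeUsage first, and TimeUsage
// calls super.begin() before reading the clock, so MemUsage gets to run gc()
// and start its sampler before timing begins. Likewise TimeUsage.end() captures
// the duration before MemUsage.end() blocks on the sampler, keeping most of the
// monitor's own overhead out of the measured time.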
val (warmups,runs) = (1,2)
val (start,end) = (1,5000000)
val sum = (a: Int, b: Int) => a + b
 
 
@scala.annotation.tailrec
def loopWithTailrec(start: Int, end: Int, sum: Int): Int = {
  if (start <= end) loopWithTailrec(start + 1, end, sum + start)
  else sum
}
 
val rangeTasks = List(
  task[Id,TaskStats](id="range+sum",
    fn=() => (start to end).sum) _,
  task[Id,TaskStats](id="range+foldLeft",
    fn=() => (start to end).foldLeft(0)(sum)) _,
  task[Id,TaskStats](id="range+foldRight",
    fn=() => (start to end).foldRight(0)(sum)) _
)
 
val listTasks = List(
  task[Id,TaskStats](id="list+sum",
    fn=() => List.range(start,end).sum) _,
  task[Id,TaskStats](id="list+foldLeft",
    fn=() => List.range(start,end).foldLeft(0)(sum)) _,
  task[Id,TaskStats](id="list+foldRight",
    fn=() => List.range(start,end).foldRight(0)(sum)) _
)
 
val vectorTasks = List(
  task[Id,TaskStats](id="vector+sum",
    fn=() => Vector.range(start,end).sum) _,
  task[Id,TaskStats](id="vector+foldLeft",
    fn=() => Vector.range(start,end).foldLeft(0)(sum)) _,
  task[Id,TaskStats](id="vector+foldRight",
    fn=() => Vector.range(start,end).foldRight(0)(sum)) _
)
 
val arrayTasks = List(
  task[Id,TaskStats](id="array+sum",
    fn=() => Array.range(start,end).sum) _,
  task[Id,TaskStats](id="array+foldLeft",
    fn=() => Array.range(start,end).foldLeft(0)(sum)) _,
  task[Id,TaskStats](id="array+foldRight",
    fn=() => Array.range(start,end).foldRight(0)(sum)) _
)
 
val streamTasks = List(
  task[Id,TaskStats](id="stream+sum",
    fn=() => Stream.range(start,end).sum) _,
  task[Id,TaskStats](id="stream+foldLeft",
    fn=() => Stream.range(start,end).foldLeft(0)(sum)) _,
  task[Id,TaskStats](id="stream+foldRight",
    fn=() => Stream.range(start,end).foldRight(0)(sum)) _
)
 
val itrTasks = List(
  task[Id,TaskStats](id="list+itr+foldRight",
    fn=() => List.range(start,end).iterator.foldRight(0)(sum)) _,
  task[Id,TaskStats](id="stream+itr+sum",
    fn=() => Stream.range(start,end).iterator.sum) _,
  task[Id,TaskStats](id="stream+itr+foldRight",
    fn=() => Stream.range(start,end).iterator.foldRight(0)(sum)) _
)
 
val loopTasks = List(
  task[Id,TaskStats](id="loop+for",
    fn=() => {
      var sum = 0
      for (i <- start to end) {
        sum = sum + i
      }
      sum
    }) _,
  task[Id,TaskStats](id="loop+while",
    fn=() => {
      var sum = 0
      var i = start
      while (i <= end) {
        sum = sum + i
        i = i + 1
      }
      sum
    }) _,
  task[Id,TaskStats](id="loop+tailrec",
    fn=() => {
      loopWithTailrec(start, end, 0)
    }) _
)
 
val benchs = (
  rangeTasks ++ listTasks ++ vectorTasks ++ arrayTasks ++
    streamTasks ++ itrTasks ++ loopTasks
).map(
  task => benchmark(task)(runs=runs,warmups=warmups)(newMonitor)(warmupReport,runReport) _
)
batch[Id,TaskStats](benchs)(benchReport)(batchReport)

Let's run the updated benchmark and see how memory usage plays into the equation. Keep in mind that the memory-related statistics aren't necessarily accurate: gathering the statistics itself allocates memory, and the JVM reserves memory for all kinds of optimizations, so the numbers might not reflect the actual memory usage of the data structures. Runtime optimizations performed by the JVM might also behave differently when the code is used in a real implementation, and different JVMs and hardware setups will affect the results as well. Still, you can get some idea of the time and memory usage involved, and maybe some insight into the tradeoffs.

Batch of benchmarks finished with 16 completed and 5 failed.
 
Statistics for benchmarks:
 
        avgtime/no          mintime     avgmem/no         maxmem id
ok          5ms/2               5ms        3mb/4             3mb loop+while
ok          9ms/2               8ms        3mb/4             3mb loop+tailrec
ok         39ms/2              38ms        7mb/4            12mb loop+for
ok        152ms/2             150ms        3mb/8             7mb range+foldLeft
ok        181ms/2             178ms        4mb/9             8mb range+sum
ok        251ms/2             250ms       23mb/11           27mb array+foldRight
ok        254ms/2             253ms       21mb/10           25mb array+foldLeft
ok        294ms/2             291ms       34mb/12           58mb array+sum
ok      1s101ms/2           1s086ms       13mb/39           36mb stream+foldLeft
ok      1s366ms/2           1s361ms       78mb/45          139mb vector+foldLeft
ok      1s419ms/2           1s416ms       80mb/43          155mb vector+sum
ok      1s547ms/2           1s513ms      123mb/23          201mb range+foldRight
ok      2s002ms/2           1s989ms      112mb/36          199mb list+foldLeft
ok      2s062ms/2           2s061ms      116mb/38          212mb list+sum
ok      2s350ms/2           2s278ms      117mb/45          224mb vector+foldRight
ok      7s166ms/2           6s726ms      181mb/88          247mb list+itr+foldRight
na          2ms/2               2ms        1mb/4             1mb stream+foldRight
na      1s505ms/2           1s247ms       91mb/34          185mb list+foldRight
na     19s834ms/2          19s563ms      193mb/88          247mb stream+sum
na     26s811ms/2          26s255ms      193mb/101         247mb stream+itr+foldRight
na     26s869ms/2          26s692ms      186mb/114         247mb stream+itr+sum
 
System properties:
arch:"x86" cpus:"2" java:"Java HotSpot(TM) Client VM 1.6.0_27" os:"Windows 7 (6.1)" scala:"2.9.1.final"

A small benchmark util for Scala

Specifying the benchmark

Note: there’s now also a follow-up post which extends this benchmark.

This post was inspired by an answer to the Hidden features of Scala question over at StackOverflow.com (and also by another blog post I can no longer find). I expanded a little on the code that was posted, which resulted in the following neat lump of code.

case class TaskDone[Id](id: Id, run: Int, duration: Long)
case class TaskFail[Id](id: Id, run: Int, duration: Long, error: Throwable)
case class BenchDone[Id](warmups: Seq[Task[Id]], runs: Seq[Task[Id]])
 
type Report[Result] = Result => Unit
type Task[Id]       = Either[TaskFail[Id],TaskDone[Id]]
type Bench[Id]      = BenchDone[Id]
type Batch[Id]      = Seq[Bench[Id]]
 
def task[Id](id: Id, fn: () => Unit)(run: Int)(report: Report[Task[Id]]): Task[Id] = {
  val begin = System.nanoTime
  val result = try {
    fn()
    Right(TaskDone(id=id,run=run,duration=System.nanoTime - begin))
  } catch {
    case exp: Throwable =>
      Left(TaskFail(id=id,run=run,duration=System.nanoTime - begin,error=exp))
  }
  report(result)
  result
}
 
def benchmark[Id](task: Int => Report[Task[Id]] => Task[Id])(runs: Int, warmups: Int)
                 (warmupReport: Report[Task[Id]], runReport: Report[Task[Id]])
                 (benchReport: Report[Bench[Id]]): Bench[Id] = {
  assert(runs > 0, "Number of runs must be greater than zero.")
  assert(warmups >= 0, "Number of warmups must be zero or greater.")
 
  val (warmupsResults,runsResults) = (
    (1 to warmups) map (run => task(run)(warmupReport)),
    (1 to runs) map (run => task(run)(runReport))
  )
  val result = BenchDone(warmups=warmupsResults,runs=runsResults)
  benchReport(result)
  result
}
 
def batch[Id](benchs: Seq[Report[Bench[Id]] => Bench[Id]])
             (benchReport: Report[Bench[Id]])(batchReport: Report[Batch[Id]]) {
  val result = benchs map (bench => bench(benchReport))
  batchReport(result)
}

What this code does is create a task() with an id and a function of () => Unit to apply. This task is then added to a benchmark(), which is further specified with how many runs and warmups to perform, and a set of task reporters that handle individual results. Finally, a batch() is constructed and given a set of benchmarks and a benchmark reporter. When the batch is finally applied to a batch reporter, it triggers the benchmarks to run, and subsequently the individual tasks. The benchmark results are then accumulated and fed back into the batch reporter.
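
As a minimal sketch of that flow, with a hypothetical no-op task and reporters that simply discard their input, the wiring looks like this:

val t = task(id = "noop", fn = () => ()) _
val b = benchmark(t)(runs = 2, warmups = 1)(_ => (), _ => ()) _
batch(Seq(b))(_ => ())(_ => ())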

The following diagram gives a visual sketch of the code. Note that half-open endpoints signify arguments already specified from outside all the enclosing functions.

Example usage

Before specifying the tasks, let's first define the reporters that will handle the results. This is where the bulk of the code resides.

For tasks we can define two kinds of reporters: one for warmup runs and one for ordinary runs. Each reporter handles a result in the form of an Either[TaskFail[Id],TaskDone[Id]], which can be pattern matched against. A Right[TaskDone[Id]] result indicates a successful run, with data on id, run number and duration. Otherwise there is a Left[TaskFail[Id]] result, indicating a failed run with data on id, run number, duration, and an exception. A benchmark reporter is also defined, which handles the accumulated results of warmup runs and ordinary runs. The resulting data from all benchmarks is then finally fed into a batch reporter.

In the following code warmupReport() and runReport() specify the task reporters, benchReport() handles the accumulated runs, and batchReport() presents the final statistics.

type Id = String
 
def timeToText(value: Long): String = {
  // utility for turning nanoseconds into text
  val ms      = (value / 1000000L).toLong
  val msPart  = ms % 1000L
  val sec     = (ms - msPart) / 1000L
  val secPart = sec % 60L
  val min     = (sec - secPart) / 60L
  val minPart = min
  if (minPart > 0) minPart + "m" + secPart + "s" + msPart + "ms"
  else if (secPart > 0)  secPart + "s" + msPart + "ms"
  else msPart + "ms"
}
 
def taskId[Id](bench: Bench[Id]): Id = {
  // utility for extracting task id
  val firstDone = bench.runs find (result => result.isRight)
  val firstFail = bench.runs find (result => result.isLeft)
  if (firstDone.nonEmpty) firstDone.get.right.get.id
  else firstFail.get.left.get.id
}
 
def isDone[Id](bench: Bench[Id]): Boolean = {
  // utility for detecting a bench without errors
  val warmupsFail = bench.warmups collect { case Left(task) => task }
  val runsFail    = bench.runs collect { case Left(task) => task }
  warmupsFail.isEmpty && runsFail.isEmpty
}
 
def avgTime(bench: Bench[Id]): Long = {
  // utility for calculating average time of runs in a benchmark
  val totals = bench.runs.collect{
    case Right(task) => task.duration
    case Left(task)  => task.duration
  }
  if (totals.nonEmpty) totals.sum / totals.length else 0
}
 
def taskReport(title: String)(result: Task[Id]) {
  result match {
    case Right(task) =>
      println(
        task.id.toString + " " + title + " " + task.run + " completed in " +
          timeToText(task.duration)
      )
    case Left(task) =>
      println(
        task.id.toString + " " + title + "  " + task.run + " failed after " + 
          timeToText(task.duration) + " from " + task.error.getClass.getName
      )
  }
}
 
val warmupReport = taskReport("warmup") _
 
val runReport = taskReport("run") _
 
def benchReport[Id](bench: Bench[Id]) {
  val id = taskId(bench)
  if (isDone(bench)) {
    val totalTime = bench.runs.collect{ case Right(task) => task.duration }.sum
    println(id + " finished in " + timeToText(totalTime) + "\n")
  }
  else
    println(id + " failed\n")
}
 
def batchReport(batch: Batch[Id]) {
  val (doneBenchs,failedBenchs) = batch partition isDone
  println(
    "Batch of benchmarks finished with %s completed and %s failed.\n"
      .format(doneBenchs.length,failedBenchs.length)
  )
  println("Average times for benchmarks:")
  println(
    doneBenchs
      .map(bench => (taskId(bench),avgTime(bench)))
      .sortWith((a,b) => a._2 < b._2) // sort on durations
      .map(item => "%10s %s".format(timeToText(item._2),item._1))
      .mkString("\n")
  )
  println(
    failedBenchs
      .map(bench => "%10s %s".format("na",taskId(bench)))
      .mkString("\n")
  )
}

With the reporters in place we can begin to define some tasks to benchmark. We want to check how well the Scala collections library handles folding over ranges. A range in Scala is simply created by writing 1 to 100, which represents the numbers from 1 to 100. This doesn't actually create and allocate a whole list; the individual elements are instead lazily constructed. What we want to check is how this data structure behaves when we fold it to sum up its elements, both from left to right and from right to left. If there are any problems associated with ranges, they will most likely show up with large enough ranges. For comparison, we will also create ordinary lists, vectors, arrays, and streams.

val (start,end) = (1,5000000)
val sum = (a: Int, b: Int) => a + b
 
val rangeTasks = List(
  task(id="range+sum",       fn=() => (start to end).sum) _,
  task(id="range+foldLeft",  fn=() => (start to end).foldLeft(0)(sum)) _,
  task(id="range+foldRight", fn=() => (start to end).foldRight(0)(sum)) _
)
 
val listTasks = List(
  task(id="list+sum",       fn=() => List.range(start,end).sum) _,
  task(id="list+foldLeft",  fn=() => List.range(start,end).foldLeft(0)(sum)) _,
  task(id="list+foldRight", fn=() => List.range(start,end).foldRight(0)(sum)) _
)
 
val vectorTasks = List(
  task(id="vector+sum",       fn=() => Vector.range(start,end).sum) _,
  task(id="vector+foldLeft",  fn=() => Vector.range(start,end).foldLeft(0)(sum)) _,
  task(id="vector+foldRight", fn=() => Vector.range(start,end).foldRight(0)(sum)) _
)
 
val arrayTasks = List(
  task(id="array+sum",       fn=() => Array.range(start,end).sum) _,
  task(id="array+foldLeft",  fn=() => Array.range(start,end).foldLeft(0)(sum)) _,
  task(id="array+foldRight", fn=() => Array.range(start,end).foldRight(0)(sum)) _
)
 
val streamTasks = List(
  task(id="stream+sum",       fn=() => Stream.range(start,end).sum) _,
  task(id="stream+foldLeft",  fn=() => Stream.range(start,end).foldLeft(0)(sum)) _,
  task(id="stream+foldRight", fn=() => Stream.range(start,end).foldRight(0)(sum)) _
)
 
val itrTasks = List(
  task(id="list+itr+foldRight",   fn=() => List.range(start,end).iterator.foldRight(0)(sum)) _,
  task(id="stream+itr+sum",       fn=() => Stream.range(start,end).iterator.sum) _,
  task(id="stream+itr+foldRight", fn=() => Stream.range(start,end).iterator.foldRight(0)(sum)) _
)
 
val benchs = (
  rangeTasks ++ listTasks ++ vectorTasks ++ arrayTasks ++ streamTasks ++ itrTasks
).map(
  task => benchmark(task)(runs=2,warmups=1)(warmupReport,runReport) _
)
 
batch(benchs)(benchReport)(batchReport)

Running this code had the following results on my computer. It shows that foldRight() has problems in most collections. Lists and streams seem to have stumbled upon problems with the call stack, and if the range is extended slightly further, foldRight() on a range will also run out of memory. Don't get too worried though: if the number of elements is quite small, these constraints have little influence. Also, per the Scala mailing list, you can avoid this problem by turning your sequence into an iterator with seq.iterator.foldRight(), which adds the necessary workaround for the lack of tail-call optimization on the JVM.

range+sum warmup 1 completed in 185ms
range+sum run 1 completed in 139ms
range+sum run 2 completed in 140ms
range+sum finished in 279ms
 
range+foldLeft warmup 1 completed in 138ms
range+foldLeft run 1 completed in 140ms
range+foldLeft run 2 completed in 140ms
range+foldLeft finished in 280ms
 
range+foldRight warmup 1 completed in 2s774ms
range+foldRight run 1 completed in 1s280ms
range+foldRight run 2 completed in 1s249ms
range+foldRight finished in 2s529ms
 
list+sum warmup 1 completed in 1s682ms
list+sum run 1 completed in 1s179ms
list+sum run 2 completed in 1s581ms
list+sum finished in 2s761ms
 
list+foldLeft warmup 1 completed in 1s428ms
list+foldLeft run 1 completed in 1s615ms
list+foldLeft run 2 completed in 1s478ms
list+foldLeft finished in 3s94ms
 
list+foldRight warmup  1 failed after 682ms from java.lang.StackOverflowError
list+foldRight run  1 failed after 869ms from java.lang.StackOverflowError
list+foldRight run  2 failed after 741ms from java.lang.StackOverflowError
list+foldRight failed
 
vector+sum warmup 1 completed in 792ms
vector+sum run 1 completed in 945ms
vector+sum run 2 completed in 783ms
vector+sum finished in 1s728ms
 
vector+foldLeft warmup 1 completed in 831ms
vector+foldLeft run 1 completed in 908ms
vector+foldLeft run 2 completed in 745ms
vector+foldLeft finished in 1s654ms
 
vector+foldRight warmup 1 completed in 2s67ms
vector+foldRight run 1 completed in 1s888ms
vector+foldRight run 2 completed in 2s51ms
vector+foldRight finished in 3s939ms
 
array+sum warmup 1 completed in 587ms
array+sum run 1 completed in 303ms
array+sum run 2 completed in 284ms
array+sum finished in 588ms
 
array+foldLeft warmup 1 completed in 268ms
array+foldLeft run 1 completed in 269ms
array+foldLeft run 2 completed in 328ms
array+foldLeft finished in 597ms
 
array+foldRight warmup 1 completed in 254ms
array+foldRight run 1 completed in 267ms
array+foldRight run 2 completed in 245ms
array+foldRight finished in 512ms
 
stream+sum warmup  1 failed after 10s30ms from java.lang.OutOfMemoryError
stream+sum run  1 failed after 9s46ms from java.lang.OutOfMemoryError
stream+sum run  2 failed after 9s271ms from java.lang.OutOfMemoryError
stream+sum failed
 
stream+foldLeft warmup 1 completed in 575ms
stream+foldLeft run 1 completed in 606ms
stream+foldLeft run 2 completed in 574ms
stream+foldLeft finished in 1s181ms
 
stream+foldRight warmup  1 failed after 1ms from java.lang.StackOverflowError
stream+foldRight run  1 failed after 2ms from java.lang.StackOverflowError
stream+foldRight run  2 failed after 2ms from java.lang.StackOverflowError
stream+foldRight failed
 
list+itr+foldRight warmup 1 completed in 2s925ms
list+itr+foldRight run 1 completed in 2s812ms
list+itr+foldRight run 2 completed in 3s19ms
list+itr+foldRight finished in 5s831ms
 
stream+itr+sum warmup 1 completed in 1s547ms
stream+itr+sum run 1 completed in 1s41ms
stream+itr+sum run 2 completed in 871ms
stream+itr+sum finished in 1s912ms
 
stream+itr+foldRight warmup 1 completed in 3s715ms
stream+itr+foldRight run 1 completed in 3s438ms
stream+itr+foldRight run 2 completed in 3s623ms
stream+itr+foldRight finished in 7s62ms
 
Batch of benchmarks finished with 15 completed and 3 failed.
 
Average times for benchmarks:
     139ms range+sum
     140ms range+foldLeft
     256ms array+foldRight
     294ms array+sum
     298ms array+foldLeft
     590ms stream+foldLeft
     827ms vector+foldLeft
     864ms vector+sum
     956ms stream+itr+sum
   1s264ms range+foldRight
   1s380ms list+sum
   1s547ms list+foldLeft
   1s969ms vector+foldRight
   2s915ms list+itr+foldRight
   3s531ms stream+itr+foldRight
        na list+foldRight
        na stream+sum
        na stream+foldRight