The growing popularity of the Scala programming language, noticeable in the abundance of opinions and criticism on blogs and social networks (like this one by Nikita Ivanov from GridGain and the popular Yammer case), has greatly increased the amount of information about the language. However, the quality of that information often leaves much to be desired.
Whether those opinions are favorable or unfavorable to Scala, they often contain outdated, superficial or biased statements. The goal of this article is to help those learning or evaluating Scala to reach their own conclusions. It presents the most common questions about the language and its environment and, for each one, adds clarifications, examples and links that support a better-informed opinion or a more accurate assessment.
Scala is a compiled language designed to run on a managed environment, most commonly the JVM, and it unites the functional and object-oriented paradigms. It offers a modern compiler and a type system checked at compile time, as in Java, but with the expressive syntax of (usually) interpreted languages such as Groovy or Ruby. However, the same features that make Scala expressive can also lead to performance problems and complexity. This article details where this balance needs to be considered; it is not an introduction or a tutorial, but it does not presume prior knowledge of Scala.
Is Scala more productive than Java?
As development productivity is a very subjective matter, it is necessary to break the question down and evaluate the Scala features that usually support the claim.
Functional Programming
The functional paradigm expresses programs as functions in the mathematical sense, mapping one value to another (f(x) = y) without side effects, such as maintaining object state or performing input/output. This allows several compile-time optimizations and avoids concurrency problems, as will be presented throughout this article.
Functional programming is a software paradigm based on the Lambda Calculus and has long been part of the academic environment in languages like Lisp, Scheme and Haskell. There is a trend of commercial languages such as Java and C# incorporating features from these languages, especially closures, but this effort is limited by backward compatibility, specifications and conflicts of interest. These limitations open space for new multi-paradigm languages, such as Clojure and Scala, that seek to offer the best of both the object-oriented and functional worlds.
Features such as immutable values and collections, higher-order functions and pattern matching encourage Scala developers to use the functional style. This can make developers familiar with this paradigm more productive, but may also alienate those who do not know it. For example, a program that prints the thousandth element of the Fibonacci sequence can be written as follows in Java:
import java.math.BigInteger;

public class FiboJava {

    private static BigInteger fibo(int x) {
        BigInteger a = BigInteger.ZERO;
        BigInteger b = BigInteger.ONE;
        BigInteger c = BigInteger.ZERO;
        for (int i = 0; i < x; i++) {
            c = a.add(b);
            a = b;
            b = c;
        }
        return a;
    }

    public static void main(String args[]) {
        System.out.println(fibo(1000));
    }
}
Translated literally to Scala, this code would become:
object FiboScala extends App {

  def fibo(x: Int): BigInt = {
    var a: BigInt = 0
    var b: BigInt = 1
    var c: BigInt = 0
    for (_ <- 1 to x) {
      c = a + b
      a = b
      b = c
    }
    return a
  }

  println(fibo(1000))
}
The iterative implementations in Java and Scala are remarkably similar. However, a more compact, functional version, using infinite sequences and tuples, can look very different:
object FiboFunctional extends App {

  val fibs: Stream[BigInt] =
    0 #:: 1 #:: (fibs zip fibs.tail).map { case (a, b) => a + b }

  println(fibs(1000))
}
This version can be considered concise or complex, according to individual abilities and preferences. Functional programming is not the only source of complexity in Scala, as will be shown throughout this article, but is relevant to the difference in productivity.
To learn more about functional programming in Scala, this video by Nathan Hamblen introduces the paradigm in general, and this other one, by Daniel Spiewak, goes a step further, implementing some functional data structures. The tutorials in the official documentation and the ones published by the Twitter team are also often recommended.
Less boilerplate code
Some Scala features, such as type inference, unchecked exceptions, optional objects and implicit conversions, can greatly reduce the amount of statements and checks in a program, without changing its meaning. Furthermore, Scala tries to be leaner than Java, removing features that can be expressed in terms of others. For example, there are no static references or primitive types in Scala, because the same effect can be obtained using plain objects.
To understand the difference in boilerplate code, consider this program that prints the MAC addresses of all network interfaces in the system. There are several ways to write this code, but this is how it would probably be written in each language.
In Java:
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.Collections;
import java.util.Enumeration;

public class ListMACsJava {

    public static void main(String[] args) throws SocketException {
        Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
        for (NetworkInterface nic : Collections.list(nics)) {
            byte[] mac = nic.getHardwareAddress();
            for (int i = 0; mac != null && i < mac.length; i++) {
                System.out.format("%2x", mac[i]);
                System.out.print(i == mac.length - 1 ? '\n' : ':');
            }
        }
    }
}
In Scala:
import java.net.NetworkInterface
import scala.collection.JavaConversions._

object ListMACsScala {

  def main(args: Array[String]) {
    NetworkInterface
      .getNetworkInterfaces
      .flatMap(nic => Option(nic.getHardwareAddress))
      .map(_ map ("%02x" format _) mkString ":")
      .foreach(println(_))
  }
}
Or, another implementation in Scala, using a sequence comprehension and explicit conversions:
import collection.JavaConverters._
import java.net.NetworkInterface

object ListMACsForScala extends App {

  val nicaddresses = for {
    nic <- NetworkInterface.getNetworkInterfaces.asScala
    addrbytes <- Option(nic.getHardwareAddress)
  } yield {
    addrbytes map { "%02x" format _ } mkString ":"
  }

  nicaddresses foreach println
}
In this example, the following differences are relevant in the elimination of boilerplate code:
- Scala methods are public by default and the main method uses the procedure syntax, which returns Unit, so these declarations may be omitted. The main method can be omitted entirely by using the App trait.
- Scala does not have checked exceptions, so try-catch blocks and throws declarations can be omitted.
- List operations can be chained fluently, without needing to adapt the underlying API or use additional libraries.
- The Option class helps prevent NullPointerExceptions, eliminating explicit null checks.
Other features, like implicit conversions and closures, may also contribute to the reduction of code. Although these features do not exist in Java, some of them can be simulated or adapted, mostly by creating fluent APIs.
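To illustrate, here is a minimal sketch of an implicit conversion adding a method to an existing type (the names are invented for illustration and are not part of the article's repository):

object ImplicitExample extends App {

  // Hypothetical wrapper adding a "times" method to Int
  class RichRepeat(n: Int) {
    def times(action: => Unit) { (1 to n) foreach (_ => action) }
  }

  // The conversion is applied automatically wherever an Int lacks the requested method
  implicit def intToRichRepeat(n: Int): RichRepeat = new RichRepeat(n)

  3 times { println("hello") }
}

A fluent Java API would have to emulate this by wrapping values explicitly; in Scala the compiler inserts the conversion, at the cost of making it less obvious where the method comes from.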
Traits
Traits are type definitions, similar to Java interfaces, but they may contain implementation and can be stacked, offering a different approach to code reuse. A trait is similar to the combination of an abstract class and an interface in a single definition. The following example shows how to combine traits to double, increment and/or filter a queue of integers:
import scala.collection.mutable.ArrayBuffer

// Type definition
abstract class IntQueue {
  def get(): Int
  def put(x: Int)
  def size(): Int
}

// ArrayBuffer implementation
class IntQueueImpl extends IntQueue {
  private val buf = new ArrayBuffer[Int]
  def get = buf remove 0
  def put(x: Int) { buf += x }
  def size = buf length
}

trait Doubling extends IntQueue {
  abstract override def put(x: Int) { super.put(2 * x) }
}

trait Incrementing extends IntQueue {
  abstract override def put(x: Int) { super.put(x + 1) }
}

trait Filtering extends IntQueue {
  abstract override def put(x: Int) { if (x > 0) super.put(x) }
}
Once defined, traits can be "mixed in" during either type definition or object instantiation:
// Mixing traits in a type definition
class DoublePlusOneQueue extends IntQueueImpl with Incrementing with Doubling

object QueueWithTraits {

  def main(args: Array[String]) {
    val queue1 = new DoublePlusOneQueue
    queue1 put 1
    queue1 put 2
    println(queue1 get)
    println(queue1 get)

    // Mixing traits in an object instantiation
    val queue2 = new IntQueueImpl with Filtering
    queue2 put -1
    queue2 put 1
    println(queue2 size)
  }
}
Is Scala complex?
The same features that can make Scala more productive can also make it unreadable. A feature that is often questioned is the use of symbols in method names, like the ++ method of the List class. These methods, disguised as operators, may be useful to represent frequent operations, like list concatenation or the addition of complex numbers. However, this nomenclature is easily abused. When combined with type bounds, variance and partial functions, delicate topics in any language, signatures can become very difficult to read, like this one mentioned in the post "Opinion: Scala is the new EJB2?":
def ++[B >: A, That](that: TraversableOnce[B])(implicit bf: CanBuildFrom[List[A], B, That]): That
Another difficulty is that the ScalaDoc, the official documentation of the class library, is incomplete in many respects. However, there is an effort to improve this documentation and other learning resources, which are being grouped into a documentation site.
This does not mean that there is no inherent complexity in Scala. In the article "True Scala Complexity", Zhang Yang presents a detailed example of how the type system and conversions can get confusing, even for an experienced Scala developer. Some of these complexities can be prevented, alleviated or eliminated in the future, but others will remain, such as those resulting from the integration of an expressive type system with both the functional and object-oriented paradigms.
Does Scala offer better concurrency?
Writing correct and efficient concurrent programs is difficult. Debugging these programs can be even more challenging and unpredictable. Scala offers parallelism at a high level of abstraction, but so does Java, particularly with the concurrency features introduced by Java 7. The concurrency features of the two languages are similar in their purpose, but they are very different in their architecture and a full comparison is beyond the scope of this article. However, this issue becomes clearer when analysing the features of Scala that support it.
Unfortunately, programmers have found it very difficult to reliably build robust multi-threaded applications using the shared data and locks model, especially as applications grow in size and complexity. The problem is that at each point in the program, you must reason about what data you are modifying or accessing that might be modified or accessed by other threads, and what locks are being held. At each method call, you must reason about what locks it will try to hold, and convince yourself that it will not deadlock while trying to obtain them.
Scala attempts to reduce these difficulties using immutability and actors. Furthermore, the use of functional programming facilitates internal parallelism, automated by the compiler or by libraries. The closures proposal for Java 8 attempts to incorporate some of these features into the Java language and its libraries.
Immutability
Synchronizing access to shared mutable objects can result in much complexity in the use of concurrency primitives (locks, semaphores, etc.). Although this is not a common concern for application developers, it can be troublesome for developers of servers and frameworks. Scala tries to mitigate this problem by using immutable objects and pure functions. If an object is immutable, it can be shared or copied without worrying about who is using it, so it is naturally "thread-safe."
Unlike other functional languages, Scala does not force objects to be immutable. Mutable objects are important for implementing a number of requirements, functional and non-functional. What Scala does is encourage the distinction between mutable and immutable code, using different packages and declarations for each case.
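As a minimal illustration (not taken from the article's examples), the default List is immutable, while mutation has to be requested explicitly through var or the scala.collection.mutable package:

object ImmutabilityExample extends App {

  // val declares an immutable reference; scala.collection.immutable.List is an immutable structure
  val fixed = List(1, 2, 3)
  val extended = fixed :+ 4          // produces a new list, the original is untouched

  // var and the mutable package make mutation explicit and deliberate
  import scala.collection.mutable.ListBuffer
  var counter = 0
  val buffer = ListBuffer(1, 2, 3)
  buffer += 4                        // modifies the buffer in place
  counter += 1

  println(fixed)     // List(1, 2, 3)
  println(extended)  // List(1, 2, 3, 4)
  println(buffer)    // ListBuffer(1, 2, 3, 4)
}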
This talk by Rich Hickey presents the main ideas behind immutability and some considerations on how to code with it.
Actors
Low-level concurrency controls, such as locks and synchronized blocks, are sufficient to write concurrent programs correctly, but the task is not easy. To write this type of program more productively and prevent defects, a higher-level concurrency abstraction is very desirable. Such an abstraction can be Fork/Join, Software Transactional Memory or, as featured in Scala, the Actor Model. In this model, parallelism is expressed as actors reacting to messages, rather than threads acquiring and releasing locks.
The following example demonstrates actors estimating the value of Pi using the Monte Carlo method. This method generates random points in a square and calculates the proportion of them that fall within the inscribed circle, which approximates Pi/4. The implementation uses a recurring pattern in the actor model: one actor acts as the "coordinator", managing several "calculators" that, alone or cooperatively, progress towards the outcome.
The "calculator" in this example may receive two messages: "Calculate", which is replied with an estimate of Pi and "ShutDown", which shuts down the actor. As these messages are simple notifications, they can be represented as constant case objects. The calculating actor can be written as follows:
import scala.util.Random
import Math._
import System._
import scala.actors.Actor
import scala.actors.Actor._

case object Calculate
case object ShutDown

class Calculator extends Actor {

  val rand = new Random
  var pi, in, cnt = 1.0

  def act {
    while (true) {
      receive {
        case Calculate => sender ! estimativeOfPi
        case ShutDown  => exit
      }
    }
  }

  def estimativeOfPi: Double = {
    val x = rand.nextDouble - 0.5
    val y = rand.nextDouble - 0.5
    cnt += 1.0
    if (sqrt(x * x + y * y) < 0.5) in += 1
    in / cnt * 4
  }
}
The "coordinator" starts a list of calculators and tell them to calculate until any of them produces an accurate enough estimation and, in this case, terminates the calculation and prints the value found and the execution time:
class Coordinator(numOfCalculators: Int) extends Actor {

  def act {
    val startedAt = currentTimeMillis
    var calculators = List.fill(numOfCalculators)(new Calculator)
    calculators.foreach(c => {
      c.start
      c ! Calculate
    })
    while (true) {
      receive {
        case estimative: Double =>
          val error = abs(Pi - estimative)
          if (error > 0.0000001)
            sender ! Calculate
          else {
            val tempo = currentTimeMillis - startedAt
            calculators.foreach(_ ! ShutDown)
            println("Pi found by " + sender + " = " + estimative)
            println("Execution time: " + tempo)
            exit
          }
      }
    }
  }
}
Finally, an object with a main method initializes the coordinator with the number of calculators to be used. The higher the number of calculators, the more likely it is that one of them will find the desired value sooner, reducing the total execution time.
object PiActors extends App {
  new Coordinator(2) start
}
The actors model goes beyond the communication between threads. The Akka platform extends the model to support remote actors and adds several tools for developing distributed systems with high scalability and fault tolerance.
Although useful in many cases, the actor model and its implementation in Scala are not free of controversy. Many of its benefits come not from the actors themselves, but from the exchange of messages, which are usually immutable and favor the absence of shared mutable state. The default implementation in Scala, shown above, binds each actor to a native thread, which can be problematic, as argued by Tony Arcieri in the post "Why I Don't Like Scala". There are alternative implementations, using event-based actors for example, but they always come at some cost, at least in complexity.
Parallel Collections
Scala 2.9 introduced parallel collections, which make it easy to parallelize the execution of common collection operations, such as map(), filter() and foreach(). The par method can be used to obtain a parallel version of a collection, as shown in this example:
object ParCol extends App {
  (1 to 5) foreach println
  (1 to 5).par foreach println
}
Note that the order of the printed elements in the second statement is unpredictable, as it depends on the scheduling of threads by the operating system. Aleksandar Prokopec shows interesting details of how the parallel collections were implemented in his presentation at Scala Days 2010.
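As a rough sketch of a more computation-heavy use (this code is not from the article's repository), the Monte Carlo estimation of Pi shown earlier with actors could also be expressed with a parallel range:

import scala.util.Random

object ParPi extends App {

  val samples = 4000000

  // One sample: does a random point in the unit square fall inside the inscribed circle?
  def inCircle = {
    val (x, y) = (Random.nextDouble - 0.5, Random.nextDouble - 0.5)
    x * x + y * y < 0.25
  }

  // The count is distributed over the available cores by the parallel range
  val hits = (1 to samples).par.count(_ => inCircle)
  println(4.0 * hits / samples)
}

The sequential and parallel versions differ only by the call to par, which is the main appeal of this feature.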
Is Scala extensible?
Scala allows developers to customize the look and feel of the language, creating new languages and altering the compilation process. Such tasks can be challenging or even make the code unreadable, but they greatly extend the possibilities of the language.
Domain Specific Languages
Using generic classes, abstract types, functions as objects, methods named as operators and implicit conversions, Scala code can become a domain specific language (DSL). This is very useful when the language is exposed to domain experts, allowing them to customize the system at runtime, as exemplified by Debasish Ghosh in his financial DSL.
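As a minimal, hypothetical sketch of the technique (all names below are invented for illustration), implicit conversions and alphanumeric "operators" can make ordinary method calls read like domain statements:

object OrderDsl extends App {

  // Illustrative domain classes
  case class Money(amount: Double, currency: String)
  case class Order(qty: Int, symbol: String, price: Money)

  // Builders reached through implicit conversions give plain numbers the vocabulary of the domain
  class OrderBuilder(qty: Int) {
    def shares(symbol: String) = new PricedOrder(qty, symbol)
  }
  class PricedOrder(qty: Int, symbol: String) {
    def at(price: Money) = Order(qty, symbol, price)
  }
  class Currency(amount: Double) {
    def USD = Money(amount, "USD")
  }

  implicit def intToOrderBuilder(qty: Int): OrderBuilder = new OrderBuilder(qty)
  implicit def doubleToCurrency(amount: Double): Currency = new Currency(amount)

  // Reads close to the business statement it represents
  println(100 shares "ACME" at 25.5.USD)
}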
Domain-specific languages can also be created when a more abstract and declarative language is needed by developers. For example, Apache Camel offers a Scala DSL to make the configuration of service routes more concise and correct.
The development of domain-specific languages is a subject as deep as it is popular. For an introduction, see this presentation by Martin Fowler, who also wrote a book on DSLs.
Changing the compilation
Scala takes the development of languages a step further with parser combinators, allowing the creation of entirely new grammars, as shown by Daniel Spiewak in this article. When even that is not enough, one can still create compiler plugins to change the build process. Such plugins could be written, for example, to perform static code analysis and evaluate metrics, as PMD or FindBugs do for Java. Another possibility would be a plugin that changes or optimizes the behavior of a library at compile time.
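For a flavor of parser combinators, here is a minimal sketch of a grammar that evaluates arithmetic expressions, using the scala.util.parsing.combinator package shipped with the standard library (the grammar itself is illustrative, not from the article's repository):

import scala.util.parsing.combinator.JavaTokenParsers

// A tiny grammar for expressions with +, * and parentheses
object Arith extends JavaTokenParsers with App {

  def expr: Parser[Double]   = term ~ rep("+" ~> term) ^^ { case t ~ ts => t + ts.sum }
  def term: Parser[Double]   = factor ~ rep("*" ~> factor) ^^ { case f ~ fs => f * fs.product }
  def factor: Parser[Double] = floatingPointNumber ^^ (_.toDouble) | "(" ~> expr <~ ")"

  println(parseAll(expr, "2 * (3 + 4)"))  // prints a successful parse with result 14.0
}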
These features can make Scala code look very different from the original language, as shown by Michael Fogus in his implementation of BASIC in Scala. These customizations of the language can be used to solve complex problems elegantly, but can also be abused, alienating developers unfamiliar with the context of the changes.
Are Scala and Java interoperable?
Although it is possible to invoke Java methods in Scala code and vice versa, the interaction between languages is not without complications.
When calling Scala methods from Java, the developer needs to understand how the features that do not exist in Java are transformed into executable objects. For example, methods with non-alphanumeric names, or methods receiving functions and tuples as parameters, do work when used in Java code, but the calls need to be written properly, as shown in this article.
When invoking Java from Scala, the problem is the features of Java that were abandoned or implemented differently in Scala, such as interfaces, annotations and collections. This article explains more about these differences and the alternatives for various cases, but some of them may force the developer to write code in both languages. For example, annotations must be defined in Java, since Scala can use them but cannot declare new ones.
The scalac compiler does not compile Java code or invoke the javac compiler to do it, but it does analyse .java source files to resolve cross-references, so mixed Java and Scala projects can be built together; the Java sources still have to be compiled separately by javac.
Is Scala tooling bad?
The fifteen years of history and the strength of the Java community are certainly reflected in the abundance and maturity of its tools. The Scala tools are younger but evolving fast, especially with the efforts of Typesafe (the company behind the language and several related projects) and community contributions, such as those from Twitter.
Using Java Libraries and Frameworks in Scala
One of the benefits of using a JVM language is the abundance of libraries and frameworks available for reuse. Within the interoperability limitations presented above, virtually any library or framework available for Java can be used in Scala. This includes all of Java EE (EJB, JSF, JAX-RS, etc.) and popular libraries such as Hibernate and JUnit, because Java and Scala classes are virtually indistinguishable once compiled. For example, a servlet that prints the parameters of HTTP requests can be written in Scala as follows:
import javax.servlet.http._
import javax.servlet.annotation.WebServlet
import scala.collection.JavaConversions._

@WebServlet(Array("/printParams"))
class PrintParametersServlet extends HttpServlet {

  override def doGet(req: HttpServletRequest, resp: HttpServletResponse) {
    val out = resp.getWriter
    req.getParameterMap
      .map { case (key, value) => key + " = " + value.mkString(",") }
      .foreach(out println _)
  }
}
Scala Libraries and Frameworks
The problem with using Java libraries in Scala is that they were not designed to take advantage of Scala's syntax improvements. In some cases this is just inconvenient, but sometimes it brings back the boilerplate code that Scala tries to avoid.
For an extreme example, take the CollectionUtils class from the Apache Commons Collections library. It has methods to filter, transform and iterate over collections, as does the Scala standard library. A program that uses Commons Collections to print the double of each positive element from a list of integers can be written as follows in Scala:
import org.apache.commons.collections.CollectionUtils._
import org.apache.commons.collections._
import scala.collection.JavaConversions._

object Collections extends App {

  val myList = List.range(-5, 5)

  forAllDo(
    collect(
      select(myList, new Predicate {
        def evaluate(obj: Object): Boolean =
          return obj.asInstanceOf[Int] > 0
      }),
      new Transformer {
        def transform(obj: Object): Object = {
          return (2 * obj.asInstanceOf[Int]).asInstanceOf[Object]
        }
      }),
    new Closure {
      def execute(obj: Object) { println(obj) }
    })
}
While correct, that code would probably be considered very wordy or repetitive by a Scala developer. However, using native Scala collections, one can write the same program in a much more idiomatic syntax:
myList filter(_ > 0) map(_ * 2) foreach println
Obvious as it may be, this is the reason why several libraries end up being re-developed in Scala. This is the case, for example, with the Scalaz library, which features several data structures better suited to functional programming. The same goes for web frameworks (Lift, Play), libraries for TDD and BDD (ScalaTest, Specs) and many others. Some of these libraries are very similar to their Java counterparts, while others are radically different and inspired by libraries from other languages, such as Scalatra, based on Ruby's Sinatra, or ScalaCheck, based on Haskell's QuickCheck.
Development Tools
All major Java IDEs (Eclipse, NetBeans and IntelliJ IDEA) offer Scala support through plugins, but features that Java developers take for granted (refactoring, debugging, integration with application servers, etc.) may be lacking or defective.
The abundance of build automation tools can be both positive and confusing. The Simple Build Tool (SBT) is widely used, almost a community standard. However, to integrate with other tools (IDEs, servers, etc.) and support the complete development process (testing, packaging, deployment, etc.), tools such as Maven, Ant or Buildr may be more appropriate; they already support Scala and usually provide a wider variety of plugins and integrations.
Like the libraries, many Scala tools are inspired by languages other than Java. The most popular is the interactive console (REPL), a long-standing tradition in scripting languages and useful for testing and learning the language and its libraries. Such tools can be unfamiliar to developers used to the Java environment, so the migration of the development environment needs to be analyzed carefully. Not all tools will have an exact parallel, but the differences may be beneficial to the development cycle.
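For example, a short REPL session exploring the collections API might look like this (output abbreviated):

$ scala

scala> val xs = List(-2, -1, 0, 1, 2)
xs: List[Int] = List(-2, -1, 0, 1, 2)

scala> xs filter (_ > 0) map (_ * 2)
res0: List[Int] = List(2, 4)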
Does Scala run on both Java and .NET?
Scala was designed to be independent of the underlying virtual machine, but the target was clearly the JVM. At the beginning of the project, in 2004, a .NET version was developed, but it never gained much popularity and quickly became obsolete. However, in June 2011 a project conducted by the Ecole Polytechnique Fédérale de Lausanne achieved significant results in adapting Scala to the .NET platform, as explained in this article. The .NET version is still in early development and has several limitations, but it is already possible to run Scala programs on both platforms. For more information about the Scala port to .NET, see this interview with Martin Odersky and this page of the Scala website.
Portability to other platforms, even beyond Java and .NET, is an interesting feature of Scala. An innovative project in this direction is scala-llvm, an implementation of the Scala compiler targeting LLVM. However, such portability is still mostly a possibility; Java remains the only platform that supports all the features, tools and libraries usually expected by enterprise developers.
Is Scala slow?
Leaps in level of abstraction are usually followed by performance criticism and Scala is no exception. In general, Java and Scala systems have similar runtime performance, since both are subject to the costs and benefits of the JVM. At compile time, however, the Scala compiler has much more work to do and is quite slow compared to the Java compiler.
Runtime Performance
Performance differences usually arise from the features of Scala that are not natively supported by the JVM. Some of these features, such as closures, are likely to be supported soon, but many others never will be. As a result, lean, highly abstract Scala code can compile to a large amount of bytecode, degrading runtime performance, as shown in this presentation.
This can be a problem for performance-sensitive programs, but there are workarounds. The most common is to analyze the generated bytecode using javap to understand what is happening under the hood and become familiar with the performance of the features and libraries. Understanding bytecode and tuning low-level performance is not a simple task, but in implementations of traditional functional languages like Haskell and Scheme, such analysis is much more complex, if possible at all.
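As a small sketch of that workflow (the class names generated by the compiler may vary between versions), a single higher-order call produces an extra anonymous class that javap can reveal:

// Closures.scala
object Closures extends App {
  // The function literal below compiles to its own anonymous class
  List(1, 2, 3) foreach (x => println(x * 2))
}

// After compiling with scalac, the generated classes can be inspected, e.g.:
//   scalac Closures.scala
//   javap -c 'Closures$'              (bytecode of the singleton object)
//   javap -c 'Closures$$anonfun$1'    (bytecode of the compiled closure; the name may vary)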
However, it should be noted that the language is important for the performance of a system, but is not the only factor. The benefits obtained from concurrency utilities and other features may offset or even exceed the penalties from abstraction when measuring the final latency and throughput.
Compilation Performance
Some Scala features, such as the search for implicit conversions, can take a long time to compile, and the compiler still needs to check both the Scala type system and the type system of the underlying platform (JVM, CLR or other). All this makes compilation slower, but it does so for the benefit of flexibility and platform independence.
The IDE or build tool can use incremental compilation to alleviate these problems, but for complex compilations or continuous integration, compiler performance can be an issue.
Are Scala binaries backward compatible?
Maintaining binary compatibility is a difficult choice for the developers of a programming language. If there are clear rules for maintaining compatibility, as in Chapter 13 of the Java Language Specification, programmers can expect new code to be compatible with old binaries without recompiling everything.
However, over time, maintaining backward compatibility can make the language hard to evolve. The difficulty of improving the language while preserving compatibility was one of the reasons for the delays in the releases of Java 7 and 8 and for the troublesome evolution of JavaScript.
On the other hand, not committing to backward compatibility makes a language much easier to evolve. This can be very important in the early stages of a language, allowing design decisions to be revisited, bugs to be fixed and features to be tuned. The problem is that significant changes, as seen in the evolution of Scala, can cause incompatibilities and require libraries and applications to be compiled with the same version of the language. This problem can be mitigated using bridge methods, migration tools, good documentation and processes, but there is no definitive solution.
For this reason, popular Scala libraries have different distributions, compiled for different versions of the language. A single version of the BDD library specs, for example, can be found as specs_2.9.0 and specs_2.8.1, targeting those respective versions of Scala. Less popular libraries may need to be recompiled with the same version of the language used by the application. Build automation tools can help greatly with this task. SBT, for example, can easily cross-compile and generate binaries for several versions of Scala in a single run, as well as reference libraries correctly using both the version of the library and the version of the language.
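As a sketch, assuming a recent SBT release and illustrative version numbers, a build definition can declare the Scala versions to cross-build against and let the %% operator pick the matching artifact:

// build.sbt (versions are illustrative)
scalaVersion := "2.9.1"

crossScalaVersions := Seq("2.8.1", "2.9.0", "2.9.1")

// %% appends the Scala version to the artifact name, selecting e.g. specs_2.9.1
libraryDependencies += "org.scala-tools.testing" %% "specs" % "1.6.9" % "test"

Running a task prefixed with +, such as + publish, then executes it once for each listed Scala version.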
Conclusions
The advantages and disadvantages of Scala are often stated in opinions influenced by either enthusiasm or disappointment. To properly evaluate the technology, it is important to understand the context and the facts behind such opinions, as well as the relevant design choices.
All code examples presented in this article are available in this GitHub repository. Sincere thanks to the members of the Scala users list, who maintain a very active community and greatly helped with this writing. If you are or have been a Scala developer, contribute your opinion in the comments!
About the Author
Julio Faerman is a software engineer, developer and teacher, among other labels. He specializes in developing enterprise software with complex requirements and in software process improvement, mostly for government and telecom clients.
He currently works at Red Hat / JBoss, after stints at Borland, NEC and with independent clients of his own company. He is interested in a wide range of subjects, from algorithmic game theory to music and gardening, but always leaves time for community events and publications.