Scala combines the object-oriented and functional programming paradigms, using a concise syntax that is fully compatible with Java and runs on the JVM. Support for the functional programming style, and in particular for lambda expressions, which are now not expected to be added to Java until Java SE 8 in 2012, can help reduce the amount of boilerplate code that you are required to write, arguably making it easier to focus on the task in hand. This article provides an introduction to Scala.
To get started, install the latest Scala distribution of the Typesafe stack, open a command prompt and type ‘scala’: this will start the REPL (read-eval-print loop), Scala's interactive programming environment. You are now ready to enter your first Scala line:
scala> val columbus : Int = 1492
columbus: Int = 1492
We have just declared a variable of type Int with value 1492, just as we would do in Java with "int columbus = 1492;". Apart from the reversed syntax, which puts the type after the variable name, the difference in Scala is that "val" explicitly declares the variable as immutable. If we try to modify it:
scala> columbus = 1500
<console>:8: error: reassignment to val
       columbus = 1500
                ^
Notice how the message precisely shows where the error lies in the line. Try declaring the variable again, but this time as "var" to make it mutable. By the way, the compiler is smart enough to know that 1492 is an integer and therefore you do not need to specify the type at all:
scala> var columbus = 1492
columbus: Int = 1492

scala> columbus = 1500
columbus: Int = 1500
Moving on, let’s define a class:
scala> case class Employee( name:String="guest", age:Int=30, company:String = "DevCode" )
defined class Employee
We have defined a class Employee with 3 immutable fields called name, age and company, each with a default value. The "case" keyword makes Employee a case class, which means that the class has an extra mechanism for pattern matching (Scala's "match" expression plays a role similar to the switch statement in Java, though it is more flexible), as well as other things, amongst which is a factory method to create instances (no need to use the "new" keyword to create one). Similarly there is no need to write default getters. Unlike Java, where members are package-private unless declared otherwise, the fields are public by default, and Scala creates a getter for each public field, named after the field itself. You could make fields mutable and/or private if you wanted to by putting "var" and/or "private" in front of the parameters (e.g. case class Person(private var name:String)).
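To make the pattern matching mentioned above a little more concrete, here is a minimal sketch. The helper "describe" and the wrapper object "CaseClassSketch" are names invented for this illustration; only the Employee class comes from the article.

```scala
object CaseClassSketch {
  // Same shape as the Employee class defined in the article.
  case class Employee(name: String = "guest", age: Int = 30, company: String = "DevCode")

  // `describe` is a hypothetical helper: it takes an Employee apart with a
  // match expression, which case classes support out of the box.
  def describe(e: Employee): String = e match {
    case Employee("guest", _, _)             => "an anonymous guest"
    case Employee(name, age, _) if age > 100 => name + " is remarkably old"
    case Employee(name, _, company)          => name + " works at " + company
  }

  def main(args: Array[String]): Unit = {
    println(describe(Employee()))                        // an anonymous guest
    println(describe(Employee("Yoda", 800, "LucasArt"))) // Yoda is remarkably old
    println(describe(Employee("Anna")))                  // Anna works at DevCode
    // Case classes also compare by content rather than by reference.
    println(Employee("Anna") == Employee("Anna"))        // true
  }
}
```

The cases are tried from top to bottom, so the more specific patterns are listed first.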
Let’s create some instances in different ways to exhibit various features such as named and default arguments (available since Scala 2.8):
scala> val guest = Employee()
guest: Employee = Employee(guest,30,DevCode)

scala> val guestAge = guest.age // (the default getter for the age variable)
guestAge: Int = 30

scala> val anna = Employee("Anna")
anna: Employee = Employee(Anna,30,DevCode)

scala> val thomas = Employee("Thomas",41)
thomas: Employee = Employee(Thomas,41,DevCode)

scala> val luke = Employee("Luke", company="LucasArt")
luke: Employee = Employee(Luke,30,LucasArt)

scala> val yoda = luke.copy("Yoda", age=800)
yoda: Employee = Employee(Yoda,800,LucasArt)
However, the following
scala> val darth = Employee("Darth", "DevCode")
<console>:9: error: type mismatch;
 found   : java.lang.String("DevCode")
 required: Int
Error occurred in an application involving default arguments.
       val darth = Employee("Darth", "DevCode")
                                     ^
... does not work, not because Darth is not employed at DevCode, but because the constructor expects the age parameter in second position whenever the argument is not named explicitly.
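One way to get past this error, sketched below, is to name the argument so that age keeps its default value. The wrapper object "NamedArgsSketch" and the "Vader"/"Empire" values are made up for the illustration.

```scala
object NamedArgsSketch {
  case class Employee(name: String = "guest", age: Int = 30, company: String = "DevCode")

  // Naming the argument skips `age`, which falls back to its default of 30.
  val darth = Employee("Darth", company = "DevCode")

  // Once every argument is named, the order no longer matters.
  val vader = Employee(company = "Empire", name = "Vader", age = 45)

  def main(args: Array[String]): Unit = {
    println(darth) // Employee(Darth,30,DevCode)
    println(vader) // Employee(Vader,45,Empire)
  }
}
```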
Now we’ll move on to Collections, since that is where things are really becoming exciting.
With Generics (Java 5 onwards), Java can for example iterate over a List of Integers by writing the following:
List<Integer> numbers = new ArrayList<Integer>();
numbers.add(1);
numbers.add(2);
numbers.add(3);
for (Integer n : numbers) {
    System.out.println("Number " + n);
}
which produces
Number 1
Number 2
Number 3
Scala collections systematically distinguish between immutable and mutable collections, but encourage immutability by constructing immutable collections by default. They simulate additions, updates or removals by returning new collections from such operations instead of modifying them.
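As a rough illustration of that last point, the sketch below "adds" to and "updates" an immutable List and then does the same with a mutable ListBuffer. The object name "ImmutabilitySketch" and the value names are invented for the example.

```scala
import scala.collection.mutable.ListBuffer

object ImmutabilitySketch {
  val numbers = List(1, 2, 3)              // immutable by default

  // Each operation returns a new list; `numbers` itself never changes.
  val withFour  = numbers :+ 4             // append
  val withZero  = 0 :: numbers             // prepend
  val tenFirst  = numbers.updated(0, 10)   // replace the element at index 0

  // A mutable collection has to be asked for explicitly and is changed in place.
  val buffer = ListBuffer(1, 2, 3)
  buffer += 4

  def main(args: Array[String]): Unit = {
    println(numbers)  // List(1, 2, 3)
    println(withFour) // List(1, 2, 3, 4)
    println(withZero) // List(0, 1, 2, 3)
    println(tenFirst) // List(10, 2, 3)
    println(buffer)   // ListBuffer(1, 2, 3, 4)
  }
}
```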
The Scala code equivalent to the previous Java code can be written:
scala> val numbers = List(1,2,3)
numbers: List[Int] = List(1, 2, 3)

scala> for (n <- numbers) println("Number "+n)
Number 1
Number 2
Number 3
This "for" loop construct is very close to Java's imperative style of programming. Another way to write it in Scala (and many languages on the JVM such as Groovy, JRuby or Jython) involves a more functional style, using lambda expressions (sometimes referred to as closures). In brief Lambdas are just functions that you can pass around as parameters. These functions take parameters as input (in our case the "n" integer) and return as their result the last statement of their body. They are in the form
functionName { input => body }

scala> numbers.foreach { n:Int => // just press Enter to continue on the next line
     |   println("Number "+n)
     | }
Number 1
Number 2
Number 3
In this case the body has a single statement (println...), whose result is Unit, so the function returns Unit: an "empty result" roughly equivalent to void in Java, except that Unit is an actual value whereas a void method returns nothing at all.
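Because lambdas are ordinary values, they can also be stored in a val and handed to other methods. The following is only a small sketch of that idea; "addTen", "describe" and "applyTwice" are hypothetical names chosen for the example.

```scala
object LambdaSketch {
  // A lambda stored in a value; its type is "function from Int to Int".
  val addTen: Int => Int = { n => n + 10 }

  // With several statements in the body, the last expression is the result.
  val describe: Int => String = { n =>
    val parity = if (n % 2 == 0) "even" else "odd"
    "Number " + n + " is " + parity
  }

  // A method that receives a function as a parameter and applies it twice.
  def applyTwice(f: Int => Int, x: Int): Int = f(f(x))

  def main(args: Array[String]): Unit = {
    println(addTen(1))             // 11
    println(describe(3))           // Number 3 is odd
    println(applyTwice(addTen, 1)) // 21
  }
}
```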
Instead of just printing out our list of numbers, let’s say we want to manipulate and transform the elements; in that case we want to invoke methods that will produce a resulting list that we can reuse later on. Let’s try some examples:
scala> val reversedList = numbers.reverse
reversedList: List[Int] = List(3, 2, 1)

scala> val numbersLessThan3 = numbers.filter { n => n < 3 }
numbersLessThan3: List[Int] = List(1, 2)

scala> val oddNumbers = numbers.filterNot { n => n % 2 == 0 }
oddNumbers: List[Int] = List(1, 3)

scala> val higherNumbers = numbers.map { n => n + 10 }
higherNumbers: List[Int] = List(11, 12, 13)
This last transformation "map" is very useful; it applies the closure to each element of the list and its result is a list of the same size containing every transformed element.
One last method we would like to introduce here is the "foldLeft" method, that propagates state from one element to the next. For instance, to sum elements in a list you need to accumulate them and keep track of the intermediate counter from one element to the next:
scala> val sumOfNumbers = numbers.foldLeft(0) { (total,element) =>
     |   total + element
     | }
sumOfNumbers: Int = 6
The value 0 given as the first argument to foldLeft is the initial value (which means total=0 when the function is applied to the first list element). The notation (total,element) is the parameter list of a function taking two arguments: the running total and the current element. It looks like, but is not, a Tuple2, which in Scala is a tuple with 2 elements (similarly, a Tuple3 (x,y,z) is often useful to represent 3D space coordinates). Note that for summation the Scala API actually provides a "sum" method, so the last statement could have been written:
scala> val sumOfNumbers = numbers.sum
sumOfNumbers: Int = 6
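To see how foldLeft carries its state from one element to the next, it can help to look at the intermediate totals. The sketch below uses the standard "scanLeft" method, which keeps every intermediate result instead of only the last one; the object and value names are invented for the example.

```scala
object FoldSketch {
  val numbers = List(1, 2, 3)

  // foldLeft keeps only the final accumulated value ...
  val sum: Int = numbers.foldLeft(0) { (total, element) => total + element }

  // ... whereas scanLeft also keeps the initial value and each intermediate total.
  val runningTotals: List[Int] =
    numbers.scanLeft(0) { (total, element) => total + element }

  // The accumulator does not have to be a sum: here it tracks the largest element.
  val largest: Int =
    numbers.foldLeft(Int.MinValue) { (best, element) => if (element > best) element else best }

  def main(args: Array[String]): Unit = {
    println(sum)           // 6
    println(runningTotals) // List(0, 1, 3, 6)
    println(largest)       // 3
  }
}
```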
There are many more of these collection transformation methods, which you can look up in the scaladoc API. You can also chain these methods (e.g. numbers.reverse.filter...) to get more concise code, although it can affect readability.
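A short sketch of such a chain, next to the same computation written with named intermediate steps, might look as follows (the object and value names are illustrative only):

```scala
object ChainingSketch {
  val numbers = List(1, 2, 3)

  // Everything in one expression: reverse, keep values below 3, add 10.
  val chained: List[Int] = numbers.reverse.filter(n => n < 3).map(n => n + 10)

  // The same steps spelled out, which is longer but easier to debug.
  val reversed   = numbers.reverse                  // List(3, 2, 1)
  val small      = reversed.filter(n => n < 3)      // List(2, 1)
  val stepByStep = small.map(n => n + 10)           // List(12, 11)

  def main(args: Array[String]): Unit = {
    println(chained)               // List(12, 11)
    println(chained == stepByStep) // true
  }
}
```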
Finally, a shorter notation equivalent to { n => n + 10 } exists in the form of (_ + 10), which means you do not have to declare the input parameter if it is implied by the method you’re invoking. The name "n" carried no meaning of its own: you could have called it anything you like, such as "x" or "number", so the underscore simply marks a blank that is filled in with each element of your collection in turn. (Groovy reserves the word "it" for a similar purpose.)
scala> val higherNumbers = numbers.map(_+10)
higherNumbers: List[Int] = List(11, 12, 13)
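The underscore works with the other collection methods seen so far as well. In the sketch below (names invented for the example), note that when a function takes two parameters, each underscore stands for the next parameter in order.

```scala
object UnderscoreSketch {
  val numbers = List(1, 2, 3)

  val doubled = numbers.map(_ * 2)          // same as numbers.map { n => n * 2 }
  val odd     = numbers.filter(_ % 2 == 1)  // same as numbers.filter { n => n % 2 == 1 }

  // Two underscores, two parameters: same as { (a, b) => a + b }.
  val total   = numbers.reduceLeft(_ + _)

  def main(args: Array[String]): Unit = {
    println(doubled) // List(2, 4, 6)
    println(odd)     // List(1, 3)
    println(total)   // 6
  }
}
```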
After basic manipulations on integers we are ready to jump into collection transformations involving more complex objects, for instance using the Employee class we have defined above:
scala> val allEmployees = List(luke,anna,guest,yoda,thomas)
allEmployees: List[Employee] = List(Employee(Luke,30,LucasArt),
Employee(Anna,30,DevCode), Employee(guest,30,DevCode),
Employee(Yoda,800,LucasArt), Employee(Thomas,41,DevCode))
From this list of 5 elements we can keep, for instance, only the DevCode employees by applying a filter, which keeps the employees for which the anonymous function returns true:
scala> val devcodeEmployees = allEmployees.filter { _.company == "DevCode" }
devcodeEmployees: List[Employee] = List(Employee(Anna,30,DevCode),
Employee(guest,30,DevCode), Employee(Thomas,41,DevCode))

scala> val oldEmployees = allEmployees.filter(_.age > 100).map(_.name)
oldEmployees: List[String] = List(Yoda)
Imagine the collection allEmployees is the result set we got from an SQL query similar to "SELECT * FROM employees". Now we can group the employees by company by transforming our List[Employee] into a Map where the key is the company name, and the value is a list of all employees who belong to that company:
scala> val sortedEmployees = allEmployees.groupBy(_.company)
sortedEmployees: scala.collection.immutable.Map[String,List[Employee]] =
Map(DevCode -> List(Employee(Anna,30,DevCode), Employee(guest,30,DevCode),
Employee(Thomas,41,DevCode)), LucasArt -> List(Employee(Luke,30,LucasArt),
Employee(Yoda,800,LucasArt)))
As an example to further process each List stored as a value of this (key->value) hashmap, we could imagine computing the average age of employees for each company.
Practically, that means we have to sum the ‘age’ field for every employee of each list and divide it by the number of employees in that list. Let’s do that first just for DevCode:
scala> devcodeEmployees
res4: List[Employee] = List(Employee(Anna,30,DevCode), Employee(guest,30,DevCode),
Employee(Thomas,41,DevCode))

scala> val devcodeAges = devcodeEmployees.map(_.age)
devcodeAges: List[Int] = List(30, 30, 41)

scala> val devcodeAverageAge = devcodeAges.sum / devcodeAges.size
devcodeAverageAge: Int = 33
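Note that dividing one Int by another truncates the result: 101 / 3 gives 33 rather than 33.67. If a fractional average is wanted, one possible variation, sketched here with a hypothetical helper called "averageAge", is to convert the sum to a Double first and to guard against an empty list:

```scala
object AverageSketch {
  // `averageAge` is a hypothetical helper, not part of the article's code.
  // It returns None for an empty list instead of dividing by zero.
  def averageAge(ages: List[Int]): Option[Double] =
    if (ages.isEmpty) None
    else Some(ages.sum.toDouble / ages.size)

  def main(args: Array[String]): Unit = {
    val devcodeAges = List(30, 30, 41)
    println(devcodeAges.sum / devcodeAges.size) // 33 (integer division truncates)
    println(averageAge(devcodeAges))            // Some(...) holding roughly 33.67
    println(averageAge(Nil))                    // None
  }
}
```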
Going back to the more general case of our Map (key:String ->value:List[Employee]), we can now aggregate and compute the average age for each company by just writing a couple of lines:
scala> val averageAgeByCompany = sortedEmployees.map{ case(key,value)=>
     |   value(0).copy(name="average",age=(value.map(_.age).sum)/value.size)}
averageAgeByCompany: scala.collection.immutable.Iterable[Employee] =
List(Employee(average,33,DevCode), Employee(average,415,LucasArt))
The "case(key,value) " is an example of the very powerful pattern matching mechanism that Scala provides. See the Scala documentation for further explanations.
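As a variation on the same idea, the pattern can bind more descriptive names and the result can stay a Map from company name to average age, instead of reusing Employee as a container. This is only a sketch; the object "AverageByCompanySketch" and the value names are invented for the example.

```scala
object AverageByCompanySketch {
  case class Employee(name: String = "guest", age: Int = 30, company: String = "DevCode")

  val allEmployees = List(
    Employee("Luke", company = "LucasArt"), Employee("Anna"), Employee(),
    Employee("Yoda", 800, "LucasArt"), Employee("Thomas", 41))

  // The pattern `case (company, staff)` splits each (key, value) pair of the Map.
  val averageAges: Map[String, Int] =
    allEmployees.groupBy(_.company).map { case (company, staff) =>
      company -> (staff.map(_.age).sum / staff.size)
    }

  // Without pattern matching, the pair has to be taken apart by position.
  val sameWithoutPattern: Map[String, Int] =
    allEmployees.groupBy(_.company).map { pair =>
      pair._1 -> (pair._2.map(_.age).sum / pair._2.size)
    }

  def main(args: Array[String]): Unit = {
    println(averageAges("DevCode"))  // 33
    println(averageAges("LucasArt")) // 415
  }
}
```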
Now we’re done. What we have just written is a little "Map Reduce" algorithm. Since the aggregation of Employees for each company is totally independent of other companies, this algorithm is very straightforward to parallelize.
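One possible way to exploit that independence, sketched below, is to compute each company's average in its own Future and collect the results. This is an assumption-laden illustration rather than the article's method: it relies on the scala.concurrent API, which only became part of the standard library in Scala 2.10, and the names "ParallelSketch" and "averageAgeByCompany" as well as the 10-second timeout are arbitrary choices.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ParallelSketch {
  case class Employee(name: String = "guest", age: Int = 30, company: String = "DevCode")

  def averageAge(staff: List[Employee]): Int = staff.map(_.age).sum / staff.size

  // Each company is handled by its own Future, so the groups can be
  // processed concurrently on the global thread pool.
  def averageAgeByCompany(all: List[Employee]): Map[String, Int] = {
    val perCompany: List[Future[(String, Int)]] =
      all.groupBy(_.company).toList.map { case (company, staff) =>
        Future { company -> averageAge(staff) }
      }
    // Wait for all partial results and turn them back into a Map.
    Await.result(Future.sequence(perCompany), 10.seconds).toMap
  }

  def main(args: Array[String]): Unit = {
    val all = List(
      Employee("Luke", company = "LucasArt"), Employee("Anna"), Employee(),
      Employee("Yoda", 800, "LucasArt"), Employee("Thomas", 41))
    println(averageAgeByCompany(all))
  }
}
```

With only two small groups the overhead of scheduling outweighs any gain; the approach only pays off when each group involves a substantial amount of work.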
Equivalent implementations of the algorithm, in both Java and Scala, are given in the appendix.
Appendix
Map Reduce: Java
// Employee.java (the package-private Builder can live in the same file)
public class Employee {
    final String name;
    final Integer age;
    final String company;

    public Employee(String name, Integer age, String company) {
        this.name = name == null ? "guest" : name;
        this.age = age == null ? 30 : age;
        this.company = company == null ? "DevCode" : company;
    }

    public String getName() {
        return name;
    }

    public int getAge() {
        return age;
    }

    public String getCompany() {
        return company;
    }

    @Override
    public String toString() {
        return "Employee [name=" + name + ", age=" + age + ", company=" + company + "]";
    }
}

class Builder {
    String name, company;
    Integer age;

    Builder(String name) {
        this.name = name;
    }

    Employee build() {
        return new Employee(name, age, company);
    }

    Builder age(Integer age) {
        this.age = age;
        return this;
    }

    Builder company(String company) {
        this.company = company;
        return this;
    }
}

// MapReduce.java (requires Google Guava on the classpath)
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import com.google.common.base.Function;
import com.google.common.collect.ImmutableListMultimap;
import com.google.common.collect.ImmutableSet;
import com.google.common.collect.Multimaps;

public class MapReduce {
    public static final void main(String[] args) {
        Employee guest = new Builder("Guest").build();
        Employee anna = new Builder("Anna").build();
        Employee thomas = new Builder("Thomas").age(41).build();
        Employee luke = new Builder("Luke").company("LucasArt").build();
        Employee yoda = new Builder("Yoda").age(800).company("LucasArt").build();

        Collection<Employee> employees = new ArrayList<Employee>();
        employees.add(guest);
        employees.add(anna);
        employees.add(thomas);
        employees.add(luke);
        employees.add(yoda);

        // Group the employees by company name.
        ImmutableListMultimap<String, Employee> personsGroupByCompany =
            Multimaps.index(employees, new Function<Employee, String>() {
                public String apply(Employee person) {
                    return person.getCompany();
                }
            });

        ImmutableSet<String> companyNamesFromMap = personsGroupByCompany.keySet();

        // Compute the average age for each company.
        List<Employee> averageAgeByCompany = new ArrayList<Employee>();
        for (String company : companyNamesFromMap) {
            List<Employee> employeesForThisCompany = personsGroupByCompany.get(company);
            int sum = 0;
            for (Employee employee : employeesForThisCompany) {
                sum += employee.getAge();
            }
            averageAgeByCompany.add(
                new Employee("average", sum / employeesForThisCompany.size(), company));
        }
        System.out.println("Result: " + averageAgeByCompany);
    }
}
MapReduce.scala:
case class Employee(name: String = "guest", age: Int = 30, company: String = "DevCode")

object MapReduce {
  def main(args: Array[String]): Unit = {
    val guest = Employee()
    val anna = Employee("Anna")
    val thomas = Employee("Thomas", 41)
    val luke = Employee("Luke", company = "LucasArt")
    val yoda = luke.copy("Yoda", age = 800)

    val allEmployees = List(luke, anna, guest, yoda, thomas)
    val sortedEmployees = allEmployees.groupBy(_.company)
    val averageAgeByCompany = sortedEmployees.map { case (key, value) =>
      value(0).copy(name = "average", age = (value.map(_.age).sum) / value.size)
    }
    println("Result: "+averageAgeByCompany)
  }
}
About the Author
Thomas Alexandre is a senior consultant at DevCode, specializing in Java and Scala software development. He is passionate about technology, enthusiastic about sharing knowledge and always looking for ways to code more effectively through the adoption of new open-source software and standards. In addition to 14 years of Java development, he has focused these past few years on emerging languages and web frameworks such as Groovy/Grails and Scala/Lift. Thomas has a PhD in Computer Science from the University of Lille, France, and has spent 2 years as postdoc with Carnegie Mellon University in the fields of Security and E-Commerce.