Ruby Concurrency, Actors, and Rubinius - Interview with MenTaLguY

With Ruby 1.9 adding Fibers (coroutines) and the recent popularity of Erlang and Actors, a group of little-known concepts has entered the Ruby programming world. To get an overview of what's available in the Ruby space for concurrency, we talked to long-time Ruby community member MenTaLguY. He has been involved in Ruby concurrency and threading for a long time, for instance through the fastthread library, which improves threading in MRI 1.8.x. Recently he has been dabbling in Rubinius, and he is also a member of the JRuby team.

InfoQ: What is your Actors library for Ruby about?

MenTaLguY: I've written two (released) actor libraries, actually. One is in the Omnibus concurrency library, and one is part of the Rubinius stdlib. Both are Ruby implementations of the Actor Model, the concurrency model popularized by Erlang. Concurrency, in the sense of running code in parallel, isn't actually that hard to do. The big problems arise when different threads of control need to share a resource or otherwise communicate. If you don't pick some simple formal model for structuring that communication, it's nearly impossible to write code that is correct or even necessarily meaningful, even if it superficially appears to "work".

"Actors" are one such model. An actor is made of a mailbox and a thread. At its discretion, an actor-thread can wait for specific sorts of messages to show up in the mailbox and then "act" on whichever one it gets, possibly sending messages to other actors. In this way, driven by the voluntary and explicit exchange of messages, threads can communicate in a way that is relatively easy to reason about.
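To make the mailbox-plus-thread description concrete, here is a minimal sketch in plain Ruby, assuming a modern Ruby where `Queue` is built in. `MiniActor` and its `:stop` sentinel are hypothetical names for illustration; this is not the API of the Omnibus or Rubinius actor libraries.

```ruby
# A minimal actor: one mailbox (a thread-safe Queue) plus one thread that
# waits for messages and acts on each in turn.
# MiniActor is a hypothetical name, not the Omnibus or Rubinius API.
class MiniActor
  def initialize(&handler)
    @mailbox = Queue.new
    @thread = Thread.new do
      loop do
        msg = @mailbox.pop        # block until a message shows up
        break if msg == :stop     # hypothetical sentinel that ends the actor
        handler.call(msg)         # "act" on the message
      end
    end
  end

  # Sending is asynchronous: just drop the message in the mailbox.
  def <<(msg)
    @mailbox << msg
    self
  end

  # Ask the actor to finish and wait for its thread to exit.
  def stop
    @mailbox << :stop
    @thread.join
  end
end

results = Queue.new
collector = MiniActor.new { |n| results << n }
doubler = MiniActor.new { |n| collector << n * 2 }  # acts by messaging another actor

[1, 2, 3].each { |n| doubler << n }
doubler.stop     # all three messages are handled before :stop
collector.stop
doubled = Array.new(3) { results.pop }
p doubled        # => [2, 4, 6]
```

Because each actor handles its mailbox in order on a single thread, the handler bodies never run concurrently with themselves, which is what makes the communication easy to reason about.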

InfoQ: How does it relate to Ruby's threading system or Ruby's new Fibers/Coroutines?

MenTaLguY: My actors libraries simply associate a mailbox with every Ruby thread to make an actor of every one, but that isn't the only way to approach actors in Ruby. Fibers are simply cooperatively-scheduled tasks within a single thread, and you can base actors on those, too, as Tony Arcieri did in his Revactor library. His approach has some advantages since Fibers are lighter-weight than full threads and you don't need to worry about preemption. On the other hand, sometimes you do need full threads (and sometimes the Ruby stdlib forces them on you).
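The following sketch illustrates the fiber-based alternative: each actor is a Fiber with its own mailbox, and a tiny round-robin scheduler resumes whichever actors have mail, all within one thread and without preemption. `Scheduler` and `FiberActor` are hypothetical names, and this is not how Revactor is actually implemented; it only shows the general shape of cooperatively scheduled actors.

```ruby
# Hypothetical round-robin scheduler: runs actors that have pending mail.
class Scheduler
  def initialize
    @ready = []
  end

  def schedule(actor)
    @ready << actor unless @ready.include?(actor)
  end

  def run
    @ready.shift.step until @ready.empty?
  end
end

# Hypothetical fiber-backed actor: a mailbox plus a Fiber running the actor loop.
class FiberActor
  def initialize(scheduler, &handler)
    @scheduler = scheduler
    @mailbox = []
    @fiber = Fiber.new do
      loop do
        handler.call(@mailbox.shift) unless @mailbox.empty?
        Fiber.yield   # cooperatively hand control back after each message
      end
    end
  end

  # Sending only enqueues and marks the actor runnable; nothing is resumed here,
  # so an actor may safely send to any other actor (or itself).
  def <<(msg)
    @mailbox << msg
    @scheduler.schedule(self)
    self
  end

  # Run until the actor yields; it is never preempted in the middle of a message.
  def step
    @fiber.resume
    @scheduler.schedule(self) unless @mailbox.empty?
  end
end

sched = Scheduler.new
log = []
pong = FiberActor.new(sched) { |m| log << "pong got #{m}" }
ping = FiberActor.new(sched) { |m| log << "ping got #{m}"; pong << (m + 1) }
ping << 1
ping << 10
sched.run
p log  # => ["ping got 1", "pong got 2", "ping got 10", "pong got 11"]
```

Fibers make each actor cheap, but since everything runs in one thread, a single blocking call inside any handler stalls every actor, which is one reason full threads are sometimes still needed.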

Tony and I have been having some vigorous and productive design discussions; the plan is to make our actor implementations as compatible as possible, and to provide a simple object protocol that any other actor implementation can use to play along too. From the outside, actors can all duck-type the same -- you're always just submitting messages to a mailbox after all. For the most part it shouldn't matter whether it's a thread or a fiber or something running in another Ruby VM that will be picking them up. In principle an actor-duck could even be backed by an Erlang process somewhere (e.g. via Scott Fleckenstein's Erlectricity).
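A sketch of the duck-typing point, assuming the shared protocol is simply "responds to `<<`". The `broadcast` helper and `WireStub` class are hypothetical; `WireStub` merely serializes messages with `Marshal` to stand in for a mailbox in another VM, and no actual cross-library protocol is implied.

```ruby
# The only protocol assumed: an actor-like target accepts messages via #<<.
def broadcast(targets, msg)
  targets.each { |t| t << msg }
end

# 1. A thread-backed mailbox: a Queue drained by its own thread.
inbox = Queue.new
seen_by_thread = []
worker = Thread.new { 2.times { seen_by_thread << inbox.pop } }

# 2. A stand-in for something far away: it serializes each message,
#    as you might before handing it to another VM or process.
class WireStub
  attr_reader :frames

  def initialize
    @frames = []
  end

  def <<(msg)
    @frames << Marshal.dump(msg)
    self
  end
end
wire = WireStub.new

# The sender neither knows nor cares what kind of actor receives the message.
broadcast([inbox, wire], [:greet, "hello"])
broadcast([inbox, wire], [:greet, "again"])
worker.join

p seen_by_thread                                             # => [[:greet, "hello"], [:greet, "again"]]
p wire.frames.map { |f| Marshal.load(f) } == seen_by_thread  # => true
```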

InfoQ: I saw some of your recent commits to the Rubinius repository, such as one dealing with Actors. Are Actors used in Rubinius?

MenTaLguY: Not as part of its implementation (which is one of the reasons I moved them from core to stdlib). I don't think they're needed at that level.

InfoQ: Any chance Actors or their mailbox implementation are (or could be) used in the message passing Multi-VM IPC that Evan recently added to Rubinius?

MenTaLguY: Actors aren't used to implement the MVM IPC mechanism, but we would like to use MVM IPC behind the scenes to allow actors in different VMs to communicate with one another.

InfoQ: What's the current state of Rubinius threading? What's used: userspace threads, kernel threads, or some m:n mix of both?

MenTaLguY: We use userspace threads within a VM, but each VM runs in a separate kernel thread. Right now, if you want to use all of your CPUs, spawn a VM or two for each one. Evan wants to do m:n threading within a VM eventually, but Ruby presents a lot of technical hurdles to clear on that front. Even Ruby 1.9 hobbles its native threads so that they are effectively userspace threads.

It could be that only the Ruby implementations originally built atop native-threaded runtimes (XRuby, JRuby, IronRuby) ever have full support for native threads. Native threads might not end up being so important if MVM can be made lightweight enough or communication between CPUs becomes expensive enough (the world gets more NUMA by the day).

However, there is one case where native threads are impossible to escape: poorly designed IO APIs that don't support asynchronous operations. Perversely, in those cases you need multiple native threads, not to utilize multiple cores, but to sit there and do nothing. Sometimes you have no choice but to dedicate a victim thread to waiting for a blocking call to finish while the rest of your code gets on with its life.
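A minimal sketch of the victim-thread pattern. `blocking_lookup` is a hypothetical stand-in for a call with no asynchronous variant; `sleep` plays the role of the blocking system call.

```ruby
# Hypothetical blocking API with no asynchronous variant.
def blocking_lookup(key)
  sleep 0.2                 # pretend we're stuck in a blocking system call
  "value for #{key}"
end

# The victim thread does nothing but wait for the blocking call.
victim = Thread.new { blocking_lookup(:config) }

# Meanwhile the main thread gets on with its life.
other_work = (1..5).map { |i| i * i }

# Collect the result only when it is actually needed (joins the thread).
result = victim.value
p other_work  # => [1, 4, 9, 16, 25]
p result      # => "value for config"
```

Whether other threads really make progress during the wait depends on the implementation: a native call must release the interpreter's lock (MRI) or run on a true native thread (JRuby) for this to help.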

Hopefully we will see fewer such APIs in the future. Tony's Revactor library offers one ray of hope on that front: it brings actors to bear on IO, so that the execution of your code can be driven by IO events rather than waiting uselessly in blocking calls or suffering inversion of control and turning into a giant explicit state machine. At the moment it is tied to MRI 1.9, but hopefully we can port it or get something similar on other Ruby implementations as well.
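The idea of driving code from IO events, without turning it into an explicit state machine, can be sketched with a Fiber and `IO.select`. This illustrates the general technique under stated assumptions and is not Revactor's API: `reader_fiber` is a hypothetical name, and `read_nonblock(..., exception: false)` requires Ruby 2.3 or later.

```ruby
# The reader is straight-line code inside a Fiber. Whenever a read would
# block, it yields the IO it is waiting on instead of blocking the thread.
def reader_fiber(io, out)
  Fiber.new do
    buf = String.new
    loop do
      chunk = io.read_nonblock(1024, exception: false)
      case chunk
      when :wait_readable then Fiber.yield(io)  # park until io is readable
      when nil then break                       # EOF: the writer closed the pipe
      else buf << chunk
      end
    end
    out << buf
    nil                                         # finished: nothing left to wait on
  end
end

r, w = IO.pipe
results = []
fiber = reader_fiber(r, results)

writer = Thread.new do
  w.write("hello ")
  sleep 0.05
  w.write("world")
  w.close
end

# A tiny event loop: wait for readiness, then resume the fiber that asked.
waiting_on = fiber.resume
while waiting_on
  IO.select([waiting_on])
  waiting_on = fiber.resume
end
writer.join
p results  # => ["hello world"]
```

A real framework would keep a table of many parked fibers keyed by the IO each is waiting on and pass all of them to a single `IO.select` (or a faster event API) call.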

InfoQ: Rubinius seems to be shipped with a wealth of concurrency ideas and tools - threads, actors, multivm + message passing IPC, etc.

MenTaLguY: At this point in history, concurrency is a particularly important concern, and I think Rubinius reflects that.

InfoQ: One of the tools I noticed is Channels. What role do they play in Rubinius? (I noticed that the fast debugger uses Channels to notify the debugger thread, etc.)

MenTaLguY: Channels are the basic communication primitive in Rubinius; everything else is implemented atop them. The fundamental concurrency model is more or less the asynchronous pi calculus minus replication and a few common extensions like non-deterministic choice (which potentially requires channel operations to be centrally arbitrated). I advocated pi calculus channels because of their simplicity, which generally translates to performance and maintainability.
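A rough sketch of an asynchronous pi-calculus style channel in plain Ruby: writes never block, reads block until a value is available, and channels are ordinary values that can themselves be sent over other channels (here, as reply channels). The `Channel` class with `write`/`read` is hypothetical and is not the Rubinius `Channel` API.

```ruby
# Hypothetical channel: asynchronous write, synchronous (blocking) read.
class Channel
  def initialize
    @queue = Queue.new
  end

  def write(value)   # fire and forget: never waits for a reader
    @queue << value
    nil
  end

  def read           # waits until some writer has supplied a value
    @queue.pop
  end
end

requests = Channel.new

# A "server" that squares numbers and answers on whatever channel it is sent.
server = Thread.new do
  3.times do
    n, reply = requests.read   # the reply channel arrived inside the message
    reply.write(n * n)
  end
end

answers = [2, 3, 4].map do |n|
  reply = Channel.new          # a fresh channel handed over to the server (mobility)
  requests.write([n, reply])
  reply.read
end
server.join
p answers  # => [4, 9, 16]
```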

Now, the pi calculus is good to use directly for "local" (intra-VM) things, but it is not so good for implementing distributed ones because in the pi calculus both ends of a channel are mobile. When writes are asynchronous, you can kind of fire and forget over the horizon, but reading from a channel is a synchronous operation, and if a channel gets into the hands of multiple readers those readers must rendezvous somewhere to do their reads. Not a good thing if they're particularly distant from one another!

That is one reason why I'm interested in actors for larger-scale things. Actor mailboxes are a bit like channels in the asynchronous calculus, except that only the (asynchronous) write end is independently mobile; the read end is firmly tied to a specific local agent and no special "long-distance" coordination is required.
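To illustrate the difference, this sketch hands out only a send-only `Address`, while the read end of the mailbox stays inside the actor's own thread. `Address` and `spawn_actor` are hypothetical names; Ruby does not enforce the capability (reflection could still reach the queue), so the restriction here is by convention.

```ruby
# Hypothetical send-only handle: the mobile write end of a mailbox.
class Address
  def initialize(queue)
    @queue = queue
  end

  def <<(msg)
    @queue << msg
    self
  end
end

# Hypothetical spawner: the read end (mailbox.pop) is only given to the body.
def spawn_actor(&body)
  mailbox = Queue.new
  receive = -> { mailbox.pop }              # stays local to the actor
  actor_thread = Thread.new { body.call(receive) }
  [Address.new(mailbox), actor_thread]
end

results = Queue.new
adder, actor_thread = spawn_actor do |receive|
  total = 0
  loop do
    msg = receive.call
    break if msg == :done
    total += msg
  end
  results << total
end

# The address can be passed around freely; holders can only send.
[1, 2, 3].each { |n| adder << n }
adder << :done
actor_thread.join
total_seen = results.pop
p total_seen                 # => 6
p adder.respond_to?(:pop)    # => false
```

Because reads only ever happen on the owning thread, no two readers ever need to rendezvous, which is what lets the write end travel "over the horizon" cheaply.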

InfoQ: Ruby 1.9 added Fibers/Coroutines - how would they be implemented in Rubinius?

MenTaLguY: I think Fibers could be implemented atop Rubinius Tasks without too much trouble; Fibers and Tasks are fairly similar.

InfoQ: Do you have any views/opinions about Fibers/Coroutines? Would you use them - and for what?

MenTaLguY: I think they are a fine alternative to explicit state machines, particularly as coroutines give you more freedom when using library code. However, a state machine can still be better if it is small enough or in situations where it would be appropriate to use something like Ragel to generate it for you.
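For comparison with a hand-written state machine, here is a hedged sketch of a tiny parser for a made-up framing format (a single-digit length followed by that many characters) written as a Fiber. Each `Fiber.yield` means "give me the next character", so the parser's state lives in ordinary local variables and control flow rather than in an explicit state variable. `message_parser` and the framing format are assumptions for illustration.

```ruby
# Hypothetical framing: one digit n, then n payload characters, repeated.
def message_parser(&on_message)
  Fiber.new do |ch|                   # first resume delivers the first character
    loop do
      len = ch.to_i                   # single-digit length prefix (assumption)
      payload = String.new
      len.times { payload << Fiber.yield }   # each yield returns the next char
      on_message.call(payload)
      ch = Fiber.yield                # wait for the next message's length
    end
  end
end

msgs = []
parser = message_parser { |m| msgs << m }

# Characters can arrive in any chunking; the caller just pushes them in.
"3abc2hi".each_char { |c| parser.resume(c) }
p msgs  # => ["abc", "hi"]
```

The equivalent state machine would need an explicit `@state` (reading length vs. reading payload) plus a `@remaining` counter, updated by hand on every character.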

InfoQ: Is it correct that you now have commit rights for JRuby? What are your interests in JRuby or what are you working on there?

MenTaLguY: Yes. My principal interest is concurrency: the combination of Ruby and native threads presents some interesting challenges. So I've been fixing concurrency bugs and working out what kind of concurrency guarantees we should offer in general. I would also eventually like to properly expose the Java concurrency facilities to Ruby, hopefully in a portable way (this is partly the mission of the Omnibus Concurrency library).
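As an example of the kind of guarantee at stake: Ruby does not promise that `@count += 1` is atomic, and on implementations whose threads run in parallel, such as JRuby, unsynchronized increments from several threads can be lost. A minimal sketch of guarding the compound operation with a `Mutex` (the `Counter` class is hypothetical):

```ruby
# Hypothetical counter whose read-modify-write is protected by a lock.
class Counter
  def initialize
    @count = 0
    @lock = Mutex.new
  end

  # `@count += 1` is a read, an add and a write; the lock keeps other
  # threads from interleaving between those steps.
  def increment
    @lock.synchronize { @count += 1 }
  end

  def count
    @lock.synchronize { @count }
  end
end

counter = Counter.new
threads = Array.new(8) { Thread.new { 10_000.times { counter.increment } } }
threads.each(&:join)
p counter.count  # => 80000
```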

InfoQ: What other projects are you involved in?

MenTaLguY: Aside from occasionally contributing patches to Shoes, I'm working on quite a few libraries behind the scenes, most of which will be announced or released when they're ready. However, I can point out one that I released recently, although it hasn't been formally announced yet: the 'case' gem, which gives you pattern matching for arrays, structs, and arbitrary predicates with Ruby's case-match operator.

require 'rubygems'
require 'case'

Foo = Case::Struct.new :a, :b

def example(arg)
  case arg
  when Foo[:blarg, Object]   # matches any Foo with .a == :blarg
    # ...
  when Foo[10, 20]           # matches only a Foo with .a == 10 and .b == 20
    # ...
  when Foo                   # matches any Foo
    # ...
  when Case::Any[String, Array]   # matches either a String or an Array
    # ...
  # matches a three-element array with initial elements 1, 2:
  when Case[1, 2, Object]
    # ...
  # matches any Integer > 10:
  when Case::All[Integer, Case.guard { |n| n > 10 }]
    # ...
  end
end

Both Tony and I use the case-match operator (===) in our actor libraries to select the sorts of messages to wait for, so this gem can be quite useful there.
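A sketch of how `===` can drive selective receive: `receive(pattern)` returns the first queued message for which `pattern === message` holds and leaves the others in the mailbox, Erlang style. `SelectiveMailbox` is a hypothetical class, not the API of either actor library; the patterns below use core Ruby classes, regexps and lambdas, but matchers from the 'case' gem would plug in the same way since they also work through `===`.

```ruby
# Hypothetical mailbox supporting selective receive via the case-match operator.
class SelectiveMailbox
  def initialize
    @messages = []
    @lock = Mutex.new
    @arrived = ConditionVariable.new
  end

  def <<(msg)
    @lock.synchronize do
      @messages << msg
      @arrived.broadcast         # wake any receiver waiting for a match
    end
    self
  end

  # Block until some queued message matches, then remove and return it.
  def receive(pattern)
    @lock.synchronize do
      loop do
        index = @messages.index { |m| pattern === m }
        return @messages.delete_at(index) if index
        @arrived.wait(@lock)     # nothing matches yet: wait for more mail
      end
    end
  end
end

box = SelectiveMailbox.new
box << "log line" << 42 << [:reply, 7]

first  = box.receive(Integer)                                      # a Class pattern
second = box.receive(->(m) { m.is_a?(Array) && m.first == :reply }) # a predicate
third  = box.receive(/log/)                                        # a Regexp pattern
p [first, second, third]  # => [42, [:reply, 7], "log line"]

# A receiver can also wait for a message that has not arrived yet.
waiter = Thread.new { box.receive(Symbol) }
box << :wake_up
p waiter.value            # => :wake_up
```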

For more from MenTaLguY, check out his blog at http://moonbase.rydia.net/ or follow one of the projects he's involved in. For more on Actors, read the recent interview with Tony Arcieri, developer of Revactor. Revactor is an application framework for high-performance network applications. It targets Ruby 1.9 and makes use of features such as Fibers for concurrency. For more on Rubinius, see InfoQ's Rubinius coverage.