BT

Facilitating the Spread of Knowledge and Innovation in Professional Software Development

Write for InfoQ

Topics

Choose your language

InfoQ Homepage Podcasts Pony Language Designer Sylvan Clebsch on Pony’s Design, Garbage Collection, and Formal Verification

Pony Language Designer Sylvan Clebsch on Pony’s Design, Garbage Collection, and Formal Verification

In this podcast Charles Humble talks to Sylvan Clebsch, who is the designer of the actor-model language Pony programming and now works at Microsoft Research in Cambridge in the Programming Language Principles group.  They talk about the inspirations behind Pony, how the garbage collector avoids stop-the-world pauses, the queuing systems, work scheduler, and formal verification.  

Key Takeaways

  • Pony scales from a Raspberry Pi through a 64 core half terabyte machine to a 4096 core SGI beast
  • An actor has a 256-byte overhead, so creating hundreds of thousands of actors is possible
  • Actors have unbounded queues to prevent deadlock
  • Each actor garbage collects its own heap, so global stop-the-world pauses are not needed
  • Because the type system is data-race free, it’s impossible to have concurrency problems in Pony

Was the distributed aspect the motivation for creating Pony?

  • 1:50 Pony was designed to have the same operational semantics whether it is running in a concurrent or distributed context.
  • 2:00 Pony is useful for concurrent programming, but the distributed programming is still a work in progress.
  • 2:20 For concurrent Pony, making it the same as concurrent Pony turned out to be a performance boost rather than a performance cost.

How do you get the performance wins?

  • 2:40 It came from thinking about garbage collection, as something that can’t involve co-ordination between distributed units.
  • 2:50 A stop-the-world phase across distributed systems would have to use something like two-phase commit which wouldn’t work.
  • 3:10 To solve this, Pony has unidirectional message passing, and actors can independently garbage collect without needing to stop everything.
  • 3:25 It was designed this way to facilitate distributed actors, but it turns out to have a performance advantage because you don’t have a global stop-the-world pause.

But pointers can be passed from one actor to another?

  • 3:40 Yes, in Erlang each actor performs its own garbage collection, but message passing is by copy.
  • 3:50 This results in a lot of data being copied as messages are sent, but I wanted to focus on low-level performance.
  • 4:05 So Pony allows zero-copy message passing between actors, but without involving distributed garbage collection.

So how does that work?

  • 4:10 There was a paper at ICOOOLPS that introduced the idea, and we’re going to have a paper at OOPSLA 2017 that talks about ORCA - Object Reference Counting for Actors
  • 4:30 It says reference counting in the title, but perhaps that’s a misnomer - it doesn’t depend on the shape of the heap at all.
  • 4:45 It’s a protocol that has to do with a reference count that is an over approximation based on the number of times that object has been sent or received.
  • 5:00 Let’s say I allocate an object, which I then send to you. I’m going to increment my reference count for that object as I’ve sent it to you.
  • 5:10 On receipt you increment the foreign reference count.
  • 5:20 Neither of those counts are inside the object; they’re a view specific to the object.
  • 5:30 At that point, you can have as many references to that object in your view; hundreds, thousands, but the reference count remains one.
  • 5:40 The reference counts are only changed when the object is sent or received.
  • 5:50 If you drop your reference to that object, when you do a GC you’ll discover that you have no references left and it can be garbage collected.
  • 6:15 If third parties are involved, then passing a reference to another actor results in the foreign reference count being incremented by a fixed amount, and a message to the original object’s owner to increment its reference count.
  • 7:20 This works because Pony enforces message causality, which means every send and receive has a cause, and the causes have to be processed before they have a visible effect.

Pony also has a formal proof of the type system?

  • 8:45 There’s a paper and a Master’s thesis which covers these subjects.
  • 9:15 The language and the runtime were co-designed.
  • 9:20 The type system in Pony is data-race free - it uses reference capabilities.
  • 9:30 These capabilities allow a type to be annotated as isolated, immutable, opaque, and provide a way of restricting what can be done to an alias of an existing object.
  • 9:50 Because the type system is data-race free, you can never have a data-race in Pony.
  • 10:00 This allows the runtime to ignore a number of cases that would otherwise come up.
  • 10:10 The main beneficiary of this is the garbage collector, but every aspect of the runtime is influenced by the type system.

Do you think formal specifications for languages are inevitable?

  • 10:50 There’s two different things happening: proofs about the programming languages themselves, and proofs about programs written in those languages.
  • 11:05 Pony, Rust, Haskell, Erlang have all had proofs about the language applied to them successfully.
  • 11:20 Proving things about programs using Daphne or F* or Coq/Isabelle is hard, and scaling them out is hard.
  • 11:40 TLA+ is model checking rather than a complete proof; but it still allows for some interesting things.
  • 12:00 IronFleet uses a combination of Daphne and TLA+ to provide interesting proofs about distributed systems.
  • 12:20 What it boils down to is: yes, we need to move towards formal proofs.
  • 12:30 Traditionally this has been true in aeroplanes, medical uses and the like, but it’s starting to affect more and more systems.
  • 12:35 When we ship out new machine learning tools, people’s lives can depend on them - and that’s powerful and transformative and utterly terrifying.
  • 12:45 The other side is that to do that we need better and faster proof systems.
  • 13:00 Like fusion, formal proofs are always a technology of the future.
  • 13:30 In a data-race free type system there are lots of things that you don’t have to prove about your program.
  • 13:40 If you have referential transparency in a data-race free type system, you can consider functions in isolation without having to worry about concurrency.

How does the actor scheduler in Pony work?

  • 14:10 It’s all about queues.
  • 14:15 Pony has two different queue implementations; one is a multi-producer, single-consumer, which is used to send messages to an actor.
  • 14:20 There’s also a single-producer, multi-consumer queue, which is used to implement the schedulers.
  • 14:30 The schedulers are typically bound to cores, either one per physical core or one per logical core, depending on how your program performs under hyper-threading or pipelining.
  • 14:45 Each core is running a single scheduler thread, which has a single-producer, multi-consumer queue.
  • 14:55 These schedulers spawn the initial set of actors and send messages to them.
  • 15:05 When an actor is sent a message, the Pony runtime can determine (at no cost) whether the target actor’s queue was empty or not.
  • 15:15 If an actor’s queue is empty, it’s not running on a queue anywhere, so sending a message to an actor whose queue is empty requires it to be scheduled.
  • 15:25 If an actor’s queue is non-empty, the actor continues to be handled by the scheduler that it is currently running on, to prioritise data locality and core cache usage.
  • 15:50 Scheduling an actor typically happens on the same scheduler as the actor that sent it the message, which means it can take advantage of the data locality sent by the source actor.
  • 16:05 Any given scheduler thread only schedules actors on its own queue.
  • 16:15 What if there are no actors running on the scheduler?
  • 16:20 In that case, it picks another scheduler and steals work in a round-robin mechanism.
  • 31:40 This works great when there is more work to do than scheduler threads.

Why are Pony’s queues unbounded?

  • 17:40 Gulag’s PhD thesis on the actor model has three rules:
    • An actor can create other actors.
    • An actor can send a finite number of messages to itself or other actors.
    • An actor can select what behaviour to do when a message is received.
  • 18:10 The last one implies that an actor can maintain state, because it can choose to do different things based on the set of previously received messages.
  • 18:25 The first one implies that there is a dynamic topology, which means it can evolve over time.
  • 18:35 The middle one - sending a finite number of messages to others - having messages sent with guaranteed delivery is an important part of the system, although no constraints on when or how such messages are received.
  • 19:00 The problem comes if you’re using a bounded queue and you’re sending a message to yourself - if your queue is full, you become deadlocked.
  • 19:40 To avoid this, you need to have an unbounded queue for messages that you can send yourself.
  • 20:10 As a result, Pony has unbounded queues.
  • 20:20 Since Pony’s actors only need 256 bytes, we don’t want to have a bounded queue to add to that overhead.
  • 20:30 An unbounded queue can be dynamically provisioned based on the set of messages sent, but not if they are not needed.

How do you deal with backpressure?

  • 21:15 We have some implementations that seem to apply the backpressure that we wanted, but at too high a cost (although less than bounded queues).
  • 21:20 We’ve had some implementation which have had much lower costs but where the backpressure effect becomes effectively useless.
  • 21:30 In the meantime we’ve applied a muting and unmuting function to network input actors.
  • 21:45 This is because server-side Pony systems tend to be event driven based on data coming in via a TCP or UDP connection.
  • 22:00 It’s OK with UDP, because we can drop messages, but with TCP we need to find a way of applying backpressure to the caller.
  • 22:10 The muting and unmuting feature can be manually applied to give backpressure.
  • 22:45 We don’t like the model as it currently stands; we want to have generalised backpressure.

How have the ideas worked out in practice?

  • 23:30 We have run Pony on a 4096 core SGI box at Imperial, but because the workloads are designed to be HPC workloads means that we weren’t able to test it for long.
  • 23:50 It seems to perform extremely well on 64 core boxes with half a terabyte of NUMA memory.
  • 24:30 We’ve also run it on a Raspberry Pi as well, so it scales to both ends of the spectrum.
  • 24:40 As long as the program isn’t written to require co-ordination, we’re seeing linear performance based on number of cores.

Is this because you can pass mutable data between actors?

  • 25:40 Well, using immutable data is preferred and should be done where possible.
  • 25:10 Yes, you can create some data in one actor, then hand it over to the next actor for mutation, who can then pass it onto the next actor, and so on.
  • 25:30 There’s a number of emergent patterns we’re seeing, like being able to pass a lambda to an actor who can then run it over its local state, and then send it over to another actor.

Does the kind of algorithms you implement in an actor model affect performance?

  • 26:00 The most difficult thing is to avoid having one giant hashtable in an actor model.
  • 26:25 Otherwise you end up with a singleton bottleneck actor, running on a single core, that provides access to that hashtable.
  • 26:45 Building a system to deal with event stream processing is the way to go with actors.
  • 27:00 Instead of saying that you have data and want to run queries over it, turn it over and say you have queries and you want to flow data over it.
  • 27:15 This removes the bottlenecks and you can segment things to drive the linear performance of your application.

What inspired the creation of Pony?

  • 28:00 I used to write flight simulator software; we were doing realtime finite element analysis and hacky fluid dynamics to simulate flight.
  • 28:30 That turned out to need (distributed) event stream processing, although I didn’t know that at the time.
  • 28:40 You wanted time-delayed physical model feedback - that’s asynchronous message passing.
  • 28:50 So it started by wanting a programming language that would make writing my flight simulator easier.
  • 28:55 That style of asynchronous message passing is a staggeringly good fit for financial systems.

Are there places where you wouldn’t use Pony?

  • 29:20Absolutely - Pony isn’t a scripting language, for example. If you have something that you want to write quickly, there are many other languages that you can use.
  • 29:30 It’s also not (yet) good for client side programming. No-one in the community has produced a windowing system or bindings.
  • 29:40 If you have a problem that is in no way parallel, you’re not going to be able to take advantage of actors.
  • 30:30 (My opinion is that there are relatively few problems that aren’t concurrent.)

Is Pony an opinionated language?

  • 31:05 Pony makes a good show of being opinionated, but it’s not enforced.
  • 31:20 We have an RFC process, copied from Rust, so everybody gets to contribute to the evolution of the language.
  • 31:35 I didn’t think we’d have heavy usage of persistent data structures, but Theo Butler wrote a really nice persistent data structure package for Pony, and it turns out we now use it all over the place.
  • 31:50 There’s a tendency to say that the language shouldn’t offer two ways of doing things, but it’s OK for two packages to offer two ways of doing things.
  • 32:00 This means we’re a little slow to move things into the standard library; we would love to have a good package management system and discovery.
  • 32:25 When we have competition between packages we can see the different results of the APIs - including parser generator packages - but it’s great to see different approaches before picking one for the standard language.
  • 33:00 One of the things that we’re opinionated about is structural (duck) typing; our type system is as constraining as Go or Python.
  • 33:25 We also have a strong nominal type system using traits as well, so you can use the approach that suits the particular need.

Resources

Companies Mentioned

More about our podcasts

You can keep up-to-date with the podcasts via our RSS Feed, and they are available via SoundCloud, Apple Podcasts, Spotify, Overcast and YouTube. From this page you also have access to our recorded show notes. They all have clickable links that will take you directly to that part of the audio.

Previous podcasts

Rate this Article

Adoption
Style

BT