Researchers at Alphabet subsidiary DeepMind and the Swiss Plasma Center at EPFL have developed an AI system that uses deep reinforcement learning (RL) to create control algorithms for tokamak devices used in nuclear fusion research. The system learned control policies while interacting with a simulator; when deployed on a real device, it was able to achieve novel plasma configurations.
The system and a set of experiments were described in a paper published in Nature. Using a simulation environment, the team applied an actor-critic RL algorithm to train a feed-forward neural network that controls the magnetic coils of a tokamak device; the controller's objective is to maintain the shape of the high-energy plasma contained in the device. While typical tokamak controllers require expert engineering effort to design, the AI system can automatically produce a controller given a set of desired plasma properties. According to DeepMind:
This is a promising new direction for plasma controller design, with the potential to accelerate fusion science, explore new configurations and aid in future tokamak development.
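In broad strokes, training follows a standard RL interaction loop: at each control step the policy receives magnetic and current measurements from the simulated tokamak and returns voltage commands for the coils. The sketch below illustrates that loop in Python; the `SimulatedTokamak` class, its dimensions, and the episode structure are illustrative assumptions, not the paper's actual interfaces.

```python
import numpy as np

class SimulatedTokamak:
    """Hypothetical stand-in for the plasma simulator used during training.

    Observations are magnetic and current measurements; actions are
    voltage commands for the control coils. All sizes are illustrative.
    """
    N_MEASUREMENTS = 92
    N_COILS = 19

    def __init__(self, max_steps: int = 1000):
        self.max_steps = max_steps
        self._t = 0

    def reset(self) -> np.ndarray:
        self._t = 0
        return np.zeros(self.N_MEASUREMENTS)

    def step(self, coil_voltages: np.ndarray):
        # A real simulator would evolve the plasma state under these
        # voltages; this stub only advances a step counter.
        self._t += 1
        obs = np.zeros(self.N_MEASUREMENTS)
        reward = 0.0
        done = self._t >= self.max_steps
        return obs, reward, done

def run_episode(env, policy):
    """Roll out one simulated 'shot' under a given control policy."""
    obs = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = policy(obs)              # policy network: obs -> voltages
        obs, reward, done = env.step(action)
        total_reward += reward
    return total_reward

# Example: a trivial do-nothing policy.
run_episode(SimulatedTokamak(), lambda obs: np.zeros(SimulatedTokamak.N_COILS))
```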
Fusion reactions are a potential source of cheap, clean energy, and scientists seeking ways to control those reactions often study plasma contained in a tokamak: a large toroidal device surrounded by magnetic coils whose variable magnetic fields confine the plasma. Because the plasma in a tokamak is unstable, designing a control system for the coils is a complex process that must be repeated whenever the desired plasma configuration changes.
Image source: https://www.nature.com/articles/s41586-021-04301-9/figures/1
The DeepMind team designed a three-step process for using AI to design tokamak controllers. First, an experiment designer specifies a set of target values for various properties of the plasma, such as current, position, and elongation. These objectives are translated into a reward function for RL training. The RL algorithm then interacts with a simulated tokamak environment, attempting to find a control policy that maximizes that reward.
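As a rough illustration of how target values might become a reward signal, the sketch below penalizes the deviation of each measured plasma property from its target. The property names, units, values, and weights are invented for illustration; the paper's actual reward combines normalized error terms in a more sophisticated way.

```python
# Hypothetical targets and weights; names, units, and values are
# invented for illustration, not taken from the paper.
targets = {
    "plasma_current": 120e3,    # amperes
    "elongation": 1.4,          # dimensionless shape parameter
    "vertical_position": 0.0,   # metres
}
weights = {
    "plasma_current": 1.0,
    "elongation": 0.5,
    "vertical_position": 2.0,
}

def reward(measured: dict) -> float:
    """Combine per-objective tracking errors into one scalar reward."""
    total = 0.0
    for name, target in targets.items():
        error = abs(measured[name] - target)
        total -= weights[name] * error   # each deviation is penalized
    return total

# Example: slightly off-target measurements yield a small negative reward.
print(reward({"plasma_current": 119e3, "elongation": 1.38, "vertical_position": 0.01}))
```

Maximizing this reward drives every tracked property toward its target simultaneously, which is how a single scalar signal can stand in for an entire controller specification.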
Because the simulation environment runs much more slowly than a typical RL environment, DeepMind optimized the controller policies using an actor-critic algorithm called maximum a posteriori policy optimization (MPO). In this scheme, the critic learns to predict the expected future reward for an actor's actions, and the actor uses those predictions to improve its policy. Since the actor must eventually run in real time to control the physical tokamak, the team used a lightweight feed-forward network. The critic is not so constrained, however, so the researchers used a larger recurrent neural network (RNN) that could model the complex time-dependent dynamics of the tokamak.
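This actor/critic asymmetry can be sketched as follows, here in PyTorch purely for illustration; the layer sizes and architectures are assumptions, and the MPO update itself is omitted.

```python
import torch
import torch.nn as nn

# Sizes are illustrative, not the paper's actual dimensions.
OBS_DIM, ACTION_DIM, HIDDEN = 92, 19, 256

# Actor: a small feed-forward network, cheap enough to run in the
# real-time control loop on the physical tokamak.
actor = nn.Sequential(
    nn.Linear(OBS_DIM, HIDDEN),
    nn.ReLU(),
    nn.Linear(HIDDEN, HIDDEN),
    nn.ReLU(),
    nn.Linear(HIDDEN, ACTION_DIM),  # one output per control coil
)

# Critic: a recurrent network used only during training, so it can be
# larger and model time-dependent plasma dynamics.
class Critic(nn.Module):
    def __init__(self):
        super().__init__()
        self.rnn = nn.LSTM(OBS_DIM + ACTION_DIM, 2 * HIDDEN, batch_first=True)
        self.value = nn.Linear(2 * HIDDEN, 1)

    def forward(self, obs_seq, action_seq):
        # Predict expected future reward for a sequence of (obs, action).
        x = torch.cat([obs_seq, action_seq], dim=-1)
        hidden, _ = self.rnn(x)
        return self.value(hidden)
```

The design rationale is that only the small actor has to meet the real-time deadline on the plant; the recurrent critic exists only at training time, so its extra cost is irrelevant at deployment.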
The trained actor model was then compiled into an executable for controlling the physical tokamak. The researchers ran a series of typical plasma experiments in which the controller guided the plasma through desired current, shape, and position values; in all of them, the controller maintained the desired values within acceptable tolerances. The controller was also able to produce novel plasma configurations; according to the DeepMind team, it is "probably possible" that these could be achieved with existing control approaches, but that would require a "great investment" in designing and tuning such a controller.
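The deployment step can be mimicked with any ahead-of-time model compiler; the TorchScript trace below is an analogy under that assumption, not the toolchain the team actually used.

```python
import torch
import torch.nn as nn

OBS_DIM, ACTION_DIM = 92, 19   # same illustrative sizes as above

# Stand-in for the trained feed-forward policy from the previous sketch.
actor = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(), nn.Linear(64, ACTION_DIM))

# Trace the network into a standalone, serializable artifact.
example_obs = torch.zeros(1, OBS_DIM)
compiled_policy = torch.jit.trace(actor.eval(), example_obs)
compiled_policy.save("plasma_policy.pt")

# In a (hypothetical) real-time loop, only the cheap forward pass runs;
# it must finish within each control period.
with torch.no_grad():
    live_obs = torch.zeros(1, OBS_DIM)         # stand-in for live measurements
    coil_voltages = compiled_policy(live_obs)  # voltage commands for the coils
```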
DeepMind scientist David Pfau, a member of the research team, answered several questions about the work on Twitter. When one user asked if the main benefit of the work was reducing the time and expense of controller design, Pfau replied:
That's a good way of thinking about it. You can try out more things more easily because you don't need a control engineer re-doing all their work from scratch each time you try out a new configuration.
DeepMind open-sourced a portion of their tokamak controller RL training code, which is available on GitHub.