Facebook AI researchers open-sourced CraftAssist, a framework for building interactive assistant bots for the Minecraft video game. The bots use natural language understanding (NLU) to parse and execute text commands from human players, such as requests to build houses in the game world. The framework's modular structure can be extended by researchers to perform their own ML experiments.
The research team gave an overview of the system in a recent blog post. CraftAssist bots connect to the game using the same protocol as the standard game client, and so are able to do anything a human player can. The bots interact with other players using Minecraft's built-in text-based chat interface. Humans can give the bots commands, including high-level instructions such as "build a house next to the blue cube." The goal of the release is to help improve human-AI collaboration:
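CraftAssist itself uses a neural parser, described further on. To make the idea of turning a chat command into something a bot can execute concrete, here is a deliberately simple rule-based stand-in. It maps the article's example sentence to a structured dictionary. The dictionary keys (`action_type`, `schematic`, `location`, `reference_object`) and the `parse_command` function are hypothetical. This is not CraftAssist's output schema or API.

```python
import re
from typing import Optional

# Hypothetical command grammar and output format, for illustration only;
# CraftAssist's real parser is a trained neural model, not a regex.
COMMAND_PATTERN = re.compile(
    r"^build (?:a |an )?(?P<schematic>\w+)"
    r"(?: (?P<relation>next to|on top of|behind) (?:the )?(?P<ref>[\w ]+))?$"
)


def parse_command(chat: str) -> Optional[dict]:
    """Map a chat line such as 'build a house next to the blue cube' to a dict."""
    match = COMMAND_PATTERN.match(chat.strip().lower())
    if match is None:
        return None  # A real bot might ask the player to rephrase instead.
    action = {"action_type": "BUILD", "schematic": match.group("schematic")}
    if match.group("relation"):
        # Record where to build, relative to an object the player mentioned.
        action["location"] = {
            "relation": match.group("relation"),
            "reference_object": match.group("ref").strip(),
        }
    return action


print(parse_command("build a house next to the blue cube"))
```

A fixed pattern like this breaks as soon as players phrase things differently. That brittleness is the main reason to learn the parser from dialog data instead.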
The platform is intended to support the study of agents that are fun to interact with and useful for a wide variety of tasks specified and evaluated by human participants. To encourage the wider AI research community to use the CraftAssist platform for their own experiments, we are open-sourcing the framework, as well as a baseline assistant and the tools and data we used to build it.
Robot control systems, broadly speaking, are composed of perception and action-selection sub-systems. Perception is the conversion of raw sensor data into a more abstract representation; for example, image recognition is a perception task that converts image pixels into a text label describing the image contents. When trained on datasets that contain many examples of sensor input paired with the desired output, modern deep-learning models can achieve near human-level performance on many vision and NLU tasks.
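As a toy illustration of perception learned from labeled examples, the sketch below trains a nearest-centroid classifier on a few 3x3 "images" paired with text labels. It then labels a new, slightly noisy image. The classifier is a deliberately simple stand-in for the deep-learning models mentioned above, and the data and label names are invented for the example.

```python
from collections import defaultdict
from math import dist


def train_centroids(examples):
    """Average the input vectors for each label (a nearest-centroid classifier)."""
    sums, counts = {}, defaultdict(int)
    for pixels, label in examples:
        acc = sums.setdefault(label, [0.0] * len(pixels))
        for i, value in enumerate(pixels):
            acc[i] += value
        counts[label] += 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def classify(centroids, pixels):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], pixels))


# Tiny 3x3 "images", flattened row by row, each paired with the desired label.
training_data = [
    ([1, 0, 0, 1, 0, 0, 1, 0, 0], "vertical line"),
    ([0, 1, 0, 0, 1, 0, 0, 1, 0], "vertical line"),
    ([1, 1, 1, 0, 0, 0, 0, 0, 0], "horizontal line"),
    ([0, 0, 0, 1, 1, 1, 0, 0, 0], "horizontal line"),
]
centroids = train_centroids(training_data)
print(classify(centroids, [0, 1, 0, 0, 1, 0, 1, 1, 0]))  # a noisy vertical line
```

With only four training examples, this classifier generalizes poorly to line positions it has not seen. The near human-level results described above depend on far larger datasets and far more expressive models.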
Action-selection is the process by which a robot "decides" how to interact with the world to achieve some goal; for example, which moves to make in order to win a game of Go. Many successful systems use reinforcement learning (RL), in which the bot repeatedly attempts a task and receives a numeric reward for the outcome of each attempt. Games are a common testbed for RL because they have well-defined sets of actions and outcomes, and modern RL-trained bots frequently outperform top human players in many different games. Some research groups, for example Google's DeepMind, combine the perception and action-selection subsystems into a single "end-to-end" system trained by deep reinforcement learning, and there are many virtual environments for training these systems, including a simulated habitat developed by Facebook. Microsoft has open-sourced an "AI-gym" interface for Minecraft called Project Malmo, as well as a large dataset, to encourage the use of Minecraft as a testbed for RL research.
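The RL loop described above can be sketched with tabular Q-learning, one standard RL algorithm. In this toy five-cell corridor, the agent starts at the left end and is rewarded only for reaching the right end. The environment, the hyperparameters and the function names are all illustrative assumptions and have nothing to do with Go or Minecraft. The sketch shows the agent improving its action choices purely from repeated attempts and their rewards.

```python
import random

N_STATES = 5        # corridor cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)  # step left or step right


def step(state, action):
    """Toy deterministic environment: move, clip to the corridor, reward 1 at the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):           # each episode is one attempt at the task
        state = 0
        for _ in range(100):            # cap the length of each attempt
            # Epsilon-greedy: usually exploit, sometimes explore; break ties at random.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[(state, a)] for a in ACTIONS)
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])
            next_state, reward, done = step(state, action)
            # Bootstrap from the best next action unless the episode just ended.
            future = 0.0 if done else gamma * max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + future - q[(state, action)])
            state = next_state
            if done:
                break
    return q


q = q_learning()
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}
print(policy)  # the greedy action learned for each non-goal cell
```

End-to-end deep RL systems replace the lookup table `q` with a neural network that reads raw observations. The reward-driven update itself is unchanged.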
CraftAssist does not use end-to-end learning. Instead, Facebook opted for a more "engineered," modular approach, building explicit perception and action-selection modules. As one of the team members stated in a thread on Reddit, in contrast to RL efforts such as Project Malmo that are "focusing more on learning things like navigation and sensorimotor control," CraftAssist's focus is on facilitating human/robot interaction via natural language. Furthermore, the team says in a paper published on arXiv:
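To illustrate what an "engineered," modular design means in code, here is a sketch in which perception modules write into a shared memory and a separate action-selection module reads from it. Either side can be swapped independently. Every class and method name here (`Agent`, `ColorIndex`, `GoToColor`, `perceive`, `select`) is hypothetical, and the sketch is not CraftAssist's actual architecture or API.

```python
from dataclasses import dataclass, field
from typing import Protocol


class Perception(Protocol):
    def perceive(self, blocks: list) -> dict: ...


class ActionSelection(Protocol):
    def select(self, memory: dict, command: str) -> str: ...


class ColorIndex:
    """Hand-written perception module: index (x, y, z, color) blocks by color."""

    def perceive(self, blocks):
        index = {}
        for x, y, z, color in blocks:
            index.setdefault(color, []).append((x, y, z))
        return {"by_color": index}


class GoToColor:
    """Hand-written action selector for commands like 'go to the blue block'."""

    def select(self, memory, command):
        for color, positions in memory.get("by_color", {}).items():
            if color in command.split():
                return f"MOVE {positions[0]}"
        return "ASK_CLARIFICATION"


@dataclass
class Agent:
    perception: list            # any objects implementing Perception
    selector: ActionSelection   # any object implementing ActionSelection
    memory: dict = field(default_factory=dict)

    def step(self, blocks, command):
        # Each perception module writes what it sees into the shared memory...
        for module in self.perception:
            self.memory.update(module.perceive(blocks))
        # ...and a separate module decides what to do with it.
        return self.selector.select(self.memory, command)


agent = Agent(perception=[ColorIndex()], selector=GoToColor())
print(agent.step([(3, 0, 5, "blue"), (1, 0, 1, "red")], "go to the blue block"))
```

Because the two halves communicate only through the memory, a learned perception model or a learned action selector could replace either hand-written class without changing `Agent`.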
Instead of superhuman performance on a single difficult task, we are interested in competency across a large number of simpler tasks, specified (perhaps poorly) by humans.
CraftAssist does include several ML-trained components. The bot parses chat dialog using a neural semantic parser built on a gated recurrent unit (GRU) network and an attention model. The bot also has a perception module that uses deep learning to label the components of a building, such as "wall" or "floor." The research team hopes that this modular structure will encourage others to "plug in" their own ML-based modules.
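The article says only that the parser is built on a GRU and an attention model. As a rough illustration of those two ingredients, the pure-Python sketch below runs a GRU encoder over the tokens of a command and then computes attention weights over the resulting hidden states. Several parts are assumptions rather than anything stated in the article: the weights are random rather than trained, the choice of dot-product scoring is not confirmed, and using the final encoder state as the query is a guess. A real semantic parser would also need a trained decoder that emits the structured command. None of this is CraftAssist's code.

```python
import math
import random


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def random_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """GRU cell in the original Cho et al. (2014) form, biases omitted; weights are random, not trained."""

    def __init__(self, input_size, hidden_size, rng):
        self.w = {g: random_matrix(rng, hidden_size, input_size) for g in "zrn"}
        self.u = {g: random_matrix(rng, hidden_size, hidden_size) for g in "zrn"}

    def _gate(self, g, x, h, act):
        return [act(a + b) for a, b in zip(matvec(self.w[g], x), matvec(self.u[g], h))]

    def __call__(self, x, h):
        z = self._gate("z", x, h, sigmoid)  # update gate
        r = self._gate("r", x, h, sigmoid)  # reset gate
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        n = self._gate("n", x, reset_h, math.tanh)  # candidate state from the reset hidden state
        return [zi * hi + (1 - zi) * ni for zi, hi, ni in zip(z, h, n)]


def encode(cell, embeddings, hidden_size):
    """Run the GRU over the token embeddings and keep every hidden state."""
    h, states = [0.0] * hidden_size, []
    for x in embeddings:
        h = cell(x, h)
        states.append(h)
    return states


def attend(query, states):
    """Dot-product attention: softmax over scores, then a weighted sum of states."""
    scores = [sum(q * s for q, s in zip(query, state)) for state in states]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    context = [sum(w * state[i] for w, state in zip(weights, states)) for i in range(len(query))]
    return weights, context


rng = random.Random(0)
tokens = "build a house next to the blue cube".split()
embedding_table = {tok: [rng.uniform(-1, 1) for _ in range(4)] for tok in tokens}
cell = GRUCell(input_size=4, hidden_size=6, rng=rng)
states = encode(cell, [embedding_table[t] for t in tokens], hidden_size=6)
weights, context = attend(states[-1], states)  # hypothetical query: the final encoder state
for tok, w in zip(tokens, weights):
    print(f"{tok:>6}: {w:.3f}")
```

In a trained parser, the attention weights would let each decoding step focus on the relevant words, for example "blue cube" when predicting the reference object. With random weights, the printed values carry no meaning beyond showing the mechanics.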
In addition to the CraftAssist source code, Facebook has released several related datasets. These include the dialog data used to train the parser and a crowd-sourced dataset of houses built in-game by players, which could be used to train a bot to build similar structures. The code and datasets are available on GitHub.