A team of researchers from MIT and the MIT-IBM Watson AI Lab has announced the ThreeDWorld Transport Challenge, a benchmark task for embodied AI agents. The challenge aims to spur research on AI agents that control a simulated mobile robot, guided by computer vision, to pick up objects and move them to new locations.
The challenge and a set of preliminary experiments were described in a paper published on arXiv. The challenge testbed is built on the ThreeDWorld platform, a physics-based virtual-world simulation engine that produces realistic image rendering and physical interactions with objects. Agents competing in the challenge are placed in a simulated house containing many objects. The agent controls a simulated mobile robot with two 9-DOF articulated arms and must plan a sequence of actions for collecting several target objects and transporting them to a desired location, using only computer vision (CV) for guidance, object detection, and collision avoidance. The team developed several benchmark agent models based on different algorithms, including machine learning and planning algorithms. The team noted from these experiments that a pure reinforcement learning (RL) agent "struggles" in the challenge, while hierarchical planning models are "still far from solving this task."
Several AI experts, such as iRobot co-founder Rodney Brooks, have argued that true AI can only be achieved by an "embodied" machine that interacts with the physical world. However, such a machine adds expense and complexity to the problem, and in the case of autonomous-vehicle research, raises serious safety concerns. Furthermore, experiments in the physical world must run in real time, while purely virtual experiments can run much faster, and many experiments can run in parallel.
Thus, many AI researchers focus on virtual agents that interact with a simulated environment. In recent years, researchers have developed several simulation platforms for embodied AI experiments, taking advantage of the availability of physics-based computer game engines and high-quality image rendering. Many have chosen to build on the Unity3D game engine: for example, AI2-THOR, created by the Allen Institute for AI (AI2), and the VirtualHome environment, created by a team from MIT and the University of Toronto.
ThreeDWorld is also built on Unity3D. Introduced and open-sourced in 2020, it features photorealistic rendering and rigid-body physics with fast, accurate collision simulation, as well as soft-body and fluid physics. By default, ThreeDWorld agents use a simulated robot called Magnebot, which has a four-wheeled base and two 9-DOF arms. Instead of hands, Magnebot's end-effectors are magnets that can attach to virtual objects in the world, simplifying pickup tasks by removing the problem of grasping.
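To see why magnet end-effectors remove the grasping problem, consider that magnetic attachment can reduce to a simple proximity test, whereas a hand would require modeling finger contacts and grip forces. The following Python sketch illustrates the idea; the function name and the attachment radius are invented for illustration and are not part of the actual ThreeDWorld API.

```python
# Hypothetical illustration of magnet-based pickup: attachment succeeds
# whenever the end-effector is close enough to the object, with no
# contact-force or grip modeling required. The 5 cm radius is assumed.
import math

MAGNET_RANGE = 0.05  # meters; assumed attachment radius


def can_attach(magnet_pos, object_pos):
    """Magnet 'grasping': succeed if the end-effector is within range."""
    return math.dist(magnet_pos, object_pos) <= MAGNET_RANGE


print(can_attach((0.0, 1.0, 0.0), (0.0, 1.03, 0.0)))  # True: 3 cm away
print(can_attach((0.0, 1.0, 0.0), (0.0, 1.2, 0.0)))   # False: 20 cm away
```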
For the new ThreeDWorld Transport Challenge, the team developed a dataset consisting of 15 distinct simulated home environments, each with 6 to 8 interconnected rooms containing furniture and other items. The team also developed a high-level motion API for controlling the virtual Magnebot. In the Challenge, an agent controls a Magnebot that is spawned at a random location in a simulated house and must transport a set of objects to a goal location. The agent must transport as many objects as possible within a time limit. Because the Magnebot can carry at most two objects at a time, the Challenge also provides containers that the agent can use to transport several objects at once. The agent must use computer vision to explore the house, locate objects, and avoid collisions.
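The carrying constraint is what makes containers valuable: each magnet arm holds one item, so without a container the robot delivers at most two objects per trip. The sketch below models only that constraint; the class and method names are invented for illustration and do not reproduce the paper's motion API or the ThreeDWorld codebase.

```python
# Hypothetical model of the Challenge's carrying constraint: two magnet
# arms hold at most two items, but a held container can carry several.
class Magnebot:
    ARM_SLOTS = 2  # one magnet end-effector per arm

    def __init__(self):
        self.held = []       # items currently attached to the magnets
        self.delivered = []  # items dropped off at the goal location

    def pick_up(self, item):
        if len(self.held) >= self.ARM_SLOTS:
            raise RuntimeError("both arms are occupied")
        self.held.append(item)

    def put_in_container(self, item, container):
        # Frees an arm by stowing a held item inside a held container.
        self.held.remove(item)
        container.append(item)

    def drop_all_at_goal(self):
        for thing in self.held:
            if isinstance(thing, list):  # a container: deliver its contents
                self.delivered.extend(thing)
            else:
                self.delivered.append(thing)
        self.held = []


# With a container occupying one arm, the free arm can ferry items into
# it one by one, so a single trip delivers more than two objects.
bot = Magnebot()
container = []
bot.pick_up(container)
for obj in ["mug", "book", "vase"]:
    bot.pick_up(obj)
    bot.put_in_container(obj, container)
bot.drop_all_at_goal()
print(bot.delivered)  # ['mug', 'book', 'vase']
```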
To provide baseline results for the challenge, the researchers implemented several agent models, training them on 10 of the 15 virtual homes, while holding out 5 for testing. The evaluation metric for the experiments was the transport rate: the fraction of objects successfully transported to the target within the time limit. Of the models evaluated, none could successfully transport all target objects to the goal; the researchers believe this shows that the task is "very challenging and could be used as a benchmark to track the progress of embodied AI in physically realistic scenes."
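The transport rate described above is a simple ratio, shown here as a minimal Python sketch; the function name is ours, not the paper's.

```python
def transport_rate(transported: int, total_targets: int) -> float:
    """Fraction of target objects successfully moved to the goal
    location within the time limit (the Challenge's evaluation metric)."""
    if total_targets <= 0:
        raise ValueError("total_targets must be positive")
    return transported / total_targets


# Example: 3 of 8 target objects delivered before time runs out.
print(transport_rate(3, 8))  # 0.375
```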
Other research teams have posed similar challenges. At the 2020 Conference on Computer Vision and Pattern Recognition (CVPR), AI2 held its RoboTHOR Challenge. More recently, Facebook updated its Habitat Challenge, which includes two embodied-navigation tasks for its Habitat simulation platform. This is the third year of the Habitat Challenge, which is also held in conjunction with CVPR.
The ThreeDWorld platform and ThreeDWorld Transport Challenge code are available on GitHub.