The Allen Institute for AI (AI2) has announced the 2022 version of its AI2-THOR Rearrangement Challenge. The challenge requires competitors to design an autonomous agent that can move objects in a virtual room. This year's version brings several improvements, including a new dataset and faster training using the latest release of the AI2-THOR simulation platform.
The challenge will be hosted at the upcoming Conference on Computer Vision and Pattern Recognition (CVPR) as part of a workshop on Embodied AI research. The challenge consists of two phases. In the first, "walkthrough," the autonomous agent moves around its environment, observing the state of objects. In the second, "unshuffle," objects in the environment are randomly rearranged, and the agent must identify which objects have changed and then restore them to their original state. According to the AI2 team:
We hope that the task specification, evaluation protocol, and broader discussion [from the Challenge] will support healthy development of Embodied AI and the creation of intelligent systems that perceive, act, and accomplish increasingly long-term goals in complex physical environments.
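The two phases described above can be illustrated with a toy sketch. It is not the Challenge code or the AI2-THOR API: the `ObjectState` representation, the function names, and the position tolerance are all assumptions made for illustration, and the sketch reads object states directly, whereas a competing agent has to infer them from what it observes.

```python
from typing import Dict, List, Tuple

# Hypothetical, simplified object state: (x, y, z) position and an open/closed flag.
ObjectState = Tuple[Tuple[float, float, float], bool]
Scene = Dict[str, ObjectState]


def walkthrough(scene: Scene) -> Scene:
    """Phase 1: record the state of every object as the goal configuration."""
    return dict(scene)


def changed_objects(goal: Scene, current: Scene, tol: float = 0.05) -> List[str]:
    """Identify objects whose position or open/closed state differs from the goal."""
    changed = []
    for name, (goal_pos, goal_open) in goal.items():
        cur_pos, cur_open = current[name]
        moved = any(abs(g - c) > tol for g, c in zip(goal_pos, cur_pos))
        if moved or goal_open != cur_open:
            changed.append(name)
    return sorted(changed)


def unshuffle(goal: Scene, current: Scene) -> Scene:
    """Phase 2: restore every changed object to its recorded goal state."""
    restored = dict(current)
    for name in changed_objects(goal, current):
        restored[name] = goal[name]
    return restored


# Toy room with three objects (made-up values).
room: Scene = {
    "Mug": ((1.0, 0.9, 0.5), False),
    "Fridge": ((0.0, 0.0, 2.0), False),
    "Book": ((2.0, 0.8, 1.0), False),
}
goal = walkthrough(room)

# Simulate the random rearrangement: the mug is moved and the fridge is opened.
shuffled = dict(room)
shuffled["Mug"] = ((2.5, 0.9, 0.5), False)
shuffled["Fridge"] = ((0.0, 0.0, 2.0), True)

print(changed_objects(goal, shuffled))  # ['Fridge', 'Mug']
restored = unshuffle(goal, shuffled)
print(restored == goal)  # True
```

In this simplified form, the hard parts of the real task (perception, navigation, and manipulation) are reduced to a dictionary comparison. The sketch only shows the bookkeeping the two phases imply.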
Several AI experts have argued that true AI can only be achieved by an embodied machine that interacts with the physical world, and researchers at AI2 have claimed that "representations learned via interaction with the world are more powerful" than those learned from training on a static dataset. Most such research focuses on virtual agents that interact with a simulated environment, taking advantage of the availability of physics-based computer game engines and high-quality image rendering.
AI2-THOR is one such framework for Embodied AI research that is built on the Unity3D game engine. It provides simulated environments based on hundreds of room types, such as kitchens or bedrooms, and thousands of actionable objects. The latest release of AI2-THOR includes a "headless" mode that allows for training agents using clusters of GPUs, significantly reducing training time. Experiments done by AI2 researchers show that performance scales linearly with the size of the cluster. In one case, training time was reduced from 3.5 days using 4 GPUs to 10 hours using 32 GPUs.
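As a quick check of the scaling claim, the speedup implied by the reported figures can be computed directly. This is only arithmetic on the numbers quoted above; the function name and the "efficiency" measure (speedup divided by the increase in GPU count) are introduced here for illustration.

```python
def scaling_report(base_gpus: int, base_hours: float, new_gpus: int, new_hours: float):
    """Return (speedup, GPU ratio, scaling efficiency) for two training runs."""
    speedup = base_hours / new_hours      # how many times faster the larger run is
    gpu_ratio = new_gpus / base_gpus      # how many times more GPUs it uses
    efficiency = speedup / gpu_ratio      # 1.0 corresponds to perfectly linear scaling
    return speedup, gpu_ratio, efficiency


# Reported figures: 3.5 days on 4 GPUs versus 10 hours on 32 GPUs.
speedup, gpu_ratio, efficiency = scaling_report(4, 3.5 * 24, 32, 10)
print(f"{speedup:.1f}x speedup on {gpu_ratio:.0f}x the GPUs (efficiency {efficiency:.2f})")
# 8.4x speedup on 8x the GPUs (efficiency 1.05)
```

An 8.4x speedup from 8x the GPUs is consistent with roughly linear scaling. Since the reported times are rounded, an efficiency slightly above 1.0 should not be read as super-linear scaling.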
The Rearrangement Challenge was first announced in 2020 as an incentive to "align and accelerate research in Embodied AI." The latest version of the Challenge includes an updated dataset which has "a more uniform balance of easy/hard episodes." Competitors evaluate their trained agent models on the dataset and submit their metrics to the competition leaderboard. AI2 has also used its AllenAct learning framework to train several baseline models for the Challenge; these baselines are included with the Challenge code.
Besides the AI2-THOR Rearrangement Challenge, the CVPR Embodied AI workshop features 12 other challenges, three of which are also based on the AI2-THOR framework. There are four challenges based on Meta AI's Habitat framework and two each for NVIDIA's Isaac Sim and Stanford University's iGibson. The workshop also includes the MIT ThreeDWorld Transport Challenge, which InfoQ covered last year. The workshop's organizers include researchers from a wide range of universities and commercial enterprises, including AI2, Meta, Google, Intel, NVIDIA, Stanford University, and Georgia Tech.
In a thread on Twitter, AI2 Research Manager Roozbeh Mottaghi noted:
Surprisingly, the leading method is a simple model that uses the CLIP encoder. It outperforms other methods that use maps, depth images, etc. Looking forward to more innovations.
The AI2-THOR code and the Rearrangement Challenge code, along with several pre-trained baseline models, are available on GitHub.