In a recent paper, researchers at Microsoft's Autonomous Systems and Robotics Group showed how OpenAI's ChatGPT can be used for robotics applications, including how to design prompts and how to direct ChatGPT to use specific robotics libraries to program the task at hand.
As Microsoft's engineers explain, current robotics relies on a tight feedback loop between the robot and an engineer who is responsible for coding the task, observing the robot's behavior, and correcting it by writing additional code.
In Microsoft's vision, ChatGPT could translate a human-language description of the task into code for the robot. This would make it possible to replace the engineer (in the loop) with a non-technical user (on the loop) who is responsible only for providing the original task description in human language, observing the robot, and giving feedback about its behavior, again in human language. ChatGPT would then turn that feedback into code to improve the behavior.
Using their experimental approach, Microsoft researchers created a number of case studies which include zero-shot task planning to instruct a drone to inspect the content of a shelf; manipulating objects through a robotic arm; searching for a specific object in an environment using object detection and object distance APIs; and others.
In all those cases, Microsoft says, ChatGPT was able to generate the code to control the robot, as well as to ask for clarifications when it found the user input ambiguous, in order to better carry out the task.
Microsoft's work to make ChatGPT usable for robotics applications focused on three main areas of investigation: how to design prompts to guide ChatGPT, how to use existing APIs and create new high-level APIs, and how to provide human feedback through text. These three areas represent the keystones of a methodology for using ChatGPT for robotic tasks.
In the first step, the user defines a set of high-level APIs or function libraries that ChatGPT should use.
This library can be specific to a particular robot, and should map to existing low-level implementations from the robot's control stack or a perception library. It's very important to use descriptive names for the high-level APIs so ChatGPT can reason about their behaviors.
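Such a library might look like the following sketch for a drone. All function names here are illustrative assumptions, not the paper's actual API: the point is that each descriptively named high-level function wraps an (assumed) low-level call from the robot's control or perception stack, and the stubs below let the sketch run without hardware.

```python
# Hypothetical high-level API library for a drone, as ChatGPT would see it.
# Descriptive names let the model reason about each function's behavior.

def fly_to(x: float, y: float, z: float) -> None:
    """Fly the drone to the given (x, y, z) coordinates."""
    _low_level_goto(x, y, z)  # assumed low-level flight-controller call

def get_position() -> tuple:
    """Return the drone's current (x, y, z) position."""
    return _state["pos"]

def detect_object(name: str) -> bool:
    """Return True if the named object is visible to the onboard camera."""
    return name in _state["visible"]

# Minimal in-memory stand-ins so the sketch runs without a real robot.
_state = {"pos": (0.0, 0.0, 0.0), "visible": {"shelf", "box"}}

def _low_level_goto(x, y, z):
    _state["pos"] = (x, y, z)
```

In a real deployment, only the function signatures and docstrings would be shown to ChatGPT; the bodies would dispatch to the robot's actual control stack.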
In the second step, the user provides a description of the task goal specified in terms of the available APIs or functions.
The prompt can also contain information about task constraints, or how ChatGPT should form its answers (specific coding language, using auxiliary parsing elements).
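An illustrative prompt following that recipe might look as follows. This text is an assumption for demonstration, not a prompt taken verbatim from the paper: it lists the available functions, states the task goal in terms of them, and constrains the form of the answer.

```python
# An illustrative prompt (hypothetical, not from the paper) combining the
# API description, the task goal, and constraints on the answer format.
PROMPT = """\
You control a drone through the following Python functions:
  fly_to(x, y, z)      -- fly to the given coordinates
  get_position()       -- return the current (x, y, z) position
  detect_object(name)  -- True if the named object is visible

Task: inspect the top shelf and report whether a box is present.
Constraints:
  - Answer with Python code only, inside a single code block.
  - Use only the functions listed above.
"""
```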
Finally, the user evaluates ChatGPT's code, by using a simulator or inspecting the code, and provides feedback for ChatGPT to correct its code.
When the outcome is satisfactory to the user, a robot can be programmed using the generated code.
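The overall evaluate-and-correct loop can be sketched as below. Both `ask_llm` and the canned responses are hypothetical stand-ins (a real system would call a chat-model API and run the code in a simulator); the sketch only shows how each round of human feedback is appended to the conversation so the model can revise its own code.

```python
# Sketch of the human-on-the-loop workflow: natural-language feedback is
# appended to the dialogue and the model returns revised code.

def ask_llm(conversation):
    # Stub standing in for a chat-model API call; returns canned code.
    if "higher" in conversation[-1]:
        return "fly_to(0, 0, 1.5)"
    return "fly_to(0, 0, 1.0)"

def refine(task, feedback_rounds):
    conversation = [task]
    code = ask_llm(conversation)
    for feedback in feedback_rounds:
        conversation += [code, feedback]  # keep the full dialogue as context
        code = ask_llm(conversation)      # model corrects its own code
    return code  # deployed to the robot once the user is satisfied

final_code = refine("Hover near the shelf.", ["Go a bit higher."])
```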
Microsoft is also launching a collaborative open-source platform where users can share prompting strategies for different robot categories; at the moment it includes all the prompts and conversations the Microsoft team used in their research. They also plan to add robotics simulators and interfaces for testing ChatGPT-generated algorithms.