Wayve, a company focused on deep-learning technology for autonomous driving, has released MILE, a state-of-the-art end-to-end model that learns both a world model and a vehicle driving policy from simulation data collected in CARLA, enabling autonomy without HD maps.
We interact with the world through observation and interaction, accumulating knowledge that lets us deal with unpredictable situations. We call this awareness of how the world works "common sense", and it allows us to navigate through the world. Observing others also lets us learn and follow rules. A similar concept in machine learning is imitation learning, in which a model learns to mimic human behavior on a given task.
Wayve’s new Model-based Imitation LEarning (MILE) model, an architecture inspired by model-based reinforcement learning, jointly learns a model of the world and a driving policy during offline training.
MILE can imagine and visualize diverse and plausible futures and use this ability to plan its future actions.
In autonomous driving, reasoning about dynamic agents and the static environment happens in 3D geometry, so MILE lifts the car’s captured images into 3D: each image feature is assigned a depth probability distribution over a predefined grid of depth bins, using the camera intrinsics and extrinsics. The resulting 3D feature voxels are then collapsed to a bird’s-eye view by sum-pooling over a predefined grid. The final step maps this representation to a 1D vector that compresses the information about the world state. Together, these steps define the encoder.
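The lift-and-pool step above can be sketched as follows. This is an illustrative toy, not MILE's actual implementation: the shapes, the pinhole-style geometry, and the uniform depth bins are all assumptions made for demonstration.

```python
import numpy as np

def lift_to_bev(features, depth_logits, depth_bins, bev_size=(8, 8), cell=1.0):
    """Lift (H, W, C) image features to a bird's-eye-view grid.

    depth_logits: (H, W, D) unnormalized scores over D predefined depth bins.
    """
    H, W, C = features.shape
    D = depth_logits.shape[-1]
    # Depth probability distribution per image feature (softmax over bins).
    p = np.exp(depth_logits - depth_logits.max(-1, keepdims=True))
    p /= p.sum(-1, keepdims=True)
    # Outer product spreads each feature along its depth bins -> 3D voxels.
    voxels = features[:, :, None, :] * p[:, :, :, None]      # (H, W, D, C)
    # Toy geometry: image column u maps to lateral offset, depth bin to range.
    bev = np.zeros((*bev_size, C))
    for u in range(W):
        for d in range(D):
            x = int(depth_bins[d] // cell)
            y = int((u - W / 2) * depth_bins[d] / W // cell + bev_size[1] / 2)
            if 0 <= x < bev_size[0] and 0 <= y < bev_size[1]:
                # Sum-pool: accumulate voxels that fall in the same BEV cell.
                bev[x, y] += voxels[:, u, d, :].sum(0)
    return bev
```

In the real model this projection uses the calibrated camera intrinsics and extrinsics rather than the toy column-to-offset mapping shown here.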
The next part involves a decoder very similar to the one in the StyleGAN architecture: an upsampling method that, starting from the encoder's output, progressively produces bird's-eye-view and image outputs at different resolutions. In addition, the decoder also outputs the vehicle control.
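A minimal sketch of the StyleGAN-style idea: start from a small constant grid and repeatedly upsample, injecting the compressed 1D latent at every resolution. The layer sizes and random "weights" below are placeholder assumptions; MILE's decoder heads (BEV segmentation, images, control) are more elaborate.

```python
import numpy as np

rng = np.random.default_rng(0)

def upsample2x(x):
    # Nearest-neighbour upsampling of an (H, W, C) grid.
    return x.repeat(2, axis=0).repeat(2, axis=1)

def decode(latent, n_stages=3, channels=4):
    x = np.ones((2, 2, channels))                 # learned constant input
    for _ in range(n_stages):
        x = upsample2x(x)
        # "Style" injection: modulate channels with an affine map of the latent.
        w = rng.standard_normal((latent.size, channels))
        b = rng.standard_normal(channels)
        style = latent @ w + b
        x = np.maximum(x * style, 0.0)            # modulate + ReLU
    return x                                      # e.g. a BEV feature map

bev_features = decode(np.ones(16))                # 2x2 -> 16x16 after 3 stages
```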
For temporal modeling, MILE uses a recurrent neural network that models the latent state dynamics, predicting the next latent state from the previous one.
The model can imagine future latent states based on past context and use them to plan and predict actions using the learned driving policy. Future states can also be visualized and interpreted through the decoders.
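The two ideas above, recurrent latent dynamics and planning in imagination, can be sketched together. All weights here are random placeholders, and names like `dynamics`, `policy`, and `imagine` are illustrative assumptions rather than MILE's API.

```python
import numpy as np

rng = np.random.default_rng(1)
DIM, ACT = 8, 2                                  # latent size, action size
Wz, Wh = rng.standard_normal((2, DIM, DIM)) * 0.1
Wp = rng.standard_normal((DIM, ACT)) * 0.1

def dynamics(h):
    # GRU-like gated update: blend the previous latent with a candidate state.
    z = 1 / (1 + np.exp(-(h @ Wz)))              # update gate
    cand = np.tanh(h @ Wh)                       # candidate latent
    return (1 - z) * h + z * cand

def policy(h):
    # Map a latent state to a bounded control action.
    return np.tanh(h @ Wp)

def imagine(h0, horizon):
    # Roll the world model forward with no new observations,
    # collecting the action the policy would take at each imagined state.
    h, traj = h0, []
    for _ in range(horizon):
        h = dynamics(h)
        traj.append(policy(h))
    return np.stack(traj)

actions = imagine(rng.standard_normal(DIM), horizon=5)
```

In MILE the imagined latent states can additionally be passed through the decoder to visualize the predicted futures.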
Source: Model-Based Imitation Learning for Urban Driving
The training dataset for the MILE project consists of 2.9 million frames, or 32 hours of driving data, collected from the CARLA simulator under varied weather and daylight conditions.
For measuring driving performance on CARLA, Wayve used three metrics: route completion, infraction penalty, and driving score. Route completion is the percentage of the route completed by the driving agent in a given scenario. Infraction penalty is a multiplicative penalty accrued from the agent's infractions (collisions with pedestrians, vehicles, or static objects, running red lights, etc.). Driving score combines the two, measuring both how far the agent gets along the route and how well it drives.
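The driving score is simply the route completion discounted multiplicatively by the infraction penalty. The penalty coefficients below follow the CARLA leaderboard convention as I understand it (e.g. 0.50 per pedestrian collision, 0.70 per red light); treat the exact values as assumptions.

```python
def infraction_penalty(counts, coeffs):
    """Multiply one coefficient per infraction occurrence."""
    penalty = 1.0
    for infraction, n in counts.items():
        penalty *= coeffs[infraction] ** n
    return penalty

def driving_score(route_completion, counts, coeffs):
    # route_completion in [0, 1]; the score is also in [0, 1].
    return route_completion * infraction_penalty(counts, coeffs)

# Assumed leaderboard-style coefficients (lower = harsher penalty).
coeffs = {"pedestrian": 0.50, "vehicle": 0.60, "static": 0.65, "red_light": 0.70}

# 90% route completion, one vehicle collision, two red lights:
score = driving_score(0.9, {"vehicle": 1, "red_light": 2}, coeffs)
# 0.9 * 0.60 * 0.70**2 ≈ 0.2646
```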
Source: Model-Based Imitation Learning for Urban Driving
MILE achieves better generalization and a higher driving score than other frameworks such as LAV, Roach, and TransFuser.
Source: Model-Based Imitation Learning for Urban Driving
MILE's ability to imagine plausible futures and plan actions accordingly allows it to control the vehicle entirely in imagination. This means the model can successfully drive without access to the most recent observations of the world.
To download the model weights and check out the PyTorch implementation, go here.
One limitation of the framework is its hand-crafted reward function; inferring the reward from expert driving data instead would allow the agent to be trained by navigating its own world model. A second potential issue is the heavy reliance on bird's-eye-view image segmentation for predicting future states. A third area for improvement is the model's generalization to different scenarios.
People are talking about Wayve's feat on Twitter: