Google's Ground Truth team recently announced a new deep learning model that automatically extracts information from geo-located image files to improve Google Maps. The neural network model achieved improved accuracy on the challenging French Street Name Signs (FSNS) dataset. Julian Ibarz (Google Brain Team) and Sujoy Banerjee (Ground Truth Team) wrote on the Google Research Blog about this TensorFlow model, which is used for solving real-world image text extraction problems.
Google Maps is used for directions, real-time traffic information, and information on businesses. To provide a better experience to its more than one billion users, however, that information has to reflect the changing world. Street View cars have collected 80 billion images to date, and it is impossible to manually analyze such a large image data set to find new or updated information for Google Maps. One of the team's goals is therefore to automatically extract structured information from the geo-located images.
The new deep neural network model, now publicly available for use by developers, achieved a higher accuracy (84.2%) in reading street names out of Street View images from the French Street Name Signs (FSNS) dataset. The model can be extended to extract other types of information from Street View images, such as business names from store fronts.
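To make the accuracy figure concrete, here is a minimal Python sketch of how such a score could be computed, assuming accuracy means the share of images whose full street name is read exactly right. The article does not spell out the metric, so that definition, the function name `full_sequence_accuracy`, and the sample street names are all illustrative assumptions, not part of the released model.

```python
def full_sequence_accuracy(predictions, references):
    """Return the fraction of predictions that match their reference exactly.

    Assumption: a prediction counts as correct only if the whole street
    name matches, character for character.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


# Made-up sample data: the last prediction is missing an accent,
# so it does not count as a correct read.
predicted = ["Rue de la Paix", "Avenue Victor Hugo", "Rue des Lilas", "Place du Marche"]
expected = ["Rue de la Paix", "Avenue Victor Hugo", "Rue des Lilas", "Place du Marché"]

print(f"{full_sequence_accuracy(predicted, expected):.1%}")  # prints 75.0%
```

Under this strict definition a single wrong character makes the whole sign count as a miss, which is one reason scores on natural-scene text tend to be lower than per-character accuracy would suggest.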
Text recognition in natural environments like cities, roads and businesses is a challenging computer vision (CV) and machine learning problem. Factors like distortion, occlusions, directional blur, cluttered backgrounds or different viewpoints make the extraction of text from natural scenes more challenging. The Google team used a neural network-based model back in 2008 to blur faces and license plates in Street View images to protect the privacy of their users. Building on this research, they have been able to use machine learning to automatically improve Google Maps with relevant, up-to-date information.
The deep learning model also automatically labels new Street View imagery, normalizes the text to be consistent with the naming conventions, and ignores extraneous text that is not relevant to the analysis. This allows the team to create new addresses directly from images without knowing the name of the street or the location of the addresses beforehand. For example, when a Street View car drives on a newly built road, the model can analyze the captured images, extract the street names and numbers, and properly create and locate the new addresses automatically on Google Maps.
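As an illustration of what normalizing text to a naming convention can mean, the following Python sketch expands a leading street-type abbreviation and collapses stray whitespace. It is a hand-written, rule-based stand-in for explanation only: the article does not describe how the model performs normalization, and the `ABBREVIATIONS` table and `normalize_street_name` function are hypothetical names introduced here.

```python
# Hypothetical abbreviation table for illustration; a real system would
# need a much larger, locale-specific mapping.
ABBREVIATIONS = {
    "av": "Avenue",
    "bd": "Boulevard",
    "pl": "Place",
    "r": "Rue",
}


def normalize_street_name(raw):
    """Collapse whitespace and expand a known leading abbreviation."""
    tokens = raw.split()  # split() also drops leading, trailing and repeated spaces
    if not tokens:
        return ""
    first = tokens[0].rstrip(".").lower()
    if first in ABBREVIATIONS:
        tokens[0] = ABBREVIATIONS[first]
    return " ".join(tokens)


print(normalize_street_name("Av.  Victor Hugo"))  # prints Avenue Victor Hugo
print(normalize_street_name("Rue de la Paix"))    # prints Rue de la Paix
```

A learned model can pick up conventions like these from labeled examples rather than from a fixed table, which matters when signs vary in ways a rule list would not anticipate.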
To apply these models across the large Street View image datasets, the Ground Truth team uses Tensor Processing Units (TPUs), Google's machine learning chips, to reduce the computational cost of inference in the pipeline.