DoorDash has introduced an ML model that predicts a store's operational status, improving the user experience and preventing thousands of order cancellations. Knowing whether a merchant is open and able to receive and fulfill orders is crucial for the DoorDash platform. Each marketplace on DoorDash operates independently, so accurate working-hours information matters: it lets DoorDash tell customers when an order can't be delivered and keeps further orders from being placed at a closed marketplace.
When a Dasher arrives at a closed marketplace, they can report it through the 'Dasher Report Store Close' (DRSC) feature in the Dasher app. The Dasher submits a photo of the closed marketplace, is compensated for their lost time, and can move on to another delivery. When a DRSC report comes in, DoorDash contacts the merchant to confirm its status and update it on the platform. This process is slow, unscalable, and inefficient. For these reasons, DoorDash developed an ML model that infers the merchant's status from a set of features, including the time of the last delivery and an analysis of the photo uploaded by the Dasher.
The first approach to modeling a store's operational status was to estimate the conditional probability that the store is closed, given that a DRSC report has been filed:
Probability(Store is Closed | DRSC report)
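As a rough illustration of this quantity, the sketch below estimates it empirically: because every record is conditioned on a report having been filed, the share of historical reports whose store turned out to be closed estimates P(store is closed | DRSC report). The `ReportOutcome` record, the function name, and the toy data are hypothetical. A single rate like this applies to every report alike, which is why a per-report model that conditions on more features is needed.

```python
from dataclasses import dataclass


@dataclass
class ReportOutcome:
    """One historical DRSC report and its resolved label (hypothetical schema)."""
    store_id: str
    store_was_closed: bool


def estimate_p_closed_given_report(reports: list[ReportOutcome]) -> float:
    """Empirical P(store is closed | DRSC report).

    Every record already has a report filed, so the fraction of records
    whose store was closed is the conditional probability estimate.
    """
    if not reports:
        raise ValueError("need at least one report")
    closed = sum(1 for r in reports if r.store_was_closed)
    return closed / len(reports)


# Toy data, not real DoorDash figures.
history = [
    ReportOutcome("s1", True),
    ReportOutcome("s1", True),
    ReportOutcome("s2", False),
    ReportOutcome("s3", True),
]
print(estimate_p_closed_given_report(history))  # 0.75
```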
The target variable (store status) was constructed from the outcomes of past DRSC reports: for example, if a Dasher completed a pickup at the store within 15 minutes of a report, the store was probably open despite the report. In addition, the new ML model uses the image uploaded by the Dasher. An image classifier, trained internally on DoorDash's image processing platform, analyzes each uploaded photo for signs that the store is closed (for example, whether the lights are off). The classifier compresses the image information into a single number that serves as a feature in the new ML model, which lets the DoorDash platform process and use hundreds of thousands of images quickly.
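A minimal sketch of the labeling rule and a feature row, in Python. It assumes the 15-minute window is symmetric (a pickup shortly before or after the report counts), that the image classifier's single-number output is a score in [0, 1], and it uses "minutes since last delivery" as the historical feature; `label_report`, `build_features`, and all field names are hypothetical. The label may look at pickups after the report because it is built from historical outcomes, but the features should only use information available at report time.

```python
from datetime import datetime, timedelta

# Window from the example in the text; treating it as symmetric is an assumption.
PICKUP_WINDOW = timedelta(minutes=15)


def label_report(report_time: datetime, pickup_times: list[datetime]) -> int:
    """Training label for one historical DRSC report: 1 = closed, 0 = open.

    A completed pickup at the same store close to the report time suggests
    the store was actually open despite the report.
    """
    if any(abs(t - report_time) <= PICKUP_WINDOW for t in pickup_times):
        return 0
    return 1


def build_features(report_time: datetime, last_delivery_time: datetime,
                   image_closed_score: float) -> dict[str, float]:
    """Hypothetical feature row: one historical signal plus the image score.

    image_closed_score stands in for the classifier's single-number summary
    of the Dasher's photo, assumed here to lie in [0, 1].
    """
    if not 0.0 <= image_closed_score <= 1.0:
        raise ValueError("image score assumed to lie in [0, 1]")
    minutes = (report_time - last_delivery_time).total_seconds() / 60.0
    return {
        "minutes_since_last_delivery": minutes,
        "image_closed_score": image_closed_score,
    }


report = datetime(2023, 5, 1, 20, 0)
print(label_report(report, [datetime(2023, 5, 1, 20, 10)]))  # 0: pickup 10 min later
print(label_report(report, [datetime(2023, 5, 1, 18, 0)]))   # 1: no pickup near the report
print(build_features(report, datetime(2023, 5, 1, 18, 0), 0.92))
```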
A LightGBM model combines the historical data and the image feature to estimate the final probability that the store is closed. The action taken is determined by thresholds that split this probability into three ranges: a low probability leads DoorDash to unassign the order and find a new Dasher to deliver it; an intermediate probability leads DoorDash to cancel the order; and a high probability leads the platform to both cancel the order and pause the store.
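The threshold logic can be sketched as a small function. The two cut points (0.3 and 0.8) separating the three ranges are hypothetical, as are the `Action` names; in practice `p_closed` would be the model's predicted probability, for example the positive-class column of LightGBM's `LGBMClassifier.predict_proba`.

```python
from enum import Enum


class Action(Enum):
    REASSIGN = "unassign the order and find a new Dasher"
    CANCEL_ORDER = "cancel the order"
    CANCEL_AND_PAUSE = "cancel the order and pause the store"


# Hypothetical cut points; the real values are not given in the text.
LOW_CUTOFF = 0.3
HIGH_CUTOFF = 0.8


def choose_action(p_closed: float,
                  low: float = LOW_CUTOFF,
                  high: float = HIGH_CUTOFF) -> Action:
    """Map the predicted probability that the store is closed to an action."""
    if not 0.0 <= p_closed <= 1.0:
        raise ValueError("p_closed must be a probability")
    if not low < high:
        raise ValueError("low cutoff must be below high cutoff")
    if p_closed < low:
        return Action.REASSIGN          # store probably open: try another Dasher
    if p_closed < high:
        return Action.CANCEL_ORDER      # uncertain: cancel this order only
    return Action.CANCEL_AND_PAUSE      # store very likely closed


for p in (0.1, 0.5, 0.9):
    print(p, choose_action(p).name)
```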
Figure: DRSC ML model inputs and output action set
After deploying the model to production and running A/B tests, DoorDash confirmed that the user experience improved and that thousands of deliveries are saved from cancellation every week. The next steps are to make the decision thresholds dynamic by incorporating additional features such as time of day, future store volume, and future cancellation volume. Another planned improvement is a loss function that captures the cost of each decision about the store. With the loss function and the probability model together, DoorDash will be able to compute the expected loss of each action and choose the actions that minimize it.
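A minimal sketch of expected-loss minimization under a made-up cost table: each action has one cost if the store is actually closed and another if it is open, so the expected loss of an action is p * cost_closed + (1 - p) * cost_open, where p is the predicted probability that the store is closed. The `COSTS` values and function names are hypothetical, chosen only so that each of the three actions wins in some probability range.

```python
# Hypothetical costs in arbitrary units:
# COSTS[action] = (cost if the store is closed, cost if the store is open)
COSTS: dict[str, tuple[float, float]] = {
    "reassign": (5.0, 1.0),          # wasted second trip if the store is closed
    "cancel_order": (1.0, 4.0),      # lost order if the store was open
    "cancel_and_pause": (0.0, 10.0), # lost store revenue if it was open
}


def expected_loss(action: str, p_closed: float,
                  costs: dict[str, tuple[float, float]] = COSTS) -> float:
    """Expected cost of an action given P(store is closed)."""
    cost_closed, cost_open = costs[action]
    return p_closed * cost_closed + (1.0 - p_closed) * cost_open


def best_action(p_closed: float,
                costs: dict[str, tuple[float, float]] = COSTS) -> str:
    """Pick the action with the lowest expected loss (ties go to the first key)."""
    if not 0.0 <= p_closed <= 1.0:
        raise ValueError("p_closed must be a probability")
    return min(costs, key=lambda a: expected_loss(a, p_closed, costs))


for p in (0.1, 0.6, 0.95):
    print(p, best_action(p))
```

With a cost table like this, the thresholds are no longer set by hand: they fall where the expected losses of two actions cross, here at p = 3/7 (about 0.43) between reassigning and canceling, and at p = 6/7 (about 0.86) between canceling and pausing. Making the costs depend on context such as time of day would, in turn, make the implied thresholds dynamic.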