Key Takeaways
- Deep learning models can be very complex and understanding their internals can be challenging
- There are several methods of providing interpretability in machine learning
- To ensure the reliability of these automated systems, interpretability tools can be used to provide insight into model decision making
- Model-agnostic interpretability tools work across models and can be used to share model results with others, providing transparency and empowering users to have confidence in their models
- There is no single "right" way to interpret a deep learning model
Deep learning models can be very complex and understanding their internals can be challenging. Users will trust these models more if they understand how the models are arriving at their decisions. Interpretability is a growing field of research aimed at understanding how deep learning models work. In this article, I discuss the current state of interpretive methods in deep learning. I will also discuss the advantages and limitations of different approaches and show how interpretability can be used to improve the reliability of deep learning models.
In the future, machine learning could be used to automate mundane tasks that humans perform, including answering customer service inquiries or processing data. To ensure the reliability of these automated systems, interpretability tools can be used to provide insight into model decision making. Model-agnostic interpretability tools work across models and can be used to share model results with others, providing transparency and empowering users to have confidence in their models.
Interpretability Approaches
ML interpretability refers to a user's ability to explain the decisions made by an ML system. This includes understanding the relationships between the input, the model, and the output. Interpretability increases confidence in the model, reduces bias, and ensures that the model is compliant and ethical. There are several methods of providing interpretability in machine learning, including:
- Feature Interpretability: This method involves visualizing the features that the model has learned to understand what it is learning. Feature interpretability can help identify the most important features in a deep learning model, but the resulting visualizations may not be reliable if they do not faithfully reflect the underlying model.
- Activation Maximization: This method involves maximizing the activation of a certain neuron or layer to understand what the model is learning. This method can be useful for visualizing the contribution of network components in some cases, but the results may not always be understandable to humans depending on the use case.
- Saliency Maps: This method involves creating a heatmap over the input to understand which parts of the input are most important to the model’s output (see the sketch after this list). One disadvantage of saliency maps may be that they are not always able to provide information about the relationships between different features.
- Model Distillation: This method distills a complex model down into a simpler one (e.g., decision trees) while trying to maintain the accuracy of the original model. Because the smaller model may be thought of as a proxy model, it may not have the exact same underlying structure as the original model.
- Local Interpretable Model-Agnostic Explanations: The aim of Local Interpretable Model-Agnostic Explanations (LIME) is to explain complex models by creating an interpretable model that is faithful to the original. To do this, LIME creates an explainable model by fitting a linear model to the training data near the prediction of interest. The linear model is then used to explain the prediction of the original model. LIME may only provide explanations for individual instances, not for the overall behavior of the model.
- Shapley Value: The Shapley value can be used to determine the importance of each input variable, and which of the input variables have the most influence on the model’s output. This can be useful in selecting which input variables to include in the model and in tuning the model’s parameters to optimize its performance. Shapley values may require a lot of computing time, which means that often approximate solutions are determined in real-world problems.
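As a brief illustration of one of these techniques, the following is a minimal gradient-based saliency-map sketch in TensorFlow; the choice of MobileNetV2 and the random stand-in input are illustrative only:
import numpy as np
import tensorflow as tf

# Any differentiable image classifier works the same way; MobileNetV2 is just a small example.
net = tf.keras.applications.MobileNetV2(weights='imagenet')
x = tf.convert_to_tensor(np.random.rand(1, 224, 224, 3).astype('float32'))  # stand-in for a real image

with tf.GradientTape() as tape:
    tape.watch(x)
    preds = net(x)
    score = tf.reduce_max(preds[0])  # score of the top predicted class

grads = tape.gradient(score, x)                      # d(score) / d(pixel)
saliency = tf.reduce_max(tf.abs(grads), axis=-1)[0]  # per-pixel importance heatmap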
To better understand how deep learning models make their decisions, we will next explore some of these methods in additional detail. Shapley values and LIME are two powerful tools for deep learning interpretability. We will walk through these examples because a visual representation may make the concepts easier to understand for us and our future audiences, and it may also help you see potential applications in your own practice.
Use Case 1: LIME
Local Interpretable Model-agnostic Explanations (LIME), first published in 2016, uses local linear approximations to help understand how black-box models function internally. The algorithm works by learning a sparse linear model around a given prediction. This gives users insight into how the model works and why certain predictions were made. LIME can also be used to detect and correct model biases and to identify areas where a model is performing poorly. By helping you understand the behavior of your deep learning models, LIME can help you improve their accuracy and robustness.
Utilizing an intelligible proxy model is the main principle of LIME. For instance, a text classifier might use word embeddings, while the interpretable representation might be a binary vector that denotes whether a word is present or not. The algorithm involves choosing an explanation model from a set of potentially interpretable models such that the model is as simple as possible while remaining faithful to the original model. The best solution is found by sampling perturbations of the instance being explained and minimizing a locality-aware loss in a model-agnostic way.
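As a concrete illustration of this idea, the sketch below wraps LIME's text explainer around a tiny scikit-learn classifier; the training data, class names, and parameter values are placeholders chosen only to keep the example self-contained:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from lime.lime_text import LimeTextExplainer

# Toy training data; any black-box text classifier exposing predict_proba would work the same way.
texts = ['great food', 'awful service', 'really great food', 'slow and awful']
labels = [1, 0, 1, 0]
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)

# Explain one prediction with a sparse linear model fit to local word-level perturbations.
explainer = LimeTextExplainer(class_names=['negative', 'positive'])
explanation = explainer.explain_instance('great food but slow service',
                                         clf.predict_proba, num_features=4)
print(explanation.as_list())  # [(word, weight), ...] from the local linear surrogate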
Let’s look at an example of LIME as applied to Inception with an image of a cat and a dog. Using the following example image:
Figure 1. Image of a cat and dog
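The snippets that follow are based on LIME's image tutorial and assume some setup that is not shown here: an Inception model, a transform_img_fn helper that loads and preprocesses images, a predict_fn helper that returns class probabilities, and a names mapping from class index to label. A rough sketch of what that setup might look like with a Keras InceptionV3 model (the details are illustrative):
import matplotlib.pyplot as plt
import numpy as np
from tensorflow.keras.applications import inception_v3
from tensorflow.keras.preprocessing import image

inet_model = inception_v3.InceptionV3(weights='imagenet')

def transform_img_fn(paths):
    # Load images at Inception's 299x299 input size and scale pixels to [-1, 1].
    imgs = [image.img_to_array(image.load_img(p, target_size=(299, 299))) for p in paths]
    return inception_v3.preprocess_input(np.stack(imgs))

def predict_fn(imgs):
    # Return class probabilities for a batch of preprocessed images.
    return inet_model.predict(imgs)

# `names` (mapping class index -> human-readable label) can be built from the ImageNet class list; it is omitted here.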
We can retrieve the top 5 predictions for the images:
images = transform_img_fn(['dogs.jpg'])
# I'm dividing by 2 and adding 0.5 because of how this Inception represents images
plt.imshow(images[0] / 2 + 0.5)
preds = predict_fn(images)
for x in preds.argsort()[0][-5:]:
    print(x, names[x], preds[0, x])  # class index, class name, predicted probability
286 Egyptian cat 0.000892741
242 EntleBucher 0.0163564
239 Greater Swiss Mountain dog 0.0171362
241 Appenzeller 0.0393639
240 Bernese mountain dog 0.829222
It may be interesting to look at the features (in this case, pixels) that contributed to the prediction of the Bernese mountain dog; here, the face of the dog contributes heavily to that prediction.
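The explanation object used in the snippets below is produced by LIME's image explainer; a minimal sketch, assuming the images and predict_fn defined earlier (the number of perturbation samples is illustrative):
from lime import lime_image

# Fit local surrogate models to perturbed (superpixel-masked) versions of the image.
explainer = lime_image.LimeImageExplainer()
explanation = explainer.explain_instance(images[0], predict_fn, top_labels=5,
                                         hide_color=0, num_samples=1000)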
from skimage.segmentation import mark_boundaries
temp, mask = explanation.get_image_and_mask(240, positive_only=True, num_features=5, hide_rest=True)
plt.imshow(mark_boundaries(temp / 2 + 0.5, mask))
Figure 2. Pixels contributing to dog prediction
It is also possible to see features or pixels which contributed negatively:
temp, mask = explanation.get_image_and_mask(240, positive_only=False, num_features=10, hide_rest=False)
plt.imshow(mark_boundaries(temp / 2 + 0.5, mask))
Figure 3. Positive (green) and negative (red) contributions for dog
As we can see, the face of the cat contributes negatively to the prediction of the Bernese mountain dog. Let us now examine which features contribute positively to the prediction of the Egyptian cat.
temp, mask = explanation.get_image_and_mask(286, positive_only=True, num_features=5, hide_rest=True)
plt.imshow(mark_boundaries(temp / 2 + 0.5, mask))
Figure 4. Pixels contributing to cat prediction
Additionally, we can see that the face of the Bernese mountain dog negatively contributes to the prediction of the Egyptian cat.
temp, mask = explanation.get_image_and_mask(286, positive_only=False, num_features=10, hide_rest=False)
plt.imshow(mark_boundaries(temp / 2 + 0.5, mask))
Figure 5. Positive (green) and negative (red) contributions for cat
As noted earlier, LIME has limitations and drawbacks just like other approaches. The "black box" nature of the underlying model makes it difficult to explain certain behaviors using interpretable representations. The decision to use sparse linear models also means that the explanations may not accurately reflect the predictions of highly non-linear models, even in the local neighborhood of an instance. One way to address this is to use a faithfulness estimate to select a suitable interpretable model class from a set of options tailored to the specific problem context.
Use Case 2: Shapley Values
Shapley values are a model-independent approach for describing the importance of features in a model. They explain the impact of individual features on the predictions of the model as a whole. Deep learning models often contain multiple layers of neurons, and this approach may not effectively describe the complex interactions between those neurons. Another trade-off is that computing precise Shapley values requires a large number of samples, which may be very difficult to achieve for some deep learning models.
By measuring the importance of each feature in a trained model, Shapley values provide a way to explain the predictions of nonlinear models in machine learning. To calculate a Shapley value for a feature, predictions are made with and without that feature (in practice, with the feature replaced by a baseline value) across many different combinations of the remaining features. The Shapley value for the feature is then computed as the weighted average difference between the predictions made with and without it. Shapley values may be used to provide insight into feature contributions and the impact of each feature on model performance.
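To make this computation concrete, here is a brute-force sketch for a tiny model with three features; the "model" and baseline are purely illustrative, and libraries such as SHAP rely on approximations rather than this exhaustive enumeration:
import itertools
import math
import numpy as np

def shapley_values(predict, x, baseline):
    # Exact Shapley values by enumerating all coalitions of the other features;
    # a "removed" feature is replaced by its baseline value.
    n = len(x)
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for subset in itertools.combinations(others, size):
                weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
                with_i, without_i = baseline.copy(), baseline.copy()
                for j in subset:
                    with_i[j] = x[j]
                    without_i[j] = x[j]
                with_i[i] = x[i]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi

# Illustrative linear "model": with a zero baseline, the Shapley values reduce to
# weight * feature value, i.e. [2., 6., -3.] here.
predict = lambda v: 2 * v[0] + 3 * v[1] - v[2]
print(shapley_values(predict, np.array([1.0, 2.0, 3.0]), np.zeros(3)))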
Let’s look at an example of Shapley values on convolutional neural networks. We can use the SHAP library to generate the Shapley values:
import numpy as np
import shap

# select background samples for SHAP
background = x_train[np.random.choice(x_train.shape[0], 1000, replace=False)]

# DeepExplainer to explain predictions of the model
explainer = shap.DeepExplainer(model, background)

# compute shap values
shap_values = explainer.shap_values(x_test_each_class)

# visualize SHAP values
plot_actual_predicted(images_dict, predicted_class)
print()
shap.image_plot(shap_values, x_test_each_class * 255)
The visualization of the Shapley values on the underlying images may then look similar to
Figure 6. Visualization of the Shapley values
where red pixels represent higher Shapley values, which contribute positively towards the classification of an image into a class, and blue pixels represent lower Shapley values, which contribute negatively towards that classification. This type of visualization allows us to see how groups of pixels contribute towards individual predictions, and it lets us look for bias in the model and adjust our training process to account for any potential issues.
As mentioned before, Shapley values have limitations like other approaches. They may require a lot of computing time, which means that approximate solutions are often used for real-world problems. Like other permutation-based interpretation methods, Shapley values ignore feature dependence, which can sometimes produce unrealistic or misleading explanations. Because of this, Shapley values may be misinterpreted, and it is important to make sure that biases are not hidden.
Conclusion
In the future, machine learning could be used to automate mundane tasks that humans perform, including answering customer service inquiries or processing data. To ensure the reliability of these automated systems, interpretability tools can be used to provide insight into model decision making. Model-agnostic interpretability tools work across models and can be used to share model results with others, providing transparency and empowering users to have confidence in their models.
LIME (Local Interpretable Model-Agnostic Explanations) and Shapley values (named after mathematician Lloyd Shapley) are two ways to interpret the results of deep learning models, and both provide information about how a model arrived at its predictions by analyzing the contribution of each input feature. The main difference lies in how they attribute those contributions: LIME fits a simple interpretable surrogate model around a single prediction, while Shapley values average each feature's marginal contribution over many combinations of features. Both are model-agnostic and can be used with any type of model, but when applied to complex or non-linear models, Shapley values may have increased computational cost and complexity in interpretation.
As we have seen, interpretability is a growing area of research that seeks to understand how deep learning models work. Different methods of interpretability have different advantages and limitations. Interpretability can be used to improve the reliability of deep learning models. Some interpretability assessment methods are more expensive or time consuming than others. There is no single "right" way to interpret a deep learning model.