Challenge
Deep convolutional neural networks (CNNs) achieve very high performance on various computer vision tasks. However, the decision-making process of such networks is often seen as a black box. An interesting question in the context of image classification, for example, is which pixels trigger the final classification decision.
Solution
By computing the gradient of the linear part of the network and combining it with the activation information of the convolutional layers, we can quite accurately localise the pixels within an input image that are responsible for the final classification result. The image on the right illustrates a few examples. Each row in the image presents one input example and two heatmaps describing pixel importance with respect to different categories.
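The idea of combining gradients with convolutional activations can be sketched in a few lines. The example below is a minimal illustration in the style of class activation mapping, not necessarily the exact method used here. It assumes a network whose last convolutional layer is followed by global average pooling and a single linear classifier. The function name `class_heatmap`, the toy feature maps, and the weights are all hypothetical.

```python
import numpy as np


def class_heatmap(feature_maps, linear_weights, class_index):
    """Heatmap of spatial importance for one class (illustrative sketch).

    feature_maps:   (K, H, W) activations of the last convolutional layer.
    linear_weights: (C, K) weights of a linear classifier applied after
                    global average pooling (an assumed architecture).
    """
    k, h, w = feature_maps.shape
    # Gradient of the class score with respect to every activation.
    # For score_c = sum_k W[c, k] * mean(A_k) it is constant per channel.
    grads = np.broadcast_to(
        linear_weights[class_index][:, None, None] / (h * w), (k, h, w)
    )
    # Channel importance: the gradient averaged over spatial positions.
    alpha = grads.mean(axis=(1, 2))
    # Weight each activation map by its importance, sum over channels,
    # and keep only positive evidence for the class.
    cam = np.maximum(np.tensordot(alpha, feature_maps, axes=1), 0.0)
    # Normalise to [0, 1] unless the map is entirely zero.
    peak = cam.max()
    return cam / peak if peak > 0 else cam


# Toy input: channel 0 fires top-left, channel 1 fires bottom-right.
feature_maps = np.zeros((2, 3, 3))
feature_maps[0, 0, 0] = 4.0
feature_maps[1, 2, 2] = 2.0
# Hypothetical classifier: class 0 relies on channel 0, class 1 on channel 1.
linear_weights = np.array([[1.0, -0.5], [-0.5, 1.0]])

heatmap_a = class_heatmap(feature_maps, linear_weights, 0)
heatmap_b = class_heatmap(feature_maps, linear_weights, 1)
```

In this toy case the two heatmaps peak at different locations for the two categories, mirroring the two heatmaps per row described above. In practice the coarse map would still need to be upsampled to the input resolution before being overlaid on the image.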
Impact
The ability to analyse the importance of image parts with respect to a classification result gives us insight into which visual features of an object matter. It also lets us validate that the network relies on plausible information for its decisions, reducing the risk of overfitting. Additionally, this technique can be used to gain insight in cases where the important visual features are not obvious to a human observer (e.g. in tissue classification).