Today’s state-of-the-art visual navigation agents typically consist of large deep learning models trained end-to-end. Such models offer little to no interpretability about the skills they learn or the actions they take in response to their environment. While past works have explored interpreting deep learning models, little attention has been devoted to interpreting embodied AI systems, which often involve reasoning about the structure of the environment, target characteristics, and the outcome of one’s actions. In this paper, we introduce the Interpretability System for Embodied agEnts (iSEE) for Point Goal and Object Goal navigation agents. We use iSEE to probe the dynamic representations produced by these agents for the presence of information about the agent as well as the environment. Using iSEE, we uncover interesting insights about navigation agents: they encode reachable locations (to avoid obstacles), the visibility of the target, and progress from the initial spawn location, and their behavior changes dramatically when we mask out critical individual neurons.
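As a hypothetical illustration of the neuron-masking experiment, a critical recurrent neuron can be zeroed out in the hidden state at every step of a rollout. The sketch below assumes a GRU-based agent; the GRU size, neuron index, and dummy observation embedding are assumptions made for this sketch, not the paper's exact agent.

import torch

# Hypothetical illustration of ablating one recurrent neuron during a rollout.
rnn = torch.nn.GRU(input_size=512, hidden_size=512)
critical_neuron = 42  # e.g. a neuron ranked highly by the SHAP analysis

with torch.no_grad():
    hidden = torch.zeros(1, 1, 512)                 # initial hidden state
    for step in range(10):                          # a short dummy rollout
        obs_embedding = torch.randn(1, 1, 512)      # stand-in for the visual encoder output
        output, hidden = rnn(obs_embedding, hidden)
        hidden[..., critical_neuron] = 0.0          # mask the neuron before the next step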
Which of the above concepts are encoded in the agent's dynamic representation?
Baseline model for solving the ObjectNav/PointNav tasks. We investigate which concepts are encoded in the agent's dynamic representation (the RNN hidden state).
First, we train a gradient-boosted tree to predict each concept from the RNN hidden state, and then apply the explainability method SHAP to find which RNN neurons were most relevant for the prediction.
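A minimal sketch of this probing pipeline is shown below, assuming the RNN hidden states and the corresponding concept labels have already been collected into arrays. The file names, the choice of XGBoost, and its hyperparameters are illustrative assumptions rather than the authors' exact setup.

import numpy as np
import shap
from xgboost import XGBRegressor
from sklearn.model_selection import train_test_split

# hidden_states: (num_steps, rnn_dim) RNN activations gathered during episodes
# concept_labels: (num_steps,) value of one concept, e.g. distance to the target
hidden_states = np.load("hidden_states.npy")
concept_labels = np.load("concept_labels.npy")

X_train, X_test, y_train, y_test = train_test_split(
    hidden_states, concept_labels, test_size=0.2, random_state=0
)

# 1) Train a gradient-boosted tree to predict the concept from the RNN neurons.
model = XGBRegressor(n_estimators=200, max_depth=4)
model.fit(X_train, y_train)
print("probe R^2:", model.score(X_test, y_test))

# 2) Apply SHAP (TreeExplainer) to rank neurons by their contribution
#    to the concept prediction.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)           # (num_samples, rnn_dim)
neuron_importance = np.abs(shap_values).mean(axis=0)  # mean |SHAP| per neuron
top_neurons = np.argsort(neuron_importance)[::-1][:10]
print("most relevant RNN neurons:", top_neurons)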
Kshitij Dwivedi, Gemma Roig, Aniruddha Kembhavi, Roozbeh Mottaghi
In CVPR, 2022.
@InProceedings{Dwivedi_2022_CVPR,
    author    = {Dwivedi, Kshitij and Roig, Gemma and Kembhavi, Aniruddha and Mottaghi, Roozbeh},
    title     = {What Do Navigation Agents Learn About Their Environment?},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2022},
    pages     = {10276-10285}
}