Project Details

Interpretable Neural Networks for Dense Image and Video Analysis (XIVA)

Subject Area Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Term since 2024
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 529680848
 
Recent developments in deep learning have led to significant advances in many areas of computer vision. However, this progress is mainly measured by the test accuracy on a specific dataset, which tells us little about how a model will behave in rare corner cases or when it is deployed in scenarios that deviate slightly from the training data. Understanding the behavior and decision process of an artificial neural network, and deriving conclusions about its robustness and generalization capabilities, is the goal of explainable artificial intelligence (XAI). The project XIVA (eXplainable Image and Video Analysis) addresses this in the area of image and video analysis by developing interpretable explanation methods for spatial and spatio-temporal vision tasks, such as image/video segmentation and motion estimation, and by using them to improve the models themselves and their robustness.

The general goal of XAI is to obtain explanations that are interpretable by a human while corresponding to the actual internal behavior of a model, often referred to as being faithful. This interpretability can be obtained in two ways: 1) by providing post-hoc explanations, i.e., by analyzing an existing model, either globally or locally, after it has been trained, or 2) by designing inherently interpretable models that provide faithful explanations by construction. The XIVA project will contribute to advancing research in XAI for dense prediction tasks, going beyond most existing XAI methods, which mainly focus on classification. We will address this by investigating and developing explanation methods specifically for spatial and spatio-temporal tasks: (i) by analyzing and measuring the holistic predictive performance of models with novel human-interpretable metrics to derive insights about their global strengths and weaknesses, (ii) by developing local attribution methods that can handle and visualize the spatial and spatio-temporal decision processes leading to a specific output, (iii) by realizing inherently interpretable models for dense prediction tasks that are intrinsically better suited to provide explanations and increase robustness, and (iv) by evaluating our contributions with suitable, novel datasets and benchmarks targeted at explainability and robustness.

Extending XAI to dense vision tasks is a necessary step toward a better understanding of widely used models for image and video analysis and toward improving their robustness. This is especially important for legal or safety-critical applications, such as in the medical domain or in autonomous driving. A better understanding of how and why a deep neural network arrives at a particular prediction will help us build better models in general, as we can detect decisions caused by bias in the data, e.g., when spurious features drive the decision process, and take appropriate action.
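As a rough illustration of what a post-hoc, local attribution for a dense prediction task can look like, the sketch below computes an occlusion-based sensitivity map for a single output pixel of a segmentation network. It is a minimal example assuming a PyTorch model that maps a (1, C, H, W) image to (1, K, H, W) per-pixel logits; the function name and all parameters are illustrative assumptions and not part of the project description.

    import torch

    def occlusion_attribution(model, image, target_pixel, patch=16, stride=16, baseline=0.0):
        # Hypothetical occlusion-sensitivity map for one output pixel of a dense
        # prediction model (e.g., semantic segmentation). `target_pixel` is
        # (class_index, row, col); names and defaults are illustrative only.
        model.eval()
        k, r, c = target_pixel
        with torch.no_grad():
            ref = model(image)[0, k, r, c]            # unperturbed target logit
        _, _, H, W = image.shape
        attribution = torch.zeros(H, W)
        counts = torch.zeros(H, W)
        for y in range(0, H - patch + 1, stride):
            for x in range(0, W - patch + 1, stride):
                occluded = image.clone()
                occluded[:, :, y:y + patch, x:x + patch] = baseline
                with torch.no_grad():
                    score = model(occluded)[0, k, r, c]
                # Attribute the drop in the target logit to the occluded region.
                attribution[y:y + patch, x:x + patch] += (ref - score).item()
                counts[y:y + patch, x:x + patch] += 1
        return attribution / counts.clamp(min=1)

Regions with large values in the returned map contribute strongly to the prediction at the chosen pixel; the same idea extends to spatio-temporal inputs by occluding small video volumes instead of image patches.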
DFG Programme Independent Junior Research Groups
 
 
