Project Details
Robust Computer Vision through Neural Analysis-by-Synthesis with 3D-aware Compositional Network Architectures
Applicant
Adam Kortylewski, Ph.D.
Subject Area
Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Term
since 2022
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 468670075
One of the most important problems in computer vision is that current deep learning approaches perform well in familiar scenarios but fail to give reliable predictions in unseen or adverse viewing conditions. They are unreliable when objects are partially occluded, seen from a previously unseen pose, or captured in bad weather. This lack of robustness must be overcome to make computer vision a reliable component of science and our everyday lives.

The goal of this project is to develop deep neural networks for computer vision that are highly robust in real-world scenarios. To achieve this goal, we will research a neural analysis-by-synthesis approach to computer vision that combines the discriminative performance of deep learning with the robustness of generative vision models. This will lead to advanced neural network architectures with the following properties:

(1) 3D-aware representations: The vast majority of today's computer vision approaches process digital images in 2D only. One goal of this project is to augment DNNs with knowledge about the three-dimensional structure of our world, enabling them to recognize objects from previously unseen 3D viewpoints and poses and to exploit the 3D structure of visual scenes.

(2) Compositional representations: Visual scenes in digital images are naturally composed of a hierarchy of entities (e.g., objects, parts, sub-parts) that interact with each other in the 3D world. One goal of this project is to enhance the robustness of DNNs by developing network architectures that exploit this hierarchical compositional structure. In particular, compositional representations will enable deep networks to remain reliable when individual components of the representation change, e.g., due to occlusion or changes in the viewing conditions.

(3) Multi-task consistency through generative image understanding: It is becoming increasingly evident that the current paradigm of solving a single vision task in isolation is very limited, because individual tasks are often highly ambiguous and become even more difficult in adverse viewing conditions. Building on our developments of 3D-aware (1) and compositional (2) representations, we will further enable deep networks to integrate several visual recognition tasks in a joint reasoning process. Specifically, we will integrate the compositional 3D representations into a generative image model that analyzes images by synthesizing their individual components at the level of neural feature activations, which are invariant to irrelevant object details. This will enable DNNs to become robust by combining ambiguous perceptions of individual components into a consistent image interpretation.
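To make the analysis-by-synthesis idea in (3) concrete, the sketch below shows one possible way to score how well a compositional part model, together with a generic occluder hypothesis, explains the feature activations of a CNN backbone. It is a minimal illustration under assumed names (composite_score, part_templates, occluder_logit) and a randomly initialized part dictionary, not the project's actual architecture; in practice the part templates would be learned and embedded in a 3D-aware generative model.

# Minimal sketch: explaining CNN feature activations with a compositional part
# model plus a generic occluder hypothesis. All names and values here are
# illustrative assumptions, not the project's implementation.
import torch
import torch.nn.functional as F
import torchvision.models as models

# Frozen CNN backbone; the compositional model explains its feature
# activations rather than raw pixels.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone = torch.nn.Sequential(*list(backbone.children())[:-2]).eval()

num_parts, feat_dim = 64, 512
part_templates = F.normalize(torch.randn(num_parts, feat_dim), dim=1)  # learned in practice
occluder_logit = torch.tensor(0.2)  # constant evidence for "this location is occluded"

def composite_score(image):
    """Score how well parts + occluder hypothesis explain the feature map."""
    with torch.no_grad():
        feats = backbone(image)              # (1, 512, H, W) feature activations
    feats = F.normalize(feats, dim=1)        # unit-length feature vector per location
    c, h, w = feats.shape[1:]
    flat = feats[0].reshape(c, h * w)        # (C, H*W)
    part_sim = part_templates @ flat         # cosine similarity of each part at each location
    best_part = part_sim.max(dim=0).values   # best-matching part per location
    # Each location is explained either by a part or by the generic occluder model,
    # so occluded regions do not drag down the object evidence.
    explained = torch.maximum(best_part, occluder_logit)
    return explained.mean()                  # image-level evidence

img = torch.randn(1, 3, 224, 224)            # stand-in for a preprocessed input image
print(float(composite_score(img)))

A higher score indicates that the part model accounts for the observed feature activations even when some locations are occluded, since those locations fall back to the occluder hypothesis instead of penalizing the object evidence; this is the core mechanism by which feature-level analysis-by-synthesis can stay robust to partial occlusion.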
DFG Programme
Independent Junior Research Groups
Major Instrumentation
High-performance GPU servers
Instrumentation Group
7030 Dedicated, Decentralized Computing Systems, Process Computers