Project Details

Weakly Supervised Learning for Depth Estimation in Monocular Images

Subject Area Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Term from 2019 to 2022
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 420493178
 
Final Report Year 2024

Final Report Abstract

In this project, we developed novel approaches for depth estimation in monocular images. We studied two types of machine learning methods that require only “weak” supervision, namely learning to rank and superset learning. Furthermore, we investigated computational models for the construction of monocular depth features, motivated by human visual perception.

In terms of learning-to-rank approaches, we proposed treating depth estimation as a listwise ranking problem, leveraging the well-known Plackett-Luce probability distribution on rankings. We proposed a neural network architecture that learns the parameters of this distribution over depth rankings, and we realized an efficient and cost-effective way of training listwise depth ranking models. Additionally, we showed that the model allows for the estimation of shift-invariant metric depth information even though only ranking data is provided at training time.

To construct (interpretable) features for monocular depth estimation, we modeled and implemented four monocular criteria (linear perspective, occlusion, relative height, usual size) that are relevant for both indoor and outdoor images. We analyzed to what extent these features are implicitly learned by a data-driven deep neural network. In follow-up work, we investigated whether ranking errors of a state-of-the-art model can be detected and corrected using the hand-crafted features. We proposed using cross-attention in a transformer decoder to learn the spatial relation between two image patches by exploiting patch-wise image context. Our experiments showed that the model can predict and correct a subset of the errors made by a state-of-the-art approach.

Other project publications leveraged the concept of superset learning. Under the notion of label relaxation, we proposed weakening label information in a general framework, exemplified for probabilistic classification. Here, superset labels are composed of multiple candidate probability distributions, forming credal sets, whose mathematical properties are exploited to obtain a learning methodology that is efficient and robust against distortions from inaccurate data. In another project publication, we transferred this idea to the field of monocular depth estimation. Instead of taking sensor signals as exact measurements, we relaxed each label to a (fuzzy) superset around the originally observed depth value. Together with generalized empirical risk minimization, this leads to more robust and better generalizing depth regression models. Furthermore, we extended the scope of label re-modeling to the paradigm of semi-supervised learning. The richer form of supervision by (fuzzy) supersets was leveraged in a credal self-supervised learning approach. Instead of using single (precise) probability distributions as pseudo-labels, the self-learner constructs credal sets. Combined with generalized empirical risk minimization, this method leads to a more cautious yet robust learning behavior. Moreover, we applied a similar idea to the problem of semi-supervised monocular depth estimation.
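
To illustrate the listwise ranking formulation mentioned in the abstract, the following is a minimal sketch of the Plackett-Luce log-likelihood for a single ranking. It is not the project's implementation. The function names, the use of exponentiated scores as Plackett-Luce parameters, and the ordering convention of the ranking are assumptions made for illustration only. In a setting like the one described, the scores might be produced by a neural network for a set of sampled pixels.

```python
import math


def plackett_luce_log_prob(scores, ranking):
    """Log-probability of `ranking` under a Plackett-Luce model.

    scores:  one real-valued score per item (hypothetical model outputs);
             the Plackett-Luce parameters are assumed to be exp(score).
    ranking: item indices ordered from first-ranked to last-ranked
             (the direction, e.g. far to near, is an assumption).
    """
    ordered = [scores[i] for i in ranking]
    log_prob = 0.0
    for k in range(len(ordered)):
        # Items still available at position k are those ranked k or later.
        tail = ordered[k:]
        # Numerically stable log-sum-exp over the remaining items.
        m = max(tail)
        log_norm = m + math.log(sum(math.exp(s - m) for s in tail))
        # Probability of picking the k-th item among the remaining ones.
        log_prob += ordered[k] - log_norm
    return log_prob


def listwise_ranking_loss(scores, ranking):
    """Negative log-likelihood of the observed ranking."""
    return -plackett_luce_log_prob(scores, ranking)


if __name__ == "__main__":
    scores = [2.0, 0.5, 1.0]   # hypothetical scores for three pixels
    observed = [0, 2, 1]       # hypothetical ground-truth ordering
    print(listwise_ranking_loss(scores, observed))
```

In this sketch, adding the same constant to all scores leaves the ranking probability unchanged, so ranking data alone can determine the scores only up to a shift.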

