Weakly Supervised Learning for Depth Estimation in Monocular Images

Applicants Professor Dr. Ralph Ewerth; Professor Dr. Eyke Hüllermeier

Subject Area Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing

Term from 2019 to 2022

Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 420493178

Final Report Year 2024

Final Report Abstract

In this project, we developed novel approaches for depth estimation in monocular images. We studied two types of machine learning methods that require only “weak” supervision, namely learning to rank and superset learning. Furthermore, we investigated computational models for the construction of monocular depth features, motivated by human visual perception.

For learning to rank, we proposed treating depth estimation as a listwise ranking problem, leveraging the well-known Plackett-Luce probability distribution on rankings. We proposed a neural network architecture that learns the parameters of the distribution over depth rankings and realized an efficient and cost-effective way of training listwise depth ranking models. Additionally, we showed that the model allows for the estimation of shift-invariant metric depth information even though only ranking data is provided at training time.

To construct (interpretable) features for monocular depth estimation, we modeled and implemented four monocular criteria (linear perspective, occlusion, relative height, usual size) that are relevant for both indoor and outdoor images. We analyzed to what extent these features are implicitly learned by a data-driven deep neural network. In follow-up work, we investigated whether ranking errors of a state-of-the-art model can be detected and corrected using the hand-crafted features. We proposed using cross-attention in a transformer decoder to learn spatial relations between two image patches by exploiting patch-wise image context. Our experiments showed that the model can predict and correct a subset of the errors made by a state-of-the-art approach.

Other project publications leveraged the concept of superset learning. Under the notion of label relaxation, we proposed weakening label information within a general framework, exemplified for probabilistic classification. Here, superset labels are composed of multiple candidate probability distributions, forming credal sets, whose mathematical properties are exploited to obtain a learning methodology that is efficient and robust to distortions from inaccurate data. In another project publication, we transferred this idea to the field of monocular depth estimation. Instead of taking sensor signals as exact measurements, we followed the idea of label relaxation and modeled (fuzzy) supersets around the originally observed depth value. Together with generalized empirical risk minimization, this model leads to more robust and better generalizing depth regression models.

Furthermore, we extended the scope of label re-modeling to the paradigm of semi-supervised learning. The richer form of supervision by (fuzzy) supersets was leveraged in a credal self-supervised learning approach. Instead of using single (precise) probability distributions as pseudo-labels, credal sets are constructed by a self-learner. Combined with generalized empirical risk minimization, this method leads to a more cautious yet robust learning behavior. Moreover, we applied a similar idea to the problem of semi-supervised monocular depth estimation.
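To make the listwise ranking idea mentioned in the abstract more concrete, the following is a minimal sketch of the negative log-likelihood of a single ranking under the standard Plackett-Luce distribution. It is an illustration only and not the project's implementation: the function name `plackett_luce_nll`, the interpretation of `scores` as per-pixel network outputs, and the convention that rankings are ordered from first to last are all assumptions made for this example.

```python
import math


def plackett_luce_nll(scores, ranking):
    """Negative log-likelihood of `ranking` under a Plackett-Luce model.

    scores  -- one real-valued score per item (assumed here to be the
               log-parameters of the distribution, e.g. hypothetical
               network outputs for sampled pixels)
    ranking -- item indices ordered from first to last position
    """
    ordered = [scores[i] for i in ranking]
    nll = 0.0
    for k in range(len(ordered)):
        # Items that have not yet been placed at step k.
        tail = ordered[k:]
        # Numerically stable log-sum-exp over the remaining items.
        m = max(tail)
        log_norm = m + math.log(sum(math.exp(s - m) for s in tail))
        # -log( exp(s_k) / sum_{j >= k} exp(s_j) )
        nll += log_norm - ordered[k]
    return nll


if __name__ == "__main__":
    scores = [2.0, 1.0, 0.0]
    # A ranking that agrees with the score order is more likely
    # (lower loss) than the reversed ranking.
    print(plackett_luce_nll(scores, [0, 1, 2]))
    print(plackett_luce_nll(scores, [2, 1, 0]))
```

As a general property of this distribution, adding the same constant to all scores leaves the likelihood unchanged, so a model trained with such a loss can determine the scores only up to a shift.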

Publications

Credal self-supervised learning. In Proceedings of NeurIPS, Advances in Neural Information Processing Systems (Vol. 34, pp. 14370-14382)
Lienen, J. & Hüllermeier, E.
From Label Smoothing to Label Relaxation. Proceedings of the AAAI Conference on Artificial Intelligence, 35(10), 8583-8591.
Lienen, Julian & Hüllermeier, Eyke
Instance weighting through data imprecisiation. International Journal of Approximate Reasoning, 134, 1-14.
Lienen, Julian & Hüllermeier, Eyke
Monocular Depth Estimation via Listwise Ranking using the Plackett-Luce Model. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 14590-14599. IEEE.
Lienen, Julian; Hüllermeier, Eyke; Ewerth, Ralph & Nommensen, Nils
Robust Regression for Monocular Depth Estimation. In Asian Conference on Machine Learning (pp. 1001-1016)
Lienen, J., Nommensen, N., Ewerth, R. & Hüllermeier, E.
Scikit-Weak: A Python Library for Weakly Supervised Machine Learning. Lecture Notes in Computer Science, 57-70. Springer Nature Switzerland.
Campagner, Andrea; Lienen, Julian; Hüllermeier, Eyke & Ciucci, Davide
Analyzing Results of Depth Estimation Models with Monocular Criteria. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 3739-3743. IEEE.
Theiner, Jonas; Nommensen, Nils; Rhotert, Jim; Springstein, Matthias; Müller-Budack, Eric & Ewerth, Ralph
Conformal Credal Self-Supervised Learning. Proc. COPA, Conformal and Probabilistic Prediction with Applications. Limassol, Cyprus. PMLR 204:1-20, 2023
Lienen, J.; Demir, C. & Hüllermeier, E.
Memorization-Dilation: Modeling Neural Collapse Under Noise. Proc. ICLR, 11th International Conference on Learning Representations. Kigali, Rwanda.
Nguyen, D.A., Levie, R., Lienen, J., Hüllermeier, E. & Kutyniok, G.
Mitigating Label Noise through Data Ambiguation. Proceedings of the AAAI Conference on Artificial Intelligence, 38(12), 13799-13807.
Lienen, Julian & Hüllermeier, Eyke
