Project Details

Unified Uncertainty Estimation for Fine-tuned Open-Vocabulary Models in Image Classification and Object Detection

Subject Area Methods in Artificial Intelligence and Machine Learning
Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Term since 2025
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 563660702
 
The proposed project aims at developing deep-learning models that provide built-in scores for the estimation of both aleatoric and epistemic uncertainty. While the former refers to data-inherent uncertainty, e.g. at class boundaries, the latter refers to uncertainty due to a lack of information, e.g. when facing an outlier data point. Unified uncertainty estimation is important to pave the way for deep learning in safety-critical applications such as automated driving, medical imaging, robotics, and others.

As a foundation, we consider recent open-vocabulary models for image classification and object detection, such as CLIP and Grounding DINO. Despite the popularity of open-vocabulary models, uncertainty estimation methods for them are still scarce. Trained on enormous amounts of data, open-vocabulary models provide outstanding generalization capabilities. When they are used for a given downstream task, this broad knowledge helps to estimate data-related epistemic uncertainty and thus to detect semantically unknown objects. By equipping open-vocabulary models with one-vs.-all classifier heads, we give them the ability to naturally reject all known classes. This capability is further supported by incorporating synthetically generated out-of-class data into the learning process; for this purpose, we utilize recent Stable Diffusion-based models. On the other hand, in contrast to open-vocabulary models like CLIP, a one-vs.-all classifier head can provide useful class-probability estimates, which we use for the estimation of aleatoric uncertainty, e.g. at class boundaries. This approach extends a method of ours from ordinary deep classifiers to open-vocabulary models. With recent inpainting methodology, the proposed approach can be lifted to object detection. Last but not least, we propose to distil the learned knowledge of the object detectors into light-weight ones.
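The rejection mechanism of a one-vs.-all head can be illustrated with a minimal sketch: each known class gets an independent sigmoid score, and an input is rejected when no class score clears a threshold. This is not the project's actual implementation; the weight matrix, bias, and threshold are placeholders, and the feature vector stands in for an embedding from a backbone such as CLIP.

```python
import numpy as np

def one_vs_all_scores(features, W, b):
    # One independent sigmoid per class (one-vs.-all), not a softmax:
    # scores need not sum to one, so ALL of them may be low at once.
    logits = features @ W.T + b
    return 1.0 / (1.0 + np.exp(-logits))

def predict_or_reject(scores, threshold=0.5):
    # If no known class clears the threshold, reject the input
    # ("none of the known classes") instead of forcing a prediction.
    if scores.max() < threshold:
        return None
    return int(scores.argmax())

# Illustrative usage with placeholder weights for 2 classes, 3 features.
scores = one_vs_all_scores(np.ones(3), np.zeros((2, 3)), np.zeros(2))
```

A plain softmax head cannot express this kind of rejection, since its probabilities always sum to one; the independent per-class scores are what make "reject everything" a natural output.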
As resource efficiency is another important factor in enabling the practical use of deep learning, this naturally complements the aforementioned safety aspects.
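One common way to turn class-probability estimates into an aleatoric-uncertainty score, sketched here only for illustration, is the Shannon entropy of the predictive distribution: it is high for ambiguous inputs near class boundaries and low for confident ones. The project's concrete estimator may differ.

```python
import numpy as np

def aleatoric_entropy(probs, eps=1e-12):
    # Shannon entropy of the predictive class distribution.
    # High entropy: probability mass spread over classes (ambiguous input,
    # e.g. near a class boundary). Low entropy: confident prediction.
    p = np.clip(probs, eps, 1.0)
    p = p / p.sum()
    return float(-(p * np.log(p)).sum())
```

For two classes, a boundary case such as probabilities (0.5, 0.5) yields the maximum entropy log 2, while a confident (1, 0) yields approximately zero.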
DFG Programme Research Grants
 
 
