Project Details

Visual Fine-grained Recognition

Applicant Professor Dr.-Ing. Joachim Denzler, since 1/2017
Subject Area Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Term from 2015 to 2020
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 275610656
 
Final Report Year 2019

Final Report Abstract

During the funding period of the project, we tackled three major work packages. The first is unsupervised part constellation discovery, together with representations for the discovered parts. In fine-grained recognition especially, successful classification depends on finding and exploiting subtle differences at specific locations, and these locations differ from class to class. Although this can be learned in a supervised way, having experts annotate the locations is time-consuming and expensive. To resolve this problem, we developed an unsupervised part constellation model, which first generates a large set of part proposals. It then identifies relevant parts by checking for consistent constellations of their detections, which are constrained by their relative positions. Furthermore, we developed an attention-based pooling technique, which we later generalized, to learn and fine-tune part and object representations. Calculating local feature descriptions and locating attention improved the performance of low- and medium-complexity models on few-shot fine-grained recognition tasks, where only a very limited number of samples per class is available. Although we find that training the whole pipeline of the first method (the part constellation model) can be problematic, the latter (attention-based pooling) can be integrated into an end-to-end learnable framework.

In the second main work package, we introduced methods for exemplar-specific model estimation. We point out two ways to influence the region locality of predictions. On the one hand, the straightforward approach of fine-tuning a network for a specific domain already increases the region locality significantly. On the other hand, the presented α-pooling approach is a direct way to manipulate the aggregation of the local features and therefore also the locality of predictions. Importantly, this aggregation can also be learned from data alone.
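As a rough illustration of how a single parameter can control the aggregation of local features, the following sketch assumes one possible form of α-pooling: a signed power of each local descriptor is multiplied with the descriptor itself and averaged over all locations. The formula, the function name `alpha_pool`, and the toy data are assumptions made for illustration; the abstract itself does not specify them. Under this assumed form, α = 1 reduces to average pooling for positive features and α = 2 to bilinear (second-order) pooling.

```python
import numpy as np


def alpha_pool(features, alpha):
    """Pool n local descriptors of dimension d into one vector of length d*d.

    Assumed (hypothetical) form: mean over locations of
    (sign(y) * |y|**(alpha - 1)) y^T, flattened.
    """
    features = np.asarray(features, dtype=float)
    n = features.shape[0]
    # Element-wise signed power of every local descriptor.
    powered = np.sign(features) * np.abs(features) ** (alpha - 1.0)
    # Average of the outer products powered_i y_i^T over all n locations.
    pooled = powered.T @ features / n
    return pooled.reshape(-1)


# Toy example: 5 locations, 3 feature channels, strictly positive values
# (as they would be after a ReLU followed by a small offset).
rng = np.random.default_rng(0)
local_features = rng.random((5, 3)) + 0.1

avg_like = alpha_pool(local_features, 1.0)       # behaves like average pooling
bilinear_like = alpha_pool(local_features, 2.0)  # behaves like bilinear pooling
```

In a learnable setting, `alpha` would be treated as a trainable parameter rather than a fixed constant; the sketch keeps it fixed for simplicity.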
The presented visualization method helps to understand the region locality of the decision for a test sample by showing the most influential regions from the training data. Additionally, we investigated how to improve local representations by fine-tuning the CNN model on a subset of the most relevant training images, chosen with new selection schemes. While local learning is beneficial for small architectures, we found that it does not yield improvements for more complex architectures. Furthermore, the locality of representations appears to increase with model complexity. We therefore conclude that, owing to their complexity, these models already focus on very few training examples even without additional fine-tuning.

Lastly, the third work package we investigated during the funding period is domain adaptation for part detectors and part feature representations. Instead of performing domain adaptation directly, we investigated the influence of domain shifts by analyzing noise patterns. To this end, we examined a variety of noise types and found that, at the time of publication, all state-of-the-art models were strongly affected by all noise types that were not present during training. Additionally, we showed that noise sensitivity can be estimated efficiently by computing a first-order approximation of the output change for a given image.
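The idea of a first-order sensitivity estimate can be sketched as follows. This is a minimal illustration under stated assumptions, not the method used in the project: a toy two-layer network with an analytic Jacobian stands in for a CNN, and all names (`model`, `jacobian`, `first_order_change`) and weights are hypothetical. The estimate is the Jacobian of the output with respect to the input, multiplied by the noise pattern.

```python
import numpy as np


def model(x, W1, W2):
    """Toy stand-in for a network: tanh hidden layer, linear output."""
    return W2 @ np.tanh(W1 @ x)


def jacobian(x, W1, W2):
    """Analytic Jacobian of the model output with respect to the input x."""
    h = np.tanh(W1 @ x)
    # Derivative of tanh(z) is 1 - tanh(z)^2, applied per hidden unit.
    return W2 @ (np.diag(1.0 - h ** 2) @ W1)


def first_order_change(x, delta, W1, W2):
    """First-order (linear) estimate of model(x + delta) - model(x)."""
    return jacobian(x, W1, W2) @ delta


# Hypothetical toy setup: 8 input values, 6 hidden units, 3 outputs.
rng = np.random.default_rng(0)
W1 = rng.standard_normal((6, 8))
W2 = rng.standard_normal((3, 6))
x = rng.standard_normal(8)
delta = 1e-5 * rng.standard_normal(8)  # small additive noise pattern

estimated = first_order_change(x, delta, W1, W2)
actual = model(x + delta, W1, W2) - model(x, W1, W2)
```

For small noise, `estimated` and `actual` agree closely, and the estimate needs only one gradient computation per image instead of one forward pass per noise sample. With a real CNN, the Jacobian would typically come from automatic differentiation rather than a hand-written formula.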

