Visual Fine-grained Recognition

Applicant Professor Dr.-Ing. Joachim Denzler, since 1/2017

Subject Area Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing

Term from 2015 to 2020

Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 275610656

Final Report Year 2019

Final Report Abstract

During the funding period, we addressed three major work packages. The first was unsupervised part constellation discovery together with representations for the discovered parts. In fine-grained recognition especially, finding and exploiting subtle differences at specific, class-dependent locations is essential for successful classification. Although these locations can be learned in a supervised way, having experts annotate them is time-consuming and expensive. To resolve this problem, we developed an unsupervised part constellation model that first generates a large set of part proposals and then identifies the relevant parts by checking for consistent constellations of their detections, constrained by their relative positions. Furthermore, we developed an attention-based pooling technique, which we later generalized, to learn and fine-tune part and object representations. Computing local feature descriptions and localizing attention improved the performance of low- and medium-complexity models on few-shot fine-grained recognition tasks, where only a very limited number of samples per class is available. Although training the whole pipeline of the first method can be problematic, the latter can be integrated into an end-to-end learnable framework.

In the second main work package, we introduced methods for exemplar-specific model estimation. We point out two ways to influence the locality. On the one hand, the straightforward approach of fine-tuning a network for a specific domain already increases the region locality significantly. On the other hand, the presented α-pooling approach directly manipulates the aggregation of the local features and therefore also the locality of predictions. Importantly, the aggregation is also learnable from data alone.
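The following minimal sketch illustrates how a single parameter can steer the aggregation of local features. It assumes that α-pooling takes the form of a mean over locations of outer(sign(x) * |x|^(alpha - 1), x); this form is an assumption made for illustration, not a statement of the project's implementation. The function name `alpha_pool` and the toy descriptors are hypothetical, and the sketch only covers alpha >= 1.

```python
import math


def alpha_pool(features, alpha):
    """Pool n local descriptors (each of length d) into a d x d matrix.

    Assumed form: mean over locations of outer(sign(x) * |x|**(alpha - 1), x),
    where the sign of an exact zero is treated as +1. Requires alpha >= 1.
    """
    n = len(features)
    d = len(features[0])
    pooled = [[0.0] * d for _ in range(d)]
    for x in features:
        # Element-wise signed power of the descriptor.
        left = [math.copysign(abs(v) ** (alpha - 1.0), v) for v in x]
        # Accumulate the outer product of the transformed and raw descriptor.
        for i in range(d):
            for j in range(d):
                pooled[i][j] += left[i] * x[j] / n
    return pooled


# Two toy local descriptors (hypothetical values, non-negative as after a ReLU).
descriptors = [[1.0, 2.0], [3.0, 4.0]]
average_like = alpha_pool(descriptors, 1.0)   # every row equals the mean descriptor
bilinear_like = alpha_pool(descriptors, 2.0)  # mean of outer products x x^T
```

Under this assumed form, alpha = 1 on non-negative features reproduces average pooling (replicated across rows) and alpha = 2 reproduces bilinear pooling, so intermediate values interpolate between the two aggregation schemes.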
The presented visualization method helps to understand the region locality of a decision for a test sample by showing the most influential regions from the training data. Additionally, we investigated how to improve local representations by fine-tuning the CNN model on a subset of the most relevant training images, chosen with new selection schemes. While local learning is beneficial for small architectures, we found that it does not yield improvements for more complex architectures. Furthermore, the locality of representations appears to increase with model complexity. We therefore conclude that, owing to their complexity, these models already focus on very few training examples even without additional fine-tuning.

The third work package concerned domain adaptation for part detectors and part feature representations. Instead of performing domain adaptation directly, we investigated the influence of domain shifts by analyzing noise patterns. We examined a variety of noise types and found that, at the time of publication, all state-of-the-art models were strongly affected by every noise type that was not present during training. Additionally, we showed that noise sensitivity can be estimated efficiently by computing a first-order approximation of the output change for a given image.
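The idea of a first-order sensitivity estimate can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's method: `toy_model` is a hypothetical stand-in for a classifier score, the gradient is obtained by central differences (a real CNN would use backpropagation), and the noise is assumed to be i.i.d. Gaussian.

```python
import math


def toy_model(x, w):
    """Hypothetical stand-in for a classifier score: sigmoid of a linear map."""
    s = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-s))


def gradient(f, x, h=1e-5):
    """Central-difference gradient of a scalar function f at point x."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        up[i] += h
        down = list(x)
        down[i] -= h
        grad.append((f(up) - f(down)) / (2.0 * h))
    return grad


def first_order_change(grad, noise):
    """First-order (Taylor) estimate of f(x + noise) - f(x)."""
    return sum(g * e for g, e in zip(grad, noise))


def sensitivity(grad, sigma):
    """Standard deviation of the first-order output change under
    i.i.d. Gaussian noise with standard deviation sigma per input."""
    return sigma * math.sqrt(sum(g * g for g in grad))


# Toy weights, input, and perturbation (all values hypothetical).
w = [0.5, -1.0, 2.0]
x = [0.2, 0.4, 0.1]
noise = [0.01, -0.02, 0.005]


def f(v):
    return toy_model(v, w)


g = gradient(f, x)
predicted = first_order_change(g, noise)
actual = f([a + e for a, e in zip(x, noise)]) - f(x)
```

For small perturbations, `predicted` closely tracks `actual`, and `sensitivity` needs only one gradient per image instead of many noisy forward passes, which is what makes such an estimate efficient.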

Publications

Chimpanzee faces in the wild: Log-Euclidean CNNs for predicting identities and attributes of primates. In German Conference on Pattern Recognition (GCPR), pages 51–63, 2016
Freytag, Alexander; Rodner, Erik; Simon, Marcel; Loos, Alexander; Kühl, Hjalmar S. & Denzler, Joachim
Convolutional neural networks as a computational model for the underlying processes of aesthetics perception. In ECCV Workshop on Computer Vision for Art Analysis, 2016
Denzler, Joachim; Rodner, Erik & Simon, Marcel
Fine-grained recognition in the noisy wild: Sensitivity analysis of convolutional neural networks approaches. In British Machine Vision Conference (BMVC), 2016
Rodner, Erik; Simon, Marcel; Fisher, Bob & Denzler, Joachim
Generalized orderless pooling performs implicit salient matching. In International Conference on Computer Vision (ICCV), 2017
Simon, Marcel; Gao, Yang; Darrell, Trevor; Denzler, Joachim & Rodner, Erik
In defense of active part selection for fine-grained classification. Pattern Recognition and Image Analysis, pages 658–663, 2018
Korsch, D. & Denzler, J.
Classification-specific parts for improving fine-grained visual categorization. In German Conference on Pattern Recognition (GCPR), 2019
Korsch, Dimitri; Bodesheim, Paul & Denzler, Joachim
The whole is more than its parts? From explicit to implicit pose normalization. IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 1–13, 2018
Simon, Marcel; Rodner, Erik; Darrell, Trevor & Denzler, Joachim
