Project Details

Visual Fine-grained Recognition

Applicant: Professor Dr.-Ing. Joachim Denzler, since 1/2017
Subject Area: Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Term: from 2015 to 2020
Project identifier: Deutsche Forschungsgemeinschaft (DFG) - Project number 275610656

Final Report Year: 2019

Final Report Abstract

During the funding period, we addressed three major work packages. The first was unsupervised part constellation discovery and the learning of representations for these parts. In fine-grained recognition in particular, successful classification depends on finding and exploiting subtle differences at specific locations, which vary from class to class. Although these locations can be learned in a supervised way, having experts annotate them is time-consuming and expensive. To resolve this problem, we developed an unsupervised part constellation model that first generates a large set of part proposals and then identifies the relevant parts by checking for consistent constellations of their detections, constrained by their relative positions. Furthermore, we developed an attention-based pooling technique, which we later generalized, to learn and fine-tune part and object representations. Computing local feature descriptions and localizing attention improved the performance of low- and medium-complexity models on few-shot fine-grained recognition tasks, where only a very limited number of samples per class is available. Although we found that training the whole pipeline of our first method can be problematic, the latter method can be integrated into an end-to-end learnable framework.

In the second main work package, we introduced methods for exemplar-specific model estimation. We identified two ways to influence locality. First, the straightforward approach of fine-tuning a network for a specific domain already increases region locality significantly. Second, the presented α-pooling approach directly manipulates the aggregation of the local features and therefore also the locality of predictions. Notably, this aggregation can also be learned from data alone.
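The aggregation idea behind α-pooling can be illustrated with a minimal sketch. The abstract does not state the formula, so the code assumes one formulation: each local feature vector x contributes the outer product of sgn(x)·|x|^(α-1) with x, averaged over all locations. Under this assumption, α = 1 reduces to average pooling for non-negative features and α = 2 to bilinear pooling. The function names `signed_power` and `alpha_pool` are hypothetical, and the formula should be checked against the ICCV 2017 publication listed under Publications.

```python
from typing import List, Sequence


def signed_power(value: float, exponent: float) -> float:
    """Return sign(value) * |value|**exponent, mapping 0 to 0."""
    if value == 0.0:
        return 0.0
    sign = 1.0 if value > 0 else -1.0
    return sign * abs(value) ** exponent


def alpha_pool(features: Sequence[Sequence[float]], alpha: float) -> List[float]:
    """Aggregate local features with the assumed alpha-pooling formula.

    features: one d-dimensional vector per image location.
    alpha:    pooling parameter (assumed: 1 ~ average pooling, 2 ~ bilinear).
    Returns the flattened d x d matrix
        (1/n) * sum_i  signed_power(x_i, alpha - 1) * x_i^T.
    """
    n = len(features)
    d = len(features[0])
    pooled = [[0.0] * d for _ in range(d)]
    for x in features:
        # Element-wise signed power of the local feature (left factor).
        left = [signed_power(v, alpha - 1.0) for v in x]
        for r in range(d):
            for c in range(d):
                pooled[r][c] += left[r] * x[c] / n
    # Flatten row by row into a single descriptor.
    return [pooled[r][c] for r in range(d) for c in range(d)]


if __name__ == "__main__":
    local_features = [[1.0, 2.0], [3.0, 4.0]]  # two locations, d = 2
    print(alpha_pool(local_features, alpha=1.0))
    print(alpha_pool(local_features, alpha=2.0))
```

In this sketch α is a plain float. In a learnable setting it would be a trainable parameter optimized together with the network weights, which is what makes the aggregation learnable from data.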
The presented visualization method helps to understand the region locality of a decision for a test sample by showing the most influential regions from the training data. Additionally, we investigated how to improve local representations by fine-tuning the CNN model on a subset of the most relevant training images, chosen with new selection schemes. While local learning is beneficial for small architectures, we found that it does not yield improvements for more complex architectures. Furthermore, the locality of representations appears to increase with model complexity. We therefore conclude that complex models already focus on very few training examples, even without additional fine-tuning.

Lastly, the third work package concerned domain adaptation for part detectors and part feature representations. Instead of performing domain adaptation directly, we investigated the influence of domain shifts by analyzing noise patterns. We studied a variety of noise types and found that, at the time of publication, all state-of-the-art models were strongly affected by all noise types that were not present during training. Additionally, we showed that noise sensitivity can be estimated efficiently by computing a first-order approximation of the output change for a given image.
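The first-order estimate mentioned above can be sketched generically. For a scalar model output f and a small perturbation ε, a Taylor expansion gives f(x + ε) ≈ f(x) + ∇f(x)·ε. If ε is assumed to be i.i.d. Gaussian noise with standard deviation σ, the output change then has standard deviation of roughly σ·‖∇f(x)‖₂. The Gaussian noise model, the toy scoring function, and all names below are assumptions made for illustration and are not taken from the report. For a real network the gradient would come from backpropagation rather than finite differences.

```python
import math
from typing import Callable, List, Sequence

ScoreFn = Callable[[Sequence[float]], float]


def numerical_gradient(f: ScoreFn, x: Sequence[float], step: float = 1e-5) -> List[float]:
    """Central-difference gradient of a scalar function (toy stand-in for backprop)."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += step
        down[i] -= step
        grad.append((f(up) - f(down)) / (2.0 * step))
    return grad


def first_order_sensitivity(f: ScoreFn, x: Sequence[float], noise_std: float) -> float:
    """Approximate std of f(x + eps) - f(x) for eps ~ N(0, noise_std^2 * I).

    Uses the linearization f(x + eps) ~ f(x) + grad . eps, whose standard
    deviation under this noise model is noise_std * ||grad||_2.
    """
    grad = numerical_gradient(f, x)
    return noise_std * math.sqrt(sum(g * g for g in grad))


def toy_score(x: Sequence[float]) -> float:
    """Hypothetical class score; stands in for one output of a CNN."""
    return math.tanh(0.5 * x[0] - 1.5 * x[1] + 0.25 * x[2])


if __name__ == "__main__":
    image = [0.2, -0.1, 0.7]  # a three-"pixel" toy input
    print(first_order_sensitivity(toy_score, image, noise_std=0.05))
```

The estimate needs only one gradient evaluation per image, which is why it is cheap compared with repeatedly sampling noisy copies of the input. It is reliable only while the noise is small enough for the linearization to hold.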

Publications

  • Chimpanzee faces in the wild: Log-Euclidean CNNs for predicting identities and attributes of primates. In German Conference on Pattern Recognition (GCPR), pages 51–63, 2016
    Alexander Freytag, Erik Rodner, Marcel Simon, Alexander Loos, Hjalmar Kühl, and Joachim Denzler
    (See online at https://doi.org/10.1007/978-3-319-45886-1_5)
  • Convolutional neural networks as a computational model for the underlying processes of aesthetics perception. In ECCV Workshop on Computer Vision for Art Analysis, 2016
    Joachim Denzler, Erik Rodner, and Marcel Simon
    (See online at https://doi.org/10.1007/978-3-319-46604-0_60)
  • Fine-grained recognition in the noisy wild: Sensitivity analysis of convolutional neural networks approaches. In British Machine Vision Conference (BMVC), 2016
    Erik Rodner, Marcel Simon, Bob Fisher, and Joachim Denzler
  • Generalized orderless pooling performs implicit salient matching. In International Conference on Computer Vision (ICCV), 2017
    Marcel Simon, Yang Gao, Trevor Darrell, Joachim Denzler, and Erik Rodner
  • In defense of active part selection for fine-grained classification. Pattern Recognition and Image Analysis, pages 658–663, 2018
    Dimitri Korsch and Joachim Denzler
    (See online at https://doi.org/10.1134/S105466181804020X)
  • The whole is more than its parts? From explicit to implicit pose normalization. IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 1–13, 2018
    Marcel Simon, Erik Rodner, Trevor Darrell, and Joachim Denzler
    (See online at https://doi.org/10.1109/TPAMI.2018.2885764)
  • Classification-specific parts for improving fine-grained visual categorization. In German Conference on Pattern Recognition (GCPR), 2019
    Dimitri Korsch, Paul Bodesheim, and Joachim Denzler
    (See online at https://doi.org/10.1007/978-3-030-33676-9_5)