
Visual fine-grained recognition of objects

Applicant Professor Dr.-Ing. Joachim Denzler, since 1/2017
Subject area Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Funding Funded from 2015 to 2020
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 275610656
 
Year of creation 2019

Summary of project results

During the funding period, we addressed three major work packages. The first was the unsupervised discovery of part constellations, together with representations for these parts. In fine-grained recognition, finding and exploiting subtle differences at specific locations, which differ from class to class, is essential for successful classification. Although these locations can be learned in a supervised way, having experts annotate them is very time-consuming and expensive. To resolve this problem, we developed an unsupervised part constellation model that first generates a large set of part proposals and then identifies the relevant parts by checking for consistent constellations of their detections, constrained by their relative positions. Furthermore, we developed an attention-based pooling technique, which we later generalized, to learn and fine-tune part and object representations. Computing local feature descriptions and localizing attention improved the performance of low- and medium-complexity models on few-shot fine-grained recognition tasks, where only a very limited number of samples per class is available. Although training the whole pipeline of our first method can be problematic, the latter can be integrated into an end-to-end learnable framework.

In the second main work package, we introduced methods for exemplar-specific model estimation. We point out two ways to influence the locality. On the one hand, the straightforward approach of fine-tuning a network for a specific domain already increases the region locality significantly. On the other hand, the presented α-pooling approach directly manipulates how the local features are aggregated and therefore also the locality of predictions. Importantly, this aggregation can also be learned from data alone.
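
As a rough illustration of how a single parameter can steer the aggregation of local features, the following NumPy sketch implements one common formulation of α-pooling, in which each local feature x_i contributes sgn(x_i)|x_i|^(α-1) x_i^T to an averaged matrix. The exact formulation, the function name `alpha_pool`, and the array shapes are assumptions made for illustration and are not taken from this report.

```python
import numpy as np

def alpha_pool(features, alpha):
    """Aggregate local features of shape (n, d) into a (d, d) matrix.

    Assumed formulation (illustrative, not taken from the report):
        A = 1/n * sum_i (sign(x_i) * |x_i|**(alpha - 1)) x_i^T
    """
    x = np.asarray(features, dtype=float)
    n = x.shape[0]
    # Signed power keeps the sign of each entry while rescaling its magnitude.
    left = np.sign(x) * np.abs(x) ** (alpha - 1.0)
    # Sum of outer products over all n local features, then average.
    return left.T @ x / n

rng = np.random.default_rng(0)
# Hypothetical input: a 7x7 grid of 8-dimensional non-negative local features.
local_feats = rng.random((49, 8))

bilinear = alpha_pool(local_feats, alpha=2.0)  # second-order (bilinear) pooling
average = alpha_pool(local_feats, alpha=1.0)   # each row holds the feature mean
```

Under this assumed formulation, α = 2 gives bilinear pooling and α = 1 reduces, for non-negative features, to average pooling. Values in between interpolate between the two, and treating α as a trainable parameter is one way such an aggregation could be learned from data.
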
The presented visualization method helps to understand the region locality of the decision for a test sample by showing the most influential regions from the training data. Additionally, we investigated how to improve local representations by fine-tuning the CNN model on a subset of the most relevant training images, chosen with new selection schemes. While local learning is beneficial for small architectures, we found that it does not yield improvements for more complex architectures. Furthermore, the locality of representations appears to increase with model complexity. We therefore conclude that, owing to their complexity, these models already focus on very few training examples even without additional fine-tuning.

Lastly, the third work package addressed domain adaptation for part detectors and part feature representations. Instead of performing domain adaptation directly, we investigated the influence of domain shifts by analyzing noise patterns. To this end, we examined a variety of noise types and found that, at the time of publication, all state-of-the-art models were strongly affected by every noise type that was not present during training. Additionally, we showed that noise sensitivity can be estimated efficiently by computing a first-order approximation of the output change for a given image.
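
The idea of a first-order sensitivity estimate can be sketched as follows: for a small perturbation ε, the change of a score s(x + ε) - s(x) is approximated by the inner product of the input gradient with ε, which avoids a second forward pass per noise sample. The toy model, its weights, and the function names below are assumptions for illustration only; the report does not specify how the approximation was computed.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(16, 64)) / 8.0   # hypothetical hidden-layer weights
w2 = rng.normal(size=16) / 4.0         # hypothetical output weights

def score(x):
    """Toy stand-in for a class score of a network (not the project's model)."""
    return w2 @ np.tanh(W1 @ x)

def score_gradient(x):
    """Analytic gradient of score() with respect to the input x."""
    h = np.tanh(W1 @ x)
    return W1.T @ (w2 * (1.0 - h ** 2))

x = rng.random(64)                     # flattened toy "image"
noise = 1e-3 * rng.normal(size=64)     # small additive perturbation

# First-order estimate of the output change: gradient times perturbation.
predicted_change = score_gradient(x) @ noise
# Reference value obtained by actually evaluating the perturbed input.
actual_change = score(x + noise) - score(x)
```

For small perturbations the two quantities agree closely. The approximation degrades as the noise grows, so it is best read as a cheap indicator of sensitivity rather than an exact prediction.
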

Project-related publications (selection)

  • Chimpanzee faces in the wild: Log-Euclidean CNNs for predicting identities and attributes of primates. In German Conference on Pattern Recognition (GCPR), pages 51–63, 2016
    Alexander Freytag, Erik Rodner, Marcel Simon, Alexander Loos, Hjalmar Kühl, and Joachim Denzler
    (See online at https://doi.org/10.1007/978-3-319-45886-1_5)
  • Convolutional neural networks as a computational model for the underlying processes of aesthetics perception. In ECCV Workshop on Computer Vision for Art Analysis, 2016
    Joachim Denzler, Erik Rodner, and Marcel Simon
    (See online at https://doi.org/10.1007/978-3-319-46604-0_60)
  • Fine-grained recognition in the noisy wild: Sensitivity analysis of convolutional neural networks approaches. In British Machine Vision Conference (BMVC), 2016
    Erik Rodner, Marcel Simon, Bob Fisher, and Joachim Denzler
  • Generalized orderless pooling performs implicit salient matching. In International Conference on Computer Vision (ICCV), 2017
    Marcel Simon, Yang Gao, Trevor Darrell, Joachim Denzler, and Erik Rodner
  • In defense of active part selection for fine-grained classification. Pattern Recognition and Image Analysis, pages 658–663, 2018
    Dimitri Korsch and Joachim Denzler
    (See online at https://doi.org/10.1134/S105466181804020X)
  • The whole is more than its parts? From explicit to implicit pose normalization. IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 1–13, 2018
    Marcel Simon, Erik Rodner, Trevor Darrell, and Joachim Denzler
    (See online at https://doi.org/10.1109/TPAMI.2018.2885764)
  • Classification-specific parts for improving fine-grained visual categorization. In German Conference on Pattern Recognition (GCPR), 2019
    Dimitri Korsch, Paul Bodesheim, and Joachim Denzler
    (See online at https://doi.org/10.1007/978-3-030-33676-9_5)
 
 
