Non-linear Probabilistic Models for Representational Recognition and Unsupervised Learning in Vision
Final Report Abstract
Humans interpret an image of a visual scene by recognizing individual image components and their relations to each other. For instance, we recognise pedestrians, cars, bikes and traffic signs in a street scene simultaneously, and we can interpret the scene according to the relations among the recognized objects. The ability for such a complex scene recognition requires a complex machinery for inference and extensive knowledge about the visual world. Very large parts of this world knowledge are learned from experiences, i.e., they are learned by exposure to many images during our lifetime. Machine Learning techniques can replicate the process of learning components from data, and they can make use of the learned components for recognition. Component extraction algorithms such as Independent Component Analysis (ICA), Non-negative Matrix Factorization (NMF) or Sparse Coding (SC) are standard and very well-known algorithms of the field, which may be taken as evidence for the general importance of component extraction methods. However, all these standard algorithms fall short in capturing important properties of images: they neither model the non-linear superposition of image components (objects and edges occlude each other) nor do they model object and edge positions explicitly. This project aimed at overcoming these crucial limitations of standard approaches. In order to build novel algorithms better suited for visual data, novel theoretical approaches were derived to model nonlinear superposition assumptions and explicit component positions. Using the framework of probabilistic generative models (which heavily relies on Bayes’ rule) and using novel approximation methods, the development of different novel and non-linear component extraction methods was accomplished. We (A) showed that models with strong non-linearities and position invariance could be used to derive novel algorithms; and (B) we demonstrated that such algorithms could be applied to large-scale problems as they are typical for visual data. The efficiency and effectiveness of the newly developed algorithms were evaluated for different types of visual data including grey-scale and color images, image patches, whitened patches and motion trajectories. In such applications, non-linear models showed significant improvements in component extraction tasks, and they enabled new tasks the were not accomplished before. Unexpected results of the project were the very high flexibility of the developed novel training approaches which allowed efficient training also of advanced non-linear models. Furthermore, our results showed that the training technology intended for non-linear models can significantly improved linear models as well. Unexpected was also a strong task dependency of the performance of non-linear vs. linear models. Linear models can, for instance, perform bettern on image denoising tasks than non-linear models, while non-linear models can extract the actual image components better. An again very positive unexpected result was the obtained large autonomy of the obtained algorithms (i.e., their strongly reduced demand for expert knowledge). As an example, we used this property to show the applicability of an algorithm with explicit position encoding to very strongly corrupted scanned text. The results of this application were very well perceived by the computer vision community with a talk at one of the two major conferences in the field and an award granted to one project scientist at the same conference. Another application of a non-linear model could relate responses of neurons in primary visual cortex to the non-linear and invariant superposition in images. The results showed evidence for human recognition being best modeled by non-linear models. In conclusion, the scientific results obtained in this project have demonstrated the potential of non-linear models as an alternative to linear models for many tasks. The theoretical framework in which novel and mature non-linear algorithms can be developed has been created. Any application of non-linear models has, so far, resulted in an important scientific result, which strongly motivates further research in order to establish non-linear and invariant models as a novel standard Machine Learning tool.
Publications
- (2013). What Are the Invariant Occlusive Components of Image Patches? A Probabilistic Generative Approach. Advances in Neural Information Processing Systems (NIPS) 26, 243–251
Dai, Z., Exarchakis, G., and Lücke, J.
- (2014). A Truncated EM Approach for Spike-and-Slab Sparse Coding. Journal of Machine Learning Research 15: 2653-2687
Sheikh, A. S., Shelton, J. A., and Lücke, J.
- (2014). Autonomous Document Cleaning – A Generative Approach to Reconstruct Strongly Corrupted Scanned Texts. IEEE Transactions on Pattern Analysis and Machine Intelligence 36(10): 1950-1962
Dai, Z., and Lücke, J.
(See online at https://doi.org/10.1109/TPAMI.2014.2313126) - (2015). Nonlinear Spike-and-slab Sparse Coding for Interpretable Image Encoding. PLoS One 10(5): e0124088
Shelton, J. A., Sheikh, A.-S., Bornschein, J., Sterne, P., and Lücke, J.
(See online at https://doi.org/10.1371/journal.pone.0124088)