Systematically evaluating the psychological validity of computer-vision models for vision-based conceptual representations
Final Report Abstract
The aim of the present project was to provide a large-scale psychological evaluation of a quantitative model of mental representations based on visual data. The model in question was a deep convolutional neural network (DCNN) of the kind used in computer vision research to classify photorealistic images into labelled categories. The activation values in the layers of this network for a given input image can be taken as a representation of this image (IMG representations), and the similarity of two such image representations can be computed as the cosine similarity between their vectors of activation values (IMG similarities). The model was further extended with prototype (PRO) representations, which are generated by averaging the activation vectors for a set of images with the same label. In a set of five large-scale experiments with hundreds of participants and thousands of items, we confirmed our hypotheses that the model-derived similarity measures predict human behavioral data. PRO similarities predict visual similarity ratings for word pairs (Study 1a), but also semantic similarity ratings for word pairs (Study 1b) and, to some extent, even visual similarity ratings for image pairs (Study 2). IMG similarities predict visual similarity ratings for image pairs (Study 2), but also on-line data such as processing times and accuracy rates in a discrimination task (Study 4) and in a priming paradigm with a visual decision task (Study 5). Model-derived typicality scores, defined as the similarity between an image and its category prototype, predict human typicality ratings (Study 3). Overall, this successful evaluation of the model corroborates the psychological plausibility of its representations. This encourages its use in further psychological studies, and the data, model, and web interface created in this project, all of which are made publicly available, can serve as valuable tools for this purpose.
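The three model-derived measures described above (IMG similarity, PRO similarity, and typicality) can be illustrated with a minimal sketch. It assumes that activation vectors have already been extracted from some network layer. The four-dimensional toy vectors, the category names, and the function names below are hypothetical and serve only as illustration; they are not the project's data or code.

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine similarity between two activation vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def prototype(activations):
    """Prototype (PRO) representation: mean activation vector over
    all images that share the same label (one row per image)."""
    return np.mean(np.asarray(activations, dtype=float), axis=0)


def typicality(image_vector, category_activations):
    """Typicality score: similarity between one image and the
    prototype of its category."""
    return cosine_similarity(image_vector, prototype(category_activations))


# Hypothetical toy activation vectors (real layers have far more units).
cat_images = np.array([
    [1.0, 0.0, 2.0, 0.5],
    [0.8, 0.1, 1.9, 0.7],
    [1.2, 0.0, 2.2, 0.4],
])
dog_images = np.array([
    [0.0, 1.5, 0.3, 2.0],
    [0.1, 1.7, 0.2, 1.8],
])

# IMG similarity: two individual images.
img_sim = cosine_similarity(cat_images[0], cat_images[1])

# PRO similarity: two category prototypes.
pro_sim = cosine_similarity(prototype(cat_images), prototype(dog_images))

# Typicality: an image compared with its own category prototype.
typ = typicality(cat_images[0], cat_images)

print(f"IMG similarity: {img_sim:.3f}")
print(f"PRO similarity: {pro_sim:.3f}")
print(f"Typicality:     {typ:.3f}")
```

In this toy setup, the two images from the same category are more similar to each other than the two category prototypes are, and an image scores higher against its own prototype than an image from the other category does.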
Publications
- ViSpa (Vision Spaces): A computer-vision-based representation system for individual images and concept prototypes, with large-scale evaluation. Psychological Review, 130(4), 896-934.
  Günther, Fritz; Marelli, Marco; Tureski, Sam & Petilli, Marco Alessandro
