Distinguishing mirror from glass: A "big data" approach to material perception

Hideki Tamura; Konrad Eugen Prokott; Roland W Fleming

doi:10.1167/jov.22.4.4

J Vis. 2022 Mar 2;22(4):4. doi: 10.1167/jov.22.4.4.

Authors

Hideki Tamura¹ ², Konrad Eugen Prokott³ ⁴, Roland W Fleming³ ⁵ ⁶

Affiliations

¹ Department of Computer Science and Engineering, Toyohashi University of Technology, Toyohashi, Aichi, Japan.
² tamura@cs.tut.ac.jp.
³ Department of Experimental Psychology, Justus Liebig University Giessen, Giessen, Germany.
⁴ konrad.e.prokott@psychol.uni-giessen.de.
⁵ Center for Mind, Brain and Behavior (CMBB), University of Marburg and Justus Liebig University Giessen, Germany.
⁶ roland.w.fleming@psychol.uni-giessen.de.

Abstract

Distinguishing mirror from glass is a challenging visual inference, because both materials derive their appearance from their surroundings, yet we rarely experience difficulties in telling them apart. Very few studies have investigated how the visual system distinguishes reflections from refractions, and to date there is no image-computable model that emulates human judgments. Here we sought to develop a deep neural network that reproduces the patterns of visual judgments human observers make. To do this, we trained thousands of convolutional neural networks on more than 750,000 simulated mirror and glass objects, and compared their performance with human judgments, as well as with alternative classifiers based on "hand-engineered" image features. For randomly chosen images, all classifiers and humans performed with high accuracy, and therefore correlated highly with one another. However, to assess how similar models are to humans, it is not sufficient to compare accuracy or correlation on random images. A good model should also predict the characteristic errors that humans make. We therefore painstakingly assembled a diagnostic image set for which humans make systematic errors, allowing us to isolate signatures of human-like performance. A large-scale, systematic search through feedforward neural architectures revealed that relatively shallow (three-layer) networks predicted human judgments better than any other model we tested. This is the first image-computable model that emulates human errors and succeeds in distinguishing mirror from glass, and hints that mid-level visual processing might be particularly important for the task.
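
The point that accuracy alone cannot separate human-like from non-human-like models can be illustrated with a minimal sketch. This is not the authors' analysis: the binary labels (1 = mirror, 0 = glass), the toy data, and the helper names `accuracy` and `error_agreement` are all assumptions made for illustration, and human judgments in such studies may well be graded ratings rather than binary choices.

```python
from typing import Sequence


def accuracy(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Fraction of images on which the prediction matches the true material."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def error_agreement(model: Sequence[int], human: Sequence[int],
                    truth: Sequence[int]) -> float:
    """Among images that humans misjudge, the fraction on which the
    model gives the same (wrong) answer as the humans."""
    human_errors = [i for i, (h, t) in enumerate(zip(human, truth)) if h != t]
    if not human_errors:
        return float("nan")  # undefined when humans make no errors
    return sum(model[i] == human[i] for i in human_errors) / len(human_errors)


# Hypothetical toy data: 1 = mirror, 0 = glass.
truth   = [1, 1, 1, 1, 0, 0, 0, 0]
human   = [1, 1, 1, 0, 0, 0, 0, 1]  # humans err on images 3 and 7
model_a = [1, 1, 1, 0, 0, 0, 0, 1]  # makes the same two errors as humans
model_b = [0, 1, 1, 1, 1, 0, 0, 0]  # errs on images 0 and 4 instead

for name, model in [("A", model_a), ("B", model_b)]:
    print(name, accuracy(model, truth), error_agreement(model, human, truth))
```

Both toy models reach the same accuracy, yet only model A reproduces the human errors. A diagnostic image set enriched for such error cases is what makes this difference measurable.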

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Humans
Neural Networks, Computer*
Visual Perception*