Foveal Vision for Humanoid Robots

Review
In: Humanoid Robotics and Neuroscience: Science, Engineering and Society. Boca Raton (FL): CRC Press/Taylor & Francis; 2015. Chapter 5.

Excerpt

Human vision is based on the ability to perceive light, which enters the eye as shown in Figure 5.1. Light travels through a number of layers before hitting the area of the eye called the retina [16]. There it is converted into nerve signals by photosensitive cells, which consist primarily of cones and rods. Cones are color-sensitive and require a lot of light to be triggered. They are most densely distributed in the foveal area of the retina, which is responsible for detailed vision. There are no rods in the fovea. Rods are more densely distributed in peripheral areas of the retina, which contain significantly fewer cones. They are much more sensitive to light than cones and are therefore responsible for vision under low-light conditions. They are monochromatic and are also responsible for motion detection. Image resolution in the periphery is much lower than in the fovea, in part because retinal ganglion cells pool information from far more retinal receptors in the visual periphery than in the foveal area [18]. The distribution of cones and rods on the retina reflects the competing evolutionary requirements for a wide field of view and high-resolution vision. The resulting nerve signals are carried to the brain through the ganglion cells and the optic nerve, which leaves the retina at the optic disc.

Each eye has six extraocular muscles attached to it and moves when the appropriate ones shorten [16]. There are several types of eye movements, but in the context of foveated vision the most interesting are saccades and smooth pursuit. A saccade is a rapid, goal-directed eye movement that brings the area of interest to the area of highest resolution (i.e., the fovea) [11]. Smooth pursuit eye movements are slower. Their task is to keep the object of interest in the fovea and stabilize its image [31]. Combined, the two thus enable the processing of high-resolution images of the observed object. Both saccades and smooth pursuit eye movements can be complemented by head movements to expand their range [31].

Object and face recognition rely on detailed analysis of foveal images [19]. Thus, to improve the capabilities of artificial vision systems, which are still far less capable than human vision, it is necessary to develop active object recognition systems that can process high-resolution foveal images. On the other hand, it has been shown that for tasks such as recognition of the scene gist [18] and place recognition [19], peripheral vision is all that is needed. Active systems that acquire images at several resolutions simultaneously should thus be developed to improve the performance of artificial vision systems.
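
The idea of acquiring a detailed foveal image and a coarse peripheral image at the same time can be illustrated in software. The following is only a minimal sketch under stated assumptions, not the method used in the chapter: it derives both views from a single frame, whereas a physical system might instead use separate cameras or lenses. The function names `block_average` and `foveate`, and the parameters `center`, `fovea_half`, and `pool`, are hypothetical and chosen for illustration. Uniform block averaging stands in, very roughly, for the pooling of receptor signals described above.

```python
def block_average(image, pool):
    """Downsample a 2-D list of numbers by averaging
    non-overlapping pool x pool blocks (incomplete edge blocks are dropped)."""
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(0, rows - pool + 1, pool):
        out_row = []
        for c in range(0, cols - pool + 1, pool):
            block = [image[r + i][c + j]
                     for i in range(pool) for j in range(pool)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


def foveate(image, center, fovea_half, pool):
    """Return (foveal, peripheral) views of one frame.

    foveal:     full-resolution crop around `center`, at most
                (2 * fovea_half + 1) pixels per side, clipped at the border.
    peripheral: the whole frame, pooled by a factor of `pool` (an assumed,
                uniform stand-in for eccentricity-dependent pooling).
    """
    rows, cols = len(image), len(image[0])
    cr, cc = center
    r0, r1 = max(0, cr - fovea_half), min(rows, cr + fovea_half + 1)
    c0, c1 = max(0, cc - fovea_half), min(cols, cc + fovea_half + 1)
    foveal = [row[c0:c1] for row in image[r0:r1]]
    peripheral = block_average(image, pool)
    return foveal, peripheral


# Usage: an 8 x 8 synthetic frame whose pixel value encodes its position.
frame = [[r * 8 + c for c in range(8)] for r in range(8)]
foveal, peripheral = foveate(frame, center=(4, 4), fovea_half=1, pool=4)
```

In this sketch, moving `center` plays the role of a gaze shift: the peripheral view stays the same, while the foveal crop follows the point of interest.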
