A computational examination of the two-streams hypothesis: which pathway needs a longer memory?

Cogn Neurodyn. 2022 Feb;16(1):149-165. doi: 10.1007/s11571-021-09703-z. Epub 2021 Aug 10.

Abstract

The two visual streams hypothesis is a robust example of neural functional specialization that has inspired countless studies over the past four decades. According to one prominent version of the theory, the fundamental goal of the dorsal visual pathway is the transformation of retinal information for visually-guided motor behavior. To that end, the dorsal stream processes input using absolute (or veridical) metrics only when the movement is initiated, necessitating very little, or no, memory. Conversely, because the ventral visual pathway does not involve motor behavior (its output does not influence the real world), the ventral stream processes input using relative (or illusory) metrics and can accumulate or integrate sensory evidence over long time constants, which provides a substantial capacity for memory. In this study, we tested these relations between functional specialization, processing metrics, and memory by training identical recurrent neural networks to perform either a viewpoint-invariant object classification task or an orientation/size determination task. The former task relies on relative metrics, benefits from accumulating sensory evidence, and is usually attributed to the ventral stream. The latter task relies on absolute metrics, can be computed accurately in the moment, and is usually attributed to the dorsal stream. To quantify the amount of memory required for each task, we chose two types of neural network models. Using a long-short-term memory (LSTM) recurrent network, we found that viewpoint-invariant object categorization (object task) required a longer memory than orientation/size determination (orientation task). Additionally, to dissect this memory effect, we considered factors that contributed to longer memory in object tasks. First, we used two different sets of objects, one with self-occlusion of features and one without. Second, we defined object classes either strictly by visual feature similarity or (more liberally) by semantic label. The models required greater memory when features were self-occluded and when object classes were defined by visual feature similarity, showing that self-occlusion and visual similarity among object task samples are contributing to having a long memory. The same set of tasks modeled using modified leaky-integrator echo state recurrent networks (LiESN), however, did not replicate the results, except under some conditions. This may be because LiESNs cannot perform fine-grained memory adjustments due to their network-wide memory coefficient and fixed recurrent weights. In sum, the LSTM simulations suggest that longer memory is advantageous for performing viewpoint-invariant object classification (a putative ventral stream function) because it allows for interpolation of features across viewpoints. The results further suggest that orientation/size determination (a putative dorsal stream function) does not benefit from longer memory. These findings are consistent with the two visual streams theory of functional specialization.

Supplementary information: The online version contains supplementary material available at 10.1007/s11571-021-09703-z.

Keywords: Convolutional neural networks; Dorsal and ventral visual pathway; LSTM; Memory span; Two-streams hypothesis; Viewpoint invariant object recognition.