High-Level Visual Encoding Model Framework with Hierarchical Ventral Stream-Optimized Neural Networks

Wulue Xiao; Jingwei Li; Chi Zhang; Linyuan Wang; Panpan Chen; Ziya Yu; Li Tong; Bin Yan

doi:10.3390/brainsci12081101

Brain Sci. 2022 Aug 19;12(8):1101. doi: 10.3390/brainsci12081101.

Authors

Wulue Xiao¹,², Jingwei Li², Chi Zhang², Linyuan Wang², Panpan Chen², Ziya Yu², Li Tong², Bin Yan²

Affiliations

¹ School of Cyber Science and Engineering, Zhengzhou University, Zhengzhou 450001, China.
² Henan Key Laboratory of Imaging and Intelligent Processing, PLA Strategic Support Force Information Engineering University, Zhengzhou 450001, China.

Abstract

Visual encoding models based on deep neural networks (DNNs) perform well in predicting brain activity in low-level visual areas. However, because the amount of available neural data is limited, DNN-based visual encoding models are difficult to fit for high-level visual areas, resulting in insufficient encoding performance. The organization of the ventral stream suggests that higher visual areas receive information from lower visual areas, which is not fully reflected in current encoding models. In the present study, we propose a novel visual encoding model framework that uses the hierarchy of representations in the ventral stream to improve the model's performance in high-level visual areas. Under this framework, we propose two categories of hierarchical encoding models, from the voxel and the feature perspectives, to realize the hierarchical representations. From the voxel perspective, we first construct an encoding model for a low-level visual area (V1 or V2) and extract the voxel space predicted by the model. We then use the extracted voxel space of the low-level visual area to predict the voxel space of a high-level visual area (V4 or LO) by constructing a voxel-to-voxel model. From the feature perspective, the feature space of the first model is extracted to predict the voxel space of the high-level visual area. The experimental results show that both categories of hierarchical encoding models effectively improve the encoding performance in V4 and LO. In addition, the proportion of best-encoded voxels for the different models in V4 and LO shows that our proposed models have clear advantages in prediction accuracy. We find that the hierarchy of representations in the ventral stream has a positive effect on improving the performance of existing models in high-level visual areas.
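
The two-stage idea behind the voxel-perspective model can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract describes DNN-based encoding models fitted to fMRI data, whereas the sketch below swaps in closed-form ridge regression and synthetic data purely for illustration. The helper names (`ridge_fit`, `voxelwise_corr`), the array sizes, the noise levels, and the use of Pearson correlation as the accuracy measure are all assumptions.

```python
import numpy as np


def ridge_fit(X, Y, alpha=1.0):
    """Closed-form ridge regression: W minimizing ||XW - Y||^2 + alpha * ||W||^2."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)


def voxelwise_corr(Y_true, Y_pred):
    """Pearson correlation between measured and predicted responses, one value per voxel."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    denom = np.linalg.norm(yt, axis=0) * np.linalg.norm(yp, axis=0)
    return (yt * yp).sum(axis=0) / denom


# Synthetic stand-ins (assumption): stimulus features and two "visual areas".
rng = np.random.default_rng(0)
n_train, n_test, n_feat, n_low, n_high = 200, 50, 30, 20, 10

X = rng.standard_normal((n_train + n_test, n_feat))            # stimulus features
low_clean = X @ rng.standard_normal((n_feat, n_low))
low = low_clean + 0.5 * rng.standard_normal(low_clean.shape)   # "low-level" voxels
high = np.tanh(low_clean) @ rng.standard_normal((n_low, n_high))
high = high + 0.5 * rng.standard_normal(high.shape)            # "high-level" voxels

X_tr, X_te = X[:n_train], X[n_train:]
low_tr = low[:n_train]
high_tr, high_te = high[:n_train], high[n_train:]

# Stage 1: encoding model for the low-level area (features -> low-level voxels).
W_low = ridge_fit(X_tr, low_tr)
low_hat_tr = X_tr @ W_low   # predicted low-level voxel space, training stimuli
low_hat_te = X_te @ W_low   # predicted low-level voxel space, held-out stimuli

# Stage 2: voxel-to-voxel model (predicted low-level voxels -> high-level voxels).
W_v2v = ridge_fit(low_hat_tr, high_tr)
high_hat_te = low_hat_te @ W_v2v

# Held-out prediction accuracy per high-level voxel.
corr = voxelwise_corr(high_te, high_hat_te)
print("mean held-out correlation:", float(corr.mean()))
```

The second stage takes the predicted low-level responses as input rather than the measured ones, so the full pipeline can be applied to new stimuli without any additional brain recordings. In this toy version both stages are linear, so the example shows only the data flow, not the performance gains reported in the abstract.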

Keywords: deep neural networks; encoding model; fMRI; hierarchical representations; ventral stream.

Grants and funding

62106285/Chi Zhang