A Mixed Visual Encoding Model Based on the Larger-Scale Receptive Field for Human Brain Activity

Brain Sci. 2022 Nov 29;12(12):1633. doi: 10.3390/brainsci12121633.

Abstract

Visual encoding models for functional magnetic resonance imaging (fMRI) built on deep neural networks, especially CNNs such as VGG16, have developed rapidly. However, CNNs typically extract features with small convolution kernels (e.g., 3 × 3). Although a CNN's receptive field can be enlarged by increasing network depth or by subsampling, it remains limited by the small kernel size, leaving the receptive field insufficiently large. In biological research, the population receptive field of neurons in high-level visual areas is usually three to four times that of low-level visual areas; CNNs with larger receptive fields therefore align better with these findings. The RepLKNet model directly enlarges the convolution kernel to obtain a larger-scale receptive field. Accordingly, this paper proposes a mixed model to replace a plain CNN for feature extraction in visual encoding models. The proposed model combines RepLKNet and VGG so that it has receptive fields of different sizes and can extract richer feature information from an image. Experimental results indicate that the mixed model achieves better encoding performance in multiple regions of the visual cortex than a traditional convolutional model. The results also suggest that larger-scale receptive fields should be considered when building visual encoding models, so that convolutional networks can play a more significant role in visual representation.
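To make the mixed-model idea concrete, the sketch below (not the authors' released code) combines a small-kernel VGG16 branch with a RepLKNet-style large-kernel branch and maps the concatenated features to voxel responses with a linear readout. All specifics are illustrative assumptions: the 31 × 31 depthwise kernel, the choice of VGG stage, the feature dimensions, and the voxel count are not given in the abstract.

```python
# Minimal sketch of a mixed small-RF / large-RF encoder, assuming PyTorch.
import torch
import torch.nn as nn
from torchvision.models import vgg16

class LargeKernelBlock(nn.Module):
    """RepLKNet-style block: one large depthwise convolution gives each
    unit a wide receptive field in a single layer (assumed 31x31)."""
    def __init__(self, channels: int, kernel_size: int = 31):
        super().__init__()
        self.dw = nn.Conv2d(channels, channels, kernel_size,
                            padding=kernel_size // 2, groups=channels)
        self.pw = nn.Conv2d(channels, channels, 1)  # pointwise channel mixing
        self.bn = nn.BatchNorm2d(channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pw(self.dw(x))))

class MixedEncoder(nn.Module):
    """Concatenates small-RF (VGG 3x3) and large-RF features, then maps
    the pooled vector to n_voxels predicted fMRI responses."""
    def __init__(self, n_voxels: int):
        super().__init__()
        # VGG16 through its third conv stage (256 channels, 3x3 kernels).
        self.small_rf = vgg16(weights=None).features[:16]
        # Patchify stem followed by one large-kernel block.
        self.stem = nn.Conv2d(3, 64, kernel_size=4, stride=4)
        self.large_rf = LargeKernelBlock(64)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.readout = nn.Linear(256 + 64, n_voxels)  # linear voxel readout

    def forward(self, x):
        f_small = self.pool(self.small_rf(x)).flatten(1)
        f_large = self.pool(self.large_rf(self.stem(x))).flatten(1)
        return self.readout(torch.cat([f_small, f_large], dim=1))

if __name__ == "__main__":
    model = MixedEncoder(n_voxels=100)
    pred = model(torch.randn(2, 3, 224, 224))  # two stimulus images
    print(pred.shape)  # torch.Size([2, 100])
```

In practice the readout would be fit (e.g., by ridge regression) from features of pretrained, frozen branches to measured voxel responses; the point of the sketch is only that the two branches contribute receptive fields of very different sizes.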

Keywords: RepLKNet; large convolution kernel; deep neural networks; fMRI; receptive field; visual encoding models.