A bio-inspired positional embedding network for transformer-based models

Neural Netw. 2023 Sep;166:204-214. doi: 10.1016/j.neunet.2023.07.015. Epub 2023 Jul 15.

Abstract

Owing to the progress of transformer-based networks, the performance of vision models has improved significantly in recent years. However, there remains room for improvement in positional embeddings, which play a crucial role in distinguishing information across different positions. Based on the biological mechanisms of human visual pathways, we propose a positional embedding network that adaptively captures position information by modeling the dorsal pathway, which is responsible for spatial perception in human vision. Our double-stream architecture leverages large zero-padding convolutions to learn local positional features and transformers to learn global features, effectively capturing the interaction between the dorsal and ventral pathways. To evaluate the effectiveness of our method, we conducted experiments on various datasets with differentiated designs. Our statistical analysis shows that this simple implementation significantly enhances image classification performance, and the observed trends support its biological plausibility.
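
As a rough illustration of how large zero-padding convolutions can inject position information into a transformer's token stream, the sketch below implements a lightweight convolutional positional embedding in PyTorch. It is not the authors' published implementation; the class name ConvPositionalEmbedding, the kernel size of 7, the depthwise grouping, and the residual connection are all illustrative assumptions.

    import torch
    import torch.nn as nn

    class ConvPositionalEmbedding(nn.Module):
        """Minimal sketch of a convolutional positional embedding.

        A depthwise convolution with zero padding can inject position
        information: the zeros at the image border act as fixed landmarks
        that the convolution can exploit. Kernel size, grouping, and the
        residual connection are illustrative choices, not the paper's
        exact configuration.
        """

        def __init__(self, dim: int, kernel_size: int = 7):
            super().__init__()
            self.conv = nn.Conv2d(
                dim, dim,
                kernel_size=kernel_size,
                padding=kernel_size // 2,  # zero padding supplies the positional cue
                groups=dim,                # depthwise keeps the module lightweight
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, tokens, dim) token sequence from a vision transformer
            b, n, d = x.shape
            h = w = int(n ** 0.5)          # assumes a square patch grid, no class token
            feat = x.transpose(1, 2).reshape(b, d, h, w)
            feat = self.conv(feat)         # local positional features
            feat = feat.flatten(2).transpose(1, 2)
            return x + feat                # add positional features to the token stream

    # Example: a 14x14 patch grid with 384-dimensional embeddings
    tokens = torch.randn(2, 196, 384)
    pos = ConvPositionalEmbedding(384)
    out = pos(tokens)                      # same shape, position-aware tokens

Because the padding zeros mark the image border in absolute terms, such a convolutional stream can recover each token's location without an explicit positional table, which is the intuition behind pairing it with a transformer stream that handles global features.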

Keywords: Dorsal pathway modeling; Image classification; Position embedding; Transformers; Zero padding.

MeSH terms

  • Humans
  • Learning*
  • Space Perception*
  • Visual Pathways