Alignment of brain embeddings and artificial contextual embeddings in natural language points to common geometric patterns

Ariel Goldstein; Avigail Grinstein-Dabush; Mariano Schain; Haocheng Wang; Zhuoqiao Hong; Bobbi Aubrey; Mariano Schain; Samuel A Nastase; Zaid Zada; Eric Ham; Amir Feder; Harshvardhan Gazula; Eliav Buchnik; Werner Doyle; Sasha Devore; Patricia Dugan; Roi Reichart; Daniel Friedman; Michael Brenner; Avinatan Hassidim; Orrin Devinsky; Adeen Flinker; Uri Hasson

doi:10.1038/s41467-024-46631-y

Nat Commun. 2024 Mar 30;15(1):2768. doi: 10.1038/s41467-024-46631-y.

Authors

Ariel Goldstein^{1,2}, Avigail Grinstein-Dabush^#³, Mariano Schain^#³, Haocheng Wang⁴, Zhuoqiao Hong⁴, Bobbi Aubrey^{4,5}, Mariano Schain³, Samuel A Nastase⁴, Zaid Zada⁴, Eric Ham⁴, Amir Feder³, Harshvardhan Gazula⁴, Eliav Buchnik³, Werner Doyle⁵, Sasha Devore⁵, Patricia Dugan⁵, Roi Reichart⁶, Daniel Friedman⁵, Michael Brenner^{3,7}, Avinatan Hassidim³, Orrin Devinsky⁵, Adeen Flinker^{5,8}, Uri Hasson^{3,4}

Affiliations

¹ Business School, Data Science department and Cognitive Department, Hebrew University, Jerusalem, Israel. ariel.y.goldstein@mail.huji.ac.il.
² Google Research, Tel Aviv, Israel. ariel.y.goldstein@mail.huji.ac.il.
³ Google Research, Tel Aviv, Israel.
⁴ Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, NJ, USA.
⁵ New York University Grossman School of Medicine, New York, NY, USA.
⁶ Faculty of Industrial Engineering and Management, Technion, Israel Institute of Technology, Haifa, Israel.
⁷ School of Engineering and Applied Science, Harvard University, Cambridge, MA, USA.
⁸ New York University Tandon School of Engineering, Brooklyn, NY, USA.

^# Contributed equally.

Abstract

Contextual embeddings, derived from deep language models (DLMs), provide a continuous vectorial representation of language. This embedding space differs fundamentally from the symbolic representations posited by traditional psycholinguistics. We hypothesize that language areas in the human brain, similar to DLMs, rely on a continuous embedding space to represent language. To test this hypothesis, we record neural activity patterns in the inferior frontal gyrus (IFG) of three participants using dense intracranial arrays while they listen to a 30-minute podcast. From these fine-grained spatiotemporal neural recordings, we derive a continuous vectorial representation for each word (i.e., a brain embedding) in each patient. Using stringent zero-shot mapping, we demonstrate that brain embeddings in the IFG and the DLM contextual embedding space have common geometric patterns. The common geometric patterns allow us to predict the brain embedding in IFG of a given left-out word based solely on its geometrical relationship to other non-overlapping words in the podcast. Furthermore, we show that contextual embeddings capture the geometry of IFG embeddings better than static word embeddings. The continuous brain embedding space exposes a vector-based neural code for natural language processing in the human brain.
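
The zero-shot mapping idea in the abstract can be illustrated with a small sketch on synthetic data. This is not the authors' pipeline: the ridge regression, the row-wise correlation score, the toy data generator, and all function names (`make_toy_data`, `zero_shot_fold`, `rowwise_corr`) are assumptions chosen for illustration. The one property it tries to preserve is that every occurrence of a held-out word is excluded from training, so the prediction for that word depends only on how its contextual embedding relates to the embeddings of other words.

```python
import numpy as np


def make_toy_data(seed=0, n_tokens=400, n_words=80, d_model=16, d_brain=12):
    """Synthetic stand-in for (contextual embedding, brain embedding) pairs.

    All sizes and noise levels are arbitrary illustrative choices.
    """
    rng = np.random.default_rng(seed)
    word_ids = rng.integers(0, n_words, size=n_tokens)   # word identity per token
    word_vecs = rng.normal(size=(n_words, d_model))      # one base vector per word
    # "Contextual" embedding: base word vector plus token-specific variation.
    X = word_vecs[word_ids] + 0.3 * rng.normal(size=(n_tokens, d_model))
    # Toy "brain" embedding: a noisy linear image of the contextual embedding.
    W_true = rng.normal(size=(d_model, d_brain))
    Y = X @ W_true + 0.5 * rng.normal(size=(n_tokens, d_brain))
    return X, Y, word_ids


def zero_shot_fold(X, Y, word_ids, test_words, alpha=1.0):
    """Fit a ridge map X -> Y on tokens whose word is NOT in test_words,
    then predict Y for every token of the held-out words."""
    test_mask = np.isin(word_ids, test_words)
    X_tr, Y_tr = X[~test_mask], Y[~test_mask]
    X_te, Y_te = X[test_mask], Y[test_mask]
    # Center with training statistics only, so no test information leaks in.
    x_mu, y_mu = X_tr.mean(axis=0), Y_tr.mean(axis=0)
    Xc, Yc = X_tr - x_mu, Y_tr - y_mu
    # Closed-form ridge solution: (Xc^T Xc + alpha I) W = Xc^T Yc
    W = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ Yc)
    Y_pred = (X_te - x_mu) @ W + y_mu
    return Y_pred, Y_te


def rowwise_corr(A, B):
    """Pearson correlation between matching rows of A and B."""
    A = A - A.mean(axis=1, keepdims=True)
    B = B - B.mean(axis=1, keepdims=True)
    return (A * B).sum(axis=1) / (
        np.linalg.norm(A, axis=1) * np.linalg.norm(B, axis=1)
    )


if __name__ == "__main__":
    X, Y, word_ids = make_toy_data()
    held_out = np.arange(8)  # hold out every token of words 0..7
    Y_pred, Y_true = zero_shot_fold(X, Y, word_ids, held_out)
    print("mean correlation on held-out words:",
          rowwise_corr(Y_pred, Y_true).mean())
```

In a fuller analysis one would rotate the held-out set across folds so that every unique word is predicted exactly once, and compare the scores against a baseline such as static word embeddings.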

MeSH terms

Brain*
Humans
Language*
Natural Language Processing
Prefrontal Cortex
