America's racial framework of superiority and Americanness embedded in natural language

PNAS Nexus. 2024 Jan 2;3(1):pgad485. doi: 10.1093/pnasnexus/pgad485. eCollection 2024 Jan.

Abstract

America's racial framework can be summarized along two distinct dimensions: superiority/inferiority and Americanness/foreignness. We investigated this framework in a corpus of spoken and written language using word embeddings. Word embeddings place words in a low-dimensional space where words with similar meanings lie close together, allowing researchers to test whether the positions of group and attribute words in that semantic space reflect stereotypes. We trained a word embedding model on the Corpus of Contemporary American English (COCA), a corpus of 1 billion words spanning 30 years and 8 text categories, and compared the positions of racial/ethnic groups with respect to superiority and Americanness. We found that both dimensions of America's racial framework are embedded in American English. We also captured an additional nuance: Asian people were stereotyped as more American than Hispanic people. These results provide empirical evidence that America's racial framework is embedded in American English.
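To make the idea of "comparing positions" concrete, the sketch below scores a group word on each dimension as its mean cosine similarity to one pole's attribute words minus its mean cosine similarity to the opposite pole, a common association measure in the word-embedding bias literature. This is an illustrative assumption, not necessarily the exact procedure the authors used. The 4-dimensional vectors, the attribute word lists, the placeholder group words, and the function names `cosine`, `association`, and `place` are all hypothetical; a real analysis would use vectors trained on COCA and many words per pole.

```python
import math
from statistics import mean


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(word, pole_high, pole_low, vectors):
    """Mean cosine to pole_high words minus mean cosine to pole_low words.

    Positive values mean `word` sits closer to pole_high in the space.
    """
    w = vectors[word]
    return (mean(cosine(w, vectors[a]) for a in pole_high)
            - mean(cosine(w, vectors[b]) for b in pole_low))


# Hypothetical toy vectors (NOT trained on COCA); axes are hand-built so the
# arithmetic is easy to follow.
toy_vectors = {
    "superior": [1.0, 0.0, 0.0, 0.0],
    "inferior": [0.0, 1.0, 0.0, 0.0],
    "american": [0.0, 0.0, 1.0, 0.0],
    "foreign": [0.0, 0.0, 0.0, 1.0],
    "group_a": [0.8, 0.2, 0.7, 0.3],
    "group_b": [0.3, 0.7, 0.2, 0.8],
    "group_c": [0.7, 0.3, 0.3, 0.7],
}

# Hypothetical (high pole, low pole) attribute word lists for each dimension.
SUPERIORITY = (["superior"], ["inferior"])
AMERICANNESS = (["american"], ["foreign"])


def place(group, vectors):
    """Return (superiority, Americanness) coordinates for a group word."""
    sup_high, sup_low = SUPERIORITY
    am_high, am_low = AMERICANNESS
    return (association(group, sup_high, sup_low, vectors),
            association(group, am_high, am_low, vectors))


coords = {g: place(g, toy_vectors) for g in ("group_a", "group_b", "group_c")}
for group, (sup, amer) in coords.items():
    print(f"{group}: superiority={sup:+.3f}, Americanness={amer:+.3f}")
```

In this toy setup `group_c` scores positive on superiority but negative on Americanness, which illustrates why the framework needs two separate axes rather than a single good/bad scale.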

Keywords: ethnicity; natural language processing; race; stereotypes; word embeddings.