LexFindR: A fast, simple, and extensible R package for finding similar words in a lexicon

Behav Res Methods. 2022 Jun;54(3):1388-1402. doi: 10.3758/s13428-021-01667-6. Epub 2021 Sep 30.

Abstract

Language scientists often need to generate lists of related words, such as potential competitors. They may do this for purposes of experimental control (e.g., selecting items matched on lexical neighborhood but varying in word frequency), or to test theoretical predictions (e.g., hypothesizing that a novel type of competitor may impact word recognition). Several online tools are available, but most are constrained to a fixed lexicon and fixed sets of competitor definitions, and may not give the user full access to or control of source data. We present LexFindR, an open-source R package that can be easily modified to include additional, novel competitor types. LexFindR is easy to use. Because it can leverage multiple CPU cores and uses vectorized code when possible, it is also extremely fast. In this article, we present an overview of LexFindR usage, illustrated with examples. We also explain the details of how we implemented several standard lexical competitor types used in spoken word recognition research (e.g., cohorts, neighbors, embeddings, rhymes), and show how "lexical dimensions" (e.g., word frequency, word length, uniqueness point) can be integrated into LexFindR workflows (for example, to calculate "frequency-weighted competitor probabilities"), for both spoken and visual word recognition research.
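To make the competitor types named above concrete, here is a minimal Python sketch. It is not LexFindR's code or API. The toy lexicon, the function names, and the specific definitions are all assumptions chosen for illustration: cohorts share the first two phonemes, neighbors differ by exactly one phoneme deletion, addition, or substitution, and a frequency-weighted competitor probability is taken to be the target's log frequency divided by the summed log frequencies of the target and its competitors.

```python
import math

# Hypothetical toy lexicon: word -> (space-separated phonemes, made-up frequency).
LEXICON = {
    "cat": ("K AE T", 100),
    "cab": ("K AE B", 20),
    "cattle": ("K AE T AH L", 30),
    "bat": ("B AE T", 50),
    "at": ("AE T", 500),
    "dog": ("D AO G", 80),
}


def phonemes(word):
    """Return the phoneme list for a word in the toy lexicon."""
    return LEXICON[word][0].split()


def is_cohort(target, other, overlap=2):
    """Assumed cohort rule: the first `overlap` phonemes match."""
    return target[:overlap] == other[:overlap]


def is_neighbor(target, other):
    """Assumed neighbor rule: exactly one deletion, addition, or substitution."""
    if target == other:
        return False
    if len(target) == len(other):
        # Substitution: same length, exactly one mismatching position.
        return sum(a != b for a, b in zip(target, other)) == 1
    short, long_ = sorted((target, other), key=len)
    if len(long_) - len(short) != 1:
        return False
    # Deletion or addition: dropping one phoneme from the longer form yields the shorter.
    return any(long_[:i] + long_[i + 1:] == short for i in range(len(long_)))


def competitors(word, rule):
    """List every lexicon entry that satisfies `rule` relative to `word`."""
    target = phonemes(word)
    return [w for w in LEXICON if rule(target, phonemes(w))]


def fwcp(word, competitor_words):
    """Assumed definition: log-frequency weight of the target over the summed
    weights of the target plus its competitors (padding of 1 avoids log(0))."""
    def weight(w):
        return math.log(LEXICON[w][1] + 1)

    pool = set(competitor_words) | {word}
    return weight(word) / sum(weight(w) for w in pool)


print(competitors("cat", is_cohort))    # ['cat', 'cab', 'cattle']
print(competitors("cat", is_neighbor))  # ['cab', 'bat', 'at']
print(fwcp("cat", competitors("cat", is_cohort)))
```

Because each competitor type is just a predicate over two phoneme lists, a novel competitor type can be added by writing one more rule function and passing it to `competitors`. This mirrors, in spirit only, the extensibility the abstract describes.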

Keywords: Lexicon; Psycholinguistics; Word recognition.

Publication types

  • Review
  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Humans
  • Language
  • Speech Perception*