LexFindR: A fast, simple, and extensible R package for finding similar words in a lexicon

ZhaoBin Li; Anne Marie Crinnion; James S Magnuson

doi:10.3758/s13428-021-01667-6

Behav Res Methods. 2022 Jun;54(3):1388-1402. doi: 10.3758/s13428-021-01667-6. Epub 2021 Sep 30.

Authors

ZhaoBin Li¹, Anne Marie Crinnion²,³, James S Magnuson⁴,⁵,⁶,⁷

Affiliations

¹ Department of Mathematics and Statistics, Carleton College, Northfield, MN, USA.
² Institute for the Brain and Cognitive Sciences, University of Connecticut, Storrs, CT, USA.
³ Department of Psychological Sciences, University of Connecticut, Storrs, CT, USA.
⁴ Institute for the Brain and Cognitive Sciences, University of Connecticut, Storrs, CT, USA. james.magnuson@uconn.edu.
⁵ Department of Psychological Sciences, University of Connecticut, Storrs, CT, USA. james.magnuson@uconn.edu.
⁶ BCBL, Basque Center on Cognition, Brain and Language, Donostia-San Sebastián, Spain. james.magnuson@uconn.edu.
⁷ Ikerbasque, Basque Foundation for Science, Bilbao, Spain. james.magnuson@uconn.edu.

PMID: 34595672
DOI: 10.3758/s13428-021-01667-6

Abstract

Language scientists often need to generate lists of related words, such as potential competitors. They may do this for purposes of experimental control (e.g., selecting items matched on lexical neighborhood but varying in word frequency), or to test theoretical predictions (e.g., hypothesizing that a novel type of competitor may impact word recognition). Several online tools are available, but most are constrained to a fixed lexicon and fixed sets of competitor definitions, and may not give the user full access to or control of source data. We present LexFindR, an open-source R package that can be easily modified to include additional, novel competitor types. LexFindR is easy to use. Because it can leverage multiple CPU cores and uses vectorized code when possible, it is also extremely fast. In this article, we present an overview of LexFindR usage, illustrated with examples. We also explain the details of how we implemented several standard lexical competitor types used in spoken word recognition research (e.g., cohorts, neighbors, embeddings, rhymes), and show how "lexical dimensions" (e.g., word frequency, word length, uniqueness point) can be integrated into LexFindR workflows (for example, to calculate "frequency-weighted competitor probabilities"), for both spoken and visual word recognition research.
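To make the competitor types named above concrete, here is a minimal Python sketch. It is not LexFindR's code or API (LexFindR is an R package). The function names, the toy lexicon, and the definitions are all assumptions chosen for illustration: cohorts share the first two symbols, neighbors differ by one substitution, deletion, or addition, rhymes differ only in the first symbol, and the frequency-weighted competitor probability is the target's log frequency divided by the summed log frequencies of the target and its competitors. The article itself should be consulted for the definitions the package actually uses.

```python
import math
from typing import Callable, Dict, List, Tuple

Form = Tuple[str, ...]  # a word as a sequence of phoneme (or letter) symbols


def is_cohort(target: Form, other: Form, overlap: int = 2) -> bool:
    # Assumed definition: the two forms share their first `overlap` symbols.
    return len(target) >= overlap and target[:overlap] == other[:overlap]


def is_neighbor(target: Form, other: Form) -> bool:
    # Assumed definition: exactly one substitution, deletion, or addition apart.
    if len(target) == len(other):
        return sum(a != b for a, b in zip(target, other)) == 1
    shorter, longer = sorted((target, other), key=len)
    if len(longer) - len(shorter) != 1:
        return False
    # Dropping one symbol from the longer form must yield the shorter form.
    return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))


def is_rhyme(target: Form, other: Form) -> bool:
    # Assumed definition: different first symbol, identical remainder.
    return target[:1] != other[:1] and target[1:] == other[1:]


def competitors(
    target: str, lexicon: Dict[str, Form], relation: Callable[[Form, Form], bool]
) -> List[str]:
    # Return every other lexicon entry that stands in `relation` to the target.
    form = lexicon[target]
    return [w for w, f in lexicon.items() if w != target and relation(form, f)]


def fwcp(target_freq: float, competitor_freqs: List[float]) -> float:
    # Assumed formula: log(target) / (log(target) + sum of log(competitors)).
    # Frequencies are assumed to be greater than 1 so that all logs are positive.
    log_target = math.log(target_freq)
    return log_target / (log_target + sum(math.log(f) for f in competitor_freqs))


# Hypothetical toy lexicon with made-up phoneme transcriptions.
LEXICON: Dict[str, Form] = {
    "cat": ("k", "ae", "t"),
    "cab": ("k", "ae", "b"),
    "cast": ("k", "ae", "s", "t"),
    "bat": ("b", "ae", "t"),
    "at": ("ae", "t"),
    "dog": ("d", "ao", "g"),
}

print(competitors("cat", LEXICON, is_cohort))
print(competitors("cat", LEXICON, is_neighbor))
print(competitors("cat", LEXICON, is_rhyme))
```

Passing the relation as a function keeps the lookup generic, so a novel competitor type can be tried by writing one more predicate, which is the kind of extensibility the abstract describes.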

Keywords: Lexicon; Psycholinguistics; Word recognition.

Publication types

Review
Research Support, U.S. Gov't, Non-P.H.S.
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Humans
Language
Speech Perception*

Grants and funding

T32 DC017703/DC/NIDCD NIH HHS/United States