Signatures of cross-modal alignment in children's early concepts

Proc Natl Acad Sci U S A. 2023 Oct 17;120(42):e2309688120. doi: 10.1073/pnas.2309688120. Epub 2023 Oct 11.

Abstract

Whether supervised or unsupervised, human and machine learning is usually characterized as event-based. However, learning may also proceed by systems alignment in which mappings are inferred between entire systems, such as visual and linguistic systems. Systems alignment is possible because items that share similar visual contexts, such as a car and a truck, will also tend to share similar linguistic contexts. Because of the mirrored similarity relationships across systems, the visual and linguistic systems can be aligned at some later time absent either input. In a series of simulation studies, we considered whether children's early concepts support systems alignment. We found that children's early concepts are close to optimal for inferring novel concepts through systems alignment, enabling agents to correctly infer more than 85% of visual-word mappings absent supervision. One possible explanation for why children's early concepts support systems alignment is that they are distinguished structurally by their dense semantic neighborhoods. Artificial agents using these structural features to select concepts proved highly effective, both in environments mirroring children's conceptual world and those that exclude the concepts that children commonly acquire. For children, systems alignment and event-based learning likely complement one another. Likewise, artificial systems can benefit from incorporating these developmental principles.
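
The core idea of systems alignment, that mirrored similarity relationships allow two systems to be mapped onto each other without paired input, can be illustrated with a toy sketch. This is not the authors' simulation code. It assumes synthetic "visual" embeddings, a "linguistic" system built as a rotated, slightly noisy, shuffled copy of them, and a brute-force search over all mappings that is feasible only for a handful of concepts. All function and variable names (`similarity_matrix`, `alignment_score`, `align_systems`) are hypothetical.

```python
import itertools
import numpy as np


def similarity_matrix(embeddings):
    """Pairwise cosine similarity between the rows of `embeddings`."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return unit @ unit.T


def alignment_score(sim_a, sim_b, mapping):
    """Correlation between the pairwise similarities of system A and those
    of system B after reordering B's items according to `mapping`."""
    idx = np.array(mapping)
    permuted = sim_b[np.ix_(idx, idx)]
    upper = np.triu_indices(len(idx), k=1)  # each unordered pair once
    return np.corrcoef(sim_a[upper], permuted[upper])[0, 1]


def align_systems(sim_a, sim_b):
    """Exhaustively search for the mapping that best mirrors the two
    similarity structures. No paired (supervised) examples are used."""
    n = sim_a.shape[0]
    return max(
        itertools.permutations(range(n)),
        key=lambda m: alignment_score(sim_a, sim_b, m),
    )


# Synthetic data (an assumption for illustration, not real child-concept data).
rng = np.random.default_rng(0)
n_concepts, n_dims = 6, 8
visual = rng.normal(size=(n_concepts, n_dims))

# The "linguistic" system shares the visual system's similarity structure:
# an orthogonal rotation preserves cosine similarity, and the noise is small.
rotation, _ = np.linalg.qr(rng.normal(size=(n_dims, n_dims)))
true_order = rng.permutation(n_concepts)  # hidden shuffling of the items
noisy = visual @ rotation + 0.01 * rng.normal(size=visual.shape)
linguistic = noisy[true_order]

inferred = align_systems(similarity_matrix(visual), similarity_matrix(linguistic))

# For visual item i, the correct linguistic row is the inverse of the shuffle.
correct = np.argsort(true_order)
accuracy = float(np.mean(np.array(inferred) == correct))
print(f"fraction of mappings recovered: {accuracy:.2f}")
```

Exhaustive search scales factorially with the number of concepts, so anything beyond a toy setting would need an approximate optimizer. The sketch only shows why shared similarity structure makes unsupervised mapping possible in principle.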

Keywords: alignment; asynchronous; learning; multimodal.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Child
  • Computer Simulation
  • Humans
  • Linguistics*
  • Residence Characteristics
  • Semantics*