MCA-NMF: Multimodal Concept Acquisition with Non-Negative Matrix Factorization

PLoS One. 2015 Oct 21;10(10):e0140732. doi: 10.1371/journal.pone.0140732. eCollection 2015.

Abstract

In this paper, we introduce MCA-NMF, a computational model of the acquisition of multimodal concepts by an agent grounded in its environment. More precisely, our model finds patterns in multimodal sensor input that characterize associations across modalities (speech utterances, images, and motion). We propose this computational model as an answer to the question of how some class of concepts can be learnt. In addition, the model provides a way of defining such a class of plausibly learnable concepts. We detail why the multimodal nature of perception is essential to reduce the ambiguity of learnt concepts as well as to communicate about them through speech. We then present a set of experiments that demonstrate the learning of such concepts from real non-symbolic data consisting of speech sounds, images, and motions. Finally, we consider structure in perceptual signals and demonstrate that a detailed knowledge of this structure, termed compositional understanding, can emerge from, rather than being a prerequisite of, global understanding. An open-source implementation of the MCA-NMF learner, as well as scripts and associated experimental data to reproduce the experiments, are publicly available.
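The core mechanism the abstract describes, finding cross-modal patterns via non-negative matrix factorization, can be illustrated with a minimal sketch. This is not the authors' implementation; it is a toy example using standard Lee–Seung multiplicative updates, with hypothetical modality sizes and random non-negative data standing in for speech, image, and motion features. Each column of `V` concatenates the feature vectors of all modalities for one observation, so each learnt dictionary column in `W` is a multimodal pattern spanning all modalities at once.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for multimodal perception: each column of V concatenates
# non-negative feature vectors from three modalities (hypothetical sizes).
n_sound, n_image, n_motion, n_samples = 20, 30, 10, 50
V = rng.random((n_sound + n_image + n_motion, n_samples))

k = 5  # number of latent multimodal components ("concepts")
W = rng.random((V.shape[0], k))   # dictionary: one multimodal pattern per column
H = rng.random((k, n_samples))    # activation of each pattern in each sample

eps = 1e-9  # avoids division by zero in the updates
for _ in range(200):
    # Lee & Seung multiplicative updates for the Frobenius-norm objective;
    # both factors stay elementwise non-negative by construction.
    H *= (W.T @ V) / (W.T @ W @ H + eps)
    W *= (V @ H.T) / (W @ H @ H.T + eps)

# Relative reconstruction error of the rank-k non-negative approximation.
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

Because `W` spans all modalities jointly, the activation of a component inferred from one modality (e.g. a heard utterance) predicts the expected features in the others, which is the sense in which such components behave as grounded multimodal concepts.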

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Association Learning / physiology*
  • Cognition / physiology*
  • Computer Simulation*
  • Humans
  • Multimodal Imaging
  • Pattern Recognition, Visual / physiology
  • Speech / physiology

Grants and funding

This research was partially supported by the ERC Explorers 240007 funding (http://erc.europa.eu), Inria (http://inria.fr), ENSTA-Paristech (http://ensta-paristech.fr), and Université de Bordeaux (http://u-bordeaux.fr).