Neural scene representation and rendering

S M Ali Eslami; Danilo Jimenez Rezende; Frederic Besse; Fabio Viola; Ari S Morcos; Marta Garnelo; Avraham Ruderman; Andrei A Rusu; Ivo Danihelka; Karol Gregor; David P Reichert; Lars Buesing; Theophane Weber; Oriol Vinyals; Dan Rosenbaum; Neil Rabinowitz; Helen King; Chloe Hillier; Matt Botvinick; Daan Wierstra; Koray Kavukcuoglu; Demis Hassabis

doi:10.1126/science.aar6170


Science. 2018 Jun 15;360(6394):1204-1210. doi: 10.1126/science.aar6170.

Authors

S M Ali Eslami¹, Danilo Jimenez Rezende², Frederic Besse², Fabio Viola², Ari S Morcos², Marta Garnelo², Avraham Ruderman², Andrei A Rusu², Ivo Danihelka², Karol Gregor², David P Reichert², Lars Buesing², Theophane Weber², Oriol Vinyals², Dan Rosenbaum², Neil Rabinowitz², Helen King², Chloe Hillier², Matt Botvinick², Daan Wierstra², Koray Kavukcuoglu², Demis Hassabis²

Affiliations

¹ DeepMind, 5 New Street Square, London EC4A 3TW, UK. aeslami@google.com.
² DeepMind, 5 New Street Square, London EC4A 3TW, UK.

PMID: 29903970

Abstract

Scene representation, the process of converting visual sensory data into concise descriptions, is a requirement for intelligent behavior. Recent work has shown that neural networks excel at this task when provided with large, labeled datasets. However, removing the reliance on human labeling remains an important open problem. To this end, we introduce the Generative Query Network (GQN), a framework within which machines learn to represent scenes using only their own sensors. The GQN takes as input images of a scene taken from different viewpoints, constructs an internal representation, and uses this representation to predict the appearance of that scene from previously unobserved viewpoints. The GQN demonstrates representation learning without human labels or domain knowledge, paving the way toward machines that autonomously learn to understand the world around them.
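The data flow described in the abstract (several (image, viewpoint) observations in, one scene representation, a predicted image out for a new query viewpoint) can be illustrated with a toy sketch. This is not the authors' implementation. Every name here (ToyGQN, encode, represent, render) is hypothetical, the weights are fixed random matrices rather than trained networks, images are flat lists of floats, and the viewpoint is assumed to be a 7-number vector. The sketch sums per-view encodings so that the representation does not depend on the order of the views. It also replaces the paper's stochastic, convolutional generator with a single deterministic linear layer.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Observation = Tuple[Vector, Vector]  # (flattened image, viewpoint)


def random_matrix(rows: int, cols: int, rng: random.Random) -> List[Vector]:
    """Hypothetical stand-in for learned weights: a fixed random matrix."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(m: List[Vector], x: Sequence[float]) -> Vector:
    """Plain matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


class ToyGQN:
    """Toy illustration of the GQN data flow (untrained, deterministic)."""

    def __init__(self, image_dim: int, view_dim: int, repr_dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.repr_dim = repr_dim
        # Representation network: (image, viewpoint) -> per-view code.
        self.encoder = random_matrix(repr_dim, image_dim + view_dim, rng)
        # Generation network: (scene representation, query viewpoint) -> image.
        self.decoder = random_matrix(image_dim, repr_dim + view_dim, rng)

    def encode(self, image: Sequence[float], viewpoint: Sequence[float]) -> Vector:
        return [math.tanh(h) for h in matvec(self.encoder, list(image) + list(viewpoint))]

    def represent(self, observations: Sequence[Observation]) -> Vector:
        # Element-wise sum over views: r does not depend on the order of the
        # observations (up to floating-point rounding).
        r = [0.0] * self.repr_dim
        for image, viewpoint in observations:
            r = [a + b for a, b in zip(r, self.encode(image, viewpoint))]
        return r

    def render(self, r: Sequence[float], query_viewpoint: Sequence[float]) -> Vector:
        # Sigmoid keeps predicted pixel intensities in (0, 1).
        hidden = matvec(self.decoder, list(r) + list(query_viewpoint))
        return [1.0 / (1.0 + math.exp(-h)) for h in hidden]


# Usage: three random "views" of a scene, then a prediction for a new viewpoint.
gqn = ToyGQN(image_dim=16, view_dim=7, repr_dim=8)
rng = random.Random(1)
context = [
    ([rng.random() for _ in range(16)], [rng.uniform(-1.0, 1.0) for _ in range(7)])
    for _ in range(3)
]
r = gqn.represent(context)
prediction = gqn.render(r, [0.0] * 7)
print(len(prediction))  # 16
```

Summing the per-view codes lets the same network accept any number of context views. Training the encoder and generator end to end on held-out views would be needed before the prediction meant anything. The sketch leaves out that step.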

MeSH terms

Machine Learning*
Neural Networks, Computer*
Vision, Ocular*