Unified framework for recognition, localization and mapping using wearable cameras

Cogn Process. 2012 Aug:13 Suppl 1:S351-4. doi: 10.1007/s10339-012-0496-2.

Abstract

Monocular approaches to simultaneous localization and mapping (SLAM) have recently addressed with success the challenging problem of quickly computing dense reconstructions from a single, moving camera. While these approaches initially relied on the detection of a reduced set of interest points to estimate the camera pose and the map, they are now able to reconstruct dense maps from a handheld camera while the camera coordinates are simultaneously computed. However, these maps of three-dimensional points usually remain meaningless, that is, they contain no memorable items and provide no way of encoding spatial relationships between objects and paths. In humans and in mobile robotics, landmarks play a key role in the internalization of a spatial representation of an environment. They are memorable cues that can serve to define a region of space or the location of other objects. In a topological representation of space, landmarks can be identified and located according to their structural, perceptual or semantic significance and distinctiveness. On the other hand, landmarks may be difficult to locate in a metric representation of space. Restricted to the domain of visual landmarks, this work describes an approach in which the map resulting from point-based, monocular SLAM is annotated with the semantic information provided by a set of distinguished landmarks. Both kinds of features are obtained from the image; hence, they can be linked by associating with each landmark all the point-based features that are superimposed on that landmark in a given image (key-frame). Visual landmarks are obtained by means of an object-based, bottom-up attention mechanism, which extracts a set of proto-objects from the image. These proto-objects cannot always be associated with natural objects, but they typically constitute significant parts of scene objects and can be appropriately annotated with semantic information. Moreover, they are affine covariant regions, that is, invariant to affine transformations, so they can be detected under different viewing conditions (viewpoint angle, rotation, scale, etc.). Monocular SLAM is solved using the parallel tracking and mapping (PTAM) framework of Klein and Murray (Proceedings of the IEEE/ACM International Symposium on Mixed and Augmented Reality, 2007).
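
The landmark-to-map-point association described above can be illustrated with a minimal sketch. It assumes hypothetical data structures for SLAM map points (each carrying its projection into the current key-frame) and for proto-object landmarks represented by a bounding box; these names and the bounding-box representation are illustrative assumptions, not the paper's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class MapPoint:
    pid: int            # map point identifier
    xyz: tuple          # 3-D position in the SLAM map (assumed available)
    uv: tuple           # projection (pixel coordinates) in the current key-frame

@dataclass
class ProtoObject:
    label: str                              # semantic annotation of the landmark
    bbox: tuple                             # (u_min, v_min, u_max, v_max) region in the key-frame
    point_ids: set = field(default_factory=set)

def annotate_landmarks(map_points, proto_objects):
    """Attach to each proto-object landmark every map point whose
    key-frame projection falls inside the landmark's image region."""
    for po in proto_objects:
        u0, v0, u1, v1 = po.bbox
        for mp in map_points:
            u, v = mp.uv
            if u0 <= u <= u1 and v0 <= v <= v1:
                po.point_ids.add(mp.pid)
    return proto_objects
```

In this sketch a simple point-in-rectangle test stands in for the test of whether a point feature is superimposed on the landmark region; an actual affine covariant region would call for a point-in-region test against the detected region's shape.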

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Humans
  • Imaging, Three-Dimensional
  • Pattern Recognition, Visual / physiology*
  • Photic Stimulation
  • Photography / methods*
  • Recognition, Psychology / physiology*
  • Signal Detection, Psychological*
  • Video Recording