3-D object recognition using 2-D views

IEEE Trans Image Process. 2008 Nov;17(11):2236-55. doi: 10.1109/TIP.2008.2003404.

Abstract

We consider the problem of recognizing 3-D objects from 2-D images using geometric models and assuming different viewing angles and positions. Our goal is to recognize and localize instances of specific objects (i.e., model-based) in a scene. This is in contrast to category-based object recognition methods where the goal is to search for instances of objects that belong to a certain visual category (e.g., faces or cars). The key contribution of our work is improving 3-D object recognition by integrating Algebraic Functions of Views (AFoVs), a powerful framework for predicting the geometric appearance of an object due to viewpoint changes, with indexing and learning. During training, we compute the space of views that groups of object features can produce under the assumption of 3-D linear transformations, by combining a small number of reference views that contain the object features using AFoVs. Unrealistic views (e.g., due to the assumption of 3-D linear transformations) are eliminated by imposing a pair of rigidity constraints based on knowledge of the transformation between the reference views of the object. To represent the space of views that an object can produce compactly while allowing efficient hypothesis generation during recognition, we propose combining indexing with learning in two stages. In the first stage, we sample the space of views of an object sparsely and represent information about the samples using indexing. In the second stage, we build probabilistic models of shape appearance by sampling the space of views of the object densely and learning the manifold formed by the samples. Learning employs the Expectation-Maximization (EM) algorithm and takes place in a "universal," lower-dimensional, space computed through Random Projection (RP). During recognition, we extract groups of point features from the scene and we use indexing to retrieve the most feasible model groups that might have produced them (i.e., hypothesis generation). 
The likelihood of each hypothesis is then computed using the probabilistic models of shape appearance. Only hypotheses ranked high enough are considered for further verification, with the most likely hypotheses verified first. The proposed approach has been evaluated using both artificial and real data, illustrating promising performance. We also present preliminary results illustrating extensions of the AFoVs framework to predict the intensity appearance of an object. In this context, we have built a hybrid recognition framework that exploits geometric knowledge to hypothesize the location of an object in the scene and both geometric and intensity information to verify the hypotheses.
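
The view-prediction idea behind AFoVs can be illustrated with a minimal sketch. The abstract does not give the paper's exact equations, so the code below assumes the commonly used linear-combination-of-views form under orthographic projection: each image coordinate in a novel view is written as a linear combination of the coordinates x1, y1 from one reference view, x2 from a second reference view, and a constant. All function names, the synthetic point set, and the random poses are hypothetical and chosen only for illustration; the rigidity constraints, indexing, Random Projection, and EM stages are not shown.

```python
import numpy as np


def orthographic_view(points_3d, linear_map, translation):
    """Apply a 3-D linear map plus translation, then drop depth (orthographic projection)."""
    transformed = points_3d @ linear_map.T + translation
    return transformed[:, :2]


def afov_design_matrix(ref1, ref2):
    """Build rows [x1, y1, x2, 1], the basis assumed here for combining two reference views."""
    return np.column_stack(
        [ref1[:, 0], ref1[:, 1], ref2[:, 0], np.ones(len(ref1))]
    )


def fit_afov_coefficients(ref1, ref2, novel):
    """Least-squares estimate of the 4 x 2 coefficient matrix mapping the basis to a novel view."""
    design = afov_design_matrix(ref1, ref2)
    coeffs, *_ = np.linalg.lstsq(design, novel, rcond=None)
    return coeffs


def predict_view(ref1, ref2, coeffs):
    """Predict novel-view image coordinates from the two reference views."""
    return afov_design_matrix(ref1, ref2) @ coeffs


def random_pose(rng):
    """Hypothetical pose: an arbitrary 3-D linear map and translation (not necessarily rigid)."""
    return rng.normal(size=(3, 3)), rng.normal(size=3)


rng = np.random.default_rng(0)
points = rng.normal(size=(12, 3))  # synthetic 3-D object feature points

# Two reference views and one novel view of the same points.
ref1 = orthographic_view(points, *random_pose(rng))
ref2 = orthographic_view(points, *random_pose(rng))
novel = orthographic_view(points, *random_pose(rng))

# Fit the coefficients on the first five correspondences only,
# then predict where the remaining seven points appear in the novel view.
coeffs = fit_afov_coefficients(ref1[:5], ref2[:5], novel[:5])
predicted = predict_view(ref1[5:], ref2[5:], coeffs)
max_error = float(np.max(np.abs(predicted - novel[5:])))
print("max prediction error:", max_error)
```

In this noise-free setting the held-out points should be predicted up to numerical precision, which is what makes it possible to generate the space of views from a small number of reference views. Because the poses here are arbitrary linear maps, some of the views this form can produce are not physically realizable for a rigid object; that is the situation the rigidity constraints mentioned in the abstract are meant to rule out.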

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms*
  • Artificial Intelligence*
  • Image Enhancement / methods*
  • Image Interpretation, Computer-Assisted / methods*
  • Imaging, Three-Dimensional / methods*
  • Pattern Recognition, Automated / methods*
  • Reproducibility of Results
  • Sensitivity and Specificity