Capturing Spatial Interdependence in Image Features: The Counting Grid, an Epitomic Representation for Bags of Features

IEEE Trans Pattern Anal Mach Intell. 2015 Dec;37(12):2374-87. doi: 10.1109/TPAMI.2015.2424864.

Abstract

In recent scene recognition research, images or large image regions are often represented as disorganized "bags" of features, which can then be analyzed using models originally developed to capture co-variation of word counts in text. However, image feature counts are likely to be constrained in different ways than word counts in text. For example, as a camera pans upwards from a building entrance over its first few floors and then further up into the sky (Fig. 1), feature counts change slightly as the field of view moves: the abundance of "car" features is reduced, while the counts of features found on building facades increase. The counting grid model accounts for such changes naturally, and it can also account for images of different scenes.
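
The panning example can be made concrete with a small toy sketch. This is not the counting grid model itself, only an illustration of the constraint it is meant to capture: when a window slides over a fixed scene, the bag-of-features counts of neighboring views differ only slightly. The scene layout, the feature names, and the helper functions `window_counts` and `pan_upwards` are all hypothetical and chosen purely for illustration.

```python
from collections import Counter

# Hypothetical toy scene: one feature label per row, listed from ground level up.
# Cars sit at street level, the facade occupies the middle rows, sky is on top.
SCENE = ["car"] * 3 + ["facade"] * 6 + ["sky"] * 5


def window_counts(scene, start, height):
    """Return the bag-of-features counts for the rows covered by one window."""
    return Counter(scene[start:start + height])


def pan_upwards(scene, height):
    """Slide a fixed-height window one row at a time from the bottom to the top."""
    last_start = len(scene) - height
    return [window_counts(scene, start, height) for start in range(last_start + 1)]


if __name__ == "__main__":
    # Print the feature counts seen at each window position.
    for start, counts in enumerate(pan_upwards(SCENE, height=5)):
        print(start, dict(counts))
```

In this toy setting each one-row shift drops one row and adds one row, so every feature count changes by at most one between consecutive views: "car" counts fall as the window leaves street level, "facade" counts rise and then fall, and "sky" counts take over at the top.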