The Linguistic Analysis of Scene Semantics: LASS

Behav Res Methods. 2020 Dec;52(6):2349-2371. doi: 10.3758/s13428-020-01390-8.

Abstract

In this paper, we define a new method for analyzing object-scene contextual relationships using computational linguistics: Linguistic Analysis of Scene Semantics, or LASS. LASS uses linguistic semantic similarity relationships between scene object and context labels embedded in a vector-space language model: Facebook Research's fastText. Importantly, the use of fastText permits semantic similarity score calculation between any set of strings and thus elements of any set of image data for which labels are available. Scene semantic similarity scores are then embedded in object segmentation mask locations in the image, creating a semantic similarity map. LASS can also be fully automated by generating context and object labels, as well as object segmentation masks, using deep learning. We compare semantic similarity maps between human- and neural network-generated annotations on a corpus of images taken from the LabelMe database. Semantic similarity maps produced by the fully automated LASS have a number of desirable properties, while maintaining a high degree of spatial and semantic similarity to maps produced from human-generated annotations. Finally, we use LASS to evaluate the distribution of semantically consistent scene elements in space. Maps from both annotation sources show relatively uniform distributions of semantic relatedness to scene context, suggesting that contextually appropriate objects are likely to be found in all image regions. Taken together, these results suggest that LASS is accurate, automatic, flexible, and useful in a number of research contexts such as scene grammar and novelty detection.
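The map-building step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes cosine similarity as the similarity score (the abstract does not name the metric), and it uses small hand-made vectors in place of fastText embeddings so that it runs without a model file. The function names, labels, and toy vectors are all hypothetical.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors (assumed metric)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def semantic_similarity_map(shape, masks, object_vectors, context_vector):
    """Write each object's similarity to the scene context into its mask.

    shape          -- (height, width) of the image
    masks          -- dict: object label -> boolean array of that shape
    object_vectors -- dict: object label -> embedding vector
    context_vector -- embedding vector of the scene context label
    Pixels outside every mask stay at 0; where masks overlap, the label
    processed last overwrites earlier ones (a simplification).
    """
    sim_map = np.zeros(shape, dtype=float)
    for label, mask in masks.items():
        score = cosine_similarity(object_vectors[label], context_vector)
        sim_map[mask] = score
    return sim_map


# Toy stand-ins for word embeddings of a context label and two object labels.
context_vector = np.array([1.0, 0.0, 0.0])          # e.g. "kitchen"
object_vectors = {
    "stove": np.array([2.0, 0.0, 0.0]),             # same direction as context
    "surfboard": np.array([0.0, 1.0, 0.0]),         # orthogonal to context
}

# Two non-overlapping rectangular masks on a 4x4 image.
masks = {
    "stove": np.zeros((4, 4), dtype=bool),
    "surfboard": np.zeros((4, 4), dtype=bool),
}
masks["stove"][:2, :2] = True
masks["surfboard"][2:, 2:] = True

sim_map = semantic_similarity_map((4, 4), masks, object_vectors, context_vector)
print(sim_map)
```

In practice the toy vectors would be replaced by embeddings of the label strings obtained from a pretrained language model, and the masks by human-drawn or network-generated segmentations.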

Keywords: Computational linguistics; Natural scenes; Scene semantics.

MeSH terms

  • Databases, Factual
  • Humans
  • Language
  • Linguistics*
  • Semantics*