Using Variational Multi-view Learning for Classification of Grocery Items

Patterns (N Y). 2020 Nov 13;1(8):100143. doi: 10.1016/j.patter.2020.100143.

Abstract

An essential task for computer vision-based assistive technologies is to help visually impaired people recognize objects in constrained environments, for instance, recognizing food items in grocery stores. In this paper, we introduce a novel dataset with natural images of groceries (fruits, vegetables, and packaged products) where all images have been taken inside grocery stores to resemble a shopping scenario. Additionally, we download iconic images and text descriptions for each item that can be utilized for better representation learning of groceries. We employ a multi-view generative model that combines the different sources of item information into lower-dimensional representations. The experiments show that utilizing the additional information yields higher accuracy in classifying grocery items than using the natural images alone. We observe that iconic images help construct representations that separate items by their visual differences, while text descriptions enable the model to distinguish between visually similar items by their differing ingredients.
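The multi-view fusion described above can be sketched with a product-of-experts combination of per-view Gaussian posteriors, a common choice in multimodal variational models. This is a minimal illustrative sketch, not the paper's implementation: all dimensions, weight initializations, and feature extractors are hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, chosen only for illustration.
D_IMG, D_TXT, D_LATENT = 512, 300, 32

# Per-view linear encoders producing Gaussian parameters (mu, log-variance).
W_img = rng.normal(scale=0.01, size=(D_IMG, 2 * D_LATENT))
W_txt = rng.normal(scale=0.01, size=(D_TXT, 2 * D_LATENT))

def encode(x, W):
    """Map one view's feature vector to Gaussian parameters over the latent."""
    h = x @ W
    return h[:D_LATENT], h[D_LATENT:]  # (mu, logvar)

def product_of_experts(params):
    """Fuse per-view Gaussians into one posterior (precision-weighted mean)."""
    precisions = [np.exp(-lv) for _, lv in params]
    total_prec = sum(precisions) + 1.0  # +1 accounts for a standard-normal prior
    mu = sum(p * m for (m, _), p in zip(params, precisions)) / total_prec
    return mu, -np.log(total_prec)

# Stand-ins for real features: e.g. CNN features of a natural image and a
# bag-of-words vector of the product's text description.
img_feat = rng.normal(size=D_IMG)
txt_feat = rng.normal(size=D_TXT)

mu, logvar = product_of_experts([encode(img_feat, W_img),
                                 encode(txt_feat, W_txt)])
print(mu.shape)  # the fused latent representation: (32,)
```

Because the fusion is a product of Gaussians, any subset of views (only the natural image, or image plus text) can be encoded with the same function, which matches the setting where extra item information is available at training time but not necessarily at test time.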

Keywords: DSML 2: Proof-of-Concept: Data science output has been formulated, implemented, and tested for one domain/problem.