Supervised Deep Learning Techniques for Image Description: A Systematic Review

Entropy (Basel). 2023 Mar 23;25(4):553. doi: 10.3390/e25040553.

Abstract

Automatic image description, also known as image captioning, aims to describe the elements in an image and the relationships among them. Because the task spans two research fields, computer vision and natural language processing, it has received considerable attention in computer science. In this review, we follow the Kitchenham review methodology to present the most relevant deep learning approaches to image description. We focus on works that use convolutional neural networks (CNNs) to extract image features and recurrent neural networks (RNNs) to generate sentences automatically. We selected 53 research articles that use the encoder-decoder approach, restricting the selection to supervised learning. The main contributions of this systematic review are (i) to describe the most relevant image description papers implementing an encoder-decoder approach from 2014 to 2022 and (ii) to determine the main architectures, datasets, and metrics that have been applied to image description.

Keywords: computer vision; convolutional neural network; image captioning; natural language processing; recurrent neural network.

Publication types

  • Review

Grants and funding

This research received no external funding.