Uncertainty maximization in partially observable domains: A cognitive perspective

Neural Netw. 2023 May:162:456-471. doi: 10.1016/j.neunet.2023.02.044. Epub 2023 Mar 10.

Abstract

Faced with the ever-increasing complexity of their application domains, artificial learning agents can now scale up to process overwhelming amounts of data. However, this comes at the cost of encoding and processing a growing amount of redundant information. This work exploits the ability of learning systems, applied in partially observable domains, to selectively focus on the type of information most likely related to the causal interaction among transitioning states. A temporal difference displacement criterion is defined to implement adaptive masking of the observations. It can significantly improve the convergence of temporal difference algorithms applied to partially observable Markov processes, as shown by experiments performed on a variety of machine learning problems, ranging from visually complex domains such as Atari games to simple textbook control problems such as CartPole. The proposed framework can be added to most RL algorithms, since it only affects the observation process: it selects the parts most promising for explaining the dynamics of the environment and reduces the dimension of the observation space.
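The abstract does not give the criterion's exact form, but the core idea of masking observation dimensions by a TD-weighted displacement score can be sketched as follows. This is a hypothetical reading, not the authors' implementation: the function name `td_displacement_mask`, the `keep_fraction` parameter, and the specific scoring rule (TD error magnitude times per-dimension change) are all illustrative assumptions.

```python
import numpy as np

def td_displacement_mask(obs, next_obs, td_error, keep_fraction=0.25):
    """Hypothetical sketch of a TD-displacement masking criterion.

    Scores each observation dimension by how much it changed across the
    transition, weighted by the TD error magnitude, and keeps only the
    top-scoring fraction. Dimensions that barely change, or change when
    the TD error is small, are treated as redundant and masked out.
    """
    displacement = np.abs(next_obs - obs)        # per-dimension change
    score = np.abs(td_error) * displacement      # TD-weighted displacement
    k = max(1, int(keep_fraction * obs.size))    # number of dims to keep
    thresh = np.partition(score.ravel(), -k)[-k] # k-th largest score
    return score >= thresh                       # boolean keep-mask

# Usage on a CartPole-like 4-dimensional observation:
obs = np.array([0.01, 0.30, -0.02, -0.5])
next_obs = np.array([0.02, 0.25, -0.03, -0.9])
mask = td_displacement_mask(obs, next_obs, td_error=0.8, keep_fraction=0.5)
masked_obs = np.where(mask, next_obs, 0.0)  # zero out masked dimensions
```

Because the mask only filters the observation before it reaches the learner, a wrapper of this kind can sit in front of most value-based or policy-gradient agents without modifying their update rules, which is consistent with the paper's claim that the framework only affects the observation process.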

Keywords: Attention mechanisms and development; Cognitive modeling; Entropy; Neural networks for development; Partially observable Markov decision process; Reinforcement learning.

MeSH terms

  • Algorithms*
  • Cognition
  • Machine Learning*
  • Markov Chains
  • Uncertainty