The deep arbitrary polynomial chaos neural network or how Deep Artificial Neural Networks could benefit from data-driven homogeneous chaos theory

Neural Netw. 2023 Sep:166:85-104. doi: 10.1016/j.neunet.2023.06.036. Epub 2023 Jul 10.

Abstract

Artificial Intelligence and Machine Learning are widely used in mathematical computing, physical modeling, computational science, communication science, and stochastic analysis. Approaches based on Deep Artificial Neural Networks (DANNs) are especially popular nowadays. Depending on the learning task, the exact form of a DANN is determined by its multi-layer architecture, activation functions, and the so-called loss function. However, for the majority of deep learning approaches based on DANNs, the kernel structure of neural signal processing remains the same: the node response is encoded as a linear superposition of neural activity, while non-linearity is triggered by the activation functions. In the current paper, we suggest analyzing the neural signal processing in DANNs from the point of view of homogeneous chaos theory as known from polynomial chaos expansion (PCE). From the PCE perspective, the (linear) response at each node of a DANN can be seen as a first-degree multi-variate polynomial of single neurons from the previous layer, i.e., a linear weighted sum of monomials. From this point of view, the conventional DANN structure relies implicitly (but erroneously) on a Gaussian distribution of neural signals. Additionally, this view reveals that, by design, DANNs do not necessarily fulfill any orthogonality or orthonormality condition for the majority of data-driven applications. Therefore, the prevailing handling of neural signals in DANNs can lead to redundant representations, as any neural signal may contain partial information from other neural signals.

To tackle that challenge, we suggest employing the data-driven generalization of PCE theory known as arbitrary polynomial chaos (aPC) to construct corresponding multi-variate orthonormal representations at each node of a DANN. Doing so, we generalize the conventional DANN structure to the Deep arbitrary polynomial chaos neural network (DaPC NN). It decomposes the neural signals traveling through the multi-layer structure by adaptively constructing data-driven multi-variate orthonormal bases for each layer. Moreover, the introduced DaPC NN provides an opportunity to go beyond the linear weighted superposition of single neurons at each node. Inheriting the fundamentals of PCE theory, the DaPC NN offers an additional possibility to account for high-order neural effects that reflect simultaneous interactions in multi-layer networks. Introducing this high-order weighted superposition at each node mitigates the need to introduce non-linearity via activation functions and hence reduces the room for subjectivity in the modeling procedure, although the DaPC NN framework imposes no theoretical restriction on the use of activation functions. The current paper also summarizes relevant properties that DaPC NNs inherit from aPC, such as analytical expressions for statistical quantities and sensitivity indices at each node, and offers an analytical form of the partial derivatives that can be used in various training algorithms. Technically, DaPC NNs require training procedures similar to those of conventional DANNs, and the trained weights automatically determine the corresponding multi-variate data-driven orthonormal bases for all layers of the DaPC NN. The paper makes use of three test cases to illustrate the performance of the DaPC NN, comparing it against a conventional DANN and against a plain aPC expansion.
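The abstract contains no code, but the moment-based aPC construction it references can be illustrated compactly. The NumPy sketch below (an illustrative assumption following standard aPC theory, not the authors' Matlab Toolbox) builds a univariate basis orthonormal with respect to the empirical data distribution via a Cholesky factorization of the Hankel moment matrix, then assembles a second-order node response with interaction terms; the function names, the two-input toy setup, and the choice of sample distributions are all hypothetical.

    # Minimal sketch: data-driven orthonormal basis (aPC-style) and a
    # high-order polynomial-chaos node response. Illustrative only.
    import numpy as np

    def apc_basis(samples, degree):
        """Coefficients of univariate polynomials orthonormal w.r.t.
        the empirical distribution of `samples`."""
        # Hankel matrix of raw sample moments: M[i, j] = E[x^(i+j)]
        m = [np.mean(samples ** k) for k in range(2 * degree + 1)]
        M = np.array([[m[i + j] for j in range(degree + 1)]
                      for i in range(degree + 1)])
        # With M = L L^T, the rows C = inv(L) satisfy C M C^T = I,
        # i.e. E[p_i(x) p_j(x)] = delta_ij for p_k(x) = sum_j C[k, j] x^j.
        return np.linalg.inv(np.linalg.cholesky(M))

    def eval_basis(C, x):
        """Evaluate p_0..p_d at points x; returns shape (len(x), d+1)."""
        powers = np.vander(x, C.shape[1], increasing=True)  # x^0 .. x^d
        return powers @ C.T

    # A "node" as a 2nd-degree chaos expansion of two incoming signals:
    # constant, linear, quadratic and interaction terms replace the
    # conventional linear weighted sum.
    rng = np.random.default_rng(0)
    x1 = rng.gamma(2.0, size=1000)        # deliberately non-Gaussian
    x2 = rng.lognormal(size=1000)
    P1 = eval_basis(apc_basis(x1, 2), x1)
    P2 = eval_basis(apc_basis(x2, 2), x2)
    Phi = np.stack([P1[:, 0] * P2[:, 0],          # constant
                    P1[:, 1] * P2[:, 0],          # linear in x1
                    P1[:, 0] * P2[:, 1],          # linear in x2
                    P1[:, 2] * P2[:, 0],          # quadratic in x1
                    P1[:, 1] * P2[:, 1],          # x1-x2 interaction
                    P1[:, 0] * P2[:, 2]], axis=1) # quadratic in x2
    w = rng.normal(size=6)                 # trainable node weights
    node_output = Phi @ w                  # high-order superposition
    print(np.round(Phi.T @ Phi / len(x1), 2))  # Gram matrix ~ identity

The printed Gram matrix is the identity up to sampling noise: exactly so within each input (the basis is built from the same sample moments), and approximately so across inputs because the samples are independent.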
Evidence of convergence over training-data size, assessed against validation data sets, demonstrates that the DaPC NN systematically outperforms the conventional DANN. Overall, the suggested re-formulation of the kernel network structure in terms of homogeneous chaos theory is not limited to any particular architecture or any particular definition of the loss function. The DaPC NN Matlab Toolbox is available online, and users are invited to adapt it to their own needs.
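The abstract also mentions analytical expressions for statistical quantities and sensitivity indices at each node. For an orthonormal basis with Phi_0 = 1, these follow directly from the node weights; the sketch below assumes the multi-index layout of the previous example and is not the toolbox's API.

    # Analytical statistics of a node response y = sum_k w[k] * Phi_k(x),
    # assuming an orthonormal basis with Phi_0 = 1 (illustrative layout).
    import numpy as np

    w = np.array([0.7, 1.2, -0.5, 0.3, 0.8, -0.2])         # example weights
    alphas = [(0, 0), (1, 0), (0, 1), (2, 0), (1, 1), (0, 2)]  # degrees per input

    mean = w[0]                  # E[y]  = coefficient of Phi_0
    var = np.sum(w[1:] ** 2)     # Var[y] = sum of remaining squared weights

    def sobol_first(i):
        """First-order Sobol index of input i: variance share carried by
        terms that involve input i alone."""
        mask = np.array([a[i] > 0 and sum(a) == a[i] for a in alphas])
        return np.sum(w[mask] ** 2) / var

    print(mean, var, sobol_first(0), sobol_first(1))

This is the standard PCE bookkeeping: orthonormality makes the mean, variance, and variance decomposition read off the coefficients without any additional sampling.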

Keywords: Artificial Intelligence; Deep Artificial Neural Network; High-order neural interactions; Machine learning; Orthogonal decomposition; Polynomial chaos expansion.

MeSH terms

  • Algorithms
  • Artificial Intelligence*
  • Neural Networks, Computer
  • Neurons
  • Nonlinear Dynamics*