Parallel orthogonal deep neural network

Neural Netw. 2021 Aug:140:167-183. doi: 10.1016/j.neunet.2021.03.002. Epub 2021 Mar 13.

Abstract

Ensemble learning methods combine multiple models to improve performance by exploiting their diversity. The success of these approaches relies heavily on the dissimilarity of the base models forming the ensemble. This diversity can be achieved in many ways, with well-known examples including bagging and boosting. It is the diversity of the models within an ensemble that allows the ensemble to correct the errors made by its members, and consequently leads to higher classification or regression performance. A mistake made by a base model can only be rectified if other members behave differently on that particular instance and provide the aggregator with enough information to make an informed decision. Conversely, a lack of diversity not only lowers model performance but also wastes computational resources. Nevertheless, in current state-of-the-art ensemble approaches, there is no guarantee of the level of diversity achieved, and no mechanism ensuring that each member will learn a decision boundary different from the others. In this paper, we propose a parallel orthogonal deep learning architecture in which diversity is enforced by design, by imposing an orthogonality constraint. Multiple deep neural networks are created, parallel to each other. At each parallel layer, the outputs of the different base models are subject to Gram-Schmidt orthogonalization. We demonstrate that this approach leads to a high level of diversity from two perspectives: first, the models make different errors on different parts of the feature space, and second, they exhibit different levels of uncertainty in their decisions. Experimental results confirm the benefits of the proposed method, compared to standard deep learning models and well-known ensemble methods, in terms of diversity and, as a result, classification performance.
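The orthogonalization step named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say whether orthogonalization is applied per sample or per batch, whether vectors are normalized, or how gradients pass through the step. The code below assumes the simplest reading, in which the output vectors of the parallel base models at one layer, for a single input, are orthogonalized in order using modified Gram-Schmidt. The names `gram_schmidt`, `dot`, and `layer_outputs`, and the example values, are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(outputs, eps=1e-12):
    """Orthogonalize a list of equal-length vectors, in order.

    Modified Gram-Schmidt without normalization: each vector has its
    projections onto all previously processed vectors subtracted.
    A vector that is linearly dependent on earlier ones collapses to
    (approximately) the zero vector and is skipped as a projection target.
    """
    basis = []
    for v in outputs:
        w = list(v)
        for u in basis:
            uu = dot(u, u)
            if uu > eps:  # skip near-zero vectors to avoid division by zero
                coef = dot(w, u) / uu
                w = [wi - coef * ui for wi, ui in zip(w, u)]
        basis.append(w)
    return basis


# Hypothetical outputs of three parallel base models at one layer,
# for a single input (assumed values, for illustration only).
layer_outputs = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 1.0, 1.0],
]

orthogonal = gram_schmidt(layer_outputs)

# After the step, every pair of model outputs has a (near-)zero inner product.
for i in range(len(orthogonal)):
    for j in range(i + 1, len(orthogonal)):
        print(i, j, round(dot(orthogonal[i], orthogonal[j]), 12))
```

Under this reading, the first model's output is left unchanged and each later model is pushed away from the directions already occupied by the earlier ones, which is one way an orthogonality constraint can force the members to represent an input differently. The number of mutually orthogonal nonzero outputs cannot exceed the layer width, so the sketch assumes at least as many units per layer as parallel models.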

Keywords: Deep learning; Diversity; Ensemble learning; Orthogonalization; Uncertainty.

MeSH terms

  • Deep Learning*