Local minima and plateaus in hierarchical structures of multilayer perceptrons

Neural Netw. 2000 Apr;13(3):317-27. doi: 10.1016/s0893-6080(00)00009-5.

Abstract

Local minima and plateaus pose a serious problem in the learning of neural networks. We investigate the hierarchical geometric structure of the parameter space of three-layer perceptrons in order to show the existence of local minima and plateaus. It is proved that a critical point of the model with H - 1 hidden units always gives rise to many critical points of the model with H hidden units. These critical points form many lines in the parameter space, which can cause plateaus in the learning of neural networks. Based on this result, we prove that a point on the critical lines corresponding to the global minimum of the smaller model can be a local minimum or a saddle point of the larger model. We give a necessary and sufficient condition for this, and show that local minima of this kind, if they exist, form a line segment. The results are universal in the sense that they do not require special properties of the target, the loss function, or the activation function, but use only the hierarchical structure of the model.
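
The way a smaller model sits inside a larger one can be illustrated numerically. The sketch below is not taken from the abstract: it assumes one common embedding, in which a hidden unit is duplicated and its output weight v is split into lam * v and (1 - lam) * v, and it assumes a tanh activation and a squared-error loss purely for concreteness. All function names are hypothetical. Under these assumptions, every value of lam gives the same input-output function, and each gradient component of the larger model is a multiple of a gradient component of the smaller one, so a critical point of the smaller model would map to a whole line of critical points.

```python
import numpy as np

def mlp(x, W, b, v, c):
    """Three-layer perceptron. x: (N, d), W: (H, d), b: (H,), v: (H,), c: scalar."""
    return np.tanh(x @ W.T + b) @ v + c

def grad(x, y, W, b, v, c):
    """Gradient of 0.5 * mean((f(x) - y)^2) with respect to (W, b, v, c)."""
    h = np.tanh(x @ W.T + b)            # hidden activations, (N, H)
    r = (h @ v + c - y) / len(y)        # scaled residuals, (N,)
    gz = np.outer(r, v) * (1.0 - h**2)  # backprop through tanh, (N, H)
    return gz.T @ x, gz.sum(axis=0), h.T @ r, r.sum()

def split_last_unit(W, b, v, lam):
    """Embed an (H-1)-unit model into an H-unit model (assumed construction).

    The last hidden unit is copied; its output weight v[-1] is shared
    between the two copies as lam * v[-1] and (1 - lam) * v[-1].
    """
    W2 = np.vstack([W, W[-1:]])
    b2 = np.append(b, b[-1])
    v2 = np.append(v, (1.0 - lam) * v[-1])
    v2[-2] = lam * v[-1]
    return W2, b2, v2

# Arbitrary synthetic data and a small model with 2 hidden units.
rng = np.random.default_rng(0)
x, y = rng.normal(size=(50, 3)), rng.normal(size=50)
W, b = rng.normal(size=(2, 3)), rng.normal(size=2)
v, c = rng.normal(size=2), 0.1
gW, gb, gv, gc = grad(x, y, W, b, v, c)

def check_line(lam):
    """Return (same function?, gradients related as expected?) for one lam."""
    W2, b2, v2 = split_last_unit(W, b, v, lam)
    same_f = np.allclose(mlp(x, W, b, v, c), mlp(x, W2, b2, v2, c))
    gW2, gb2, gv2, gc2 = grad(x, y, W2, b2, v2, c)
    related = (
        np.allclose(gW2[-2], lam * gW[-1])
        and np.allclose(gW2[-1], (1.0 - lam) * gW[-1])
        and np.allclose(gb2[-2], lam * gb[-1])
        and np.allclose(gb2[-1], (1.0 - lam) * gb[-1])
        and np.allclose(gv2[-2:], gv[-1])      # both copies share this value
        and np.allclose(gW2[:-2], gW[:-1])     # untouched units are unchanged
        and np.isclose(gc2, gc)
    )
    return bool(same_f), bool(related)

for lam in (-0.5, 0.3, 1.7):
    print(lam, check_line(lam))
```

Because every gradient component of the larger model is a scalar multiple of one from the smaller model, a zero gradient in the smaller model implies a zero gradient along the entire line indexed by lam. Whether a given point on that line is a local minimum or a saddle is the question the necessary and sufficient condition in the paper addresses; this sketch does not test it.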

MeSH terms

  • Models, Neurological*
  • Neural Networks, Computer*