Designing Universally-Approximating Deep Neural Networks: A First-Order Optimization Approach

Zhoutong Wu; Mingqing Xiao; Cong Fang; Zhouchen Lin

doi:10.1109/TPAMI.2024.3380007

IEEE Trans Pattern Anal Mach Intell. 2024 Mar 25:PP. doi: 10.1109/TPAMI.2024.3380007. Online ahead of print.

PMID: 38526901

Abstract

Universal approximation capability, also referred to as universality, is an important property of deep neural networks, endowing them with the capacity to accurately represent the underlying target function in learning tasks. In practice, the architecture of a deep neural network largely determines the performance of the model. However, most existing methodologies for designing neural architectures, such as heuristic manual design or neural architecture search, ignore the universal approximation property and thus lose a potential safeguard on performance. In this paper, we propose a unified framework, based on first-order optimization algorithms, for designing the architectures of deep neural networks with a universality guarantee, where the forward pass is interpreted as the updates of an optimization algorithm. The (explicit or implicit) network is designed by replacing each gradient term in the algorithm with a learnable module similar to a two-layer network or its derivatives. Specifically, we explore the realm of width-bounded neural networks, a common practical scenario, and show that they are universal. Moreover, adding normalization, downsampling, and upsampling operations does not hurt the universality. To the best of our knowledge, this is the first work showing that width-bounded networks with a universal approximation guarantee can be designed in a principled way. Our framework can inspire a variety of neural architectures, including renowned structures such as ResNet and DenseNet as well as novel ones. Experimental results on image classification problems demonstrate that the newly inspired networks are competitive and surpass the baselines of ResNet and DenseNet, as well as the more advanced ConvNeXt and ViT, supporting the effectiveness of our framework.
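
The design principle in the abstract (read the forward pass as the iterations of a first-order algorithm, then swap each gradient term for a learnable two-layer-style module) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes plain gradient descent as the underlying algorithm and a ReLU two-layer module with the step size absorbed into its weights. The names `two_layer_module`, `gradient_descent_step`, `unrolled_forward`, and `init_params` are hypothetical and chosen only for illustration. Under these assumptions, each layer becomes a residual-style update of fixed width.

```python
import numpy as np


def two_layer_module(x, W1, b1, W2):
    """Hypothetical learnable stand-in for a gradient term: W2 @ relu(W1 @ x + b1)."""
    return W2 @ np.maximum(W1 @ x + b1, 0.0)


def gradient_descent_step(x, grad_f, step_size):
    """One plain gradient-descent update: x - step_size * grad_f(x)."""
    return x - step_size * grad_f(x)


def unrolled_forward(x, params):
    """Forward pass read as unrolled gradient descent.

    Each layer performs x <- x - T_k(x), where the learnable module T_k
    replaces the scaled gradient term (assumption: the step size is
    absorbed into W2). The result is a residual-style block whose width
    stays equal to the input dimension.
    """
    for W1, b1, W2 in params:
        x = x - two_layer_module(x, W1, b1, W2)
    return x


def init_params(depth, dim, hidden, seed=0):
    """Random parameters for `depth` layers (illustrative initialization only)."""
    rng = np.random.default_rng(seed)
    return [
        (
            rng.normal(scale=0.1, size=(hidden, dim)),
            np.zeros(hidden),
            rng.normal(scale=0.1, size=(dim, hidden)),
        )
        for _ in range(depth)
    ]
```

As a sanity check on the correspondence, take f(x) = 0.5 * ||x||^2, whose gradient is x. Because x = relu(x) - relu(-x), setting W1 to the identity stacked on its negative, b1 to zero, and W2 to the step size times [I, -I] makes `unrolled_forward` reproduce gradient descent on f exactly. Once the weights are trained freely instead, the layers no longer correspond to any fixed objective.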