PWLU: Learning Specialized Activation Functions With the Piecewise Linear Unit

IEEE Trans Pattern Anal Mach Intell. 2023 Oct;45(10):12269-12286. doi: 10.1109/TPAMI.2023.3286109. Epub 2023 Sep 5.

Abstract

The choice of activation function is crucial to deep neural networks. ReLU is a popular hand-designed activation function, while Swish, an automatically searched activation function, outperforms ReLU on many challenging datasets. However, the search method behind Swish has two main drawbacks. First, its tree-based search space is highly discrete and restricted, making it difficult to search effectively. Second, its sample-based search procedure is inefficient at finding specialized activation functions for each dataset or neural architecture. To overcome these drawbacks, we propose a new activation function called the Piecewise Linear Unit (PWLU), incorporating a carefully designed formulation and learning method. PWLU can learn specialized activation functions for different models, layers, or channels. We also propose a non-uniform version of PWLU, which maintains sufficient flexibility while requiring fewer intervals and parameters. Additionally, we generalize PWLU to three-dimensional space to define a piecewise linear surface named 2D-PWLU, which can be treated as a non-linear binary operator. Experimental results show that PWLU achieves state-of-the-art performance on various tasks and models, and that 2D-PWLU outperforms element-wise addition when aggregating features from different branches. The proposed PWLU and its variants are easy to implement and efficient at inference, so they can be widely applied in real-world applications.
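To make the idea of a learnable piecewise linear activation concrete, the sketch below shows a minimal, hypothetical PyTorch module: a fixed input range is split into uniform intervals and the heights at the breakpoints are learnable parameters. It is an illustrative simplification, not the paper's implementation; in particular, the interval bounds here are fixed constants rather than set by the paper's statistics-based learning method, and inputs outside the range are simply clamped instead of extended linearly.

```python
import torch
import torch.nn as nn


class PWLUSketch(nn.Module):
    """Illustrative learnable piecewise linear activation (hypothetical simplification).

    The range [left, right] is divided into `num_intervals` uniform intervals,
    and the output heights at the num_intervals + 1 breakpoints are learnable.
    """

    def __init__(self, num_intervals: int = 16, left: float = -3.0, right: float = 3.0):
        super().__init__()
        self.n = num_intervals
        self.left, self.right = left, right
        # Learnable breakpoint heights, initialized to a ReLU-like shape.
        xs = torch.linspace(left, right, num_intervals + 1)
        self.heights = nn.Parameter(torch.relu(xs))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        width = (self.right - self.left) / self.n
        # Clamp inputs to the covered range (a simplification of boundary handling).
        x_clamped = x.clamp(self.left, self.right)
        # Locate the interval index of each element.
        idx = ((x_clamped - self.left) / width).floor().clamp(max=self.n - 1).long()
        # Linearly interpolate between the two breakpoint heights of that interval.
        x0 = self.left + idx.float() * width
        y0 = self.heights[idx]
        y1 = self.heights[idx + 1]
        slope = (y1 - y0) / width
        return y0 + slope * (x_clamped - x0)


# Example: one activation instance could be shared per layer or per channel.
act = PWLUSketch(num_intervals=16)
y = act(torch.randn(4, 8))
```

Because the breakpoint heights are ordinary parameters, they are updated by backpropagation together with the rest of the network, which is how such an activation can specialize per model, layer, or channel.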