Interpretable Rule Discovery Through Bilevel Optimization of Split-Rules of Nonlinear Decision Trees for Classification Problems

Yashesh Dhebar; Kalyanmoy Deb

doi:10.1109/TCYB.2020.3033003

IEEE Trans Cybern. 2021 Nov;51(11):5573-5584. doi: 10.1109/TCYB.2020.3033003. Epub 2021 Nov 9.

PMID: 33259315

Abstract

In supervised classification problems arising in design, control, and other practical applications, users want not only a highly accurate classifier but also one that is easy to interpret. Although the definition of interpretability varies from case to case, here we take a humanly interpretable classifier to be one that can be expressed in simple mathematical terms. As a novel approach, we represent a classifier as an assembly of simple mathematical rules using a nonlinear decision tree (NLDT). Each conditional (nonterminal) node of the tree holds a nonlinear mathematical rule (split-rule) over the features that partitions the data at that node into two nonoverlapping subsets. This partitioning is intended to minimize the impurity of the resulting child nodes. Interpretability of the classifier is ensured by restricting both the structure of the split-rule at each conditional node and the depth of the decision tree. The nonlinear split-rule at a given conditional node is obtained using an evolutionary bilevel optimization algorithm: the upper level searches for an interpretable structure of the split-rule, while the lower level finds the most appropriate weights (coefficients) of the rule's individual constituents so as to minimize the net impurity of the two resulting child nodes. The performance of the proposed algorithm is demonstrated on a number of controlled test problems, existing benchmark problems, and industrial problems. Results on problems with 2 to 500 features are encouraging and open up further scope for applying the proposed approach to more challenging and complex classification tasks.
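
The two-level idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions the abstract does not state: the split-rule is taken to be a weighted sum of power-law terms plus a bias, impurity is measured with the Gini index, the upper level is a plain enumeration of small exponent patterns, and the lower level is a random search over weights. The paper uses evolutionary algorithms at both levels; the simpler searches are swapped in here only to keep the example short. All function names (gini, rule_value, net_impurity, lower_level, upper_level) and the toy dataset are hypothetical.

```python
import itertools
import random


def gini(labels):
    """Gini impurity of a list of 0/1 labels (assumed impurity measure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def rule_value(x, exponents, weights, bias):
    """Assumed rule form: f(x) = sum_i w_i * prod_j x_j**b_ij + bias."""
    total = bias
    for w, term in zip(weights, exponents):
        prod = 1.0
        for xj, b in zip(x, term):
            prod *= xj ** b
        total += w * prod
    return total


def net_impurity(X, y, exponents, weights, bias):
    """Size-weighted impurity of the two children produced by f(x) <= 0."""
    left, right = [], []
    for xi, yi in zip(X, y):
        side = left if rule_value(xi, exponents, weights, bias) <= 0 else right
        side.append(yi)
    return (len(left) * gini(left) + len(right) * gini(right)) / len(y)


def lower_level(X, y, exponents, rng, trials=200):
    """Lower level: random search over weights and bias for a fixed structure."""
    best = (float("inf"), None, None)
    for _ in range(trials):
        weights = [rng.uniform(-1.0, 1.0) for _ in exponents]
        bias = rng.uniform(-2.0, 2.0)
        imp = net_impurity(X, y, exponents, weights, bias)
        if imp < best[0]:
            best = (imp, weights, bias)
    return best


def upper_level(X, y, n_features, seed=0):
    """Upper level: enumerate single-term structures with exponents in {-1, 0, 1}.

    Structures are ranked by impurity first, then by the number of
    features they involve (a stand-in for rule simplicity).
    """
    rng = random.Random(seed)
    best = None
    for term in itertools.product((-1, 0, 1), repeat=n_features):
        if not any(term):
            continue  # skip the constant rule
        exponents = [term]
        imp, weights, bias = lower_level(X, y, exponents, rng)
        complexity = sum(1 for b in term if b != 0)
        candidate = (imp, complexity, exponents, weights, bias)
        if best is None or candidate[:2] < best[:2]:
            best = candidate
    return best


# Toy data on a positive grid: class 1 when x1 * x2 > 1.
grid = [0.5 + 0.25 * k for k in range(7)]
X = [(a, b) for a in grid for b in grid]
y = [1 if a * b > 1.0 else 0 for a, b in X]

impurity, complexity, exponents, weights, bias = upper_level(X, y, n_features=2)
print("structure:", exponents, "impurity:", round(impurity, 4))
```

On this toy problem the rule x1 * x2 - 1 <= 0 separates the classes exactly, so the structure with exponents (1, 1) can reach zero impurity. Whether the random search finds weights that achieve it depends on the number of trials and the seed.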