Random forest of perfect trees: concept, performance, applications and perspectives

Bioinformatics. 2021 Aug 9;37(15):2165-2174. doi: 10.1093/bioinformatics/btab074.

Abstract

Motivation: The principle of Breiman's random forest (RF) is to build and assemble complementary classification trees in a way that maximizes their variability. We propose a new type of random forest that disobeys Breiman's principles and involves building very large numbers of trees with no classification errors. We used a new type of decision tree that places a neuron at each node, together with an innovative half Christmas tree structure. With these new RFs, we developed a score (the NICScore), based on a family of ten new statistical information criteria called the Nguyen information criteria (NICs), to evaluate the predictive qualities of features in three dimensions.
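
To make the construction concrete, the sketch below (a minimal illustration, not the authors' ROP implementation) grows a tree whose internal nodes split on the output of a single trained neuron and keeps splitting until every leaf is pure, i.e. a "perfect" tree with zero training error; a forest is then obtained by repeating the construction on random feature subsets. The perceptron splitter, the depth cap, the helper names and the assumption of binary 0/1 integer labels are all illustrative choices, not details taken from the paper.

import numpy as np
from sklearn.linear_model import Perceptron


class NeuronNode:
    """One tree node: either a leaf label or a perceptron splitting the samples."""

    def __init__(self, depth=0, max_depth=20):
        self.depth, self.max_depth = depth, max_depth
        self.neuron = None            # perceptron used as the splitting rule
        self.left = self.right = None
        self.label = None             # majority class, set only on leaves

    def fit(self, X, y):
        # Stop when the node is pure (zero classification errors) or too deep.
        if len(np.unique(y)) == 1 or self.depth >= self.max_depth:
            self.label = int(np.bincount(y).argmax())   # y assumed 0/1 integers
            return self
        self.neuron = Perceptron(max_iter=1000).fit(X, y)
        side = self.neuron.decision_function(X) > 0
        if side.all() or (~side).all():   # neuron failed to separate the samples
            self.label = int(np.bincount(y).argmax())
            return self
        self.left = NeuronNode(self.depth + 1, self.max_depth).fit(X[~side], y[~side])
        self.right = NeuronNode(self.depth + 1, self.max_depth).fit(X[side], y[side])
        return self

    def predict(self, X):
        if self.label is not None:
            return np.full(len(X), self.label)
        side = self.neuron.decision_function(X) > 0
        out = np.empty(len(X), dtype=int)
        if (~side).any():
            out[~side] = self.left.predict(X[~side])
        if side.any():
            out[side] = self.right.predict(X[side])
        return out


def neuron_forest(X, y, n_trees=100, n_features=5, seed=0):
    """Grow many such 'perfect' trees, each on its own random subset of features."""
    rng = np.random.default_rng(seed)
    forest = []
    for _ in range(n_trees):
        feats = rng.choice(X.shape[1], size=min(n_features, X.shape[1]), replace=False)
        forest.append((feats, NeuronNode().fit(X[:, feats], y)))
    return forest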

Results: The first NIC allowed the Akaike information criterion (AIC) to be minimized more quickly than the Gini index when features were introduced into a logistic regression model. Features selected with the NICScore showed a slight advantage over those selected by the support vector machine-recursive feature elimination (SVM-RFE) method. We demonstrate that the inclusion of artificial neurons in tree nodes allows a large number of classifiers to be taken into account simultaneously in the same node and results in perfect trees without classification errors.
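
The evaluation protocol behind this comparison can be illustrated as follows: features are entered into a logistic regression one at a time in the order given by a ranking (for example an NIC-based or a Gini-based ranking, both treated here as given inputs), and the AIC is recorded after each addition; a ranking under which the AIC drops in fewer steps is preferred. The sketch below, with hypothetical function and argument names, implements only this bookkeeping using statsmodels.

import numpy as np
import statsmodels.api as sm


def aic_trajectory(X, y, ranked_features):
    """AIC of nested logistic models built from the first k ranked features."""
    aics = []
    for k in range(1, len(ranked_features) + 1):
        design = sm.add_constant(X[:, ranked_features[:k]])  # intercept + top-k features
        fit = sm.Logit(y, design).fit(disp=0)                # maximum-likelihood fit, silent
        aics.append(fit.aic)
    return np.array(aics)


# Example comparison of two hypothetical rankings on the same data:
# aic_nic  = aic_trajectory(X, y, ranking_by_nic)
# aic_gini = aic_trajectory(X, y, ranking_by_gini)
# The ranking whose AIC curve reaches its minimum with fewer features is preferred.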

Availability and implementation: The methods used to build the perfect trees in this article were implemented in the 'ROP' R package, archived at https://cran.r-project.org/web/packages/ROP/index.html.

Supplementary information: Supplementary data are available at Bioinformatics online.