Random forest of perfect trees: concept, performance, applications and perspectives

Bioinformatics. 2021 Aug 9;37(15):2165-2174. doi: 10.1093/bioinformatics/btab074.

Abstract

Motivation: The principle of Breiman's random forest (RF) is to build and assemble complementary classification trees in a way that maximizes their variability. We propose a new type of random forest that disobeys Breiman's principles and involves building very large numbers of trees with no classification errors. We used a new type of decision tree that places a neuron at each node, together with an innovative half Christmas tree structure. With these new RFs, we developed a score (the NICScore), based on a family of ten new statistical information criteria called the Nguyen information criteria (NICs), to evaluate the predictive qualities of features in three dimensions.
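
To make the construction concrete, the sketch below (a minimal illustration, not the authors' ROP implementation) grows a tree whose internal nodes split on the output of a single trained neuron and keeps splitting until every leaf is pure, i.e. a "perfect" tree with zero training error; a forest is then obtained by repeating the construction on random feature subsets. The perceptron splitter, the depth cap, the helper names and the assumption of binary 0/1 integer labels are all illustrative choices, not details taken from the paper.

import numpy as np
from sklearn.linear_model import Perceptron


class NeuronNode:
    """One tree node: either a leaf label or a perceptron splitting the samples."""

    def __init__(self, depth=0, max_depth=20):
        self.depth, self.max_depth = depth, max_depth
        self.neuron = None            # perceptron used as the splitting rule
        self.left = self.right = None
        self.label = None             # majority class, set only on leaves

    def fit(self, X, y):
        # Stop when the node is pure (zero classification errors) or too deep.
        if len(np.unique(y)) == 1 or self.depth >= self.max_depth:
            self.label = int(np.bincount(y).argmax())   # y assumed 0/1 integers
            return self
        self.neuron = Perceptron(max_iter=1000).fit(X, y)
        side = self.neuron.decision_function(X) > 0
        if side.all() or (~side).all():   # neuron failed to separate the samples
            self.label = int(np.bincount(y).argmax())
            return self
        self.left = NeuronNode(self.depth + 1, self.max_depth).fit(X[~side], y[~side])
        self.right = NeuronNode(self.depth + 1, self.max_depth).fit(X[side], y[side])
        return self

    def predict(self, X):
        if self.label is not None:
            return np.full(len(X), self.label)
        side = self.neuron.decision_function(X) > 0
        out = np.empty(len(X), dtype=int)
        if (~side).any():
            out[~side] = self.left.predict(X[~side])
        if side.any():
            out[side] = self.right.predict(X[side])
        return out


def neuron_forest(X, y, n_trees=100, n_features=5, seed=0):
    """Grow many such 'perfect' trees, each on its own random subset of features."""
    rng = np.random.default_rng(seed)
    forest = []
    for _ in range(n_trees):
        feats = rng.choice(X.shape[1], size=min(n_features, X.shape[1]), replace=False)
        forest.append((feats, NeuronNode().fit(X[:, feats], y)))
    return forest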

Results: The first NIC allowed the Akaike information criterion (AIC) to be minimized more quickly than the Gini index when features were introduced into a logistic regression model. Features selected with the NICScore showed a slight advantage over those selected by the support vector machine-recursive feature elimination (SVM-RFE) method. We demonstrate that the inclusion of artificial neurons in tree nodes allows a large number of classifiers to be taken into account simultaneously in the same node and results in perfect trees without classification errors.
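
The evaluation protocol behind this comparison can be illustrated as follows: features are entered into a logistic regression one at a time in the order given by a ranking (for example an NIC-based or a Gini-based ranking, both treated here as given inputs), and the AIC is recorded after each addition; a ranking under which the AIC drops in fewer steps is preferred. The sketch below, with hypothetical function and argument names, implements only this bookkeeping using statsmodels.

import numpy as np
import statsmodels.api as sm


def aic_trajectory(X, y, ranked_features):
    """AIC of nested logistic models built from the first k ranked features."""
    aics = []
    for k in range(1, len(ranked_features) + 1):
        design = sm.add_constant(X[:, ranked_features[:k]])  # intercept + top-k features
        fit = sm.Logit(y, design).fit(disp=0)                # maximum-likelihood fit, silent
        aics.append(fit.aic)
    return np.array(aics)


# Example comparison of two hypothetical rankings on the same data:
# aic_nic  = aic_trajectory(X, y, ranking_by_nic)
# aic_gini = aic_trajectory(X, y, ranking_by_gini)
# The ranking whose AIC curve reaches its minimum with fewer features is preferred.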

Availability and implementation: The methods used to build the perfect trees in this article were implemented in the 'ROP' R package, archived at https://cran.r-project.org/web/packages/ROP/index.html.

Supplementary information: Supplementary data are available at Bioinformatics online.