CytoPred: 7-gene pair metric for AML cytogenetic risk prediction

Brief Bioinform. 2020 Jan 17;21(1):348-354. doi: 10.1093/bib/bby100.

Abstract

Cytogenetics-based prognostication of acute myeloid leukemia (AML) patients is subjective and cumbersome. A top scoring pair (TSP)-based decision tree, built with a robust and statistically rigorous algorithm, offers a promising alternative. We describe CytoPred, a 7-gene pair signature that estimates cytogenetic risk, derived from gene expression data of 2547 AML patient samples using a modified TSP algorithm. The essential modifications to TSP, which eased the computational burden, are filtering for gene pairs that score above random weighted guessers and sampling gene pairs from the original gene pair pool to reduce overfitting. CytoPred classifies AML patients into clinically relevant 'good' and 'Int_poor' prognosis groups with distinct survival differences. The 7-gene pair signature was derived using 1248 AML patient samples as the training set, with 675 samples used for internal testing of the algorithm. The best-performing 7-gene pair classifier was selected from an initial pool of 6.1 × 10^7 gene pairs that generated 57 687 decision trees. For an unbiased evaluation of CytoPred performance, we performed independent validation in a cohort of 624 AML patients. CytoPred comfortably meets the cutoffs for diagnostic application, with 98.27% sensitivity and 99.27% specificity to predictive value for the Int_poor class, and 97.09% sensitivity and 91.74% specificity to predictive value for the good class. Furthermore, CytoPred predicts survival probabilities almost identical to those from cytogenetics, and its performance is largely unaffected by various recurrent mutations or by individual French-American-British (FAB) subtypes. In summary, we present a robust 7-gene pair-based metric for clinical prognostication of AML patients.
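To make the top scoring pair idea concrete, the following is a minimal sketch of the standard, unmodified TSP score: for each gene pair (i, j), the absolute difference between the two classes in the fraction of samples where gene i is expressed below gene j. It is not the CytoPred implementation; the filtering against random weighted guessers, the gene pair sampling and the decision tree construction are not reproduced here. The function names and the toy expression values are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def tsp_scores(expr, labels):
    """Score every gene pair by |P(g_i < g_j | class A) - P(g_i < g_j | class B)|.

    expr:   list of samples, each a list of expression values (one per gene).
    labels: list of class labels with exactly two distinct values.
    """
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("expected exactly two classes")
    n_genes = len(expr[0])
    scores = {}
    for i, j in combinations(range(n_genes), 2):
        # Fraction of samples in each class where gene i is below gene j.
        frac = []
        for c in classes:
            rows = [s for s, y in zip(expr, labels) if y == c]
            frac.append(sum(s[i] < s[j] for s in rows) / len(rows))
        scores[(i, j)] = abs(frac[0] - frac[1])
    return scores


def top_pair(expr, labels):
    """Return the gene pair (as column indices) with the highest TSP score."""
    scores = tsp_scores(expr, labels)
    return max(scores, key=scores.get)


# Hypothetical toy data: 6 samples x 3 genes (not taken from the study).
toy_expr = [
    [1.0, 5.0, 3.0],
    [2.0, 6.0, 1.0],
    [1.5, 4.0, 2.0],
    [7.0, 2.0, 3.0],
    [6.0, 1.0, 5.0],
    [8.0, 3.0, 2.5],
]
toy_labels = ["good", "good", "good", "Int_poor", "Int_poor", "Int_poor"]

best = top_pair(toy_expr, toy_labels)  # gene 0 vs gene 1 separates the classes
```

Because the score depends only on the relative ordering of two genes within each sample, a classifier of this kind is insensitive to monotone per-sample normalization, which is one reason rank-based pair rules are attractive for data pooled across cohorts.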

Keywords: AML; cytogenetics; gene signature; top scoring pair.