Discrimination between healthy participants and people with panic disorder based on polygenic scores for psychiatric disorders and for intermediate phenotypes using machine learning

Aust N Z J Psychiatry. 2024 Apr 6:48674241242936. doi: 10.1177/00048674241242936. Online ahead of print.

Abstract

Objective: Panic disorder is a modestly heritable condition. Currently, diagnosis is based only on clinical symptoms; identifying objective biomarkers and a more reliable diagnostic procedure is desirable. We investigated whether people with panic disorder can be reliably diagnosed using combinations of multiple polygenic scores for psychiatric disorders and their intermediate phenotypes, compared with single polygenic score approaches, by applying specific machine learning techniques.

Methods: Polygenic scores for 48 psychiatric disorders and intermediate phenotypes based on large-scale genome-wide association studies (n = 7,556 to 1,131,881) were calculated for people with panic disorder (n = 718) and healthy controls (n = 1,717). Discrimination between people with panic disorder and healthy controls was based on the 48 polygenic scores using five classification methods: logistic regression, neural networks, quadratic discriminant analysis, random forests and a support vector machine. We examined how discrimination accuracy (area under the curve) changed as the number of polygenic scores combined increased and how accuracy differed across the five classifiers.
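
The area under the curve used here as the accuracy measure can be read as the probability that a classifier assigns a higher score to a randomly chosen person with panic disorder than to a randomly chosen healthy control. The following is a minimal sketch of that calculation in Python, assuming synthetic classifier outputs rather than the study's data. The function name `auc_from_scores` and the 0.3 standard deviation shift between groups are hypothetical choices for illustration; the abstract does not state how the areas under the curve were estimated.

```python
import random


def auc_from_scores(case_scores, control_scores):
    """Return the AUC as the fraction of (case, control) pairs in which
    the case has the higher score; tied pairs count as one half."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Synthetic classifier outputs (assumption: NOT the study's data).
# Group sizes match the abstract; the 0.3 SD shift is a hypothetical value
# chosen only to mimic a weak separation between the groups.
rng = random.Random(0)
n_cases, n_controls = 718, 1717
hypothetical_shift = 0.3

cases = [rng.gauss(hypothetical_shift, 1.0) for _ in range(n_cases)]
controls = [rng.gauss(0.0, 1.0) for _ in range(n_controls)]

auc = auc_from_scores(cases, controls)
print(f"AUC on synthetic scores: {auc:.3f}")
```

An area under the curve of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the values reported in the Results should be read.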

Results: All five classifiers distinguished people with panic disorder from healthy controls relatively well as the number of polygenic scores increased. Of the 48 polygenic scores, the polygenic score for anxiety (UK Biobank) was the most useful for discrimination by the classifiers. In combinations of two or three polygenic scores, it was included as one of the polygenic scores for all classifiers. When all 48 polygenic scores were used in combination, the areas under the curve differed significantly among the five classifiers: the support vector machine and logistic regression had higher accuracy than quadratic discriminant analysis and random forests. For each classifier, the greatest area under the curve was 0.600 ± 0.030 for logistic regression (number of polygenic scores combined, N = 14), 0.591 ± 0.039 for neural networks (N = 9), 0.603 ± 0.033 for quadratic discriminant analysis (N = 10), 0.572 ± 0.039 for random forests (N = 25) and 0.617 ± 0.041 for the support vector machine (N = 11). The greatest areas under the curve at the best polygenic score combination also differed significantly among the five classifiers: random forests had the lowest accuracy, and the support vector machine had higher accuracy than neural networks.

Conclusions: These findings suggest that combining up to approximately 10 polygenic scores effectively improved discrimination accuracy and that the support vector machine exhibited the highest accuracy among the classifiers. However, the discrimination accuracy for panic disorder based solely on polygenic score combinations was modest.

Keywords: Panic disorder; classifier; intermediate phenotype; machine learning; polygenic score.