Diagnostic accuracy of symptoms for an underlying disease: a simulation study

Yi-Sheng Chao; Chao-Jung Wu; Yi-Chun Lai; Hui-Ting Hsu; Yen-Po Cheng; Hsing-Chien Wu; Shih-Yu Huang; Wei-Chih Chen

doi:10.1038/s41598-022-14826-2

Sci Rep. 2022 Aug 15;12(1):13810. doi: 10.1038/s41598-022-14826-2.

Authors

Yi-Sheng Chao¹ #, Chao-Jung Wu², Yi-Chun Lai³, Hui-Ting Hsu⁴, Yen-Po Cheng⁴, Hsing-Chien Wu⁵, Shih-Yu Huang⁶,⁷, Wei-Chih Chen⁸,⁹

Affiliations

¹ , Montreal, Canada. chaoyisheng@post.harvard.edu.
² Université du Québec à Montréal, Montreal, Canada.
³ National Yang Ming Chiao Tung University Hospital, Yilan City, Taiwan.
⁴ Changhua Christian Hospital, Changhua City, Taiwan.
⁵ Jinshan Branch, National Taiwan University Hospital, New Taipei City, Taiwan.
⁶ Shuang Ho Hospital, New Taipei City, Taiwan.
⁷ Taipei Medical University, Taipei, Taiwan.
⁸ Taipei Veterans General Hospital, Taipei, Taiwan.
⁹ National Yang Ming Chiao Tung University, Taipei, Taiwan.

# Contributed equally.

Abstract

Symptoms have been used to diagnose conditions such as frailty and mental illnesses. However, the diagnostic accuracy of symptom counts has not been well studied. This study uses equations and simulations to demonstrate how the factors that determine symptom incidence influence the diagnostic accuracy of symptoms for disease diagnosis. In 10,000 simulated subjects, assuming a disease that causes symptoms and is correlated with another disease, 40 symptoms were generated from three epidemiological measures: the proportion diseased, the baseline symptom incidence (among those not diseased), and the risk ratio. Symptoms were generated with similar correlation coefficients. The sensitivities and specificities of single symptoms for disease diagnosis were expressed as equations in terms of the three epidemiological measures and approximated using linear regression in the simulated populations. The area under the receiver operating characteristic (ROC) curve (AUC) was used to measure the diagnostic accuracy of multiple symptoms, using 2 to 40 symptoms for disease diagnosis. For each AUC, the best pair of sensitivity and specificity was chosen as the one whose sum differed most from 1 in absolute value. The results showed that the sensitivities and specificities of single symptoms for disease diagnosis were fully explained by the three epidemiological measures in the simulated subjects. The AUCs increased or decreased as more symptoms were used for disease diagnosis when the risk ratios were greater or less than 1, respectively. When risk ratios were close to 1, symptoms provided no diagnostic value, as judged by the AUCs. When risk ratios were greater or less than 1, the maximal or minimal AUCs could usually be reached with fewer than 30 symptoms. The maximal AUCs and their best pairs of sensitivity and specificity could be well approximated using the three epidemiological measures and their interaction terms (adjusted R-squared ≥ 0.69).
However, the observed overall symptom correlations, overall symptom incidence, and numbers of symptoms explained only a small fraction of the variance in the AUCs (adjusted R-squared ≤ 0.03). In conclusion, the sensitivity and specificity of a single symptom for disease diagnosis can be fully explained by the at-risk incidence (the symptom incidence among the diseased) and 1 minus the baseline incidence, respectively. The epidemiological measures and baseline symptom correlations can explain large fractions of the variance in the maximal AUCs and in the best pairs of sensitivity and specificity. These findings are important for researchers who want to assess the diagnostic accuracy of composite diagnostic criteria.
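
The single-symptom result follows directly from the definitions of sensitivity and specificity. A short derivation, in our own notation rather than the authors': b is the baseline symptom incidence among the non-diseased, RR is the risk ratio, so the at-risk incidence is b·RR (which requires b·RR ≤ 1), S = 1 means the symptom is present and D = 1 means the disease is present. The last line assumes that the criterion for choosing the best sensitivity-specificity pair is |Se + Sp - 1|, which is our reading of the abstract.

```latex
\begin{aligned}
\mathrm{Se} &= P(S = 1 \mid D = 1) = b\,\mathrm{RR}, \\
\mathrm{Sp} &= P(S = 0 \mid D = 0) = 1 - b, \\
\lvert \mathrm{Se} + \mathrm{Sp} - 1 \rvert &= \lvert b\,\mathrm{RR} - b \rvert = b\,\lvert \mathrm{RR} - 1 \rvert .
\end{aligned}
```

The last quantity is zero when RR = 1. This is consistent with the finding that symptoms carry no diagnostic value when risk ratios are close to 1.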

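The simulation design can also be sketched in Python with NumPy. This is a hypothetical, simplified illustration, not the authors' code. It assumes symptoms are independent given disease status, whereas the study generated symptoms with similar correlation coefficients and included a second, correlated disease. Each subject is scored by the number of symptoms present. The AUC is computed with the Mann-Whitney rank formulation, and the chosen cutoff is the one that maximizes |Se + Sp - 1|. The function and parameter names (simulate, auc, best_cutoff, p_disease, baseline, risk_ratio) and their default values are illustrative choices.

```python
import numpy as np


def simulate(n_subjects=10_000, n_symptoms=40, p_disease=0.2,
             baseline=0.1, risk_ratio=2.0, seed=0):
    """Simulate disease status and binary symptoms (hypothetical sketch).

    Assumption (not from the study): symptoms are independent given disease.
    Symptom incidence is `baseline` among the non-diseased and
    `baseline * risk_ratio` (the at-risk incidence) among the diseased.
    """
    at_risk = baseline * risk_ratio
    if not 0.0 <= at_risk <= 1.0:
        raise ValueError("baseline * risk_ratio must lie in [0, 1]")
    rng = np.random.default_rng(seed)
    diseased = rng.random(n_subjects) < p_disease
    incidence = np.where(diseased, at_risk, baseline)
    # One row per subject, one column per symptom; broadcast each subject's incidence.
    symptoms = rng.random((n_subjects, n_symptoms)) < incidence[:, None]
    return diseased, symptoms


def auc(score, diseased):
    """AUC = P(score_diseased > score_healthy) + 0.5 * P(tie)."""
    pos = score[diseased]
    neg_sorted = np.sort(score[~diseased])
    # For each diseased score, count healthy scores strictly below it and tied with it.
    below = np.searchsorted(neg_sorted, pos, side="left")
    below_or_equal = np.searchsorted(neg_sorted, pos, side="right")
    ties = below_or_equal - below
    return float((below.sum() + 0.5 * ties.sum()) / (len(pos) * len(neg_sorted)))


def best_cutoff(score, diseased):
    """Return (cutoff, Se, Sp) maximizing |Se + Sp - 1| for the rule 'score >= cutoff'."""
    best = None
    for c in np.unique(score):
        positive = score >= c
        se = positive[diseased].mean()
        sp = (~positive[~diseased]).mean()
        j = abs(se + sp - 1)
        if best is None or j > best[0]:
            best = (j, float(c), float(se), float(sp))
    return best[1:]


diseased, symptoms = simulate()
# Single symptom: expect Se close to baseline * risk_ratio and Sp close to 1 - baseline.
first = symptoms[:, 0]
print("single-symptom Se, Sp:", first[diseased].mean(), 1 - first[~diseased].mean())
for k in (2, 10, 40):
    count = symptoms[:, :k].sum(axis=1)  # number of symptoms present out of the first k
    print(k, "symptoms: AUC =", round(auc(count, diseased), 3),
          "best (cutoff, Se, Sp) =", best_cutoff(count, diseased))
```

With risk_ratio above 1, the printed AUC should rise as more symptoms are counted. Setting risk_ratio below 1 pushes the AUC below 0.5, which matches the direction of the effects reported in the abstract.
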
MeSH terms

Area Under Curve
Humans
ROC Curve
Sensitivity and Specificity*