2D medical image synthesis using transformer-based denoising diffusion probabilistic model

Shaoyan Pan; Tonghe Wang; Richard L J Qiu; Marian Axente; Chih-Wei Chang; Junbo Peng; Ashish B Patel; Joseph Shelton; Sagar A Patel; Justin Roper; Xiaofeng Yang

doi:10.1088/1361-6560/acca5c

Phys Med Biol. 2023 May 5;68(10):105004. doi: 10.1088/1361-6560/acca5c.

Authors

Shaoyan Pan¹ ², Tonghe Wang³, Richard L J Qiu¹, Marian Axente¹, Chih-Wei Chang¹, Junbo Peng⁴, Ashish B Patel¹, Joseph Shelton¹, Sagar A Patel¹, Justin Roper¹, Xiaofeng Yang¹ ² ⁴

Affiliations

¹ Department of Radiation Oncology and Winship Cancer Institute, Emory University, Atlanta, GA 30322, United States of America.
² Department of Biomedical Informatics, Emory University, Atlanta, GA 30322, United States of America.
³ Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, United States of America.
⁴ Nuclear and Radiological Engineering and Medical Physics Programs, George W. Woodruff School of Mechanical Engineering, Georgia Institute of Technology, Atlanta, GA 30332, United States of America.

Abstract

Objective. Artificial intelligence (AI) methods have gained popularity in medical imaging research. The size and scope of the training image datasets needed for successful AI model deployment do not always reach the desired scale. In this paper, we introduce a medical image synthesis framework aimed at addressing the challenge of limited training datasets for AI models.

Approach. The proposed 2D image synthesis framework is based on a diffusion model using a Swin-transformer-based network. This model consists of a forward Gaussian noise process and a reverse process using the transformer-based diffusion model for denoising. Training data includes four image datasets: chest x-rays, heart MRI, pelvic CT, and abdomen CT. We evaluated the authenticity, quality, and diversity of the synthetic images using visual Turing assessments conducted by three medical physicists, and four quantitative evaluations: the Inception score (IS), Fréchet Inception Distance score (FID), feature similarity and diversity score (DS, indicating diversity similarity) between the synthetic and true images. To assess the framework's value for training AI models, we conducted COVID-19 classification tasks using real images, synthetic images, and mixtures of both.

Main results. Visual Turing assessments showed an average accuracy of 0.64 (an accuracy closer to 50% indicates a more realistic visual appearance of the synthetic images), sensitivity of 0.79, and specificity of 0.50. Average quantitative results across all datasets were IS = 2.28, FID = 37.27, FDS = 0.20, and DS = 0.86. For the COVID-19 classification task, the baseline network obtained an accuracy of 0.88 using a purely real dataset, 0.89 using a purely synthetic dataset, and 0.93 using a mixed dataset of real and synthetic data.

Significance. An image synthesis framework was demonstrated for medical image synthesis, which can generate high-quality medical images of different imaging modalities with the purpose of supplementing existing training sets for AI model deployment. This method has potential applications in many areas of data-driven medical imaging research.
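
As an illustration of the forward Gaussian noise process mentioned in the Approach, the following is a minimal sketch of the standard closed-form noising step used in denoising diffusion probabilistic models, x_t = sqrt(ᾱ_t) x_0 + sqrt(1 − ᾱ_t) ε. It is not the authors' implementation: the linear noise schedule, the 1000-step horizon, the 64 × 64 image size, and the function names are assumptions chosen for the example, and the abstract does not specify any of them. The Swin-transformer-based reverse (denoising) network is not shown.

```python
import numpy as np


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Hypothetical linear noise schedule (an assumption, not from the abstract)."""
    return np.linspace(beta_start, beta_end, num_steps)


def forward_noise(x0, t, alpha_bar, rng):
    """Sample x_t from q(x_t | x_0) in closed form.

    x_t = sqrt(alpha_bar[t]) * x0 + sqrt(1 - alpha_bar[t]) * noise
    Returns the noisy image and the noise that was added; in DDPM training
    the denoising network is usually trained to predict that noise.
    """
    noise = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise
    return x_t, noise


# Assumed setup: 1000 diffusion steps and a stand-in 64 x 64 "image" in [-1, 1].
betas = linear_beta_schedule(1000)
alpha_bar = np.cumprod(1.0 - betas)  # cumulative product of (1 - beta_t)

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(64, 64))

x_early, noise_early = forward_noise(x0, 10, alpha_bar, rng)   # lightly noised
x_late, noise_late = forward_noise(x0, 999, alpha_bar, rng)    # nearly pure noise
```

At small t the sample stays close to the original image, while at the final step almost no signal remains, which is why the reverse process can start from pure Gaussian noise when generating new images.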

Keywords: COVID-19; Swin-transformer-based network; artificial intelligence; medical image synthesis; transformer-based diffusion model.

Creative Commons Attribution license.

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Artificial Intelligence*
COVID-19* / diagnostic imaging
Diffusion
Humans
Image Processing, Computer-Assisted
Models, Statistical
Tomography, X-Ray Computed
