High-resolution 3T to 7T ADC map synthesis with a hybrid CNN-transformer model

Med Phys. 2024 Apr 17. doi: 10.1002/mp.17079. Online ahead of print.

Abstract

Background: 7 Tesla (7T) apparent diffusion coefficient (ADC) maps derived from diffusion-weighted imaging (DWI) demonstrate improved image quality and spatial resolution over 3 Tesla (3T) ADC maps. However, 7T magnetic resonance imaging (MRI) currently suffers from limited clinical availability, higher cost, and increased susceptibility to artifacts.

Purpose: To address these issues, we propose a hybrid CNN-transformer model to synthesize high-resolution 7T ADC maps from multimodal 3T MRI.

Methods: The Vision CNN-Transformer (VCT), composed of both Vision Transformer (ViT) blocks and convolutional layers, is proposed to produce high-resolution synthetic 7T ADC maps from 3T ADC maps and 3T T1-weighted (T1w) MRI. ViT blocks captured global image context, while convolutional layers efficiently captured fine detail. The VCT model was validated on the publicly available Human Connectome Project Young Adult dataset, comprising 3T T1w, 3T DWI, and 7T DWI brain scans. The Diffusion Imaging in Python library was used to compute ADC maps from the DWI scans. A total of 171 patient cases were randomly divided into 130 training cases, 20 validation cases, and 21 test cases. The synthetic ADC maps were evaluated by comparing their similarity to the ground truth volumes with the following metrics: peak signal-to-noise ratio (PSNR), structural similarity index measure (SSIM), and mean squared error (MSE).

Results: The results are as follows: PSNR: 27.0 ± 0.9 dB, SSIM: 0.945 ± 0.010, and MSE: 2.0E-3 ± 0.4E-3. Both qualitative and quantitative results demonstrate that VCT performs favorably against other state-of-the-art methods. We have introduced various efficiency improvements, including the implementation of flash attention and training on 176×208 resolution images. These enhancements reduced the number of parameters and the training time per epoch by 50% in comparison to ResViT. Specifically, the training time per epoch was shortened from 7.67 min to 3.86 min.
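As an illustration of two of the evaluation metrics named above, the following is a minimal sketch of MSE and PSNR for a pair of volumes. It is not the authors' evaluation code: the function names `mse` and `psnr`, the `data_range` parameter, and the assumption that intensities are normalized to [0, 1] are all hypothetical choices made for this example.

```python
import numpy as np


def mse(pred, target) -> float:
    """Mean squared error averaged over all voxels of two same-shaped volumes."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")
    return float(np.mean((pred - target) ** 2))


def psnr(pred, target, data_range: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB.

    data_range is the assumed span of valid intensities
    (1.0 here, i.e. volumes normalized to [0, 1]; this is an assumption).
    """
    err = mse(pred, target)
    if err == 0.0:
        return float("inf")  # identical volumes
    return float(10.0 * np.log10(data_range ** 2 / err))


if __name__ == "__main__":
    # Toy data standing in for a ground-truth and a synthetic ADC volume.
    rng = np.random.default_rng(0)
    target = rng.random((8, 8, 8))
    pred = target + 0.01  # constant offset, purely for demonstration
    print(f"MSE:  {mse(pred, target):.2e}")
    print(f"PSNR: {psnr(pred, target):.2f} dB")
```

Under the [0, 1] normalization assumed here, an MSE of 2.0E-3 maps to 10·log10(1 / 2.0E-3) ≈ 27.0 dB. This matches the reported PSNR only loosely, since a mean of per-case PSNR values need not equal the PSNR of the mean MSE. SSIM is omitted from the sketch; one readily available implementation is `skimage.metrics.structural_similarity` in scikit-image.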

Conclusion: We propose a novel method to predict high-resolution 7T ADC maps from low-resolution 3T ADC maps and T1w MRI. Our predicted images demonstrate better spatial resolution and contrast than both 3T MRI and the predictions made by ResViT and pix2pix. These high-quality synthetic 7T MR images could benefit disease diagnosis and intervention, support higher-resolution and more conformal contours, and serve as an intermediate step in generating synthetic CT for radiation therapy, especially when 7T MRI scanners are unavailable.

Keywords: 7T MRI; DWI; deep learning; intramodal MRI synthesis.