HCformer: Hybrid CNN-Transformer for LDCT Image Denoising

Jinli Yuan; Feng Zhou; Zhitao Guo; Xiaozeng Li; Hengyong Yu

doi:10.1007/s10278-023-00842-9

J Digit Imaging. 2023 Oct;36(5):2290-2305. doi: 10.1007/s10278-023-00842-9. Epub 2023 Jun 29.

Authors

Jinli Yuan¹, Feng Zhou¹, Zhitao Guo², Xiaozeng Li¹, Hengyong Yu³

Affiliations

¹ The School of Electronic and Information Engineering, Hebei University of Technology, Tianjin, 300401, China.
² The School of Electronic and Information Engineering, Hebei University of Technology, Tianjin, 300401, China. mrnow@hebut.edu.cn.
³ The Department of Electrical and Computer Engineering, University of Massachusetts Lowell, Lowell, MA, 01854, USA.

PMID: 37386333
PMCID: PMC10501999 (available on 2024-10-01)
DOI: 10.1007/s10278-023-00842-9

Abstract

Low-dose computed tomography (LDCT) is an effective way to reduce patients' radiation exposure. However, it increases the noise in reconstructed CT images and reduces the precision of clinical diagnosis. Most current deep learning-based denoising methods are built on convolutional neural networks (CNNs), which concentrate on local information and have little capacity for modeling multiple structures. Transformer structures can compute each pixel's response on a global scale, but their heavy computational requirements prevent them from being widely used in medical image processing. To reduce the impact of LDCT scans on patients, this paper aims to develop an image post-processing method that combines CNN and Transformer structures and can obtain high-quality images from LDCT. A hybrid CNN-Transformer (HCformer) codec network model is proposed for LDCT image denoising. A neighborhood feature enhancement (NEF) module is designed to introduce local information into the Transformer's operation, which strengthens the representation of adjacent pixel information in the LDCT image denoising task. The shifting window method is used to lower the computational complexity of the network model and to overcome the problems that come with computing multi-head self-attention (MSA) in a fixed window. Meanwhile, window and shifted-window multi-head self-attention (W/SW-MSA) are used alternately in two layers of the Transformer to enable information interaction between different Transformer layers. This approach can successfully decrease the Transformer's overall computational cost. The AAPM 2016 LDCT grand challenge dataset is employed for ablation and comparison experiments to demonstrate the viability of the proposed LDCT denoising method. Per the experimental findings, HCformer improves the image quality metrics SSIM, HuRMSE, and FSIM from 0.8017, 34.1898, and 0.6885 to 0.8507, 17.7213, and 0.7247, respectively (higher is better for SSIM and FSIM; lower is better for HuRMSE). Additionally, the proposed HCformer algorithm preserves image details while reducing noise. In this paper, an HCformer structure based on deep learning is proposed and evaluated using the AAPM LDCT dataset. Both the qualitative and quantitative comparison results confirm that the proposed HCformer outperforms other methods. The ablation experiments also confirm the contribution of each component of the HCformer. HCformer combines the advantages of CNN and Transformer, and it has great potential for LDCT image denoising and other tasks.
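
The window and shifted-window partitioning mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`window_partition`, `shifted_window_partition`, `attention_pairs`), the half-window cyclic shift, and the 8 x 8 toy feature map are all assumptions chosen for illustration, and the attention masking that a full shifted-window layer would need for wrapped-around pixels is omitted.

```python
import numpy as np

def window_partition(x, window_size):
    """Split an (H, W, C) feature map into non-overlapping square windows.

    Returns an array of shape (num_windows, window_size**2, C), so that
    self-attention could be computed independently inside each window.
    """
    h, w, c = x.shape
    assert h % window_size == 0 and w % window_size == 0
    x = x.reshape(h // window_size, window_size, w // window_size, window_size, c)
    x = x.transpose(0, 2, 1, 3, 4)  # group the two window-grid axes together
    return x.reshape(-1, window_size * window_size, c)

def shifted_window_partition(x, window_size):
    """Cyclically shift the map by half a window, then partition it.

    Assumption: a half-window shift, as is common in shifted-window designs.
    The shift makes the new windows straddle the old window borders.
    """
    shift = window_size // 2
    shifted = np.roll(x, shift=(-shift, -shift), axis=(0, 1))
    return window_partition(shifted, window_size)

def attention_pairs(h, w, window_size=None):
    """Count query-key pairs for global attention or windowed attention."""
    n = h * w
    if window_size is None:
        return n * n                          # every pixel attends to every pixel
    return n * window_size * window_size      # every pixel attends within its window

# Toy 8 x 8 single-channel feature map (hypothetical data).
feature = np.arange(64, dtype=np.float32).reshape(8, 8, 1)
regular = window_partition(feature, 4)          # shape (4, 16, 1)
shifted = shifted_window_partition(feature, 4)  # shape (4, 16, 1)

global_cost = attention_pairs(64, 64)           # global attention on a 64 x 64 map
window_cost = attention_pairs(64, 64, 8)        # 8 x 8 windows on the same map
```

Under these assumptions, alternating `window_partition` and `shifted_window_partition` between consecutive layers lets pixels that sat in different windows in one layer share a window in the next, while the number of query-key pairs grows linearly rather than quadratically with the number of pixels.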

Keywords: CT image denoising; Deep learning; Low-dose CT.

Publication types

Review
Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Humans
Image Processing, Computer-Assisted / methods
Neural Networks, Computer*
Signal-To-Noise Ratio
Tomography, X-Ray Computed* / methods