Deep Learning Staging of Liver Iron Content From Multiecho MR Images

J Magn Reson Imaging. 2023 Feb;57(2):472-484. doi: 10.1002/jmri.28300. Epub 2022 Jun 17.

Abstract

Background: MRI represents the most established liver iron content (LIC) evaluation approach by estimation of liver T2* value, but it is dependent on the choice of the measurement region and the software used for image analysis.
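As an illustration of what "estimation of liver T2* value" involves, the following is a minimal sketch that assumes a mono-exponential decay model, S(TE) = S0 * exp(-TE / T2*), fitted by log-linear least squares to the mean ROI signal at each echo time. The abstract does not state which fitting model or software was used, so the function name `fit_t2star`, the echo times, and the signal values are all hypothetical.

```python
import math

def fit_t2star(echo_times_ms, signals):
    """Log-linear least-squares fit of S(TE) = S0 * exp(-TE / T2*).

    Assumption: mono-exponential decay with no noise offset; this is an
    illustrative model, not necessarily the one used in the study.
    Returns (T2* in ms, S0).
    """
    ys = [math.log(s) for s in signals]
    n = len(echo_times_ms)
    mean_x = sum(echo_times_ms) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in echo_times_ms)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(echo_times_ms, ys))
    slope = sxy / sxx                 # slope of ln(S) versus TE equals -1 / T2*
    intercept = mean_y - slope * mean_x
    return -1.0 / slope, math.exp(intercept)

# Synthetic, noise-free example (hypothetical values, not study data).
echo_times = [2.0, 4.0, 6.0, 8.0, 10.0]            # ms
roi_signal = [100.0 * math.exp(-te / 5.0) for te in echo_times]
t2star_ms, s0 = fit_t2star(echo_times, roi_signal)
```

Because the fitted value depends on which pixels enter the ROI mean and on the fitting model, two operators or two software packages can return different T2* values for the same patient.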

Purpose: To develop a deep-learning method for unsupervised classification of LIC from magnitude T2* multiecho MR images.

Study type: Retrospective.

Population/subjects: A total of 1069 thalassemia major patients enrolled in the core laboratory of the Myocardial Iron Overload in Thalassemia (MIOT) network were included in the training (80%) and test (20%) sets. Twenty patients scanned on MRI systems from different vendors were included in the external test set.

Field strength/sequence: 1.5 T, T2* multiecho magnitude images.

Assessment: Four deep-learning convolutional neural networks (HippoNet-2D, HippoNet-3D, HippoNet-LSTM, and an ensemble network, HippoNet-Ensemble) were used to achieve unsupervised staging of LIC using five classes (normal, borderline, mild, moderate, severe). The training set was used to construct the deep-learning model. The performance of the LIC staging model was evaluated in the test set and in the external test set. Model performance was assessed by evaluating accuracy, sensitivity, and specificity with respect to the ground truth labels obtained by T2* measurements, and by comparison with the operator-induced variability originating from different region of interest (ROI) placements.

Statistical tests: The networks' performance was evaluated by single-class accuracy, specificity, and sensitivity, and compared by one-way repeated measures analysis of variance (ANOVA) and one-way ANOVA.
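The single-class metrics named above can be sketched as a one-vs-rest calculation, in which each LIC stage is treated in turn as the positive class. This is a minimal illustration, not the authors' evaluation code: the function names and the toy labels are hypothetical.

```python
def one_vs_rest_metrics(y_true, y_pred, classes):
    """Return {class: (accuracy, sensitivity, specificity)} per class.

    Each class is treated as "positive" and all other classes as "negative".
    """
    results = {}
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        tn = sum(1 for t, p in pairs if t != c and p != c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        accuracy = (tp + tn) / (tp + tn + fp + fn)
        sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
        specificity = tn / (tn + fp) if (tn + fp) else float("nan")
        results[c] = (accuracy, sensitivity, specificity)
    return results

def multiclass_accuracy(y_true, y_pred):
    """Fraction of cases assigned to exactly the right stage."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy labels for illustration only (not study data).
stages = ["normal", "borderline", "mild", "moderate", "severe"]
truth = ["normal", "normal", "mild", "severe", "mild", "borderline"]
predicted = ["normal", "borderline", "mild", "severe", "moderate", "borderline"]

per_class = one_vs_rest_metrics(truth, predicted, stages)
overall = multiclass_accuracy(truth, predicted)
```

Single-class accuracy counts true negatives as correct, so it can be high for every class even when the multiclass accuracy, which requires the exact stage to match, is lower.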

Results: HippoNet-Ensemble reached an accuracy significantly higher than that of the other networks, and a sensitivity and specificity higher than those of HippoNet-LSTM. Accuracy, sensitivity, and specificity values for the LIC stages were: normal: 0.96/0.93/0.97, borderline: 0.95/0.85/0.98, mild: 0.96/0.88/0.98, moderate: 0.95/0.89/0.97, severe: 0.97/0.95/0.98. The proportion of correctly staged cases ranged from 85% to 95%, depending on the LIC class. Multiclass accuracy was 0.90, compared with 0.92 for interobserver variability.

Data conclusion: The proposed HippoNet-Ensemble network can perform unsupervised LIC staging and achieves good prognostic performance.

Evidence level: 4.

Technical efficacy: Stage 2.

Keywords: CNN; T2* multiecho; deep learning; iron overload; liver.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Deep Learning*
  • Humans
  • Iron
  • Iron Overload* / diagnostic imaging
  • Liver / diagnostic imaging
  • Magnetic Resonance Imaging / methods
  • Retrospective Studies

Substances

  • Iron