Improving interobserver agreement and performance of deep learning models for segmenting acute ischemic stroke by combining DWI with optimized ADC thresholds

Eur Radiol. 2022 Aug;32(8):5371-5381. doi: 10.1007/s00330-022-08633-6. Epub 2022 Feb 24.

Abstract

Objectives: To examine the effect of the ADC threshold on agreement across observers and deep learning models (DLMs), and on the segmentation performance of DLMs for acute ischemic stroke (AIS).

Methods: Twelve DLMs, trained on DWI-ADC-ADC combination from 76 patients with AIS using 6 different ADC thresholds, with ground truth manually contoured by 2 observers, were tested on an additional 67 patients from the same hospital and 78 patients from another hospital. Agreement between observers and DLMs was evaluated by Bland-Altman plots and the intraclass correlation coefficient (ICC). The similarity between the ground truths (GT) defined by the observers, and between the automatic segmentations performed by the DLMs, was evaluated by the Dice similarity coefficient (DSC). Group comparisons were performed using the Mann-Whitney U test. The relationships between the DSC and the ADC threshold, and between the DSC and AIS lesion size, were evaluated by linear regression analysis. A p value < .05 was considered statistically significant.
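
The two agreement measures named in the Methods can be illustrated with a minimal sketch. This is not the authors' code: the function names `dice` and `limits_of_agreement` are hypothetical, the masks are assumed to be binary arrays of equal shape, and the limits of agreement are assumed to follow the conventional Bland-Altman definition (mean difference ± 1.96 × sample standard deviation of the differences).

```python
import numpy as np


def dice(mask_a, mask_b):
    """Dice similarity coefficient between two binary masks of equal shape."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        # Both masks empty: treated here as perfect agreement (a convention).
        return 1.0
    return 2.0 * np.logical_and(a, b).sum() / total


def limits_of_agreement(sizes_a, sizes_b):
    """Bland-Altman 95% limits of agreement for paired measurements.

    Returns (lower, upper) = mean difference -/+ 1.96 * SD of differences.
    """
    diff = np.asarray(sizes_a, dtype=float) - np.asarray(sizes_b, dtype=float)
    bias = diff.mean()
    half_width = 1.96 * diff.std(ddof=1)  # sample standard deviation
    return bias - half_width, bias + half_width
```

Here `dice` would be applied per patient to a pair of masks (observer versus observer, or DLM versus GT), while `limits_of_agreement` would be applied to paired lesion sizes across patients.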

Results: Excellent interobserver agreement and intraobserver repeatability in the manual segmentation (all ICC > 0.98, p < .001) were achieved. The 95% limit of agreement was reduced from 11.23 cm² for GT on DWI to 0.59 cm² for prediction at an ADC threshold of 0.6 × 10⁻³ mm²/s combined with DWI. The segmentation performance of DLMs was improved with an overall DSC from 0.738 ± 0.214 on DWI to 0.971 ± 0.021 on an ADC threshold of 0.6 × 10⁻³ mm²/s combined with DWI.

Conclusions: Combining an ADC threshold of 0.6 × 10⁻³ mm²/s with DWI reduces interobserver and inter-DLM differences and achieves the best segmentation performance for AIS lesions using DLMs.
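
The abstract does not specify how the ADC threshold is combined with DWI. One plausible reading, shown here purely as an assumption, is that a DWI-based lesion mask is restricted to voxels whose ADC lies below the threshold. The function name `restrict_to_low_adc`, the strict `<` comparison, and the requirement that the ADC map be in mm²/s are all illustrative choices, not details taken from the study.

```python
import numpy as np

# Threshold reported in the abstract: 0.6 x 10^-3 mm^2/s.
ADC_THRESHOLD = 0.6e-3


def restrict_to_low_adc(dwi_mask, adc_map, threshold=ADC_THRESHOLD):
    """Keep only DWI-mask voxels whose ADC is below `threshold`.

    Assumes `adc_map` is in mm^2/s and co-registered with `dwi_mask`.
    """
    dwi_mask = np.asarray(dwi_mask, dtype=bool)
    adc_map = np.asarray(adc_map, dtype=float)
    if dwi_mask.shape != adc_map.shape:
        raise ValueError("dwi_mask and adc_map must have the same shape")
    # Strict inequality is an assumption; the abstract does not say
    # whether voxels exactly at the threshold are included.
    return dwi_mask & (adc_map < threshold)
```

Scanner exports often store ADC in units of 10⁻⁶ mm²/s, so the map may need rescaling before such a comparison.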

Key points:

  • Higher Dice similarity coefficient (DSC) in predicting acute ischemic stroke lesions was achieved by ADC thresholds combined with DWI than by DWI alone (all p < .05).
  • DSC had a negative association with the ADC threshold in most sizes, both hospitals, and both observers (most p < .05) and a positive association with the stroke size in all ADC thresholds, both hospitals, and both observers (all p < .001).
  • An ADC threshold of 0.6 × 10⁻³ mm²/s eliminated the difference of DSC at any stroke size between observers or between hospitals (p = .07 to > .99).

Keywords: Apparent diffusion coefficient; Deep learning; Dice similarity coefficient; Diffusion magnetic resonance imaging; Ischemic stroke.

MeSH terms

  • Deep Learning*
  • Diffusion Magnetic Resonance Imaging
  • Humans
  • Ischemic Stroke* / diagnostic imaging
  • Observer Variation
  • Stroke* / diagnostic imaging