Attention-augmented U-Net (AA-U-Net) for semantic segmentation

Signal Image Video Process. 2023;17(4):981-989. doi: 10.1007/s11760-022-02302-3. Epub 2022 Jul 25.

Abstract

Deep learning-based image segmentation models must capture sufficient spatial context without relying on complex architectures that are hard to train with limited labeled data. For COVID-19 infection segmentation on CT images, training data are currently scarce. Attention models, in particular the most recent self-attention methods, have been shown to help gather contextual information within deep networks and to benefit semantic segmentation tasks. The recent attention-augmented convolution model aims to capture long-range interactions by concatenating self-attention and convolution feature maps. This work proposes a novel attention-augmented convolution U-Net (AA-U-Net) that enables more accurate spatial aggregation of contextual information by integrating attention-augmented convolution into the bottleneck of an encoder-decoder segmentation architecture. A deep segmentation network (U-Net) with this attention mechanism significantly improves performance on the challenging task of COVID-19 lesion segmentation. The validation experiments show that the performance gain of the attention-augmented U-Net comes from its ability to capture a dynamic, precise (wider) attention context. The AA-U-Net achieves Dice scores of 72.3% and 61.4% for ground-glass opacity and consolidation lesions in COVID-19 segmentation, improving accuracy by 4.2 percentage points over a baseline U-Net and by 3.09 percentage points over a baseline U-Net with matched parameters.
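To make the concatenation idea concrete, the following is a rough, non-authoritative PyTorch sketch of an attention-augmented convolution block placed at a U-Net bottleneck. It is not the authors' implementation: the class name, layer sizes, and head counts are illustrative assumptions, and relative position encodings from the original attention-augmented convolution formulation are omitted for brevity.

import torch
import torch.nn as nn


class AAConvBottleneck(nn.Module):
    """Sketch of an attention-augmented convolution block: convolutional
    and multi-head self-attention feature maps are computed in parallel
    and concatenated along the channel dimension. Sizes are assumptions.
    """

    def __init__(self, in_ch: int, out_ch: int, attn_ch: int = 64, heads: int = 4):
        super().__init__()
        assert attn_ch % heads == 0 and attn_ch < out_ch
        self.heads, self.d = heads, attn_ch // heads
        # Convolution branch supplies the channels not covered by attention.
        self.conv = nn.Conv2d(in_ch, out_ch - attn_ch, kernel_size=3, padding=1)
        # A single 1x1 convolution computes queries, keys, and values for all heads.
        self.qkv = nn.Conv2d(in_ch, 3 * attn_ch, kernel_size=1)
        self.out = nn.Conv2d(attn_ch, attn_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, _, h, w = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=1)

        # Flatten spatial dims to (b, heads, h*w, d) so every position can
        # attend to every other position, capturing long-range interactions.
        def split(t):
            return t.view(b, self.heads, self.d, h * w).transpose(2, 3)

        q, k, v = split(q), split(k), split(v)
        attn = torch.softmax(q @ k.transpose(2, 3) / self.d ** 0.5, dim=-1)
        a = (attn @ v).transpose(2, 3).reshape(b, -1, h, w)
        # Concatenate convolution and self-attention feature maps.
        return torch.cat([self.conv(x), self.out(a)], dim=1)


# Hypothetical usage at a U-Net bottleneck (channel and spatial sizes assumed):
bottleneck = AAConvBottleneck(in_ch=512, out_ch=512)
y = bottleneck(torch.randn(1, 512, 16, 16))  # -> (1, 512, 16, 16)

Because the attention branch replaces only part of the output channels, a block like this can substitute for a standard bottleneck convolution with a comparable parameter count, which is consistent with the parameter-matched baseline comparison reported above.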

Supplementary information: The online version contains supplementary material available at 10.1007/s11760-022-02302-3.

Keywords: Attention mechanism; Attention-augmented convolution; COVID-19; Consolidation; Ground-glass opacities; Segmentation; U-Net.