DIY AI, deep learning network development for automated image classification in a point-of-care ultrasound quality assurance program

J Am Coll Emerg Physicians Open. 2020 Mar 1;1(2):124-131. doi: 10.1002/emp2.12018. eCollection 2020 Apr.

Abstract

Background: Artificial intelligence (AI) is increasingly a part of daily life and offers great possibilities to enrich health care. Imaging applications of AI have mostly been developed by large, well-funded companies and are currently inaccessible to the comparatively small market of point-of-care ultrasound (POCUS) programs. Given this absence of commercial solutions, we sought to create and test a do-it-yourself (DIY) deep learning algorithm that classifies ultrasound images to enhance the quality assurance workflow for POCUS programs.

Methods: We created a convolutional neural network using publicly available software tools and a pre-existing convolutional neural network architecture. The network was then trained on ultrasound images taken from 189 publicly available POCUS videos covering seven ultrasound exam types: pelvis, heart, lung, abdomen, musculoskeletal, ocular, and central vascular access. Approximately 121,000 individual images were extracted from the videos; 80% were used for model training and 10% each for cross validation and testing. We then tested the algorithm for accuracy against a set of 160 ultrasound frames randomly extracted from videos that had not been used for training and that had been acquired on different ultrasound equipment. Three blinded POCUS experts categorized the 160 random images, and their results were compared with those of the convolutional neural network algorithm. Descriptive statistics and Krippendorff alpha reliability estimates were calculated.
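
The reliability estimate named above can be illustrated with a minimal sketch of Krippendorff's alpha for nominal categories such as exam type. The abstract does not state which software or which level of measurement was used, so the nominal form, the function name krippendorff_alpha_nominal, and the toy labels below are all assumptions made for illustration only, not the authors' procedure or data.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    units: one inner list per image, holding the labels assigned by the
    raters who rated that image (missing ratings are simply left out).
    """
    # Coincidence matrix: every ordered pair of ratings within a unit
    # contributes 1 / (m - 1), where m is the number of ratings in the unit.
    coincidences = Counter()
    for ratings in units:
        m = len(ratings)
        if m < 2:
            continue  # a single rating cannot be paired with anything
        for a, b in permutations(ratings, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals per category and the overall number of pairable values.
    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n <= 1:
        raise ValueError("need at least one unit with two or more ratings")

    # Observed and expected disagreement (nominal metric: 1 if labels differ).
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(totals[a] * totals[b]
                   for a in totals for b in totals if a != b)
    if expected == 0:
        raise ValueError("only one category present; alpha is undefined")

    return 1.0 - (n - 1) * observed / expected


# Hypothetical toy ratings (two raters, four images), not study data.
toy_units = [
    ["heart", "heart"],
    ["heart", "lung"],
    ["lung", "lung"],
    ["lung", "lung"],
]
toy_alpha = krippendorff_alpha_nominal(toy_units)
print(f"alpha = {toy_alpha:.3f}")
```

An alpha of 1 indicates perfect agreement and 0 indicates agreement no better than chance; comparing the algorithm with a ground truth amounts to treating the two as a pair of raters.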

Results: Cross-validation accuracy of the convolutional neural network approached 99%. The algorithm accurately classified 98% of the test ultrasound images. In the new POCUS program simulation phase, the algorithm accurately classified 70% of the 160 new images, corresponding to moderate agreement with the ground truth (α = 0.64). The three blinded POCUS experts correctly classified 93%, 94%, and 98% of the images, respectively. There was excellent agreement among the experts (α = 0.87). Agreement between experts and algorithm was good (α = 0.74). The most common error was misclassifying musculoskeletal images for both the algorithm (40%) and POCUS experts (40.6%). The algorithm took 7 minutes 45 seconds to review and classify the 160 new images. The three expert reviewers took 27, 32, and 45 minutes to classify the images, respectively.
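
To put the reported review times on a common scale, the short sketch below converts them to seconds per image. The timings and the image count come from the results above; the helper name seconds_per_image and the choice to compare against the fastest expert are illustrative assumptions, and the per-image figures are simple averages, not measurements reported by the authors.

```python
N_IMAGES = 160  # size of the new image set reported in the results


def seconds_per_image(minutes, seconds=0, n_images=N_IMAGES):
    """Average review time per image, in seconds (hypothetical helper)."""
    return (minutes * 60 + seconds) / n_images


# Reported totals: algorithm 7 min 45 s; experts 27, 32, and 45 min.
algorithm_s = seconds_per_image(7, 45)
expert_s = [seconds_per_image(m) for m in (27, 32, 45)]

# Ratio of the fastest expert's total time to the algorithm's total time.
speedup_vs_fastest_expert = min(expert_s) / algorithm_s

print(f"algorithm: {algorithm_s:.2f} s/image")
print("experts:  " + ", ".join(f"{s:.2f} s/image" for s in expert_s))
print(f"fastest expert / algorithm: {speedup_vs_fastest_expert:.2f}x")
```

On these averages the algorithm spends roughly 2.9 seconds per image, against about 10 to 17 seconds per image for the experts.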

Conclusions: Our algorithm accurately classified, by body scan area, 98% of new images related to its training pool, simulating POCUS program workflow. Performance was diminished with exam images from an unrelated image pool and different ultrasound equipment, suggesting that additional images and convolutional neural network training are necessary for fine-tuning when the algorithm is used across different POCUS programs. The algorithm showed theoretical potential to improve workflow for POCUS program directors, if fully implemented. The implications of our DIY AI for POCUS are scalable, and further work to maximize the collaboration between AI and POCUS programs is warranted.

Keywords: artificial intelligence; deep learning; emergency medicine; emergency ultrasound; point‐of‐care ultrasound.