Peering Into the Black Box of Artificial Intelligence: Evaluation Metrics of Machine Learning Methods

Guy S Handelman; Hong Kuan Kok; Ronil V Chandra; Amir H Razavi; Shiwei Huang; Mark Brooks; Michael J Lee; Hamed Asadi

doi:10.2214/AJR.18.20224

AJR Am J Roentgenol. 2019 Jan;212(1):38-43. doi: 10.2214/AJR.18.20224. Epub 2018 Oct 17.

Authors

Guy S Handelman^{1,2}, Hong Kuan Kok^{3,4}, Ronil V Chandra^{5,6}, Amir H Razavi^{7,8}, Shiwei Huang^{9}, Mark Brooks^{5,10}, Michael J Lee^{2,11}, Hamed Asadi^{5,10,12}

Affiliations

¹ Department of Radiology, Belfast City Hospital, 51 Lisburn Rd, Belfast, Antrim BT9 7AB, UK.
² Royal College of Surgeons in Ireland, Dublin, Ireland.
³ Interventional Radiology Service, Northern Hospital Radiology, Epping, Australia.
⁴ School of Medicine, Faculty of Health, Deakin University, Waurn Ponds, Australia.
⁵ Interventional Neuroradiology Service, Monash Imaging, Monash Health, Clayton, Australia.
⁶ Faculty of Medicine, Nursing and Health Sciences, Monash University, Clayton, Australia.
⁷ School of Information Technology and Engineering, University of Ottawa, Ottawa, ON, Canada.
⁸ BCE Corporate Security, Ottawa, ON, Canada.
⁹ The Australian National University Medical School, Garran, Australia.
¹⁰ Department of Radiology, Interventional Neuroradiology Service, Austin Health, Heidelberg, Australia.
¹¹ Department of Radiology, Beaumont Hospital, Dublin, Ireland.
¹² The Florey Institute of Neuroscience and Mental Health, University of Melbourne, Australia.

PMID: 30332290
DOI: 10.2214/AJR.18.20224

Abstract

Objective: Machine learning (ML) and artificial intelligence (AI) are rapidly becoming the most talked about and controversial topics in radiology and medicine. Over the past few years, the number of ML- or AI-focused studies in the literature has increased almost exponentially, and ML has become a hot topic at academic and industry conferences. However, despite the increased awareness of ML as a tool, many medical professionals have a poor understanding of how ML works and how to critically appraise studies and tools that are presented to us. Thus, we present a brief overview of ML, explain the metrics used in ML and how to interpret them, and explain some of the technical jargon associated with the field so that readers with a medical background and basic knowledge of statistics can feel more comfortable when examining ML applications.

Conclusion: Attention to sample size, overfitting, underfitting, and cross-validation, as well as a broad knowledge of the metrics of machine learning, can help those with little or no technical knowledge begin to assess machine learning studies. However, transparency in methods and sharing of algorithms are vital to allow clinicians to assess these tools themselves.

Keywords: artificial intelligence; machine learning; medicine; supervised machine learning; unsupervised machine learning.

Publication types

Review

MeSH terms

Algorithms
Humans
Image Processing, Computer-Assisted
Machine Learning*
Radiology*
Statistics as Topic