Differences between human and machine perception in medical diagnosis

Taro Makino; Stanisław Jastrzębski; Witold Oleszkiewicz; Celin Chacko; Robin Ehrenpreis; Naziya Samreen; Chloe Chhor; Eric Kim; Jiyon Lee; Kristine Pysarenko; Beatriu Reig; Hildegard Toth; Divya Awal; Linda Du; Alice Kim; James Park; Daniel K Sodickson; Laura Heacock; Linda Moy; Kyunghyun Cho; Krzysztof J Geras

doi:10.1038/s41598-022-10526-z

Sci Rep. 2022 Apr 27;12(1):6877. doi: 10.1038/s41598-022-10526-z.

Authors

Taro Makino^{1,2}, Stanisław Jastrzębski^{3,4,5}, Witold Oleszkiewicz⁶, Celin Chacko⁴, Robin Ehrenpreis⁴, Naziya Samreen⁴, Chloe Chhor⁴, Eric Kim⁴, Jiyon Lee⁴, Kristine Pysarenko⁴, Beatriu Reig^{4,7}, Hildegard Toth^{4,7}, Divya Awal⁴, Linda Du⁴, Alice Kim⁴, James Park⁴, Daniel K Sodickson^{4,5,8,7}, Laura Heacock^{4,7}, Linda Moy^{4,5,8,7}, Kyunghyun Cho^{3,9}, Krzysztof J Geras^{10,11,12,13}

Affiliations

¹ Center for Data Science, New York University, New York, NY, USA. taro@nyu.edu.
² Department of Radiology, NYU Langone Health, New York, NY, USA. taro@nyu.edu.
³ Center for Data Science, New York University, New York, NY, USA.
⁴ Department of Radiology, NYU Langone Health, New York, NY, USA.
⁵ Center for Advanced Imaging Innovation and Research, NYU Langone Health, New York, NY, USA.
⁶ Faculty of Electronics and Information Technology, Warsaw University of Technology, Warszawa, Poland.
⁷ Perlmutter Cancer Center, NYU Langone Health, New York, NY, USA.
⁸ Vilcek Institute of Graduate Biomedical Sciences, NYU Grossman School of Medicine, New York, NY, USA.
⁹ Department of Computer Science, Courant Institute, New York University, New York, NY, USA.
¹⁰ Center for Data Science, New York University, New York, NY, USA. k.j.geras@nyu.edu.
¹¹ Department of Radiology, NYU Langone Health, New York, NY, USA. k.j.geras@nyu.edu.
¹² Center for Advanced Imaging Innovation and Research, NYU Langone Health, New York, NY, USA. k.j.geras@nyu.edu.
¹³ Vilcek Institute of Graduate Biomedical Sciences, NYU Grossman School of Medicine, New York, NY, USA. k.j.geras@nyu.edu.

Abstract

Deep neural networks (DNNs) show promise in image-based medical diagnosis, but cannot be fully trusted since they can fail for reasons unrelated to underlying pathology. Humans are less likely to make such superficial mistakes, since they use features that are grounded in medical science. It is therefore important to know whether DNNs use different features than humans. To this end, we propose a framework for comparing human and machine perception in medical diagnosis. We frame the comparison in terms of perturbation robustness, and mitigate Simpson's paradox by performing a subgroup analysis. The framework is demonstrated with a case study in breast cancer screening, where we separately analyze microcalcifications and soft tissue lesions. While it is inconclusive whether humans and DNNs use different features to detect microcalcifications, we find that for soft tissue lesions, DNNs rely on high-frequency components ignored by radiologists. Moreover, these features are located outside the image regions that radiologists found most suspicious. This difference between humans and machines was visible only through subgroup analysis, which highlights the importance of incorporating medical domain knowledge into the comparison.
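To illustrate why a subgroup analysis can matter, the following minimal Python sketch shows Simpson's paradox on made-up counts. It is not the authors' analysis: the observer names, the choice of "correct diagnoses under perturbation" as the metric, and every number are hypothetical and chosen only so that the pooled comparison points the opposite way from the comparison within each subgroup.

```python
from fractions import Fraction

# HYPOTHETICAL counts, not taken from the study: (correct, total) diagnoses
# under some perturbation, per observer and per lesion subgroup.
counts = {
    "observer_a": {
        "microcalcifications": (81, 87),
        "soft_tissue_lesions": (192, 263),
    },
    "observer_b": {
        "microcalcifications": (234, 270),
        "soft_tissue_lesions": (55, 80),
    },
}


def subgroup_rates(per_subgroup):
    """Fraction of correct diagnoses within each subgroup."""
    return {name: Fraction(c, t) for name, (c, t) in per_subgroup.items()}


def pooled_rate(per_subgroup):
    """Fraction of correct diagnoses when subgroups are merged."""
    correct = sum(c for c, _ in per_subgroup.values())
    total = sum(t for _, t in per_subgroup.values())
    return Fraction(correct, total)


by_subgroup = {obs: subgroup_rates(d) for obs, d in counts.items()}
pooled = {obs: pooled_rate(d) for obs, d in counts.items()}

for obs in counts:
    for name, r in by_subgroup[obs].items():
        print(f"{obs} {name}: {float(r):.3f}")
    print(f"{obs} pooled: {float(pooled[obs]):.3f}")
```

With these counts, observer_a has the higher rate in both subgroups, yet observer_b has the higher pooled rate, because the two observers see very different mixes of the two subgroups. A comparison made only on pooled data would therefore reach the wrong conclusion about each lesion type.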

Publication types

Research Support, U.S. Gov't, Non-P.H.S.
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Breast Neoplasms* / diagnostic imaging
Calcinosis*
Female
Humans
Neural Networks, Computer
Perception
Radiologists
