A visual-language foundation model for computational pathology

Nat Med. 2024 Mar;30(3):863-874. doi: 10.1038/s41591-024-02856-4. Epub 2024 Mar 19.

Abstract

The accelerated adoption of digital pathology and advances in deep learning have enabled the development of robust models for various pathology tasks across a diverse array of diseases and patient cohorts. However, model training is often difficult due to label scarcity in the medical domain, and a model's usage is limited by the specific task and disease for which it is trained. Additionally, most models in histopathology leverage only image data, a stark contrast to how humans teach each other and reason about histopathologic entities. We introduce CONtrastive learning from Captions for Histopathology (CONCH), a visual-language foundation model developed using diverse sources of histopathology images, biomedical text and, notably, over 1.17 million image-caption pairs through task-agnostic pretraining. Evaluated on a suite of 14 diverse benchmarks, CONCH can be transferred to a wide range of downstream tasks involving histopathology images and/or text, achieving state-of-the-art performance on histology image classification, segmentation, captioning, and text-to-image and image-to-text retrieval. CONCH represents a substantial leap over concurrent visual-language pretrained systems for histopathology, with the potential to directly facilitate a wide array of machine learning-based workflows requiring minimal or no further supervised fine-tuning.
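The core pretraining objective named in the abstract, contrastive learning from image-caption pairs, can be illustrated with a minimal sketch of a symmetric image-text contrastive loss (the InfoNCE formulation popularized by CLIP). The function name, embedding shapes and temperature below are illustrative assumptions for exposition, not the authors' actual architecture, loss weighting or training code, which the paper details.

    import torch
    import torch.nn.functional as F

    def image_caption_contrastive_loss(image_emb, text_emb, temperature=0.07):
        # image_emb, text_emb: (batch, dim) embeddings from separate image and
        # text encoders (hypothetical encoders; not CONCH's exact components).
        image_emb = F.normalize(image_emb, dim=-1)
        text_emb = F.normalize(text_emb, dim=-1)

        # Pairwise cosine similarities between every image and every caption,
        # scaled by a temperature hyperparameter.
        logits = image_emb @ text_emb.t() / temperature

        # Matching image-caption pairs lie on the diagonal of the logit matrix.
        targets = torch.arange(image_emb.size(0), device=image_emb.device)

        # Symmetric cross-entropy over both directions: image-to-text retrieval
        # and text-to-image retrieval.
        loss_i2t = F.cross_entropy(logits, targets)
        loss_t2i = F.cross_entropy(logits.t(), targets)
        return (loss_i2t + loss_t2i) / 2

Aligning the two embedding spaces in this way is what allows the kind of zero-shot transfer the abstract describes: at inference time, class names or captions can be embedded with the text encoder and compared against image embeddings without further supervised fine-tuning.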

MeSH terms

  • Humans
  • Language*
  • Machine Learning*
  • Workflow