PMC-LLaMA: toward building open-source language models for medicine

Chaoyi Wu; Weixiong Lin; Xiaoman Zhang; Ya Zhang; Weidi Xie; Yanfeng Wang

doi:10.1093/jamia/ocae045

J Am Med Inform Assoc. 2024 Apr 13:ocae045. doi: 10.1093/jamia/ocae045. Online ahead of print.

Authors

Chaoyi Wu¹,², Weixiong Lin¹,², Xiaoman Zhang¹,², Ya Zhang¹,², Weidi Xie¹,², Yanfeng Wang¹,²

Affiliations

¹ Cooperative Medianet Innovation Center (CMIC), Shanghai Jiao Tong University, Shanghai, 200240, China.
² Shanghai AI Laboratory, Shanghai, 200232, China.

PMID: 38613821
DOI: 10.1093/jamia/ocae045

Abstract

Objective: Recently, large language models (LLMs) have showcased remarkable capabilities in natural language understanding. While demonstrating proficiency in everyday conversations and question-answering (QA) situations, these models frequently struggle in domains that require precision, such as medical applications, due to their lack of domain-specific knowledge. In this article, we describe the procedure for building a powerful, open-source language model specifically designed for medical applications, termed PMC-LLaMA.

Materials and methods: We adapt a general-purpose LLM to the medical domain through data-centric knowledge injection, integrating 4.8M biomedical academic papers and 30K medical textbooks, as well as comprehensive domain-specific instruction fine-tuning on 202M tokens, encompassing medical QA, rationale for reasoning, and conversational dialogues.

Results: When evaluated on various public medical QA benchmarks and by manual rating, our lightweight PMC-LLaMA, which consists of only 13B parameters, exhibits superior performance, even surpassing ChatGPT. All models, code, and datasets for instruction tuning will be released to the research community.

Discussion: Our contributions are 3-fold: (1) we build an open-source LLM for the medical domain. We believe the proposed PMC-LLaMA model can promote further development of foundation models in medicine, serving as a trainable generative language backbone for medical applications; (2) we conduct thorough ablation studies to demonstrate the effectiveness of each proposed component, showing how different training data and model scales affect medical LLMs; (3) we contribute a large-scale, comprehensive dataset for instruction tuning.

Conclusion: In this article, we systematically investigate the process of building an open-source, medicine-specific LLM, PMC-LLaMA.

Keywords: ChatGPT; biomedical NLP; generative language models; large language models.
