Peer review of GPT-4 technical report and systems card

Jack Gallifant; Amelia Fiske; Yulia A Levites Strekalova; Juan S Osorio-Valencia; Rachael Parke; Rogers Mwavu; Nicole Martinez; Judy Wawira Gichoya; Marzyeh Ghassemi; Dina Demner-Fushman; Liam G McCoy; Leo Anthony Celi; Robin Pierce

doi:10.1371/journal.pdig.0000417


PLOS Digit Health. 2024 Jan 18;3(1):e0000417. doi: 10.1371/journal.pdig.0000417. eCollection 2024 Jan.

Authors

Jack Gallifant^{1,2}, Amelia Fiske³, Yulia A Levites Strekalova⁴, Juan S Osorio-Valencia^{5,6,7}, Rachael Parke^{8,9}, Rogers Mwavu¹⁰, Nicole Martinez¹¹, Judy Wawira Gichoya¹², Marzyeh Ghassemi¹³, Dina Demner-Fushman¹⁴, Liam G McCoy¹⁵, Leo Anthony Celi^{2,16,17}, Robin Pierce¹⁸

Affiliations

¹ Department of Critical Care, Guy's & St Thomas' NHS Trust, London, United Kingdom.
² Massachusetts Institute of Technology, Laboratory for Computational Physiology, Cambridge, Massachusetts, United States of America.
³ Institute of History and Ethics in Medicine, Department of Clinical Medicine, TUM School of Medicine and Health, Technical University of Munich, Munich, Germany.
⁴ Department of Health Services Research, Management, and Policy, College of Public Health and Health Professions, University of Florida, Gainesville, Florida, United States of America.
⁵ A.I. and Innovation Committee, Colombian Radiology Association, Medellin, Colombia.
⁶ ScienteLab, Bogota, Colombia.
⁷ Be4tech, Medellin, Colombia.
⁸ Cardiothoracic and Vascular Intensive Care Unit, Auckland City Hospital, Auckland, New Zealand.
⁹ School of Nursing, The University of Auckland, Auckland, New Zealand.
¹⁰ Faculty of Computing and Informatics, Mbarara University of Science and Technology, Mbarara, Uganda.
¹¹ Center for Biomedical Ethics, Stanford University, Stanford, California, United States of America.
¹² Department of Radiology, Emory University School of Medicine, Atlanta, Georgia, United States of America.
¹³ Massachusetts Institute of Technology, Electrical Engineering and Computer Science (EECS), Cambridge, Massachusetts, United States of America.
¹⁴ National Library of Medicine, NIH, HHS, Bethesda, Maryland, United States of America.
¹⁵ Faculty of Medicine and Dentistry, University of Alberta, Edmonton, Alberta, Canada.
¹⁶ Division of Pulmonary, Critical Care, and Sleep Medicine, Beth Israel Deaconess Medical Center, Boston, Massachusetts, United States of America.
¹⁷ Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America.
¹⁸ The Law School, Faculty of Humanities, Arts, and Social Sciences, University of Exeter, Exeter, United Kingdom.

Abstract

The study provides a comprehensive review of OpenAI's Generative Pre-trained Transformer 4 (GPT-4) technical report, with an emphasis on applications in high-risk settings such as healthcare. A diverse team, including experts in artificial intelligence (AI), natural language processing, public health, law, policy, social science, healthcare research, and bioethics, analyzed the report against established peer review guidelines. The GPT-4 report shows a significant commitment to transparent AI research, particularly in creating a systems card for risk assessment and mitigation. However, the review also identifies limitations such as restricted access to training data, inadequate confidence and uncertainty estimates, and concerns over privacy and intellectual property rights. Key strengths identified include the considerable time and economic investment in transparent AI research and the creation of a comprehensive systems card. Conversely, the lack of clarity about training processes and data raises concerns about biases and interests encoded in GPT-4. The report also lacks confidence and uncertainty estimates, which are crucial in high-risk areas such as healthcare, and fails to address potential privacy and intellectual property issues. Furthermore, this study emphasizes the need for diverse, global involvement in developing and evaluating large language models (LLMs) to ensure broad societal benefits and mitigate risks. The paper presents recommendations such as improving data transparency, developing accountability frameworks, establishing confidence standards for LLM outputs in high-risk settings, and enhancing industry research review processes. It concludes that while GPT-4's report is a step towards open discussion of LLMs, more extensive interdisciplinary reviews are essential for addressing bias, harm, and risk concerns, especially in high-risk domains.
The review aims to expand the understanding of LLMs in general and highlights the need for new forms of reflection on how LLMs are reviewed, on the data required for effective evaluation, and on how to address critical issues such as bias and risk.

Copyright: © 2024 Gallifant et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Publication types

Review
