Comparisons of Quality, Correctness, and Similarity Between ChatGPT-Generated and Human-Written Abstracts for Basic Research: Cross-Sectional Study

Shu-Li Cheng; Shih-Jen Tsai; Ya-Mei Bai; Chih-Hung Ko; Chih-Wei Hsu; Fu-Chi Yang; Chia-Kuang Tsai; Yu-Kang Tu; Szu-Nian Yang; Ping-Tao Tseng; Tien-Wei Hsu; Chih-Sung Liang; Kuan-Pin Su

doi:10.2196/51229

J Med Internet Res. 2023 Dec 25:25:e51229. doi: 10.2196/51229.

Authors

Shu-Li Cheng¹, Shih-Jen Tsai²,³, Ya-Mei Bai²,³, Chih-Hung Ko⁴,⁵,⁶, Chih-Wei Hsu⁷, Fu-Chi Yang⁸, Chia-Kuang Tsai⁸, Yu-Kang Tu⁹,¹⁰, Szu-Nian Yang¹¹,¹²,¹³, Ping-Tao Tseng¹⁴,¹⁵,¹⁶, Tien-Wei Hsu^#¹⁷,¹⁸, Chih-Sung Liang^#¹¹,¹⁹, Kuan-Pin Su²⁰,²¹,²²

Affiliations

¹ Department of Nursing, Mackay Medical College, Taipei, Taiwan.
² Department of Psychiatry, Taipei Veterans General Hospital, Taipei, Taiwan.
³ Division of Psychiatry, School of Medicine, National Yang-Ming University, Taipei, Taiwan.
⁴ Department of Psychiatry, Kaohsiung Medical University Hospital, Kaohsiung, Taiwan.
⁵ Department of Psychiatry, College of Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan.
⁶ Department of Psychiatry, Kaohsiung Municipal Siaogang Hospital, Kaohsiung Medical University, Kaohsiung, Taiwan.
⁷ Department of Psychiatry, Kaohsiung Chang Gung Memorial Hospital, Kaohsiung, Taiwan.
⁸ Department of Neurology, Tri-Service General Hospital, National Defense Medical Center, Taipei, Taiwan.
⁹ Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan.
¹⁰ Department of Dentistry, National Taiwan University Hospital, Taipei, Taiwan.
¹¹ Department of Psychiatry, Tri-service Hospital, Beitou branch, Taipei, Taiwan.
¹² Department of Psychiatry, Armed Forces Taoyuan General Hospital, Taoyuan, Taiwan.
¹³ Graduate Institute of Health and Welfare Policy, National Yang Ming Chiao Tung University, Taipei, Taiwan.
¹⁴ Institute of Biomedical Sciences, Institute of Precision Medicine, National Sun Yat-sen University, Kaohsiung, Taiwan.
¹⁵ Department of Psychology, College of Medical and Health Science, Asia University, Taichung, Taiwan.
¹⁶ Prospect Clinic for Otorhinolaryngology and Neurology, Kaohsiung, Taiwan.
¹⁷ Department of Psychiatry, E-Da Dachang Hospital, I-Shou University, Kaohsiung, Taiwan.
¹⁸ Department of Psychiatry, E-Da Hospital, I-Shou University, Kaohsiung, Taiwan.
¹⁹ Department of Psychiatry, National Defense Medical Center, Taipei, Taiwan.
²⁰ College of Medicine, China Medical University, Taichung, Taiwan.
²¹ Mind-Body Interface Laboratory, China Medical University and Hospital, Taichung, Taiwan.
²² An-Nan Hospital, China Medical University, Tainan, Taiwan.

^# Contributed equally.

PMID: 38145486
PMCID: PMC10760418
DOI: 10.2196/51229

Abstract

Background: ChatGPT may act as a research assistant to help organize the direction of thinking and summarize research findings. However, few studies have examined the quality, similarity (how closely a generated abstract matches the original one), and accuracy of the abstracts generated by ChatGPT when researchers provide full-text basic research papers.

Objective: We aimed to assess the applicability of an artificial intelligence (AI) model in generating abstracts for basic preclinical research.

Methods: We selected 30 basic research papers from Nature, Genome Biology, and Biological Psychiatry. Excluding abstracts, we inputted the full text into ChatPDF, an application of a language model based on ChatGPT, and we prompted it to generate abstracts with the same style as used in the original papers. A total of 8 experts were invited to evaluate the quality of these abstracts (based on a Likert scale of 0-10) and identify which abstracts were generated by ChatPDF, using a blind approach. These abstracts were also evaluated for their similarity to the original abstracts and the accuracy of the AI content.

Results: The quality of ChatGPT-generated abstracts was lower than that of the actual abstracts (10-point Likert scale: mean 4.72, SD 2.09 vs mean 8.09, SD 1.03; P<.001). The difference in quality was significant in the unstructured format (mean difference -4.33; 95% CI -4.79 to -3.86; P<.001) but minimal in the 4-subheading structured format (mean difference -2.33; 95% CI -2.79 to -1.86). Among the 30 ChatGPT-generated abstracts, 3 showed wrong conclusions, and 10 were identified as AI content. The mean percentage of similarity between the original and the generated abstracts was not high (2.10%-4.40%). The blinded reviewers achieved a 93% (224/240) accuracy rate in guessing which abstracts were written using ChatGPT.
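The arithmetic behind the reported figures can be rechecked with a minimal sketch. It assumes that the denominator of 240 is 8 reviewers times 30 judgments each, which the abstract does not state explicitly, and that the 95% CIs are symmetric around the mean differences. The function and constant names are hypothetical and are not taken from the study.

```python
# Minimal consistency check of figures quoted in the Results paragraph.
# All names are hypothetical; the numbers are copied from the abstract.

N_REVIEWERS = 8        # experts invited (Methods)
N_ABSTRACTS = 30       # assumption: judgments per reviewer
CORRECT_GUESSES = 224  # correct identifications (Results)


def identification_accuracy(correct: int, reviewers: int, abstracts: int) -> float:
    """Share of correct guesses over all reviewer-by-abstract judgments."""
    total_judgments = reviewers * abstracts
    return correct / total_judgments


def ci_midpoint(lower: float, upper: float) -> float:
    """Midpoint of a confidence interval (equals the estimate if symmetric)."""
    return (lower + upper) / 2


def ci_half_width(lower: float, upper: float) -> float:
    """Half-width of a confidence interval."""
    return (upper - lower) / 2


accuracy = identification_accuracy(CORRECT_GUESSES, N_REVIEWERS, N_ABSTRACTS)
print(f"Reviewer accuracy: {accuracy:.1%}")

# Reported 95% CIs for the quality difference, by abstract format.
reported_cis = {
    "unstructured": (-4.79, -3.86),
    "structured (4 subheadings)": (-2.79, -1.86),
}
for label, (lower, upper) in reported_cis.items():
    print(
        f"{label}: midpoint {ci_midpoint(lower, upper):.3f}, "
        f"half-width {ci_half_width(lower, upper):.3f}"
    )
```

Under these assumptions, the interval midpoints fall within rounding of the reported mean differences (-4.33 and -2.33), and both intervals have the same half-width.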

Conclusions: Using ChatGPT to generate a scientific abstract may not lead to issues of similarity when using real full texts written by humans. However, the quality of the ChatGPT-generated abstracts was suboptimal, and their accuracy was not 100%.

Keywords: AI-generated scientific content; ChatGPT; LLM; NLP; abstract; abstracts; academic research; artificial intelligence; extract; extraction; generation; generative; language model; language models; natural language processing; plagiarism; publication; publications; scientific research; text; textual.

©Shu-Li Cheng, Shih-Jen Tsai, Ya-Mei Bai, Chih-Hung Ko, Chih-Wei Hsu, Fu-Chi Yang, Chia-Kuang Tsai, Yu-Kang Tu, Szu-Nian Yang, Ping-Tao Tseng, Tien-Wei Hsu, Chih-Sung Liang, Kuan-Pin Su. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 25.12.2023.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Artificial Intelligence*
Cross-Sectional Studies
Humans
Language
Research Personnel
Research*