[Human vs. ChatGPT. Is it possible obtain comparable results in the analysis of a scientific systematic review?]
Chiara Esposito, Giulia Dell'Omo, Daniele Di Ianni, Paolo Di Procolo
Recenti Progressi in Medicina, September 2024. doi: 10.1701/4334.43184
Abstract
Introduction: There is growing interest in the use of ChatGPT in the writing and reviewing of scientific articles. Given the nature of the tool, we tested its effectiveness in the scientific article review process.
Methods: We compared the findings of a systematic review of the published literature, produced by researchers in the traditional way, with a version generated by ChatGPT that was given the same inputs as the original paper together with a set of instructions (prompts) optimized to obtain the same type of result; we also documented the process that led to a comparable result. To assess ChatGPT's effectiveness in analyzing the systematic review, we selected an existing, replicable study on the experience of healthcare professionals with digital tools in clinical practice and downloaded its 17 related publications in PDF format. We then uploaded these references into ChatGPT with specific prompts detailing the required professional profile, the application context, and the expected outputs, and set the level of creative freedom (temperature) to a minimum to limit the possibility of "hallucinations". After verifying ChatGPT's understanding of the task, we iterated on the prompt until we obtained a result comparable to the original review. Finally, we systematically compared ChatGPT's results with those of the reference review.
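As an illustration only (the study itself worked through the ChatGPT interface with manually refined prompts, not through code), a comparable setup could in principle be scripted against the OpenAI API: the PDF references are read into a single context, a system prompt fixes the professional profile, application context, and expected output, and the temperature is set to zero to minimize creative freedom. The model name, folder layout, extract_text() helper, and prompt wording below are assumptions made for this sketch.

```python
# A minimal sketch, assuming the 17 downloaded references sit in ./references/.
# Not the authors' actual procedure; model choice and prompt wording are illustrative.
from pathlib import Path

from openai import OpenAI
from pypdf import PdfReader

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def extract_text(pdf_path: Path) -> str:
    """Extract plain text from one of the review's PDF references."""
    reader = PdfReader(str(pdf_path))
    return "\n".join(page.extract_text() or "" for page in reader.pages)


# Concatenate the downloaded publications into a single analysis context.
corpus = "\n\n---\n\n".join(
    extract_text(p) for p in sorted(Path("references").glob("*.pdf"))
)

# Prompt elements mirroring the abstract: professional profile, application
# context, and expected output, with temperature kept to a minimum.
system_prompt = (
    "You are a researcher experienced in systematic reviews of the digital "
    "health literature. Analyse the publications provided and summarise the "
    "experience of healthcare professionals with digital tools in clinical "
    "practice, grouping the findings into macro-themes."
)

response = client.chat.completions.create(
    model="gpt-4o",   # illustrative model choice
    temperature=0,    # minimal creative freedom to limit hallucinations
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": corpus},
    ],
)
print(response.choices[0].message.content)
```

In the study itself, the prompt was refined over several iterations of this kind until the output approached the macro-themes of the original review.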
Results: The analysis showed that ChatGPT's results are comparable to human results, although four iterations of the prompt were required to approach the human benchmark.
Discussion: Although ChatGPT showed comparable capabilities in text review, human authors exhibited greater analytical depth in interpretation. Due to their greater creative freedom, the authors offered more details about the benefits of digital tools in the hospital setting. ChatGPT, however, enriched the analysis by including elements not contemplated initially. The final comparison revealed comparable macro-themes between the two approaches, emphasizing the need for careful human validation to ensure the full integrity and depth of the analysis.
Conclusions: Generative artificial intelligence (AI), represented by ChatGPT, showed significant potential to revolutionize the production of scientific literature by supporting healthcare professionals. Although there are challenges that require careful evaluation, ChatGPT's results are comparable to human results. The key element is not so much the superiority of AI over humans as the human ability to configure and direct the AI to obtain results that are optimal or even potentially superior to those of humans.
Journal description:
Now in its sixtieth year, Recenti Progressi in Medicina continues to be a reliable point of reference and an essential working tool for broadening the cultural horizon of the Italian physician. Recenti Progressi in Medicina is a journal of internal medicine, which means recovering a global and integrated perspective, one suited to avoiding both the particularism of specialist information and the fragmentation of generalist information.