[Human vs. ChatGPT. Is it possible obtain comparable results in the analysis of a scientific systematic review?]
Chiara Esposito, Giulia Dell'Omo, Daniele Di Ianni, Paolo Di Procolo
Recenti Progressi in Medicina, September 2024. doi: 10.1701/4334.43184
Abstract
Introduction: There is growing interest in the use of ChatGPT in the writing and reviewing of scientific articles. Given the nature of the tool, we tested its effectiveness in the scientific article review process.
Methods: We compared the findings of a systematic review of the published literature, produced by researchers in the traditional way, with a version generated by ChatGPT that was given the same inputs as the original paper together with a set of instructions (prompts) optimized to obtain the same type of result; we also documented the process that led to a comparable result. To assess ChatGPT's effectiveness in analyzing the systematic review, we selected an existing, replicable study on the experience of healthcare professionals with digital tools in clinical practice and downloaded its 17 related publications in PDF format. We then uploaded these references into ChatGPT with specific prompts detailing the required professional profile, the application context, and the expected outputs, and set the level of creative freedom (temperature) to a minimum to limit the possibility of "hallucinations". After verifying ChatGPT's understanding of the task, we iterated on the prompt until we obtained a result comparable to the original review. Finally, we systematically compared ChatGPT's results with those of the reference review.
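As an illustration only (the study itself worked through the ChatGPT interface with manually refined prompts, not through code), a comparable setup could in principle be scripted against the OpenAI API: the PDF references are read into a single context, a system prompt fixes the professional profile, application context, and expected output, and the temperature is set to zero to minimize creative freedom. The model name, folder layout, extract_text() helper, and prompt wording below are assumptions made for this sketch.

```python
# A minimal sketch, assuming the 17 downloaded references sit in ./references/.
# Not the authors' actual procedure; model choice and prompt wording are illustrative.
from pathlib import Path

from openai import OpenAI
from pypdf import PdfReader

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def extract_text(pdf_path: Path) -> str:
    """Extract plain text from one of the review's PDF references."""
    reader = PdfReader(str(pdf_path))
    return "\n".join(page.extract_text() or "" for page in reader.pages)


# Concatenate the downloaded publications into a single analysis context.
corpus = "\n\n---\n\n".join(
    extract_text(p) for p in sorted(Path("references").glob("*.pdf"))
)

# Prompt elements mirroring the abstract: professional profile, application
# context, and expected output, with temperature kept to a minimum.
system_prompt = (
    "You are a researcher experienced in systematic reviews of the digital "
    "health literature. Analyse the publications provided and summarise the "
    "experience of healthcare professionals with digital tools in clinical "
    "practice, grouping the findings into macro-themes."
)

response = client.chat.completions.create(
    model="gpt-4o",   # illustrative model choice
    temperature=0,    # minimal creative freedom to limit hallucinations
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": corpus},
    ],
)
print(response.choices[0].message.content)
```

In the study itself, the prompt was refined over several iterations of this kind until the output approached the macro-themes of the original review.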
Results: The analysis showed that ChatGPT's results are comparable to human results, although four iterations of the prompt were required to approach the human benchmark.
Discussion: Although ChatGPT showed comparable capabilities in text review, human authors exhibited greater analytical depth in interpretation. Due to their greater creative freedom, the authors offered more details about the benefits of digital tools in the hospital setting. ChatGPT, however, enriched the analysis by including elements not contemplated initially. The final comparison revealed comparable macro-themes between the two approaches, emphasizing the need for careful human validation to ensure the full integrity and depth of the analysis.
Conclusions: Generative artificial intelligence (AI), represented by ChatGPT, showed significant potential to revolutionize the production of scientific literature by supporting healthcare professionals. Although there are challenges that require careful evaluation, ChatGPT's results are comparable to human results. The key element is not so much the superiority of AI over humans as the human ability to configure and direct the AI to obtain results that are optimal or even potentially superior to those of humans.
Journal description:
Now in its sixtieth year, Recenti Progressi in Medicina continues to be a reliable point of reference and an essential working tool for broadening the cultural horizon of the Italian physician. Recenti Progressi in Medicina is a journal of internal medicine, which means recovering a global and integrated perspective, one suited to avoiding both the particularism of specialist information and the fragmentation of generalist information.