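The Role of Artificial Intelligence Large Language Models in Literature Search Assistance to Evaluate Inguinal Hernia Repair Approaches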

IF 1.1 | Zone 4, Medicine | JCR Q3, SURGERY
Joao P G Kasakewitch, Diego L Lima, Carlos A Balthazar da Silveira, Valberto Sanha, Ana Caroline Rasador, Leandro Totti Cavazzola, Julio Mayol, Flavio Malcher
Journal: Journal of Laparoendoscopic & Advanced Surgical Techniques, pp. 437-444
DOI: 10.1089/lap.2024.0277 (https://doi.org/10.1089/lap.2024.0277)
Publication date: 2025-06-01 (Epub 2025-04-26)
Publication type: Journal Article
Citations: 0

Abstract

The Role of Artificial Intelligence Large Language Models in Literature Search Assistance to Evaluate Inguinal Hernia Repair Approaches.

Aim: This study assesses the reliability of artificial intelligence (AI) large language models (LLMs) in identifying relevant literature comparing inguinal hernia repair techniques.
Material and Methods: We used LLM chatbots (Bing Chat AI, ChatGPT versions 3.5 and 4.0, and Gemini) to find comparative studies and randomized controlled trials on inguinal hernia repair techniques. The results were then compared with existing systematic reviews (SRs) and meta-analyses, and the listed articles were checked for authenticity.
Results: The LLMs returned 22 studies published from 2006 to 2023 across eight journals, while the SRs encompassed a total of 42 studies. External validation confirmed 63.6% of the studies (14 out of 22) as authentic, including 10 identified through ChatGPT 4.0 and 6 via Bing AI (with an overlap of 2 studies between them). Conversely, 36.3% (8 out of 22) were fabrications generated by Google Gemini (Bard), with two (25.0%) of these fabrications mistakenly linked to valid DOIs. Four (25.6%) of the 14 real studies were acknowledged in the SRs, which represents 18.1% of all LLM-generated studies. The LLMs missed a total of 38 (90.5%) of the studies included in the previous SRs, while 10 real studies were found by the LLMs but were not included in the previous SRs. Among those 10 studies, 6 were reviews and 1 was published after the SRs, leaving a total of three comparative studies missed by the reviews.
Conclusions: This study reveals the mixed reliability of AI language models in scientific searches. It emphasizes the need for cautious application of AI in academia and the importance of continuous evaluation of AI tools in scientific investigations.
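
The proportions in the Results can be recomputed from the counts the abstract reports. The following is a minimal sketch for checking that arithmetic, not part of the authors' analysis; all variable names are illustrative. When recomputed and rounded to one decimal, 4 of 14 gives 28.6% where the abstract prints 25.6%, and 8 of 22 and 4 of 22 give 36.4% and 18.2% where the abstract prints 36.3% and 18.1% (the printed values match truncation rather than rounding).

```python
# Counts taken from the abstract; names are hypothetical, chosen for this sketch.
llm_total = 22        # studies returned by the LLMs
chatgpt4_real = 10    # authentic studies found via ChatGPT 4.0
bing_real = 6         # authentic studies found via Bing AI
overlap = 2           # authentic studies found by both
fabricated = 8        # fabricated studies (Gemini/Bard)
fabricated_valid_doi = 2
sr_total = 42         # studies included in the systematic reviews
real_in_srs = 4       # authentic LLM studies that also appear in the SRs
reviews = 6           # of the real studies outside the SRs, how many were reviews
published_after = 1   # ... and how many were published after the SRs


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


real_total = chatgpt4_real + bing_real - overlap      # 14
real_not_in_srs = real_total - real_in_srs            # 10
sr_missed = sr_total - real_in_srs                    # 38
comparative_missed_by_srs = real_not_in_srs - reviews - published_after  # 3

summary = {
    "authentic / LLM total": pct(real_total, llm_total),
    "fabricated / LLM total": pct(fabricated, llm_total),
    "fabricated with valid DOI": pct(fabricated_valid_doi, fabricated),
    "real in SRs / real": pct(real_in_srs, real_total),
    "real in SRs / LLM total": pct(real_in_srs, llm_total),
    "SR studies missed": pct(sr_missed, sr_total),
}

if __name__ == "__main__":
    for label, value in summary.items():
        print(f"{label}: {value}%")
```

The counts themselves are internally consistent: 10 + 6 - 2 = 14 authentic studies, 14 + 8 = 22 in total, and 42 - 4 = 38 SR studies missed.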

Source journal
CiteScore: 2.90
Self-citation rate: 0.00%
Articles published: 163
Review time: 3 months
About the journal: Journal of Laparoendoscopic & Advanced Surgical Techniques (JLAST) is the leading international peer-reviewed journal for practicing surgeons who want to keep up with the latest thinking and advanced surgical technologies in laparoscopy, endoscopy, NOTES, and robotics. The Journal is ideally suited to surgeons who are early adopters of new technology and techniques. Recognizing that many new technologies and techniques have significant overlap with several surgical specialties, JLAST is the first journal to focus on these topics both in general and pediatric surgery, and includes other surgical subspecialties such as urology, gynecologic surgery, thoracic surgery, and more.