Joao P G Kasakewitch, Diego L Lima, Carlos A Balthazar da Silveira, Valberto Sanha, Ana Caroline Rasador, Leandro Totti Cavazzola, Julio Mayol, Flavio Malcher
Journal of Laparoendoscopic & Advanced Surgical Techniques, pp. 437-444. Journal Article. Published 2025-06-01 (Epub 2025-04-26). DOI: 10.1089/lap.2024.0277 (https://doi.org/10.1089/lap.2024.0277). Impact factor: 1.1; JCR: Q3 (Surgery).
The Role of Artificial Intelligence Large Language Models in Literature Search Assistance to Evaluate Inguinal Hernia Repair Approaches.
Aim: This study assesses the reliability of artificial intelligence (AI) large language models (LLMs) in identifying relevant literature comparing inguinal hernia repair techniques.

Material and Methods: We used LLM chatbots (Bing Chat AI, ChatGPT versions 3.5 and 4.0, and Gemini) to find comparative studies and randomized controlled trials on inguinal hernia repair techniques. The results were then compared with existing systematic reviews (SRs) and meta-analyses, and the listed articles were checked for authenticity.

Results: The LLMs returned 22 studies from 2006 to 2023 across eight journals, while the SRs encompassed a total of 42 studies. External validation confirmed 63.6% of the studies (14 out of 22) as authentic: 10 identified through ChatGPT 4.0 and 6 via Bing AI, with an overlap of 2 studies between them. Conversely, 36.4% (8 out of 22) were fabrications produced by Google Gemini (Bard), and two (25.0%) of these fabrications were mistakenly linked to valid DOIs. Four (28.6%) of the 14 real studies were included in the SRs, representing 18.2% of all LLM-generated studies. The LLMs missed 38 (90.5%) of the 42 studies included in the previous SRs, while 10 real studies found by the LLMs were not included in the previous SRs. Among those 10 studies, 6 were reviews and 1 was published after the SRs, leaving three comparative studies missed by the reviews.

Conclusions: This study reveals the mixed reliability of AI language models in scientific searches. It emphasizes the need for cautious application of AI in academia and the importance of continuously evaluating AI tools in scientific investigations.
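The counts reported in the Results can be cross-checked with a few lines of arithmetic. The sketch below is not part of the paper: the variable names and the `pct` helper are illustrative, and rounding to one decimal place is an assumption, so the last digit may differ from figures that were truncated rather than rounded.

```python
# Counts taken from the Results section of the abstract.
# All names here are illustrative, not from the paper.
llm_total = 22             # studies listed by the LLMs
sr_total = 42              # studies included in the systematic reviews (SRs)
chatgpt4_real = 10         # authentic studies from ChatGPT 4.0
bing_real = 6              # authentic studies from Bing AI
overlap = 2                # authentic studies returned by both
fabricated = 8             # fabricated studies
fabricated_valid_doi = 2   # fabrications attached to a valid DOI
real_in_srs = 4            # authentic studies also present in the SRs
reviews = 6                # authentic non-SR studies that were reviews
published_after_srs = 1    # authentic non-SR studies newer than the SRs


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to one decimal (assumed convention)."""
    return round(100 * part / whole, 1)


# Inclusion-exclusion: unique authentic studies across the two chatbots.
real_total = chatgpt4_real + bing_real - overlap

# Authentic plus fabricated should account for every LLM-listed study.
assert real_total + fabricated == llm_total

missed_by_llms = sr_total - real_in_srs        # SR studies the LLMs did not find
real_not_in_srs = real_total - real_in_srs     # authentic studies absent from the SRs
missed_by_reviews = real_not_in_srs - reviews - published_after_srs

summary = {
    "authentic / all LLM studies (%)": pct(real_total, llm_total),
    "fabricated / all LLM studies (%)": pct(fabricated, llm_total),
    "fabrications with a valid DOI (%)": pct(fabricated_valid_doi, fabricated),
    "authentic studies also in SRs (%)": pct(real_in_srs, real_total),
    "SR-matched / all LLM studies (%)": pct(real_in_srs, llm_total),
    "SR studies missed by LLMs (%)": pct(missed_by_llms, sr_total),
    "comparative studies missed by SRs (n)": missed_by_reviews,
}

for label, value in summary.items():
    print(f"{label}: {value}")
```

Deriving every percentage from the raw counts, rather than copying the percentages themselves, makes it easy to spot a figure that does not match its stated numerator and denominator.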
About the journal:
Journal of Laparoendoscopic & Advanced Surgical Techniques (JLAST) is the leading international peer-reviewed journal for practicing surgeons who want to keep up with the latest thinking and advanced surgical technologies in laparoscopy, endoscopy, NOTES, and robotics. The Journal is ideally suited to surgeons who are early adopters of new technology and techniques. Recognizing that many new technologies and techniques overlap significantly with several surgical specialties, JLAST is the first journal to focus on these topics in both general and pediatric surgery, and it includes other surgical subspecialties such as urology, gynecologic surgery, and thoracic surgery.