T V Nechay, A V Sazhin, K M Loban, A K Bogomolova, V V Suglob, T R Beniia
{"title":"[基于人工智能的大语言模型在疝气学决策支持中的有效性和安全性:专家和普通外科医生的评估]。","authors":"T V Nechay, A V Sazhin, K M Loban, A K Bogomolova, V V Suglob, T R Beniia","doi":"10.17116/hirurgia20240816","DOIUrl":null,"url":null,"abstract":"<p><strong>Objective: </strong>To evaluate the quality of recommendations provided by ChatGPT regarding inguinal hernia repair.</p><p><strong>Material and methods: </strong>ChatGPT was asked 5 questions about surgical management of inguinal hernias. The chat-bot was assigned the role of expert in herniology and requested to search only specialized medical databases and provide information about references and evidence. Herniology experts and surgeons (non-experts) rated the quality of recommendations generated by ChatGPT using 4-point scale (from 0 to 3 points). Statistical correlations were explored between participants' ratings and their stance regarding artificial intelligence.</p><p><strong>Results: </strong>Experts scored the quality of ChatGPT responses lower than non-experts (2 (1-2) vs. 2 (2-3), <i>p</i><0.001). The chat-bot failed to provide valid references and actual evidence, as well as falsified half of references. Respondents were optimistic about the future of neural networks for clinical decision-making support. Most of them were against restricting their use in healthcare.</p><p><strong>Conclusion: </strong>We would not recommend non-specialized large language models as a single or primary source of information for clinical decision making or virtual searching assistant.</p>","PeriodicalId":35986,"journal":{"name":"Khirurgiya","volume":" 8","pages":"6-14"},"PeriodicalIF":0.0000,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"[Efficacy and safety of artificial intelligence-based large language models for decision making support in herniology: evaluation by experts and general surgeons].\",\"authors\":\"T V Nechay, A V Sazhin, K M Loban, A K Bogomolova, V V Suglob, T R Beniia\",\"doi\":\"10.17116/hirurgia20240816\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Objective: </strong>To evaluate the quality of recommendations provided by ChatGPT regarding inguinal hernia repair.</p><p><strong>Material and methods: </strong>ChatGPT was asked 5 questions about surgical management of inguinal hernias. The chat-bot was assigned the role of expert in herniology and requested to search only specialized medical databases and provide information about references and evidence. Herniology experts and surgeons (non-experts) rated the quality of recommendations generated by ChatGPT using 4-point scale (from 0 to 3 points). Statistical correlations were explored between participants' ratings and their stance regarding artificial intelligence.</p><p><strong>Results: </strong>Experts scored the quality of ChatGPT responses lower than non-experts (2 (1-2) vs. 2 (2-3), <i>p</i><0.001). The chat-bot failed to provide valid references and actual evidence, as well as falsified half of references. Respondents were optimistic about the future of neural networks for clinical decision-making support. 
Most of them were against restricting their use in healthcare.</p><p><strong>Conclusion: </strong>We would not recommend non-specialized large language models as a single or primary source of information for clinical decision making or virtual searching assistant.</p>\",\"PeriodicalId\":35986,\"journal\":{\"name\":\"Khirurgiya\",\"volume\":\" 8\",\"pages\":\"6-14\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Khirurgiya\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.17116/hirurgia20240816\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"Medicine\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Khirurgiya","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.17116/hirurgia20240816","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"Medicine","Score":null,"Total":0}
[Efficacy and safety of artificial intelligence-based large language models for decision making support in herniology: evaluation by experts and general surgeons].
Objective: To evaluate the quality of recommendations provided by ChatGPT regarding inguinal hernia repair.
Material and methods: ChatGPT was asked five questions about the surgical management of inguinal hernias. The chatbot was assigned the role of an expert in herniology and instructed to search only specialized medical databases and to provide references and supporting evidence. Herniology experts and general surgeons (non-experts) rated the quality of the recommendations generated by ChatGPT on a 4-point scale (0 to 3 points). Correlations between participants' ratings and their attitudes toward artificial intelligence were also examined.
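The abstract does not state how ChatGPT was accessed, so the following is only an illustrative sketch of how a role-assigned query of this kind could be issued programmatically via the OpenAI Python SDK. The model name, prompt wording, and example question are assumptions for illustration, not the study's actual protocol.

```python
# Illustrative sketch only: access method, model, and prompt text are assumptions,
# not the protocol reported in the study.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SYSTEM_PROMPT = (
    "You are an expert in herniology. Search only specialized medical databases "
    "and support every recommendation with references and evidence."
)

# Hypothetical example question; the study's five questions are not listed in the abstract.
question = "Which repair technique is recommended for a primary unilateral inguinal hernia?"

response = client.chat.completions.create(
    model="gpt-4",  # assumed model version
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": question},
    ],
)
print(response.choices[0].message.content)
```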
Results: Experts rated the quality of ChatGPT responses lower than non-experts (2 (1-2) vs. 2 (2-3), p<0.001). The chatbot failed to provide valid references or actual evidence and fabricated half of the references it cited. Respondents were optimistic about the future of neural networks for clinical decision-making support, and most were against restricting their use in healthcare.
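Ratings reported as a central value with a range in parentheses on an ordinal 0-3 scale are typically compared with a nonparametric test. The abstract does not name the test used, so the sketch below, using a Mann-Whitney U test on hypothetical placeholder scores, only illustrates how such a group comparison could be run; it does not reproduce the study's data.

```python
# Minimal sketch, assuming a Mann-Whitney U comparison of ordinal ratings.
# The score lists are hypothetical placeholders, not the study's data.
from scipy.stats import mannwhitneyu

expert_ratings = [1, 2, 2, 1, 2, 2, 1, 2]        # hypothetical 0-3 scores
non_expert_ratings = [2, 3, 2, 2, 3, 2, 3, 2]    # hypothetical 0-3 scores

stat, p_value = mannwhitneyu(expert_ratings, non_expert_ratings, alternative="two-sided")
print(f"U = {stat}, p = {p_value:.4f}")
```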
Conclusion: We do not recommend non-specialized large language models as a sole or primary source of information for clinical decision-making or as a virtual search assistant.