Xuefeng Li, Ziman Chen, Ting Xie, Jieyi Liang, Fei Chen
{"title":"使用大型语言模型增强超声检查性能:ChatGPT-4和Claude 3的分析研究。","authors":"Xuefeng Li, Ziman Chen, Ting Xie, Jieyi Liang, Fei Chen","doi":"10.11152/mu-4505","DOIUrl":null,"url":null,"abstract":"<p><strong>Aim: </strong>To evaluate the effectiveness of two large language models, ChatGPT-4 and Claude 3, in improving the accuracy of question responses by senior sonologist and junior sonologist.</p><p><strong>Material and methods: </strong>A senior and a junior sonologist were given a practice exam. After answering the questions, they reviewed the responses and explanations provided by ChatGPT-4 and Claude 3. The accuracy and scores before and after incorporating the models' input were analyzed to compare their effectiveness.</p><p><strong>Results: </strong>No statistically significant differences were found between the two models' responses scores for each section (all p>0.05). For junior sonologist, both ChatGPT-4 (p=0.039) and Claude 3 (p=0.039) significantly improved scores in basic knowledge. The responses provided by ChatGPT-4 also significantly improved scores in relevant professional knowledge (p=0.038), though their explanations did not (p=0.077). For all exam sections, both models' responses and explanations significantly improved scores (all p<0.05). For senior sonologist, both ChatGPT-4's responses (p=0.022) and explanations (p=0.034) improved scores in basic knowledge, as did Claude 3's explanations (p=0.003). Across all sections, Claude 3's explanations significantly improved scores (p=0.041).</p><p><strong>Conclusion: </strong>ChatGPT-4 and Claude 3 significantly improved sonologist' examination performance, particularly in basic knowledge.</p>","PeriodicalId":94138,"journal":{"name":"Medical ultrasonography","volume":" ","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2025-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Enhancing sonologist examination performance with large language models: an analytical study of ChatGPT-4 and Claude 3.\",\"authors\":\"Xuefeng Li, Ziman Chen, Ting Xie, Jieyi Liang, Fei Chen\",\"doi\":\"10.11152/mu-4505\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Aim: </strong>To evaluate the effectiveness of two large language models, ChatGPT-4 and Claude 3, in improving the accuracy of question responses by senior sonologist and junior sonologist.</p><p><strong>Material and methods: </strong>A senior and a junior sonologist were given a practice exam. After answering the questions, they reviewed the responses and explanations provided by ChatGPT-4 and Claude 3. The accuracy and scores before and after incorporating the models' input were analyzed to compare their effectiveness.</p><p><strong>Results: </strong>No statistically significant differences were found between the two models' responses scores for each section (all p>0.05). For junior sonologist, both ChatGPT-4 (p=0.039) and Claude 3 (p=0.039) significantly improved scores in basic knowledge. The responses provided by ChatGPT-4 also significantly improved scores in relevant professional knowledge (p=0.038), though their explanations did not (p=0.077). For all exam sections, both models' responses and explanations significantly improved scores (all p<0.05). For senior sonologist, both ChatGPT-4's responses (p=0.022) and explanations (p=0.034) improved scores in basic knowledge, as did Claude 3's explanations (p=0.003). 
Across all sections, Claude 3's explanations significantly improved scores (p=0.041).</p><p><strong>Conclusion: </strong>ChatGPT-4 and Claude 3 significantly improved sonologist' examination performance, particularly in basic knowledge.</p>\",\"PeriodicalId\":94138,\"journal\":{\"name\":\"Medical ultrasonography\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2025-04-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Medical ultrasonography\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.11152/mu-4505\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Medical ultrasonography","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.11152/mu-4505","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Enhancing sonologist examination performance with large language models: an analytical study of ChatGPT-4 and Claude 3.
Aim: To evaluate the effectiveness of two large language models, ChatGPT-4 and Claude 3, in improving the accuracy of question responses by a senior and a junior sonologist.
Material and methods: A senior and a junior sonologist took a practice examination. After answering the questions, they reviewed the responses and explanations provided by ChatGPT-4 and Claude 3. Response accuracy and scores before and after incorporating the models' input were analyzed to compare the models' effectiveness.
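The abstract does not name the statistical procedure behind the reported p-values. As a purely illustrative sketch, a before/after comparison of paired scores could be run with a Wilcoxon signed-rank test; the Python snippet below uses invented placeholder values, and none of the variable names or data come from the study.

```python
# Hypothetical sketch of a paired before/after score comparison.
# The study's actual test and data are not given in the abstract;
# the scores below are invented placeholders.
from scipy.stats import wilcoxon

# Per-section scores for one examinee, before and after
# reviewing an LLM's responses and explanations (hypothetical).
scores_before = [62, 70, 58, 66, 74]
scores_after = [71, 78, 65, 70, 80]

# Wilcoxon signed-rank test on the paired differences.
stat, p_value = wilcoxon(scores_before, scores_after)
print(f"Wilcoxon signed-rank: statistic={stat}, p={p_value:.3f}")
if p_value < 0.05:
    print("Score change is statistically significant at alpha = 0.05.")
```

A paired test is the natural fit for this design because the same examinee answers the same sections before and after seeing the model output, so each pair of scores shares a common baseline.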
Results: No statistically significant differences were found between the two models' response scores for any section (all p>0.05). For the junior sonologist, both ChatGPT-4 (p=0.039) and Claude 3 (p=0.039) significantly improved scores in basic knowledge. ChatGPT-4's responses also significantly improved scores in relevant professional knowledge (p=0.038), though its explanations did not (p=0.077). Across all exam sections, both models' responses and explanations significantly improved the junior sonologist's scores (all p<0.05). For the senior sonologist, both ChatGPT-4's responses (p=0.022) and explanations (p=0.034) improved scores in basic knowledge, as did Claude 3's explanations (p=0.003). Across all sections, Claude 3's explanations also significantly improved the senior sonologist's scores (p=0.041).
Conclusion: ChatGPT-4 and Claude 3 significantly improved the sonologists' examination performance, particularly in basic knowledge.