Comparative analysis of language models in addressing syphilis-related queries.

IF 1.8 | CAS Zone 3 (Medicine) | Q2 DENTISTRY, ORAL SURGERY & MEDICINE
L-M Ferreira, J-P Nascimento, L-L Souza, F-T Souza, L-D Guimarães, M-A Lopes, P-A Vargas, H Martelli-Júnior
{"title":"梅毒相关问询语言模型的比较分析。","authors":"L-M Ferreira, J-P Nascimento, L-L Souza, F-T Souza, L-D Guimarães, M-A Lopes, P-A Vargas, H Martelli-Júnior","doi":"10.4317/medoral.27092","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Syphilis, caused by Treponema pallidum, is a significant global health concern with potentially severe complications if untreated. Advances in artificial intelligence (AI), particularly large language models (LLMs), offer opportunities to enhance medical diagnosis and public health education. This study aims to assess LLMs' ability to provide readable, accurate, and comprehensive syphilis information by comparing it with WHO datasheets and validating through specialist evaluation for clinical relevance.</p><p><strong>Material and methods: </strong>Ten AI-based LLMs were evaluated. Ten questions addressing symptoms, transmission, diagnosis, treatment, and prevention were crafted by researchers. Responses from the LLMs were compared to World Health Organization (WHO) syphilis fact sheets, and a panel of specialists assessed the accuracy, clinical relevance, and readability of the AI-generated information.</p><p><strong>Results: </strong>Among the evaluated LLMs, ChatGPT 4.0 and Claude demonstrated the highest accuracy, scoring 92% and 89% alignment with WHO standards, respectively. Perplexity and Llama3 performed less reliably, with scores between 60-70%, especially in areas like tertiary syphilis and neurosyphilis. Specialists identified common errors, such as outdated treatment protocols and incorrect descriptions of transmission pathways. Expert reviews further revealed that while LLMs provided adequate information on early syphilis symptoms, they struggled with complex clinical nuances. The specialists' evaluation showed that only 60% of the AI-generated content was deemed clinically reliable without further edits, with ChatGPT 4.0 rated highest by experts in terms of readability and clinical accuracy.</p><p><strong>Conclusions: </strong>LLMs hold promise for disseminating syphilis information, but human oversight is crucial. AI models need refinement to improve their accuracy, especially in complex medical scenarios.</p>","PeriodicalId":49016,"journal":{"name":"Medicina Oral Patologia Oral Y Cirugia Bucal","volume":" ","pages":""},"PeriodicalIF":1.8000,"publicationDate":"2025-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Comparative analysis of language models in addressing syphilis-related queries.\",\"authors\":\"L-M Ferreira, J-P Nascimento, L-L Souza, F-T Souza, L-D Guimarães, M-A Lopes, P-A Vargas, H Martelli-Júnior\",\"doi\":\"10.4317/medoral.27092\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background: </strong>Syphilis, caused by Treponema pallidum, is a significant global health concern with potentially severe complications if untreated. Advances in artificial intelligence (AI), particularly large language models (LLMs), offer opportunities to enhance medical diagnosis and public health education. This study aims to assess LLMs' ability to provide readable, accurate, and comprehensive syphilis information by comparing it with WHO datasheets and validating through specialist evaluation for clinical relevance.</p><p><strong>Material and methods: </strong>Ten AI-based LLMs were evaluated. Ten questions addressing symptoms, transmission, diagnosis, treatment, and prevention were crafted by researchers. 
Responses from the LLMs were compared to World Health Organization (WHO) syphilis fact sheets, and a panel of specialists assessed the accuracy, clinical relevance, and readability of the AI-generated information.</p><p><strong>Results: </strong>Among the evaluated LLMs, ChatGPT 4.0 and Claude demonstrated the highest accuracy, scoring 92% and 89% alignment with WHO standards, respectively. Perplexity and Llama3 performed less reliably, with scores between 60-70%, especially in areas like tertiary syphilis and neurosyphilis. Specialists identified common errors, such as outdated treatment protocols and incorrect descriptions of transmission pathways. Expert reviews further revealed that while LLMs provided adequate information on early syphilis symptoms, they struggled with complex clinical nuances. The specialists' evaluation showed that only 60% of the AI-generated content was deemed clinically reliable without further edits, with ChatGPT 4.0 rated highest by experts in terms of readability and clinical accuracy.</p><p><strong>Conclusions: </strong>LLMs hold promise for disseminating syphilis information, but human oversight is crucial. AI models need refinement to improve their accuracy, especially in complex medical scenarios.</p>\",\"PeriodicalId\":49016,\"journal\":{\"name\":\"Medicina Oral Patologia Oral Y Cirugia Bucal\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":1.8000,\"publicationDate\":\"2025-05-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Medicina Oral Patologia Oral Y Cirugia Bucal\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.4317/medoral.27092\",\"RegionNum\":3,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"DENTISTRY, ORAL SURGERY & MEDICINE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Medicina Oral Patologia Oral Y Cirugia Bucal","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.4317/medoral.27092","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"DENTISTRY, ORAL SURGERY & MEDICINE","Score":null,"Total":0}
Citations: 0

Abstract


Background: Syphilis, caused by Treponema pallidum, is a significant global health concern with potentially severe complications if untreated. Advances in artificial intelligence (AI), particularly large language models (LLMs), offer opportunities to enhance medical diagnosis and public health education. This study aims to assess LLMs' ability to provide readable, accurate, and comprehensive syphilis information by comparing their responses with WHO fact sheets and validating them through specialist evaluation of clinical relevance.

Material and methods: Ten AI-based LLMs were evaluated. Ten questions addressing symptoms, transmission, diagnosis, treatment, and prevention were crafted by researchers. Responses from the LLMs were compared to World Health Organization (WHO) syphilis fact sheets, and a panel of specialists assessed the accuracy, clinical relevance, and readability of the AI-generated information.
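For readers curious how such a comparison could be operationalized, the following is a minimal sketch of the query-collection step, assuming an OpenAI-compatible chat API. The model identifiers and question texts are illustrative stand-ins, not the authors' actual protocol or question set; the abstract does not specify how the ten models were queried.

```python
"""Hedged sketch: send a fixed question set to several chat models and
collect the answers for later expert scoring. Models, client, and
questions are assumptions for illustration only."""
import json
from openai import OpenAI  # assumes an OpenAI-compatible endpoint

client = OpenAI()

# Illustrative stand-ins for the ten researcher-crafted questions.
QUESTIONS = [
    "What are the symptoms of primary syphilis?",
    "How is syphilis transmitted?",
    "How is syphilis diagnosed?",
    "What is the recommended treatment for syphilis?",
    "How can syphilis be prevented?",
]

MODELS = ["gpt-4o", "gpt-4o-mini"]  # hypothetical model identifiers


def collect_responses() -> dict:
    """Query every model with every question; return {model: {question: answer}}."""
    results = {}
    for model in MODELS:
        results[model] = {}
        for question in QUESTIONS:
            reply = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": question}],
            )
            results[model][question] = reply.choices[0].message.content
    return results


if __name__ == "__main__":
    print(json.dumps(collect_responses(), indent=2, ensure_ascii=False))
```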

Results: Among the evaluated LLMs, ChatGPT 4.0 and Claude demonstrated the highest accuracy, scoring 92% and 89% alignment with WHO standards, respectively. Perplexity and Llama3 performed less reliably, with scores between 60% and 70%, especially in areas such as tertiary syphilis and neurosyphilis. Specialists identified common errors, such as outdated treatment protocols and incorrect descriptions of transmission pathways. Expert reviews further revealed that while LLMs provided adequate information on early syphilis symptoms, they struggled with complex clinical nuances. The specialists' evaluation showed that only 60% of the AI-generated content was deemed clinically reliable without further edits, with ChatGPT 4.0 rated highest by experts in terms of readability and clinical accuracy.
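As a point of clarification, an "alignment" percentage of this kind can be read as the share of answers the specialists judged consistent with the WHO fact sheets. The sketch below illustrates that arithmetic with invented binary ratings; the study's actual rating rubric is not described in the abstract.

```python
# Hedged sketch of deriving an alignment score from specialist ratings:
# each answer is marked aligned (1) or not (0) against the WHO fact
# sheets, and the model's score is the share of aligned answers.

def alignment_score(ratings: list[int]) -> float:
    """Percentage of answers judged aligned with the WHO fact sheets."""
    return 100.0 * sum(ratings) / len(ratings)

# Hypothetical per-question ratings for one model (1 = aligned).
example_ratings = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1]
print(f"alignment: {alignment_score(example_ratings):.0f}%")  # -> 80%
```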

Conclusions: LLMs hold promise for disseminating syphilis information, but human oversight is crucial. AI models need refinement to improve their accuracy, especially in complex medical scenarios.

Source journal
Medicina Oral Patologia Oral Y Cirugia Bucal (DENTISTRY, ORAL SURGERY & MEDICINE)
CiteScore: 4.60
Self-citation rate: 0.00%
Articles per year: 52
Review time: 3-8 weeks
Journal scope:
1. Oral Medicine and Pathology: Clinicopathological as well as medical or surgical management aspects of diseases affecting the oral mucosa, salivary glands, and maxillary bones, as well as orofacial neurological disorders and systemic conditions with an impact on the oral cavity.
2. Oral Surgery: Surgical management aspects of diseases affecting the oral mucosa, salivary glands, maxillary bones, teeth, and implants; oral surgical procedures; and surgical management of diseases affecting head and neck areas.
3. Medically compromised patients in Dentistry: Articles discussing medical problems in Odontology will also be included, with a special focus on the clinico-odontological management of medically compromised patients and considerations regarding high-risk or disabled patients.
4. Implantology
5. Periodontology