{"title":"梅毒相关问询语言模型的比较分析。","authors":"L-M Ferreira, J-P Nascimento, L-L Souza, F-T Souza, L-D Guimarães, M-A Lopes, P-A Vargas, H Martelli-Júnior","doi":"10.4317/medoral.27092","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Syphilis, caused by Treponema pallidum, is a significant global health concern with potentially severe complications if untreated. Advances in artificial intelligence (AI), particularly large language models (LLMs), offer opportunities to enhance medical diagnosis and public health education. This study aims to assess LLMs' ability to provide readable, accurate, and comprehensive syphilis information by comparing it with WHO datasheets and validating through specialist evaluation for clinical relevance.</p><p><strong>Material and methods: </strong>Ten AI-based LLMs were evaluated. Ten questions addressing symptoms, transmission, diagnosis, treatment, and prevention were crafted by researchers. Responses from the LLMs were compared to World Health Organization (WHO) syphilis fact sheets, and a panel of specialists assessed the accuracy, clinical relevance, and readability of the AI-generated information.</p><p><strong>Results: </strong>Among the evaluated LLMs, ChatGPT 4.0 and Claude demonstrated the highest accuracy, scoring 92% and 89% alignment with WHO standards, respectively. Perplexity and Llama3 performed less reliably, with scores between 60-70%, especially in areas like tertiary syphilis and neurosyphilis. Specialists identified common errors, such as outdated treatment protocols and incorrect descriptions of transmission pathways. Expert reviews further revealed that while LLMs provided adequate information on early syphilis symptoms, they struggled with complex clinical nuances. 
The specialists' evaluation showed that only 60% of the AI-generated content was deemed clinically reliable without further edits, with ChatGPT 4.0 rated highest by experts in terms of readability and clinical accuracy.</p><p><strong>Conclusions: </strong>LLMs hold promise for disseminating syphilis information, but human oversight is crucial. AI models need refinement to improve their accuracy, especially in complex medical scenarios.</p>","PeriodicalId":49016,"journal":{"name":"Medicina Oral Patologia Oral Y Cirugia Bucal","volume":" ","pages":""},"PeriodicalIF":1.8000,"publicationDate":"2025-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Comparative analysis of language models in addressing syphilis-related queries.\",\"authors\":\"L-M Ferreira, J-P Nascimento, L-L Souza, F-T Souza, L-D Guimarães, M-A Lopes, P-A Vargas, H Martelli-Júnior\",\"doi\":\"10.4317/medoral.27092\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background: </strong>Syphilis, caused by Treponema pallidum, is a significant global health concern with potentially severe complications if untreated. Advances in artificial intelligence (AI), particularly large language models (LLMs), offer opportunities to enhance medical diagnosis and public health education. This study aims to assess LLMs' ability to provide readable, accurate, and comprehensive syphilis information by comparing it with WHO datasheets and validating through specialist evaluation for clinical relevance.</p><p><strong>Material and methods: </strong>Ten AI-based LLMs were evaluated. Ten questions addressing symptoms, transmission, diagnosis, treatment, and prevention were crafted by researchers. 
Responses from the LLMs were compared to World Health Organization (WHO) syphilis fact sheets, and a panel of specialists assessed the accuracy, clinical relevance, and readability of the AI-generated information.</p><p><strong>Results: </strong>Among the evaluated LLMs, ChatGPT 4.0 and Claude demonstrated the highest accuracy, scoring 92% and 89% alignment with WHO standards, respectively. Perplexity and Llama3 performed less reliably, with scores between 60-70%, especially in areas like tertiary syphilis and neurosyphilis. Specialists identified common errors, such as outdated treatment protocols and incorrect descriptions of transmission pathways. Expert reviews further revealed that while LLMs provided adequate information on early syphilis symptoms, they struggled with complex clinical nuances. The specialists' evaluation showed that only 60% of the AI-generated content was deemed clinically reliable without further edits, with ChatGPT 4.0 rated highest by experts in terms of readability and clinical accuracy.</p><p><strong>Conclusions: </strong>LLMs hold promise for disseminating syphilis information, but human oversight is crucial. 
AI models need refinement to improve their accuracy, especially in complex medical scenarios.</p>\",\"PeriodicalId\":49016,\"journal\":{\"name\":\"Medicina Oral Patologia Oral Y Cirugia Bucal\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":1.8000,\"publicationDate\":\"2025-05-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Medicina Oral Patologia Oral Y Cirugia Bucal\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.4317/medoral.27092\",\"RegionNum\":3,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"DENTISTRY, ORAL SURGERY & MEDICINE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Medicina Oral Patologia Oral Y Cirugia Bucal","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.4317/medoral.27092","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"DENTISTRY, ORAL SURGERY & MEDICINE","Score":null,"Total":0}
Comparative analysis of language models in addressing syphilis-related queries.
Background: Syphilis, caused by Treponema pallidum, is a significant global health concern with potentially severe complications if untreated. Advances in artificial intelligence (AI), particularly large language models (LLMs), offer opportunities to enhance medical diagnosis and public health education. This study aims to assess the ability of LLMs to provide readable, accurate, and comprehensive syphilis information by comparing their responses with WHO fact sheets and validating them through specialist evaluation of clinical relevance.
Material and methods: Ten AI-based LLMs were evaluated. Ten questions addressing symptoms, transmission, diagnosis, treatment, and prevention were crafted by researchers. Responses from the LLMs were compared to World Health Organization (WHO) syphilis fact sheets, and a panel of specialists assessed the accuracy, clinical relevance, and readability of the AI-generated information.
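The abstract does not give the exact formula behind the per-model accuracy percentages reported below. As a minimal sketch, assuming each of the ten answers was judged by the specialist panel as either aligned or not aligned with the WHO fact sheets, the score would be the share of aligned answers (the function name and binary rating scheme are assumptions, not from the study):

```python
def alignment_score(ratings):
    """Hypothetical aggregation: percent of answers judged aligned
    with the WHO fact sheets (ratings are per-question booleans)."""
    if not ratings:
        raise ValueError("no ratings provided")
    return 100.0 * sum(ratings) / len(ratings)

# Example: a model judged aligned on 9 of 10 questions scores 90%.
example = [True] * 9 + [False]
print(f"{alignment_score(example):.0f}%")  # → 90%
```

In practice the study's percentages (e.g. 92% for ChatGPT 4.0) suggest finer-grained scoring than one point per question, such as partial credit on sub-criteria; the sketch only illustrates the proportion-of-agreement idea.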
Results: Among the evaluated LLMs, ChatGPT 4.0 and Claude demonstrated the highest accuracy, scoring 92% and 89% alignment with WHO standards, respectively. Perplexity and Llama3 performed less reliably, with scores between 60% and 70%, especially in areas such as tertiary syphilis and neurosyphilis. Specialists identified common errors, such as outdated treatment protocols and incorrect descriptions of transmission pathways. Expert reviews further revealed that while LLMs provided adequate information on early syphilis symptoms, they struggled with complex clinical nuances. The specialists' evaluation showed that only 60% of the AI-generated content was deemed clinically reliable without further edits, with ChatGPT 4.0 rated highest by experts in terms of readability and clinical accuracy.
Conclusions: LLMs hold promise for disseminating syphilis information, but human oversight is crucial. AI models need refinement to improve their accuracy, especially in complex medical scenarios.
Journal introduction:
1. Oral Medicine and Pathology:
Clinicopathological, medical, and surgical management aspects of diseases
affecting the oral mucosa, salivary glands, and maxillary bones, as well as
orofacial neurological disorders and systemic conditions with an impact on
the oral cavity.
2. Oral Surgery:
Surgical management of diseases affecting the oral mucosa, salivary glands,
maxillary bones, teeth, and implants; oral surgical procedures; and surgical
management of diseases affecting the head and neck.
3. Medically compromised patients in Dentistry:
Articles discussing medical problems in Odontology will also be included, with
a special focus on the clinico-odontological management of medically compromised patients, and considerations regarding high-risk or disabled patients.
4. Implantology
5. Periodontology