Paula Benito López, Daniela Adamo, Vito Carlo Alberto Caponio, José González-Serrano, Alan Roger Dos Santos Silva, Miguel de Pedro Herráez, Rui Albuquerque, María Pía López Jornet, Vlaho Brailo, Arwa Farag, Márcio Diniz Freitas, Noburo Noma, Richeal Ni Riordain, Gonzalo Hernández, Rosa María López-Pintor
{"title":"基于人工智能的大型语言模型能否帮助获取关于灼口综合征的信息?","authors":"Paula Benito López, Daniela Adamo, Vito Carlo Alberto Caponio, José González-Serrano, Alan Roger Dos Santos Silva, Miguel de Pedro Herráez, Rui Albuquerque, María Pía López Jornet, Vlaho Brailo, Arwa Farag, Márcio Diniz Freitas, Noburo Noma, Richeal Ni Riordain, Gonzalo Hernández, Rosa María López-Pintor","doi":"10.1111/odi.70078","DOIUrl":null,"url":null,"abstract":"<p><strong>Objective: </strong>Burning Mouth Syndrome (BMS) is an idiopathic chronic orofacial pain disorder with diagnostic and therapeutic challenges. Inexperienced clinicians may desperately resort to online information. The objective of this study was to evaluate the usefulness, quality, and readability of responses generated by three artificial intelligence large language models (AI-LLMs)-ChatGPT-4, Gemini, and Microsoft Copilot-to frequent questions about BMS.</p><p><strong>Materials and methods: </strong>Nine clinically relevant open-ended questions were identified through search-trend analysis and expert review. Standardized prompts were submitted, and responses were independently rated by 12 international experts using a 4-point usefulness scale. Quality was evaluated using the QAMAI tool. Readability was measured using Flesch-Kincaid Grade Level and Reading Ease scores. Statistical analyses included Kruskal-Wallis and Bonferroni correction.</p><p><strong>Results: </strong>All AI-LLMs produced moderately useful responses, with no significant difference in global performance. Gemini achieved highest overall quality scores, particularly in relevance, completeness, and source provision. Copilot scored lower in usefulness and source provision. No significant differences were obtained among AI-LLMs. Average readability corresponded to 12th grade, with ChatGPT requiring the highest proficiency.</p><p><strong>Conclusions: </strong>AI-LLMs show potential for generating reliable information on BMS, though variability in quality, readability, and source citation remains concerning. Continuous optimization is essential to ensure their clinical integration.</p>","PeriodicalId":19615,"journal":{"name":"Oral diseases","volume":" ","pages":""},"PeriodicalIF":2.9000,"publicationDate":"2025-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Can Large Artificial Intelligence-Based Linguistic Models Help to Obtain Information About Burning Mouth Syndrome?\",\"authors\":\"Paula Benito López, Daniela Adamo, Vito Carlo Alberto Caponio, José González-Serrano, Alan Roger Dos Santos Silva, Miguel de Pedro Herráez, Rui Albuquerque, María Pía López Jornet, Vlaho Brailo, Arwa Farag, Márcio Diniz Freitas, Noburo Noma, Richeal Ni Riordain, Gonzalo Hernández, Rosa María López-Pintor\",\"doi\":\"10.1111/odi.70078\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Objective: </strong>Burning Mouth Syndrome (BMS) is an idiopathic chronic orofacial pain disorder with diagnostic and therapeutic challenges. Inexperienced clinicians may desperately resort to online information. The objective of this study was to evaluate the usefulness, quality, and readability of responses generated by three artificial intelligence large language models (AI-LLMs)-ChatGPT-4, Gemini, and Microsoft Copilot-to frequent questions about BMS.</p><p><strong>Materials and methods: </strong>Nine clinically relevant open-ended questions were identified through search-trend analysis and expert review. 
Standardized prompts were submitted, and responses were independently rated by 12 international experts using a 4-point usefulness scale. Quality was evaluated using the QAMAI tool. Readability was measured using Flesch-Kincaid Grade Level and Reading Ease scores. Statistical analyses included Kruskal-Wallis and Bonferroni correction.</p><p><strong>Results: </strong>All AI-LLMs produced moderately useful responses, with no significant difference in global performance. Gemini achieved highest overall quality scores, particularly in relevance, completeness, and source provision. Copilot scored lower in usefulness and source provision. No significant differences were obtained among AI-LLMs. Average readability corresponded to 12th grade, with ChatGPT requiring the highest proficiency.</p><p><strong>Conclusions: </strong>AI-LLMs show potential for generating reliable information on BMS, though variability in quality, readability, and source citation remains concerning. Continuous optimization is essential to ensure their clinical integration.</p>\",\"PeriodicalId\":19615,\"journal\":{\"name\":\"Oral diseases\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":2.9000,\"publicationDate\":\"2025-08-31\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Oral diseases\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1111/odi.70078\",\"RegionNum\":3,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"DENTISTRY, ORAL SURGERY & MEDICINE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Oral diseases","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1111/odi.70078","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"DENTISTRY, ORAL SURGERY & MEDICINE","Score":null,"Total":0}
Can Large Artificial Intelligence-Based Linguistic Models Help to Obtain Information About Burning Mouth Syndrome?
Objective: Burning Mouth Syndrome (BMS) is an idiopathic chronic orofacial pain disorder that poses diagnostic and therapeutic challenges. Inexperienced clinicians may resort to online information for guidance. The objective of this study was to evaluate the usefulness, quality, and readability of responses generated by three artificial intelligence large language models (AI-LLMs), namely ChatGPT-4, Gemini, and Microsoft Copilot, to frequently asked questions about BMS.
Materials and methods: Nine clinically relevant open-ended questions were identified through search-trend analysis and expert review. Standardized prompts were submitted to each model, and responses were independently rated by 12 international experts using a 4-point usefulness scale. Quality was evaluated using the QAMAI tool. Readability was measured using Flesch-Kincaid Grade Level and Flesch Reading Ease scores. Statistical analyses included Kruskal-Wallis tests with Bonferroni correction.
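As a rough illustration of the analyses described above, the sketch below shows how Flesch-Kincaid readability scores and a Kruskal-Wallis test with Bonferroni-corrected pairwise comparisons could be computed in Python. This is not the authors' analysis code: the syllable-counting heuristic, the example rating values, and the use of Mann-Whitney U tests for the pairwise follow-up are assumptions made for illustration only.

```python
# Minimal sketch of the readability metrics and statistics described above.
# Score lists and the syllable heuristic are illustrative assumptions.
import re
from scipy.stats import kruskal, mannwhitneyu

def count_syllables(word: str) -> int:
    """Rough heuristic: count groups of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_scores(text: str) -> tuple[float, float]:
    """Return (Flesch-Kincaid Grade Level, Flesch Reading Ease)."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / sentences          # average words per sentence
    spw = syllables / len(words)          # average syllables per word
    grade = 0.39 * wps + 11.8 * spw - 15.59
    ease = 206.835 - 1.015 * wps - 84.6 * spw
    return grade, ease

# Hypothetical per-question usefulness ratings (1-4) for each model.
chatgpt = [3, 3, 4, 2, 3, 3, 4, 3, 2]
gemini  = [3, 4, 4, 3, 3, 4, 3, 3, 3]
copilot = [2, 3, 3, 2, 3, 2, 3, 3, 2]

# Omnibus Kruskal-Wallis test across the three AI-LLMs.
h_stat, p_global = kruskal(chatgpt, gemini, copilot)
print("Kruskal-Wallis H =", round(h_stat, 3), "p =", round(p_global, 4))

# Pairwise Mann-Whitney U tests with Bonferroni correction (3 comparisons).
pairs = {"ChatGPT-4 vs Gemini": (chatgpt, gemini),
         "ChatGPT-4 vs Copilot": (chatgpt, copilot),
         "Gemini vs Copilot": (gemini, copilot)}
for name, (a, b) in pairs.items():
    _, p = mannwhitneyu(a, b, alternative="two-sided")
    print(name, "Bonferroni-corrected p =", round(min(1.0, p * len(pairs)), 4))
```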
Results: All AI-LLMs produced moderately useful responses, with no significant difference in global performance. Gemini achieved the highest overall quality scores, particularly in relevance, completeness, and source provision, whereas Copilot scored lower in usefulness and source provision; these differences among AI-LLMs did not reach statistical significance. Average readability corresponded to a 12th-grade level, with ChatGPT-4 requiring the highest reading proficiency.
Conclusions: AI-LLMs show potential for generating reliable information on BMS, though variability in quality, readability, and source citation remains concerning. Continuous optimization is essential to ensure their clinical integration.
Journal introduction:
Oral Diseases is a multidisciplinary and international journal with a focus on head and neck disorders, edited by leaders in the field, Professor Giovanni Lodi (Editor-in-Chief, Milan, Italy), Professor Stefano Petti (Deputy Editor, Rome, Italy) and Associate Professor Gulshan Sunavala-Dossabhoy (Deputy Editor, Shreveport, LA, USA). The journal is pre-eminent in oral medicine. Oral Diseases specifically strives to link often-isolated areas of dentistry and medicine through broad-based scholarship that includes well-designed and controlled clinical research, analytical epidemiology, and the translation of basic science in pre-clinical studies. The journal typically publishes articles relevant to many related medical specialties including especially dermatology, gastroenterology, hematology, immunology, infectious diseases, neuropsychiatry, oncology and otolaryngology. The essential requirement is that all submitted research is hypothesis-driven, with significant positive and negative results both welcomed. Equal publication emphasis is placed on etiology, pathogenesis, diagnosis, prevention and treatment.