{"title":"Responses of Different Artificial Intelligence Systems to Questions Related with Short Stature as Assessed by Pediatric Endocrinologists.","authors":"Kamber Kaşali, Özgür Fırat Özpolat, Merve Ülkü, Ayşe Sena Dönmez, Serap Kılıç Kaya, Esra Dişçi, Serkan Bilge Koca, Ufuk Özkaya, Hüseyin Demirbilek, Atilla Çayır","doi":"10.4274/jcrpe.galenos.2025.2025-6-14","DOIUrl":null,"url":null,"abstract":"<p><strong>Objective: </strong>Artificial intelligence (AI) is increasingly utilized in medicine, including pediatric endocrinology. AI models have the potential to support clinical decision-making, patient education, and guidance. However, their accuracy, reliability, and effectiveness in providing medical information and recommendations remain unclear. This study aims to evaluate and compare the performance of four AI models-ChatGPT, Bard, Microsoft Copilot, and Pi-in answering frequently asked questions related to pediatric endocrinology.</p><p><strong>Methods: </strong>Nine questions commonly asked by parents regarding short stature in paediatric endocrinology have been selected based on literature reviews and expert opinions. These questions were posed to four AI models in both Turkish and English. The AI-generated responses were evaluated by 10 pediatric endocrinologists using a 12-item Likert-scale questionnaire assessing medical accuracy, completeness, guidance, and informativeness. Statistical analyses, including Kruskal-Wallis and post-hoc tests, were conducted to determine significant differences between AI models.</p><p><strong>Results: </strong>Bard outperformed other models in guidance and recommendation categories, excelling in directing users to medical consultation. Microsoft Copilot demonstrated strong medical accuracy but lacked guidance capacity. ChatGPT showed consistent performance in knowledge dissemination, making it effective for patient education. 
Pi scored the lowest in guidance and recommendations, indicating limited applicability in clinical settings. Significant differences were observed among AI models (p < 0.05), particularly in completeness and guidance-related categories.</p><p><strong>Conclusion: </strong>The study highlights the varying strengths and weaknesses of AI models in pediatric endocrinology. While Bard is effective in guidance, Microsoft Copilot excels in accuracy, and ChatGPT is informative. Future AI improvements should focus on balancing accuracy and guidance to enhance clinical decision-support and patient education. Tailored AI applications may optimize AI's role in specialized medical fields.</p>","PeriodicalId":48805,"journal":{"name":"Journal of Clinical Research in Pediatric Endocrinology","volume":" ","pages":""},"PeriodicalIF":1.5000,"publicationDate":"2025-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Clinical Research in Pediatric Endocrinology","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.4274/jcrpe.galenos.2025.2025-6-14","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"ENDOCRINOLOGY & METABOLISM","Score":null,"Total":0}
Citations: 0
Abstract
Objective: Artificial intelligence (AI) is increasingly utilized in medicine, including pediatric endocrinology. AI models have the potential to support clinical decision-making, patient education, and guidance. However, their accuracy, reliability, and effectiveness in providing medical information and recommendations remain unclear. This study aims to evaluate and compare the performance of four AI models (ChatGPT, Bard, Microsoft Copilot, and Pi) in answering frequently asked questions related to pediatric endocrinology.
Methods: Nine questions commonly asked by parents regarding short stature in pediatric endocrinology were selected based on literature reviews and expert opinions. These questions were posed to four AI models in both Turkish and English. The AI-generated responses were evaluated by 10 pediatric endocrinologists using a 12-item Likert-scale questionnaire assessing medical accuracy, completeness, guidance, and informativeness. Statistical analyses, including Kruskal-Wallis and post-hoc tests, were conducted to determine significant differences among the AI models.
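The group comparison described above can be sketched with a minimal, pure-Python Kruskal-Wallis H test. The Likert ratings below are hypothetical illustrations, not the study's data; the function and variable names are assumptions for this sketch.

```python
def kruskal_wallis(groups):
    """Kruskal-Wallis H statistic with the standard tie correction.

    `groups` is a list of lists of ordinal scores (e.g. 1-5 Likert
    ratings from independent reviewers), one list per model.
    """
    pooled = sorted(x for g in groups for x in g)
    n = len(pooled)
    rank = {}       # value -> average rank among tied observations
    tie_term = 0    # sum of t^3 - t over tie groups of size t
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t
        i = j
    # H = 12/(N(N+1)) * sum n_i * Rbar_i^2  -  3(N+1)
    h = 12 / (n * (n + 1)) * sum(
        len(g) * (sum(rank[x] for x in g) / len(g)) ** 2 for g in groups
    ) - 3 * (n + 1)
    return h / (1 - tie_term / (n ** 3 - n))  # correct for ties


# Hypothetical 1-5 ratings from 10 reviewers for one question
chatgpt = [4, 5, 4, 4, 5, 3, 4, 4, 5, 4]
bard    = [5, 5, 4, 5, 4, 5, 5, 4, 5, 5]
copilot = [4, 4, 5, 4, 4, 4, 5, 4, 4, 3]
pi      = [3, 2, 3, 3, 2, 3, 3, 2, 3, 3]

h = kruskal_wallis([chatgpt, bard, copilot, pi])
print(f"H = {h:.2f}")
# With k = 4 groups, H is compared against a chi-square distribution
# with k - 1 = 3 degrees of freedom (critical value 7.815 at alpha = 0.05).
print("significant at 0.05" if h > 7.815 else "not significant at 0.05")
```

In practice one would reach for `scipy.stats.kruskal`, which implements the same statistic and also returns the p-value; the hand-rolled version above only makes the ranking-and-ties mechanics explicit.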
Results: Bard outperformed other models in guidance and recommendation categories, excelling in directing users to medical consultation. Microsoft Copilot demonstrated strong medical accuracy but lacked guidance capacity. ChatGPT showed consistent performance in knowledge dissemination, making it effective for patient education. Pi scored the lowest in guidance and recommendations, indicating limited applicability in clinical settings. Significant differences were observed among AI models (p < 0.05), particularly in completeness and guidance-related categories.
Conclusion: The study highlights the varying strengths and weaknesses of AI models in pediatric endocrinology: Bard is effective in guidance, Microsoft Copilot excels in accuracy, and ChatGPT is informative. Future AI improvements should focus on balancing accuracy and guidance to enhance clinical decision support and patient education. Tailored AI applications may optimize AI's role in specialized medical fields.
About the Journal
The Journal of Clinical Research in Pediatric Endocrinology (JCRPE) publishes original research articles, reviews, short communications, letters, case reports and other special features related to the field of pediatric endocrinology. JCRPE is published in English by the Turkish Pediatric Endocrinology and Diabetes Society quarterly (March, June, September, December). The target audience is physicians, researchers and other healthcare professionals in all areas of pediatric endocrinology.