{"title":"人工智能对尾骨痛常见问题的回应:评估gpt - 40性能的准确性和一致性。","authors":"Aslinur Keles, Ozge Gulsum Illeez, Berkay Erbagci, Esra Giray","doi":"10.46497/ArchRheumatol.2025.10966","DOIUrl":null,"url":null,"abstract":"<p><strong>Objectives: </strong>This study aimed to assess whether GPT-4o's responses to patient-centered frequently asked questions about coccydynia are accurate and consistent when asked at different times and from different accounts.</p><p><strong>Materials and methods: </strong>Questions were collected from medical websites, forums, and patient support groups and posed to GPT-4o. The responses were evaluated by two physiatrists for accuracy and consistency. Responses were categorized: <i>(i)</i> correct and comprehensive, <i>(ii)</i> correct but not inadequate, <i>(iii)</i> partially correct and partially incorrect, and <i>(iv)</i> completely incorrect. Inconsistencies in scoring were resolved by an additional reviewer as needed. Statistical analysis, including Cohen's kappa for interreviewer reliability, was performed.</p><p><strong>Results: </strong>Of the 81 responses, 45.7% were rated as correct and comprehensive, while 49.4% were correct but incomplete. Only 4.9% of the responses contained partially incorrect information, and no responses were completely incorrect. The interreviewer agreement was substantial (kappa=0.67), but 75% of the responses differed between the two rounds. Notably, 34.9% of initially incomplete answers improved in the second round.</p><p><strong>Conclusion: </strong>GPT-4o shows promise in providing accurate and generally reliable information about coccydynia. However, the variability observed in response consistency across repeated queries suggests that while the model is useful for patient education and general inquiries, it may not be suitable for providing specialized clinical knowledge without human oversight.</p>","PeriodicalId":93884,"journal":{"name":"Archives of rheumatology","volume":"40 1","pages":"63-71"},"PeriodicalIF":1.1000,"publicationDate":"2025-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12010271/pdf/","citationCount":"0","resultStr":"{\"title\":\"Artificial intelligence-generated responses to frequently asked questions on coccydynia: Evaluating the accuracy and consistency of GPT-4o's performance.\",\"authors\":\"Aslinur Keles, Ozge Gulsum Illeez, Berkay Erbagci, Esra Giray\",\"doi\":\"10.46497/ArchRheumatol.2025.10966\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Objectives: </strong>This study aimed to assess whether GPT-4o's responses to patient-centered frequently asked questions about coccydynia are accurate and consistent when asked at different times and from different accounts.</p><p><strong>Materials and methods: </strong>Questions were collected from medical websites, forums, and patient support groups and posed to GPT-4o. The responses were evaluated by two physiatrists for accuracy and consistency. Responses were categorized: <i>(i)</i> correct and comprehensive, <i>(ii)</i> correct but not inadequate, <i>(iii)</i> partially correct and partially incorrect, and <i>(iv)</i> completely incorrect. Inconsistencies in scoring were resolved by an additional reviewer as needed. Statistical analysis, including Cohen's kappa for interreviewer reliability, was performed.</p><p><strong>Results: </strong>Of the 81 responses, 45.7% were rated as correct and comprehensive, while 49.4% were correct but incomplete. 
Only 4.9% of the responses contained partially incorrect information, and no responses were completely incorrect. The interreviewer agreement was substantial (kappa=0.67), but 75% of the responses differed between the two rounds. Notably, 34.9% of initially incomplete answers improved in the second round.</p><p><strong>Conclusion: </strong>GPT-4o shows promise in providing accurate and generally reliable information about coccydynia. However, the variability observed in response consistency across repeated queries suggests that while the model is useful for patient education and general inquiries, it may not be suitable for providing specialized clinical knowledge without human oversight.</p>\",\"PeriodicalId\":93884,\"journal\":{\"name\":\"Archives of rheumatology\",\"volume\":\"40 1\",\"pages\":\"63-71\"},\"PeriodicalIF\":1.1000,\"publicationDate\":\"2025-03-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12010271/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Archives of rheumatology\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.46497/ArchRheumatol.2025.10966\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2025/3/1 0:00:00\",\"PubModel\":\"eCollection\",\"JCR\":\"Q4\",\"JCRName\":\"RHEUMATOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Archives of rheumatology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.46497/ArchRheumatol.2025.10966","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/3/1 0:00:00","PubModel":"eCollection","JCR":"Q4","JCRName":"RHEUMATOLOGY","Score":null,"Total":0}
Artificial intelligence-generated responses to frequently asked questions on coccydynia: Evaluating the accuracy and consistency of GPT-4o's performance.
Objectives: This study aimed to assess whether GPT-4o's responses to patient-centered frequently asked questions about coccydynia are accurate and consistent when the same questions are posed at different times and from different accounts.
Materials and methods: Questions were collected from medical websites, forums, and patient support groups and posed to GPT-4o. The responses were evaluated by two physiatrists for accuracy and consistency. Responses were categorized as (i) correct and comprehensive, (ii) correct but inadequate, (iii) partially correct and partially incorrect, or (iv) completely incorrect. Inconsistencies in scoring were resolved by an additional reviewer as needed. Statistical analysis, including Cohen's kappa for interreviewer reliability, was performed.
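For readers unfamiliar with the statistic, Cohen's kappa quantifies agreement between two raters beyond what chance alone would produce: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e is the agreement expected by chance. The following is a minimal sketch of how such inter-reviewer reliability can be computed; the rating vectors are hypothetical illustrations on the study's four-point scale, not the paper's actual data, which is not published in the abstract.

```python
# Hypothetical example: two reviewers rate 10 GPT-4o responses on the
# study's four-point scale (1 = correct and comprehensive, 2 = correct
# but inadequate, 3 = partially correct/incorrect, 4 = completely incorrect).
from sklearn.metrics import cohen_kappa_score

reviewer_a = [1, 2, 1, 1, 2, 3, 1, 2, 2, 1]  # made-up ratings
reviewer_b = [1, 2, 1, 2, 2, 3, 1, 2, 1, 1]  # made-up ratings

# kappa = (p_o - p_e) / (1 - p_e): observed agreement corrected for the
# agreement expected by chance from each rater's label frequencies.
kappa = cohen_kappa_score(reviewer_a, reviewer_b)
print(f"Cohen's kappa: {kappa:.2f}")
# By the common Landis & Koch (1977) convention, 0.61-0.80 is read
# as "substantial" agreement, the band the study's 0.67 falls into.
```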
Results: Of the 81 responses, 45.7% were rated as correct and comprehensive, while 49.4% were correct but incomplete. Only 4.9% of the responses contained partially incorrect information, and no responses were completely incorrect. The interreviewer agreement was substantial (kappa=0.67), but 75% of the responses differed between the two rounds. Notably, 34.9% of initially incomplete answers improved in the second round.
Conclusion: GPT-4o shows promise in providing accurate and generally reliable information about coccydynia. However, the variability observed in responses across repeated queries suggests that while the model is useful for patient education and general inquiries, it may not be suitable for providing specialized clinical knowledge without human oversight.