{"title":"Assessing the quality of ChatGPT's responses to commonly asked questions about trigger finger treatment.","authors":"Mehmet Can Gezer, Mehmet Armangil","doi":"10.14744/tjtes.2025.32735","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>This study aims to evaluate the accuracy and reliability of Generative Pre-trained Transformer (ChatGPT; OpenAI, San Francisco, California) in answering patient-related questions about trigger finger. This evaluation has the potential to enhance patient education prior to treatment and provides insight into the role of artificial intelligence (AI)-based systems in the patient educa-tion process.</p><p><strong>Methods: </strong>The ten most frequently asked questions regarding trigger finger were compiled from patient education websites and a literature review, then posed to ChatGPT. Two orthopedic specialists evaluated the responses using the Journal of the American Medical Association (JAMA) Benchmark criteria and the DISCERN instrument (A Tool for Judging the Quality of Written Consumer Health Information on Treatment Choices). Additionally, the readability of the responses was assessed using the Flesch-Kincaid Grade Level.</p><p><strong>Results: </strong>The DISCERN scores for ChatGPT's responses to trigger finger questions ranged from 35 to 47, with an average of 42, indicating \"moderate\" quality. While 60% of the responses were satisfactory, 40% contained deficiencies. According to the JAMA Benchmark criteria, the absence of scientific references was a significant drawback. The average readability level corresponded to the university level, making the information difficult to understand for patients with low health literacy. Improvements are needed to enhance the accessibility and comprehensibility of the content for a broader patient population.</p><p><strong>Conclusion: </strong>To the best of our knowledge, this is the first study to investigate the use of ChatGPT in the context of trigger finger. While ChatGPT shows reasonable effectiveness in providing general information on trigger finger, expert oversight is necessary before it can be relied upon as a primary source for patient education.</p>","PeriodicalId":94263,"journal":{"name":"Ulusal travma ve acil cerrahi dergisi = Turkish journal of trauma & emergency surgery : TJTES","volume":"31 4","pages":"389-393"},"PeriodicalIF":0.0000,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12000978/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Ulusal travma ve acil cerrahi dergisi = Turkish journal of trauma & emergency surgery : TJTES","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.14744/tjtes.2025.32735","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Background: This study aims to evaluate the accuracy and reliability of Generative Pre-trained Transformer (ChatGPT; OpenAI, San Francisco, California) in answering patient-related questions about trigger finger. Such an evaluation has the potential to enhance patient education prior to treatment and to provide insight into the role of artificial intelligence (AI)-based systems in the patient education process.
Methods: The ten most frequently asked questions regarding trigger finger were compiled from patient education websites and a literature review, then posed to ChatGPT. Two orthopedic specialists evaluated the responses using the Journal of the American Medical Association (JAMA) Benchmark criteria and the DISCERN instrument (A Tool for Judging the Quality of Written Consumer Health Information on Treatment Choices). Additionally, the readability of the responses was assessed using the Flesch-Kincaid Grade Level.
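The abstract does not state which tool was used to compute readability; as a minimal sketch, the standard Flesch-Kincaid Grade Level formula, 0.39 × (words/sentences) + 11.8 × (syllables/words) − 15.59, can be implemented as below. The vowel-group syllable counter is a naive heuristic of our own, not the study's method; dedicated readability tools use more sophisticated syllable counting.

```python
import re


def count_syllables(word: str) -> int:
    """Naive syllable estimate: count vowel groups, with a common
    adjustment for a trailing silent 'e'. A rough heuristic only."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level:
    0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59"""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / max(len(sentences), 1))
            + 11.8 * (syllables / max(len(words), 1))
            - 15.59)


# A score of roughly 13 or higher corresponds to university-level text,
# the level the study reports for ChatGPT's responses.
sample = ("Trigger finger is a condition in which a finger catches or "
          "locks when bent, caused by inflammation of the flexor "
          "tendon sheath.")
print(f"Grade level: {flesch_kincaid_grade(sample):.1f}")
```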
Results: The DISCERN scores for ChatGPT's responses to trigger finger questions ranged from 35 to 47, with an average of 42, indicating "moderate" quality. While 60% of the responses were satisfactory, 40% contained deficiencies. Under the JAMA Benchmark criteria, the absence of scientific references was a significant drawback. The average readability corresponded to a university reading level, making the information difficult to understand for patients with low health literacy. Improvements are needed to make the content more accessible and comprehensible to a broader patient population.
Conclusion: To the best of our knowledge, this is the first study to investigate the use of ChatGPT in the context of trigger finger. While ChatGPT shows reasonable effectiveness in providing general information on trigger finger, expert oversight is necessary before it can be relied upon as a primary source for patient education.