Assessing the response quality and readability of ChatGPT in stuttering

IF 2.1, CAS Tier 3 (Medicine), JCR Q1, AUDIOLOGY & SPEECH-LANGUAGE PATHOLOGY
Saeed Saeedi, Mehdi Bakhtiar
{"title":"Assessing the response quality and readability of ChatGPT in stuttering","authors":"Saeed Saeedi ,&nbsp;Mehdi Bakhtiar","doi":"10.1016/j.jfludis.2025.106149","DOIUrl":null,"url":null,"abstract":"<div><h3>Objective</h3><div>This study aimed to examine how frequently asked questions regarding stuttering were comprehended and answered by ChatGPT.</div></div><div><h3>Methods</h3><div>In this exploratory study, eleven common questions about stuttering were asked in a single conversation with the GPT-4o mini. While being blind relative to the source of the answers (whether by AI or SLPs), a panel of five certified speech and language pathologists (SLPs) was requested to differentiate if responses were produced by the ChatGPT chatbot or provided by SLPs. Additionally, they were instructed to evaluate the responses based on several criteria, including the presence of inaccuracies, the potential for causing harm and the degree of harm that could result, and alignment with the prevailing consensus within the SLP community. All ChatGPT responses were also evaluated utilizing various readability features, including the Flesch Reading Ease Score (FRES), Gunning Fog Scale Level (GFSL), and Dale-Chall Score (D-CS), the number of words, number of sentences, words per sentence (WPS), characters per word (CPW), and the percentage of difficult words. Furthermore, Spearman's rank correlation coefficient was employed to examine relationship between the evaluations conducted by the panel of certified SLPs and readability features.</div></div><div><h3>Results</h3><div>A substantial proportion of the AI-generated responses (45.50 %) were incorrectly identified by SLP panel as being written by other SLPs, indicating high perceived human-likeness (origin). Regarding content quality, 83.60 % of the responses were found to be accurate (incorrectness), 63.60 % were rated as harmless (harm), and 38.20 % were considered to cause only minor to moderate impact (extent of harm). In terms of professional alignment, 62 % of the responses reflected the prevailing views within the SLP community (consensus). The means ± standard deviation of FRES, GFSL, and D-CS were 26.52 ± 13.94 (readable for college graduates), 18.17 ± 3.39 (readable for graduate students), and 9.90 ± 1.08 (readable for 13th to 15th grade [college]), respectively. Furthermore, each response contained an average of 99.73 words, 6.80 sentences, 17.44 WPS, 5.79 CPW, and 27.96 % difficult words. The correlation coefficients ranged between significantly large negative value (<em>r</em> = -0.909, <em>p</em> &lt; 0.05) to very large positive value (<em>r</em> = 0.918, <em>p</em> &lt; 0.05).</div></div><div><h3>Conclusion</h3><div>The results revealed that the emerging ChatGPT possesses a promising capability to provide appropriate responses to frequently asked questions in the field of stuttering, which is attested by the fact that panel of certified SLPs perceived about 45 % of them to be generated by SLPs. 
However, given the increasing accessibility of AI tools, particularly among individuals with limited access to professional services, it is crucial to emphasize that such tools are intended solely for educational purposes and should not replace diagnosis or treatment by qualified SLPs.</div></div>","PeriodicalId":49166,"journal":{"name":"Journal of Fluency Disorders","volume":"85 ","pages":"Article 106149"},"PeriodicalIF":2.1000,"publicationDate":"2025-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Fluency Disorders","FirstCategoryId":"3","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0094730X25000518","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"AUDIOLOGY & SPEECH-LANGUAGE PATHOLOGY","Score":null,"Total":0}
Citations: 0

Abstract

Objective

This study examined how well ChatGPT comprehended and answered frequently asked questions about stuttering.

Methods

In this exploratory study, eleven common questions about stuttering were posed in a single conversation with GPT-4o mini. A panel of five certified speech-language pathologists (SLPs), blinded to the source of the answers (AI or SLPs), was asked to judge whether each response was produced by the ChatGPT chatbot or written by an SLP. The panel also evaluated the responses against several criteria: the presence of inaccuracies, the potential for causing harm and the degree of harm that could result, and alignment with the prevailing consensus within the SLP community. All ChatGPT responses were additionally scored on several readability features: the Flesch Reading Ease Score (FRES), Gunning Fog Scale Level (GFSL), Dale-Chall Score (D-CS), number of words, number of sentences, words per sentence (WPS), characters per word (CPW), and percentage of difficult words. Spearman's rank correlation coefficient was then used to examine the relationship between the panel's evaluations and the readability features.
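
The abstract does not name the software used to compute the readability features or the correlation. A minimal sketch of such a pipeline, assuming the open-source textstat and scipy packages and hypothetical placeholder responses and ratings, might look like this:

```python
# A minimal sketch, assuming the open-source `textstat` and `scipy` packages;
# the paper does not specify its tooling, and the data below are hypothetical.
import textstat
from scipy.stats import spearmanr

# Hypothetical stand-ins for the eleven ChatGPT answers and one panel rating each.
responses = [
    "Stuttering is a speech disorder involving disruptions in the flow of speech.",
    "Early assessment by a speech-language pathologist is generally recommended.",
    "Therapy approaches differ for preschool children, adolescents, and adults.",
]
panel_ratings = [4, 5, 3]  # hypothetical consensus ratings from the SLP panel

features = []
for text in responses:
    words = textstat.lexicon_count(text)
    features.append({
        "FRES": textstat.flesch_reading_ease(text),           # Flesch Reading Ease Score
        "GFSL": textstat.gunning_fog(text),                   # Gunning Fog Scale Level
        "D-CS": textstat.dale_chall_readability_score(text),  # Dale-Chall Score
        "words": words,
        "sentences": textstat.sentence_count(text),
        "pct_difficult": 100 * textstat.difficult_words(text) / words,
    })

# Spearman's rank correlation between panel ratings and one readability feature.
fres = [f["FRES"] for f in features]
rho, p = spearmanr(panel_ratings, fres)
print(f"Spearman r = {rho:.3f}, p = {p:.3f}")
```

In the study itself, each of the eleven responses would be scored this way and each panel criterion correlated against each readability feature.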

Results

A substantial proportion of the AI-generated responses (45.50 %) were incorrectly identified by the SLP panel as being written by other SLPs, indicating high perceived human-likeness (origin). Regarding content quality, 83.60 % of the responses were judged accurate (incorrectness), 63.60 % were rated as harmless (harm), and 38.20 % were considered to cause only minor to moderate impact (extent of harm). In terms of professional alignment, 62 % of the responses reflected the prevailing views within the SLP community (consensus). The means ± standard deviations of FRES, GFSL, and D-CS were 26.52 ± 13.94 (readable for college graduates), 18.17 ± 3.39 (readable for graduate students), and 9.90 ± 1.08 (readable for 13th to 15th grade [college]), respectively. On average, each response contained 99.73 words, 6.80 sentences, 17.44 WPS, 5.79 CPW, and 27.96 % difficult words. The correlation coefficients ranged from a large negative value (r = -0.909, p < 0.05) to a large positive value (r = 0.918, p < 0.05).
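
As an arithmetic cross-check (ours, not reported in the paper), the standard Flesch Reading Ease formula, applied to the reported mean WPS of 17.44 and mean FRES of 26.52, implies roughly 1.92 syllables per word, consistent with the long words suggested by the 5.79 CPW figure:

```latex
% Standard Flesch Reading Ease formula; back-solving with the reported means
% (WPS = 17.44, FRES = 26.52) is an illustrative check, not a result from the paper.
\[
\mathrm{FRES} = 206.835 - 1.015\,\frac{\#\text{words}}{\#\text{sentences}}
              - 84.6\,\frac{\#\text{syllables}}{\#\text{words}}
\]
\[
\frac{\#\text{syllables}}{\#\text{words}}
  \approx \frac{206.835 - 1.015 \times 17.44 - 26.52}{84.6}
  \approx 1.92
\]
```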

Conclusion

The results revealed that ChatGPT shows a promising capability to provide appropriate responses to frequently asked questions about stuttering, as attested by the fact that the panel of certified SLPs perceived about 45 % of its responses as written by SLPs. However, given the increasing accessibility of AI tools, particularly among individuals with limited access to professional services, it is crucial to emphasize that such tools are intended solely for educational purposes and should not replace diagnosis or treatment by qualified SLPs.
Source journal

Journal of Fluency Disorders (AUDIOLOGY & SPEECH-LANGUAGE PATHOLOGY-REHABILITATION)
CiteScore: 3.70
Self-citation rate: 14.30%
Annual publications: 23
Review time: >12 weeks
Journal description: Journal of Fluency Disorders provides comprehensive coverage of clinical, experimental, and theoretical aspects of stuttering, including the latest remediation techniques. As the official journal of the International Fluency Association, the journal features full-length research and clinical reports; methodological, theoretical and philosophical articles; reviews; short communications and much more – all readily accessible and tailored to the needs of the professional.