Zhao Luo, Sung Chul Kam, Ji Yong Kim, Wenhao Hu, Chuan Lin, Hyun Jun Park, Yu Seob Shin
{"title":"从ChatGPT 4.0中获得的精索静脉曲张相关信息的质量和可读性在不同的查询模型中保持一致吗?","authors":"Zhao Luo, Sung Chul Kam, Ji Yong Kim, Wenhao Hu, Chuan Lin, Hyun Jun Park, Yu Seob Shin","doi":"10.5534/wjmh.240331","DOIUrl":null,"url":null,"abstract":"<p><strong>Purpose: </strong>There is a growing tendency of individuals resorting to Chat-Generative Pretrained Transformer (ChatGPT) as a source of medical information on specific ailments. Varicocele is a prevalent condition affecting the male reproductive system. The quality, readability, and consistency of the information related to varicocele that individuals obtain through interactive access to ChatGPT remains uncertain.</p><p><strong>Materials and methods: </strong>This study employed Google Trends data to extract 25 trending questions since 2004. Two distinct inquiry methodologies were employed with ChatGPT 4.0: repetition mode (each question repeated three times) and cyclic mode (each question input once in three consecutive cycles). The generated texts were evaluated according to a number of criteria, including the Automated Readability Index (ARI), the Flesch Reading Ease Score (FRES), the Gunning Fog Index (GFI), the DISCERN score and the Ensuring Quality Information for Patients (EQIP). Kruskal-Wallis and Mann-Whitney U tests were employed to compare the text quality, readability, and consistency between the two modes.</p><p><strong>Results: </strong>The results demonstrated that the texts generated in repetition and cyclic modes exhibited no statistically significant differences in ARI (12.06±1.29 <i>vs.</i> 12.27±1.74), FRES (36.08±8.70 <i>vs.</i> 36.87±7.73), GFI (13.14±1.81 <i>vs.</i> 13.25±1.50), DISCERN scores (38.08±6.55 <i>vs.</i> 38.35±6.50) and EQIP (47.92±6.84 <i>vs.</i> 48.35±5.56) (p>0.05). These findings indicate that ChatGPT 4.0 consistently produces information of comparable complexity and quality across different inquiry modes.</p><p><strong>Conclusions: </strong>This study found that ChatGPT-generated medical information on \"varicocele\" demonstrates consistent quality and readability across different modes, highlighting its potential for stable healthcare information provision. However, the content's complexity poses challenges for general readers, and notable limitations in quality and reliability highlight the need for improved accuracy, credibility, and readability in AI-generated medical content.</p>","PeriodicalId":54261,"journal":{"name":"World Journal of Mens Health","volume":" ","pages":""},"PeriodicalIF":4.1000,"publicationDate":"2025-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Does the Quality and Readability of Information Related to Varicocele Obtained from ChatGPT 4.0 Remain Consistent Across Different Models of Inquiry?\",\"authors\":\"Zhao Luo, Sung Chul Kam, Ji Yong Kim, Wenhao Hu, Chuan Lin, Hyun Jun Park, Yu Seob Shin\",\"doi\":\"10.5534/wjmh.240331\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Purpose: </strong>There is a growing tendency of individuals resorting to Chat-Generative Pretrained Transformer (ChatGPT) as a source of medical information on specific ailments. Varicocele is a prevalent condition affecting the male reproductive system. 
The quality, readability, and consistency of the information related to varicocele that individuals obtain through interactive access to ChatGPT remains uncertain.</p><p><strong>Materials and methods: </strong>This study employed Google Trends data to extract 25 trending questions since 2004. Two distinct inquiry methodologies were employed with ChatGPT 4.0: repetition mode (each question repeated three times) and cyclic mode (each question input once in three consecutive cycles). The generated texts were evaluated according to a number of criteria, including the Automated Readability Index (ARI), the Flesch Reading Ease Score (FRES), the Gunning Fog Index (GFI), the DISCERN score and the Ensuring Quality Information for Patients (EQIP). Kruskal-Wallis and Mann-Whitney U tests were employed to compare the text quality, readability, and consistency between the two modes.</p><p><strong>Results: </strong>The results demonstrated that the texts generated in repetition and cyclic modes exhibited no statistically significant differences in ARI (12.06±1.29 <i>vs.</i> 12.27±1.74), FRES (36.08±8.70 <i>vs.</i> 36.87±7.73), GFI (13.14±1.81 <i>vs.</i> 13.25±1.50), DISCERN scores (38.08±6.55 <i>vs.</i> 38.35±6.50) and EQIP (47.92±6.84 <i>vs.</i> 48.35±5.56) (p>0.05). These findings indicate that ChatGPT 4.0 consistently produces information of comparable complexity and quality across different inquiry modes.</p><p><strong>Conclusions: </strong>This study found that ChatGPT-generated medical information on \\\"varicocele\\\" demonstrates consistent quality and readability across different modes, highlighting its potential for stable healthcare information provision. However, the content's complexity poses challenges for general readers, and notable limitations in quality and reliability highlight the need for improved accuracy, credibility, and readability in AI-generated medical content.</p>\",\"PeriodicalId\":54261,\"journal\":{\"name\":\"World Journal of Mens Health\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":4.1000,\"publicationDate\":\"2025-05-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"World Journal of Mens Health\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.5534/wjmh.240331\",\"RegionNum\":3,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ANDROLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"World Journal of Mens Health","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.5534/wjmh.240331","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ANDROLOGY","Score":null,"Total":0}
Does the Quality and Readability of Information Related to Varicocele Obtained from ChatGPT 4.0 Remain Consistent Across Different Models of Inquiry?
Purpose: Individuals increasingly turn to Chat Generative Pre-trained Transformer (ChatGPT) as a source of medical information on specific conditions. Varicocele is a prevalent condition affecting the male reproductive system. The quality, readability, and consistency of the varicocele-related information that individuals obtain through interactive use of ChatGPT remain uncertain.
Materials and methods: Google Trends data were used to extract the 25 trending questions on varicocele posed since 2004. Two distinct inquiry methodologies were applied to ChatGPT 4.0: repetition mode (each question submitted three times in succession) and cyclic mode (each question submitted once per cycle over three consecutive cycles). The generated texts were evaluated against several established instruments: the Automated Readability Index (ARI), the Flesch Reading Ease Score (FRES), the Gunning Fog Index (GFI), the DISCERN score, and the Ensuring Quality Information for Patients (EQIP) tool. Kruskal-Wallis and Mann-Whitney U tests were used to compare text quality, readability, and consistency between the two modes.
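For reference, the three readability indices are deterministic functions of simple text statistics, and lower FRES values indicate harder text (scores in the 30s, as reported below, are conventionally read as college-level prose). The following minimal Python sketch computes all three from their standard published formulas; it is illustrative only, and the regex tokenizer and naive syllable heuristic are assumptions, not the study's actual tooling.

```python
import re

def _count_syllables(word: str) -> int:
    """Rough syllable count: vowel-group runs with a silent-e adjustment."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)

def readability(text: str) -> dict:
    """Compute ARI, FRES, and GFI from raw text using the standard formulas."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    chars = sum(len(w) for w in words)
    syllables = [_count_syllables(w) for w in words]
    complex_words = sum(1 for n in syllables if n >= 3)  # GFI: 3+ syllable words

    w, s = len(words), len(sentences)
    return {
        # Automated Readability Index
        "ARI": 4.71 * (chars / w) + 0.5 * (w / s) - 21.43,
        # Flesch Reading Ease Score (higher = easier to read)
        "FRES": 206.835 - 1.015 * (w / s) - 84.6 * (sum(syllables) / w),
        # Gunning Fog Index
        "GFI": 0.4 * ((w / s) + 100 * (complex_words / w)),
    }

print(readability("Varicocele is an abnormal enlargement of the veins within the scrotum."))
```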
Results: Texts generated in repetition and cyclic modes showed no statistically significant differences in ARI (12.06±1.29 vs. 12.27±1.74), FRES (36.08±8.70 vs. 36.87±7.73), GFI (13.14±1.81 vs. 13.25±1.50), DISCERN score (38.08±6.55 vs. 38.35±6.50), or EQIP (47.92±6.84 vs. 48.35±5.56) (all p>0.05). These findings indicate that ChatGPT 4.0 produces information of comparable complexity and quality across different inquiry modes.
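To illustrate the comparison, the sketch below runs a two-sided Mann-Whitney U test and a Kruskal-Wallis test with SciPy on simulated DISCERN scores drawn from the reported means and standard deviations. The sample size of 75 texts per mode (25 questions × 3 responses) is an inference from the methods, and the data are synthetic, not the study's raw scores.

```python
import numpy as np
from scipy.stats import mannwhitneyu, kruskal

# Hypothetical per-text DISCERN scores for illustration only;
# simulated from the reported mean±SD, NOT the study's raw data.
rng = np.random.default_rng(0)
repetition = rng.normal(38.08, 6.55, size=75)  # assumed 25 questions x 3 repetitions
cyclic = rng.normal(38.35, 6.50, size=75)      # assumed 25 questions x 3 cycles

# Two-sided Mann-Whitney U test between the two inquiry modes
u, p = mannwhitneyu(repetition, cyclic, alternative="two-sided")
print(f"U = {u:.1f}, p = {p:.3f}")  # p > 0.05 -> no significant difference

# Kruskal-Wallis H test, e.g., across the three cycles within cyclic mode
h, p_kw = kruskal(cyclic[:25], cyclic[25:50], cyclic[50:])
print(f"H = {h:.2f}, p = {p_kw:.3f}")
```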
Conclusions: ChatGPT-generated medical information on varicocele demonstrates consistent quality and readability across different inquiry modes, highlighting its potential as a stable source of healthcare information. However, the complexity of the content poses challenges for general readers, and notable limitations in quality and reliability underscore the need for improved accuracy, credibility, and readability in AI-generated medical content.