Quality and Readability Analysis of Artificial Intelligence-Generated Medical Information Related to Prostate Cancer: A Cross-Sectional Study of ChatGPT and DeepSeek.
Zhao Luo, Chuan Lin, Tae Hyo Kim, Yu Seob Shin, Sun Tae Ahn
{"title":"Quality and Readability Analysis of Artificial Intelligence-Generated Medical Information Related to Prostate Cancer: A Cross-Sectional Study of ChatGPT and DeepSeek.","authors":"Zhao Luo, Chuan Lin, Tae Hyo Kim, Yu Seob Shin, Sun Tae Ahn","doi":"10.5534/wjmh.250144","DOIUrl":null,"url":null,"abstract":"<p><strong>Purpose: </strong>Artificial intelligence (AI) tools have demonstrated considerable potential for the dissemination of medical information. However, variability may exist in the quality and readability of prostate-cancer-related content generated by different AI platforms. This study aimed to evaluate the quality, accuracy, and readability of prostate-cancer-related medical information produced by ChatGPT and DeepSeek.</p><p><strong>Materials and methods: </strong>Frequently asked questions related to prostate cancer were collected from the American Cancer Society website, ChatGPT, and DeepSeek. Three urologists with over 10 years of clinical experience reviewed and confirmed the relevance of the selected questions. The Patient Education Materials Assessment Tool for Printable Materials (PEMAT-P) was used to assess the understandability and actionability of AI-generated content. The DISCERN instrument was used to evaluate the quality of the treatment-related information. Additionally, readability was assessed using four established indices: Automated Readability Index (ARI), Flesch Reading Ease Score, Gunning Fog Index, and Flesch-Kincaid Grade Level.</p><p><strong>Results: </strong>No statistically significant differences were observed between ChatGPT and DeepSeek in PEMAT-P scores (70.66±8.13 <i>vs</i>. 69.35±8.83) or DISCERN scores (59.07±3.39 <i>vs</i>. 58.88±3.66) (p>0.05). However, the ARI for DeepSeek was higher than that for ChatGPT (12.63±1.42 <i>vs</i>. 10.85±1.93, p<0.001), indicating greater textual complexity and reading difficulty.</p><p><strong>Conclusions: </strong>AI tools, such as ChatGPT and DeepSeek, hold significant potential for enhancing patient education and disseminating medical information on prostate cancer. Nevertheless, further refinement of content quality and language clarity is needed to prevent potential misunderstandings, decisional uncertainty, and anxiety among patients due to difficulty in comprehension.</p>","PeriodicalId":54261,"journal":{"name":"World Journal of Mens Health","volume":" ","pages":""},"PeriodicalIF":4.1000,"publicationDate":"2025-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"World Journal of Mens Health","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.5534/wjmh.250144","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ANDROLOGY","Score":null,"Total":0}
Abstract
Purpose: Artificial intelligence (AI) tools have demonstrated considerable potential for the dissemination of medical information. However, variability may exist in the quality and readability of prostate-cancer-related content generated by different AI platforms. This study aimed to evaluate the quality, accuracy, and readability of prostate-cancer-related medical information produced by ChatGPT and DeepSeek.
Materials and methods: Frequently asked questions related to prostate cancer were collected from the American Cancer Society website, ChatGPT, and DeepSeek. Three urologists with over 10 years of clinical experience reviewed and confirmed the relevance of the selected questions. The Patient Education Materials Assessment Tool for Printable Materials (PEMAT-P) was used to assess the understandability and actionability of AI-generated content. The DISCERN instrument was used to evaluate the quality of the treatment-related information. Additionally, readability was assessed using four established indices: Automated Readability Index (ARI), Flesch Reading Ease Score, Gunning Fog Index, and Flesch-Kincaid Grade Level.
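For context, the four readability indices are closed-form functions of simple text counts (sentences, words, characters, syllables). The sketch below is a minimal illustration of how such scores can be computed; the paper does not state its tooling (established libraries such as textstat implement the same published formulas), and the syllable counter here is a rough heuristic, so values will differ slightly from validated implementations.

```python
import re


def _count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, treating a trailing silent 'e' as non-syllabic."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1 and not word.endswith(("le", "ee")):
        count -= 1
    return max(count, 1)


def readability_indices(text: str) -> dict[str, float]:
    """Compute the four indices used in the study from their published formulas."""
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z0-9']+", text)
    n_words = max(len(words), 1)
    chars = sum(len(w) for w in words)  # letters/digits only, whitespace and punctuation excluded
    syllables = sum(_count_syllables(w) for w in words)
    complex_words = sum(1 for w in words if _count_syllables(w) >= 3)

    wps = n_words / sentences   # average words per sentence
    spw = syllables / n_words   # average syllables per word

    return {
        "ARI": 4.71 * (chars / n_words) + 0.5 * wps - 21.43,
        "Flesch Reading Ease": 206.835 - 1.015 * wps - 84.6 * spw,
        "Gunning Fog": 0.4 * (wps + 100 * complex_words / n_words),
        "Flesch-Kincaid Grade": 0.39 * wps + 11.8 * spw - 15.59,
    }


if __name__ == "__main__":
    sample = ("Prostate cancer screening usually begins with a PSA blood test. "
              "Your doctor may also perform a digital rectal exam.")
    for name, score in readability_indices(sample).items():
        print(f"{name}: {score:.2f}")
```

Except for Flesch Reading Ease, where higher scores mean easier text, each index maps sentence-length and word-length statistics onto an approximate U.S. school grade level.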
Results: No statistically significant differences were observed between ChatGPT and DeepSeek in PEMAT-P scores (70.66±8.13 vs. 69.35±8.83) or DISCERN scores (59.07±3.39 vs. 58.88±3.66) (p>0.05). However, the ARI for DeepSeek was higher than that for ChatGPT (12.63±1.42 vs. 10.85±1.93, p<0.001); because ARI approximates the U.S. school grade level required to understand a text, this indicates greater textual complexity and reading difficulty in DeepSeek's responses.
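The abstract reports group means ± SD and p-values but does not name the statistical test used. Assuming an independent-samples (Welch's) t-test, the ARI comparison can be reproduced from the summary statistics alone; the sample size n below is a placeholder for illustration, not a figure from the study.

```python
from scipy import stats

# Reported summary statistics from the abstract (ARI, mean ± SD).
chatgpt_mean, chatgpt_sd = 10.85, 1.93
deepseek_mean, deepseek_sd = 12.63, 1.42

# HYPOTHETICAL: the abstract does not report the number of questions per
# platform; n = 25 is a placeholder so the example runs end to end.
n = 25

# Welch's t-test from summary statistics (the paper does not name its test;
# an independent-samples t-test is one plausible choice for mean ± SD data).
t_stat, p_value = stats.ttest_ind_from_stats(
    chatgpt_mean, chatgpt_sd, n,
    deepseek_mean, deepseek_sd, n,
    equal_var=False,
)
print(f"t = {t_stat:.2f}, p = {p_value:.4g}")
```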
Conclusions: AI tools such as ChatGPT and DeepSeek hold significant potential for enhancing patient education and disseminating medical information on prostate cancer. Nevertheless, further refinement of content quality and language clarity is needed to prevent the misunderstandings, decisional uncertainty, and anxiety that patients may experience when material is difficult to comprehend.