Quality and Readability Analysis of Artificial Intelligence-Generated Medical Information Related to Prostate Cancer: A Cross-Sectional Study of ChatGPT and DeepSeek.
Zhao Luo, Chuan Lin, Tae Hyo Kim, Yu Seob Shin, Sun Tae Ahn
{"title":"Quality and Readability Analysis of Artificial Intelligence-Generated Medical Information Related to Prostate Cancer: A Cross-Sectional Study of ChatGPT and DeepSeek.","authors":"Zhao Luo, Chuan Lin, Tae Hyo Kim, Yu Seob Shin, Sun Tae Ahn","doi":"10.5534/wjmh.250144","DOIUrl":null,"url":null,"abstract":"<p><strong>Purpose: </strong>Artificial intelligence (AI) tools have demonstrated considerable potential for the dissemination of medical information. However, variability may exist in the quality and readability of prostate-cancer-related content generated by different AI platforms. This study aimed to evaluate the quality, accuracy, and readability of prostate-cancer-related medical information produced by ChatGPT and DeepSeek.</p><p><strong>Materials and methods: </strong>Frequently asked questions related to prostate cancer were collected from the American Cancer Society website, ChatGPT, and DeepSeek. Three urologists with over 10 years of clinical experience reviewed and confirmed the relevance of the selected questions. The Patient Education Materials Assessment Tool for Printable Materials (PEMAT-P) was used to assess the understandability and actionability of AI-generated content. The DISCERN instrument was used to evaluate the quality of the treatment-related information. Additionally, readability was assessed using four established indices: Automated Readability Index (ARI), Flesch Reading Ease Score, Gunning Fog Index, and Flesch-Kincaid Grade Level.</p><p><strong>Results: </strong>No statistically significant differences were observed between ChatGPT and DeepSeek in PEMAT-P scores (70.66±8.13 <i>vs</i>. 69.35±8.83) or DISCERN scores (59.07±3.39 <i>vs</i>. 58.88±3.66) (p>0.05). However, the ARI for DeepSeek was higher than that for ChatGPT (12.63±1.42 <i>vs</i>. 10.85±1.93, p<0.001), indicating greater textual complexity and reading difficulty.</p><p><strong>Conclusions: </strong>AI tools, such as ChatGPT and DeepSeek, hold significant potential for enhancing patient education and disseminating medical information on prostate cancer. Nevertheless, further refinement of content quality and language clarity is needed to prevent potential misunderstandings, decisional uncertainty, and anxiety among patients due to difficulty in comprehension.</p>","PeriodicalId":54261,"journal":{"name":"World Journal of Mens Health","volume":" ","pages":""},"PeriodicalIF":4.1000,"publicationDate":"2025-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"World Journal of Mens Health","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.5534/wjmh.250144","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ANDROLOGY","Score":null,"Total":0}
Abstract
Purpose: Artificial intelligence (AI) tools have demonstrated considerable potential for the dissemination of medical information. However, variability may exist in the quality and readability of prostate-cancer-related content generated by different AI platforms. This study aimed to evaluate the quality, accuracy, and readability of prostate-cancer-related medical information produced by ChatGPT and DeepSeek.
Materials and methods: Frequently asked questions related to prostate cancer were collected from the American Cancer Society website, ChatGPT, and DeepSeek. Three urologists with over 10 years of clinical experience reviewed and confirmed the relevance of the selected questions. The Patient Education Materials Assessment Tool for Printable Materials (PEMAT-P) was used to assess the understandability and actionability of AI-generated content. The DISCERN instrument was used to evaluate the quality of the treatment-related information. Additionally, readability was assessed using four established indices: Automated Readability Index (ARI), Flesch Reading Ease Score, Gunning Fog Index, and Flesch-Kincaid Grade Level.
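For context, the four readability indices are closed-form functions of simple text counts (sentences, words, characters, syllables). The sketch below is a minimal illustration of how such scores can be computed; the paper does not state its tooling (established libraries such as textstat implement the same published formulas), and the syllable counter here is a rough heuristic, so values will differ slightly from validated implementations.

```python
import re


def _count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, treating a trailing silent 'e' as non-syllabic."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1 and not word.endswith(("le", "ee")):
        count -= 1
    return max(count, 1)


def readability_indices(text: str) -> dict[str, float]:
    """Compute the four indices used in the study from their published formulas."""
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z0-9']+", text)
    n_words = max(len(words), 1)
    chars = sum(len(w) for w in words)  # letters/digits only, whitespace and punctuation excluded
    syllables = sum(_count_syllables(w) for w in words)
    complex_words = sum(1 for w in words if _count_syllables(w) >= 3)

    wps = n_words / sentences   # average words per sentence
    spw = syllables / n_words   # average syllables per word

    return {
        "ARI": 4.71 * (chars / n_words) + 0.5 * wps - 21.43,
        "Flesch Reading Ease": 206.835 - 1.015 * wps - 84.6 * spw,
        "Gunning Fog": 0.4 * (wps + 100 * complex_words / n_words),
        "Flesch-Kincaid Grade": 0.39 * wps + 11.8 * spw - 15.59,
    }


if __name__ == "__main__":
    sample = ("Prostate cancer screening usually begins with a PSA blood test. "
              "Your doctor may also perform a digital rectal exam.")
    for name, score in readability_indices(sample).items():
        print(f"{name}: {score:.2f}")
```

Except for Flesch Reading Ease, where higher scores mean easier text, each index maps sentence-length and word-length statistics onto an approximate U.S. school grade level.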
Results: No statistically significant differences were observed between ChatGPT and DeepSeek in PEMAT-P scores (70.66±8.13 vs. 69.35±8.83) or DISCERN scores (59.07±3.39 vs. 58.88±3.66) (p>0.05). However, the ARI for DeepSeek was higher than that for ChatGPT (12.63±1.42 vs. 10.85±1.93, p<0.001); because ARI approximates the U.S. school grade level required to understand a text, this indicates greater textual complexity and reading difficulty in DeepSeek's responses.
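The abstract reports group means ± SD and p-values but does not name the statistical test used. Assuming an independent-samples (Welch's) t-test, the ARI comparison can be reproduced from the summary statistics alone; the sample size n below is a placeholder for illustration, not a figure from the study.

```python
from scipy import stats

# Reported summary statistics from the abstract (ARI, mean ± SD).
chatgpt_mean, chatgpt_sd = 10.85, 1.93
deepseek_mean, deepseek_sd = 12.63, 1.42

# HYPOTHETICAL: the abstract does not report the number of questions per
# platform; n = 25 is a placeholder so the example runs end to end.
n = 25

# Welch's t-test from summary statistics (the paper does not name its test;
# an independent-samples t-test is one plausible choice for mean ± SD data).
t_stat, p_value = stats.ttest_ind_from_stats(
    chatgpt_mean, chatgpt_sd, n,
    deepseek_mean, deepseek_sd, n,
    equal_var=False,
)
print(f"t = {t_stat:.2f}, p = {p_value:.4g}")
```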
Conclusions: AI tools such as ChatGPT and DeepSeek hold significant potential for enhancing patient education and disseminating medical information on prostate cancer. Nevertheless, further refinement of content quality and language clarity is needed to prevent the misunderstandings, decisional uncertainty, and anxiety that patients may experience when material is difficult to comprehend.