Title: Comparative Evaluation of Teaching Plans on Prostate Cancer Generated by Various Large Language Models and a Human Expert
Authors: Rong Wang, Yue Ding, Yajun Shen, Haiyong Liu, Ping Wang, Zhixiang Gao
Journal: Engineering Reports: Open Access, vol. 7, no. 8 (impact factor 2.0; JCR Q3, Computer Science, Interdisciplinary Applications)
Published: 2025-08-04
Publication type: Journal Article
DOI: 10.1002/eng2.70303
URL: https://onlinelibrary.wiley.com/doi/10.1002/eng2.70303
PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/eng2.70303
Citation count: 0
Abstract
Prostate cancer remains one of the most common malignancies affecting men worldwide and carries high morbidity and mortality. Because the disease is complex and variable, treatment ranges from active surveillance to more aggressive interventions such as radical prostatectomy, radiation therapy, and androgen deprivation therapy. This study investigates how well large language models (LLMs) generate educational content on prostate cancer, focusing on teaching plans written in both Chinese and English. Four LLMs, GPT-4 (OpenAI), Gemini 1.5 Pro (Google), Kimi AI (Moonshot AI), and Doubao (ByteDance), were evaluated against teaching plans developed by an experienced urology professor. In a double-blind assessment, 25 urology faculty members rated the quality of curriculum content, learning objectives, and outcomes on a standardized 10-point scale. GPT-4 and Gemini 1.5 Pro outperformed Kimi AI and Doubao, yet still lagged behind the human-generated plans, particularly in Chinese. Statistical analyses indicated significant differences in quality scores among the LLMs and the human expert, underscoring the need to integrate domain-specific knowledge into AI-generated content. This research highlights both the promise and the limitations of LLMs in medical education and suggests that future development should focus on hybrid models that combine artificial intelligence with human expertise to improve educational efficacy.
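The abstract does not say which statistical test was used. As an illustration only, the sketch below assumes a Friedman test, a common choice when the same raters score every plan on an ordinal scale. The plan labels and the toy score matrix are invented placeholders, not data from the study, and the helper names are hypothetical. The statistic omits the tie correction, and the p-value helper uses a closed form that holds only for even degrees of freedom (five plan sources give four).

```python
import math

def average_ranks(values):
    """Rank values 1..k in ascending order; ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks

def friedman_statistic(scores):
    """Friedman chi-square statistic (no tie correction).

    scores: one row per rater, one column per plan source.
    """
    n, k = len(scores), len(scores[0])
    rank_sums = [0.0] * k
    for row in scores:
        for col, rank in enumerate(average_ranks(row)):
            rank_sums[col] += rank
    return 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

# Hypothetical labels and invented toy scores (4 raters x 5 plan sources).
plan_labels = ["plan_A", "plan_B", "plan_C", "plan_D", "plan_E"]
toy_scores = [
    [9, 8, 7, 6, 5],
    [8, 9, 7, 5, 6],
    [9, 7, 8, 6, 5],
    [9, 8, 7, 5, 6],
]

q = friedman_statistic(toy_scores)
p = chi2_sf_even_df(q, df=len(plan_labels) - 1)
print(f"Friedman Q = {q:.3f}, approximate p = {p:.4f}")
```

With real ratings, each row would hold one faculty member's scores for the five plans in a single language. A library routine such as `scipy.stats.friedmanchisquare` would also apply the tie correction that this sketch leaves out.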