Artificial Intelligence Design for Race-Based Prostate Cancer Stage Classification With Multilayer Perceptron: Feature Selection Optimization Approach.

Impact factor: 2.0; JCR quartile: Q3 (Health Care Sciences & Services)
Adithama Mulia, David Agustriawan, Marlinda Overbeek, Moeljono Widjaja, Vincent Kurniawan, Jheno Syechlo, Muhammad Imran Ahmad, Srinivasulu Yerukala Sathipati, Nilubon Kurubanjerdjit
JMIR Formative Research, volume 10, e82587. Published 2026-04-16. Publication type: Journal Article.
DOI: 10.2196/82587 (https://doi.org/10.2196/82587)
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13086062/pdf/

Abstract

Background: Prostate cancer progression exhibits significant variability influenced by biological and racial factors. DNA methylation profiling has shown potential in early cancer detection, but its integration with machine learning across racially diverse populations remains limited.

Objective: This study aimed to develop a prostate cancer stage classifier for the majority White cohort using DNA methylation data and a multilayer perceptron (MLP) model. The classifier assigns cases to early (stages I-II) or late (stages III-IV) disease, and its performance on other racial groups was assessed to highlight the need for race-specific models.
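The early/late grouping described in the Objective can be sketched as a small labeling helper. This is only an illustration: the function name `binarize_stage` and the assumption that stage labels arrive as strings such as "Stage IIA" are hypothetical and are not taken from the paper.

```python
import re

# Hypothetical helper (not from the paper): collapse a stage string into the
# binary early/late label described in the Objective.
# Assumption: labels look like "Stage I", "stage IIA", "IIIB", or "IV".
_STAGE_PATTERN = re.compile(r"^(?:stage\s*)?(IV|III|II|I)(?![IV])", re.IGNORECASE)


def binarize_stage(stage: str) -> str:
    """Return 'early' for stages I-II and 'late' for stages III-IV."""
    match = _STAGE_PATTERN.match(stage.strip())
    if match is None:
        raise ValueError(f"unrecognized stage label: {stage!r}")
    numeral = match.group(1).upper()
    return "early" if numeral in {"I", "II"} else "late"


if __name__ == "__main__":
    for label in ["Stage I", "stage IIA", "Stage IIIB", "IV"]:
        print(label, "->", binarize_stage(label))
```

Raising on unrecognized labels, rather than silently assigning a class, keeps missing or malformed stage annotations from leaking into either group.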

Methods: Methylation and phenotype data from the TCGA-PRAD (The Cancer Genome Atlas Prostate Adenocarcinoma) dataset were processed using differentially methylated position (DMP) analysis to identify CpG sites correlated with cancer stages. These features were further refined through recursive feature elimination (RFE) and used to train MLP models. Shapley Additive Explanations (SHAP) and Local Interpretable Model-Agnostic Explanations (LIME) were used to interpret the model and identify key DNA methylation features contributing to model predictions.
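The filter, RFE, and MLP sequence in the Methods could be sketched roughly as follows with scikit-learn. Several parts are assumptions and not the authors' implementation: the data are synthetic, a univariate F-test (`SelectKBest` with `f_classif`) stands in for the DMP analysis, logistic regression is used as the RFE ranking estimator because scikit-learn's `MLPClassifier` exposes no coefficients or feature importances for RFE to rank, and all sizes and hyperparameters are arbitrary.

```python
import numpy as np
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for a methylation beta-value matrix (samples x CpG sites).
n_samples, n_cpgs = 200, 500
X = rng.uniform(0.0, 1.0, size=(n_samples, n_cpgs))
y = rng.integers(0, 2, size=n_samples)  # 0 = early stage, 1 = late stage
# Shift the first 10 sites in late-stage samples so there is signal to find.
X[y == 1, :10] += 0.3
X = np.clip(X, 0.0, 1.0)

pipeline = Pipeline([
    # Assumption: univariate F-test as a simple stand-in for DMP analysis.
    ("filter", SelectKBest(f_classif, k=100)),
    ("scale", StandardScaler()),
    # Assumption: logistic regression ranks features for RFE.
    ("rfe", RFE(LogisticRegression(max_iter=1000),
                n_features_to_select=20, step=10)),
    ("mlp", MLPClassifier(hidden_layer_sizes=(32,), max_iter=500,
                          random_state=0)),
])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
pipeline.fit(X_train, y_train)

test_accuracy = pipeline.score(X_test, y_test)
selected = pipeline.named_steps["rfe"].support_  # mask over the 100 filtered sites
print("features kept by RFE:", int(selected.sum()))
print("held-out accuracy:", round(test_accuracy, 3))
```

Placing both selection steps inside the `Pipeline` means they are fitted on the training split only, so the held-out score is not inflated by features chosen with knowledge of the test samples.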

Results: The best-performing model achieved 95% accuracy and an area under the curve of up to 99% on the majority race (White) training data using 90 selected features. However, performance declined sharply in racial minority groups, revealing the effects of sample imbalance and race-specific methylation patterns. Feature importance analysis indicated that certain CpG sites strongly drove model predictions.
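A per-group breakdown of the kind reported in the Results can be computed along these lines. This is a minimal sketch using only the standard library; the function names, the 0.5 decision threshold, the group names, and the toy scores are all hypothetical and do not reproduce the paper's numbers.

```python
from collections import defaultdict


def auc_score(labels, scores):
    """Rank-based AUC: chance that a positive outscores a negative (ties count half)."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        return None  # AUC is undefined when a group contains only one class
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def evaluate_by_group(groups, labels, scores, threshold=0.5):
    """Return sample count, accuracy, and AUC separately for each group."""
    by_group = defaultdict(lambda: ([], []))
    for g, l, s in zip(groups, labels, scores):
        by_group[g][0].append(l)
        by_group[g][1].append(s)
    report = {}
    for g, (ls, ss) in by_group.items():
        correct = sum((s >= threshold) == bool(l) for l, s in zip(ls, ss))
        report[g] = {
            "n": len(ls),
            "accuracy": correct / len(ls),
            "auc": auc_score(ls, ss),
        }
    return report


# Toy predicted probabilities of late-stage disease (hypothetical values).
groups = ["majority"] * 4 + ["minority"] * 4
labels = [0, 0, 1, 1, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.8, 0.9, 0.6, 0.4, 0.3, 0.7]

report = evaluate_by_group(groups, labels, scores)
for name, metrics in report.items():
    print(name, metrics)
```

On this toy input the majority group scores 1.0 on both metrics while the minority group scores 0.5 on both. Reporting `n` alongside each metric matters because small groups give unstable estimates, which is one way sample imbalance shows up.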

Conclusions: We propose a race-aware MLP model for prostate cancer stage classification using DNA methylation data, optimized through DMP- and RFE-based feature selection. SHAP and LIME confirmed the predictive relevance of the selected CpG sites, supporting model transparency. The model performs well within the White cohort but generalizes poorly to racial minority groups, emphasizing the importance of race-specific modeling strategies.
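SHAP attributions are built on Shapley values, which can be computed exactly when the number of features is tiny. The sketch below enumerates every feature coalition for a three-feature toy model. It is not the `shap` library and not the authors' analysis: the function name, the linear toy model, and the choice to replace absent features with a fixed baseline are all assumptions made for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley attributions for one sample.

    Features absent from a coalition are replaced by their baseline value
    (an assumption; other ways of handling absent features exist).
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                phi[i] += weight * (value(members | {i}) - value(members))
    return phi


def toy_model(v):
    # Hypothetical linear score over three CpG beta values.
    return 2.0 * v[0] - 1.0 * v[1] + 0.5 * v[2]


x = [0.9, 0.2, 0.5]         # sample to explain
baseline = [0.5, 0.5, 0.5]  # reference methylation levels

phi = shapley_values(toy_model, x, baseline)
print("attributions:", [round(p, 6) for p in phi])
```

For a linear model each attribution reduces to the coefficient times the feature's deviation from the baseline, and the attributions sum to the difference between the prediction for `x` and the prediction for the baseline. Exact enumeration grows exponentially with the number of features, which is why approximation methods are used for models with many inputs.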
