Artificial Intelligence Design for Race-Based Prostate Cancer Stage Classification With Multilayer Perceptron: Feature Selection Optimization Approach
Adithama Mulia, David Agustriawan, Marlinda Overbeek, Moeljono Widjaja, Vincent Kurniawan, Jheno Syechlo, Muhammad Imran Ahmad, Srinivasulu Yerukala Sathipati, Nilubon Kurubanjerdjit
JMIR Formative Research, vol. 10, e82587. Published 2026-04-16. DOI: 10.2196/82587 (https://doi.org/10.2196/82587)
Abstract
Background: Prostate cancer progression exhibits significant variability influenced by biological and racial factors. DNA methylation profiling has shown potential in early cancer detection, but its integration with machine learning across racially diverse populations remains limited.
Objective: This study aimed to develop a prostate cancer stage classifier for the majority White cohort using DNA methylation data and a multilayer perceptron (MLP) model. The classifier distinguishes early (stages I-II) from late (stages III-IV) disease, and its performance was then assessed on other racial groups to highlight the need for race-specific models.
Methods: Methylation and phenotype data from the TCGA-PRAD (The Cancer Genome Atlas Prostate Adenocarcinoma) dataset were processed using differentially methylated position (DMP) analysis to identify CpG sites correlated with cancer stages. These features were further refined through recursive feature elimination (RFE) and used to train MLP models. Shapley Additive Explanations (SHAP) and Local Interpretable Model-Agnostic Explanations (LIME) were used to interpret the model and identify key DNA methylation features contributing to model predictions.
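The recursive feature elimination step described in the Methods can be illustrated with a minimal, dependency-free sketch. It rests on several assumptions: the abstract does not say which estimator ranked features inside RFE, so a plain logistic regression stands in here; the data are synthetic standard-normal values rather than TCGA-PRAD methylation values; and the names `fit_logistic`, `recursive_feature_elimination`, and `make_toy_data` are hypothetical. The DMP filtering and MLP training steps are not shown.

```python
import math
import random


def fit_logistic(X, y, epochs=300, lr=0.5):
    """Fit logistic regression by full-batch gradient descent; return the weights."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for row, target in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, row))
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
            err = 1.0 / (1.0 + math.exp(-z)) - target
            for j in range(d):
                grad_w[j] += err * row[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w


def recursive_feature_elimination(X, y, n_keep):
    """Refit on the surviving features and drop the one with the smallest |weight|."""
    active = list(range(len(X[0])))
    while len(active) > n_keep:
        X_sub = [[row[j] for j in active] for row in X]
        w = fit_logistic(X_sub, y)
        weakest = min(range(len(active)), key=lambda k: abs(w[k]))
        del active[weakest]
    return active


def make_toy_data(n_samples=200, n_features=6, seed=0):
    """Synthetic stand-in data: only features 0 and 1 determine the label."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [1 if row[0] + row[1] > 0 else 0 for row in X]
    return X, y


X, y = make_toy_data()
selected = recursive_feature_elimination(X, y, n_keep=2)
print("Features kept:", selected)
```

In a setting like the one the study describes, the columns would be the CpG sites that passed DMP analysis, the label would be early versus late stage, and `n_keep` would be the target feature count (90 in the best-performing model reported in the Results).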
Results: The best-performing model achieved 95% accuracy and an area under the curve of up to 99% on the majority-race (White) training data using 90 selected features. However, performance declined sharply in racial minority groups, revealing the effects of sample imbalance and race-specific methylation patterns. Feature importance analysis indicated strong patterns within certain CpG sites driving model predictions.
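A per-group comparison of the kind reported above can be sketched as follows. This is an illustration only: the group labels, true labels, and scores below are invented placeholders, not data or results from the study, and the helper names (`accuracy`, `roc_auc`, `evaluate_by_group`) are hypothetical. The area under the curve is computed directly as the fraction of (late-stage, early-stage) pairs in which the late-stage sample receives the higher score, with ties counted as one half.

```python
from collections import defaultdict


def accuracy(y_true, y_score, threshold=0.5):
    """Fraction of samples whose thresholded score matches the true label."""
    hits = sum((s >= threshold) == bool(t) for t, s in zip(y_true, y_score))
    return hits / len(y_true)


def roc_auc(y_true, y_score):
    """Pairwise AUC: chance that a positive sample outscores a negative one."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        return None  # undefined when a group contains only one class
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def evaluate_by_group(records):
    """records: iterable of (group, true_label, predicted_score) tuples."""
    grouped = defaultdict(lambda: ([], []))
    for group, label, score in records:
        grouped[group][0].append(label)
        grouped[group][1].append(score)
    return {
        group: {
            "n": len(labels),
            "accuracy": accuracy(labels, scores),
            "auc": roc_auc(labels, scores),
        }
        for group, (labels, scores) in grouped.items()
    }


# Invented placeholder records: (group, 1 = late stage / 0 = early stage, score)
records = [
    ("group_a", 1, 0.9), ("group_a", 1, 0.8), ("group_a", 0, 0.2), ("group_a", 0, 0.1),
    ("group_b", 1, 0.4), ("group_b", 0, 0.6), ("group_b", 1, 0.7), ("group_b", 0, 0.3),
]
report = evaluate_by_group(records)
for group, metrics in report.items():
    print(group, metrics)
```

Reporting the sample count `n` next to each metric matters here, because small minority-group samples make both accuracy and AUC estimates unstable, which is one of the effects the Results attribute the performance drop to.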
Conclusions: We propose a race-aware MLP model for prostate cancer stage classification using DNA methylation data, optimized through DMP- and RFE-based feature selection. SHAP and LIME confirmed the predictive relevance of selected CpG sites, supporting model transparency. The results show high performance within the White cohort but poor generalization to racial minority groups, emphasizing the importance of race-specific modeling strategies.