David Agustriawan, Adithama Mulia, Marlinda Vasty Overbeek, Vincent Kurniawan, Jheno Syechlo, Moeljono Widjaja, Muhammad Imran Ahmad
JMIR Bioinformatics and Biotechnology. Published 2025-06-20. DOI: 10.2196/72423 (https://doi.org/10.2196/72423)
A Framework for Race-Specific Prostate Cancer Detection Using Machine Learning Through Gene Expression Data: Feature Selection Optimization Approach.
Background: Previous machine learning approaches for prostate cancer detection using gene expression data have reported high classification accuracy. However, these studies overlooked the influence of racial diversity within the population and the importance of selecting outlier genes based on expression profiles.
Objective: To develop a classification method for diagnosing prostate cancer using gene expression in specific populations.
Methods: This research uses Differentially Expressed Gene (DEG) analysis, Receiver Operating Characteristic (ROC) analysis, and MSigDB verification as a feature selection framework to identify genes for constructing Support Vector Machine (SVM) models.
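The filtering idea behind this kind of feature selection can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the log2 fold-change filter is a simple stand-in for a full DEG analysis, the thresholds `min_abs_lfc` and `min_auc` are hypothetical, the `known_genes` set is only a placeholder for a lookup against a curated collection such as MSigDB, and the gene names and expression values are made up. The selected genes would then be passed to a classifier such as an SVM.

```python
import math
from statistics import mean

def split_by_label(values, labels):
    """Separate expression values into tumor (1) and normal (0) samples."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    return pos, neg

def auc(values, labels):
    """ROC AUC for one gene via the rank-sum identity; ties count as 0.5."""
    pos, neg = split_by_label(values, labels)
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def log2_fold_change(values, labels, pseudocount=1.0):
    """Tumor-vs-normal log2 ratio of mean expression (assumed DEG stand-in)."""
    pos, neg = split_by_label(values, labels)
    return math.log2((mean(pos) + pseudocount) / (mean(neg) + pseudocount))

def select_genes(expression, labels, min_abs_lfc=1.0, min_auc=0.9,
                 known_genes=None):
    """Keep genes that pass the fold-change, AUC, and optional curation filters.

    expression: dict mapping gene name -> list of values, one per sample.
    known_genes: optional set standing in for a curated gene-set lookup.
    """
    selected = []
    for gene, values in expression.items():
        lfc = log2_fold_change(values, labels)
        a = auc(values, labels)
        # Down-regulated genes give AUC < 0.5, so fold the score around 0.5.
        separability = max(a, 1.0 - a)
        if abs(lfc) < min_abs_lfc or separability < min_auc:
            continue
        if known_genes is not None and gene not in known_genes:
            continue
        selected.append(gene)
    return selected

# Toy data: three normal (0) and three tumor (1) samples, hypothetical genes.
labels = [0, 0, 0, 1, 1, 1]
expression = {
    "GENE_UP": [1, 2, 1, 9, 10, 12],
    "GENE_DOWN": [20, 22, 19, 3, 4, 2],
    "GENE_FLAT": [5, 6, 5, 6, 5, 6],
}
selected = select_genes(expression, labels)
print(selected)
```

On the toy data only the two genes whose expression clearly separates tumor from normal samples survive both filters; passing `known_genes={"GENE_UP"}` would narrow the result further.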
Results: Among the models evaluated, the highest observed accuracy was achieved using 139 gene features without oversampling, resulting in 98% accuracy for white patients and 97% for African American patients, based on 388 training samples and 92 testing samples. Notably, another model achieved similarly strong performance (97% accuracy for white patients and 95% for African American patients) while using only 9 gene features, trained on 374 samples and tested on 138 samples.
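Reporting accuracy separately for each population, as done here, amounts to grouping test predictions by a group label before scoring. A minimal sketch follows, assuming each test sample carries a true label, a predicted label, and a group tag; the function name `accuracy_by_group` and the toy values are hypothetical and do not reproduce the study's data.

```python
from collections import defaultdict

def accuracy_by_group(y_true, y_pred, groups):
    """Return the fraction of correct predictions within each group."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {group: correct[group] / total[group] for group in total}

# Toy predictions for two groups "A" and "B" (illustrative values only).
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
per_group = accuracy_by_group(y_true, y_pred, groups)
print(per_group)
```

Scoring each group on its own makes a performance gap visible that a single pooled accuracy figure could hide.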
Conclusions: The findings identify a race-specific diagnostic method for prostate cancer detection based on enhanced feature selection and machine learning. This approach highlights the potential for developing unbiased diagnostic tools for specific populations.