Muhammad Hamraz, Tahir Abbas, Fawad Ali, Dost Muhammad Khan, Muhammad Aamir
Computers in Biology and Medicine, Volume 191, Article 110165. Published 2025-04-14. DOI: 10.1016/j.compbiomed.2025.110165 (https://www.sciencedirect.com/science/article/pii/S0010482525005165)
Modified Robust Proportional Overlapping Score for feature selection in high-dimensional micro-array data
High-dimensional microarray datasets often contain tens of thousands of genes but only a small number of samples, typically from tens to a few hundred. This imbalance, known as the curse of dimensionality or the n ≪ p problem, hampers learning. To address it, this study introduces the Modified Robust Proportional Overlapping Score (MRPOS), a feature selection method built on the robust dispersion measures Sn and Qn proposed by Rousseeuw and Croux. In binary classification problems, MRPOS identifies discriminative genes by evaluating how much each gene's expression values overlap between the two classes: genes with high inter-class overlap are eliminated, while those that separate the classes are retained. Four gene expression datasets are used, each split into a training subset containing 70% of the data and a test subset containing the remaining 30%. MRPOS is compared with four established feature selection techniques in terms of the classification error rates obtained with three classifiers: random forest, k-nearest neighbors (k-NN), and support vector machine (SVM). The results, visualized as bar plots of classification errors, highlight the distinctiveness and effectiveness of the proposed method.
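The abstract does not state the MRPOS formula, so the following is only a minimal Python sketch under stated assumptions. It implements naive forms of the Rousseeuw and Croux Sn and Qn scale estimators (using their asymptotic normal-consistency constants 1.1926 and 2.2219 and omitting small-sample correction factors), then plugs them into a hypothetical overlap score: each class gets a core interval of median ± `width` × scale, and a gene's score is the fraction of all samples lying in the intersection of the two intervals. The names `sn_scale`, `qn_scale`, `overlap_score`, `rank_genes` and the `width` parameter are illustrative and are not taken from the paper.

```python
import numpy as np

# Asymptotic consistency constants for the normal distribution (Rousseeuw & Croux);
# the small-sample correction factors are omitted in this sketch.
SN_CONST = 1.1926
QN_CONST = 2.2219


def sn_scale(x):
    """Naive O(n^2 log n) Sn = c * lomed_i himed_j |x_i - x_j|."""
    x = np.asarray(x, dtype=float)
    n = x.size
    diffs = np.abs(x[:, None] - x[None, :])  # all pairwise distances, including j == i
    inner = np.sort(diffs, axis=1)[:, n // 2]  # high median over j for each i
    outer = np.sort(inner)[(n + 1) // 2 - 1]   # low median over i
    return SN_CONST * outer


def qn_scale(x):
    """Naive O(n^2 log n) Qn = d * k-th smallest |x_i - x_j| over pairs i < j."""
    x = np.asarray(x, dtype=float)
    n = x.size
    i, j = np.triu_indices(n, k=1)
    diffs = np.sort(np.abs(x[i] - x[j]))
    h = n // 2 + 1
    k = h * (h - 1) // 2  # k = C(h, 2)
    return QN_CONST * diffs[k - 1]


def overlap_score(x, y, scale=qn_scale, width=1.0):
    """HYPOTHETICAL overlap score (not the published MRPOS formula):
    fraction of all samples inside the intersection of the two class
    core intervals median +/- width * scale. Lower means more discriminative."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    lo = max(np.median(x) - width * scale(x), np.median(y) - width * scale(y))
    hi = min(np.median(x) + width * scale(x), np.median(y) + width * scale(y))
    if lo > hi:  # core intervals do not overlap
        return 0.0
    values = np.concatenate([x, y])
    return float(np.mean((values >= lo) & (values <= hi)))


def rank_genes(X, labels, scale=qn_scale, width=1.0):
    """Return gene (column) indices of X ordered from least to most overlapping."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    if classes.size != 2:
        raise ValueError("rank_genes expects exactly two classes")
    in_first = labels == classes[0]
    scores = np.array([
        overlap_score(X[in_first, g], X[~in_first, g], scale, width)
        for g in range(X.shape[1])
    ])
    return np.argsort(scores, kind="stable")


# Toy usage: gene 0 separates the classes, gene 1 does not.
labels = np.array([0] * 5 + [1] * 5)
X = np.column_stack([
    [1, 2, 3, 4, 5, 11, 12, 13, 14, 15],
    [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
])
print(rank_genes(X, labels))  # [0 1]
```

Under these assumptions, the lowest-scoring genes would be kept, with scores computed on the 70% training split only so that the held-out 30% stays untouched, and the reduced data would then be passed to classifiers such as random forest, k-NN, or SVM. Passing `scale=sn_scale` instead of the default `qn_scale` changes only the dispersion estimator.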
About the journal:
Computers in Biology and Medicine is an international forum for sharing groundbreaking advancements in the use of computers in bioscience and medicine. This journal serves as a medium for communicating essential research, instruction, ideas, and information regarding the rapidly evolving field of computer applications in these domains. By encouraging the exchange of knowledge, we aim to facilitate progress and innovation in the utilization of computers in biology and medicine.