An adaptive feature reduction algorithm for cancer classification using wavelet decomposition of serum proteomic and DNA microarray data

2011 IEEE International Conference on Bioinformatics and Biomedicine Workshops (BIBMW), pp. 305-312. Pub Date: 2011-11-12. DOI: 10.1109/BIBMW.2011.6112391

S. Rashid, G. M. Maruf


Citations: 5

Abstract

A significant challenge in DNA microarray and mass spectrometric data analysis is the large number of features relative to the small number of samples or patients in the data set. Such a problem requires particular care, because the low classification accuracy of a model brought about by the small number of features may indicate low predictive capability. To overcome these challenges, proper approaches to data preprocessing, feature reduction and identification of the optimal feature set are critical. This paper proposes a novel technique for feature reduction and cancer classification that is applicable to two different types of biological data. The method has been implemented on surface-enhanced laser desorption/ionization time-of-flight mass spectrometric (SELDI-TOF-MS) and DNA microarray data sets. The technique is self-adaptive and independent of the type of data set. Feature reduction follows a two-step strategy: (1) data preprocessing, which includes merging and t-testing, and (2) wavelet decomposition. A support vector machine (SVM) is proposed for classification. Evaluation on the two types of data sets shows that the features selected by the proposed method consistently yield excellent classification accuracy, sensitivity and specificity.
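The two-step reduction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not specify the t-test variant, the number of features retained, the wavelet family or the decomposition depth, so the choices below (Welch's t statistic, a fixed `keep` count, a Haar wavelet, and the function names themselves) are assumptions made for illustration. The merging step and the SVM classifier are omitted.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t statistic for one feature (assumed test variant)."""
    denom = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return 0.0 if denom == 0 else (mean(a) - mean(b)) / denom


def select_by_t(samples, labels, keep):
    """Indices of the `keep` features with the largest |t|, in original order.

    `samples` is a list of equal-length feature vectors and `labels` holds
    1 (e.g. cancer) or 0 (e.g. control). Each class needs at least two samples.
    """
    pos = [s for s, y in zip(samples, labels) if y == 1]
    neg = [s for s, y in zip(samples, labels) if y == 0]
    scores = []
    for j in range(len(samples[0])):
        t = welch_t([s[j] for s in pos], [s[j] for s in neg])
        scores.append((abs(t), j))
    top = sorted(scores, reverse=True)[:keep]
    return sorted(j for _, j in top)


def haar_approx(signal, levels=1):
    """Approximation coefficients of a Haar wavelet decomposition.

    Each level roughly halves the length; odd lengths are padded by
    repeating the last value (an illustrative boundary choice).
    """
    out = list(signal)
    for _ in range(levels):
        if len(out) % 2:
            out.append(out[-1])
        out = [(out[i] + out[i + 1]) / math.sqrt(2)
               for i in range(0, len(out), 2)]
    return out


def reduce_features(samples, labels, keep, levels=1):
    """Step 1: t-test filter. Step 2: wavelet approximation of the survivors."""
    idx = select_by_t(samples, labels, keep)
    return [haar_approx([s[j] for j in idx], levels) for s in samples]
```

Calling `reduce_features(samples, labels, keep=k, levels=n)` returns one shortened vector per sample, which could then be passed to any SVM implementation. In practice the feature ranking should be computed on training data only, so that the reported accuracy, sensitivity and specificity are not inflated by selection on the test samples.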

