Linear regression-based feature selection for microarray data classification.

IF 0.4 | Tier 4, Biology | JCR Q4, MATHEMATICAL & COMPUTATIONAL BIOLOGY

International Journal of Data Mining and Bioinformatics Pub Date : 2015-01-01 DOI:10.1504/ijdmb.2015.066776

Md Abid Hasan, Md Kamrul Hasan, M Abdul Mottalib

Citation: International Journal of Data Mining and Bioinformatics, 11(2):167-79, 2015. Journal Article. https://doi.org/10.1504/ijdmb.2015.066776

Citations: 11

Abstract

Predicting the class of gene expression profiles helps improve the diagnosis and treatment of diseases. Analysing large gene expression datasets, also known as microarray data, is complicated by their high dimensionality. Traditional classifiers therefore perform poorly when the number of features far exceeds the number of samples. A good set of features helps classifiers classify the dataset efficiently. Moreover, a manageable set of features is also desirable for biologists carrying out further analysis. In this paper, we propose a linear regression-based feature selection method for selecting discriminative features. Our main focus is to classify the dataset more accurately using fewer features than other traditional feature selection methods. We compared our method with several other methods, and in almost every case it achieved higher classification accuracy with fewer features than the other popular feature selection methods.
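The abstract does not spell out how the regression is used, so the following is only a generic sketch of one common regression-based filter, not the authors' algorithm. It assumes a binary class label coded 0/1, fits a one-variable least-squares regression of the label on each gene separately, and ranks genes by the resulting R² (equivalently, the squared Pearson correlation). The function names `regression_scores` and `select_top_k`, the synthetic data, and the choice of R² as the score are all illustrative assumptions.

```python
import numpy as np


def regression_scores(X, y):
    """Score each column of X by the R^2 of a one-variable
    least-squares fit of y on that column (illustrative choice)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)          # centre every feature
    yc = y - y.mean()                # centre the label
    sxy = Xc.T @ yc                  # covariance-like term per feature
    sxx = (Xc ** 2).sum(axis=0)      # feature sum of squares
    syy = (yc ** 2).sum()            # label sum of squares
    # Constant features (sxx == 0) carry no information; score them 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        r2 = np.where(sxx > 0, sxy ** 2 / (sxx * syy), 0.0)
    return r2


def select_top_k(X, y, k):
    """Return the indices of the k highest-scoring features."""
    scores = regression_scores(X, y)
    return np.argsort(scores)[::-1][:k]


# Synthetic stand-in for microarray data: few samples, many features.
rng = np.random.default_rng(0)
n_samples, n_features = 40, 500
y = np.repeat([0, 1], n_samples // 2)
X = rng.normal(size=(n_samples, n_features))
X[:, 7] += 4.0 * y    # hypothetical informative gene (up-regulated)
X[:, 42] -= 4.0 * y   # hypothetical informative gene (down-regulated)

selected = select_top_k(X, y, k=2)
print(sorted(selected.tolist()))
```

A univariate filter like this scores each gene independently, so it scales to thousands of features but ignores redundancy between selected genes; the paper's method may handle this differently.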

Source journal

International Journal of Data Mining and Bioinformatics (Biology: Mathematical & Computational Biology)

CiteScore: 1.00
Self-citation rate: 0.00%
Articles published: (not shown)
Review time: >12 weeks

About the journal: Mining bioinformatics data is an emerging area at the intersection of bioinformatics and data mining. The objective of IJDMB is to facilitate collaboration between data mining researchers and bioinformaticians by presenting cutting-edge research topics and methodologies in the area of data mining for bioinformatics. This perspective acknowledges the interdisciplinary nature of research in data mining and bioinformatics and provides a unified forum for researchers, practitioners, students, and policy makers to share the latest research and developments in this fast-growing multidisciplinary research area.