Penalized logistic regression with prior information for microarray gene expression classification.

IF 16.4 | Tier 1 (Chemistry) | Q1 CHEMISTRY, MULTIDISCIPLINARY

Accounts of Chemical Research | Pub Date: 2022-11-25 | eCollection Date: 2024-05-01 | DOI: 10.1515/ijb-2022-0025

Murat Genç

Citations: 0

Abstract

Cancer classification and gene selection are important applications of DNA microarray gene expression data analysis. Because DNA microarray data are high-dimensional (far more genes than samples), automatic gene selection methods are used to improve the classification performance of expert classifier systems. This paper proposes a new penalized logistic regression method that performs simultaneous gene coefficient estimation and variable selection in DNA microarray data. The method uses prior information about the gene coefficients to improve the classification accuracy of the underlying model. A coordinate descent algorithm with screening rules is given to compute the gene coefficient estimates of the proposed method efficiently. The performance of the method is examined on five high-dimensional cancer classification datasets using four measures: the area under the curve (AUC), the number of selected genes, the misclassification rate, and the F-score. The real data analysis results indicate that the proposed method achieves good cancer classification performance, with a small misclassification rate and a large AUC and F-score, at the cost of some sparsity in the underlying model. Hence, the proposed method can be regarded as a reliable penalized logistic regression method for high-dimensional cancer classification.
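The abstract does not spell out the penalty, so the following is only a minimal Python sketch under stated assumptions. It assumes, hypothetically, that prior information enters as a vector of prior coefficient guesses `prior`, and that the objective is the negative mean log-likelihood plus a lasso term `lam1 * sum|beta_j|` and a ridge term `(lam2 / 2) * sum (beta_j - prior[j])**2` that shrinks each coefficient toward its prior value. It is fitted with a glmnet-style loop (an IRLS quadratic approximation, then cyclic coordinate descent with soft-thresholding) and does not implement the paper's screening rules. All function names, parameters, and the toy data are illustrative, not taken from the paper; the four evaluation measures named in the abstract are included as plain functions.

```python
import math

def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def fit_prior_penalized_logistic(X, y, prior, lam1, lam2,
                                 n_outer=100, n_inner=100, tol=1e-10):
    """Hypothetical sketch: minimize
        -(1/n) * loglik(b0, beta) + lam1 * sum_j |beta_j|
        + (lam2 / 2) * sum_j (beta_j - prior[j]) ** 2
    with IRLS (outer loop) and cyclic coordinate descent (inner loop).
    ASSUMPTION: this penalty is an illustrative stand-in for "prior
    information"; no screening rule is applied, every gene is visited."""
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_outer):
        # Quadratic (IRLS) approximation of the log-likelihood at the current fit.
        eta = [b0 + sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
        prob = [sigmoid(e) for e in eta]
        w = [max(q * (1.0 - q), 1e-5) for q in prob]
        r = [(y[i] - prob[i]) / w[i] for i in range(n)]  # working residual z - eta
        previous = [b0] + beta
        for _ in range(n_inner):
            # Unpenalized intercept: shift b0 by the weighted mean residual.
            shift = sum(w[i] * r[i] for i in range(n)) / sum(w)
            b0 += shift
            r = [ri - shift for ri in r]
            max_delta = abs(shift)
            for j in range(p):
                xj = [X[i][j] for i in range(n)]
                # Closed-form minimizer of the penalized weighted least squares in beta_j.
                num = sum(w[i] * xj[i] * (r[i] + xj[i] * beta[j]) for i in range(n)) / n
                den = sum(w[i] * xj[i] ** 2 for i in range(n)) / n + lam2
                new = soft_threshold(num + lam2 * prior[j], lam1) / den
                delta = new - beta[j]
                if delta != 0.0:
                    r = [r[i] - xj[i] * delta for i in range(n)]
                    beta[j] = new
                    max_delta = max(max_delta, abs(delta))
            if max_delta < tol:
                break
        if max(abs(a - b) for a, b in zip([b0] + beta, previous)) < tol:
            break
    return b0, beta

def misclassification_rate(y, pred):
    """Fraction of samples whose predicted label differs from the true label."""
    return sum(a != b for a, b in zip(y, pred)) / len(y)

def f_score(y, pred):
    """F1 score for the positive class (label 1)."""
    tp = sum(1 for a, b in zip(y, pred) if a == 1 and b == 1)
    fp = sum(1 for a, b in zip(y, pred) if a == 0 and b == 1)
    fn = sum(1 for a, b in zip(y, pred) if a == 1 and b == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def auc(y, scores):
    """AUC as P(score of a positive > score of a negative), ties counted as 1/2."""
    pos = [s for s, t in zip(scores, y) if t == 1]
    neg = [s for s, t in zip(scores, y) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

# Toy data (made up for illustration): gene 0 separates the classes, while
# genes 1 and 2 take the same value within each positive/negative pair (pure noise).
X = [[2.0, 0.3, -0.2], [-2.0, 0.3, -0.2],
     [1.0, -0.4, 0.5], [-1.0, -0.4, 0.5],
     [0.5, 0.1, 0.7], [-0.5, 0.1, 0.7]]
y = [1, 0, 1, 0, 1, 0]
prior = [1.0, 0.0, 0.0]  # hypothetical prior guesses for the gene coefficients

b0, beta = fit_prior_penalized_logistic(X, y, prior, lam1=0.05, lam2=0.1)
scores = [sigmoid(b0 + sum(xij * bj for xij, bj in zip(row, beta))) for row in X]
pred = [1 if s >= 0.5 else 0 for s in scores]
print("selected genes:", sum(1 for bj in beta if bj != 0.0))
print("misclassification rate:", misclassification_rate(y, pred))
print("AUC:", auc(y, scores), "F-score:", f_score(y, pred))
```

In this sketch, each coordinate update soft-thresholds `num + lam2 * prior[j]` at `lam1`, so a gene whose evidence stays below `lam1` is set exactly to zero, and the count of nonzero entries in `beta` serves as the number of selected genes. A nonzero `prior[j]` shifts that threshold test and can keep a gene in the model, which illustrates how a prior can trade some sparsity for fit; it is not a claim about the mechanism of the paper's estimator.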

Source journal

Accounts of Chemical Research (Chemistry, Multidisciplinary)

CiteScore

31.40

Self-citation rate

1.10%

Articles published

312

Review time

2 months

Journal description: Accounts of Chemical Research presents short, concise, and critical articles offering easy-to-read overviews of basic research and applications in all areas of chemistry and biochemistry. These short reviews focus on research from the author's own laboratory and are designed to teach the reader about a research project. In addition, Accounts of Chemical Research publishes commentaries that give an informed opinion on a current research problem. Online Special Issues are devoted to a single topic of unusual activity and significance. Accounts of Chemical Research replaces the traditional article abstract with an article "Conspectus." These entries summarize the research, giving the reader a closer look at the content and significance of an article. By providing a more detailed description of the article contents, the Conspectus improves the article's discoverability by search engines and the exposure of the research.