Multinomial Logistic Regression with Adaptive Regularization for Cancer Subtype Classification via Multi-omics Data

IF 2.4 | Zone 3 (Biology) | JCR Q3 | Biochemical Research Methods

Current Bioinformatics Pub Date : 2024-06-24 DOI:10.2174/0115748936308171240605075531

Yingdi Wu, Fuzhen Cao, Juntao Li

Abstract

Background: Integrating multi-omics data for cancer classification provides complementary biological insights, but it also raises challenges in data integration, gene grouping, and adaptive weight construction. Objective: This paper aims to address the challenges of cancer subtype classification and gene screening based on multi-omics data. Methods: Multinomial logistic regression with adaptive regularization (MLRAR) was proposed, integrating DNA methylation, gene mutation, and RNA-seq information. A data preprocessing strategy that makes use of multi-omics information was presented, and the local maximum quasi-clique merging (lmQCM) algorithm was used to group genes. Biological pathway information was used to evaluate the significance of each gene group, while the significance of each gene within a group was evaluated by integrating mutation information, information theory, and methylation information. Results: Compared with MRlasso, MRGL, MSGL, MROGL, AMRSOGL, and AGLRMR, the proposed method improved breast cancer subtype classification accuracy by 2.6%, 2.9%, 3.5%, 2.3%, 2.0%, and 1.8%, respectively. For ovarian cancer, MLRAR achieved improvements of 8.2%, 5.0%, 6.8%, 5.2%, 12.7%, and 6.3% over the same methods, respectively. Conclusion: The proposed method can effectively deal with data integration, gene grouping, and adaptive weight construction.
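The abstract does not state the exact form of the MLRAR objective, so the following is only a minimal sketch of what a multinomial logistic loss combined with group-level and gene-level adaptive weights could look like. It assumes a weighted sparse-group-lasso penalty, which is a common choice for this kind of model and not necessarily the one used in the paper. All names here (`multinomial_nll`, `adaptive_penalty`, `group_weights`, `gene_weights`, `lam`, `alpha`) are hypothetical; in the paper's setting the group weights would come from pathway information and the gene weights from mutation, information-theoretic, and methylation scores.

```python
import numpy as np


def softmax(scores):
    """Row-wise softmax with max-subtraction for numerical stability."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)


def multinomial_nll(X, y, B):
    """Mean negative log-likelihood of multinomial logistic regression.

    X: (n, p) feature matrix, y: (n,) integer labels in 0..K-1,
    B: (p, K) coefficient matrix (one column per subtype).
    """
    probs = softmax(X @ B)
    n = X.shape[0]
    # Small constant guards against log(0).
    return -np.mean(np.log(probs[np.arange(n), y] + 1e-12))


def adaptive_penalty(B, groups, group_weights, gene_weights, lam, alpha):
    """Weighted sparse-group-lasso penalty (an ASSUMED form, not the paper's).

    groups: list of index lists, one per gene group.
    group_weights: one non-negative weight per group.
    gene_weights: (p,) non-negative weight per gene.
    lam: overall penalty strength; alpha in [0, 1] balances the two terms.
    """
    # Gene-level term: weighted L1 norm over each gene's coefficients.
    gene_term = np.sum(gene_weights[:, None] * np.abs(B))
    # Group-level term: weighted Frobenius norm of each group's coefficient block.
    group_term = sum(
        w * np.linalg.norm(B[idx, :]) for idx, w in zip(groups, group_weights)
    )
    return lam * (alpha * gene_term + (1.0 - alpha) * group_term)


def objective(X, y, B, groups, group_weights, gene_weights, lam, alpha):
    """Penalized objective: data-fit loss plus adaptive penalty."""
    penalty = adaptive_penalty(B, groups, group_weights, gene_weights, lam, alpha)
    return multinomial_nll(X, y, B) + penalty


# Toy usage with random data: 6 samples, 4 genes in 2 groups, 3 subtypes.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(6, 4))
y_demo = np.array([0, 1, 2, 0, 1, 2])
B_demo = np.zeros((4, 3))
groups_demo = [[0, 1], [2, 3]]
value = objective(X_demo, y_demo, B_demo, groups_demo,
                  [1.0, 2.0], np.ones(4), lam=0.1, alpha=0.5)
```

Under this assumed form, a larger weight makes a gene or group more expensive to keep in the model, so informative genes and groups would typically receive smaller weights. Minimizing the objective would require a solver suited to non-smooth penalties (for example, proximal gradient descent), which is omitted here.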

Source journal

Current Bioinformatics (Biology: Biochemical Research Methods)

CiteScore: 6.60

Self-citation rate: 2.50%

Review time: >12 weeks

About the journal: Current Bioinformatics aims to publish all the latest and outstanding developments in bioinformatics. Each issue contains a series of timely, in-depth/mini-reviews, research papers and guest edited thematic issues written by leaders in the field, covering a wide range of the integration of biology with computer and information science. The journal focuses on advances in computational molecular/structural biology, encompassing areas such as computing in biomedicine and genomics, computational proteomics and systems biology, and metabolic pathway engineering. Developments in these fields have direct implications for key issues related to health care, medicine, genetic disorders, development of agricultural products, renewable energy, environmental protection, etc.