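Singular Value Decomposition-Based Penalized Multinomial Regression for Classifying Imbalanced Medulloblastoma Subgroups Using Methylation Data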

IF 1.4 · CAS zone 4 (Biology) · JCR Q4, BIOCHEMICAL RESEARCH METHODS
Journal of Computational Biology, pp. 458-471 · Pub date: 2024-05-01 · Epub date: 2024-05-14 · DOI: 10.1089/cmb.2023.0198
Isra Mohammed, Murtada K Elbashir, Areeg S Faggad
Citations: 0

Abstract

Medulloblastoma (MB) is a molecularly heterogeneous brain malignancy with large differences in clinical presentation. Genomic studies have identified at least four distinct molecular subgroups of MB: sonic hedgehog (SHH), wingless/INT (WNT), Group 3, and Group 4. Treatment and outcomes depend on appropriate classification. Classification algorithms, however, struggle to identify these subgroups in imbalanced MB genomic data sets, where samples may be unevenly distributed among the subgroups. To overcome this problem, we used singular value decomposition (SVD) and group lasso techniques to find DNA methylation probe features that maximize the separation between the imbalanced MB subgroups. We then used multinomial regression to classify the four molecular subgroups of MB from the reduced DNA methylation data. The group lasso penalty promotes sparsity, and the associated penalized loss function is minimized by coordinate descent. SVD reduced the 321,174 probe features to just 200 features, and applying the group lasso then selected fewer than 40 features, which we used as predictors for our classification models. Our proposed method achieved an average overall accuracy of 99% under fivefold cross-validation and improves classification performance over state-of-the-art methods for classifying MB molecular subgroups.
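
The abstract names the model but does not write out its objective. Assuming the grouping used by glmnet's grouped multinomial option, where the K = 4 class coefficients of each retained probe j form one group, a group-lasso-penalized multinomial regression would typically read as follows. The notation is introduced here for illustration and is not taken from the paper: n samples, x_i the reduced feature vector of sample i, y_ik its one-hot subgroup label, B the p × K coefficient matrix with rows B_{j·}, β_0 the unpenalized intercepts, and λ the penalty strength.

```latex
\min_{\beta_0 \in \mathbb{R}^{K},\; B \in \mathbb{R}^{p \times K}}
  \; -\frac{1}{n} \sum_{i=1}^{n} \sum_{k=1}^{K} y_{ik} \log p_{ik}
  \; + \; \lambda \sum_{j=1}^{p} \lVert B_{j\cdot} \rVert_2,
\qquad
p_{ik} = \frac{\exp\!\left(\beta_{0k} + x_i^{\top} B_{\cdot k}\right)}
              {\sum_{l=1}^{K} \exp\!\left(\beta_{0l} + x_i^{\top} B_{\cdot l}\right)}
```

Because the penalty acts on the whole row B_{j·}, a probe is either kept for all four subgroups or dropped entirely, which is how a sparse model of this form could retain fewer than 40 of the 200 SVD-reduced features.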

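The abstract describes a two-stage pipeline (SVD-based reduction, then a group-lasso-penalized multinomial regression solved by coordinate descent) but not the exact screening rule or penalty settings. The following is a minimal NumPy sketch under stated assumptions, not the authors' implementation. `svd_screen` is a hypothetical helper that ranks columns by the variance they contribute to the top right singular directions, which is only one plausible reading of "reduce 321,174 probe features to 200". `group_lasso_multinomial` is also hypothetical: it treats each feature's row of K class coefficients as one group and runs block coordinate descent, minimizing a quadratic upper bound of the mean log-loss for one row at a time (curvature bound ||x_j||²/(2n)) and then applying group soft-thresholding. The demo uses synthetic imbalanced data, not methylation data.

```python
import numpy as np


def softmax(Z):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)


def svd_screen(X, n_components=5, n_keep=200):
    """Hypothetical SVD-based screening rule (an assumption, not the paper's).

    Scores each column by the variance it contributes to the top
    `n_components` singular directions and returns the `n_keep` best columns.
    """
    Xc = X - X.mean(axis=0)                       # center every feature column
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    k = min(n_components, s.size)
    # sum over top-k of (s_i * V_ji)^2 = squared norm of column j in that subspace
    scores = ((s[:k, None] * Vt[:k]) ** 2).sum(axis=0)
    return np.argsort(scores)[::-1][:n_keep]


def group_lasso_multinomial(X, y, n_classes, lam, n_iter=200, tol=1e-6):
    """Multinomial regression with a group-lasso penalty on each feature's
    row of class coefficients, fitted by block coordinate descent.

    Each block update minimizes a quadratic upper bound of the mean log-loss
    and then applies group soft-thresholding, so a whole row becomes zero
    when its norm falls below the threshold.
    """
    n, p = X.shape
    Y = np.eye(n_classes)[y]                      # one-hot labels, shape (n, K)
    B = np.zeros((p, n_classes))                  # one row (group) per feature
    b0 = np.zeros(n_classes)                      # unpenalized intercepts
    Z = np.zeros((n, n_classes))                  # linear predictor X @ B + b0
    L = (X ** 2).sum(axis=0) / (2 * n) + 1e-12    # per-block curvature bounds
    for _ in range(n_iter):
        # Intercept step: the Hessian w.r.t. b0 is bounded by I/2, so step = 2.
        step0 = 2.0 * (softmax(Z) - Y).mean(axis=0)
        b0 -= step0
        Z -= step0
        max_change = np.abs(step0).max()
        for j in range(p):
            g = X[:, j] @ (softmax(Z) - Y) / n    # gradient w.r.t. row j
            v = B[j] - g / L[j]                   # bound minimizer before shrinkage
            norm_v = np.linalg.norm(v)
            shrink = max(0.0, 1.0 - lam / (L[j] * norm_v)) if norm_v > 0 else 0.0
            new_row = shrink * v
            delta = new_row - B[j]
            if np.any(delta):
                Z += np.outer(X[:, j], delta)     # keep Z consistent with B
                B[j] = new_row
                max_change = max(max_change, np.abs(delta).max())
        if max_change < tol:
            break
    return B, b0


def predict(X, B, b0):
    """Assign each sample to the class with the largest linear score."""
    return np.argmax(X @ B + b0, axis=1)


if __name__ == "__main__":
    # Synthetic, imbalanced 4-class toy data (class sizes 60/30/20/10).
    rng = np.random.default_rng(0)
    y = np.repeat(np.arange(4), [60, 30, 20, 10])
    X = rng.normal(size=(y.size, 40))
    X[np.arange(y.size), y] += 6.0                # class k shifts feature k
    kept = svd_screen(X, n_components=3, n_keep=10)
    B, b0 = group_lasso_multinomial(X[:, kept], y, n_classes=4, lam=0.1)
    selected = kept[np.linalg.norm(B, axis=1) > 0]
    print("kept by SVD:", sorted(kept.tolist()))
    print("kept by group lasso:", sorted(selected.tolist()))
    print("training accuracy:", np.mean(predict(X[:, kept], B, b0) == y))
```

In practice `lam` would be tuned by cross-validation (the paper reports fivefold cross-validation), and accuracy should be measured on held-out folds rather than training data. For a real 321,174-probe matrix, a full `np.linalg.svd` is memory-hungry; a truncated or randomized SVD such as scikit-learn's `sklearn.utils.extmath.randomized_svd` is a more realistic choice. R's glmnet with `family = "multinomial"` and `type.multinomial = "grouped"` fits a grouped multinomial penalty of this kind without hand-written solvers.
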
Source journal: Journal of Computational Biology
Subject area: Biology; Computer Science, Interdisciplinary Applications
CiteScore: 3.60
Self-citation rate: 5.90%
Articles published: 113
Review time: 6-12 weeks
About the journal: Journal of Computational Biology is the leading peer-reviewed journal in computational biology and bioinformatics, publishing in-depth statistical, mathematical, and computational analysis of methods, as well as their practical impact. Available only online, it is an essential journal for scientists and students who want to keep abreast of developments in bioinformatics. Coverage includes:
- Genomics
- Mathematical modeling and simulation
- Distributed and parallel biological computing
- Designing biological databases
- Pattern matching and pattern detection
- Linking disparate databases and data
- New tools for computational biology
- Relational and object-oriented database technology for bioinformatics
- Biological expert system design and use
- Reasoning by analogy, hypothesis formation, and testing by machine
- Management of biological databases