Augmented kurtosis-based projection pursuit: a novel, advanced machine learning approach for multi-omics data analysis and integration

IF 13.1 | Zone 2 (Biology) | JCR Q1, BIOCHEMISTRY & MOLECULAR BIOLOGY
Fabian Bong, Ibrahim Ahmed, Nithya Ramakrishnan, Karla N Valenzuela-Valderas, Peter D Wentzell, Jasmine Barra, Tobias K Karakach
Nucleic Acids Research, Journal Article, published 2025-09-20
DOI: 10.1093/nar/gkaf844 (https://doi.org/10.1093/nar/gkaf844)
Citations: 0

Abstract

Due to the heterogeneity of multi-omics data, extracting their maximum information potential remains a challenge. Whereas some solutions have been offered, most cannot overcome the large linear dynamic range associated with such data, while others require large biological effect sizes to produce meaningful models. Here, we (i) perform a comprehensive benchmarking of multi-omics data analysis tools, and (ii) introduce kurtosis-based projection pursuit analysis augmented with classification and regression trees (kPPA-CART) as a robust, easy-to-implement alternative. Using ground truth data, we demonstrate that kPPA-CART exhibits superiority in inferring biological significance from low-intensity (low-count) features and from studies with small biological effect sizes. Applying it to experimental breast cancer data from The Cancer Genome Atlas, we identify novel genes that cluster the samples into subtypes that mimic the canonical PAM50 classes with notable improvements. Validating with external metastatic breast cancer data from the AURORA US consortium, kPPA-CART identifies genes that are associated with poor event-free survival and additional clustering associated with increased tumor mutational burden. Finally, we provide an R package and an online implementation of kPPA-CART.
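Kurtosis-based projection pursuit looks for directions in which the projected samples have low kurtosis. A projection that splits the samples into two groups has lower kurtosis than a Gaussian one. The sketch below is plain Python on toy data and illustrates only that projection index. It is not the authors' kPPA-CART implementation or their R package. It omits the CART step entirely and replaces the published optimizer with a naive random search over unit vectors. The helper names (`kurtosis`, `project`, `unit`, `min_kurtosis_direction`) are hypothetical.

```python
import math
import random


def kurtosis(values):
    # Pearson (non-excess) kurtosis m4 / m2**2 of a 1-D sample: about 3 for
    # Gaussian data, never below 1, and close to 1 for two equal,
    # well-separated groups.
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 * m2)


def project(rows, w):
    # Score of each sample (row) on direction w.
    return [sum(x * wi for x, wi in zip(row, w)) for row in rows]


def unit(v):
    # Rescale v to unit length.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def min_kurtosis_direction(rows, n_candidates=500, seed=0):
    # Hypothetical helper: brute-force random search over unit vectors,
    # keeping the one whose projection has the lowest kurtosis.
    rng = random.Random(seed)
    dim = len(rows[0])
    best_w, best_k = None, float("inf")
    for _ in range(n_candidates):
        w = unit([rng.gauss(0.0, 1.0) for _ in range(dim)])
        k = kurtosis(project(rows, w))
        if k < best_k:
            best_w, best_k = w, k
    return best_w, best_k


# Toy data: two groups separated along x, plus a wide Gaussian spread along y
# that dominates the variance (so a variance-based method would favour y).
rng = random.Random(1)
rows = []
for centre in (-1.0, 1.0):
    for _ in range(100):
        rows.append([centre + rng.gauss(0.0, 0.1), rng.gauss(0.0, 3.0)])

w, k = min_kurtosis_direction(rows)
print("direction:", [round(x, 2) for x in w], "kurtosis:", round(k, 2))
```

In this toy example, the highest-variance direction is y. The lowest-kurtosis direction should lie close to the x axis, where the two groups separate. This difference is why a kurtosis index can expose group structure that variance-driven methods such as PCA may miss.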
Source journal
Nucleic Acids Research (Biology: Biochemistry & Molecular Biology)
CiteScore: 27.10
Self-citation rate: 4.70%
Articles published: 1057
Review time: 2 months
About the journal: Nucleic Acids Research (NAR) is a scientific journal that publishes research on various aspects of nucleic acids and proteins involved in nucleic acid metabolism and interactions. It covers areas such as chemistry and synthetic biology, computational biology, gene regulation, chromatin and epigenetics, genome integrity, repair and replication, genomics, molecular biology, nucleic acid enzymes, RNA, and structural biology. The journal also includes a Survey and Summary section for brief reviews. Additionally, the first issue each year is dedicated to biological databases, and a July issue focuses on web-based software resources for the biological community. Nucleic Acids Research is indexed by several services, including Abstracts on Hygiene and Communicable Diseases, Animal Breeding Abstracts, Agricultural Engineering Abstracts, Agbiotech News and Information, BIOSIS Previews, CAB Abstracts, and EMBASE.