Fabian Bong, Ibrahim Ahmed, Nithya Ramakrishnan, Karla N Valenzuela-Valderas, Peter D Wentzell, Jasmine Barra, Tobias K Karakach
Nucleic Acids Research, published 2025-09-20. DOI: 10.1093/nar/gkaf844
Augmented kurtosis-based projection pursuit: a novel, advanced machine learning approach for multi-omics data analysis and integration
Due to the heterogeneity of multi-omics data, extracting their maximum information potential remains a challenge. Although some solutions have been offered, most cannot overcome the large linear dynamic range associated with such data, while others require large biological effect sizes to produce meaningful models. Here, we (i) perform a comprehensive benchmarking of multi-omics data analysis tools, and (ii) introduce kurtosis-based projection pursuit analysis augmented with classification and regression trees (kPPA-CART) as a robust, easy-to-implement alternative. Using ground truth data, we demonstrate that kPPA-CART is superior at inferring biological significance from low-intensity (low-count) features and in studies with small biological effect sizes. Applying it to experimental breast cancer data from The Cancer Genome Atlas, we identify novel genes that cluster the samples into subtypes resembling the canonical PAM50 classes, with notable improvements. In validation on external metastatic breast cancer data from the AURORA US consortium, kPPA-CART identifies genes associated with poor event-free survival, as well as additional clusters associated with increased tumor mutational burden. Finally, we provide an R package and an online implementation of kPPA-CART.
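The abstract names the core idea of kurtosis-based projection pursuit without showing it. What follows is a minimal sketch that assumes the formulation commonly used to reveal cluster structure. In that formulation, the search looks for the direction whose projected sample scores have the lowest (non-excess) kurtosis. A kurtosis near 1 signals a two-group, bimodal distribution, whereas a Gaussian gives 3. The functions `kurtosis`, `project` and `kurtosis_pp_2d`, as well as the toy data, are hypothetical illustrations and not the API of the authors' R package. The exhaustive angle search in two dimensions is a deliberate simplification of the high-dimensional optimization a real analysis needs. The CART augmentation is not shown.

```python
import math


def kurtosis(scores):
    """Non-excess (Pearson) kurtosis m4 / m2**2 of a list of scores.
    It is 3 for a Gaussian and reaches its minimum of 1 for two
    equally weighted point masses (a perfectly bimodal split)."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((s - mean) ** 2 for s in scores) / n
    m4 = sum((s - mean) ** 4 for s in scores) / n
    return m4 / m2 ** 2


def project(points, w):
    """Score of each sample (row) on the direction w."""
    return [sum(p_i * w_i for p_i, w_i in zip(p, w)) for p in points]


def kurtosis_pp_2d(points, steps=180):
    """Hypothetical toy search: try `steps` unit directions in 2D and keep
    the one whose projected scores have the minimum kurtosis."""
    best_w, best_k = None, math.inf
    for i in range(steps):
        # w and -w give identical kurtosis, so angles in [0, pi) suffice
        theta = math.pi * i / steps
        w = (math.cos(theta), math.sin(theta))
        k = kurtosis(project(points, w))
        if k < best_k:
            best_w, best_k = w, k
    return best_w, best_k


# Toy data: two groups separated along x, each spread evenly along y.
ys = [-2, -1, 0, 1, 2]
points = [(-3.0, y) for y in ys] + [(3.0, y) for y in ys]

w, k = kurtosis_pp_2d(points)
print(w, k)  # → (1.0, 0.0) 1.0
```

Projecting the same toy data onto the y axis instead gives a kurtosis of 1.7, so the search prefers the x direction that separates the two groups. With thousands of omics features, a grid over directions is infeasible, so an iterative optimizer would take the place of the angle loop.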
About the journal:
Nucleic Acids Research (NAR) is a scientific journal that publishes research on various aspects of nucleic acids and proteins involved in nucleic acid metabolism and interactions. It covers areas such as chemistry and synthetic biology, computational biology, gene regulation, chromatin and epigenetics, genome integrity, repair and replication, genomics, molecular biology, nucleic acid enzymes, RNA, and structural biology. The journal also includes a Survey and Summary section for brief reviews. Additionally, each year, the first issue is dedicated to biological databases, and an issue in July focuses on web-based software resources for the biological community. Nucleic Acids Research is indexed by several services including Abstracts on Hygiene and Communicable Diseases, Animal Breeding Abstracts, Agricultural Engineering Abstracts, Agbiotech News and Information, BIOSIS Previews, CAB Abstracts, and EMBASE.