CorrAdjust unveils biologically relevant transcriptomic correlations by efficiently eliminating hidden confounders.

IF 16.6 | Zone 2, Biology | Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY

Nucleic Acids Research, Vol. 53, No. 10 | Pub Date: 2025-05-22 | DOI: 10.1093/nar/gkaf444

Stepan Nersisyan, Phillipe Loher, Isidore Rigoutsos


Citations: 0

Abstract

Correcting for confounding variables is often overlooked when computing RNA-RNA correlations, even though it can profoundly affect results. We introduce CorrAdjust, a method for identifying and correcting such hidden confounders. CorrAdjust selects a subset of principal components to residualize from expression data by maximizing the enrichment of "reference pairs" among highly correlated RNA-RNA pairs. Unlike traditional machine learning metrics, this novel enrichment-based metric is specifically designed to evaluate correlation data and provides valuable RNA-level interpretability. CorrAdjust outperforms current state-of-the-art methods when evaluated on 25 063 human RNA-seq datasets from The Cancer Genome Atlas, the Genotype-Tissue Expression project, and the Geuvadis collection. In particular, CorrAdjust excels at integrating small RNA and mRNA sequencing data, significantly enhancing the enrichment of experimentally validated miRNA targets among negatively correlated miRNA-mRNA pairs. CorrAdjust, with accompanying documentation and tutorials, is available at https://tju-cmc-org.github.io/CorrAdjust.
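The abstract describes the core idea only at a high level: remove some principal components (PCs) from the expression matrix and keep the choice that best enriches known "reference pairs" among the top-correlated RNA pairs. The toy sketch below illustrates that idea under our own assumptions and is not CorrAdjust's implementation or API. The functions `pca_basis`, `residualize`, `top_k_enrichment` and `greedy_select_pcs` are hypothetical; the score is a plain fold enrichment of reference pairs among the k most positively correlated pairs, used only as a stand-in for the paper's enrichment-based metric; and the greedy forward search over PCs is an assumed search strategy.

```python
import numpy as np


def pca_basis(X):
    """Center X (samples x genes) and return it with its thin SVD factors."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc, U, S, Vt


def residualize(Xc, U, S, Vt, pcs):
    """Regress the selected PC scores out of the centered data Xc."""
    if not pcs:
        return Xc
    idx = list(pcs)
    # U has orthonormal columns, so regressing Xc on the PC scores
    # U[:, idx] * S[idx] leaves exactly this residual.
    return Xc - (U[:, idx] * S[idx]) @ Vt[idx, :]


def top_k_enrichment(X, reference_pairs, k):
    """Hypothetical score: fold enrichment of reference pairs among the
    k most positively correlated gene pairs (relative to their base rate)."""
    corr = np.corrcoef(X, rowvar=False)
    i, j = np.triu_indices(corr.shape[0], k=1)  # each unordered pair once
    order = np.argsort(-corr[i, j])[:k]
    top = {(int(a), int(b)) for a, b in zip(i[order], j[order])}
    ref = {tuple(sorted(p)) for p in reference_pairs}
    precision = len(top & ref) / k
    base_rate = len(ref) / len(i)
    return precision / base_rate


def greedy_select_pcs(X, reference_pairs, k, max_pcs=3):
    """Assumed strategy: add one PC at a time while the score strictly improves."""
    Xc, U, S, Vt = pca_basis(X)
    selected = []
    best_score = top_k_enrichment(Xc, reference_pairs, k)
    while True:
        best_pc = None
        for pc in range(min(max_pcs, len(S))):
            if pc in selected:
                continue
            residual = residualize(Xc, U, S, Vt, selected + [pc])
            score = top_k_enrichment(residual, reference_pairs, k)
            if score > best_score:
                best_pc, best_score = pc, score
        if best_pc is None:
            return selected, best_score
        selected.append(best_pc)


# Toy data: genes (0,1), (2,3), (4,5) share a biological program, while a
# strong hidden confounder pushes even- and odd-indexed genes in opposite directions.
rng = np.random.default_rng(0)
n_samples = 200
programs = rng.normal(size=(n_samples, 3))
confounder = rng.normal(scale=5.0, size=(n_samples, 1))
X = (
    np.repeat(programs, 2, axis=1)
    + confounder * np.array([1, -1, 1, -1, 1, -1])
    + rng.normal(scale=0.5, size=(n_samples, 6))
)
reference = [(0, 1), (2, 3), (4, 5)]

before = top_k_enrichment(X, reference, k=3)
selected, after = greedy_select_pcs(X, reference, k=3)
print(before, selected, after)
```

In this toy setting the confounder dominates PC1 and makes non-reference pairs the most correlated, so the enrichment starts at zero; residualizing PC1 restores the three reference pairs to the top, and no further PC improves the score. Real data would call for a held-out evaluation and, for miRNA-mRNA pairs, attention to negative rather than positive correlations.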




Source journal

Nucleic Acids Research (Biology: Biochemistry & Molecular Biology)

CiteScore: 27.10
Self-citation rate: 4.70%
Articles published: 1057
Review time: 2 months

Journal description: Nucleic Acids Research (NAR) is a scientific journal that publishes research on various aspects of nucleic acids and proteins involved in nucleic acid metabolism and interactions. It covers areas such as chemistry and synthetic biology, computational biology, gene regulation, chromatin and epigenetics, genome integrity, repair and replication, genomics, molecular biology, nucleic acid enzymes, RNA, and structural biology. The journal also includes a Survey and Summary section for brief reviews. Additionally, the first issue of each year is dedicated to biological databases, and a July issue focuses on web-based software resources for the biological community. Nucleic Acids Research is indexed by several services, including Abstracts on Hygiene and Communicable Diseases, Animal Breeding Abstracts, Agricultural Engineering Abstracts, Agbiotech News and Information, BIOSIS Previews, CAB Abstracts, and EMBASE.