Aggregation of recount3 RNA-seq data improves inference of consensus and tissue-specific gene coexpression networks
Authors: Prashanthi Ravichandran, Princy Parsana, Rebecca Keener, Kasper Hansen, Alexis Battle
Journal: Genome Research. Publication date: 2025-07-17. DOI: 10.1101/gr.280808.125 (https://doi.org/10.1101/gr.280808.125)
Gene coexpression networks (GCNs) describe relationships among genes that maintain cellular identity and homeostasis. However, typical RNA-seq experiments often lack sufficient sample sizes for reliable GCN inference. Recount3, a dataset with 316,443 processed human RNA-seq samples, provides an opportunity to improve network reconstruction. However, GCN inference from public data is challenged by confounders and inconsistent labeling. To address this, we developed a pipeline to annotate samples based on cell type composition. By comparing aggregation strategies, we found that regressing confounders within studies and prioritizing larger studies optimized network reconstruction. We applied these findings to infer three consensus networks (universal, cancer, non-cancer) and 27 context-specific networks. Central genes in consensus networks were enriched for evolutionarily constrained genes and ubiquitous biological pathways, while context-specific central nodes included tissue-specific transcription factors. The increased statistical power from data aggregation facilitated the derivation of variant annotations from context-specific networks, which were significantly enriched for complex-trait heritability independent of overlap with baseline functional genomic annotations. While data aggregation led to strictly increasing held-out log-likelihood, we observed diminishing marginal improvements, suggesting that integrating complementary modalities, such as Hi-C and ChIP-seq, could further refine network reconstruction. Our approach outlines best practices for GCN inference and highlights both the strengths and limitations of data aggregation.
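The aggregation strategy named in the abstract (regressing confounders within each study, then giving larger studies more influence) can be illustrated with a minimal sketch. This is not the authors' pipeline. It assumes that confounders are approximated by the top principal components of each study's expression matrix, and that "prioritizing larger studies" means weighting per-study covariance matrices by sample size. The function names `residualize` and `aggregate_covariance`, the parameter `n_pcs`, and the simulated data are all hypothetical.

```python
import numpy as np


def residualize(expr, n_pcs=1):
    """Remove the top n_pcs principal components from a samples x genes matrix.

    Assumption: the leading PCs stand in for unmeasured confounders.
    """
    centered = expr - expr.mean(axis=0)
    # Rows of vt are the principal axes in gene space.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    top = vt[:n_pcs]
    # Subtract the projection of every sample onto the top axes.
    return centered - centered @ top.T @ top


def aggregate_covariance(studies, n_pcs=1):
    """Sample-size-weighted average of per-study gene covariance matrices.

    Assumption: weighting by sample size is one way to prioritize larger studies.
    """
    total = sum(x.shape[0] for x in studies)
    n_genes = studies[0].shape[1]
    agg = np.zeros((n_genes, n_genes))
    for x in studies:
        resid = residualize(x, n_pcs)  # correction is done within each study
        cov = resid.T @ resid / (x.shape[0] - 1)
        agg += (x.shape[0] / total) * cov
    return agg


# Simulated stand-in data: two "studies" of 40 and 15 samples over 5 genes.
rng = np.random.default_rng(0)
studies = [rng.normal(size=(n, 5)) for n in (40, 15)]
cov = aggregate_covariance(studies)
print(cov.shape)
```

An aggregated covariance matrix of this kind could then be passed to a sparse network estimator. The choice of estimator is outside what the abstract specifies.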
About the journal:
Launched in 1995, Genome Research is an international, continuously published, peer-reviewed journal that focuses on research that provides novel insights into the genome biology of all organisms, including advances in genomic medicine.
Among the topics considered by the journal are genome structure and function, comparative genomics, molecular evolution, genome-scale quantitative and population genetics, proteomics, epigenomics, and systems biology. The journal also features exciting gene discoveries and reports of cutting-edge computational biology and high-throughput methodologies.
New data in these areas are published as research papers, or methods and resource reports that provide novel information on technologies or tools that will be of interest to a broad readership. Complete data sets are presented electronically on the journal's web site where appropriate. The journal also provides Reviews, Perspectives, and Insight/Outlook articles, which present commentary on the latest advances published both here and elsewhere, placing such progress in its broader biological context.