Reproducible processing of TCGA regulatory networks.

IF 11.8 · Zone 2 (Biology) · JCR Q1 (Multidisciplinary Sciences)
Viola Fanfani, Katherine H Shutta, Panagiotis Mandros, Jonas Fischer, Enakshi Saha, Soel Micheletti, Chen Chen, Marouen Ben Guebila, Camila M Lopes-Ramos, John Quackenbush
GigaScience, published 2025-10-20. DOI: 10.1093/gigascience/giaf126 (https://doi.org/10.1093/gigascience/giaf126)

Abstract

Background: Technological advances in sequencing and computation have allowed deep exploration of the molecular basis of diseases. Biological networks have proven to be a valuable framework for analyzing omics data and modeling regulatory interactions between genes and proteins. Large collaborative projects, such as The Cancer Genome Atlas (TCGA), have provided a rich resource for building and validating new computational methods, resulting in a plethora of open-source software for downloading, pre-processing, and analyzing those data. However, for an end-to-end analysis of regulatory networks, a coherent and reusable workflow is essential to integrate all relevant packages into a robust pipeline.

Findings: We developed tcga-data-nf, a Nextflow workflow that allows users to reproducibly infer regulatory networks from the thousands of samples in TCGA using a single command. The workflow can be divided into three main steps: multi-omic data, such as RNA-seq and methylation, are (i) downloaded, (ii) pre-processed, and (iii) analyzed to infer regulatory network models with the Network Zoo. The workflow is powered by the NetworkDataCompanion R package, a standalone collection of functions for managing, mapping, and filtering TCGA data. Here, we demonstrate how the pipeline can be used to investigate the differences between colon cancer subtypes attributed to epigenetic mechanisms. Lastly, we provide a database of pre-generated networks for the 10 most common cancer types that can be readily accessed by the public.

Conclusions: tcga-data-nf is a complete, yet flexible and extensible, framework that enables the reproducible inference and analysis of cancer regulatory networks, bridging a gap in the current universe of software tools for analyzing TCGA data.
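
The three steps named in the Findings (download, pre-process, analyze) can be illustrated with a minimal Python sketch. This is not the tcga-data-nf implementation, which is a Nextflow workflow built on the NetworkDataCompanion R package and the Network Zoo. Every function name, gene label, and threshold below is hypothetical. The download step returns a small in-memory toy matrix instead of contacting any server, and thresholded Pearson correlation stands in for network inference only to keep the example self-contained.

```python
import math
from typing import Dict, List, Tuple

# gene name -> one expression value per sample (toy layout, an assumption)
Expression = Dict[str, List[float]]


def download(project: str) -> Expression:
    """Step (i): stand-in for data download. Returns toy values, no network access."""
    return {
        "GENE_A": [10.0, 20.0, 30.0, 40.0],
        "GENE_B": [12.0, 22.0, 29.0, 41.0],
        "GENE_C": [40.0, 30.0, 20.0, 10.0],
        "GENE_D": [0.0, 0.0, 1.0, 0.0],  # low expression, filtered out below
    }


def preprocess(raw: Expression, min_mean: float = 1.0) -> Expression:
    """Step (ii): drop genes with mean expression below min_mean, then log2(x + 1)."""
    return {
        gene: [math.log2(v + 1.0) for v in values]
        for gene, values in raw.items()
        if sum(values) / len(values) >= min_mean
    }


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def infer_network(
    expr: Expression, threshold: float = 0.8
) -> Dict[Tuple[str, str], float]:
    """Step (iii): keep gene pairs with |r| >= threshold (illustrative stand-in only)."""
    genes = sorted(expr)
    edges: Dict[Tuple[str, str], float] = {}
    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1:]:
            r = pearson(expr[g1], expr[g2])
            if abs(r) >= threshold:
                edges[(g1, g2)] = r
    return edges


def run_pipeline(project: str) -> Dict[Tuple[str, str], float]:
    """Chain the three steps, mirroring the single-command idea in miniature."""
    return infer_network(preprocess(download(project)))


if __name__ == "__main__":
    # "TCGA-COAD" is used here only as a label; the toy data ignores it.
    for (g1, g2), r in run_pipeline("TCGA-COAD").items():
        print(f"{g1} -- {g2}: r = {r:.3f}")
```

Keeping each step as a separate function with an explicit input and output loosely mirrors the idea of a staged workflow: any one stage can be swapped or rerun without touching the others.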

Source journal: GigaScience (Multidisciplinary Sciences)
CiteScore: 15.50
Self-citation rate: 1.10%
Articles published: 119
Review time: 1 week
About the journal: GigaScience seeks to transform data dissemination and utilization in the life and biomedical sciences. As an online open-access open-data journal, it specializes in publishing "big-data" studies encompassing various fields. Its scope includes not only "omic" type data and the fields of high-throughput biology currently serviced by large public repositories, but also the growing range of more difficult-to-access data, such as imaging, neuroscience, ecology, cohort data, systems biology and other new types of large-scale shareable data.