GEDI: An R Package for Integration of Transcriptomic Data from Multiple Platforms for Bioinformatics Applications

Mathias N. Stokholm, Maria B. Rabaglino, Haja N. Kadarmideen
Current Protocols, volume 4, issue 10. Published 2024-10-25. DOI: 10.1002/cpz1.70046
https://onlinelibrary.wiley.com/doi/10.1002/cpz1.70046
Abstract



Transcriptomic data are often more expensive and difficult to generate in large cohorts than genomic data; it is therefore often important to integrate multiple microarray- and next-generation sequencing (NGS)-based transcriptomic datasets from similar experiments or clinical trials to improve analytical power and the discovery of novel transcripts and genes. However, transcriptomic data integration presents challenges, including reannotation and batch effect removal. We developed the Gene Expression Data Integration (GEDI) R package to enable transcriptomic data integration by combining existing R packages. With just four functions, the GEDI R package makes constructing a transcriptomic data integration pipeline straightforward. Together, the functions overcome the complications in transcriptomic data integration by automatically reannotating the data and removing the batch effect. The removal of the batch effect is verified with principal component analysis, and the data integration is verified using a logistic regression model with forward stepwise feature selection. To demonstrate the functionality of the GEDI package, we integrated five bovine endometrial transcriptomic datasets from the NCBI Gene Expression Omnibus. These datasets came from multiple high-throughput platforms, namely the array-based Affymetrix and Agilent platforms and the NGS-based Illumina paired-end RNA-seq platform. Furthermore, we compared the GEDI package to existing tools and found that GEDI is the only tool that provides a full transcriptomic data integration pipeline, including verification of both batch effect removal and data integration, for downstream genomic and bioinformatics applications. © 2024 The Author(s). Current Protocols published by Wiley Periodicals LLC.
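The abstract says batch effect removal is verified with principal component analysis. The following is a minimal Python sketch of that idea, not GEDI's R implementation. It assumes a toy expression matrix with two batches and swaps in plain per-batch mean-centering for whatever correction method GEDI actually uses. The names `center_by_batch`, `first_pc_scores`, and `batch_gap_on_pc1` are hypothetical, introduced only for illustration.

```python
from statistics import mean

def center_by_batch(matrix, batches):
    """Subtract each batch's per-gene mean (plain mean-centering, a stand-in method)."""
    corrected = [row[:] for row in matrix]
    for label in set(batches):
        rows = [i for i, b in enumerate(batches) if b == label]
        for g in range(len(matrix[0])):
            batch_mean = mean(matrix[i][g] for i in rows)
            for i in rows:
                corrected[i][g] = matrix[i][g] - batch_mean
    return corrected

def first_pc_scores(matrix, n_iter=200):
    """Project samples onto the first principal component (power iteration)."""
    n_genes = len(matrix[0])
    gene_means = [mean(row[g] for row in matrix) for g in range(n_genes)]
    x = [[row[g] - gene_means[g] for g in range(n_genes)] for row in matrix]
    v = [1.0] * n_genes  # fixed start vector keeps the sketch deterministic
    for _ in range(n_iter):
        scores = [sum(a * b for a, b in zip(row, v)) for row in x]
        w = [sum(row[g] * s for row, s in zip(x, scores)) for g in range(n_genes)]
        norm = sum(c * c for c in w) ** 0.5
        if norm == 0.0:
            break
        v = [c / norm for c in w]
    return [sum(a * b for a, b in zip(row, v)) for row in x]

def batch_gap_on_pc1(matrix, batches):
    """Distance between the two batches' mean PC1 scores (two batches assumed)."""
    scores = first_pc_scores(matrix)
    first, second = sorted(set(batches))
    mean_first = mean(s for s, b in zip(scores, batches) if b == first)
    mean_second = mean(s for s, b in zip(scores, batches) if b == second)
    return abs(mean_first - mean_second)

# Toy data (invented): 6 samples x 3 genes; batch "B" is shifted upward by roughly 5.
expr = [
    [1.0, 2.0, 3.0], [2.0, 1.0, 3.5], [1.5, 2.5, 2.0],
    [6.0, 7.5, 8.0], [7.0, 6.0, 8.5], [6.5, 7.0, 7.0],
]
batches = ["A", "A", "A", "B", "B", "B"]

corrected = center_by_batch(expr, batches)
print("PC1 batch gap before:", round(batch_gap_on_pc1(expr, batches), 3))
print("PC1 batch gap after: ", round(batch_gap_on_pc1(corrected, batches), 3))
```

Before correction, the first principal component is dominated by the batch shift, so the two batches sit far apart along it. After centering, the gap collapses to essentially zero. In practice one would inspect a PCA plot rather than a single number; the scalar gap is used here only to keep the sketch short.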

Basic Protocol 1: ReadGE, a function to import gene expression datasets

Basic Protocol 2: GEDI, a function to reannotate and merge gene expression datasets

Basic Protocol 3: BatchCorrection, a function to remove batch effects from gene expression data

Basic Protocol 4: VerifyGEDI, a function to confirm successful integration of gene expression data
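The abstract states that data integration is verified using a logistic regression model with forward stepwise feature selection. The sketch below illustrates that general technique in Python under stated assumptions; it does not reproduce VerifyGEDI or its arguments. The helper names, the gradient-ascent fitting, the toy data, and the stopping rule (a log-likelihood gain of at least 1.92, roughly half the 5% chi-square cutoff of 3.84 for one degree of freedom) are all illustrative choices, not taken from the source.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(rows, y, bias, weights):
    """Bernoulli log-likelihood of labels y under the fitted model."""
    total = 0.0
    for x, label in zip(rows, y):
        p = sigmoid(bias + sum(w * v for w, v in zip(weights, x)))
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total += math.log(p) if label == 1 else math.log(1.0 - p)
    return total

def fit_logistic(rows, y, n_iter=3000, lr=1.0):
    """Fit intercept and weights by gradient ascent; return the log-likelihood."""
    n, k = len(rows), len(rows[0])
    bias, weights = 0.0, [0.0] * k
    for _ in range(n_iter):
        errors = [label - sigmoid(bias + sum(w * v for w, v in zip(weights, x)))
                  for x, label in zip(rows, y)]
        bias += lr * sum(errors) / n
        weights = [w + lr * sum(e * x[j] for e, x in zip(errors, rows)) / n
                   for j, w in enumerate(weights)]
    return log_likelihood(rows, y, bias, weights)

def forward_stepwise(features, y, min_gain=1.92):
    """Greedily add the feature with the largest log-likelihood gain."""
    n_features = len(features[0])
    selected = []
    best_ll = fit_logistic([[] for _ in y], y)  # intercept-only baseline
    while len(selected) < n_features:
        candidates = []
        for j in range(n_features):
            if j in selected:
                continue
            cols = selected + [j]
            ll = fit_logistic([[row[c] for c in cols] for row in features], y)
            candidates.append((ll, j))
        ll, j = max(candidates)
        if ll - best_ll < min_gain:
            break  # no remaining feature improves the fit enough
        selected.append(j)
        best_ll = ll
    return selected

# Toy data (invented): column 1 tracks the class label; columns 0 and 2 do not.
features = [
    [0.2, 0.1, 0.7], [0.9, 0.3, 0.1], [0.4, 0.2, 0.3], [0.5, 0.6, 0.6],
    [0.5, 0.5, 0.6], [0.9, 0.8, 0.3], [0.2, 0.9, 0.7], [0.4, 0.7, 0.1],
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

print("Selected feature indices:", forward_stepwise(features, labels))
```

In an integration check of this kind, the features would be genes from the merged, batch-corrected matrix and the labels a biological condition shared across datasets. If a small set of selected genes predicts the condition well across all source datasets, that is some evidence the merged data retain biological signal rather than only platform differences.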
