iSeq: an integrated tool to fetch public sequencing data.

Haoyu Chao, Zhuojin Li, Dijun Chen, Ming Chen
{"title":"iSeq: an integrated tool to fetch public sequencing data.","authors":"Haoyu Chao, Zhuojin Li, Dijun Chen, Ming Chen","doi":"10.1093/bioinformatics/btae641","DOIUrl":null,"url":null,"abstract":"<p><strong>Motivation: </strong>High-throughput sequencing technologies [next-generation sequencing (NGS)] are increasingly used to address diverse biological questions. Despite the rich information in NGS data, particularly with the growing datasets from repositories like the Genome Sequence Archive (GSA) at NGDC, programmatic access to public sequencing data and metadata remains limited.</p><p><strong>Results: </strong>We developed iSeq to enable quick and straightforward retrieval of metadata and NGS data from multiple databases via the command-line interface. iSeq supports simultaneous retrieval from GSA, SRA, ENA, and DDBJ databases. It handles over 25 different accession formats, supports Aspera downloads, parallel downloads, multi-threaded processes, FASTQ file merging, and integrity verification, simplifying data acquisition and enhancing the capacity for reanalyzing NGS data.</p><p><strong>Availability and implementation: </strong>iSeq is freely available on Bioconda (https://anaconda.org/bioconda/iseq) and GitHub (https://github.com/BioOmics/iSeq).</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11561040/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Bioinformatics (Oxford, England)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1093/bioinformatics/btae641","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Motivation: High-throughput sequencing technologies [next-generation sequencing (NGS)] are increasingly used to address diverse biological questions. Despite the rich information in NGS data, particularly with the growing datasets from repositories like the Genome Sequence Archive (GSA) at NGDC, programmatic access to public sequencing data and metadata remains limited.

Results: We developed iSeq to enable quick and straightforward retrieval of metadata and NGS data from multiple databases via the command-line interface. iSeq supports simultaneous retrieval from GSA, SRA, ENA, and DDBJ databases. It handles over 25 different accession formats, supports Aspera downloads, parallel downloads, multi-threaded processes, FASTQ file merging, and integrity verification, simplifying data acquisition and enhancing the capacity for reanalyzing NGS data.

Availability and implementation: iSeq is freely available on Bioconda (https://anaconda.org/bioconda/iseq) and GitHub (https://github.com/BioOmics/iSeq).

iSeq:获取公共测序数据的集成工具。
动机:高通量测序技术(NGS)越来越多地被用于解决各种生物学问题。尽管 NGS 数据中包含丰富的信息,特别是来自 NGDC GSA 等资源库的数据集不断增加,但对公共测序数据和元数据的程序性访问仍然有限:iSeq 支持从 GSA、SRA、ENA 和 DDBJ 数据库同时检索。它可处理超过 25 种不同的入库格式,支持 Aspera 下载、并行下载、多线程处理、FASTQ 文件合并和完整性验证,从而简化了数据采集,提高了重新分析 NGS 数据的能力:ISeq 可在 Bioconda (https://anaconda.org/bioconda/iseq) 和 GitHub (https://github.com/BioOmics/iSeq) 上免费获取。补充信息:补充数据可在 Bioinformatics online 上获取。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信