iSeq: an integrated tool to fetch public sequencing data.

Bioinformatics (Oxford, England). Published: 2024-11-01. DOI: 10.1093/bioinformatics/btae641

Haoyu Chao, Zhuojin Li, Dijun Chen, Ming Chen

Abstract

Motivation: High-throughput sequencing technologies [next-generation sequencing (NGS)] are increasingly used to address diverse biological questions. Despite the rich information in NGS data, and particularly the growing datasets in repositories such as the Genome Sequence Archive (GSA) at the National Genomics Data Center (NGDC), programmatic access to public sequencing data and metadata remains limited.

Results: We developed iSeq to enable quick and straightforward command-line retrieval of metadata and NGS data from multiple databases. iSeq supports simultaneous retrieval from the GSA, SRA, ENA, and DDBJ databases. It handles more than 25 accession formats and supports Aspera downloads, parallel downloads, multi-threaded processing, FASTQ file merging, and integrity verification, simplifying data acquisition and making it easier to reanalyze NGS data.
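Two of the capabilities listed in the Results are easy to illustrate: recognising which archive and record level an accession refers to, and checking a downloaded file against a published checksum. The Python sketch below is a minimal illustration under stated assumptions, not iSeq's code. The prefix table covers only a handful of common INSDC and GSA formats (iSeq's own documentation lists the 25+ it accepts), and MD5 is assumed as the checksum because archives such as ENA publish per-file MD5 values; the abstract does not say which check iSeq runs. The names `classify_accession`, `md5sum` and `verify_md5` are hypothetical.

```python
import hashlib
import re

# Illustrative prefix table (an assumption, NOT iSeq's internal list).
# archive -> {record level: accession prefix}. PRJNA/SAMN are issued by NCBI,
# which hosts SRA; PRJCA/SAMC/CRA/CRX/CRR are issued by NGDC, which hosts GSA.
# GSA's CRA accession denotes a dataset, mapped here to the "study" level.
_PREFIXES = {
    "SRA": {"project": "PRJNA", "study": "SRP", "biosample": "SAMN",
            "experiment": "SRX", "run": "SRR"},
    "ENA": {"project": "PRJEB", "study": "ERP", "biosample": "SAMEA",
            "experiment": "ERX", "run": "ERR"},
    "DDBJ": {"project": "PRJDB", "study": "DRP", "biosample": "SAMD",
             "experiment": "DRX", "run": "DRR"},
    "GSA": {"project": "PRJCA", "study": "CRA", "biosample": "SAMC",
            "experiment": "CRX", "run": "CRR"},
}

# One compiled pattern per (prefix, level, archive): the prefix followed by digits only.
_RULES = [
    (re.compile(rf"{prefix}\d+"), level, archive)
    for archive, levels in _PREFIXES.items()
    for level, prefix in levels.items()
]


def classify_accession(accession: str) -> tuple[str, str]:
    """Return (record level, archive) for an accession such as 'SRR1234567'."""
    acc = accession.strip().upper()
    for pattern, level, archive in _RULES:
        if pattern.fullmatch(acc):
            return level, archive
    raise ValueError(f"unrecognised accession: {accession!r}")


def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 of a file, read in 1 MiB chunks so large FASTQ files use little memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path: str, expected: str) -> bool:
    """Compare a file against a published checksum, ignoring case and whitespace."""
    return md5sum(path) == expected.strip().lower()


if __name__ == "__main__":
    # Example accession strings, chosen only for illustration.
    print(classify_accession("SRR1234567"))   # ('run', 'SRA')
    print(classify_accession("PRJCA000001"))  # ('project', 'GSA')
```

Using `fullmatch` means a string must be the prefix followed by digits and nothing else, so malformed IDs are rejected before any network request would be made; reading in fixed-size chunks keeps memory use constant regardless of file size.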

Availability and implementation: iSeq is freely available on Bioconda (https://anaconda.org/bioconda/iseq) and GitHub (https://github.com/BioOmics/iSeq).

Supplementary information: Supplementary data are available at Bioinformatics online.
