Rnalib: a Python library for custom transcriptomics analyses.

Bioinformatics (Oxford, England). Pub Date: 2024-12-26. DOI: 10.1093/bioinformatics/btae751

Niko Popitsch, Stefan L Ameres

Abstract

Motivation: The efficient and reproducible analysis of high-throughput sequencing datasets necessitates the development of methodical and robust computational pipelines that integrate established and bespoke bioinformatics analysis tools, often written in high-level programming languages such as Python. Despite the increasing availability of programming libraries for genomics, there is a noticeable lack of tools specifically focused on transcriptomics. Key tasks in this area include the association of gene features (e.g. transcript isoforms, introns or untranslated regions) with relevant subsections of (large) genomics datasets across diverse data formats, as well as efficient querying of these data based on genomic locations and annotation attributes.

Results: To address the needs of transcriptomics data analyses, we developed rnalib, a Python library designed for creating custom bioinformatics analysis methods. Built on existing Python libraries such as pysam and pyBigWig, rnalib offers random access support, enabling efficient access to relevant subregions of large, genome-wide datasets. Rnalib extends the filtering and access capabilities of these libraries and includes additional checks to prevent common errors when integrating genomics datasets. The library is centred on an object-oriented Transcriptome class that provides methods for stepwise annotation of gene features with both local and remote data sources. The rnalib Application Programming Interface cleanly separates immutable genomic locations from associated, mutable data, and offers a wide range of methods for iterating, querying, and exporting collated datasets. Rnalib establishes a fast, readable, reproducible, and robust framework for developing novel transcriptomics data analysis tools and methods.
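
The separation of immutable genomic locations from mutable data described above can be illustrated with a minimal, standard-library sketch. This is not rnalib's actual API: the `Interval` class, the `annotations` dictionary, and the `query` helper are hypothetical names chosen for illustration, and the coordinates and counts are made-up toy values.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Hypothetical immutable, hashable genomic location (1-based, inclusive)."""
    chromosome: str
    start: int
    end: int

    def overlaps(self, other: "Interval") -> bool:
        # Two intervals overlap if they share a chromosome and at least one position.
        return (self.chromosome == other.chromosome
                and self.start <= other.end
                and other.start <= self.end)


# Mutable annotation data is stored outside the location objects, keyed by them.
annotations = defaultdict(dict)

exon = Interval("chr1", 100, 200)      # toy coordinates, not real features
intron = Interval("chr1", 201, 500)

annotations[exon]["read_count"] = 42
annotations[intron]["read_count"] = 3
annotations[exon]["read_count"] += 1   # the data can change; the location cannot


def query(region: Interval):
    """Return (location, data) pairs for annotated locations overlapping region."""
    return [(loc, data) for loc, data in annotations.items() if loc.overlaps(region)]


hits = query(Interval("chr1", 150, 250))
for location, data in hits:
    print(location, data)
```

Because the location objects are frozen, they can safely serve as dictionary keys, while the associated data remains freely editable. The linear scan in `query` is only for brevity; a library working on genome-wide data would presumably rely on indexed, random-access lookups instead.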

Availability and implementation: Source code, documentation, and tutorials are available at https://github.com/popitsch/rnalib.

Supplementary information: Supplementary data are available at Bioinformatics online.
