{"title":"Rnalib:用于自定义转录组学分析的Python库。","authors":"Niko Popitsch, Stefan L Ameres","doi":"10.1093/bioinformatics/btae751","DOIUrl":null,"url":null,"abstract":"<p><strong>Motivation: </strong>The efficient and reproducible analysis of high-throughput sequencing datasets necessitates the development of methodical and robust computational pipelines that integrate established and bespoke bioinformatics analysis tools, often written in high-level programming languages such as Python. Despite the increasing availability of programming libraries for genomics, there is a noticeable lack of tools specifically focused on transcriptomics. Key tasks in this area include the association of gene features (e.g. transcript isoforms, introns or untranslated regions) with relevant subsections of (large) genomics datasets across diverse data formats, as well as efficient querying of these data based on genomic locations and annotation attributes.</p><p><strong>Results: </strong>To address the needs of transcriptomics data analyses, we developed rnalib, a Python library designed for creating custom bioinformatics analysis methods. Built on existing Python libraries like pysam and pyBigWig, rnalib offers random access support, enabling efficient access to relevant subregions of large, genome-wide datasets. Rnalib extends the filtering and access capabilities of these libraries and includes additional checks to prevent common errors when integrating genomics datasets. The library is centred on an object-oriented Transcriptome class that provides methods for stepwise annotation of gene features with both, local and remote data sources. The rnalib Application Programming Interface cleanly separates immutable genomic locations from associated, mutable data, and offers a wide range of methods for iterating, querying, and exporting collated datasets. Rnalib establishes a fast, readable, reproducible, and robust framework for developing novel transcriptomics data analysis tools and methods.</p><p><strong>Availability and implementation: </strong>Source code, documentation, and tutorials are available at https://github.com/popitsch/rnalib.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-12-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11734754/pdf/","citationCount":"0","resultStr":"{\"title\":\"Rnalib: a Python library for custom transcriptomics analyses.\",\"authors\":\"Niko Popitsch, Stefan L Ameres\",\"doi\":\"10.1093/bioinformatics/btae751\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Motivation: </strong>The efficient and reproducible analysis of high-throughput sequencing datasets necessitates the development of methodical and robust computational pipelines that integrate established and bespoke bioinformatics analysis tools, often written in high-level programming languages such as Python. Despite the increasing availability of programming libraries for genomics, there is a noticeable lack of tools specifically focused on transcriptomics. Key tasks in this area include the association of gene features (e.g. transcript isoforms, introns or untranslated regions) with relevant subsections of (large) genomics datasets across diverse data formats, as well as efficient querying of these data based on genomic locations and annotation attributes.</p><p><strong>Results: </strong>To address the needs of transcriptomics data analyses, we developed rnalib, a Python library designed for creating custom bioinformatics analysis methods. Built on existing Python libraries like pysam and pyBigWig, rnalib offers random access support, enabling efficient access to relevant subregions of large, genome-wide datasets. Rnalib extends the filtering and access capabilities of these libraries and includes additional checks to prevent common errors when integrating genomics datasets. The library is centred on an object-oriented Transcriptome class that provides methods for stepwise annotation of gene features with both, local and remote data sources. The rnalib Application Programming Interface cleanly separates immutable genomic locations from associated, mutable data, and offers a wide range of methods for iterating, querying, and exporting collated datasets. Rnalib establishes a fast, readable, reproducible, and robust framework for developing novel transcriptomics data analysis tools and methods.</p><p><strong>Availability and implementation: </strong>Source code, documentation, and tutorials are available at https://github.com/popitsch/rnalib.</p>\",\"PeriodicalId\":93899,\"journal\":{\"name\":\"Bioinformatics (Oxford, England)\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-12-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11734754/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Bioinformatics (Oxford, England)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1093/bioinformatics/btae751\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Bioinformatics (Oxford, England)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1093/bioinformatics/btae751","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Rnalib: a Python library for custom transcriptomics analyses.
Motivation: The efficient and reproducible analysis of high-throughput sequencing datasets necessitates the development of methodical and robust computational pipelines that integrate established and bespoke bioinformatics analysis tools, often written in high-level programming languages such as Python. Despite the increasing availability of programming libraries for genomics, there is a noticeable lack of tools specifically focused on transcriptomics. Key tasks in this area include the association of gene features (e.g. transcript isoforms, introns or untranslated regions) with relevant subsections of (large) genomics datasets across diverse data formats, as well as efficient querying of these data based on genomic locations and annotation attributes.
Results: To address the needs of transcriptomics data analyses, we developed rnalib, a Python library designed for creating custom bioinformatics analysis methods. Built on existing Python libraries like pysam and pyBigWig, rnalib offers random access support, enabling efficient access to relevant subregions of large, genome-wide datasets. Rnalib extends the filtering and access capabilities of these libraries and includes additional checks to prevent common errors when integrating genomics datasets. The library is centred on an object-oriented Transcriptome class that provides methods for stepwise annotation of gene features with both, local and remote data sources. The rnalib Application Programming Interface cleanly separates immutable genomic locations from associated, mutable data, and offers a wide range of methods for iterating, querying, and exporting collated datasets. Rnalib establishes a fast, readable, reproducible, and robust framework for developing novel transcriptomics data analysis tools and methods.
Availability and implementation: Source code, documentation, and tutorials are available at https://github.com/popitsch/rnalib.