NAR Genomics and Bioinformatics: Latest Articles

Optimal Representative Strain selector - a comprehensive pipeline for selecting next-generation reference strains of bacterial species.
IF 4
NAR Genomics and Bioinformatics Pub Date : 2024-12-18 eCollection Date: 2024-12-01 DOI: 10.1093/nargab/lqae173
Chiara Tarracchini, Federico Fontana, Silvia Petraro, Gabriele Andrea Lugli, Leonardo Mancabelli, Francesca Turroni, Marco Ventura, Christian Milani
Abstract: Although it is common practice to use historically established 'reference strains' or 'type strains' for laboratory experiments, this approach often overlooks how effectively these strains represent the full ecological, genetic and functional diversity of the species within a specific ecological niche. In this context, this study proposes the Optimal Representative Strain (ORS) selector tool (https://zenodo.org/doi/10.5281/zenodo.13772191), a bioinformatic pipeline that evaluates how well a strain represents its whole species from a genetic and functional perspective, and that also considers its ecological distribution in a particular ecological niche. Based on publicly available genomes, the strain that best fits all three of these microbiological aspects is designated as an optimal representative strain. Moreover, a user-friendly software tool called Local Alternative Optimal Representative Strain selector was developed to allow researchers to screen their local library of bacterial strains for the best available alternative to the reference optimal representative strain. Five bacterial species, i.e. Lacticaseibacillus paracasei, Lactobacillus delbrueckii, Streptococcus thermophilus, Bacteroides thetaiotaomicron and Lactococcus lactis, were tested in three different environments to evaluate the performance of the bioinformatic pipeline in selecting optimal representative strains.
Vol. 6, No. 4, lqae173. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11655286/pdf/
Citations: 0
TSS-Captur: a user-friendly pipeline for characterizing unclassified RNA transcripts.
IF 4
NAR Genomics and Bioinformatics Pub Date : 2024-12-18 eCollection Date: 2024-12-01 DOI: 10.1093/nargab/lqae168
Mathias Witte Paz, Thomas Vogel, Kay Nieselt
Abstract: RNA-seq and its 5'-enrichment methods for prokaryotes have enabled the precise identification of transcription start sites (TSSs), improving gene expression analysis. Computational methods are applied to these data to identify TSSs and classify them based on proximal annotated genes. While some TSSs cannot be classified at all (orphan TSSs), other TSSs are found on the reverse strand of known genes (antisense TSSs) but are not associated with the direct transcription of any known gene. Here, we introduce TSS-Captur, a novel pipeline, which uses computational approaches to characterize genomic regions starting from experimentally confirmed but unclassified TSSs. By analyzing TSS data, TSS-Captur characterizes unclassified signals, complementing prokaryotic genome annotation tools. TSS-Captur categorizes extracted transcripts as either messenger RNA for genes with coding potential or non-coding RNA (ncRNA) for non-translated genes. Additionally, it predicts the transcription termination site for each putative transcript. For ncRNA genes, the secondary structure is computed. Moreover, all putative promoter regions are analyzed to identify enriched motifs. An interactive report allows seamless data exploration. We validated TSS-Captur with a Campylobacter jejuni dataset and characterized unlabeled ncRNAs in Streptomyces coelicolor. TSS-Captur is available both as a web application and as a command-line tool.
Vol. 6, No. 4, lqae168. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11655288/pdf/
Citations: 0
GRAViTy-V2: a grounded viral taxonomy application.
IF 4
NAR Genomics and Bioinformatics Pub Date : 2024-12-18 eCollection Date: 2024-12-01 DOI: 10.1093/nargab/lqae183
Richard Mayne, Pakorn Aiewsakun, Dann Turner, Evelien M Adriaenssens, Peter Simmonds
Abstract: Taxonomic classification of viruses is essential for understanding their evolution. Genomic classification of viruses at higher taxonomic ranks, such as order or phylum, is typically based on alignment and comparison of amino acid sequence motifs in conserved genes. Classification at lower taxonomic ranks, such as genus or species, is usually based on nucleotide sequence identities between genomic sequences. Building on our whole-genome analytical classification framework, we here describe Genome Relationships Applied to Viral Taxonomy Version 2 (GRAViTy-V2), which encompasses a greatly expanded range of features and numerous optimisations, packaged as an application that may be used as a general-purpose virus classification tool. Using 28 datasets derived from the ICTV 2022 taxonomy proposals, GRAViTy-V2 output was compared against human expert-curated classifications used for assignments in the 2023 round of ICTV taxonomy changes. GRAViTy-V2 produced taxonomies equivalent to manually curated versions down to the family level and, in almost all cases, to genus and species levels. The majority of discrepant results arose from errors in coding sequence annotations in INSDC records, or from inclusion of incomplete genome sequences in the analysis. Analysis times ranged from 1 to 506 min (median 3.59 min) on datasets with 17 to 1004 genomes and mean genome lengths of 3000 to 1 000 000 bases.
Vol. 6, No. 4, lqae183. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11655284/pdf/
Citations: 0
ReAlign-N: an integrated realignment approach for multiple nucleic acid sequence alignment, combining global and local realignments.
IF 4
NAR Genomics and Bioinformatics Pub Date : 2024-12-18 eCollection Date: 2024-12-01 DOI: 10.1093/nargab/lqae170
Yixiao Zhai, Tong Zhou, Yanming Wei, Quan Zou, Yansu Wang
Abstract: Ensuring accurate multiple sequence alignment (MSA) is essential for comprehensive biological sequence analysis. However, the complexity of evolutionary relationships often results in variations that generic alignment tools may not adequately address. Realignment is crucial to remedy this issue. Currently, there is a lack of realignment methods tailored for nucleic acid sequences, particularly for lengthy sequences. Thus, there is an urgent need for realignment methods better suited to these challenges. This study presents ReAlign-N, a realignment method explicitly designed for multiple nucleic acid sequence alignment. ReAlign-N integrates both global and local realignment strategies for improved accuracy. In the global realignment phase, ReAlign-N incorporates K-Band and memory-saving technology into the dynamic programming approach, ensuring high efficiency and minimal memory requirements for large-scale realignment tasks. The local realignment stage employs full matching and entropy scoring methods to identify low-quality regions and conducts realignment through MAFFT. Experimental results demonstrate that ReAlign-N consistently improves on the initial alignments for simulated and real datasets. Furthermore, compared to ReformAlign, the only existing multiple nucleic acid sequence realignment tool, ReAlign-N exhibits shorter running times and occupies less memory. The source code and test data for ReAlign-N are available on GitHub (https://github.com/malabz/ReAlign-N).
Vol. 6, No. 4, lqae170. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11655299/pdf/
Citations: 0
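The ReAlign-N abstract mentions entropy scoring as one way to flag low-quality alignment regions. The following is a minimal sketch of that general idea, assuming plain per-column Shannon entropy with gaps counted as an ordinary symbol; the function names and the threshold of 1.0 bit are hypothetical choices for illustration, not ReAlign-N's actual scoring scheme.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (in bits) of one alignment column; gaps count as a symbol."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def low_quality_columns(alignment, threshold=1.0):
    """Return indices of columns whose entropy exceeds the (assumed) threshold."""
    columns = zip(*alignment)  # transpose rows (sequences) into columns
    return [i for i, col in enumerate(columns) if column_entropy(col) > threshold]

# Toy alignment: all rows must have the same length.
alignment = [
    "ACGT-A",
    "ACGTTA",
    "ACCTGA",
    "ACATCA",
]
flagged = low_quality_columns(alignment)
```

A fully conserved column has entropy 0, while a column with four distinct symbols in four rows reaches the maximum of 2 bits, so only the variable columns of the toy alignment are flagged.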
Approximate nearest neighbor graph provides fast and efficient embedding with applications for large-scale biological data.
IF 4
NAR Genomics and Bioinformatics Pub Date : 2024-12-18 eCollection Date: 2024-12-01 DOI: 10.1093/nargab/lqae172
Jianshu Zhao, Jean Pierre Both, Konstantinos T Konstantinidis
Abstract: Dimension reduction (DR or embedding) algorithms such as t-SNE and UMAP have many applications in big data visualization but remain slow for large datasets. Here, we further improve the UMAP-like algorithms by (i) combining several aspects of t-SNE and UMAP to create a new DR algorithm; (ii) replacing its rate-limiting step, the K-nearest neighbor graph (K-NNG), with a Hierarchical Navigable Small World (HNSW) graph; and (iii) extending the functionality to DNA/RNA sequence data by combining HNSW with locality sensitive hashing algorithms (e.g. MinHash) for distance estimations among sequences. We also provide additional features including computation of local intrinsic dimension and hubness, which can reflect structures and properties of the underlying data that strongly affect the K-NNG accuracy, and thus the quality of the resulting embeddings. Our library, called annembed, is implemented and fully parallelized in Rust and shows competitive accuracy compared to the popular UMAP-like algorithms. Additionally, we showcase the usefulness and scalability of our library with three real-world examples: visualizing a large-scale microbial genomic database, visualizing single-cell RNA sequencing data and metagenomic contig (or population) binning. Therefore, annembed can facilitate DR for several tasks in biological data analysis where distance computation is expensive or when there are millions to billions of data points to process.
Vol. 6, No. 4, lqae172. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11655291/pdf/
Citations: 0
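The annembed abstract refers to MinHash for estimating distances among sequences. As a rough illustration of the technique only (annembed itself is a Rust library, and none of the names below come from it), this sketch builds a MinHash signature over the k-mer set of a sequence and estimates Jaccard similarity as the fraction of matching signature slots. The k-mer size, the number of hash functions and the use of salted BLAKE2b as the hash family are assumptions made for the example.

```python
import hashlib

def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def _hash(kmer, seed):
    """64-bit hash of a k-mer; the seed (used as salt) selects the hash function."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8,
                             salt=seed.to_bytes(8, "little")).digest()
    return int.from_bytes(digest, "little")

def minhash_signature(seq, k=4, num_hashes=64):
    """One minimum per hash function; seq must contain at least k characters."""
    kmer_set = kmers(seq, k)
    return [min(_hash(km, seed) for km in kmer_set) for seed in range(num_hashes)]

def estimated_jaccard(sig_a, sig_b):
    """Fraction of signature slots that agree, an estimate of Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

seq_a = "ACGTACGTTGCAACGTAGCT"
seq_b = "ACGTACGTTGCAACGTAGGA"
sig_a = minhash_signature(seq_a)
sig_b = minhash_signature(seq_b)
similarity = estimated_jaccard(sig_a, sig_b)  # between 0.0 and 1.0
```

The signature has a fixed length regardless of sequence length, which is what makes this kind of estimate attractive when exact distance computation is expensive.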
HiCrayon reveals distinct layers of multi-state 3D chromatin organization.
IF 4
NAR Genomics and Bioinformatics Pub Date : 2024-12-18 eCollection Date: 2024-12-01 DOI: 10.1093/nargab/lqae182
Ben Nolan, Hannah L Harris, Achyuth Kalluchi, Timothy E Reznicek, Christopher T Cummings, M Jordan Rowley
Abstract: Chromatin contact maps are often shown as 2D heatmaps and visually compared to 1D genomic data by simple juxtaposition. While common, this strategy is imprecise, placing the onus on the reader to align features with each other. To remedy this, we developed HiCrayon, an interactive tool that facilitates the integration of 3D chromatin organization maps and 1D datasets. This visualization method integrates data from genomic assays directly into the chromatin contact map by coloring interactions according to 1D signal. HiCrayon is implemented using R Shiny and Python to create a graphical user interface application, available in both web and containerized format to promote accessibility. We demonstrate the utility of HiCrayon in visualizing the effectiveness of compartment calling and the relationship between ChIP-seq and various features of chromatin organization. We also demonstrate the improved visualization of other 3D genomic phenomena, such as differences between loops associated with CTCF/cohesin versus those associated with H3K27ac. We then demonstrate HiCrayon's visualization of organizational changes that occur during differentiation and use HiCrayon to detect compartment patterns that cannot be assigned to either A or B compartments, revealing a distinct third chromatin compartment.
Vol. 6, No. 4, lqae182. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11655295/pdf/
Citations: 0
New developments for the Quest for Orthologs benchmark service.
IF 4
NAR Genomics and Bioinformatics Pub Date : 2024-12-11 eCollection Date: 2024-12-01 DOI: 10.1093/nargab/lqae167
Adrian Altenhoff, Yannis Nevers, Vinh Tran, Dushyanth Jyothi, Maria Martin, Salvatore Cosentino, Sina Majidian, Marina Marcet-Houben, Diego Fuentes-Palacios, Emma Persson, Thomas Walsh, Odile Lecompte, Toni Gabaldón, Steven Kelly, Yanhui Hu, Wataru Iwasaki, Salvador Capella-Gutierrez, Christophe Dessimoz, Paul D Thomas, Ingo Ebersberger, Erik Sonnhammer
Abstract: The Quest for Orthologs (QfO) orthology benchmark service (https://orthology.benchmarkservice.org) hosts a wide range of standardized benchmarks for orthology inference evaluation. It is supported and maintained by the QfO consortium, and is used to gather ortholog predictions and to examine strengths and weaknesses of newly developed and existing orthology inference methods. The web server allows different inference methods to be compared in a standardized way using the same proteome data. The benchmark results are useful for developing new methods and can help researchers to guide their choice of orthology method for applications in comparative genomics and phylogenetic analysis. We here present a new release of the Orthology Benchmark Service with a new benchmark based on feature architecture similarity as well as updated reference proteomes. We further provide a meta-analysis of the public predictions from 18 different orthology assignment methods to reveal how they relate in terms of ortholog predictions and benchmark performance. These results can guide users of orthologs to the best suited method for their purpose.
Vol. 6, No. 4, lqae167. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11632614/pdf/
Citations: 0
Water-mediated ribonucleotide-amino acid pairs and higher-order structures at the RNA-protein interface: analysis of the crystal structure database and a topological classification.
IF 4
NAR Genomics and Bioinformatics Pub Date : 2024-12-11 eCollection Date: 2024-12-01 DOI: 10.1093/nargab/lqae161
Raman Jangra, John F Trant, Purshotam Sharma
Abstract: Water is essential for the formation, stability and function of RNA-protein complexes. To delineate the structural role of water molecules in shaping the interactions between RNA and proteins, we comprehensively analyzed a dataset of 329 crystal structures of these complexes to identify water-mediated hydrogen-bonded contacts at the RNA-protein interface. Our survey identified a total of 4963 water bridges. We then employed a graph theory-based approach to present a robust classification scheme, encompassing triplet, quartet and quintet bridging topologies, each further delineated into sub-topologies. The frequency of water bridges within each topology decreases with the increasing degree of the water node, with simple triplet water bridges outnumbering the higher-order topologies. Overall, this analysis demonstrates the variety of water-mediated interactions and highlights the importance of water as not only the medium but also the organizing principle underlying biomolecular interactions. Further, our study emphasizes the functional significance of water-mediated interactions in RNA-protein complexes and paves the way for exploring how these interactions operate in complex biological environments. Altogether, this understanding not only enhances insights into biomolecular dynamics but also informs the rational design of RNA-protein complexes, providing a framework for potential applications in biotechnology and therapeutics. All scripts and data are available at https://github.com/PSCPU/waterbridges.
Vol. 6, No. 4, lqae161. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11632616/pdf/
Citations: 0
In silico and in vivo analyses of a novel variant in MYO6 identified in a family with postlingual non-syndromic hearing loss from Argentina.
IF 4
NAR Genomics and Bioinformatics Pub Date : 2024-12-11 eCollection Date: 2024-12-01 DOI: 10.1093/nargab/lqae162
Paula I Buonfiglio, Carlos D Bruque, Lucía Salatino, Vanesa Lotersztein, Mariela Pace, Sofia Grinberg, Ana B Elgoyhen, Paola V Plazas, Viviana Dalamón
Abstract: Hereditary hearing loss stands as the most prevalent sensory disorder, with over 124 non-syndromic genes and approximately 400 syndromic forms of deafness identified in humans. The clinical presentation of these conditions spans a spectrum, ranging from mild to profound hearing loss. The aim of this study was to identify the genetic cause of hearing loss in a family and functionally validate a novel variant identified in the MYO6 gene. After Whole Exome Sequencing analysis, the variant c.2775G>C p.Arg925Ser in MYO6 was detected in a family with postlingual non-syndromic hearing loss. Protein modeling revealed a change in the electrostatic charge of the single alpha helix domain surface. Through a knockdown phenotype rescue assay in zebrafish, the detrimental effects of the identified variant on the auditory system were determined. These findings underscore the significance of a comprehensive approach, integrating both in silico and in vivo strategies, to ascertain the pathogenicity of this candidate variant. Such an approach has demonstrated its effectiveness in achieving an accurate genetic diagnosis and in promoting a deeper understanding of the mechanisms that underlie the pathophysiology of hearing.
Vol. 6, No. 4, lqae162. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11632615/pdf/
Citations: 0
GIN-TONIC: non-hierarchical full-text indexing for graph genomes.
IF 4
NAR Genomics and Bioinformatics Pub Date : 2024-12-11 eCollection Date: 2024-12-01 DOI: 10.1093/nargab/lqae159
Ünsal Öztürk, Marco Mattavelli, Paolo Ribeca
Abstract: This paper presents a new data structure, GIN-TONIC (Graph INdexing Through Optimal Near Interval Compaction), designed to index arbitrary string-labelled directed graphs representing, for instance, pangenomes or transcriptomes. GIN-TONIC provides several capabilities not offered by other graph-indexing methods based on the FM-Index. It is non-hierarchical, handling a graph as a monolithic object; it indexes at nucleotide resolution all possible walks in the graph without the need to explicitly store them; it supports exact substring queries in polynomial time and space for all possible walk roots in the graph, even if there are exponentially many walks corresponding to such roots. Specific ad hoc optimizations, such as precomputed caches, allow GIN-TONIC to achieve excellent performance for input graphs of various topologies and sizes. Robust scalability capabilities and a querying performance close to that of a linear FM-Index are demonstrated for two real-world applications on the scale of human pangenomes and transcriptomes. Source code and associated benchmarks are available on GitHub.
Vol. 6, No. 4, lqae159. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11632618/pdf/
Citations: 0
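The GIN-TONIC abstract compares its query performance to that of a linear FM-Index. For readers unfamiliar with that baseline, here is a minimal sketch of exact substring counting with an FM-index over a single linear string, using backward search. It does not represent GIN-TONIC's graph data structure in any way; the function names are hypothetical, and the naive suffix sorting is chosen for brevity and is only suitable for short inputs.

```python
def build_fm_index(text):
    """Build a toy FM-index: C table, occurrence table and BWT length."""
    text += "$"  # sentinel; sorts before A, C, G and T
    # Naive suffix array construction (fine for short demo strings only).
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wraps to "$" for i == 0).
    bwt = "".join(text[i - 1] for i in suffix_array)
    alphabet = sorted(set(text))
    # first_col_start[c]: number of characters in text strictly smaller than c.
    first_col_start, total = {}, 0
    for c in alphabet:
        first_col_start[c] = total
        total += text.count(c)
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return first_col_start, occ, len(bwt)

def count_occurrences(index, pattern):
    """Count exact matches of a non-empty pattern via backward search."""
    first_col_start, occ, n = index
    lo, hi = 0, n  # current suffix-array interval [lo, hi)
    for ch in reversed(pattern):
        if ch not in first_col_start:
            return 0
        lo = first_col_start[ch] + occ[ch][lo]
        hi = first_col_start[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo

index = build_fm_index("GATTACAGATTACA")
hits = count_occurrences(index, "GATTACA")
```

Each pattern character narrows the suffix-array interval with two table lookups, so a query takes time proportional to the pattern length rather than the text length.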