NAR Genomics and Bioinformatics: latest articles, page 10

Phenotype prediction in plants is improved by integrating large-scale transcriptomic datasets.

IF 4

NAR Genomics and Bioinformatics Pub Date : 2024-12-27 eCollection Date: 2024-12-01 DOI: 10.1093/nargab/lqae184

Zefeng Wu, Yali Sun, Xiaoqiang Zhao, Zigang Liu, Wenqi Zhou, Yining Niu

Abstract: Research on the dynamic expression of genes in plants is important for understanding different biological processes. We used the large amounts of publicly available transcriptomic data from various plant sample sources to investigate whether the expression levels of a subset of highly variable genes (HVGs) can be used to accurately identify the phenotypes of plants. Using maize (Zea mays L.) as an example, we built machine learning (ML) models to predict phenotypes using a gene expression dataset of 21 612 bulk RNA sequencing samples. We showed that the ML models achieved excellent prediction accuracy using only the HVGs to identify different phenotypes, including tissue types, developmental stages, cultivars and stress conditions. Using the ML models, we found several important functional genes associated with different phenotypes. We performed a similar analysis in rice (Oryza sativa L.) and found that the ML models could be generalized across species. However, the models trained on maize did not perform well in rice, probably because of the expression divergence of the conserved HVGs between the two species. Overall, our results provide an ML framework for phenotype prediction using gene expression profiles, which may contribute to precision management of crops in agricultural practices.

NAR Genomics and Bioinformatics 6(4): lqae184. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11672113/pdf/

Citations: 0
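The abstract does not name the ML models, so the following is only a minimal sketch of the general idea, assuming a samples x genes log-expression matrix: genes are ranked by variance on the training samples, and the top-ranked HVGs are fed to a classifier. A nearest-centroid classifier stands in for the authors' models; select_hvgs, NearestCentroid and the toy data are hypothetical.

```python
import numpy as np

def select_hvgs(expr: np.ndarray, n_top: int) -> np.ndarray:
    """Indices of the n_top genes (columns) with the highest variance across samples."""
    variances = expr.var(axis=0)
    return np.argsort(variances)[::-1][:n_top]  # most variable gene first

class NearestCentroid:
    """Assign each sample to the phenotype whose mean HVG profile is closest (Euclidean)."""

    def fit(self, X: np.ndarray, y: np.ndarray) -> "NearestCentroid":
        self.labels_ = np.unique(y)
        self.centroids_ = np.stack([X[y == lab].mean(axis=0) for lab in self.labels_])
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        # distance of every sample to every phenotype centroid: samples x phenotypes
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.labels_[d.argmin(axis=1)]

# Toy data (hypothetical): 40 samples x 200 genes, two tissue types.
rng = np.random.default_rng(0)
y = np.array(["leaf"] * 20 + ["root"] * 20)
expr = rng.normal(0.0, 0.1, size=(40, 200))
expr[y == "root", :5] += 3.0  # genes 0-4 track tissue type, so they are the most variable

train = np.arange(40) % 2 == 0  # even-indexed samples for training, odd for testing
hvg = select_hvgs(expr[train], n_top=10)  # select HVGs on training data only (no leakage)
model = NearestCentroid().fit(expr[train][:, hvg], y[train])
accuracy = (model.predict(expr[~train][:, hvg]) == y[~train]).mean()
print(f"top HVGs: {sorted(hvg[:5].tolist())}, held-out accuracy: {accuracy:.2f}")
```

Selecting HVGs on the training split only keeps the held-out accuracy honest; any classifier could replace the nearest-centroid stand-in.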

SARS-CoV-2 Illumina GeNome Assembly Line (SIGNAL), a Snakemake workflow for rapid and bulk analysis of Illumina sequencing of SARS-CoV-2 genomes.

IF 4

NAR Genomics and Bioinformatics Pub Date : 2024-12-18 eCollection Date: 2024-12-01 DOI: 10.1093/nargab/lqae176

Jalees A Nasir, Finlay Maguire, Kendrick M Smith, Emily M Panousis, Sheridan J C Baker, Patryk Aftanas, Amogelang R Raphenya, Brian P Alcock, Hassaan Maan, Natalie C Knox, Arinjay Banerjee, Karen Mossman, Bo Wang, Jared T Simpson, Robert A Kozak, Samira Mubareka, Andrew G McArthur

Abstract: The incorporation of sequencing technologies in frontline healthcare and public health settings was vital in developing virus surveillance programs during the Coronavirus Disease 2019 (COVID-19) pandemic caused by transmission of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). However, increased data acquisition poses challenges for both rapid and accurate analyses. To overcome these hurdles, we developed the SARS-CoV-2 Illumina GeNome Assembly Line (SIGNAL) for quick bulk analyses of Illumina short-read sequencing data. SIGNAL is a Snakemake workflow that seamlessly manages parallel tasks to process large volumes of sequencing data. A series of outputs are generated, including consensus genomes, variant calls, lineage assessments and identified variants of concern (VOCs). Compared to other existing SARS-CoV-2 sequencing workflows, SIGNAL is one of the fastest-performing analysis tools while maintaining high accuracy. The source code is publicly available (github.com/jaleezyy/covid-19-signal) and is optimized to run on various systems, with software compatibility and resource management all handled within the workflow. Overall, SIGNAL has demonstrated its capacity for high-volume analyses through several contributions to publicly funded government public health surveillance programs. It can be a valuable tool for continuing SARS-CoV-2 Illumina sequencing efforts and will inform the development of similar strategies for rapid viral sequence assessment.

NAR Genomics and Bioinformatics 6(4): lqae176. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11655287/pdf/

Citations: 0
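Consensus genomes are among SIGNAL's outputs, but the abstract does not say how they are called, and this is not SIGNAL's code. As a minimal sketch of the general majority-vote idea, assuming per-position base counts from a read pileup and hypothetical thresholds (min_depth=10, min_freq=0.75), low-coverage or ambiguous positions are masked with N. The helper call_consensus is hypothetical.

```python
from collections import Counter

def call_consensus(pileup: list[dict[str, int]], min_depth: int = 10,
                   min_freq: float = 0.75) -> str:
    """Build a consensus sequence from per-position base counts.

    Positions with fewer than min_depth reads, or whose most common base
    accounts for less than min_freq of the reads, are masked with 'N'.
    """
    consensus = []
    for counts in pileup:
        depth = sum(counts.values())
        if depth < min_depth:
            consensus.append("N")  # insufficient coverage
            continue
        base, n = Counter(counts).most_common(1)[0]
        consensus.append(base if n / depth >= min_freq else "N")  # ambiguous -> N
    return "".join(consensus)

pileup = [
    {"A": 50},          # confident A
    {"C": 30, "T": 2},  # 30/32 reads support C
    {"G": 5},           # depth 5 is below min_depth
    {"A": 12, "G": 10}, # 12/22 reads support A: too ambiguous
    {"T": 40, "C": 1},  # 40/41 reads support T
]
print(call_consensus(pileup))  # ACNNT
```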

Exploring transcription modalities from bimodal, single-cell RNA sequencing data.

IF 4

NAR Genomics and Bioinformatics Pub Date : 2024-12-18 eCollection Date: 2024-12-01 DOI: 10.1093/nargab/lqae179

Enikő Regényi, Mir-Farzin Mashreghi, Christof Schütte, Vikram Sunkara

Abstract: There is a growing interest in generating bimodal, single-cell RNA sequencing (RNA-seq) data for studying biological pathways. These data are predominantly used to understand phenotypic trajectories via RNA velocities; however, the shape information encoded in the two-dimensional resolution of such data is not yet exploited. In this paper, we present an elliptical parametrization of two-dimensional RNA-seq data, from which we derived statistics that reveal four different modalities. These modalities can be interpreted as manifestations of the changes in the rates of splicing, transcription or degradation. We performed our analysis on a cell cycle and a colorectal cancer dataset. In both datasets, we found genes that are not picked up by differential gene expression analysis (DGEA), and are consequently unnoticed, yet visibly delineate phenotypes. This indicates that, in addition to DGEA, searching for genes that exhibit the discovered modalities could aid in recovering genes that set phenotypes apart. For communities studying biomarkers and cellular phenotyping, the modalities present in bimodal RNA-seq data broaden the search space of genes and, furthermore, allow for incorporating cellular RNA processing into regulatory analyses.

NAR Genomics and Bioinformatics 6(4): lqae179. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11655292/pdf/

Citations: 0
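The abstract does not spell out the elliptical parametrization. One standard way to summarize a 2D point cloud as an ellipse, used here purely as an assumption, is the eigendecomposition of its covariance matrix: the center is the mean, the semi-axes scale with the square roots of the eigenvalues, and the orientation follows the leading eigenvector. In this sketch, fit_ellipse and the toy gene are hypothetical, and how the paper maps such statistics onto its four modalities is not reproduced.

```python
import numpy as np

def fit_ellipse(spliced, unspliced, n_std: float = 2.0):
    """Summarize one gene's 2D (spliced, unspliced) cloud as a covariance ellipse.

    Returns (center, major, minor, angle): semi-axis lengths scaled by n_std and the
    major-axis angle in degrees, measured from the spliced axis, in [0, 180).
    """
    points = np.column_stack([spliced, unspliced]).astype(float)
    center = points.mean(axis=0)
    cov = np.cov(points, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # ascending eigenvalues, orthonormal eigenvectors
    minor, major = n_std * np.sqrt(np.clip(eigvals, 0.0, None))  # clip tiny negative round-off
    vx, vy = eigvecs[:, 1]  # direction of the largest variance
    angle = float(np.degrees(np.arctan2(vy, vx)) % 180.0)  # an axis has no sign: fold to [0, 180)
    return center, float(major), float(minor), angle

# Toy gene (hypothetical): unspliced counts rise in proportion to spliced counts.
rng = np.random.default_rng(1)
s = rng.normal(10.0, 3.0, size=500)
u = s + rng.normal(0.0, 0.3, size=500)
center, major, minor, angle = fit_ellipse(s, u)
print(f"center={center.round(2)}, axis ratio={major / minor:.1f}, angle={angle:.1f} deg")
```

Statistics such as the axis ratio and the angle could then be compared between cell groups for each gene.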

Mining single-cell data for cell type-disease associations.

IF 4

NAR Genomics and Bioinformatics Pub Date : 2024-12-18 eCollection Date: 2024-12-01 DOI: 10.1093/nargab/lqae180

Kevin G Chen, Kathryn O Farley, Timo Lassmann

Abstract: A robust understanding of the cellular mechanisms underlying diseases sets the foundation for the effective design of drugs and other interventions. The wealth of existing single-cell atlases offers the opportunity to uncover high-resolution information on expression patterns across various cell types and time points. To better understand the associations between cell types and diseases, we leveraged previously developed tools to construct a standardized analysis pipeline and systematically explored associations across four single-cell datasets, spanning a range of tissue types, cell types and developmental time periods. We utilized a set of existing tools to identify co-expression modules and temporal patterns per cell type and then investigated these modules for known disease and phenotype enrichments. Our pipeline reveals known and novel putative cell type-disease associations across all investigated datasets. In addition, we found that automatically discovered gene co-expression modules and temporal clusters are enriched for drug targets, suggesting that our analysis could be used to identify novel therapeutic targets.

NAR Genomics and Bioinformatics 6(4): lqae180. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11655289/pdf/

Citations: 0
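The abstract does not name the enrichment tool used to test modules for disease genes. A common choice, assumed here, is a one-sided hypergeometric test of the overlap between a module and a disease gene set within a gene universe. The function hypergeom_enrichment and the gene names are hypothetical; a real analysis would also correct for testing many modules and diseases.

```python
from math import comb

def hypergeom_enrichment(module: set[str], disease: set[str], universe: set[str]) -> float:
    """One-sided hypergeometric P-value that `module` is enriched for `disease` genes.

    Computes P(X >= k), where X is the number of disease genes among |module| genes
    drawn without replacement from `universe`, and k is the observed overlap.
    """
    module, disease = module & universe, disease & universe
    N, K, n = len(universe), len(disease), len(module)
    k = len(module & disease)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)

# Toy example (hypothetical gene names): 100 genes, 10 disease genes, a 10-gene module.
universe = {f"g{i}" for i in range(100)}
disease = {f"g{i}" for i in range(10)}
module = {f"g{i}" for i in (0, 1, 2, 3, 4, 50, 51, 52, 53, 54)}  # 5 of 10 are disease genes
p = hypergeom_enrichment(module, disease, universe)
print(f"observed overlap 5 (about 1 expected by chance), P = {p:.2e}")
```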

PyNetCor: a high-performance Python package for large-scale correlation analysis.

IF 4

NAR Genomics and Bioinformatics Pub Date : 2024-12-18 eCollection Date: 2024-12-01 DOI: 10.1093/nargab/lqae177

Shibin Long, Yan Xia, Lifeng Liang, Ying Yang, Hailiang Xie, Xiaokai Wang

Abstract: The development of multi-omics technologies has generated an abundance of biological datasets, providing valuable resources for investigating potential relationships within complex biological systems. However, most correlation analysis tools face computational challenges when dealing with these high-dimensional datasets containing millions of features. Here, we introduce pyNetCor, a fast and scalable tool for constructing correlation networks on large-scale and high-dimensional data. PyNetCor features optimized algorithms for both full correlation coefficient matrix computation and top-k correlation search, outperforming other tools in the field in terms of runtime and memory consumption. It utilizes a linear interpolation strategy to rapidly estimate P-values and achieve false discovery rate control, demonstrating a speedup of over 110 times compared to existing methods. Overall, pyNetCor supports large-scale correlation analysis, a crucial foundational step for various bioinformatics workflows, and can be easily integrated into downstream applications to accelerate the process of extracting biological insights from data.

NAR Genomics and Bioinformatics 6(4): lqae177. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11655297/pdf/

Citations: 0
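The abstract lists four ingredients: a full correlation matrix, top-k correlation search, P-values estimated by linear interpolation, and false discovery rate control. The sketch below is not PyNetCor's API (see the package documentation for that); it is a plain NumPy illustration of the same ideas. As an assumption made to avoid SciPy, the precomputed P-value grid uses the Fisher z normal approximation rather than the exact t-distribution, and all function names are hypothetical.

```python
import math
import numpy as np

def pearson_matrix(X: np.ndarray) -> np.ndarray:
    """Pearson correlations between the columns (features) of X (samples x features).
    Constant columns have zero standard deviation and yield NaN."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    return (Z.T @ Z) / X.shape[0]

def top_k_pairs(R: np.ndarray, k: int):
    """The k feature pairs (i < j) with the largest |r|, strongest first."""
    i, j = np.triu_indices_from(R, k=1)
    r = R[i, j]
    idx = np.argpartition(-np.abs(r), k - 1)[:k]  # unordered top-k in linear time
    idx = idx[np.argsort(-np.abs(r[idx]))]        # sort only the k winners
    return list(zip(i[idx].tolist(), j[idx].tolist(), r[idx].tolist()))

def interpolated_pvalues(r: np.ndarray, n_samples: int, grid_size: int = 2001) -> np.ndarray:
    """Two-sided P-values for |r|, linearly interpolated from a precomputed grid.
    Grid P-values use the Fisher z approximation; |r| above 0.9999 is clamped."""
    grid = np.linspace(0.0, 0.9999, grid_size)
    z = np.arctanh(grid) * math.sqrt(n_samples - 3)
    grid_p = np.array([math.erfc(v / math.sqrt(2.0)) for v in z])
    return np.interp(np.abs(r), grid, grid_p)

def benjamini_hochberg(p: np.ndarray) -> np.ndarray:
    """Benjamini-Hochberg adjusted P-values (q-values)."""
    m = len(p)
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    q = np.minimum.accumulate(ranked[::-1])[::-1]  # enforce monotonicity from the largest P down
    out = np.empty(m)
    out[order] = np.minimum(q, 1.0)
    return out

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))                        # 50 samples x 6 features (toy data)
X[:, 1] = X[:, 0] + rng.normal(scale=0.1, size=50)  # feature 1 closely tracks feature 0
R = pearson_matrix(X)
top = top_k_pairs(R, k=3)
iu, ju = np.triu_indices_from(R, k=1)
q = benjamini_hochberg(interpolated_pvalues(R[iu, ju], n_samples=X.shape[0]))
print(top[0])  # the strongest pair should involve features 0 and 1
```

Interpolating from a fixed grid replaces one distribution-function call per pair with a cheap table lookup, which is where this kind of speedup comes from when there are millions of pairs.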

Improved characterization of 3' single-cell RNA-seq libraries with paired-end avidity sequencing.

IF 4

NAR Genomics and Bioinformatics Pub Date : 2024-12-18 eCollection Date: 2024-12-01 DOI: 10.1093/nargab/lqae175

John T Chamberlin, Austin E Gillen, Aaron R Quinlan

Abstract: Prevailing poly(dT)-primed 3' single-cell RNA-seq protocols generate barcoded cDNA fragments containing the reverse transcriptase priming site or, in principle, the polyadenylation site. Direct sequencing across this site was historically difficult because of DNA sequencing errors induced by the homopolymeric primer at the 'barcode' end. Here, we evaluate the capability of 'avidity base chemistry' DNA sequencing from Element Biosciences to sequence through the primer and enable accurate paired-end read alignment and precise quantification of polyadenylation sites. We find that the Element Aviti instrument sequences through the thymine homopolymer into the subsequent cDNA sequence without detectable loss of accuracy. The additional sequence enables direct and independent assignment of reads to polyadenylation sites, which bypasses the complexities and limitations of conventional approaches but does not consistently improve read mapping rates compared to single-end alignment. We also characterize low-level artifacts and demonstrate necessary adjustments to adapter trimming and sequence alignment regardless of platform, particularly in the context of extended read lengths. Our analyses confirm that Element avidity sequencing is an effective alternative to Illumina sequencing for standard single-cell RNA-seq, particularly for polyadenylation site measurement, but do not rule out the potential for similar performance from other emerging platforms.

NAR Genomics and Bioinformatics 6(4): lqae175. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11655283/pdf/

Citations: 0
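Reading through the poly(T) stretch exposes the cDNA that sits next to the poly(A) tail, which is what allows reads to be assigned directly to polyadenylation sites. A minimal sketch, assuming a 10x Genomics 3' v3-style barcode read that begins with a 16 nt cell barcode and a 12 nt UMI, and treating the poly(T) stretch as an exact homopolymer (real data may need error tolerance, and the paper reports that trimming must be adjusted for artifacts). The helper split_polyT is hypothetical; it returns the length of the T run and the downstream sequence, whose 5' end, once aligned, marks the polyadenylation site.

```python
from typing import Optional

def split_polyT(read1: str, prefix_len: int = 28, min_t: int = 10) -> Optional[tuple[int, str]]:
    """Split the sequence after barcode+UMI into (poly(T) length, downstream cDNA).

    Returns None if the read lacks a poly(T) stretch of at least min_t bases
    or contains no sequence beyond the homopolymer.
    """
    tail = read1[prefix_len:]                 # skip cell barcode + UMI
    n_t = len(tail) - len(tail.lstrip("T"))   # length of the leading T homopolymer
    if n_t < min_t or n_t == len(tail):
        return None
    return n_t, tail[n_t:]

bc_umi = "ACGG" * 7  # hypothetical 28 nt barcode + UMI
read = bc_umi + "T" * 30 + "GATTACAGG"
print(split_polyT(read))  # (30, 'GATTACAGG')
```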

Navigating Illumina DNA methylation data: biology versus technical artefacts.

IF 4

NAR Genomics and Bioinformatics Pub Date : 2024-12-18 eCollection Date: 2024-12-01 DOI: 10.1093/nargab/lqae181

Selina Glaser, Helene Kretzmer, Iris Tatjana Kolassa, Matthias Schlesner, Anja Fischer, Isabell Fenske, Reiner Siebert, Ole Ammerpohl

Abstract: Illumina-based BeadChip arrays have revolutionized genome-wide DNA methylation profiling, pushing it into diagnostics. However, comprehensive quality assessment remains challenging within a wide range of available tissue materials and sample preparation methods. This study tackles two critical issues: differentiating between biological effects and technical artefacts in suboptimal-quality samples, and the impact of the first sample on the Illumina-like normalization algorithm. We introduce three quality control scores based on global DNA methylation distribution (DB-Score), bin distance from copy number variation analysis (BIN-Score) and consistently methylated CpGs (CM-Score) that rely on biological features rather than internal array controls. These scores, designed to be adjustable for different analysis tools and sample cohort characteristics, were explored and benchmarked across independent cohorts. Additionally, we reveal deviations in beta values caused by different sample rankings with the Illumina-like normalization algorithm, verify these with whole-genome methylation sequencing data and show effects on differential DNA methylation analysis. Our findings underscore the necessity of consistently using a predefined normalization sample within the ranking process to improve reproducibility of the Illumina-like normalization algorithm. Overall, our study delivers valuable insights, practical recommendations and R functions designed to enhance reproducibility and quality assurance of DNA methylation analysis, particularly for challenging sample types.

NAR Genomics and Bioinformatics 6(4): lqae181. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11655293/pdf/

Citations: 0
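The dependence of Illumina-like normalization on its reference sample can be illustrated numerically. In the R package minfi, preprocessIllumina exposes this choice through its reference argument, which defaults to the first sample. The Python sketch below is a simplification under stated assumptions, not minfi's implementation or the paper's DB-, BIN- or CM-Scores: a single probe type (Infinium II, methylated signal in green, unmethylated in red), no background correction, and a toy control normalization that rescales each channel to the reference sample's mean control intensity. The function names and intensities are hypothetical.

```python
import numpy as np

def beta_values(meth, unmeth, offset: float = 100.0):
    """Illumina-style beta value M / (M + U + offset); the offset stabilizes dim probes."""
    return meth / (meth + unmeth + offset)

def control_normalize(green, red, green_ctrl, red_ctrl, reference: int):
    """Rescale each sample's green and red channels so that its mean control-probe
    intensity matches that of the `reference` sample (columns are samples)."""
    g_scale = green_ctrl[:, reference].mean() / green_ctrl.mean(axis=0)
    r_scale = red_ctrl[:, reference].mean() / red_ctrl.mean(axis=0)
    return green * g_scale, red * r_scale  # one factor per sample column, broadcast over probes

# Toy intensities (hypothetical): 2 Infinium II-style probes x 3 samples.
green = np.array([[1000.0, 2000.0, 1500.0], [3000.0, 500.0, 800.0]])  # methylated signal
red = np.array([[800.0, 600.0, 1200.0], [400.0, 2500.0, 900.0]])      # unmethylated signal
green_ctrl = np.array([[100.0, 200.0, 150.0], [110.0, 190.0, 160.0]])
red_ctrl = np.array([[300.0, 100.0, 200.0], [320.0, 120.0, 180.0]])

betas = {}
for ref in (0, 2):
    g, r = control_normalize(green, red, green_ctrl, red_ctrl, reference=ref)
    betas[ref] = beta_values(g, r)
print(np.abs(betas[0] - betas[2]).max())  # nonzero: the choice of reference shifts betas
```

Because the green and red channels are rescaled by different factors, their ratio depends on which sample is the reference, so fixing a predefined reference sample is what makes the betas reproducible across runs.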

AntiBody Sequence Database.

IF 4

NAR Genomics and Bioinformatics Pub Date : 2024-12-18 eCollection Date: 2024-12-01 DOI: 10.1093/nargab/lqae171

Simon Malesys, Rachel Torchet, Bertrand Saunier, Nicolas Maillet

Abstract: Antibodies play a crucial role in the humoral immune response against health threats, such as viral infections. Although the theoretical number of human immunoglobulins is well over a trillion, the total number of unique antibody protein sequences accessible in databases is much lower than the number found in a single individual. Training artificial intelligence (AI) models, for example to assist in developing serodiagnoses or antibody-based therapies, requires building datasets according to strict criteria to include as many standardized antibody sequences as possible. However, the available sequences are scattered across partially redundant databases, making it difficult to compile them into single non-redundant datasets. Here, we introduce ABSD (AntiBody Sequence Database, https://absd.pasteur.cloud), which contains data from major publicly available resources, creating the largest standardized, automatically updated and non-redundant source of public antibody sequences. This user-friendly and open website enables users to generate lists of antibodies based on selected criteria and download the unique sequence pairs of their variable regions.

NAR Genomics and Bioinformatics 6(4): lqae171. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11655285/pdf/

Citations: 0
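ABSD's central task is collapsing records from partially redundant sources into unique variable-region sequence pairs. A minimal sketch of that deduplication step, assuming records already hold heavy- and light-chain variable-region amino-acid sequences and keying on the exact (heavy, light) pair while keeping provenance. The record fields, source names and sequence fragments are hypothetical, and ABSD's actual standardization is not reproduced.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AntibodyRecord:
    heavy_v: str  # heavy-chain variable region, amino-acid sequence
    light_v: str  # light-chain variable region, amino-acid sequence
    source: str   # name of the database the record was taken from

def merge_nonredundant(records):
    """Collapse records sharing the same (heavy, light) variable-region pair.

    Returns {(heavy_v, light_v): sorted list of sources reporting that pair}.
    """
    merged: dict[tuple[str, str], set[str]] = {}
    for rec in records:
        # minimal standardization: strip whitespace and upper-case the sequences
        key = (rec.heavy_v.strip().upper(), rec.light_v.strip().upper())
        merged.setdefault(key, set()).add(rec.source)
    return {pair: sorted(sources) for pair, sources in merged.items()}

records = [  # toy fragments, not full variable regions
    AntibodyRecord("EVQLVESGGG", "DIQMTQSPSS", "source_A"),
    AntibodyRecord("evqlvesggg ", "DIQMTQSPSS", "source_B"),  # same pair, different formatting
    AntibodyRecord("QVQLQESGPG", "DIQMTQSPSS", "source_A"),   # different heavy chain
]
nr = merge_nonredundant(records)
for (vh, vl), sources in nr.items():
    print(vh, vl, sources)
```

Keying on the pair rather than on single chains preserves heavy/light pairing, which matters for models trained on paired variable regions.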

IDclust: Iterative clustering for unsupervised identification of cell types with single cell transcriptomics and epigenomics.

IF 4

NAR Genomics and Bioinformatics Pub Date : 2024-12-18 eCollection Date: 2024-12-01 DOI: 10.1093/nargab/lqae174

Pacôme Prompsy, Mélissa Saichi, Félix Raimundo, Céline Vallot

Abstract: The increasing diversity of single-cell datasets requires systematic cell type characterization. Clustering is a critical step in single-cell analysis, heavily influencing downstream analyses. However, current unsupervised clustering algorithms rely on biologically irrelevant parameters that require manual optimization and fail to capture hierarchical relationships between clusters. We developed IDclust, a framework that identifies clusters with significant biological features at multiple resolutions using biologically meaningful thresholds such as fold change, adjusted P-value and fraction of expressing cells. By iteratively processing and clustering subsets of the dataset, IDclust guarantees that all clusters found have significantly different features and stops only when no more interpretable clusters are found. It also creates a hierarchy of clusters, enabling visualization of the hierarchical relationships between different clusters. On multiple single-cell transcriptomic reference datasets, IDclust achieves higher clustering accuracy than state-of-the-art algorithms. We showcase its utility by identifying previously unannotated clusters and identifying branching patterns in scATAC datasets. Using its unsupervised nature and ability to analyze different omics layers, we compare the resolution of different histone marks in a multi-omic paired-tag dataset. Overall, IDclust automates single-cell exploration, facilitates cell type annotation and provides a biologically interpretable basis for clustering.

NAR Genomics and Bioinformatics 6(4): lqae174. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11655290/pdf/

Citations: 0
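The control flow described above (split a subset, keep the split only if every resulting cluster has marker features passing biological thresholds, recurse, and stop when no interpretable split remains) can be sketched in a few lines. This is not IDclust's code. It assumes a non-negative cells x features matrix, a plain 2-means split with deterministic initialization, and only the log2 fold-change and fraction-of-expressing-cells thresholds; the adjusted P-value filter and IDclust's per-subset reprocessing are omitted. All names and the toy data are hypothetical.

```python
import numpy as np

def two_means(X: np.ndarray, n_iter: int = 20) -> np.ndarray:
    """Split the rows of X into two groups with k-means (k=2) and a deterministic start."""
    c0 = X[np.argmax(np.linalg.norm(X - X.mean(axis=0), axis=1))]  # most atypical cell
    c1 = X[np.argmax(np.linalg.norm(X - c0, axis=1))]              # cell farthest from it
    centroids = np.stack([c0, c1])
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        labels = np.linalg.norm(X[:, None] - centroids[None], axis=2).argmin(axis=1)
        if labels.min() == labels.max():  # one group emptied: no split
            break
        centroids = np.stack([X[labels == g].mean(axis=0) for g in (0, 1)])
    return labels

def has_markers(X, labels, min_log2fc=1.0, min_frac=0.5) -> bool:
    """True if each group has a feature passing both thresholds (no P-value filter here)."""
    for g in (0, 1):
        inside, outside = X[labels == g], X[labels != g]
        log2fc = np.log2((inside.mean(axis=0) + 1.0) / (outside.mean(axis=0) + 1.0))
        frac = (inside > 0).mean(axis=0)  # fraction of cells expressing each feature
        if not np.any((log2fc >= min_log2fc) & (frac >= min_frac)):
            return False
    return True

def iterative_cluster(X, cells=None, name="root", min_cells=10) -> dict:
    """Recursively split cells while both halves keep marker features; returns a tree."""
    cells = np.arange(len(X)) if cells is None else cells
    node = {"name": name, "cells": cells, "children": []}
    if len(cells) < 2 * min_cells:
        return node
    labels = two_means(X[cells])
    sizes = np.bincount(labels, minlength=2)
    if sizes.min() < min_cells or not has_markers(X[cells], labels):
        return node  # no interpretable split: this cluster is a leaf
    for g in (0, 1):
        node["children"].append(iterative_cluster(X, cells[labels == g], f"{name}.{g}", min_cells))
    return node

def leaves(node: dict) -> list:
    return [node] if not node["children"] else [l for c in node["children"] for l in leaves(c)]

# Toy counts (hypothetical): types A and B share genes 0-9, type C uses genes 20-29.
rng = np.random.default_rng(0)
X = rng.poisson(0.1, size=(90, 30)).astype(float)  # sparse background
X[0:30, 0:10] += rng.poisson(10, size=(30, 10))    # type A
X[0:30, 10:15] += rng.poisson(10, size=(30, 5))
X[30:60, 0:10] += rng.poisson(10, size=(30, 10))   # type B
X[30:60, 15:20] += rng.poisson(10, size=(30, 5))
X[60:90, 20:30] += rng.poisson(10, size=(30, 10))  # type C
tree = iterative_cluster(X)
for leaf in leaves(tree):
    print(leaf["name"], len(leaf["cells"]))
```

The first split separates C from the shared A/B branch, and only then are A and B separated, so the tree records the hierarchy rather than a flat partition.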

Optimal Representative Strain selector: a comprehensive pipeline for selecting next-generation reference strains of bacterial species.

IF 4

NAR Genomics and Bioinformatics Pub Date : 2024-12-18 eCollection Date: 2024-12-01 DOI: 10.1093/nargab/lqae173

Chiara Tarracchini, Federico Fontana, Silvia Petraro, Gabriele Andrea Lugli, Leonardo Mancabelli, Francesca Turroni, Marco Ventura, Christian Milani

Abstract: Although it is common practice to use historically established 'reference strains' or 'type strains' for laboratory experiments, this approach often overlooks how effectively these strains represent the full ecological, genetic and functional diversity of the species within a specific ecological niche. In this context, this study proposes the Optimal Representative Strain (ORS) selector tool (https://zenodo.org/doi/10.5281/zenodo.13772191), an innovative bioinformatic pipeline capable of evaluating how well a strain represents its whole species from a genetic and functional perspective, in addition to considering its ecological distribution in a particular ecological niche. Based on publicly available genomes, the strain that best fits all three of these microbiological aspects is designated as an optimal representative strain. Moreover, a user-friendly software tool called Local Alternative Optimal Representative Strain selector was developed to allow researchers to screen their local library of bacterial strains for an optimal available alternative based on the reference optimal representative strain. Five different bacterial species, i.e. Lacticaseibacillus paracasei, Lactobacillus delbrueckii, Streptococcus thermophilus, Bacteroides thetaiotaomicron and Lactococcus lactis, were tested in three different environments to evaluate the performance of the bioinformatic pipeline in selecting optimal representative strains.

NAR Genomics and Bioinformatics 6(4): lqae173. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11655286/pdf/

Citations: 0
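The abstract says ORS weighs genetic, functional and ecological representativeness but gives no formulas. The following sketch assumes one possible scoring: genetic representativeness as mean pairwise similarity (for example ANI) to the other strains, functional representativeness as mean Jaccard similarity of gene-family content, ecological representativeness as prevalence across niche samples, each min-max scaled and averaged with equal weights. These choices, the function names and the toy numbers are hypothetical, not the ORS pipeline.

```python
import numpy as np

def minmax(v: np.ndarray) -> np.ndarray:
    """Rescale to [0, 1]; a constant vector maps to all ones."""
    span = v.max() - v.min()
    return np.ones_like(v, dtype=float) if span == 0 else (v - v.min()) / span

def mean_jaccard(gene_sets: list[set[str]]) -> np.ndarray:
    """Average Jaccard similarity of each strain's gene-family content to all other strains."""
    n = len(gene_sets)
    sim = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            union = gene_sets[i] | gene_sets[j]
            sim[i, j] = len(gene_sets[i] & gene_sets[j]) / len(union) if union else 1.0
    return (sim.sum(axis=1) - 1.0) / (n - 1)  # drop self-similarity, which is always 1

def ors_scores(ani: np.ndarray, gene_sets, prevalence, weights=(1 / 3, 1 / 3, 1 / 3)) -> np.ndarray:
    """Combine genetic, functional and ecological representativeness into one score per strain.

    ani: strains x strains similarity matrix; prevalence: fraction of niche samples
    in which each strain is detected.
    """
    n = ani.shape[0]
    genetic = (ani.sum(axis=1) - np.diag(ani)) / (n - 1)  # mean similarity to the others
    functional = mean_jaccard(gene_sets)
    ecological = np.asarray(prevalence, dtype=float)
    w_g, w_f, w_e = weights
    return w_g * minmax(genetic) + w_f * minmax(functional) + w_e * minmax(ecological)

# Toy species (hypothetical) with three strains.
ani = np.array([[100.0, 98.0, 95.0], [98.0, 100.0, 97.0], [95.0, 97.0, 100.0]])
gene_sets = [{"a", "b", "c"}, {"a", "b", "c", "d"}, {"a", "c", "d", "e"}]
prevalence = [0.2, 0.6, 0.4]
scores = ors_scores(ani, gene_sets, prevalence)
print(scores.round(3), "best strain:", int(np.argmax(scores)))
```

Min-max scaling puts the three criteria on a common range before weighting; the weights themselves are an arbitrary assumption here.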