bioRxiv - Bioinformatics: latest papers, page 3

Revisiting the taxonomy of Enterococcus casseliflavus and related species

bioRxiv - Bioinformatics Pub Date : 2024-09-17 DOI: 10.1101/2024.09.16.613146

Matheus Miguel Soares de Medeiros Lima, Janira Prichula, Tetsu Sakamoto

Abstract: Enterococcus casseliflavus, a commonly motile, yellow-pigmented bacterium, is a commensal member of the gastrointestinal tract. It is occasionally found in cases of bacteremia and other human infections. A concern is that all strains of this species carry the vanC gene cluster on their chromosome, which confers resistance to vancomycin. The classification of E. casseliflavus is challenging, as it shows 99% identity in 16S analysis with E. gallinarum and, above all, with E. flavescens, with which it is often classified as a single species. This study aimed to revisit the taxonomy of E. casseliflavus and related species through a comprehensive analysis of the genomic data available for these species in public databases. For this, 155 genomes of E. casseliflavus-related species (E. casseliflavus, E. flavescens, E. entomosocium, and E. innesii) were retrieved and submitted to Average Nucleotide Identity (ANI) and phylogenomic analysis. Both approaches showed three well-delineated clusters, which correspond to three Enterococcus species (E. casseliflavus, E. flavescens, and E. innesii). Here we suggest (1) the removal of synonym status between E. flavescens and E. casseliflavus, and (2) the addition of synonym status between E. entomosocium and E. casseliflavus.
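The ANI-based grouping described in this abstract can be illustrated with a small sketch. This is not the authors' pipeline: the genome names `g1` to `g5`, the ANI values, and the 95% cutoff (a commonly used species boundary, not stated in the abstract) are all assumptions chosen for illustration. The function groups genomes by single linkage, so any two genomes whose pairwise ANI meets the threshold land in the same cluster.

```python
def cluster_by_ani(genomes, ani, threshold=95.0):
    """Single-linkage clustering of genomes from pairwise ANI values.

    genomes:   list of genome identifiers (hypothetical names here)
    ani:       dict mapping (genome_a, genome_b) -> ANI percentage
    threshold: ANI cutoff; 95.0 is an assumed, commonly used species boundary
    """
    parent = {g: g for g in genomes}

    def find(g):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (a, b), value in ani.items():
        if value >= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for g in genomes:
        clusters.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in clusters.values())


# Made-up ANI values, for illustration only.
genomes = ["g1", "g2", "g3", "g4", "g5"]
ani = {
    ("g1", "g2"): 98.7,
    ("g3", "g4"): 97.9,
    ("g1", "g3"): 91.2,
    ("g2", "g4"): 90.8,
    ("g1", "g5"): 88.0,
}
clusters = cluster_by_ani(genomes, ani)
print(clusters)
```

Single linkage is the simplest choice; a real analysis would typically also inspect the full ANI matrix and a phylogenomic tree, as the abstract describes.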

Citations: 0

haCCA: Multi-module Integrating of spatial transcriptomes and metabolomes.

bioRxiv - Bioinformatics Pub Date : 2024-09-17 DOI: 10.1101/2024.08.20.608773

Xiaotian Shen, Xiaoyun Zhang

Citations: 0

The Precise Basecalling of Short-Read Nanopore Sequencing

bioRxiv - Bioinformatics Pub Date : 2024-09-17 DOI: 10.1101/2024.09.12.612746

Ziyuan Wang, Mei-Juan Tu, Chengcheng Song, Ziyang Liu, Katherine K Wang, Shuibing Chen, Ai-Ming Yu, Hongxu Ding

Citations: 0

PangeBlocks: customized construction of pangenome graphs via maximal blocks

bioRxiv - Bioinformatics Pub Date : 2024-09-17 DOI: 10.1101/2024.09.17.613426

Paola Bonizzoni, Jorge Eduardo Avila Cartes, Simone Ciccolella, Gianluca Della Vedova, Luca Denti

Abstract:
Background: The construction of a pangenome graph is a fundamental task in pangenomics. A natural theoretical question is how to formalize the computational problem of building an optimal pangenome graph, making explicit the underlying optimization criterion and the set of feasible solutions. Current approaches build a pangenome graph with heuristics, without assuming any explicit optimization criterion. It is therefore unclear how a specific optimization criterion affects the graph topology and downstream analyses such as read mapping and variant calling.
Methods: In this paper, by leveraging the notion of a maximal block in a Multiple Sequence Alignment (MSA), we reframe the pangenome graph construction problem as an exact cover problem on blocks called Minimum Weighted Block Cover (MWBC). We then propose an Integer Linear Programming (ILP) formulation for the MWBC problem that allows us to study the most natural objective functions for building a graph.
Results: We provide an implementation of the ILP approach for solving the MWBC and evaluate it on SARS-CoV-2 complete genomes, showing how different objective functions lead to pangenome graphs with different properties, hinting that the specific downstream task can drive the graph construction phase.
Conclusion: We show that a customized construction of a pangenome graph based on selecting objective functions has a direct impact on the resulting graphs. In particular, our formalization of the MWBC problem, based on finding an optimal subset of blocks covering an MSA, paves the way to novel practical approaches to graph representations of an MSA where the user can guide the construction.
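To make the idea of a minimum-weight exact cover of an MSA by blocks concrete, here is a toy sketch. It is not the authors' ILP formulation or their implementation: it enumerates subsets by brute force, which is only feasible for a handful of blocks, and the block list and weights are hypothetical. Each block is modeled as a set of `(row, column)` cells, and a valid solution must cover every cell of the alignment exactly once.

```python
from itertools import combinations


def min_weight_exact_cover(universe, blocks, weights):
    """Return (total_weight, chosen_block_indices) for the cheapest exact cover.

    universe: set of (row, col) cells of the alignment
    blocks:   list of sets of cells (hypothetical candidate blocks)
    weights:  list of block weights, same order as `blocks`
    Returns None if no exact cover exists.
    """
    best = None
    for size in range(1, len(blocks) + 1):
        for combo in combinations(range(len(blocks)), size):
            covered = [cell for i in combo for cell in blocks[i]]
            # Exact cover: no cell repeated, and every cell present.
            if len(covered) == len(universe) and set(covered) == universe:
                total = sum(weights[i] for i in combo)
                if best is None or total < best[0]:
                    best = (total, combo)
    return best


# A 2 x 2 alignment: two sequences (rows), two columns.
universe = {(0, 0), (0, 1), (1, 0), (1, 1)}
blocks = [
    {(0, 0), (0, 1), (1, 0), (1, 1)},  # 0: the whole alignment
    {(0, 0), (1, 0)},                  # 1: column 0, both rows
    {(0, 1), (1, 1)},                  # 2: column 1, both rows
    {(0, 0), (0, 1)},                  # 3: row 0, both columns
    {(1, 0), (1, 1)},                  # 4: row 1, both columns
]
weights = [3, 1, 1, 2, 2]  # made-up objective values
solution = min_weight_exact_cover(universe, blocks, weights)
print(solution)
```

Changing the weights changes which cover wins, which mirrors the abstract's point that the objective function shapes the resulting graph. An ILP solver would replace the enumeration with one binary variable per block and one equality constraint per cell.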

Citations: 0

easybio: an R Package for Single-Cell Annotation with CellMarker2.0

bioRxiv - Bioinformatics Pub Date : 2024-09-16 DOI: 10.1101/2024.09.14.609619

Cui Wei

Abstract: Single-cell RNA sequencing (scRNA-seq) allows researchers to study biological activities at the cellular level, enabling the discovery of new cell types and the analysis of intercellular interactions. However, annotating cell types in scRNA-seq data is a crucial and time-consuming process, and its quality significantly influences downstream analyses. Accurate identification of potential cell types provides valuable insights for discovering new cell populations or identifying novel markers for known cells, which may be used in future research. While various methods exist for single-cell annotation, one of the most common approaches is to use known cell markers. The CellMarker2.0 database, a human-curated repository of cell markers extracted from published articles, is widely used for this purpose. However, it currently offers only a web-based tool, which can be inconvenient when integrating with workflows like Seurat. To address this limitation, we introduce easybio, an R package designed to streamline single-cell annotation using the CellMarker2.0 database in conjunction with Seurat. easybio provides a suite of functions for querying the CellMarker2.0 database locally, offering insights into potential cell types for each cluster. In addition to single-cell annotation, the package also supports various bioinformatics workflows, including RNA-seq analysis, making it a versatile tool for transcriptomic research.

Citations: 0

Assessing differential cell composition in single-cell studies using voomCLR

bioRxiv - Bioinformatics Pub Date : 2024-09-16 DOI: 10.1101/2024.09.12.612645

Alemu Takele Assefa, Bie Verbist, Koen Van den Berge

Abstract: In single-cell studies, a common question is whether cell composition changes between conditions. Ideally, one would need absolute cell counts (number of cells per volumetric unit in a sample) to address this question, but current experiments typically yield cell counts that carry only relative information. It is therefore crucial to account for the compositional nature of cell count data in the statistical analysis. While recently developed methods address compositionality using compositional transformations together with a bias correction, they do not account for the uncertainty involved in estimating the bias term, nor do they accommodate the mean-variance structure of the counts. Here, we introduce a statistical method, voomCLR, for assessing differences in cell composition between conditions. It both incorporates uncertainty in the bias term and acknowledges the mean-variance structure of the transformed data, by leveraging developments from the differential gene expression literature. We demonstrate the performance of voomCLR, illustrate the benefit of all components, and compare the methodology to the state of the art on simulated and real single-cell gene expression datasets.
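The "CLR" in voomCLR presumably refers to the centered log-ratio transformation, a standard compositional transformation. The sketch below shows only that transformation in plain Python; it is not the voomCLR method, which according to the abstract also includes a bias correction and mean-variance modeling. The function name `clr`, the `pseudocount` parameter, and the example counts are assumptions made for illustration.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's cell-type counts.

    Each value becomes log(count) minus the mean of the logs, so the
    transformed values sum to zero. The pseudocount (an assumed default)
    avoids log(0) for empty cell types.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Two hypothetical samples with the same proportions but a tenfold
# difference in total cells captured.
shallow = clr([120, 60, 20], pseudocount=0)
deep = clr([1200, 600, 200], pseudocount=0)
print(shallow)
print(deep)
```

With no pseudocount, the two samples give identical CLR values, because the transform depends only on ratios between cell types. That is the sense in which such counts carry only relative information.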

Citations: 0

Optimal transport reveals dynamic gene regulatory networks via gene velocity estimation

bioRxiv - Bioinformatics Pub Date : 2024-09-16 DOI: 10.1101/2024.09.12.612590

Wenjun Zhao, Erica Larschan, Bjorn Sandstede, Ritambhara Singh

Citations: 0

gsQTL: Associating genetic risk variants with gene sets by exploiting their shared variability

bioRxiv - Bioinformatics Pub Date : 2024-09-16 DOI: 10.1101/2024.09.13.612853

Gerard A Bouland, Niccolo Tesi, Ahmed Mahfouz, Marcel Reinders

Abstract: To investigate the functional significance of genetic risk loci identified through genome-wide association studies (GWASs), genetic loci are linked to genes based on their capacity to account for variation in gene expression, resulting in expression quantitative trait loci (eQTL). Following this, gene set analyses are commonly used to gain insights into functionality. However, the efficacy of this approach is hampered by small effect sizes and the burden of multiple testing. We propose an alternative approach: instead of examining the cumulative associations of individual genes within a gene set, we consider the collective variation of the entire gene set. We introduce the concept of gene set QTL (gsQTL) and show it to be more adept at identifying links between genetic risk variants and specific gene sets. Notably, gsQTL is less susceptible to inflation or deflation of significant enrichments than conventional methods. Furthermore, we demonstrate the broader applicability of shared variability within gene sets. This is evident in scenarios such as the coordinated regulation of genes by a transcription factor or coordinated differential expression.

Citations: 0

Deep Learning Predicts Non-Normal Peptide FAIMS Mobility Distributions Directly from Sequence

bioRxiv - Bioinformatics Pub Date : 2024-09-16 DOI: 10.1101/2024.09.11.612538

Justin McKetney, Ian J Miller, Alexandre Hutton, Pavel Sinitcyn, Joshua J Coon, Jesse G Meyer

Abstract: Peptide ion mobility adds an extra dimension of separation to mass spectrometry-based proteomics. The ability to accurately predict peptide ion mobility would be useful to expedite assay development and to discriminate true answers in database searches. There are methods to accurately predict peptide ion mobility through drift tube devices, but methods to predict mobility through high-field asymmetric waveform ion mobility spectrometry (FAIMS) are underexplored. Here, we successfully model peptide ions' FAIMS mobility using a multi-label multi-output classification scheme to account for non-normal transmission distributions. We trained two models on over 100,000 human peptide precursors: a random forest and a long short-term memory (LSTM) neural network. The two models had different strengths, and the ensemble average of model predictions produced a higher F2 score than either model alone. Finally, we explore cases where the models make mistakes and demonstrate predictive performance of F2=0.66 (AUROC=0.928) on a new test dataset of nearly 40,000 different E. coli peptide ions. The deep learning model is easily accessible via https://faims.xods.org.
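The F2 score reported above is the F-beta score with beta = 2, which weights recall more heavily than precision. The short sketch below shows the standard formula, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R). The function name `f_beta` and the example precision and recall values are illustrative assumptions, not figures from the paper.

```python
def f_beta(precision, recall, beta=2.0):
    """F-beta score from precision and recall; beta=2 gives the F2 score."""
    if precision == 0 and recall == 0:
        return 0.0  # conventional value when both inputs are zero
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Illustrative values only: swapping precision and recall changes F2,
# because recall counts for more when beta > 1.
high_recall = f_beta(precision=0.5, recall=1.0)
high_precision = f_beta(precision=1.0, recall=0.5)
print(high_recall, high_precision)
```

With beta = 1 the same function reduces to the familiar F1 score, the harmonic mean of precision and recall.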

Citations: 0

DeOri 10.0: An Updated Database of Experimentally Identified Eukaryotic Replication Origins

bioRxiv - Bioinformatics Pub Date : 2024-09-16 DOI: 10.1101/2024.09.12.612581

Yu-Hao Zeng, Zhen-Ning Yin, Hao Luo, Feng Gao

Abstract: DNA replication is a complex and crucial biological process in eukaryotes. To facilitate the study of eukaryotic replication events, we present the database of eukaryotic DNA replication origins (DeOri), which collects scattered data and integrates extensive sequencing data on eukaryotic DNA replication origins. With continuous updates of DeOri, the number of datasets in the new release increased from 10 to 151 and the number of sequences increased from 16,145 to 9,742,396. Besides nucleotide sequences and BED files, corresponding annotation files, such as coding sequences (CDS), mRNA, and other biological elements within replication origins, are also provided. The experimental techniques used for each dataset, as well as other statistical data, are also presented on the web page. Differences in experimental methods, cell lines, and sequencing technologies have resulted in distinct replication origins, making it challenging to differentiate between cell-specific and non-specific replication. We combined multiple replication origins at the species level, scored them, and screened them. The screened regions were considered species-conserved origins. They are integrated and presented as reference replication origins (rORIs) for Homo sapiens, Gallus gallus, Mus musculus, Drosophila melanogaster, and Caenorhabditis elegans. Additionally, we analyzed the genome-level distribution of genomic elements associated with replication origins, such as CpG islands (CGI), transcription start sites (TSS), and G-quadruplexes (G4). These analyses allow users to select the data they need. DeOri is available at http://tubic.tju.edu.cn/deori10/.

Citations: 0