{"title":"Revisiting the taxonomy of Enterococcus casseliflavus and related species","authors":"Matheus Miguel Soares de Medeiros Lima, Janira Prichula, Tetsu Sakamoto","doi":"10.1101/2024.09.16.613146","DOIUrl":"https://doi.org/10.1101/2024.09.16.613146","url":null,"abstract":"Enterococcus casseliflavus, a commonly motile, yellow-pigmented bacterium, is a commensal member of the gastrointestinal tract. It is occasionally found in cases of bacteremia and other human infections. A concern is that all strains of this species carry the vanC gene cluster on their chromosome, which confers resistance to vancomycin. The classification of E. casseliflavus is challenging, as it shows 99% identity in 16S analysis with E. gallinarum and, above all, with E. flavescens, with which it is often classified as a single species. This study aimed to revisit the taxonomy of E. casseliflavus and other related species by carrying out a comprehensive analysis of the genomic data available for these species in public databases. For this, 155 genomes of E. casseliflavus-related species (E. casseliflavus, E. flavescens, E. entomosocium, and E. innesii) were retrieved and submitted to Average Nucleotide Identity (ANI) and phylogenomic analysis. Both approaches showed three well-delineated clusters, which correspond to three Enterococcus species (E. casseliflavus, E. flavescens and E. innesii). Here we suggest (1) the removal of synonym status between E. flavescens and E. casseliflavus, and (2) addition of synonym status between E. entomosocium and E.
casseliflavus.","PeriodicalId":501307,"journal":{"name":"bioRxiv - Bioinformatics","volume":"207 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142268541","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"haCCA: Multi-module Integrating of spatial transcriptomes and metabolomes.","authors":"Xiaotian Shen, Xiaoyun Zhang","doi":"10.1101/2024.08.20.608773","DOIUrl":"https://doi.org/10.1101/2024.08.20.608773","url":null,"abstract":"Spatial techniques such as spatial transcriptomics and MALDI-MSI offer insights into both the transcripts and metabolites of tissue sections. However, integrating them with high accuracy is challenging because they share no spots or features. We present haCCA, a workflow designed to integrate spatial transcriptome and metabolome data using highly correlated feature pairs and modified spatial morphological alignment. This approach ensures high-resolution and accurate spot-to-spot data integration across neighboring tissue sections. We applied haCCA to publicly available 10X Visium and MALDI-MSI datasets from mouse brain tissue and to a custom spatial transcriptome and MALDI-MSI dataset from an intrahepatic cholangiocarcinoma (ICC) model, exploring the metabolic alterations induced by NETs (neutrophil extracellular traps) in ICC and finding a potential mechanism in which NETs upregulate Scd1 to activate fatty acid metabolism. This provides new insights into the dynamic crosstalk between genes and metabolites that regulates tumor biological behavior and drives the response to treatment. We developed and published an easy-to-use Python package to facilitate its use.","PeriodicalId":501307,"journal":{"name":"bioRxiv - Bioinformatics","volume":"3 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142250704","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Precise Basecalling of Short-Read Nanopore Sequencing","authors":"Ziyuan Wang, Mei-Juan Tu, Chengcheng Song, Ziyang Liu, Katherine K Wang, Shuibing Chen, Ai-Ming Yu, HONGXU DING","doi":"10.1101/2024.09.12.612746","DOIUrl":"https://doi.org/10.1101/2024.09.12.612746","url":null,"abstract":"The nanopore sequencing of short sequences, whose lengths are typically less than 0.3 kb and therefore comparable with those of Illumina sequencing techniques, has recently gained wide attention. Here, we design a scheme for training nanopore basecallers that are specialized for short biomolecules. With bioengineered RNA (BioRNA) molecules as examples, we demonstrate the superior accuracy of basecallers trained by our scheme.","PeriodicalId":501307,"journal":{"name":"bioRxiv - Bioinformatics","volume":"15 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142250173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PangeBlocks: customized construction of pangenome graphs via maximal blocks","authors":"Paola Bonizzoni, Jorge Eduardo Avila Cartes, Simone Ciccolella, Gianluca Della Vedova, Luca Denti","doi":"10.1101/2024.09.17.613426","DOIUrl":"https://doi.org/10.1101/2024.09.17.613426","url":null,"abstract":"Background: The construction of a pangenome graph is a fundamental task in pangenomics. A natural theoretical question is how to formalize the computational problem of building an optimal pangenome graph, making explicit the underlying optimization criterion and the set of feasible solutions. Current approaches build a pangenome graph with some heuristics, without assuming some explicit optimization criteria. Thus it is unclear how a specific optimization criterion affects the graph topology and downstream analysis, like read mapping and variant calling. Methods: In this paper, by leveraging the notion of maximal block in a Multiple Sequence Alignment (MSA), we reframe the pangenome graph construction problem as an exact cover problem on blocks called Minimum Weighted Block Cover (MWBC).
Then we propose an Integer Linear Programming (ILP) formulation for the MWBC problem that allows us to study the most natural objective functions for building a graph. Results: We provide an implementation of the ILP approach for solving the MWBC and we evaluate it on SARS-CoV-2 complete genomes, showing how different objective functions lead to pangenome graphs that have different properties, hinting that the specific downstream task can drive the graph construction phase. Conclusion: We show that a customized construction of a pangenome graph based on selecting objective functions has a direct impact on the resulting graphs. In particular, our formalization of the MWBC problem, based on finding an optimal subset of blocks covering an MSA, paves the way to novel practical approaches to graph representations of an MSA where the user can guide the construction.","PeriodicalId":501307,"journal":{"name":"bioRxiv - Bioinformatics","volume":"18 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142250174","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"easybio: an R Package for Single-Cell Annotation with CellMarker2.0","authors":"Cui Wei","doi":"10.1101/2024.09.14.609619","DOIUrl":"https://doi.org/10.1101/2024.09.14.609619","url":null,"abstract":"Single-cell RNA sequencing (scRNA-seq) allows researchers to study biological activities at the cellular level, enabling the discovery of new cell types and the analysis of intercellular interactions. However, annotating cell types in scRNA-seq data is a crucial and time-consuming process, with its quality significantly influencing downstream analyses. Accurate identification of potential cell types provides valuable insights for discovering new cell populations or identifying novel markers for known cells, which may be utilized in future research. While various methods exist for single-cell annotation, one of the most common approaches is to use known cell markers. The CellMarker2.0 database, a human-curated repository of cell markers extracted from published articles, is widely used for this purpose. However, it currently offers only a web-based tool for usage, which can be inconvenient when integrating with workflows like Seurat. To address this limitation, we introduce easybio, an R package designed to streamline single-cell annotation using the CellMarker2.0 database in conjunction with Seurat. easybio provides a suite of functions for querying the CellMarker2.0 database locally, offering insights into potential cell types for each cluster. 
In addition to single-cell annotation, the package also supports various bioinformatics workflows, including RNA-seq analysis, making it a versatile tool for transcriptomic research.","PeriodicalId":501307,"journal":{"name":"bioRxiv - Bioinformatics","volume":"29 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142250371","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Assessing differential cell composition in single-cell studies using voomCLR","authors":"Alemu Takele Assefa, Bie Verbist, Koen Van den Berge","doi":"10.1101/2024.09.12.612645","DOIUrl":"https://doi.org/10.1101/2024.09.12.612645","url":null,"abstract":"In single-cell studies, a common question is whether there is a change in cell composition between conditions. While ideally, one needs absolute cell counts (number of cells per volumetric unit in a sample) to address these questions, current experimentation typically obtains cell counts that only carry relative information. It is therefore crucial to account for the compositional nature of cell count data in the statistical analysis. While recently developed methods address compositionality using compositional transformations together with a bias correction, they do not account for the uncertainty involved in estimation of the bias term, nor do they accommodate the mean-variance structure of the counts. Here, we introduce a statistical method, voomCLR, for assessing differences in cell composition between conditions incorporating both uncertainty on the bias term as well as acknowledging the mean-variance structure of the transformed data, by leveraging developments from the differential gene expression literature. 
We demonstrate the performance of voomCLR, illustrate the benefit of each component, and compare the methodology to the state of the art on simulated and real single-cell gene expression datasets.","PeriodicalId":501307,"journal":{"name":"bioRxiv - Bioinformatics","volume":"2 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142250180","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimal transport reveals dynamic gene regulatory networks via gene velocity estimation","authors":"Wenjun Zhao, Erica Larschan, Bjorn Sandstede, Ritambhara Singh","doi":"10.1101/2024.09.12.612590","DOIUrl":"https://doi.org/10.1101/2024.09.12.612590","url":null,"abstract":"Inferring gene regulatory networks from gene expression data is an important and challenging problem in the biology community. We propose OTVelo, a methodology that takes time-stamped single-cell gene expression data as input and predicts gene regulation across two time points. It is known that the rate of change of gene expression, which we will refer to as gene velocity, provides crucial information that enhances such inference; however, this information is not always available due to the limitations in sequencing depth. Our algorithm overcomes this limitation by estimating gene velocities using optimal transport. We then infer gene regulation using time-lagged correlation and Granger causality via regularized linear regression. Instead of providing an aggregated network across all time points, our method uncovers the underlying dynamical mechanism across time points. We validate our algorithm on 13 simulated datasets with both synthetic and curated networks and demonstrate its efficacy on 4 experimental data sets.","PeriodicalId":501307,"journal":{"name":"bioRxiv - Bioinformatics","volume":"188 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142250181","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"gsQTL: Associating genetic risk variants with gene sets by exploiting their shared variability","authors":"Gerard A Bouland, Niccolo Tesi, Ahmed Mahfouz, Marcel Reinders","doi":"10.1101/2024.09.13.612853","DOIUrl":"https://doi.org/10.1101/2024.09.13.612853","url":null,"abstract":"To investigate the functional significance of genetic risk loci identified through genome-wide association studies (GWASs), genetic loci are linked to genes based on their capacity to account for variation in gene expression, resulting in expression quantitative trait loci (eQTL). Following this, gene set analyses are commonly used to gain insights into functionality. However, the efficacy of this approach is hampered by small effect sizes and the burden of multiple testing. We propose an alternative approach: instead of examining the cumulative associations of individual genes within a gene set, we consider the collective variation of the entire gene set. We introduce the concept of gene set QTL (gsQTL), and show it to be more adept at identifying links between genetic risk variants and specific gene sets. Notably, gsQTL is less susceptible to inflation or deflation of significant enrichments than conventional methods. Furthermore, we demonstrate the broader applicability of shared variability within gene sets.
This is evident in scenarios such as the coordinated regulation of genes by a transcription factor or coordinated differential expression.","PeriodicalId":501307,"journal":{"name":"bioRxiv - Bioinformatics","volume":"37 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142250368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DeOri 10.0: An Updated Database of Experimentally Identified Eukaryotic Replication Origins","authors":"Yu-Hao Zeng, Zhen-Ning Yin, Hao Luo, Feng Gao","doi":"10.1101/2024.09.12.612581","DOIUrl":"https://doi.org/10.1101/2024.09.12.612581","url":null,"abstract":"DNA replication is a complex and crucial biological process in eukaryotes. To facilitate the study of eukaryotic replication events, we present the database of eukaryotic DNA replication origins (DeOri), which collects scattered data and integrates extensive sequencing data on eukaryotic DNA replication origins. With continuous updates of DeOri, the number of datasets in the new release increased from 10 to 151 and the number of sequences increased from 16,145 to 9,742,396. Besides nucleotide sequences and BED files, corresponding annotation files, such as coding sequences (CDS), mRNA, and other biological elements within replication origins, are also provided. The experimental techniques used for each dataset, as well as other statistical data, are also presented on the web page. Differences in experimental methods, cell lines, and sequencing technologies have resulted in distinct replication origins, making it challenging to differentiate between cell-specific and non-specific replication. We combined multiple replication origins at the species level, scored them, and screened them. The screened regions were considered species-conserved origins. They are integrated and presented as reference replication origins (rORIs) for Homo sapiens, Gallus gallus, Mus musculus, Drosophila melanogaster, and Caenorhabditis elegans. Additionally, we analyzed the genome-level distribution of genomic elements associated with replication origins, such as CpG islands (CGIs), transcription start sites (TSSs), and G-quadruplexes (G4s). These analysis results allow users to select the data they require.
DeOri is available at http://tubic.tju.edu.cn/deori10/.","PeriodicalId":501307,"journal":{"name":"bioRxiv - Bioinformatics","volume":"36 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142268543","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Learning Predicts Non-Normal Peptide FAIMS Mobility Distributions Directly from Sequence","authors":"Justin McKetney, Ian J Miller, Alexandre Hutton, Pavel Sinitcyn, Joshua J Coon, Jesse G Meyer","doi":"10.1101/2024.09.11.612538","DOIUrl":"https://doi.org/10.1101/2024.09.11.612538","url":null,"abstract":"Peptide ion mobility adds an extra dimension of separation to mass spectrometry-based proteomics. The ability to accurately predict peptide ion mobility would be useful to expedite assay development and to discriminate true answers in database search. There are methods to accurately predict peptide ion mobility through drift tube devices, but methods to predict mobility through high-field asymmetric waveform ion mobility (FAIMS) are underexplored. Here, we successfully model peptide ions' FAIMS mobility using a multi-label multi-output classification scheme to account for non-normal transmission distributions. We trained two models from over 100,000 human peptide precursors: a random forest and a long short-term memory (LSTM) neural network. Both models had different strengths, and the ensemble average of model predictions produced a higher F2 score than either model alone. Finally, we explore cases where the models make mistakes and demonstrate predictive performance of F2=0.66 (AUROC=0.928) on a new test dataset of nearly 40,000 different E. coli peptide ions.
The deep learning model is easily accessible via https://faims.xods.org.","PeriodicalId":501307,"journal":{"name":"bioRxiv - Bioinformatics","volume":"18 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142250367","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}