bioRxiv - Bioinformatics: Latest Papers

Revisiting the taxonomy of Enterococcus casseliflavus and related species
bioRxiv - Bioinformatics Pub Date : 2024-09-17 DOI: 10.1101/2024.09.16.613146
Matheus Miguel Soares de Medeiros Lima, Janira Prichula, Tetsu Sakamoto
Enterococcus casseliflavus, a typically motile, yellow-pigmented bacterium, is a commensal member of the gastrointestinal tract. It is occasionally found in cases of bacteremia and other human infections. A concern is that all strains of this species carry the vanC gene group on their chromosome, which confers resistance to vancomycin. Classifying E. casseliflavus is challenging because its 16S rRNA sequence shares 99% identity with that of E. gallinarum and, above all, E. flavescens, with which it is often classified as a single species. This study aimed to revisit the taxonomy of E. casseliflavus and related species through a comprehensive analysis of the genomic data available for these species in public databases. To this end, 155 genomes of E. casseliflavus and related species (E. casseliflavus, E. flavescens, E. entomosocium, and E. innesii) were retrieved and subjected to Average Nucleotide Identity (ANI) and phylogenomic analyses. Both approaches revealed three well-delineated clusters corresponding to three Enterococcus species (E. casseliflavus, E. flavescens, and E. innesii). We therefore propose (1) removing the synonym status between E. flavescens and E. casseliflavus, and (2) establishing synonym status between E. entomosocium and E. casseliflavus.
Citations: 0
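ANI-based species delimitation, as used in the study above, conventionally treats genomes sharing roughly 95-96% ANI or more as belonging to the same species. A minimal sketch of turning pairwise ANI values into putative species clusters, assuming the ANI values have already been computed by an external tool; the genome labels and ANI values below are invented for illustration and are not data from the paper:

```python
def cluster_by_ani(genomes, ani, threshold=95.0):
    """Group genomes into putative species by single-linkage clustering at an ANI cutoff.

    `ani` maps frozenset({genome_a, genome_b}) to a percent identity computed elsewhere;
    pairs absent from `ani` are treated as below the cutoff.
    """
    parent = {g: g for g in genomes}

    def find(g):
        # Union-find root lookup with path halving
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for pair, identity in ani.items():
        a, b = tuple(pair)
        if identity >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for g in genomes:
        clusters.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical genome labels and ANI values (percent), for illustration only
genomes = ["cas_1", "cas_2", "fla_1", "fla_2", "inn_1"]
ani = {
    frozenset({"cas_1", "cas_2"}): 98.7,
    frozenset({"fla_1", "fla_2"}): 99.1,
    frozenset({"cas_1", "fla_1"}): 93.2,
    frozenset({"cas_1", "inn_1"}): 92.5,
    frozenset({"fla_1", "inn_1"}): 91.8,
}
print(cluster_by_ani(genomes, ani))
```

Single linkage is the simplest grouping rule, but it can chain otherwise distinct groups together through intermediate genomes, so a complementary phylogenomic check like the one the abstract describes is a useful safeguard.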
haCCA: Multi-module Integrating of spatial transcriptomes and metabolomes.
bioRxiv - Bioinformatics Pub Date : 2024-09-17 DOI: 10.1101/2024.08.20.608773
Xiaotian Shen, Xiaoyun Zhang
Spatial techniques such as spatial transcriptomics and MALDI-MSI offer insights into both the transcripts and the metabolites of tissue sections. However, integrating the two with high accuracy is challenging because they share no spots or features. We present haCCA, a workflow designed to integrate spatial transcriptome and metabolome data using highly correlated feature pairs and a modified spatial morphological alignment. This approach is designed to provide high-resolution, accurate spot-to-spot integration across neighboring tissue sections. We applied haCCA to publicly available 10X Visium and MALDI-MSI datasets from mouse brain tissue and to a custom spatial transcriptome and MALDI-MSI dataset from an intrahepatic cholangiocarcinoma (ICC) model. Exploring the metabolic alterations that neutrophil extracellular traps (NETs) induce in ICC, we identified a potential mechanism in which NETs upregulate Scd1 to activate fatty acid metabolism. These results provide new insights into the dynamic crosstalk between genes and metabolites that regulates tumor biological behavior and drives the response to treatment. We developed and published an easy-to-use Python package to facilitate its use.
Citations: 0
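The haCCA abstract relies on highly correlated gene-metabolite feature pairs. As an illustration only, the sketch below assumes the transcriptome and MALDI-MSI spots have already been matched one-to-one (the hard alignment step haCCA itself addresses) and simply ranks every gene-metabolite pair by absolute Pearson correlation across those spots. The feature names and values are hypothetical, and this is not haCCA's actual algorithm:

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def top_feature_pairs(genes, metabolites, k=2):
    """Return the k gene-metabolite pairs with the largest |r| over matched spots.

    genes, metabolites: feature name -> intensities over the SAME (already matched) spots.
    """
    pairs = [
        (pearson(g_vals, m_vals), g, m)
        for g, g_vals in genes.items()
        for m, m_vals in metabolites.items()
    ]
    pairs.sort(key=lambda p: -abs(p[0]))
    return [(g, m, round(r, 3)) for r, g, m in pairs[:k]]


# Hypothetical intensities over five matched spots
genes = {"gene_a": [1, 2, 3, 4, 5], "gene_b": [5, 3, 4, 1, 2]}
metabolites = {"m1": [2, 4, 6, 8, 10], "m2": [1, 1, 2, 1, 1]}
print(top_feature_pairs(genes, metabolites))
```

Ranking by absolute correlation keeps strongly anti-correlated pairs as well, since a negative relationship between a transcript and a metabolite can be just as informative as a positive one.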
The Precise Basecalling of Short-Read Nanopore Sequencing
bioRxiv - Bioinformatics Pub Date : 2024-09-17 DOI: 10.1101/2024.09.12.612746
Ziyuan Wang, Mei-Juan Tu, Chengcheng Song, Ziyang Liu, Katherine K Wang, Shuibing Chen, Ai-Ming Yu, Hongxu Ding
Nanopore sequencing of short sequences, typically shorter than 0.3 kb and therefore comparable in length to Illumina reads, has recently gained wide attention. Here, we design a scheme for training nanopore basecallers specialized for short biomolecules. Using bioengineered RNA (BioRNA) molecules as examples, we demonstrate the superior accuracy of basecallers trained with our scheme.
Citations: 0
PangeBlocks: customized construction of pangenome graphs via maximal blocks
bioRxiv - Bioinformatics Pub Date : 2024-09-17 DOI: 10.1101/2024.09.17.613426
Paola Bonizzoni, Jorge Eduardo Avila Cartes, Simone Ciccolella, Gianluca Della Vedova, Luca Denti
Background: The construction of a pangenome graph is a fundamental task in pangenomics. A natural theoretical question is how to formalize the computational problem of building an optimal pangenome graph, making explicit the underlying optimization criterion and the set of feasible solutions. Current approaches build pangenome graphs heuristically, without an explicit optimization criterion. It is therefore unclear how a specific optimization criterion affects the graph topology and downstream analyses such as read mapping and variant calling.
Methods: In this paper, by leveraging the notion of a maximal block in a Multiple Sequence Alignment (MSA), we reframe pangenome graph construction as an exact cover problem on blocks, called Minimum Weighted Block Cover (MWBC). We then propose an Integer Linear Programming (ILP) formulation of the MWBC problem that allows us to study the most natural objective functions for building a graph.
Results: We provide an implementation of the ILP approach for solving MWBC and evaluate it on complete SARS-CoV-2 genomes, showing that different objective functions lead to pangenome graphs with different properties and suggesting that the specific downstream task can drive the graph construction phase.
Conclusion: We show that customizing the construction of a pangenome graph by selecting the objective function has a direct impact on the resulting graphs. In particular, our formalization of the MWBC problem, based on finding an optimal subset of blocks covering an MSA, paves the way for novel practical approaches to graph representations of an MSA in which the user can guide the construction.
Citations: 0
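The PangeBlocks abstract frames graph construction as an exact cover of an MSA by blocks, solved with an ILP. As a toy illustration only, the sketch below uses exhaustive search instead of the authors' ILP, and a simplified block definition (any subset of rows spelling the same substring over a column interval, rather than only maximal blocks). It shows how changing the objective changes the optimum on a hypothetical three-row MSA:

```python
from itertools import combinations


def candidate_blocks(msa):
    """Enumerate blocks (rows, start, end): rows that spell the same string on columns [start, end)."""
    n_rows, n_cols = len(msa), len(msa[0])
    blocks = []
    for start in range(n_cols):
        for end in range(start + 1, n_cols + 1):
            # Group rows by the substring they spell on this interval
            groups = {}
            for r in range(n_rows):
                groups.setdefault(msa[r][start:end], []).append(r)
            # Any non-empty subset of identical rows forms a valid block
            for rows in groups.values():
                for k in range(1, len(rows) + 1):
                    for subset in combinations(rows, k):
                        blocks.append((frozenset(subset), start, end))
    return blocks


def min_weight_block_cover(msa, weight=lambda block: 1):
    """Exhaustive exact cover of every (row, column) cell by non-overlapping blocks.

    Returns (total weight, chosen blocks). Weights must be positive for the pruning to be valid.
    """
    blocks = candidate_blocks(msa)
    cells_of = [frozenset((r, c) for r in rows for c in range(s, e)) for rows, s, e in blocks]
    all_cells = [(r, c) for r in range(len(msa)) for c in range(len(msa[0]))]
    best = [float("inf"), None]

    def search(covered, chosen, cost):
        if cost >= best[0]:
            return  # cannot improve on the best cover found so far
        cell = next((x for x in all_cells if x not in covered), None)
        if cell is None:
            best[0], best[1] = cost, list(chosen)
            return
        # Branch on every block that covers the first uncovered cell without overlapping
        for block, cells in zip(blocks, cells_of):
            if cell in cells and not (cells & covered):
                chosen.append(block)
                search(covered | cells, chosen, cost + weight(block))
                chosen.pop()

    search(frozenset(), [], 0)
    return best[0], best[1]


msa = ["ACGT", "ACGA", "TCGA"]
# Objective 1: minimise the number of blocks (graph nodes)
print(min_weight_block_cover(msa)[0])
# Objective 2: minimise the total label length stored in the graph
print(min_weight_block_cover(msa, weight=lambda b: b[2] - b[1])[0])
```

On this toy MSA the block-count objective is already optimal with one block per sequence (no sharing), whereas the label-length objective rewards a single "CG" block shared by all three rows, mirroring the abstract's point that the objective shapes the graph topology.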
easybio: an R Package for Single-Cell Annotation with CellMarker2.0
bioRxiv - Bioinformatics Pub Date : 2024-09-16 DOI: 10.1101/2024.09.14.609619
Cui Wei
Single-cell RNA sequencing (scRNA-seq) allows researchers to study biological activity at cellular resolution, enabling the discovery of new cell types and the analysis of intercellular interactions. However, annotating cell types in scRNA-seq data is crucial but time-consuming, and its quality strongly influences downstream analyses. Accurate identification of potential cell types provides valuable insights for discovering new cell populations or identifying novel markers of known cells, which may be used in future research. While various methods exist for single-cell annotation, one of the most common approaches is to use known cell markers. The CellMarker2.0 database, a manually curated repository of cell markers extracted from published articles, is widely used for this purpose. However, it currently offers only a web-based interface, which is inconvenient to integrate with workflows such as Seurat. To address this limitation, we introduce easybio, an R package designed to streamline single-cell annotation using the CellMarker2.0 database in conjunction with Seurat. easybio provides a suite of functions for querying the CellMarker2.0 database locally and suggesting potential cell types for each cluster. In addition to single-cell annotation, the package supports other bioinformatics workflows, including RNA-seq analysis, making it a versatile tool for transcriptomic research.
Citations: 0
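Marker-based annotation of the kind easybio automates can be sketched as ranking candidate cell types by the overlap between a cluster's marker genes and each cell type's curated markers. The sketch below is written in Python rather than R, uses a tiny hand-written marker table as a stand-in for CellMarker2.0, and scores overlap with the Jaccard index; the scoring rule is an assumption for illustration, not easybio's documented method:

```python
def annotate_clusters(cluster_markers, marker_db, top_n=3):
    """Rank candidate cell types for each cluster by Jaccard overlap with a marker table.

    cluster_markers: cluster id -> marker genes detected for that cluster
    marker_db: cell type -> set of curated marker genes (hypothetical stand-in for CellMarker2.0)
    """
    result = {}
    for cluster, genes in cluster_markers.items():
        genes = set(genes)
        scores = []
        for cell_type, markers in marker_db.items():
            overlap = genes & markers
            if overlap:
                scores.append((len(overlap) / len(genes | markers), cell_type))
        # Highest Jaccard first; ties broken alphabetically for reproducibility
        scores.sort(key=lambda s: (-s[0], s[1]))
        result[cluster] = [cell_type for _, cell_type in scores[:top_n]]
    return result


# Illustrative marker table and cluster markers
marker_db = {
    "T cell": {"CD3D", "CD3E", "CD2"},
    "B cell": {"CD19", "MS4A1", "CD79A"},
    "NK cell": {"NKG7", "GNLY", "KLRD1"},
}
cluster_markers = {
    "0": ["CD3E", "CD3D", "IL7R"],
    "1": ["MS4A1", "CD79A", "CD74"],
    "2": ["GNLY", "NKG7", "CD3E"],
}
print(annotate_clusters(cluster_markers, marker_db))
```

Returning a short ranked list rather than a single label keeps ambiguous clusters (here, cluster "2" overlaps both NK and T cell markers) visible for manual review.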
Assessing differential cell composition in single-cell studies using voomCLR
bioRxiv - Bioinformatics Pub Date : 2024-09-16 DOI: 10.1101/2024.09.12.612645
Alemu Takele Assefa, Bie Verbist, Koen Van den Berge
In single-cell studies, a common question is whether cell composition changes between conditions. Ideally, answering this question requires absolute cell counts (the number of cells per unit volume in a sample), but current experiments typically yield cell counts that carry only relative information. It is therefore crucial to account for the compositional nature of cell count data in the statistical analysis. Recently developed methods address compositionality using compositional transformations together with a bias correction, but they account neither for the uncertainty in estimating the bias term nor for the mean-variance structure of the counts. Here, we introduce voomCLR, a statistical method for assessing differences in cell composition between conditions that incorporates uncertainty in the bias term and accounts for the mean-variance structure of the transformed data by leveraging developments from the differential gene expression literature. We demonstrate the performance of voomCLR, illustrate the benefit of each of its components, and compare it with state-of-the-art methods on simulated and real single-cell gene expression datasets.
Citations: 0
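The compositional problem described in the voomCLR abstract is visible directly in the centred log-ratio (CLR) transform, CLR(x)_i = log(x_i) - mean_j log(x_j). A minimal sketch of the transform alone (voomCLR's modelling of bias uncertainty and the mean-variance trend is not reproduced here), using hypothetical counts in which only one cell type truly expands; the pseudocount value is an arbitrary assumption:

```python
import math


def clr(counts, pseudocount=0.5):
    """Centred log-ratio transform of one sample's cell-type counts.

    The pseudocount keeps log() defined for cell types with zero cells; 0.5 is an arbitrary choice.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    centre = sum(logs) / len(logs)
    return [v - centre for v in logs]


# Hypothetical counts: only the third cell type truly expands between conditions
control = [100, 100, 100]
treated = [100, 100, 400]
shift = [b - a for a, b in zip(clr(control, 0), clr(treated, 0))]
print([round(s, 3) for s in shift])
```

Although only the third cell type changes, the two unchanged types shift by -ln(4)/3 ≈ -0.46 on the CLR scale. This shared offset is the kind of bias term that, according to the abstract, existing methods correct for and voomCLR additionally treats as uncertain.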
Optimal transport reveals dynamic gene regulatory networks via gene velocity estimation
bioRxiv - Bioinformatics Pub Date : 2024-09-16 DOI: 10.1101/2024.09.12.612590
Wenjun Zhao, Erica Larschan, Bjorn Sandstede, Ritambhara Singh
Inferring gene regulatory networks from gene expression data is an important and challenging problem in biology. We propose OTVelo, a method that takes time-stamped single-cell gene expression data as input and predicts gene regulation between two time points. The rate of change of gene expression, which we refer to as gene velocity, is known to provide crucial information for such inference; however, it is not always available because of limited sequencing depth. Our algorithm overcomes this limitation by estimating gene velocities using optimal transport. We then infer gene regulation using time-lagged correlation and Granger causality via regularized linear regression. Rather than providing a single network aggregated across all time points, our method uncovers the underlying dynamics between time points. We validate our algorithm on 13 simulated datasets with both synthetic and curated networks and demonstrate its efficacy on 4 experimental datasets.
Citations: 0
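The velocity-estimation idea in the OTVelo abstract can be illustrated in its simplest discrete form. With equal numbers of cells at two time points, uniform weights, and a squared Euclidean cost, the optimal transport plan is a one-to-one matching (a permutation), so for a handful of cells it can be found by brute force; each cell's velocity is then its displacement to its matched partner divided by the time step. This is a sketch under those assumptions, not OTVelo's implementation, and the expression values are hypothetical:

```python
from itertools import permutations


def sq_dist(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ot_velocity(cells_t0, cells_t1, dt=1.0):
    """Per-cell gene velocities from an optimal one-to-one matching of cells at t0 to cells at t1."""
    n = len(cells_t0)
    assert n == len(cells_t1), "this sketch assumes equal cell counts at both time points"
    # Brute-force OT: the cheapest permutation minimises total squared transport cost
    best_perm = min(
        permutations(range(n)),
        key=lambda p: sum(sq_dist(cells_t0[i], cells_t1[p[i]]) for i in range(n)),
    )
    return [
        [(y - x) / dt for x, y in zip(cells_t0[i], cells_t1[best_perm[i]])]
        for i in range(n)
    ]


# Two hypothetical cells measured on two genes at consecutive time points
t0 = [[0.0, 1.0], [5.0, 5.0]]
t1 = [[5.5, 4.0], [0.5, 1.5]]
print(ot_velocity(t0, t1))
```

Brute force costs n! evaluations, so anything beyond a few cells would need a proper solver (for example a linear assignment or Sinkhorn routine); the example only shows how a transport plan converts unpaired snapshots into velocity estimates.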
gsQTL: Associating genetic risk variants with gene sets by exploiting their shared variability
bioRxiv - Bioinformatics Pub Date : 2024-09-16 DOI: 10.1101/2024.09.13.612853
Gerard A Bouland, Niccolo Tesi, Ahmed Mahfouz, Marcel Reinders
To investigate the functional significance of genetic risk loci identified by genome-wide association studies (GWASs), loci are linked to genes based on their capacity to explain variation in gene expression, yielding expression quantitative trait loci (eQTLs). Gene set analyses are then commonly used to gain functional insight. However, the efficacy of this approach is hampered by small effect sizes and the burden of multiple testing. We propose an alternative: instead of examining the cumulative associations of individual genes within a gene set, we consider the collective variation of the entire gene set. We introduce the concept of the gene set QTL (gsQTL) and show that it is more adept at identifying links between genetic risk variants and specific gene sets. Notably, gsQTL is less susceptible to inflation or deflation of significant enrichments than conventional methods. Furthermore, we demonstrate the broader applicability of shared variability within gene sets, for example in the coordinated regulation of genes by a transcription factor or in coordinated differential expression.
Citations: 0
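The gsQTL abstract does not specify how a gene set's collective variation is summarised. One plausible illustration, assumed here rather than taken from the paper, is to score each sample on the first principal component of the gene set's expression and correlate that score with genotype dosage, giving one test per gene set instead of one per gene. The data below are simulated:

```python
import numpy as np


def gene_set_score(expr):
    """Per-sample score along the first principal component of a samples x genes matrix."""
    centred = expr - expr.mean(axis=0)
    # Rows of vt are the principal axes; project every sample onto the first one
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[0]


def gene_set_association(dosage, expr):
    """Pearson r between genotype dosage (0/1/2) and the gene-set score.

    The sign of a principal component is arbitrary, so only |r| is meaningful.
    """
    return float(np.corrcoef(dosage, gene_set_score(expr))[0, 1])


# Simulated data: one variant shifts all 10 genes of a set by a small amount each
rng = np.random.default_rng(0)
dosage = rng.integers(0, 3, size=200)
expr = 0.5 * dosage[:, None] + rng.normal(size=(200, 10))
print(round(abs(gene_set_association(dosage, expr)), 2))
```

Because the small per-gene effects add up along the shared axis, the set-level score correlates more strongly with the variant than any single gene does on average, which is the intuition behind testing collective rather than individual variation.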
DeOri 10.0: An Updated Database of Experimentally Identified Eukaryotic Replication Origins
bioRxiv - Bioinformatics Pub Date : 2024-09-16 DOI: 10.1101/2024.09.12.612581
Yu-Hao Zeng, Zhen-Ning Yin, Hao Luo, Feng Gao
DNA replication is a complex and crucial biological process in eukaryotes. To facilitate the study of eukaryotic replication events, we present the Database of Eukaryotic DNA Replication Origins (DeOri), which collects scattered data and integrates extensive sequencing data on eukaryotic DNA replication origins. With continuous updates of DeOri, the number of datasets in the new release has increased from 10 to 151, and the number of sequences from 16,145 to 9,742,396. Besides nucleotide sequences and BED files, corresponding annotation files, such as coding sequences (CDS), mRNA, and other biological elements within replication origins, are also provided. The experimental techniques used for each dataset, along with other statistics, are also presented on the web page. Differences in experimental methods, cell lines, and sequencing technologies have produced distinct sets of replication origins, making it challenging to distinguish cell-specific from non-specific replication. We therefore combined multiple replication-origin datasets at the species level, scored them, and screened them; the retained regions were considered species-level conserved origins. They are integrated and presented as reference replication origins (rORIs) for Homo sapiens, Gallus gallus, Mus musculus, Drosophila melanogaster, and Caenorhabditis elegans. Additionally, we analyzed the genome-wide distribution of genomic elements associated with replication origins, such as CpG islands (CGIs), transcription start sites (TSSs), and G-quadruplexes (G4s). These analyses help users select the data they need. DeOri is available at http://tubic.tju.edu.cn/deori10/.
Citations: 0
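The DeOri abstract says that origins from multiple datasets were combined, scored, and screened at the species level, but it does not give the scoring rule. A minimal sketch under an assumed rule: score each base by the number of datasets that call an origin there, and keep regions supported by at least `min_support` datasets. It works on half-open intervals on a single chromosome, and the coordinates are hypothetical:

```python
def consensus_regions(datasets, min_support):
    """Regions covered by at least `min_support` datasets (one chromosome, half-open intervals)."""
    events = []
    for intervals in datasets:
        # Merge overlapping or touching intervals so each dataset votes at most once per base
        merged = []
        for start, end in sorted(intervals):
            if merged and start <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        for start, end in merged:
            events.append((start, 1))
            events.append((end, -1))
    # At equal positions, process openings before closings so abutting support is joined
    events.sort(key=lambda e: (e[0], -e[1]))
    regions, depth, region_start = [], 0, None
    for pos, delta in events:
        depth += delta
        if depth >= min_support and region_start is None:
            region_start = pos
        elif depth < min_support and region_start is not None:
            if pos > region_start:  # skip zero-length regions
                regions.append((region_start, pos))
            region_start = None
    return regions


# Three hypothetical origin datasets on the same chromosome
datasets = [[(100, 300)], [(150, 250), (240, 400)], [(500, 600)]]
print(consensus_regions(datasets, min_support=2))
```

A sweep over sorted interval endpoints keeps this linear in the number of intervals after sorting, which matters at the scale of millions of origin sequences mentioned in the abstract.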
Deep Learning Predicts Non-Normal Peptide FAIMS Mobility Distributions Directly from Sequence
bioRxiv - Bioinformatics Pub Date : 2024-09-16 DOI: 10.1101/2024.09.11.612538
Justin McKetney, Ian J Miller, Alexandre Hutton, Pavel Sinitcyn, Joshua J Coon, Jesse G Meyer
Peptide ion mobility adds an extra dimension of separation to mass spectrometry-based proteomics. The ability to accurately predict peptide ion mobility would help expedite assay development and discriminate true identifications in database searches. Methods exist to accurately predict peptide ion mobility through drift tube devices, but methods to predict mobility through high-field asymmetric waveform ion mobility spectrometry (FAIMS) are underexplored. Here, we successfully model the FAIMS mobility of peptide ions using a multi-label, multi-output classification scheme that accounts for non-normal transmission distributions. We trained two models on over 100,000 human peptide precursors: a random forest and a long short-term memory (LSTM) neural network. The two models had different strengths, and the ensemble average of their predictions produced a higher F2 score than either model alone. Finally, we explore cases where the models make mistakes and demonstrate a predictive performance of F2 = 0.66 (AUROC = 0.928) on a new test dataset of nearly 40,000 different E. coli peptide ions. The deep learning model is easily accessible at https://faims.xods.org.
Citations: 0
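The F2 score reported in the FAIMS abstract is the F-beta measure with beta = 2, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), which weights recall more heavily than precision. The sketch below computes a micro-averaged F2 over multi-label predictions and shows the ensemble step of averaging two models' probabilities before thresholding. Treating each label as one FAIMS compensation-voltage bin, the 0.5 cutoff, and micro-averaging are all assumptions, and the probabilities are a constructed toy example, not data from the paper:

```python
def binarize(probs, cutoff=0.5):
    """Threshold per-label probabilities into 0/1 labels."""
    return [[int(p >= cutoff) for p in row] for row in probs]


def ensemble_predict(prob_a, prob_b, cutoff=0.5):
    """Average two models' per-label probabilities, then threshold."""
    return [[int((a + b) / 2 >= cutoff) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(prob_a, prob_b)]


def f_beta(true_rows, pred_rows, beta=2.0):
    """Micro-averaged F-beta over all (peptide, label) cells."""
    tp = fp = fn = 0
    for t_row, p_row in zip(true_rows, pred_rows):
        for t, p in zip(t_row, p_row):
            if t and p:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Two peptides x three hypothetical CV bins: 1 = peptide transmitted at that CV
y_true = [[1, 1, 0], [0, 1, 1]]
model_a = [[0.9, 0.4, 0.1], [0.1, 0.8, 0.7]]
model_b = [[0.7, 0.8, 0.6], [0.2, 0.6, 0.4]]
for name, preds in [("A", binarize(model_a)), ("B", binarize(model_b)),
                    ("ensemble", ensemble_predict(model_a, model_b))]:
    print(name, round(f_beta(y_true, preds), 3))
```

The toy probabilities were chosen so that each model's errors are partly cancelled by the other's, which is the situation in which averaging helps; real gains depend on how complementary the two models are.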