BMC Bioinformatics: Latest Articles

Cross-validation for training and testing co-occurrence network inference algorithms.
IF 2.9, Zone 3, Biology
BMC Bioinformatics 26(1): 74. Pub Date: 2025-03-06. DOI: 10.1186/s12859-025-06083-7
Daniel Agyapong, Jeffrey Ryan Propster, Jane Marks, Toby Dylan Hocking
Background: Microorganisms are found in almost every environment, including soil, water, air and inside other organisms, such as animals and plants. While some microorganisms cause diseases, most of them help in biological processes such as decomposition, fermentation and nutrient cycling. Much research has examined microbial communities in various environments and how their interactions and relationships can provide insight into various diseases. Co-occurrence network inference algorithms help us understand the complex associations of microorganisms, especially bacteria. Existing network inference algorithms employ techniques such as correlation, regularized linear regression, and conditional dependence, which have different hyper-parameters that determine the sparsity of the network. These complex microbial communities form intricate ecological networks that are fundamental to ecosystem functioning and host health. Understanding these networks is crucial for developing targeted interventions in both environmental and clinical settings. The emergence of high-throughput sequencing technologies has generated unprecedented amounts of microbiome data, necessitating robust computational methods for network inference and validation.
Results: Previous methods for evaluating the quality of the inferred network include using external data and measuring network consistency across sub-samples, both of which have several drawbacks that limit their applicability to real microbiome composition data sets. We propose a novel cross-validation method to evaluate co-occurrence network inference algorithms, and new methods for applying existing algorithms to predict on test data. Our method demonstrates superior performance in handling compositional data and addressing the challenges of high dimensionality and sparsity inherent in real microbiome datasets. The proposed framework also provides robust estimates of network stability.
Conclusions: Our empirical study shows that the proposed cross-validation method is useful for hyper-parameter selection (training) and for comparing the quality of inferred networks between different algorithms (testing). This advancement represents a significant step forward in microbiome network analysis, providing researchers with a reliable tool for understanding complex microbial interactions. The method's applicability extends beyond microbiome studies to other fields where network inference from high-dimensional compositional data is crucial, such as gene regulatory networks and ecological food webs. Our framework establishes a new standard for validation in network inference, potentially accelerating discoveries in microbial ecology and human health.
Citations: 0
Robust phylogenetic tree-based microbiome association test using repeatedly measured data for composition bias.
IF 2.9, Zone 3, Biology
BMC Bioinformatics 26(1): 75. Pub Date: 2025-03-06. DOI: 10.1186/s12859-024-06002-2
Kangjin Kim, Sungho Won
Background: The effects of microbiota on host phenotypes can differ substantially depending on age. Longitudinally measured microbiome data allow for the detection of the age modification effect and are useful for the detection of microorganisms related to the progression of disease whose identification changes over time. Moreover, longitudinal analysis facilitates the estimation of the within-subject covariate effect, is robust to between-subject confounders, and provides better evidence for a causal relationship than cross-sectional studies. However, this method of analysis is limited by compositional bias, and few statistical methods can estimate the effect of microbiota on host diseases with repeatedly measured 16S rRNA gene data. Herein, we propose mTMAT, which is applicable to longitudinal microbiome data and is robust to compositional bias.
Results: mTMAT normalizes the microbial abundance and uses the ratio of the pooled abundance for association analysis. mTMAT is based on generalized estimating equations with a robust variance estimator and can be applied to repeatedly measured microbiome data. The robustness of mTMAT against compositional bias is underscored by its use of abundance ratios.
Conclusions: With extensive simulation studies, we showed that mTMAT is statistically relatively powerful and is robust to compositional bias. mTMAT enables detection of microbial taxa associated with host diseases using repeatedly measured 16S rRNA gene data and can provide deeper insights into bacterial pathology.
Citations: 0
TRain: T-cell receptor automated immunoinformatics.
IF 2.9, Zone 3, Biology
BMC Bioinformatics 26(1): 76. Pub Date: 2025-03-06. DOI: 10.1186/s12859-025-06074-8
Austin Seamann, Maia Bennett-Boehm, Ryan Ehrlich, Anna Gil, Liisa Selin, Dario Ghersi
Background: The scarcity of available structural data makes characterizing the binding of T-cell receptors (TCRs) to peptide-Major Histocompatibility Complexes (pMHCs) very challenging. The recent surge in sequencing data makes TCRs an ideal target for protein structure modeling. Through these 3D models, researchers can potentially identify key motifs on the TCR's binding regions. Furthermore, computational methods can be employed to pair a TCR structure with a pMHC, leading to predictions of docked TCRpMHC structures. However, going from sequence to predicted 3D TCRpMHC complexes requires a non-trivial number of steps and specialized immunoinformatics expertise.
Results: We developed a Python tool named TRain (T-cell Receptor Automated ImmunoiNformatics) to streamline this process by: (1) converting single-cell sequencing data into full TCR amino acid sequences; (2) efficiently submitting TCR amino acid sequences to existing TCR-specific modeling pipelines; (3) pairing modeled TCR structures with existing crystal structures of pMHC complexes in a non-biased manner before docking; (4) automating the preparation and submission process of TCRs and pMHCs for docking using the RosettaDock tool; and (5) providing scripts to analyze the predicted TCRpMHC interface. We illustrate the basic functionality of TRain with a case study, while further information can be found in a dedicated manual.
Conclusions: We introduced an open-source tool that streamlines going from full TCR sequence information to predicted 3D TCRpMHC complexes, using well-established tools. Analyzing these predicted complexes can provide deeper insights into the binding properties of TCRs, and can help shed light on one of the key steps in adaptive immune responses.
Citations: 0
A distribution-guided Mapper algorithm.
IF 2.9, Zone 3, Biology
BMC Bioinformatics 26(1): 73. Pub Date: 2025-03-05. DOI: 10.1186/s12859-025-06085-5
Yuyang Tao, Shufei Ge
Background: The Mapper algorithm is an essential tool for exploring the shape of data in topological data analysis. With a dataset as input, the Mapper algorithm outputs a graph representing the topological features of the whole dataset. This graph is often regarded as an approximation of a Reeb graph of the dataset. The classic Mapper algorithm uses fixed interval lengths and overlapping ratios, which might fail to reveal subtle features of a dataset, especially when the underlying structure is complex.
Results: In this work, we introduce a distribution-guided Mapper algorithm named D-Mapper, which uses the properties of a probability model and the intrinsic characteristics of the data to generate density-guided covers and provide enhanced topological features. Moreover, we introduce a metric accounting for both the quality of overlap clustering and extended persistent homology to measure the performance of Mapper-type algorithms. Our numerical experiments indicate that D-Mapper outperforms the classic Mapper algorithm in various scenarios. We also apply D-Mapper to a SARS-CoV-2 coronavirus RNA sequence dataset to explore the topological structure of different virus variants. The results indicate that the D-Mapper algorithm can reveal both the vertical and horizontal evolutionary processes of the viruses. Our code is available at https://github.com/ShufeiGe/D-Mapper .
Conclusion: The D-Mapper algorithm can generate covers from data based on a probability model. This work demonstrates the power of fusing probabilistic models with Mapper algorithms.
Citations: 0
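
As a rough illustration of the fixed-length, fixed-overlap cover that the abstract above attributes to the classic Mapper algorithm, the sketch below builds such a cover over one-dimensional filter values in plain Python. It is not the D-Mapper code from the authors' repository; the function names `uniform_cover` and `pullback`, the parameters `n_intervals` and `overlap`, and the toy filter values are all hypothetical choices made for this example.

```python
def uniform_cover(values, n_intervals, overlap):
    """Return n_intervals equal-length intervals spanning [min, max].

    overlap is the fraction of each interval shared with its neighbour
    (0 <= overlap < 1). Names and parameters are illustrative assumptions.
    """
    lo, hi = min(values), max(values)
    # Length chosen so that n overlapping intervals span the full range.
    length = (hi - lo) / (n_intervals - (n_intervals - 1) * overlap)
    step = length * (1 - overlap)
    return [(lo + i * step, lo + i * step + length) for i in range(n_intervals)]


def pullback(values, cover):
    """For each interval, list the indices of the values that fall inside it."""
    return [[i for i, v in enumerate(values) if a <= v <= b] for a, b in cover]


# Toy filter values (made up for illustration).
filter_values = [0.0, 1.0, 3.0, 5.0, 9.0, 10.0]
cover = uniform_cover(filter_values, n_intervals=4, overlap=0.5)
members = pullback(filter_values, cover)
```

In a full Mapper pipeline the points in each interval would next be clustered, and clusters that share points would be joined by edges; that step is omitted here. Because every interval has the same length regardless of how the points are distributed, sparse and dense regions are treated alike, which is the limitation the abstract says a density-guided cover is meant to address.
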
Dual-approach co-expression analysis framework (D-CAF) enables identification of novel circadian co-regulation from multi-omic timeseries data.
IF 2.9, Zone 3, Biology
BMC Bioinformatics 26(1): 72. Pub Date: 2025-03-04. DOI: 10.1186/s12859-025-06089-1
Joshua Chuah, Carmalena V Cordi, Juergen Hahn, Jennifer M Hurley
Background: The circadian clock is a central driver of many biological and behavioral processes, regulating the levels of many genes and proteins, termed clock-controlled genes and proteins (CCGs/CCPs), to impart biological timing at the molecular level. While transcriptomic and proteomic data have been analyzed to find potential CCGs and CCPs, multi-omic modeling of circadian data, which has the potential to enhance the understanding of circadian control of biological timing, remains relatively rare due to several methodological hurdles. To address this gap, a dual-approach co-expression analysis framework (D-CAF) was created to perform co-expression analysis that is robust to Gaussian noise perturbations on time-series measurements of both transcripts and proteins.
Results: Applying D-CAF to previously collected transcriptomic and proteomic data from mouse macrophages sampled over circadian time, we identified small, highly significant clusters of oscillating transcripts and proteins in the unweighted similarity matrices and larger, less significant clusters of oscillating transcripts and proteins using the weighted similarity network. Functional enrichment analysis of these clusters identified novel immunological response pathways that appear to be under circadian control.
Conclusions: Overall, our findings suggest that D-CAF is a tool that can be used by the circadian community to integrate multi-omic circadian data to improve our understanding of the mechanisms of circadian regulation of molecular processes.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11881278/pdf/
Citations: 0
ProToDeviseR: an automated protein topology scheme generator.
IF 2.9, Zone 3, Biology
BMC Bioinformatics 26(1): 71. Pub Date: 2025-03-03. DOI: 10.1186/s12859-025-06088-2
Petar Petrov, Valerio Izzi
Background: Amino acid sequence characterization is a fundamental part of virtually any protein analysis, and creating concise and clear protein topology schemes is of high importance in proteomics studies. Although numerous databases and prediction servers exist, it is challenging to incorporate data from various, and sometimes conflicting, resources into a publication-ready scheme.
Results: Here, we present the Protein Topology Deviser R package (ProToDeviseR) for the automatic generation of protein topology schemes from database accession numbers, raw results from multiple prediction servers, or a manually prepared table of features. The application offers a graphical user interface, implemented in R Shiny, hosting an enhanced version of Pfam's domains generator for the rendering of visually appealing schemes.
Conclusions: ProToDeviseR can easily and quickly generate topology schemes by interrogating UniProt or NCBI GenPept databases and elegantly combine features from various resources.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11874827/pdf/
Citations: 0
PyPropel: a Python-based tool for efficiently processing and characterising protein data.
IF 2.9, Zone 3, Biology
BMC Bioinformatics 26(1): 70. Pub Date: 2025-03-01. DOI: 10.1186/s12859-025-06079-3
Jianfeng Sun, Jinlong Ru, Adam P Cribbs, Dapeng Xiong
Background: The volume of protein sequence data has grown exponentially in recent years, driven by advancements in metagenomics. Despite this, a substantial proportion of these sequences remain poorly annotated, underscoring the need for robust bioinformatics tools to facilitate efficient characterisation and annotation for functional studies.
Results: We present PyPropel, a Python-based computational tool developed to streamline the large-scale analysis of protein data, with a particular focus on applications in machine learning. PyPropel integrates sequence and structural data pre-processing, feature generation, and post-processing for model performance evaluation and visualisation, offering a comprehensive solution for handling complex protein datasets.
Conclusion: PyPropel provides added value over existing tools by offering a unified workflow that encompasses the full spectrum of protein research, from raw data pre-processing to functional annotation and model performance analysis, thereby supporting efficient protein function studies.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11871610/pdf/
Citations: 0
Analyzing microbiome data with taxonomic misclassification using a zero-inflated Dirichlet-multinomial model.
IF 2.9, Zone 3, Biology
BMC Bioinformatics 26(1): 69. Pub Date: 2025-02-27. DOI: 10.1186/s12859-025-06078-4
Matthew D Koslovsky
Abstract: The human microbiome is the collection of microorganisms living on and inside of our bodies. A major aim of microbiome research is understanding the role microbial communities play in human health with the goal of designing personalized interventions that modulate the microbiome to treat or prevent disease. Microbiome data are challenging to analyze due to their high dimensionality, overdispersion, and zero-inflation. Analysis is further complicated by the steps taken to collect and process microbiome samples. For example, sequencing instruments have a fixed capacity for the total number of reads delivered. It is therefore essential to treat microbial samples as compositional. Another complicating factor of modeling microbiome data is that taxa counts are subject to measurement error introduced at various stages of the measurement protocol. Advances in sequencing technology and preprocessing pipelines coupled with our growing knowledge of the human microbiome have reduced, but not eliminated, measurement error. Ignoring measurement error during analysis, though common in practice, can then lead to biased inference and curb reproducibility. We propose a Dirichlet-multinomial modeling framework for microbiome data with excess zeros and potential taxonomic misclassification. We demonstrate how accommodating taxonomic misclassification improves estimation performance and investigate differences in gut microbial composition between healthy and obese children.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11869466/pdf/
Citations: 0
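
The abstract's remark that a fixed read capacity forces samples to be treated as compositional can be illustrated with a closure (relative-abundance) step and a centred log-ratio transform. This is a generic compositional-data sketch, not the zero-inflated Dirichlet-multinomial model the paper proposes; the pseudocount of 0.5 used to handle zero counts, the helper names, and the toy counts are assumptions of this example.

```python
import math


def closure(counts):
    """Rescale raw counts so they sum to 1 (relative abundances)."""
    total = sum(counts)
    return [c / total for c in counts]


def clr(counts, pseudocount=0.5):
    """Centred log-ratio transform.

    A pseudocount is added first because log(0) is undefined; the value
    0.5 is an arbitrary choice for this sketch.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Two toy samples with the same composition but a tenfold difference in depth.
sample_a = [10, 30, 60, 0]
sample_b = [100, 300, 600, 0]

proportions_a = closure(sample_a)
proportions_b = closure(sample_b)
clr_a = clr(sample_a)
```

After closure the two samples are indistinguishable, which shows that the total count reflects sequencing depth rather than biology. Note that the zero in the last position is simply shifted by the pseudocount here, whereas the paper's framework models excess zeros explicitly.
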
Comparative Assessment of Protein Large Language Models for Enzyme Commission Number Prediction.
IF 2.9, Zone 3, Biology
BMC Bioinformatics 26(1): 68. Pub Date: 2025-02-27. DOI: 10.1186/s12859-025-06081-9
João Capela, Maria Zimmermann-Kogadeeva, Aalt D J van Dijk, Dick de Ridder, Oscar Dias, Miguel Rocha
Background: Protein large language models (LLMs) have been used to extract representations of enzyme sequences to predict their function, which is encoded by enzyme commission (EC) numbers. However, a comprehensive comparison of different LLMs for this task is still lacking, leaving questions about their relative performance. Moreover, protein sequence alignments (e.g. BLASTp or DIAMOND) are often combined with machine learning models to assign EC numbers from homologous enzymes, thus compensating for the shortcomings of these models' predictions. In this context, LLMs and sequence alignment methods have not been extensively compared as individual predictors, raising unaddressed questions about LLMs' performance and limitations relative to the alignment methods. In this study, we set out to assess the performance of the ESM2, ESM1b, and ProtBERT language models in predicting EC numbers, comparing them with BLASTp, against each other and against models that rely on one-hot encodings of amino acid sequences.
Results: Our findings reveal that combining these LLMs with fully connected neural networks surpasses the performance of deep learning (DL) models that rely on one-hot encodings. Moreover, although BLASTp provided marginally better results overall, DL models provide results that complement BLASTp's, revealing that LLMs better predict certain EC numbers while BLASTp excels in predicting others. ESM2 stood out as the best model among the LLMs tested, providing more accurate predictions on difficult annotation tasks and for enzymes without homologs.
Conclusions: Crucially, this study demonstrates that LLMs still have to be improved to become the gold standard tool over BLASTp in mainstream enzyme annotation routines. On the other hand, LLMs can provide good predictions for more difficult-to-annotate enzymes, particularly when the identity between the query sequence and the reference database falls below 25%. Our results reinforce the claim that BLASTp and LLM models complement each other and can be more effective when used together.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11866580/pdf/
Citations: 0
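
The one-hot baseline mentioned in the abstract represents each residue as a 20-dimensional indicator vector. A minimal sketch follows; the alphabet ordering, the helper name `one_hot`, and the choice to encode non-standard residues as all-zero rows are assumptions of this example, not details taken from the study.

```python
# The 20 standard amino acids, ordered alphabetically by one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(sequence):
    """Encode a protein sequence as a list of 20-element 0/1 rows.

    Residues outside the standard alphabet (e.g. 'X') become all-zero
    rows; this handling is an assumption made for illustration.
    """
    rows = []
    for residue in sequence.upper():
        row = [0] * len(AMINO_ACIDS)
        if residue in INDEX:
            row[INDEX[residue]] = 1
        rows.append(row)
    return rows


encoded = one_hot("MKX")  # toy sequence: Met, Lys, unknown residue
```

Unlike a language-model embedding, this representation carries no information about residue similarity or sequence context, which is one plausible reason a learned representation can outperform it.
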
Combining single-cell ATAC and RNA sequencing for supervised cell annotation.
IF 2.9, Zone 3, Biology
BMC Bioinformatics 26(1): 67. Pub Date: 2025-02-26. DOI: 10.1186/s12859-025-06084-6
Jaidip Gill, Abhijit Dasgupta, Brychan Manry, Natasha Markuzon
Motivation: Single-cell analysis offers insights into cellular heterogeneity and individual cell function. Cell type annotation is the first and critical step for performing such an analysis. Current methods mostly utilize single-cell RNA sequencing data. Several studies demonstrated improved unsupervised annotation when combining RNA with single-cell ATAC sequencing, but improvements in supervised methods have not been explored.
Results: Single-cell 10x Genomics multiome datasets containing paired ATAC and RNA from human peripheral blood mononuclear cells (PBMC) and neuronal cells with Alzheimer's Disease were used for supervised annotation. Using linear and nonlinear dimensionality reduction methods and random forest, support vector machine and logistic regression classification models, we demonstrate the improvement in supervised annotation and prediction confidence in PBMC data when using a combination of RNA-seq and ATAC-seq data. No such improvement was observed when annotating neuronal cells. Specifically, F1 scores were improved when using scVI embeddings to annotate PBMC sub-types. CD4 T effector memory cells showed the largest improvement in F1 score.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11863512/pdf/
Citations: 0
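
Since the results above are reported as F1 scores, the sketch below shows how a per-class F1 score is computed from true and predicted labels. The labels are made-up toy values, not data from the study, and `f1_for_class` is a hypothetical helper name.

```python
def f1_for_class(y_true, y_pred, label):
    """Harmonic mean of precision and recall for a single class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        # No correct predictions for this class: define F1 as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy annotations for six cells (illustrative only).
y_true = ["CD4", "CD4", "B", "B", "NK", "CD4"]
y_pred = ["CD4", "B", "B", "B", "NK", "CD4"]

f1_cd4 = f1_for_class(y_true, y_pred, "CD4")
f1_b = f1_for_class(y_true, y_pred, "B")
f1_nk = f1_for_class(y_true, y_pred, "NK")
```

Reporting F1 per class, rather than overall accuracy, makes it possible to see which individual cell sub-types benefit from added features, which is how the abstract singles out CD4 T effector memory cells.
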