BMC Bioinformatics: Latest Articles

Rprot-Vec: a deep learning approach for fast protein structure similarity calculation.
IF 2.9 | Zone 3 | Biology
BMC Bioinformatics Pub Date : 2025-07-10 DOI: 10.1186/s12859-025-06213-1
Yichuan Zhang, Wen Zhang
Abstract
Background: Predicting protein structural similarity and detecting homologous sequences remain fundamental and challenging tasks in computational biology. Accurate identification of structural homologs enables function inference for newly discovered or unannotated proteins. Traditional approaches often require full 3D structural data, which is unavailable for most proteins. Thus, there is a need for sequence-based methods capable of inferring structural similarity efficiently and at scale.
Result: We present Rprot-Vec (Rapid Protein Vector), a deep learning model that predicts protein structural similarity and performs homology detection using only primary sequence data. The model integrates bidirectional GRU and multi-scale CNN layers with ProtT5-based encoding, enabling accurate and fast similarity estimation. Rprot-Vec achieves a 65.3% accurate similarity prediction rate in the homologous region (TM-score > 0.8), with an average prediction error of 0.0561 across all TM-score intervals. Despite having only 41% of the parameters of TM-vec, Rprot-Vec outperforms both public and locally trained TM-vec baselines in all tested settings. Additionally, we constructed and released three curated training datasets (CATH_TM_score_S/M/L), supporting further research in this area.
Conclusion: Rprot-Vec offers a fast and lightweight solution for sequence-based structural similarity prediction. It can be applied in protein homology detection, structure-function inference, drug repurposing, and other downstream biological tasks. Its open-source availability and released datasets facilitate broader adoption and further development by the research community.
BMC Bioinformatics 26(1): 171. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12243341/pdf/
Citations: 0
A new alignment-free method: K-mer Subsequence Natural Vector (K-mer SNV) for classification of fungi.
IF 2.9 | Zone 3 | Biology
BMC Bioinformatics Pub Date : 2025-07-09 DOI: 10.1186/s12859-025-06152-x
Lily He, Mochao Huang, Gulinisha Yiming, Yi Zhu, Ruowei Liu, Jinghan Chen, Stephen S T Yau
Abstract
As eukaryotic organisms, fungi play a pivotal role within ecosystems and exert profound influences on agriculture, the pharmaceutical industry, and human health. The classification of fungi in databases has emerged as a crucial and complex issue in the field of biology. In this study, by leveraging the local distribution of k-mers in nucleotide sequences, we introduce a novel alignment-free method, denoted as k-mer SNV, to address this challenge. On a large fungi dataset including 120,140 sequences, our approach predicts the taxonomic labels of fungi across six hierarchical taxonomic levels with the following accuracy: phylum (99.52%), class (98.17%), order (97.20%), family (96.11%), genus (94.14%), and species (93.32%). The approach is also evaluated on the common Taxxi benchmark dataset. These results demonstrate that the k-mer SNV method performs well in processing large-scale fungal sequence data.
BMC Bioinformatics 26(1): 170. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12239250/pdf/
Citations: 0
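The abstract above says only that k-mer SNV uses the local distribution of k-mers in nucleotide sequences; it gives no formulas. As a rough illustration of that general idea, the following Python sketch records, for each k-mer, its count, its mean position, and a scaled second central moment of its positions, in the spirit of natural-vector-style descriptors. The function name `kmer_position_features` and the scaling are assumptions made for illustration and should not be read as the authors' K-mer SNV definition.

```python
from collections import defaultdict


def kmer_position_features(seq: str, k: int = 2) -> dict:
    """Map each k-mer in seq to (count, mean position, scaled 2nd central moment).

    Illustrative only: dividing the moment by (count * number of windows)
    is an assumed normalization, not taken from the K-mer SNV paper.
    """
    seq = seq.upper()
    n_windows = len(seq) - k + 1
    positions = defaultdict(list)
    for i in range(n_windows):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):       # skip windows with ambiguous bases
            positions[kmer].append(i + 1)  # 1-based start position

    features = {}
    for kmer, pos in positions.items():
        count = len(pos)
        mean = sum(pos) / count
        moment2 = sum((p - mean) ** 2 for p in pos) / (count * n_windows)
        features[kmer] = (count, mean, moment2)
    return features
```

Calling `kmer_position_features("ACGTACGT", k=2)` returns one tuple per observed 2-mer. Concatenating these tuples in a fixed k-mer order would give a fixed-length vector that a standard classifier could consume without any sequence alignment.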
HIVGenoPipe: a nextflow pipeline for the detection of HIV-1 drug resistance using a real-time sample-specific reference sequence.
IF 2.9 | Zone 3 | Biology
BMC Bioinformatics Pub Date : 2025-07-07 DOI: 10.1186/s12859-025-06201-5
Thoai Dotrang, Brad T Sherman, Lisheng Dai, Muhammad Ayub Khan, Helene C Highbarger, Whitney Bruchey, Sylvain Laverdure, Michael W Baseler, Tomozumi Imamichi, Robin L Dewar, Weizhong Chang
Abstract
Background: The emergence of HIV drug resistance is a challenge in controlling the acquired immunodeficiency syndrome (AIDS) pandemic caused by human immunodeficiency virus-1 (HIV-1) infection. Detection of drug resistance variants at minor frequencies can help to formulate successful antiretroviral therapy (ART) regimens for people living with HIV (PLWH) and reduce the emergence of drug resistance. Therefore, a pipeline which can accurately produce consensus nucleotide sequences and identify drug resistance mutations (DRMs) at defined frequency thresholds will be helpful in the treatment of PLWH, analysis of virus evolution, and the control of the pandemic.
Results: We have developed a pipeline, HIVGenoPipe, to determine HIV drug resistance variants within the gag-pol region above user-defined frequencies for HIV-1 samples sequenced using Illumina technology. The pipeline has been validated by comparing its results with the results generated by a widely used pipeline, HyDRA, which is limited to the pol region, and with the results generated by Sanger sequencing technology using the same set of 30 samples. The variant frequency used to generate ambiguous consensus sequences in HIVGenoPipe is more accurate than in other pipelines because a sample-specific reference, which is generated in real-time with a novel hybrid strategy of de novo and reference-based assembly, is used for the frequency calculation, leading to more accurate drug resistance calls for use by clinicians. In addition, since Nextflow is used as the pipeline platform, HIVGenoPipe inherently has great portability, scalability and reproducibility; and the components can be updated or replaced independently if required.
Conclusions: We developed HIVGenoPipe for the detection of HIV-1 drug resistance. It constructs more accurate gag-pol consensus sequences, leading to improved detection of DRMs. HIVGenoPipe is open source and freely available under the MIT license at https://github.com/LHRI-Bioinformatics/HIVGenoPipe . The current release (v1.0.1) is archived and available at https://doi.org/10.5281/zenodo.15528502 .
BMC Bioinformatics 26(1): 168. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12235847/pdf/
Citations: 0
DiscMycoVir: a user-friendly platform for discovering mycoviruses in fungal transcriptomes.
IF 2.9 | Zone 3 | Biology
BMC Bioinformatics Pub Date : 2025-07-07 DOI: 10.1186/s12859-025-06196-z
Agorakis Bompotas, Nikitas-Rigas Kalogeropoulos, Maria Giachali, Ioly Kotta-Loizou, Christos Makris
Abstract
Purpose: The article presents DiscMycoVir, an elegant and user-friendly platform for discovering mycoviruses in fungal transcriptomes. DiscMycoVir is a pipeline of established tools for next-generation sequencing analysis and database searching, incorporated in an interface that facilitates accessibility even for users who have no programming skills or expertise. A comprehensive and detailed result report enhances user experience. DiscMycoVir can be accessed online for reviewing purposes at: https://discmycovir.imslab.gr:8000 and the source code is located at https://github.com/abompotas/DiscMycoVir . We recommend using the GitHub repository, as the online platform may lack the necessary resources to ensure uninterrupted service, especially on large files.
Methods-results: We employed state-of-the-art technologies in the design and implementation phase of the platform. We present the application of the platform in screening RNA-seq data from the yeast Candida auris for mycoviruses, demonstrating its efficiency and simplicity in use.
Conclusions: DiscMycoVir serves as a user-friendly platform for identifying mycoviruses in RNA-seq data. Our tool was successfully implemented to discover mycoviruses in a C. auris isolate and could be adapted to detect viruses in transcriptomes from other organisms as well.
BMC Bioinformatics 26(1): 169. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12235966/pdf/
Citations: 0
Multi-task genomic prediction using gated residual variable selection neural networks.
IF 2.9 | Zone 3 | Biology
BMC Bioinformatics Pub Date : 2025-07-07 DOI: 10.1186/s12859-025-06188-z
Yuhua Fan, Patrik Waldmann
Abstract
Background: The recent development of high-throughput sequencing techniques provides massive data that can be used in genome-wide prediction (GWP). Although GWP is effective on its own, the incorporation of traditional polygenic pedigree information into GWP has been shown to further improve prediction accuracy. However, most of the methods developed in this field require that individuals with genomic information can be connected to the polygenic pedigree within a standard linear mixed model framework that involves calculation of computationally demanding matrix inverses of the combined pedigrees. The extension of this integrated approach to more flexible machine learning methods has been slow.
Methods: This study aims to enhance genomic prediction by implementing gated residual variable selection neural networks (GRVSNN) for multi-task genomic prediction. By integrating low-rank information from pedigree-based relationship matrices with genomic markers, we seek to improve predictive accuracy and interpretability compared to conventional regression and deep learning (DL) models. The prediction properties of the GRVSNN model are evaluated on several real-world datasets, including loblolly pine, mouse and pig.
Results: The experimental results demonstrate that the GRVSNN model outperforms traditional tabular genomic prediction models, including Bayesian regression methods and LassoNet. Using genomic and pedigree information, GRVSNN achieves a lower mean squared error (MSE), and higher Pearson (r) and distance (dCor) correlation between predicted and true phenotypic values in the test data. Moreover, GRVSNN selects fewer genetic markers and pedigree loadings, which improves interpretability.
Conclusion: The suggested GRVSNN framework provides a novel and computationally effective approach to improve genomic prediction accuracy by integrating information from traditional pedigrees with genomic data. The model's ability to conduct multi-task predictions underscores its potential to enhance selection processes in agricultural species and predict multiple diseases in precision medicine.
BMC Bioinformatics 26(1): 167. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12235769/pdf/
Citations: 0
Replacing normalizations with interval assumptions enhances differential expression and differential abundance analyses.
IF 2.9 | Zone 3 | Biology
BMC Bioinformatics Pub Date : 2025-07-01 DOI: 10.1186/s12859-025-06177-2
Kyle C McGovern, Justin D Silverman
Abstract
Background: Methods for differential expression and differential abundance analysis often rely on normalization to address sample-to-sample variation in sequencing depth. However, normalizations imply strict, unrealistic assumptions about the unmeasured scale of biological systems (e.g., microbial load or total cellular transcription). Even slight errors in these assumptions introduce bias, leading to elevated false positive and negative rates.
Results: We introduce interval assumptions as a generalization of normalizations. Unlike normalizations, our interval methods allow researchers to account for potential errors in assumptions about the system scale. Interval assumptions are also customizable and allow researchers to express more biologically plausible assumptions about scale. Interval assumptions even generalize Quantitative Microbiome Profiling (QMP), allowing researchers to account for errors in flow cytometry-based measurements of total cellular concentration. We develop a novel hypothesis testing framework that allows us to integrate interval assumptions into existing tools. We develop a modified version of the popular ALDEx2 method using interval assumptions rather than normalizations. Through real and simulated data analyses, we find that interval assumptions can dramatically decrease false positive rates (i.e., from 45% to 5%) while retaining or increasing statistical power. We also study interval assumptions under misspecification and show they still improve on normalizations.
Conclusions: Interval assumptions enhance the rigor and reproducibility of differential expression and differential abundance analyses. Our results add to a growing body of literature arguing that normalizations should be replaced with alternative methods that allow researchers to account for scale uncertainty. However, compared to recent alternatives like scale models and sensitivity analyses, interval assumptions are easier to use, are more robust to misspecification, and have stronger and more interpretable inferential guarantees.
BMC Bioinformatics 26(1): 164. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12218962/pdf/
Citations: 0
Automated sparse feature selection in high-dimensional proteomics data via 1-bit compressed sensing and K-Medoids clustering.
IF 2.9 | Zone 3 | Biology
BMC Bioinformatics Pub Date : 2025-07-01 DOI: 10.1186/s12859-025-06193-2
FuDong Wen, Yue Su, Dan Liu, YuPeng Wang, MeiNa Liu
Abstract
Background: High-dimensional proteomics data present significant challenges in biomarker discovery due to technical noise, feature redundancy, and multicollinearity. Current feature selection methods, including filter, wrapper, and embedded approaches, struggle with stability, sparsity, and computational efficiency. To address these limitations, we propose Soft-Thresholded Compressed Sensing (ST-CS), a hybrid framework integrating 1-bit compressed sensing with K-Medoids clustering. Unlike conventional methods relying on manual thresholds, ST-CS automates feature selection by dynamically partitioning coefficient magnitudes into discriminative biomarkers and noise.
Results: Evaluations on simulated and real-world proteomic datasets demonstrated ST-CS's superiority in feature selection capability and classification performance. In simulations, ST-CS achieved feature selection robustness with balanced sensitivity (> 80%) and specificity (> 99.8%), reducing false discovery rates (FDR) by 20-50% compared to Hard-Thresholded Compressed Sensing (HT-CS). Additionally, it attained superior F1 scores and Matthews Correlation Coefficients (MCC), outperforming HT-CS, LASSO, and SPLSDA in identifying true biomarkers while suppressing noise. For classification performance, ST-CS surpassed all methods in the area under the receiver operating characteristic curve (AUC) across varying noise levels while maintaining sparsity. Applied to Clinical Proteomic Tumor Analysis Consortium (CPTAC) datasets, ST-CS matched HT-CS's classification accuracy (AUC = 97.47% for intrahepatic cholangiocarcinoma) but with 57% fewer selected features (37 vs. 86), demonstrating its dual strength in precision biomarker discovery and predictive accuracy. For glioblastoma data, ST-CS achieved higher AUC (72.71%) than HT-CS (72.15%), LASSO (67.80%), and SPLSDA (71.38%) while retaining a parsimonious feature set (30 vs. 58 features for HT-CS). In ovarian serous cystadenocarcinoma, ST-CS further demonstrated its adaptability, attaining superior AUC (75.86%) over HT-CS (75.61%), LASSO (61.00%), and SPLSDA (70.75%) with only 24 ± 5 selected biomarkers. These results highlight ST-CS's ability to rigorously automate feature selection while balancing classification efficacy, interpretability, and scalability for translational proteomics.
BMC Bioinformatics 26(1): 165. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12220089/pdf/
Citations: 0
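The Results above report sensitivity, specificity, FDR, F1, and MCC for the selected features. For readers less familiar with these terms, the following minimal Python sketch computes them from confusion-matrix counts using their standard definitions. The function name `selection_metrics` and the convention of returning 0 when a denominator is zero are assumptions for illustration; nothing here is taken from the ST-CS implementation.

```python
import math


def selection_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard confusion-matrix metrics for a feature selection result.

    tp/fp: selected features that are / are not true biomarkers
    tn/fn: unselected features that are not / are true biomarkers
    """
    def ratio(num: float, den: float) -> float:
        return num / den if den else 0.0  # assumed convention: 0 when undefined

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "fdr": ratio(fp, tp + fp),  # false discovery rate = 1 - precision
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }
```

With hypothetical counts `selection_metrics(tp=8, fp=2, tn=988, fn=2)`, sensitivity is 0.8 and FDR is 0.2. The specificity stays close to 1 because true negatives dominate in high-dimensional data, which is why FDR and MCC are the more informative figures in this setting.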
Coconut: covariate-assisted composite null hypothesis testing with applications to replicability analysis of high-throughput experimental data.
IF 2.9 | Zone 3 | Biology
BMC Bioinformatics Pub Date : 2025-07-01 DOI: 10.1186/s12859-025-06163-8
Yan Li, Yanmei Li, Han Ma, Zitong Yue, Xin Zhang
Abstract
Background: Multiple testing of composite null hypotheses is critical for identifying simultaneous signals across studies. While it is common to incorporate external information in simple null hypotheses, exploiting such auxiliary covariates to provide prior structural relationships among composite null hypotheses and boost the statistical power remains challenging.
Results: We propose a robust and powerful covariate-assisted composite null hypothesis testing (CoCoNuT) procedure based on a Bayesian framework to identify replicable signals in two studies while asymptotically controlling the false discovery rate. CoCoNuT innovatively adopts a three-dimensional mixture model to consider two primary studies and an integrative auxiliary covariate jointly. While accounting for heterogeneity across studies, the local false discovery rate optimally captures cross-study and cross-feature information, providing improved rankings of feature importance.
Conclusions: Theoretical and empirical evaluations confirm the validity and efficiency of CoCoNuT. Extensive simulations demonstrate that CoCoNuT outperforms conventional methods that do not exploit auxiliary covariates while controlling the FDR. We apply CoCoNuT to schizophrenia genome-wide association studies, illustrating its higher power in identifying replicable genetic variants with the assistance of relevant auxiliary studies.
BMC Bioinformatics 26(1): 163. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12210505/pdf/
Citations: 0
Ensemble machine learning-based pre-trained annotation approach for scRNA-seq data using gradient boosting with genetic optimizer.
IF 2.9 | Zone 3 | Biology
BMC Bioinformatics Pub Date : 2025-07-01 DOI: 10.1186/s12859-025-06151-y
Osama Elnahas, Waleed M Ead, Yushan Qiu, Jian Lu
Abstract
Single-cell RNA sequencing (scRNA-seq) has revolutionized the study of gene expression by allowing researchers to analyze the transcriptomes of individual cells. This technology provides unprecedented insights into cellular heterogeneity, cellular states, and biological processes at a single-cell resolution. The problem of single-cell RNA annotation involves assigning meaningful labels or annotations to each cell in the scRNA-seq dataset, indicating its corresponding cell type, state, or biological function. Current annotation methods are often challenged by limited source data quality, sensitivity to batch effects, and poor adaptability to uncharacterized cell types. We propose an ensemble machine learning-based pre-trained annotation framework that integrates gradient boosting and genetic optimization for robust feature selection. The proposed method uses ensemble learning to enhance annotation accuracy under data scarcity, addressing limitations in existing supervised methods by leveraging a combination of multiple annotated datasets and feature alignment strategies. Through comprehensive benchmarking across varied biological contexts, we demonstrate that the proposed approach significantly improves annotation accuracy and generalization across different scRNA-seq platforms, especially under conditions of reduced reference data. Results confirm its versatility and resilience in accurately annotating cell types, even under reduced data conditions, establishing it as a powerful tool for cell-type classification in scRNA-seq data.
BMC Bioinformatics 26(1): 166. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12220795/pdf/
Citations: 0
2dSpAn-Auto: an automated tool for analysis of two-dimensional dendritic spine images.
IF 2.9 | Zone 3 | Biology
BMC Bioinformatics Pub Date : 2025-07-01 DOI: 10.1186/s12859-025-06179-0
Shauvik Paul, Rahul Pramanick, Nirmal Das, Ewa Baczynska, Zeinab Bedrood, Tapabrata Chakraborti, Subhadip Basu, Jakub Wlodarczyk
Abstract
Background: Quantitative analysis of dendritic spine morphology and density is crucial for understanding synaptic plasticity and its role in neuropsychiatric disorders, including Alzheimer's disease and schizophrenia. While both 3D and 2D approaches exist for spine analysis, 2D methods offer advantages in computational efficiency and rapid assessment, and are better suited to images with limited z-resolution acquired through confocal and previous-generation super-resolution microscopy. In this work, we developed a modality-agnostic spine segmentation approach based on 2D skeletonization. Specifically, we implemented two analytical workflows: 2dSpAn-Auto.b, which implements a binary skeletonization algorithm, and 2dSpAn-Auto.f, which generates fuzzy skeletons directly from gray-scale images. Our method enables fast and automatic segmentation and morphological analysis of 2D maximum intensity projection images of dendritic spines. Expert users can fine-tune parameters when needed, though default settings prove robust across various imaging conditions. The 2dSpAn-Auto software tool is most suitable for automated batch processing while maintaining user flexibility through an intuitive graphical interface.
Results: 2dSpAn-Auto is validated across multiple imaging modalities (in vitro, ex vivo, and in vivo) for automatic assessment of dendritic spine parameters including spine density, morphometry (spine area, spine length, head width, minimum and average neck width), and total dendrite length. Validation studies demonstrate high accuracy and reproducibility across varying imaging protocols and experimental conditions. Multiple images from similar experimental setups can be processed seamlessly in the batch mode.
Conclusions: 2dSpAn-Auto provides a robust, interpretable solution for fast analysis of dendritic spines, a critical need in neurological research and clinical assessment. The combination of automated processing with optional expert oversight makes it suitable for both routine analysis and specialized research applications. The software, complete with the source code and comprehensive documentation, is available to the research community for non-commercial use under GNU General Public License (GPL) v3.
BMC Bioinformatics 26(1): 162. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12211165/pdf/
Citations: 0