BMC Bioinformatics: Latest Articles

PuMA: PubMed gene/cell type-relation Atlas.
IF 3.3 · Zone 3 · Biology
BMC Bioinformatics Pub Date : 2025-07-29 DOI: 10.1186/s12859-025-06236-8
Lucas Bickmann, Sarah Sandmann, Carolin Walter, Julian Varghese
Background: Rapid extraction and visualization of cell-specific gene expression is important for automatic cell type annotation, e.g. in single-cell analysis. In an emerging field, tools such as curated databases or machine learning methods are used to support cell type annotation. However, complementary approaches that efficiently incorporate the latest knowledge from free-text articles in literature databases such as PubMed are understudied.
Results: This work introduces the PubMed Gene/Cell type-Relation Atlas (PuMA), which provides a local, easy-to-use web interface to facilitate literature-driven cell type annotation. It uses a pretrained machine-learning-based named entity recognition model to extract gene and cell type concepts from PubMed, links them to biomedical ontologies, and suggests gene-to-cell-type relations based on a ranking score. It includes a search tool for genes and cell types, as well as an interactive graph visualization for exploring cross-relations. Each result is fully traceable through links to the relevant PubMed articles.
Conclusions: This work enables researchers to analyse and automate cell type annotation based on PubMed articles. It complements manually curated marker gene databases and enables interactive visualizations. The evaluation shows that PuMA is competitive with an extensive manually curated database across three gold-standard datasets and two species (mouse and human). The software framework is freely available and supports regular article imports for incremental knowledge updates. GitLab: https://imigitlab.uni-muenster.de/published/PuMA/
BMC Bioinformatics 26(1):201. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12308971/pdf/
Citations: 0
Phylo-rs: an extensible phylogenetic analysis library in rust.
IF 3.3 · Zone 3 · Biology
BMC Bioinformatics Pub Date : 2025-07-29 DOI: 10.1186/s12859-025-06234-w
Sriram Vijendran, Tavis Anderson, Alexey Markin, Oliver Eulenstein
Background: The advent of next-generation and long-read sequencing technologies has provided an ever-increasing wealth of phylogenetic data that require specially designed algorithms to decipher the underlying evolutionary relationships. As large-scale data become increasingly accessible, there is a concomitant need for efficient computational libraries that facilitate the development and dissemination of specialized algorithms for phylogenetic comparative biology.
Results: We introduce Phylo-rs: a fast, extensible, general-purpose library for phylogenetic analysis and inference written in the Rust programming language. Phylo-rs leverages the combination of speed, memory safety, and native WebAssembly support offered by Rust to provide a robust set of memory-efficient data structures and elementary phylogenetic algorithms. Phylo-rs focuses on the efficient and convenient deployment of software aimed at large-scale phylogenetic analysis and inference. Scalability analysis against popular libraries shows that Phylo-rs performs comparably or better on key algorithms. We used it to assess the phylogenetic diversity of influenza A virus in swine, identifying virus groups undergoing evolutionary expansion that could be targeted for control through multivalent vaccines. Additionally, we used Phylo-rs to enhance phylogenetic inference by visualizing tree space from Markov chain Monte Carlo (MCMC) Bayesian analysis, efficiently computing approximately five billion tree-pair distances to evaluate convergence and select MCMC runs for genomic epidemiology.
Conclusion: Phylo-rs enables the design and implementation of cutting-edge software for phylogenetic analysis, thereby facilitating the application and dissemination of theoretical advances in biology. Phylo-rs is available under an open-source license on GitHub at https://github.com/sriram98v/phylo-rs. Documentation: https://docs.rs/phylo/latest/phylo/
BMC Bioinformatics 26(1):197. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12309125/pdf/
Citations: 0
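The Phylo-rs abstract reports computing about five billion tree-pair distances to assess MCMC convergence but does not name the metric. Purely for scale, all pairs among n trees number n(n-1)/2, which is about 5.0 × 10^9 when n = 100,000. As a minimal sketch, assuming the common unweighted Robinson-Foulds distance on rooted trees, the Python below counts the clusters (leaf sets below internal nodes) that occur in exactly one of two trees. It illustrates the metric only; it is not Phylo-rs code or its API, and the nested-tuple tree encoding and function names are hypothetical.

```python
from typing import FrozenSet, Set, Tuple, Union

# Hypothetical encoding: a leaf is a taxon name, an internal node is a tuple of children.
Tree = Union[str, Tuple["Tree", ...]]


def clusters(tree: Tree) -> Set[FrozenSet[str]]:
    """Leaf sets of all internal nodes except the root."""
    found: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):  # leaf: trivial cluster, not recorded
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    root_leaves = walk(tree)
    found.discard(root_leaves)  # every tree on the same taxa shares the root cluster
    return found


def robinson_foulds(t1: Tree, t2: Tree) -> int:
    """Number of clusters present in exactly one of the two rooted trees."""
    return len(clusters(t1) ^ clusters(t2))


t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
print(robinson_foulds(t1, t2))  # → 2: {A,B} occurs only in t1, {A,C} only in t2
```

For unrooted trees the same idea applies to bipartitions (splits) instead of clusters.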
Drug-target interaction prediction based on graph convolutional autoencoder with dynamic weighting residual GCN.
IF 3.3 · Zone 3 · Biology
BMC Bioinformatics Pub Date : 2025-07-29 DOI: 10.1186/s12859-025-06198-x
Ming Zeng, Min Wang, Fuqiang Xie, Zhiwei Ji
Background: The exploration of drug-target interactions (DTIs) is a critical step in drug discovery and drug repurposing. Network-based methods have recently emerged as a prominent research area for predicting DTIs. These methods extract both topological and feature information from DTI networks, thereby achieving superior DTI prediction performance. However, most existing GCN-based methods use shallow graph neural networks, which cannot extract higher-level semantic information. In addition, current model training lacks an effective guiding mechanism, which limits improvement of the network's representation capability.
Results: In this paper, we propose a graph convolutional autoencoder model, named DDGAE, for DTI prediction. We develop a DWR-GCN module, which combines dynamic weighting graph convolution with residual connections, to improve the representation capability for DTI heterogeneous networks. Further, to improve the model's learning efficiency, we devise a dual self-supervised joint training mechanism. Specifically, this mechanism integrates DWR-GCN and a graph convolutional autoencoder into a cohesive system, enhancing both the learning performance and the stability of DDGAE.
Conclusion: Experimental results show that DDGAE significantly outperforms several state-of-the-art (SOTA) models in DTI prediction, achieving the best performance among them, and a case study verifies the reliability of the method.
BMC Bioinformatics 26(1):200. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12309189/pdf/
Citations: 0
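The DDGAE abstract does not spell out how DWR-GCN's dynamic weighting works, so the sketch below covers only the standard building blocks it names: a symmetrically normalized graph convolution, H' = ReLU(Â H W) with Â = D^(-1/2) (A + I) D^(-1/2), plus a residual connection back to the layer input. The fixed scalar `alpha` is a hypothetical placeholder for whatever learned, dynamic weights DDGAE actually uses, and plain Python lists stand in for a tensor library.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def normalized_adjacency(adj: Matrix) -> Matrix:
    """D^(-1/2) (A + I) D^(-1/2); self-loops guarantee every degree is >= 1."""
    n = len(adj)
    a_self = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_self]
    return [[a_self[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def residual_gcn_layer(a_norm: Matrix, h: Matrix, w: Matrix, alpha: float) -> Matrix:
    """ReLU(A_norm @ H @ W) + alpha * H. W must be square so the residual shapes match."""
    conv = matmul(matmul(a_norm, h), w)
    return [[max(0.0, conv[i][j]) + alpha * h[i][j] for j in range(len(h[0]))]
            for i in range(len(h))]


# Toy graph: node 0 stands for a drug, node 1 for a target, joined by one edge.
adj = [[0.0, 1.0], [1.0, 0.0]]
h = [[1.0, 0.0], [0.0, 1.0]]  # one-hot input features
w = [[1.0, 0.0], [0.0, 1.0]]  # identity weights for illustration
print(residual_gcn_layer(normalized_adjacency(adj), h, w, alpha=0.5))
# → [[1.0, 0.5], [0.5, 1.0]]
```

The residual term keeps each node's own features visible after propagation, which is the usual motivation for residual connections when GCN layers are stacked more deeply.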
Accurate human genome analysis with element avidity sequencing.
IF 3.3 · Zone 3 · Biology
BMC Bioinformatics Pub Date : 2025-07-25 DOI: 10.1186/s12859-025-06191-4
Andrew Carroll, Alexey Kolesnikov, Daniel E Cook, Lucas Brambrink, Kelly N Wiseman, Sophie M Billings, Semyon Kruglyak, Bryan R Lajoie, Junhua Zhao, Shawn E Levy, Cory Y McLean, Kishwar Shafin, Maria Nattestad, Pi-Chuan Chang
Background: New sequencing technologies provide options for the scientific community to design studies and build clinical workflows. These options expand user choice and can enable more accurate, scalable, or affordable workflows, depending on the fit between scientists' needs and platform capability. However, it is essential to understand the performance of these new technologies for different tasks, especially for capabilities that were not possible or tractable with prior technologies. We investigate avidity sequencing, a new technology from Element Biosciences, to help the scientific community understand the performance of the options for generating sequencing data.
Results: We show that Element whole-genome sequencing achieves higher mapping and variant-calling accuracy than Illumina sequencing at the same coverage, with larger differences at lower coverages (20-30x). We quantify the base error rates of Element reads and find lower error rates, especially in homopolymer and tandem repeat regions. We also use Element's ability to generate paired-end sequencing with longer insert sizes than typical short-read sequencing, and show that longer insert sizes yield even higher accuracy, with long-insert Element sequencing giving more accurate genome analyses at all coverages.
Conclusions: New sequencing technology options can analyze genomes comparably to or better than prior standard methods.
BMC Bioinformatics 26(1):194. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12291380/pdf/
Citations: 0
Soft graph clustering for single-cell RNA sequencing data.
IF 3.3 · Zone 3 · Biology
BMC Bioinformatics Pub Date : 2025-07-25 DOI: 10.1186/s12859-025-06231-z
Ping Xu, Pengfei Wang, Zhiyuan Ning, Meng Xiao, Min Wu, Yuanchun Zhou
Background: Clustering analysis is fundamental in single-cell RNA sequencing (scRNA-seq) data analysis for elucidating cellular heterogeneity and diversity. Recent graph-based scRNA-seq clustering methods, particularly graph neural networks (GNNs), have made significant progress in tackling high dimensionality, high sparsity, and frequent dropout events, which lead to ambiguous cell population boundaries. However, one major challenge for GNN-based methods is their reliance on hard graph constructions derived from similarity matrices. These constructions introduce difficulties when applied to scRNA-seq data because of (i) the simplification of intercellular relationships into binary edges (0 or 1) by thresholding, which restricts the capture of continuous similarity features among cells and leads to significant information loss; and (ii) the presence of significant inter-cluster connections within hard graphs, which can confuse GNN methods that rely heavily on graph structure, potentially causing erroneous message propagation and biased clustering outcomes.
Results: To tackle these challenges, we introduce scSGC, a Soft Graph Clustering method for single-cell RNA sequencing data, which aims to characterize continuous similarities among cells more accurately through non-binary edge weights, thereby mitigating the limitations of rigid data structures. The scSGC framework comprises three core components: (i) a zero-inflated negative binomial (ZINB)-based feature autoencoder designed to handle the sparsity and dropout issues in scRNA-seq data; (ii) a dual-channel cut-informed soft graph embedding module, constructed through deep graph-cut information, that captures continuous similarities between cells while preserving the intrinsic data structures of scRNA-seq; and (iii) an optimal transport-based clustering optimization module that achieves optimal delineation of cell populations while maintaining high biological relevance.
Conclusion: By integrating dual-channel cut-informed soft graph representation learning, a ZINB-based feature autoencoder, and optimal transport-driven clustering optimization, scSGC effectively overcomes the challenges associated with traditional hard graph constructions in GNN methods. Extensive experiments across ten datasets demonstrate that scSGC outperforms 13 state-of-the-art clustering models in clustering accuracy, cell type annotation, and computational efficiency. These results highlight its substantial potential to advance scRNA-seq data analysis and deepen our understanding of cellular heterogeneity.
BMC Bioinformatics 26(1):195. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12291377/pdf/
Citations: 0
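scSGC's first component is a ZINB-based feature autoencoder. Autoencoders of this kind typically output, for each cell and gene, a mean μ, an inverse dispersion θ and a dropout probability π, and are trained by minimizing the ZINB negative log-likelihood of the observed counts, where NB(x) = Γ(x+θ) / (Γ(θ) x!) · (θ/(θ+μ))^θ · (μ/(θ+μ))^x and ZINB(x) = π·[x = 0] + (1 - π)·NB(x). The sketch below computes that loss for a single count. It assumes μ > 0, θ > 0 and 0 <= π < 1, and the function names are hypothetical rather than taken from scSGC.

```python
import math


def nb_log_pmf(x: int, mu: float, theta: float) -> float:
    """log NB(x; mean mu, inverse dispersion theta)."""
    return (math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
            + theta * math.log(theta / (theta + mu))
            + x * math.log(mu / (theta + mu)))


def zinb_nll(x: int, mu: float, theta: float, pi: float) -> float:
    """Negative log-likelihood of count x under a zero-inflated negative binomial."""
    if x == 0:
        # A zero can come from dropout (probability pi) or from the NB component itself.
        return -math.log(pi + (1.0 - pi) * math.exp(nb_log_pmf(0, mu, theta)))
    return -(math.log(1.0 - pi) + nb_log_pmf(x, mu, theta))


# Zero inflation makes an observed zero cheaper than under the plain NB.
print(zinb_nll(0, mu=5.0, theta=2.0, pi=0.3) < -nb_log_pmf(0, 5.0, 2.0))  # → True
```

Working in log space with `math.lgamma` avoids overflow of the Gamma functions for large counts.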
Combining whole genome sequencing and non-adaptive group testing for large-scale ethnicity screens.
IF 3.3 · Zone 3 · Biology
BMC Bioinformatics Pub Date : 2025-07-24 DOI: 10.1186/s12859-025-06192-3
Elior Avraham, Noam Shental
Background: Estimating an individual's ethnicity from genetic data is crucial for analyzing disease association studies, making informed medical decisions, conducting forensic investigations, and tracing genealogical ancestry.
Results: This work combines non-adaptive group testing, using the mathematical field of compressed sensing, with standard short-read sequencing to allow up to a 4-fold increase in the number of samples in large-scale ethnicity estimates. The method requires no prior knowledge of the tested individuals and provides results almost identical to those obtained by testing each individual independently. Our results are based on simulated data and on simulations based on experimental data from the 1000 Genomes Project and the Human Genome Diversity Project.
Conclusions: Our computational approach aims to reduce the cost of large-scale ancestry testing by up to 4-fold in many real-life scenarios without compromising accuracy. We hope this method will enable more efficient large-scale ethnicity screening.
BMC Bioinformatics 26(1):192. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12288215/pdf/
Citations: 0
Incorporating exon-exon junction reads enhances differential splicing detection.
IF 3.3 · Zone 3 · Biology
BMC Bioinformatics Pub Date : 2025-07-24 DOI: 10.1186/s12859-025-06210-4
Mai T Pham, Michael J G Milevskiy, Jane E Visvader, Yunshun Chen
Background: RNA sequencing (RNA-seq) is a gold-standard technology for studying gene and transcript expression. Different transcripts from the same gene usually arise from different combinations of the gene's exons, formed by splicing events. One way to study differential alternative splicing between groups in short-read RNA-seq experiments is differential exon usage (DEU) analysis, which uses exon-level read counts followed by downstream statistical testing. However, the standard exon counting method does not consider exon-junction information, which may reduce the statistical power to detect splicing alterations.
Results: We present a new workflow for differential splicing analysis, called differential exon-junction usage (DEJU). The DEJU workflow adopts a new feature quantification approach that jointly summarises exon and exon-exon junction reads, which are then integrated into the established Rsubread-edgeR/limma frameworks. We performed comprehensive simulation studies to benchmark DEJU against existing methods. We also applied DEJU to a mouse mammary gland RNA-seq dataset, revealing biologically meaningful splicing events that could not be detected previously.
Conclusions: We demonstrate that incorporating exon-exon junction reads significantly improves the detection of differential splicing events. The proposed DEJU workflow offers increased statistical power and computational efficiency compared with widely used existing approaches, while effectively controlling the false discovery rate.
BMC Bioinformatics 26(1):193. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12288301/pdf/
Citations: 0
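DEJU performs its joint exon and exon-exon junction quantification within the Rsubread-edgeR/limma frameworks. The toy Python below is not that implementation; it only illustrates why junction counts add information. Each read is assumed to be a list of aligned (start, end) blocks, the exon coordinates and feature names (`E1`, `E1-E3`) are hypothetical, and a junction is credited when a gap in the alignment begins exactly at one exon's end and resumes exactly at another exon's start.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Block = Tuple[int, int]  # 1-based, inclusive genomic interval


def overlaps(a: Block, b: Block) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]


def exon_ending_at(exons: Dict[str, Block], pos: int) -> Optional[str]:
    return next((name for name, (s, e) in exons.items() if e == pos), None)


def exon_starting_at(exons: Dict[str, Block], pos: int) -> Optional[str]:
    return next((name for name, (s, e) in exons.items() if s == pos), None)


def count_features(exons: Dict[str, Block], reads: List[List[Block]]) -> Counter:
    counts: Counter = Counter()
    for blocks in reads:
        # Exon-level count: each exon overlapped by any aligned block, once per read.
        for name, exon in exons.items():
            if any(overlaps(exon, b) for b in blocks):
                counts[name] += 1
        # Junction-level count: each gap between consecutive aligned blocks.
        for left, right in zip(blocks, blocks[1:]):
            donor = exon_ending_at(exons, left[1])
            acceptor = exon_starting_at(exons, right[0])
            if donor is not None and acceptor is not None:
                counts[f"{donor}-{acceptor}"] += 1
    return counts


exons = {"E1": (100, 200), "E2": (300, 400), "E3": (500, 600)}
reads = [
    [(150, 200), (300, 350)],  # spliced E1 -> E2
    [(180, 200), (500, 530)],  # spliced E1 -> E3, skipping E2
    [(320, 380)],              # entirely within E2
]
print(count_features(exons, reads))
```

Exon counts register the second read under E1 and E3 without saying how they are connected; the `E1-E3` junction count records the exon-skipping event directly.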
DiffCoRank: a comprehensive framework for discovering hub genes and differential gene co-expression in brain implant-associated tissue responses.
IF 3.3 · Zone 3 · Biology
BMC Bioinformatics Pub Date : 2025-07-23 DOI: 10.1186/s12859-025-06232-y
Anirban Chakraborty, Erin K Purcell, Michael G Moore
Background: Brain implants have significant potential for therapeutic applications and neuroscience research, but complex tissue responses often compromise their long-term stability. To address this challenge, differential coexpression analysis can be used to identify key molecular regulators involved in brain implant responses.
Results: We developed DiffCoRank, a framework that improves differential coexpression analysis by combining RNA-Seq data preprocessing, gene filtering, correlation-based module identification, and network analysis to discover differentially coexpressed gene clusters. A key innovation of our approach is false discovery rate (FDR)-based selection of strongly connected genes (SCGs), which improves detection of strong coexpression patterns that could otherwise be lost to spurious correlations. To enhance module identification, we employ a hybrid clustering technique that combines uniform manifold approximation and projection (UMAP) with density-based spatial clustering of applications with noise (DBSCAN). We propose a multi-criteria hub gene ranking system that incorporates network centrality metrics such as degree, closeness, betweenness, and eigenvector centrality to prioritise biologically relevant genes. Additionally, we created a user-friendly application to visualize and explore the results of DiffCoRank interactively.
Conclusions: Our method identified key gene modules involved in oxidative stress, calcium signaling, immunological regulation, autophagic recovery, and vascular remodeling in RNA-Seq data from implanted rat brain tissue. Furthermore, we compared our results with those of other existing coexpression analysis frameworks, showing that our method identifies unique regulatory processes and consistent coexpression patterns. Our research offers novel insights into the molecular processes underlying implant-tissue interactions and possible approaches to improve the robustness and biocompatibility of brain interfaces.
BMC Bioinformatics 26(1):191. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12288212/pdf/
Citations: 0
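DiffCoRank ranks hub genes with degree, closeness, betweenness and eigenvector centrality, but the abstract does not say how the four are combined or whether edges are weighted. The sketch below assumes an unweighted, undirected coexpression network stored as an adjacency dict and combines the metrics by averaging each gene's rank, a hypothetical aggregation rule. Betweenness uses Brandes' algorithm and eigenvector centrality uses power iteration; ties are broken by dict order.

```python
from collections import deque
from typing import Dict, List, Set

Graph = Dict[str, Set[str]]  # undirected: v in g[u] iff u in g[v]


def closeness(g: Graph) -> Dict[str, float]:
    """(reachable nodes - 1) / sum of BFS distances."""
    out = {}
    for s in g:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in g[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        out[s] = (len(dist) - 1) / total if total else 0.0
    return out


def betweenness(g: Graph) -> Dict[str, float]:
    """Brandes' algorithm, unweighted and unnormalized."""
    bc = dict.fromkeys(g, 0.0)
    for s in g:
        order: List[str] = []
        preds: Dict[str, List[str]] = {v: [] for v in g}
        sigma = dict.fromkeys(g, 0)  # number of shortest paths from s
        dist = dict.fromkeys(g, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in g[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(g, 0.0)
        for w in reversed(order):  # accumulate dependencies, farthest nodes first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2.0 for v, b in bc.items()}  # undirected: each pair counted twice


def eigenvector(g: Graph, iters: int = 200) -> Dict[str, float]:
    """Power iteration on A + I (same leading eigenvector as A, avoids oscillation)."""
    x = dict.fromkeys(g, 1.0)
    for _ in range(iters):
        nxt = {v: x[v] + sum(x[w] for w in g[v]) for v in g}
        top = max(nxt.values())
        x = {v: val / top for v, val in nxt.items()}
    return x


def rank_hubs(g: Graph) -> List[str]:
    metrics = [{v: float(len(g[v])) for v in g}, closeness(g), betweenness(g), eigenvector(g)]
    mean_rank = dict.fromkeys(g, 0.0)
    for m in metrics:
        for r, v in enumerate(sorted(g, key=lambda u: -m[u])):  # rank 0 = most central
            mean_rank[v] += r / len(metrics)
    return sorted(g, key=lambda v: mean_rank[v])


star = {"H": {"A", "B", "C", "D"}, "A": {"H"}, "B": {"H"}, "C": {"H"}, "D": {"H"}}
print(rank_hubs(star)[0])  # → H
```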
LDA-SCGB: inferring lncRNA-disease associations based on condensed gradient boosting.
IF 3.3 · Zone 3 · Biology
BMC Bioinformatics Pub Date : 2025-07-22 DOI: 10.1186/s12859-025-06169-2
Chengqiu Dai, Linna Wang, Yingwei Deng, Xuzhu Gao, Jingyu Zhang
Background: Long non-coding RNAs (lncRNAs) play essential roles in various physiological and pathological processes. Inferring new lncRNA-disease associations (LDAs) not only helps us better understand these complex biological processes but also provides new options for the diagnosis and prevention of diseases.
Results: A novel computational model, LDA-SCGB, is proposed to predict new LDAs. LDA-SCGB first extracts features of each lncRNA-disease pair with singular value decomposition. Next, it classifies unknown lncRNA-disease pairs with a condensed gradient boosting model. LDA-SCGB greatly outperformed four other representative LDA inference methods (SDLDA, LDNFSGB, LDAenDL and LDASR) under 5-fold cross-validation on lncRNAs, diseases, and lncRNA-disease pairs, using three LDA datasets drawn from lncRNADisease v2.0, MNDR, and lncRNADisease v3.0, respectively. LDA-SCGB was further used to find potential lncRNAs for colorectal cancer, heart failure, and lung adenocarcinoma. The results showed that CCDC26, MIAT, and CCDC26 had higher association probabilities with colorectal cancer, heart failure, and lung adenocarcinoma, respectively.
Conclusions: We anticipate that LDA-SCGB can predict potential lncRNAs for complex diseases and further assist in cancer diagnosis and therapy.
BMC Bioinformatics 26(1):190. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12281771/pdf/
Citations: 0
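LDA-SCGB extracts each lncRNA-disease pair's features with singular value decomposition before classification, but the abstract does not give the exact construction. A common choice, assumed here, is to factor the known-association matrix A ≈ U_k Σ_k V_k^T, embed lncRNA i as row i of U_k Σ_k^(1/2) and disease j as row j of V_k Σ_k^(1/2), and concatenate the two as the pair's feature vector. The power-iteration SVD below is a stdlib-only illustration (in practice `numpy.linalg.svd` would do this), all names are hypothetical, and the condensed gradient boosting classifier is not shown.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def top_k_svd(a: Matrix, k: int, iters: int = 500,
              seed: int = 0) -> List[Tuple[float, List[float], List[float]]]:
    """Top-k singular triplets (s, u, v) via power iteration on A^T A with deflation."""
    rng = random.Random(seed)
    m, n = len(a), len(a[0])
    residual = [row[:] for row in a]
    triplets = []
    for _ in range(k):
        v = [rng.random() for _ in range(n)]
        for _ in range(iters):
            u = [sum(residual[i][j] * v[j] for j in range(n)) for i in range(m)]  # u = A v
            v = [sum(residual[i][j] * u[i] for i in range(m)) for j in range(n)]  # v = A^T u
            norm = math.sqrt(sum(x * x for x in v))
            if norm == 0.0:
                break
            v = [x / norm for x in v]
        u = [sum(residual[i][j] * v[j] for j in range(n)) for i in range(m)]
        s = math.sqrt(sum(x * x for x in u))
        u = [x / s for x in u] if s else u
        triplets.append((s, u, v))
        # Deflate: remove the component just found before searching for the next one.
        for i in range(m):
            for j in range(n):
                residual[i][j] -= s * u[i] * v[j]
    return triplets


def svd_embeddings(a: Matrix, k: int) -> Tuple[Matrix, Matrix]:
    """Rows of U*sqrt(S) for lncRNAs and of V*sqrt(S) for diseases."""
    trip = top_k_svd(a, k)
    lnc = [[math.sqrt(s) * u[i] for s, u, _ in trip] for i in range(len(a))]
    dis = [[math.sqrt(s) * v[j] for s, _, v in trip] for j in range(len(a[0]))]
    return lnc, dis


def pair_features(lnc: Matrix, dis: Matrix, i: int, j: int) -> List[float]:
    """Concatenated embedding of (lncRNA i, disease j), the classifier's input."""
    return lnc[i] + dis[j]


# Toy 3 lncRNA x 2 disease association matrix (1 = known association).
assoc = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
lnc, dis = svd_embeddings(assoc, k=2)
print(pair_features(lnc, dis, 0, 1))  # 4 features for (lncRNA 0, disease 1)
```

Splitting Σ symmetrically between the two sides means the dot product of a lncRNA embedding and a disease embedding reconstructs the corresponding (rank-k) association score.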
BPFun: a deep learning framework for bioactive peptide function prediction using multi-label strategy by transformer-driven and sequence rich intrinsic information.
IF 2.9 · Zone 3 · Biology
BMC Bioinformatics Pub Date : 2025-07-21 DOI: 10.1186/s12859-025-06190-5
Lun Zhu, Hao Sun, Sen Yang
Bioactive peptides have beneficial or physiological effects on living organisms. Their functions are diverse, and a single peptide may have more than one, so accurately detecting all the functions of multi-functional peptides is extremely important. Traditional experimental identification methods are time-consuming, laborious and costly. To overcome these problems, we adopt a computational biology approach and propose BPFun, a new deep-learning model that can predict seven functions, including anticancer, antibacterial and antihypertensive activity. BPFun extracts features of bioactive peptides from different aspects, including biological and physicochemical features, and uses data augmentation to address data imbalance. We combine convolutional networks of different scales with Bi-LSTM layers to obtain high-level feature vectors for each feature type. Finally, these fused features are combined with a self-attention mechanism and a Bi-LSTM layer to improve prediction performance. Our experiments show that BPFun, based on five types of sequence features, significantly improves the prediction of bioactive peptide functions. On the test dataset covering seven functional classes, BPFun achieved an accuracy of 0.6577 and an absolute truth value of 0.6573, outperforming other methods. Code and data are available at https://github.com/291357657/BPFun
BMC Bioinformatics 26(1):187. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12278619/pdf/
Citations: 0
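The BPFun abstract reports an "accuracy" and an "absolute truth value" without defining them. In multi-label peptide function prediction these most likely correspond to the example-based metrics usually called Accuracy and Absolute true: Accuracy = (1/N) Σ |Y_i ∩ Ŷ_i| / |Y_i ∪ Ŷ_i| and Absolute true = (1/N) Σ [Y_i = Ŷ_i], where Y_i and Ŷ_i are the true and predicted label sets of peptide i. The Python below is a sketch under that assumption; the label names reuse functions mentioned in the abstract, and the data are invented purely for illustration.

```python
from typing import List, Set


def multilabel_accuracy(true: List[Set[str]], pred: List[Set[str]]) -> float:
    """Mean per-sample intersection-over-union of label sets (empty vs empty scores 1)."""
    total = 0.0
    for y, y_hat in zip(true, pred):
        union = y | y_hat
        total += len(y & y_hat) / len(union) if union else 1.0
    return total / len(true)


def absolute_true(true: List[Set[str]], pred: List[Set[str]]) -> float:
    """Fraction of samples whose predicted label set matches the true set exactly."""
    return sum(1 for y, y_hat in zip(true, pred) if y == y_hat) / len(true)


true = [{"anticancer"}, {"antibacterial", "antihypertensive"}, {"antihypertensive"}]
pred = [{"anticancer"}, {"antibacterial"}, {"anticancer"}]
print(multilabel_accuracy(true, pred), absolute_true(true, pred))  # → 0.5 0.3333333333333333
```

Absolute true is the stricter of the two: the second peptide earns half credit under Accuracy but none under Absolute true, which is why the latter is never higher than the former.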
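Contact us: info@booksci.cn. Book学术 offers a free academic resource search service that helps researchers in China and abroad search Chinese- and English-language literature, and aims to provide the most convenient, high-quality service experience. Copyright © 2023 布克学术. All rights reserved.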