Mai T Pham, Michael J G Milevskiy, Jane E Visvader, Yunshun Chen
{"title":"Incorporating exon-exon junction reads enhances differential splicing detection.","authors":"Mai T Pham, Michael J G Milevskiy, Jane E Visvader, Yunshun Chen","doi":"10.1186/s12859-025-06210-4","DOIUrl":"10.1186/s12859-025-06210-4","url":null,"abstract":"<p><strong>Background: </strong>RNA sequencing (RNA-seq) is a gold standard technology for studying gene and transcript expression. Different transcripts from the same gene are usually determined by varying combinations of exons within the gene, formed by splicing events. One method of studying differential alternative splicing between groups in short-read RNA-seq experiments is through differential exon usage (DEU) analysis, which uses exon-level read counts along with downstream statistical testing strategies. However, the standard exon counting method does not consider exon-junction information, which may reduce the statistical power in detecting splicing alterations.</p><p><strong>Results: </strong>We present a new workflow for differential splicing analysis, called differential exon-junction usage (DEJU). This DEJU analysis workflow adopts a new feature quantification approach that jointly summarises exon and exon-exon junction reads, which are then integrated into the established Rsubread-edgeR/limma frameworks. We performed comprehensive simulation studies to benchmark the performance of DEJU against existing methods. We also applied DEJU to a mouse mammary gland RNA-seq dataset, revealing biologically meaningful splicing events that could not be detected previously.</p><p><strong>Conclusions: </strong>We demonstrate that incorporating exon-exon junction reads significantly improves the detection of differential splicing events. 
The proposed DEJU workflow offers increased statistical power and computational efficiency compared to widely used existing approaches, while effectively controlling the false discovery rate.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"193"},"PeriodicalIF":3.3,"publicationDate":"2025-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12288301/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144706247","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Anirban Chakraborty, Erin K Purcell, Michael G Moore
{"title":"DiffCoRank: a comprehensive framework for discovering hub genes and differential gene co-expression in brain implant-associated tissue responses.","authors":"Anirban Chakraborty, Erin K Purcell, Michael G Moore","doi":"10.1186/s12859-025-06232-y","DOIUrl":"10.1186/s12859-025-06232-y","url":null,"abstract":"<p><strong>Background: </strong>Brain implants have significant potential for therapeutic applications and neuroscience research, but complex tissue responses often compromise their long-term stability. To address this challenge, differential coexpression analysis can be used to identify key molecular regulators involved in brain implant responses.</p><p><strong>Results: </strong>We developed DiffCoRank, an integrated framework that improves differential coexpression analysis by integrating the techniques of RNA-Seq data preprocessing, gene filtering, correlation-based module identification, and network analysis to discover differentially coexpressed gene clusters. A key innovation of our approach is false discovery rate (FDR) based selection of strongly connected genes (SCGs), by which we improve detection of strong coexpression patterns that otherwise could be lost to spurious correlations. To enhance the identification of different modules, we employ a hybrid clustering technique that combines uniform manifold approximation and projection (UMAP) with density-based spatial clustering of applications with noise (DBSCAN). We propose a multi-criteria hub gene ranking system incorporating network centrality metrics such as degree, closeness, betweenness, and eigenvector centrality to prioritise biologically relevant genes. 
Additionally, we created a user-friendly application to visualize and explore the results of DiffCoRank interactively.</p><p><strong>Conclusions: </strong>Our method successfully identified key gene modules involved in oxidative stress, calcium signaling, immunological regulation, autophagic recovery, and vascular remodeling in RNA-Seq data of implanted rat brain tissue. Furthermore, we compared our results to those of other existing coexpression analysis frameworks, showing that our method successfully identifies unique regulatory processes and consistent coexpression patterns. Our research offers novel insights into the molecular processes that explain implant-tissue interactions and possible approaches to improve the robustness and biocompatibility of brain interfaces.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"191"},"PeriodicalIF":3.3,"publicationDate":"2025-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12288212/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144697573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
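The multi-criteria hub gene ranking described in the DiffCoRank abstract above can be illustrated with a minimal sketch. It assumes an unweighted, undirected co-expression network stored as an adjacency dictionary, uses only two of the four named centralities (degree and closeness), and combines them by mean rank. The gene names, the `rank_hub_genes` helper, and the mean-rank aggregation are hypothetical illustrations, not the authors' implementation.

```python
from collections import deque


def degree_centrality(graph):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(graph)
    return {gene: len(neighbours) / (n - 1) for gene, neighbours in graph.items()}


def closeness_centrality(graph):
    """Reachable-node count divided by the summed shortest-path distances (BFS)."""
    scores = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbour in graph[node]:
                if neighbour not in dist:
                    dist[neighbour] = dist[node] + 1
                    queue.append(neighbour)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total else 0.0
    return scores


def rank_positions(scores):
    """Map each gene to its 1-based rank (higher score first, ties broken by name)."""
    ordered = sorted(scores, key=lambda gene: (-scores[gene], gene))
    return {gene: position + 1 for position, gene in enumerate(ordered)}


def rank_hub_genes(graph):
    """Order genes by their mean rank across the centrality criteria (assumed rule)."""
    criteria = [degree_centrality(graph), closeness_centrality(graph)]
    ranks = [rank_positions(scores) for scores in criteria]
    mean_rank = {gene: sum(r[gene] for r in ranks) / len(ranks) for gene in graph}
    return sorted(graph, key=lambda gene: (mean_rank[gene], gene))


# Hypothetical toy network: GeneA is the hub, GeneE hangs off GeneD.
toy_network = {
    "GeneA": {"GeneB", "GeneC", "GeneD"},
    "GeneB": {"GeneA", "GeneC"},
    "GeneC": {"GeneA", "GeneB"},
    "GeneD": {"GeneA", "GeneE"},
    "GeneE": {"GeneD"},
}
ranking = rank_hub_genes(toy_network)
```

Averaging ranks rather than raw scores keeps criteria with different scales comparable; betweenness and eigenvector centrality could be appended to `criteria` in the same way.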
Chengqiu Dai, Linna Wang, Yingwei Deng, Xuzhu Gao, Jingyu Zhang
{"title":"LDA-SCGB: inferring lncRNA-disease associations based on condensed gradient boosting.","authors":"Chengqiu Dai, Linna Wang, Yingwei Deng, Xuzhu Gao, Jingyu Zhang","doi":"10.1186/s12859-025-06169-2","DOIUrl":"10.1186/s12859-025-06169-2","url":null,"abstract":"<p><strong>Background: </strong>Long non-coding RNAs (lncRNAs) play essential roles in various physiological and pathological processes. Inferring new lncRNA-disease associations (LDAs) not only helps us better understand these complex biological processes but also provides new options for the diagnosis and prevention of diseases.</p><p><strong>Results: </strong>A novel computational model, LDA-SCGB, is proposed to predict new LDAs. LDA-SCGB first extracts features of each lncRNA-disease pair with singular value decomposition. Next, it classifies unknown lncRNA-disease pairs through the condensed gradient boosting model. The results demonstrated that LDA-SCGB greatly outperformed four other representative LDA inference methods (SDLDA, LDNFSGB, LDAenDL and LDASR) under 5-fold cross-validation on lncRNAs, diseases, and lncRNA-disease pairs on three LDA datasets, which were from lncRNADisease v2.0, MNDR, and lncRNADisease v3.0, respectively. LDA-SCGB was further used to find potential lncRNAs for colorectal cancer, heart failure, and lung adenocarcinoma. 
The results demonstrated that CCDC26, MIAT, and CCDC26 had higher association probabilities with colorectal cancer, heart failure, and lung adenocarcinoma, respectively.</p><p><strong>Conclusions: </strong>We foresee that LDA-SCGB is capable of predicting potential lncRNAs for complex diseases and of further assisting in cancer diagnosis and therapy.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"190"},"PeriodicalIF":3.3,"publicationDate":"2025-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12281771/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144688798","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BPFun: a deep learning framework for bioactive peptide function prediction using multi-label strategy by transformer-driven and sequence rich intrinsic information.","authors":"Lun Zhu, Hao Sun, Sen Yang","doi":"10.1186/s12859-025-06190-5","DOIUrl":"10.1186/s12859-025-06190-5","url":null,"abstract":"<p><p>Bioactive peptides have beneficial or other physiological effects on living organisms. Their functions are diverse, and a single peptide often has more than one, so accurately detecting the multiple functions of multi-functional peptides is extremely important. Traditional experimental identification methods are time-consuming, laborious and costly. To overcome these problems, we adopt a computational biology approach and propose a new deep learning model, BPFun, which can predict seven functions including anticancer, antibacterial, antihypertensive and so on. In BPFun, we obtain the features of bioactive peptides from different aspects, including biological and physicochemical features. We also adopt data augmentation to address the problem of data imbalance. We combine convolutional networks of different scales and Bi-LSTM layers to obtain high-level feature vectors of the different features. Finally, the prediction performance is improved by fusing these features and combining the self-attention mechanism with the Bi-LSTM layer. Our experiments show that BPFun based on five types of sequence features significantly improves the prediction performance of bioactive peptides. Experiments on the test dataset showed that BPFun achieves an accuracy and absolute truth value of 0.6577 and 0.6573 on the dataset of seven functional classifications and is superior to other methods. 
Codes and data are available at https://github.com/291357657/BPFun .</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"187"},"PeriodicalIF":2.9,"publicationDate":"2025-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12278619/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144681864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Keith Mitchell, Samuel Hunter, Lutz Froenicke, Karl Murray, Matthew Settles, James S Trimmer
{"title":"clonevdjseq: A workflow and bioinformatics management system for sequencing, archiving, and analysis of VDJ sequences from clonal libraries.","authors":"Keith Mitchell, Samuel Hunter, Lutz Froenicke, Karl Murray, Matthew Settles, James S Trimmer","doi":"10.1186/s12859-025-06107-2","DOIUrl":"10.1186/s12859-025-06107-2","url":null,"abstract":"<p><strong>Background: </strong>Advances in next-generation sequencing technologies have facilitated extensive analysis of B cell and T cell receptor (BCR/TCR, respectively) sequences from monoclonal hybridoma libraries, single B cells, and single T cells, generating vast amounts of important data pertaining to antigen recognition. However, existing workflows and bioinformatics tools often lack the flexibility and scalability needed to handle large clonal level datasets effectively. An initial system and hybridoma dependent version of this code was distributed as part of the NeuroMabSeq publication, but clonevdjseq aims to be a technical addendum for broader system compatibility and enhanced modeling.</p><p><strong>Results: </strong>We present clonevdjseq, an integrated and accessible software solution leveraging nextflow and Django. Developed primarily for large hybridoma libraries, the workflow and pipeline is amenable to BCR/TCR sequence analysis of homogenous populations or clones of B and T cells, respectively. The clonevdjseq pipeline includes modules for read processing, amplicon denoising, and quality control of paired variable light/heavy chains of BCRs from B cells and hybridomas, or alpha(ɑ)/beta(β) and delta(δ)/gamma(γ) chains of TCRs in the case of T cell applications. The pipeline is built upon a robust, high-throughput library prep protocol, upon which processed data has been verified across thousands of monoclonal antibodies. 
This effort has yielded sequences used to develop functional recombinant monoclonal antibodies and single-chain variable fragments as part of the NeuroMabSeq initiative, in which thousands of hybridoma samples were processed (Mitchell et al. in Sci Rep 13(1):16200, 2023), and provides additional modeling and extensibility to other modalities. The clonevdjseq software is accessible via Nextflow and also offers a database and web app as a final optional step in the processing for dissemination of results and data exploration.</p><p><strong>Conclusions: </strong>clonevdjseq offers a comprehensive and scalable solution for the processing and analysis of large monoclonal and oligoclonal VDJ datasets. Its modular design, dynamic pipeline, and robust database integration facilitate efficient data management and analysis. The platform is publicly available and aims to support the research community by providing an accessible and flexible tool for archiving and dissemination of BCR sequences from hybridomas, with applicability for other applications such as TCR sequences from single-cell T cell populations.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"186"},"PeriodicalIF":2.9,"publicationDate":"2025-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12278597/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144681913","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ahmed Abdelkader, Nur A Ferdous, Mohamed El-Hadidi, Tomasz Burzykowski, Mohamed Mysara
{"title":"metaGEENOME: an integrated framework for differential abundance analysis of microbiome data in cross-sectional and longitudinal studies.","authors":"Ahmed Abdelkader, Nur A Ferdous, Mohamed El-Hadidi, Tomasz Burzykowski, Mohamed Mysara","doi":"10.1186/s12859-025-06217-x","DOIUrl":"10.1186/s12859-025-06217-x","url":null,"abstract":"<p><strong>Background: </strong>Detecting biomarkers is a key objective in microbiome research, often done through 16S rRNA amplicon sequencing or shotgun metagenomic analysis. A critical step in this process is differential abundance (DA) analysis, which aims to pinpoint taxa whose abundance significantly differs between groups. However, DA analysis remains challenging due to high dimensionality, compositionality, sparsity, inter-taxa correlations, uneven abundance distributions, and missing values, all of which hinder our ability to model the data accurately. Despite the availability of many DA tools, balancing high statistical power with effective false discovery rate (FDR) control remains a major limitation.</p><p><strong>Results: </strong>Here, we introduce a novel approach for DA analysis that integrates counts adjusted with Trimmed Mean of M-values (CTF) normalization and Centered Log Ratio (CLR) transformation with a Generalized Estimating Equation (GEE) model. We benchmarked our approach against eight widely used tools employing both simulated and real datasets in cross-sectional and longitudinal settings. While several tools (e.g. MetagenomeSeq, edgeR, DESeq2 and Lefse) achieved high sensitivity, they often failed to adequately control the FDR. In contrast, our method demonstrated high sensitivity and specificity when compared to other approaches that successfully controlled the FDR, including ALDEx2, limma-voom, ANCOM, and ANCOM-BC2.</p><p><strong>Conclusions: </strong>Our approach effectively addresses key challenges in microbiome data analysis across both cross-sectional and longitudinal designs. 
Integrated into the R package metaGEENOME (https://github.com/M-Mysara/metaGEENOME), our framework provides a flexible, scalable and statistically robust solution for DA analysis, offering improved FDR control and enhanced performance for biomarker discovery in microbiome studies.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"189"},"PeriodicalIF":2.9,"publicationDate":"2025-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12281747/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144681914","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
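The Centered Log Ratio (CLR) transformation named in the metaGEENOME abstract above has a simple closed form: each log count minus the mean log count of the sample. The sketch below shows only that step, not the CTF normalization or the GEE model. The `clr` helper and its `pseudocount` parameter (added so that zero counts have a defined logarithm) are assumptions made for illustration and are not taken from the metaGEENOME package.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's taxon counts.

    The pseudocount is an illustrative assumption that keeps log() defined
    for taxa with zero reads.
    """
    logs = [math.log(count + pseudocount) for count in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Hypothetical counts for four taxa in a single sample.
sample = [120, 30, 0, 850]
transformed = clr(sample)
```

By construction the transformed values of a sample sum to zero, which is what removes the dependence on sequencing depth while preserving the ordering of taxa within the sample.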
{"title":"Aryana-bs: context-aware alignment of bisulfite-sequencing reads.","authors":"Hassan Nikaein, Ali Sharifi-Zarchi, Afsoon Afzal, Saeedeh Ezzati, Farzane Rasti, Hamidreza Chitsaz, Govindarajan Kunde-Ramamoorthy","doi":"10.1186/s12859-025-06182-5","DOIUrl":"10.1186/s12859-025-06182-5","url":null,"abstract":"<p><strong>Background: </strong>DNA methylation is essential in various biological processes, including imprinting, development, inflammation, and numerous disorders, such as cancer. Bisulfite sequencing (BS) serves as the gold standard for measuring DNA methylation at single-base resolution by converting unmethylated cytosines to thymines while leaving methylated cytosines intact. However, this C-to-T conversion presents a well-known challenge in conventional short-read aligners, which treat these conversions as substitutions. Many aligners that require seed sequences fail when frequent C-to-T conversions occur over short distances, resulting in reduced alignment accuracy. To address this challenge, two alignment methods have been well established: three-letter alignment and wildcard alignment. Three-letter alignment faces the significant issue of data loss by converting all thymines to cytosines, which obscures meaningful information. On the other hand, wildcard alignment introduces a biased alignment, failing to treat reads from unmethylated and methylated regions equally, leading to artifacts in methylation level estimation and inaccuracies in quantifying DNA methylation. This work introduces ARYANA-BS, a novel BS aligner that diverges from conventional DNA aligners by directly integrating BS-specific base alterations within its alignment engine. Leveraging known DNA methylation patterns across different genomic contexts, ARYANA-BS constructs five indexes from the reference genome, aligns each read to all indexes, and selects the alignment with the minimum penalty. 
To further refine alignment accuracy, an optional Expectation-Maximization (EM) step is incorporated, which integrates methylation probability information into the decision-making process for choosing the optimal index for each read. This approach aims to enhance BS read alignment accuracy by accommodating the complexities of DNA methylation patterns across diverse genomic contexts.</p><p><strong>Results: </strong>Experimental evaluations on both simulated and real data reveal that ARYANA-BS achieves state-of-the-art accuracy, maintaining competitive speed and memory efficiency.</p><p><strong>Conclusions: </strong>ARYANA-BS significantly improves alignment accuracy for bisulfite sequencing data by effectively integrating DNA methylation-specific alterations and genomic context. It outperforms existing methods, such as BSMAP, bwa-meth, Bismark, BSBolt, and abismal, particularly in robustness against genomic biases and alignment of longer, higher-error reads, demonstrating suitability for cancer research and cell-free DNA studies. While the Expectation-Maximization (EM) algorithm provides only modest initial improvements, it establishes a valuable framework for future refinement and potential enhancements in sensitive applications.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"188"},"PeriodicalIF":2.9,"publicationDate":"2025-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12281798/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144681863","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
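The information loss attributed to three-letter alignment in the Aryana-bs abstract above can be seen in a small sketch. It assumes the common convention of collapsing C and T into a single letter on both read and reference (here C is rewritten as T; collapsing in the other direction yields the same matches). The helper names are hypothetical and the exact string comparison stands in for a real aligner.

```python
def three_letter(sequence):
    """Collapse C and T into one letter so bisulfite conversions cannot mismatch."""
    return sequence.upper().replace("C", "T")


def matches_after_conversion(read, reference_window):
    """Exact comparison in the reduced alphabet (a stand-in for real alignment)."""
    return three_letter(read) == three_letter(reference_window)


reference = "ACGTCCGA"
read_unmethylated = "ATGTTTGA"  # every C was converted and is read as T
read_methylated = "ACGTCCGA"    # every C was protected by methylation

both_match = (
    matches_after_conversion(read_unmethylated, reference)
    and matches_after_conversion(read_methylated, reference)
)

# The cost: genuinely different loci become indistinguishable after conversion.
ambiguous = three_letter("ACGT") == three_letter("ATGT")
```

Both reads match the reference regardless of methylation state, which removes the bias toward methylated reads, but distinct genomic sequences such as `ACGT` and `ATGT` collapse to the same string, so mapping ambiguity increases.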
Kavipriya Gananathan, D Manjula, Vijayan Sugumaran
{"title":"DTIP-WINDGRU a novel drug-target interaction prediction with wind-enhanced gated recurrent unit.","authors":"Kavipriya Gananathan, D Manjula, Vijayan Sugumaran","doi":"10.1186/s12859-025-06141-0","DOIUrl":"10.1186/s12859-025-06141-0","url":null,"abstract":"<p><strong>Background: </strong>Identification of drug-target interactions (DTIs) is an important part of the drug discovery process. Since determining DTIs with laboratory tests is time-consuming and laborious, automated tools using computational intelligence (CI) techniques become essential. The prediction of DTIs is challenging due to the absence of known drug-target relationships and of experimentally verified negative samples. Models trained on limited or unbalanced datasets do not perform well. Models that use heterogeneous networks, non-linear fusion techniques, and heuristic similarity selection may need a lot of computational power and experience to implement and fine-tune. The latest developments in machine learning (ML) and deep learning (DL) models can be employed for effective DTI prediction.</p><p><strong>Results: </strong>To that end, this study develops a novel DTI prediction model, namely DTIP-WINDGRU (Drug-Target Interaction Prediction with Wind-Enhanced GRU). The major aim is to determine the DTIs in both labelled and unlabelled samples accurately compared to traditional wet lab experiments. To accomplish this, the proposed DTIP-WINDGRU model primarily performs pre-processing and class labelling. In addition, drug-to-drug (D-D) and target-to-target (T-T) interactions are employed to initialize the weights of the GRU model and are then used in the DTI prediction process. 
Finally, the Wind Driven Optimization (WDO) algorithm is utilized to optimally choose the hyperparameters involved in the GRU model.</p><p><strong>Conclusions: </strong>To verify the prediction performance of the DTIP-WINDGRU model, extensive experiments were carried out using four datasets. This comprehensive comparative study highlighted the better performance of the DTIP-WINDGRU model over existing techniques.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"185"},"PeriodicalIF":2.9,"publicationDate":"2025-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12278605/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144673896","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SCNT: an R package for data analysis and visualization of single-cell and spatial transcriptomics.","authors":"Jianbo Qing, Jialu Wu, Yafeng Li, Junnan Wu","doi":"10.1186/s12859-025-06209-x","DOIUrl":"10.1186/s12859-025-06209-x","url":null,"abstract":"<p><strong>Background: </strong>The emergence of single-cell (SC) and spatial transcriptomics (ST) has revolutionized our understanding of gene expression dynamics in complex tissues. However, it also presents challenges for data analysis and visualization, particularly due to the complexity of ST data and the diversity of analysis platforms. The SCNT (Single-Cell, Single-Nucleus, and Spatial Transcriptomics Analysis and Visualization Tools) package was developed to address these challenges by providing an efficient and user-friendly tool for processing, analyzing, and visualizing SC and ST data.</p><p><strong>Results: </strong>SCNT is an R-based package that integrates widely used tools such as Seurat and ggplot2, enabling seamless conversion between Seurat and H5ad formats. The package supports high-resolution spatial visualization, including customizable gene expression and clustering plots. SCNT also simplifies key data analysis steps, such as quality control, dimensionality reduction, and doublet detection, significantly enhancing workflow efficiency. We tested SCNT on a publicly available PBMC dataset and on Visium and Visium HD human kidney tissue data, demonstrating its effectiveness.</p><p><strong>Conclusions: </strong>SCNT offers a valuable tool for researchers exploring SC and ST data. Its simplicity, flexibility, and powerful visualization capabilities provide a streamlined workflow for both novice and advanced users. 
Future developments will focus on expanding support for additional ST platforms and enhancing multi-omics data integration.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"184"},"PeriodicalIF":2.9,"publicationDate":"2025-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12273005/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144667011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yu Li, Denggao Zheng, Kaijie Sun, Chi Qin, Yuchen Duan, Qingqing Zhou, Yunxia Yin, Hongxing Kan, Jili Hu
{"title":"A densely connected framework for cancer subtype classification.","authors":"Yu Li, Denggao Zheng, Kaijie Sun, Chi Qin, Yuchen Duan, Qingqing Zhou, Yunxia Yin, Hongxing Kan, Jili Hu","doi":"10.1186/s12859-025-06230-0","DOIUrl":"10.1186/s12859-025-06230-0","url":null,"abstract":"<p><strong>Background: </strong>Reliable identification of cancer subtypes is crucial for devising personalized treatment strategies. Integrating multi-omics data has proven to be an effective method for analyzing cancer subtypes. By combining molecular information across various layers, a more comprehensive understanding of biological characteristics and disease mechanisms can be achieved.</p><p><strong>Results: </strong>We propose DEGCN, a novel deep learning model that integrates a three-channel Variational Autoencoder (VAE) for multi-omics dimensionality reduction and a densely connected Graph Convolutional Network (GCN) for effective subtype classification. DEGCN leverages the complementary strengths of non-linear feature extraction and graph-based relational learning, enabling accurate and robust classification of renal cancer subtypes. Experimental results demonstrate that DEGCN achieves a cross-validated classification accuracy of 97.06% ± 2.04% on renal cancer data, outperforming conventional machine learning algorithms and state-of-the-art deep learning models. 
Moreover, its generalization ability is validated on breast and gastric cancer datasets from TCGA, with cross-validated classification accuracies of 89.82% ± 2.29% and 88.64% ± 5.24%, respectively, indicating strong cross-cancer predictive performance.</p><p><strong>Conclusion: </strong>The study highlights the outstanding performance of DEGCN in heterogeneous data integration and classification accuracy, demonstrating the model's potential in cancer subtype prediction and its application in guiding clinical treatment.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"183"},"PeriodicalIF":2.9,"publicationDate":"2025-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12273249/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144667010","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}