Wei Zhang, Zeqi Xu, Ruochen Yu, Mingfeng Jiang, Qi Dai
{"title":"DualGCN-GE: integration of spatiotemporal representations from whole-blood expression data with dual-view graph convolution network to identify Parkinson's disease subtypes.","authors":"Wei Zhang, Zeqi Xu, Ruochen Yu, Mingfeng Jiang, Qi Dai","doi":"10.1186/s12859-025-06181-6","DOIUrl":"10.1186/s12859-025-06181-6","url":null,"abstract":"<p><strong>Background: </strong>As a typical neurodegenerative disorder, Parkinson's disease (PD) is characterized by significant clinical and progression heterogeneity. Reliable detection of PACE subtypes in Parkinson's disease (PD-PACE) from gene expression data has played a crucial role in addressing the heterogeneity of this disease. Established machine learning approaches generally adopt single-view learning schemes and use the temporal features of RNA sequencing data. Topological features, which are associated with gene graphs and cell graphs, have been disregarded in previous work. Parkinson-specific gene graphs (PGG) could act as topological features that capture structural changes in molecular networks.</p><p><strong>Results: </strong>Within a dual-view graph learning framework, this study proposes DualGCN-GE, a method that identifies multiple PD-PACE subtypes from whole-blood expression data with regard to progression velocity. DualGCN-GE uses a dual-view graph convolution network (GCN) to integrate the temporal and topological features of whole-blood expression data and thereby detect PD-PACE subtypes. Experiments on three benchmark datasets validated the effectiveness and advantage of DualGCN-GE in the disease subtype detection task.</p><p><strong>Conclusion: </strong>For gene expression data from human blood samples, topological features encode unique information that is absent from temporal features. 
Using a collaborative fusion strategy, spatio-temporal representations extracted from whole blood expression data have improved accuracy and reliability in detecting PD-PACE subtypes.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"208"},"PeriodicalIF":3.3,"publicationDate":"2025-08-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12341084/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144833847","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
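The DualGCN-GE abstract relies on graph convolution networks but does not give the layer equations. For orientation only, the sketch below implements the standard symmetric-normalized GCN propagation rule, `ReLU(D^-1/2 (A + I) D^-1/2 H W)`, in plain Python. It is not the authors' dual-view architecture; the function name `gcn_layer` and the toy inputs are hypothetical. A dual-view model could plausibly apply such layers to two graphs (for example a gene graph and a cell graph) and fuse the outputs, but the abstract does not say how.

```python
import math


def gcn_layer(adj, features, weights):
    """One standard GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    adj      -- n x n symmetric 0/1 adjacency matrix (list of lists)
    features -- n x f node-feature matrix H
    weights  -- f x k weight matrix W (hypothetical, normally learned)
    """
    n = len(adj)
    # Add self-loops so every node keeps its own features.
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    # Node degrees in A + I, then D^-1/2 as a vector.
    inv_sqrt = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    # Symmetric normalization of the adjacency matrix.
    norm = [[inv_sqrt[i] * a_hat[i][j] * inv_sqrt[j] for j in range(n)] for i in range(n)]
    f = len(features[0])
    # Neighbourhood aggregation: norm @ H.
    agg = [[sum(norm[i][m] * features[m][c] for m in range(n)) for c in range(f)]
           for i in range(n)]
    k = len(weights[0])
    # Linear transform followed by ReLU: relu(agg @ W).
    return [[max(0.0, sum(agg[i][c] * weights[c][j] for c in range(f))) for j in range(k)]
            for i in range(n)]
```

For two connected nodes with features 1.0 and 3.0 and `W = [[1.0]]`, each node's output is approximately 2.0, the symmetric-normalized average of the pair; an isolated node simply passes its own features through the ReLU.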
{"title":"GenomicLayers: sequence-based simulation of epi-genomes.","authors":"Dave T Gerrard","doi":"10.1186/s12859-025-06224-y","DOIUrl":"10.1186/s12859-025-06224-y","url":null,"abstract":"<p><strong>Background: </strong>Cellular development and differentiation in Eukaryotes depend upon sequential gene regulatory decisions that allow a single genome to encode many hundreds of distinct cellular phenotypes. Decisions are stored in the regulatory state of each cell, an important part of which is the epi-genome, i.e. the collection of proteins, RNA and their specific associations with the genome. Additionally, further cellular responses are, in part, determined by this regulatory state. To date, models of regulatory state have failed to include the contingency of incoming regulatory signals on the current epi-genetic state and none have done so at the whole-genome level.</p><p><strong>Results: </strong>Here we introduce GenomicLayers, a new R package to run rules-based simulations of epigenetic state changes genome-wide in Eukaryotes. Simulations model the accumulation of changes to genome-wide layers by user-specified binding factors. As a first exemplar, we show two versions of a simple model of the recruitment and spreading of epigenetic marks near telomeres in the yeast Saccharomyces cerevisiae. By combining the output from 100 runs of the simulation, we generate whole-genome predictions of epigenetic state at 1 bp resolution. The example yeast models are included within a 'vignette' with the GenomicLayers package, which is available at https://github.com/davetgerrard/GenomicLayers . To demonstrate the use of GenomicLayers on the full human reference genome (hg38), we show the results from parameter refinement on a simplistic model of the action of pluripotency factors against a self-spreading repressor seeded at CpG islands. 
The human genome model is included in supplementary information as an R script.</p><p><strong>Conclusions: </strong>GenomicLayers enables scientists working on diverse eukaryotic organisms to test models of gene regulation in silico. Applications include epigenetic silencing, activation by combinatorial binding of transcription factors and the sink effects caused by down-regulation of components of epigenetic regulators. The software is intended to be used to parameterise, refine and combine models and thereby capitalise on data from the thousands of studies of Eukaryotic epigenomes.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"205"},"PeriodicalIF":3.3,"publicationDate":"2025-08-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12323044/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144783428","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semi-supervised contrastive learning variational autoencoder Integrating single-cell multimodal mosaic datasets.","authors":"Zihao Wang, Zeyu Wu, Minghua Deng","doi":"10.1186/s12859-025-06239-5","DOIUrl":"10.1186/s12859-025-06239-5","url":null,"abstract":"<p>As single-cell sequencing technology became widely used, scientists found that single-modality data alone could not fully meet the research needs of complex biological systems. To address this issue, researchers began to collect multimodal single-cell omics data simultaneously. However, different sequencing technologies often produce datasets in which one or more data modalities are missing, so mosaic datasets are common in practice. The high dimensionality and sparsity of the data increase the difficulty of analysis, and batch effects pose an additional challenge. To address these challenges, we propose scGCM, a flexible integration framework based on a variational autoencoder. The main task of scGCM is to integrate single-cell multimodal mosaic data and eliminate batch effects. The method was evaluated on multiple datasets encompassing different modalities of single-cell data. The results demonstrate that, compared to state-of-the-art multimodal data integration methods, scGCM offers significant advantages in clustering accuracy and data consistency. 
The source code of scGCM can be accessed at https://github.com/closmouz/scCGM .</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"206"},"PeriodicalIF":3.3,"publicationDate":"2025-08-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12323256/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144783429","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Gadi Chaykin, Omer Sabary, Nili Furman, Dvir Ben Shabat, Eitan Yaakobi
{"title":"Dna-storalator: a computational simulator for DNA data storage.","authors":"Gadi Chaykin, Omer Sabary, Nili Furman, Dvir Ben Shabat, Eitan Yaakobi","doi":"10.1186/s12859-025-06222-0","DOIUrl":"10.1186/s12859-025-06222-0","url":null,"abstract":"<p><strong>Background: </strong>DNA data storage is an emerging technology that has caught the attention of many researchers and engineers. This technology uses DNA molecules as a storage medium and thus offers extremely dense and durable storage. However, the unique nature of the errors in DNA, which include insertion, deletion, and substitution errors, requires the development of new algorithmic and coding solutions for these storage systems.</p><p><strong>Results: </strong>The DNA-Storalator is a cross-platform software tool that simulates, from a simplified digital point of view, the biological and computational processes involved in storing data in DNA molecules. The simulator receives an input file with the designed DNA strands that store digital data and emulates the different biological and algorithmic components of a DNA-based storage system. The biological component includes simulation of the synthesis, PCR, and sequencing stages, which are expensive and complicated and therefore not widely accessible to the community. These processes amplify the data and generate noisy copies of each DNA strand, where the errors are insertions, deletions, long deletions, and substitutions. The DNA-Storalator injects errors into the data according to error rates, which vary between different synthesis and sequencing technologies. The rates are based on comprehensive analysis of data from previous experiments but can also be customized. Additionally, the tool can analyze new datasets and characterize their error rates to build new error models for future use in the simulator. 
The DNA-Storalator also enables control of the amplification process and the distribution of the number of copies per designed strand. The coding and algorithmic components are: 1. Clustering algorithms which partition all output noisy strands into groups according to the designed strand they originated from; 2. State-of-the-art reconstruction algorithms that are invoked on each cluster to output a close/exact estimation of the designed strand; 3. Integration with external error-correcting codes and other encoding and decoding techniques.</p><p><strong>Conclusions: </strong>The suggested computational DNA storage simulator grants researchers from all fields an accessible complete simulator to examine new biological technologies, coding techniques, and algorithms for current and future DNA storage systems.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"204"},"PeriodicalIF":3.3,"publicationDate":"2025-08-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12323093/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144783427","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
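The DNA-Storalator abstract describes generating noisy copies of each designed strand by injecting insertions, deletions and substitutions at technology-dependent rates. A minimal sketch of that idea, assuming a simple channel in which each position is corrupted independently with fixed probabilities, is shown below. It deliberately omits long deletions, position-dependent rates and amplification bias, which the real tool is described as modelling; `noisy_copy`, `simulate_reads` and the rate values are hypothetical and are not the tool's API.

```python
import random

BASES = "ACGT"


def noisy_copy(strand, p_ins, p_del, p_sub, rng):
    """Return one noisy read of `strand` under a simple i.i.d.
    insertion/deletion/substitution channel (hypothetical model)."""
    out = []
    for base in strand:
        # Insertion: emit a random extra base before the current one.
        if rng.random() < p_ins:
            out.append(rng.choice(BASES))
        r = rng.random()
        if r < p_del:
            continue  # deletion: drop this base entirely
        if r < p_del + p_sub:
            # Substitution: replace with a different base.
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)  # base copied correctly
    return "".join(out)


def simulate_reads(strands, copies, p_ins, p_del, p_sub, seed=0):
    """Map each designed strand to `copies` noisy reads (fixed copy number)."""
    rng = random.Random(seed)  # seeded for reproducible simulations
    return {s: [noisy_copy(s, p_ins, p_del, p_sub, rng) for _ in range(copies)]
            for s in strands}
```

For example, `simulate_reads(["ACGTACGT"], copies=3, p_ins=0.01, p_del=0.01, p_sub=0.01)` returns three noisy reads of the strand. The output of such a channel is what the clustering and reconstruction stages described above would then have to undo.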
{"title":"Literature data-based de novo candidates for drug repurposing.","authors":"Xianglong Liang, Xin Jiang, Yifang Ma","doi":"10.1186/s12859-025-06237-7","DOIUrl":"10.1186/s12859-025-06237-7","url":null,"abstract":"","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"203"},"PeriodicalIF":3.3,"publicationDate":"2025-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12317455/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144764466","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CYCLONE: recycle contrastive learning for integrating single-cell gene expression data.","authors":"Han Ji, Xinwei He, Hongwei Li","doi":"10.1186/s12859-025-06214-0","DOIUrl":"10.1186/s12859-025-06214-0","url":null,"abstract":"<p><strong>Background: </strong>Integrating single-cell transcriptome sequencing results from several batches while removing batch effects improves our understanding of cellular identity and function.</p><p><strong>Results: </strong>This paper introduces CYCLONE, a new method for integrating single-cell gene expression data using a recycle contrastive learning network. The contrastive learning network and the VAE model work together to jointly train the low-dimensional representations. Additionally, they update the indices of inter-batch mutual nearest neighbor (MNN) pairs to generate positive pairs from a reduced-noise low-dimensional space. Meanwhile, CYCLONE cyclically updates the MNN pairs by iteratively training the low-dimensional space to gradually improve the confidence of the positive sample pairs, and augments the MNN pairs with k-nearest neighbor (KNN) pairs to identify batch-specific cell types, thus avoiding the problems associated with overcorrecting batch effects. The performance of CYCLONE was evaluated on simulated and real scRNA-seq datasets, confirming its ability to improve clustering accuracy while successfully eliminating batch effects. 
In addition, experiments on identifying batch-specific cell types validated CYCLONE's ability to retain batch-specific information while eliminating batch effects, thus preserving batch-specific cell types.</p><p><strong>Conclusion: </strong>CYCLONE is an effective integration method based on recycle contrastive learning that improves the accuracy of cell clustering while successfully eliminating batch effects and preserving batch-specific information.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"202"},"PeriodicalIF":3.3,"publicationDate":"2025-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12312599/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144752183","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
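CYCLONE draws its positive pairs from inter-batch mutual nearest neighbor (MNN) pairs. The MNN step on its own can be sketched in plain Python as below, assuming cells are already embedded in a shared low-dimensional space and that Euclidean distance is used. CYCLONE recomputes these pairs as its representation is retrained and augments them with KNN pairs, neither of which this sketch attempts; the function names are hypothetical.

```python
import math


def knn(query, points, k):
    """Indices of the k points nearest to `query` (Euclidean distance)."""
    order = sorted(range(len(points)), key=lambda j: math.dist(query, points[j]))
    return set(order[:k])


def mutual_nearest_neighbours(batch_a, batch_b, k):
    """Return sorted (i, j) pairs where cell i of batch_a and cell j of
    batch_b are each among the other's k nearest cross-batch neighbours."""
    a_to_b = [knn(x, batch_b, k) for x in batch_a]  # neighbours of A-cells in B
    b_to_a = [knn(y, batch_a, k) for y in batch_b]  # neighbours of B-cells in A
    # Keep only pairs where the neighbour relation holds in both directions.
    return sorted((i, j) for i in range(len(batch_a)) for j in a_to_b[i]
                  if i in b_to_a[j])
```

With `batch_a = [(0, 0), (10, 10)]` and `batch_b = [(0.5, 0), (10, 9), (100, 100)]` at `k=1`, the result is `[(0, 0), (1, 1)]`: the distant third cell in `batch_b` has no mutual partner. This illustrates why pairs built only from MNNs leave room for batch-specific cell types rather than forcing them onto another batch.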
Shuyi Yang, Anderson Bussing, Giampiero Marra, Michelle L Brinkmeier, Sally A Camper, Shannon W Davis, Yen-Yi Ho
{"title":"Time-coexpress: temporal trajectory modeling of dynamic gene co-expression patterns using single-cell transcriptomics data.","authors":"Shuyi Yang, Anderson Bussing, Giampiero Marra, Michelle L Brinkmeier, Sally A Camper, Shannon W Davis, Yen-Yi Ho","doi":"10.1186/s12859-025-06218-w","DOIUrl":"10.1186/s12859-025-06218-w","url":null,"abstract":"<p><strong>Background: </strong>The rapid advancement of single-cell RNA sequencing (scRNAseq) technology provides high-resolution views of transcriptomic activity within individual cells. Most routine analyses of scRNAseq data focus on individual genes; however, the one-gene-at-a-time analysis is likely to miss meaningful genetic interactions. Gene co-expression analysis addresses this limitation by identifying coordinated changes in gene expression in response to cellular conditions, such as developmental or temporal trajectories. Existing approaches to gene co-expression analysis often assume restrictive linear relationships. However, gene co-expression can change in complex, non-linear ways, which suggests the need for more flexible and accurate methods.</p><p><strong>Results: </strong>We propose a copula-based framework, TIME-CoExpress, with proper data-driven smoothing functions to model non-linear changes in gene co-expression along cellular temporal trajectories. Our method provides the flexibility to incorporate characteristics commonly observed in scRNAseq data, such as over-dispersion and zero-inflation, into the modeling framework. In addition to modeling gene co-expression, TIME-CoExpress captures dynamic changes in gene-level zero-inflation rates and mean expression levels, providing a more comprehensive analysis of scRNAseq data. Through a series of simulation analyses, we evaluated the performance of the proposed approach. 
We further demonstrated its implementation using a scRNAseq dataset and identified differentially co-expressed gene pairs along the cellular temporal trajectory during pituitary embryonic development, comparing [Formula: see text] and wild-type mice.</p><p><strong>Conclusions: </strong>The proposed framework enables flexible and robust identification of dynamic, non-linear changes in gene co-expression, zero-inflation rates, and mean expression levels along temporal trajectories in scRNAseq data. Detecting these changes provides deeper insights into the biological processes and offers a better understanding of gene regulation throughout cellular development.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"199"},"PeriodicalIF":3.3,"publicationDate":"2025-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12308957/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144741002","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Valentine Rech de Laval, Benjamin Dainat, Philippe Engel, Marc Robinson-Rechavi
{"title":"The BeeBiome data portal provides easy access to bee microbiome information.","authors":"Valentine Rech de Laval, Benjamin Dainat, Philippe Engel, Marc Robinson-Rechavi","doi":"10.1186/s12859-025-06229-7","DOIUrl":"10.1186/s12859-025-06229-7","url":null,"abstract":"<p>Bees can be colonized by a large diversity of microbes, including beneficial gut symbionts and detrimental pathogens, with implications for bee health. Over the last few years, researchers around the world have collected a huge amount of genomic and transcriptomic data about the composition, genomic content, and gene expression of bee-associated microbial communities. While each of these datasets by itself has provided important insights, the integration of such datasets provides an unprecedented opportunity to obtain a global picture of the microbes associated with bees and their link to bee health. The challenge of such an approach is that datasets are difficult to find within large generalist repositories and are often not readily accessible, which hinders integrative analyses. Here we present a publicly available online resource, the BeeBiome data portal (https://www.beebiome.org), which provides an overview of and easy access to currently available metagenomic datasets involving bee-associated microbes. Currently, the data portal contains 33,678 Sequence Read Archive (SRA) experiments for 278 Apoidea hosts. We present the content and functionalities of this portal. By providing access to all bee microbiomes in a single place, with easy filtering on relevant criteria, BeeBiome will enable faster progress in applied and fundamental research on bee biology and health. 
It should be a useful tool for researchers, academics, funding agencies, and governments, with beneficial impacts for stakeholders.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"198"},"PeriodicalIF":3.3,"publicationDate":"2025-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12309204/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144741001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dylan Clark-Boucher, Brent A Coull, Harrison T Reeder, Fenglei Wang, Qi Sun, Jacqueline R Starr, Kyu Ha Lee
{"title":"Group-wise normalization in differential abundance analysis of microbiome samples.","authors":"Dylan Clark-Boucher, Brent A Coull, Harrison T Reeder, Fenglei Wang, Qi Sun, Jacqueline R Starr, Kyu Ha Lee","doi":"10.1186/s12859-025-06235-9","DOIUrl":"10.1186/s12859-025-06235-9","url":null,"abstract":"<p><strong>Background: </strong>A key challenge in differential abundance analysis (DAA) of microbial sequencing data is that the counts for each sample are compositional, resulting in potentially biased comparisons of the absolute abundance across study groups. Normalization-based DAA methods rely on external normalization factors that account for compositionality by standardizing the counts onto a common numerical scale. However, existing normalization methods have struggled to maintain the false discovery rate in settings where the variance or compositional bias is large. This article proposes a novel framework for normalization that can reduce bias in DAA by re-conceptualizing normalization as a group-level task. We present two new normalization methods within the group-wise framework: group-wise relative log expression (G-RLE) and fold-truncated sum scaling (FTSS).</p><p><strong>Results: </strong>G-RLE and FTSS achieve higher statistical power for identifying differentially abundant taxa than existing methods in model-based and synthetic data simulation settings. The two novel methods also maintain the false discovery rate in challenging scenarios where existing methods suffer. The best results are obtained from using FTSS normalization with the DAA method MetagenomeSeq.</p><p><strong>Conclusion: </strong>Compared with other methods for normalizing compositional sequence count data prior to DAA, the proposed group-level normalization frameworks offer more robust statistical inference. 
With a solid mathematical foundation, validated performance in numerical studies, and publicly available software, these new methods can help improve rigor and reproducibility in microbiome research.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"196"},"PeriodicalIF":3.3,"publicationDate":"2025-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12308967/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144741075","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
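The name G-RLE indicates a group-level adaptation of relative log expression (RLE) normalization, but the abstract does not give its formula, and FTSS is not defined there either. For orientation only, the sketch below computes the classic per-sample RLE (median-of-ratios) size factors used by DESeq. It is the sample-level baseline, not G-RLE or FTSS, and `rle_size_factors` is a hypothetical name.

```python
import math
from statistics import median


def rle_size_factors(counts):
    """Classic per-sample RLE (median-of-ratios) size factors.

    counts -- list of samples, each a list of taxon counts in the same order.
    Taxa with a zero count in any sample are skipped so that the geometric
    mean used as the pseudo-reference is defined.
    """
    n_taxa = len(counts[0])
    keep = [t for t in range(n_taxa) if all(s[t] > 0 for s in counts)]
    # Log geometric mean of each retained taxon across samples (pseudo-reference).
    log_ref = {t: sum(math.log(s[t]) for s in counts) / len(counts) for t in keep}
    # Size factor = median ratio of the sample's counts to the pseudo-reference.
    return [math.exp(median(math.log(s[t]) - log_ref[t] for t in keep))
            for s in counts]
```

For two samples whose counts differ by a constant factor of 2, such as `[[10, 20, 30], [20, 40, 60]]`, the size factors are approximately 0.707 and 1.414, so dividing each sample by its factor puts both on a common scale. The compositional bias the abstract describes arises when many taxa truly change between groups, which is the setting in which a per-sample median like this one can become biased.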