Zeyu Fu, Chunlin Chen, Song Wang, Junping Wang, Shilei Chen
iVAE: an interpretable representation learning framework enhances clustering performance for single-cell data
BMC Biology, volume 23, issue 1, article 213. Published 2025-07-15.
DOI: https://doi.org/10.1186/s12915-025-02315-7
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12261748/pdf/
Citation count: 0
Abstract
Background: Variational autoencoders (VAEs) serve as essential components in large generative models for extracting latent representations and have gained widespread application in biological domains. Developing VAEs specifically tailored to the unique characteristics of biological data is crucial for advancing future large-scale biological models.
Results: Through systematic monitoring of VAE training processes across 31 public single-cell datasets spanning oncological and normal conditions, we discovered that reducing the β value, which corresponds to lower disentanglement of the VAE, significantly improves unsupervised clustering metrics in single-cell data analysis. Based on this finding, we developed iVAE with an irecon module that, when benchmarked against 8 established dimensionality reduction methods across 5 clustering performance metrics, exhibited superior capabilities in representing single-cell transcriptomic data.
Conclusions: The proposed iVAE architecture enhances the interpretability of single-cell data compared to conventional VAE architectures as measured by clustering metrics. Our work establishes a potential foundational VAE architecture for developing specialized large-scale generative models for biological applications.
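For readers unfamiliar with the β parameter mentioned in the Results, the following is a minimal sketch of the standard β-VAE objective, in which β scales the KL-divergence term relative to the reconstruction term. It is an illustration only: the squared-error reconstruction loss, the function name `beta_vae_loss`, and the toy inputs are assumptions, and the abstract does not describe the likelihood actually used or the irecon module, so neither is modeled here.

```python
import numpy as np

def beta_vae_loss(x, x_recon, mu, logvar, beta=1.0):
    """Return (total, recon, kl) for a batch under a standard beta-VAE objective.

    Assumption: squared-error reconstruction; the paper's actual likelihood
    is not stated in the abstract.
    """
    # Reconstruction error: summed over features, averaged over cells.
    recon = np.mean(np.sum((x - x_recon) ** 2, axis=1))
    # Closed-form KL divergence between the diagonal Gaussian posterior
    # N(mu, exp(logvar)) and the standard normal prior N(0, I).
    kl = np.mean(-0.5 * np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar), axis=1))
    # beta weights the KL term; beta = 1 recovers the ordinary VAE loss.
    return recon + beta * kl, recon, kl

# Toy batch (hypothetical values): 4 cells, 3 genes, 2 latent dimensions.
x = np.ones((4, 3))
mu = np.ones((4, 2))
logvar = np.zeros((4, 2))
for beta in (1.0, 0.1):
    total, recon, kl = beta_vae_loss(x, x, mu, logvar, beta=beta)
    print(f"beta={beta}: total={total:.3f}, recon={recon:.3f}, kl={kl:.3f}")
```

With a smaller β the KL penalty contributes less to the total loss, so the encoder is constrained less tightly toward the prior. This is the setting the Results associate with lower disentanglement and better clustering metrics.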
About the journal:
BMC Biology is a broad scope journal covering all areas of biology. Our content includes research articles, new methods and tools. BMC Biology also publishes reviews, Q&A, and commentaries.