Semi-supervised contrastive learning variational autoencoder Integrating single-cell multimodal mosaic datasets.

IF 3.3, Zone 3 (Biology), JCR Q2, BIOCHEMICAL RESEARCH METHODS
Zihao Wang, Zeyu Wu, Minghua Deng
BMC Bioinformatics, vol. 26(1), p. 206. Published 2025-08-04. Journal Article.
DOI: https://doi.org/10.1186/s12859-025-06239-5
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12323256/pdf/

Abstract

As single-cell sequencing technology has become widely used, researchers have found that single-modality data alone cannot fully meet the needs of studying complex biological systems. To address this, researchers began collecting multimodal single-cell omics data simultaneously. However, different sequencing technologies often yield datasets in which one or more modalities are missing, so mosaic datasets are common in practice. The high dimensionality and sparsity of the data make integration difficult, and batch effects pose an additional challenge. To address these challenges, we propose scGCM, a flexible integration framework based on a variational autoencoder. The main task of scGCM is to integrate single-cell multimodal mosaic data and eliminate batch effects. The method was evaluated on multiple datasets encompassing different modalities of single-cell data. The results demonstrate that, compared to state-of-the-art multimodal data integration methods, scGCM offers significant advantages in clustering accuracy and data consistency. The source code of scGCM can be accessed at https://github.com/closmouz/scCGM .
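
To make the term "mosaic dataset" concrete, the following is a minimal sketch of how batches with missing modalities might be laid out, together with a per-cell mask of which modalities were observed. It is an illustration only and is not taken from the scGCM code: the batch names, the modality names (`rna`, `adt`), the matrix shapes, and the helper `modality_mask` are all assumptions.

```python
import numpy as np

# Hypothetical toy batches: each maps a modality name to a (cells x features) matrix.
# Names, shapes, and count distributions are illustrative assumptions.
rng = np.random.default_rng(0)
batches = {
    "batch1": {"rna": rng.poisson(1.0, (4, 6)), "adt": rng.poisson(5.0, (4, 3))},
    "batch2": {"rna": rng.poisson(1.0, (5, 6))},  # ADT not measured
    "batch3": {"adt": rng.poisson(5.0, (3, 3))},  # RNA not measured
}


def modality_mask(batches):
    """Return sorted modality names and a (cells x modalities) boolean mask.

    mask[i, j] is True when modality j was measured for cell i.
    """
    modalities = sorted({name for data in batches.values() for name in data})
    rows = []
    for data in batches.values():
        # All matrices in one batch share the same number of cells.
        n_cells = next(iter(data.values())).shape[0]
        present = [name in data for name in modalities]
        rows.append(np.tile(present, (n_cells, 1)))
    return modalities, np.vstack(rows)


modalities, mask = modality_mask(batches)
print(modalities)             # column order of the mask
print(mask.sum(axis=0))       # number of cells observed per modality
print(mask.all(axis=1).sum()) # number of cells with every modality observed
```

An integration method for mosaic data has to produce one shared embedding for all rows of such a mask, including the cells for which a modality column is `False`.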

Source journal
BMC Bioinformatics (Biology: Biochemical Research Methods)
CiteScore: 5.70
Self-citation rate: 3.30%
Articles published: 506
Review time: 4.3 months
About the journal: BMC Bioinformatics is an open access, peer-reviewed journal that considers articles on all aspects of the development, testing and novel application of computational and statistical methods for the modeling and analysis of all kinds of biological data, as well as other areas of computational biology. BMC Bioinformatics is part of the BMC series which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that there is some advance in knowledge presented by the work.