{"title":"Rigorous integration of single-cell ATAC-seq data using regularized barycentric mapping","authors":"Shuchen Zhu, Heyang Hua, Shengquan Chen","doi":"10.1038/s42256-025-01099-3","DOIUrl":null,"url":null,"abstract":"Single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) deciphers genome-wide chromatin accessibility, providing profound insights into gene regulation mechanisms. With the rapid advance of sequencing technologies, scATAC-seq data typically encompass numerous samples from various conditions, resulting in complex batch effects, thus necessitating reliable integration tools. While numerous batch integration tools exist for single-cell RNA sequencing data, inherent data characteristic differences limit their effectiveness on scATAC-seq data. Existing integration methods for scATAC-seq data suffer from several fundamental limitations, such as disrupting the biological heterogeneity and focusing solely on low-dimensional correction, which may distort data and hinder downstream analysis. Here we propose Fountain, a deep learning framework for scATAC-seq data integration via rigorous barycentric mapping. Barycentric mapping transforms one data distribution to another in a principled and effective manner through optimal transport. By regularizing barycentric mapping with geometric data information, Fountain achieves accurate batch alignment while preserving biological heterogeneity. Comprehensive experiments across diverse real-world datasets demonstrate the advantages of Fountain over existing methods in batch correction and biological conservation. In addition, the trained Fountain model can integrate data from new batches alongside already integrated data without retraining, enabling continuous online data integration. Moreover, Fountain’s reconstruction strategy generates batch-corrected ATAC profiles, improving the capture of cellular heterogeneity and revealing cell-type-specific implications such as expression enrichment analysis and partitioned heritability analysis. Zhu, Hua and Chen propose Fountain, a deep learning framework for batch integration of scATAC-seq data that utilizes regularized barycentric mapping. It preserves biological heterogeneity, enabling online and original dimensionality integration.","PeriodicalId":48533,"journal":{"name":"Nature Machine Intelligence","volume":"7 9","pages":"1461-1477"},"PeriodicalIF":23.9000,"publicationDate":"2025-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Nature Machine Intelligence","FirstCategoryId":"94","ListUrlMain":"https://www.nature.com/articles/s42256-025-01099-3","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0
Abstract
Single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) deciphers genome-wide chromatin accessibility, providing profound insights into gene regulation mechanisms. With the rapid advance of sequencing technologies, scATAC-seq data typically encompass numerous samples from various conditions, resulting in complex batch effects, thus necessitating reliable integration tools. While numerous batch integration tools exist for single-cell RNA sequencing data, inherent data characteristic differences limit their effectiveness on scATAC-seq data. Existing integration methods for scATAC-seq data suffer from several fundamental limitations, such as disrupting the biological heterogeneity and focusing solely on low-dimensional correction, which may distort data and hinder downstream analysis. Here we propose Fountain, a deep learning framework for scATAC-seq data integration via rigorous barycentric mapping. Barycentric mapping transforms one data distribution to another in a principled and effective manner through optimal transport. By regularizing barycentric mapping with geometric data information, Fountain achieves accurate batch alignment while preserving biological heterogeneity. Comprehensive experiments across diverse real-world datasets demonstrate the advantages of Fountain over existing methods in batch correction and biological conservation. In addition, the trained Fountain model can integrate data from new batches alongside already integrated data without retraining, enabling continuous online data integration. Moreover, Fountain’s reconstruction strategy generates batch-corrected ATAC profiles, improving the capture of cellular heterogeneity and revealing cell-type-specific implications such as expression enrichment analysis and partitioned heritability analysis. Zhu, Hua and Chen propose Fountain, a deep learning framework for batch integration of scATAC-seq data that utilizes regularized barycentric mapping. It preserves biological heterogeneity, enabling online and original dimensionality integration.
期刊介绍:
Nature Machine Intelligence is a distinguished publication that presents original research and reviews on various topics in machine learning, robotics, and AI. Our focus extends beyond these fields, exploring their profound impact on other scientific disciplines, as well as societal and industrial aspects. We recognize limitless possibilities wherein machine intelligence can augment human capabilities and knowledge in domains like scientific exploration, healthcare, medical diagnostics, and the creation of safe and sustainable cities, transportation, and agriculture. Simultaneously, we acknowledge the emergence of ethical, social, and legal concerns due to the rapid pace of advancements.
To foster interdisciplinary discussions on these far-reaching implications, Nature Machine Intelligence serves as a platform for dialogue facilitated through Comments, News Features, News & Views articles, and Correspondence. Our goal is to encourage a comprehensive examination of these subjects.
Similar to all Nature-branded journals, Nature Machine Intelligence operates under the guidance of a team of skilled editors. We adhere to a fair and rigorous peer-review process, ensuring high standards of copy-editing and production, swift publication, and editorial independence.