scMEDAL for the interpretable analysis of single-cell transcriptomics data with batch effect visualization using a deep mixed effects autoencoder.

ArXiv Pub Date: 2025-03-13

Aixa X Andrade, Son Nguyen, Albert Montillo


Citations: 0

Abstract

Single-cell RNA sequencing (scRNA-seq) data have the potential to provide new insights into cellular heterogeneity and data acquisition; however, a major challenge is unraveling confounding from technical and biological batch effects. Existing batch correction algorithms suppress and discard these effects rather than quantifying and modeling them. Here, we present scMEDAL, a framework for single-cell Mixed Effects Deep Autoencoder Learning, which separately models batch-invariant and batch-specific effects using two complementary autoencoder networks. One network is trained through adversarial learning to capture a batch-invariant representation, while a Bayesian autoencoder learns a batch-specific representation. Comprehensive evaluations spanning conditions (e.g., autism, leukemia, and the cardiovascular system), cell types, and technical and biological effects demonstrate that scMEDAL suppresses batch effects while modeling batch-specific variation, enhancing accuracy and interpretability. Unlike prior approaches, the framework's fixed- and random-effects autoencoders enable retrospective analyses, including predicting, via genomap projections at the cellular level, a cell's expression as if it had been acquired in a different batch, which reveals the impact of biological (e.g., diagnosis) and technical (e.g., acquisition) effects. Combining scMEDAL's batch-agnostic and batch-specific latent spaces enables more accurate predictions of disease status, donor group, and cell type, making scMEDAL a valuable framework for gaining deeper insight into data acquisition and cellular heterogeneity.
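The idea of predicting a cell's expression "as if it had been acquired in a different batch" can be illustrated with a much simpler model than the one the abstract describes. The sketch below is a toy additive analogue, not the scMEDAL networks: it assumes expression is a batch-invariant signal plus a per-batch gene offset, estimates the offsets with shrinkage toward zero (a crude stand-in for a Gaussian prior on random effects), and swaps offsets to project a cell into another batch. All names (`estimate_offsets`, `as_if_from`, `prior_strength`, the simulated data) are hypothetical and chosen for illustration only.

```python
import random
from statistics import fmean

random.seed(0)
N_GENES = 4
BATCHES = ["batch_a", "batch_b", "batch_c"]

# Simulated ground truth (an assumption of this toy): one offset per batch and gene.
true_offset = {b: [random.gauss(0.0, 1.0) for _ in range(N_GENES)] for b in BATCHES}


def simulate_cell(batch):
    """Draw a batch-invariant cell state and add the batch's offset."""
    state = [random.gauss(5.0, 1.0) for _ in range(N_GENES)]
    return [s + o for s, o in zip(state, true_offset[batch])]


cells = [(b, simulate_cell(b)) for b in BATCHES for _ in range(200)]


def estimate_offsets(cells, prior_strength=1.0):
    """Per-batch gene offsets relative to the grand mean, shrunk toward zero."""
    grand = [fmean(x[g] for _, x in cells) for g in range(N_GENES)]
    offsets = {}
    for b in BATCHES:
        xs = [x for batch, x in cells if batch == b]
        # Small batches are pulled more strongly toward a zero offset.
        weight = len(xs) / (len(xs) + prior_strength)
        offsets[b] = [
            weight * (fmean(x[g] for x in xs) - grand[g]) for g in range(N_GENES)
        ]
    return offsets


def batch_invariant(x, batch, offsets):
    """Remove the estimated batch offset (the 'fixed effects' view of the cell)."""
    return [v - o for v, o in zip(x, offsets[batch])]


def as_if_from(x, source, target, offsets):
    """Project a cell from its source batch into a target batch."""
    shared = batch_invariant(x, source, offsets)
    return [s + o for s, o in zip(shared, offsets[target])]


offsets = estimate_offsets(cells)
source_batch, x = cells[0]
moved = as_if_from(x, source_batch, "batch_b", offsets)
print([round(v, 2) for v in moved])
```

In this toy, keeping the offsets rather than discarding them is what makes the projection possible: the batch-invariant part and the batch-specific part are both retained and can be recombined for any target batch. The framework described in the abstract replaces the additive offsets with learned latent representations from its two autoencoders.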

More papers from this journal

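Mixed effects deep learning for the interpretable analysis of single-cell RNA sequencing data by quantifying and visualizing batch effects.

Single-cell RNA sequencing (scRNA-seq) data are often confounded by technical or biological batch effects. Existing deep learning models can mitigate these effects but often discard batch-specific information, potentially losing valuable biological insights. We propose a Mixed Effects Deep Learning (MEDL) autoencoder framework that separately models batch-invariant (fixed effects) and batch-specific (random effects) components. By decoupling batch-invariant biological states from batch variation, our framework integrates both into predictive models. Our approach also generates 2D visualizations of how the same cell would appear in different batches, improving interpretability. Retaining both the fixed-effects and random-effects latent spaces improves classification accuracy. We apply the framework to three datasets: cardiovascular system (Healthy Heart), autism spectrum disorder (ASD), and acute myeloid leukemia (AML). The Healthy Heart dataset contains 147 batches, far more than is typical, so we use it to test the framework's ability to handle many batches. In the ASD dataset, our approach captures donor heterogeneity between autistic and healthy individuals. In the AML dataset, our approach distinguishes donor heterogeneity even though some cell types are missing and diseased donors have both healthy and malignant cells. These results highlight the framework's ability to characterize fixed and random effects, enhance batch effect visualization, and improve prediction accuracy across diverse datasets.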
