scRDiT: Generating single-cell RNA-seq data by diffusion transformers and accelerating sampling

arXiv - QuanBio - Genomics | Pub Date: 2024-04-09 | DOI: arxiv-2404.06153

Shengze Dong, Zhuorui Cui, Ding Liu, Jinzhi Lei


Citations: 0

Abstract



Motivation: Single-cell RNA sequencing (scRNA-seq) is a widely used technology in biological research that measures gene expression at the level of individual cells within a tissue sample. Although many tools have been developed for scRNA-seq data analysis, it remains challenging to capture the distinctive features of such data and to generate virtual datasets with similar statistical properties.

Results: Our study introduces a generative approach termed scRNA-seq Diffusion Transformer (scRDiT), which generates virtual scRNA-seq data by learning from a real dataset. The method is a neural network built on Denoising Diffusion Probabilistic Models (DDPMs) and Diffusion Transformers (DiTs). Gaussian noise is added to the real dataset through iterative noising steps, and the network learns to reverse this process, ultimately denoising Gaussian noise into scRNA-seq samples. This scheme allows the model to learn data features from actual scRNA-seq samples during training. Our experiments, conducted on two distinct scRNA-seq datasets, demonstrate superior performance. Additionally, sampling is accelerated by incorporating Denoising Diffusion Implicit Models (DDIM). scRDiT presents a unified methodology that lets users train neural network models on their own scRNA-seq datasets and generate large numbers of high-quality scRNA-seq samples.

Availability and implementation: https://github.com/DongShengze/scRDiT
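The noising and accelerated-sampling steps described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation (see the repository linked above for that). It assumes a linear beta schedule with 1000 training steps, and `eps_model` is a hypothetical stand-in for a trained noise-prediction network such as a DiT. The function names are likewise illustrative.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for an assumed linear schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def forward_noise(x0, t, alpha_bar, rng):
    """DDPM forward process: draw x_t ~ q(x_t | x_0) in closed form."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps

def ddim_step(xt, eps_pred, t, t_prev, alpha_bar):
    """One deterministic DDIM update (eta = 0) from step t to step t_prev."""
    a_t = alpha_bar[t]
    a_prev = alpha_bar[t_prev] if t_prev >= 0 else 1.0  # t_prev = -1 means clean data
    x0_pred = (xt - np.sqrt(1.0 - a_t) * eps_pred) / np.sqrt(a_t)
    return np.sqrt(a_prev) * x0_pred + np.sqrt(1.0 - a_prev) * eps_pred

def ddim_sample(eps_model, shape, alpha_bar, num_sample_steps, rng):
    """Generate samples using only a subsequence of the training timesteps."""
    last = len(alpha_bar) - 1
    timesteps = np.linspace(last, 0, num_sample_steps).round().astype(int)
    x = rng.standard_normal(shape)  # start from pure Gaussian noise
    for i, t in enumerate(timesteps):
        t_prev = timesteps[i + 1] if i + 1 < len(timesteps) else -1
        x = ddim_step(x, eps_model(x, t), t, t_prev, alpha_bar)
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    alpha_bar = make_alpha_bar()
    # Toy "expression matrix": 4 cells x 8 genes (placeholder values, not real data).
    x0 = rng.random((4, 8))
    xt, eps = forward_noise(x0, 500, alpha_bar, rng)
    # Placeholder predictor that always returns zeros; a trained network goes here.
    samples = ddim_sample(lambda x, t: np.zeros_like(x), (4, 8), alpha_bar, 20, rng)
    print(xt.shape, samples.shape)
```

In this sketch the speed-up comes from `num_sample_steps` being much smaller than the number of training steps: the sampler visits 20 timesteps instead of all 1000, so the network is evaluated 20 times per batch rather than 1000.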
