scRDiT: Generating single-cell RNA-seq data by diffusion transformers and accelerating sampling

Shengze Dong, Zhuorui Cui, Ding Liu, Jinzhi Lei
arXiv - QuanBio - Genomics. Published 2024-04-09. DOI: arxiv-2404.06153 (https://doi.org/arxiv-2404.06153)

Abstract

Motivation: Single-cell RNA sequencing (scRNA-seq) is a groundbreaking technology widely used in biological research to examine gene expression at the level of individual cells within a tissue sample. Although numerous tools have been developed for scRNA-seq data analysis, it remains challenging to capture the distinctive features of such data and to generate virtual datasets with similar statistical properties.

Results: We introduce a generative approach termed scRNA-seq Diffusion Transformer (scRDiT), which generates virtual scRNA-seq data from a real dataset. The method is a neural network built on Denoising Diffusion Probabilistic Models (DDPMs) and Diffusion Transformers (DiTs). Gaussian noise is added to the real data through iterative noising steps, and the network learns to reverse this process, ultimately restoring noise into scRNA-seq samples. This scheme lets the model learn data features from actual scRNA-seq samples during training. Experiments on two distinct scRNA-seq datasets demonstrate superior performance. In addition, sampling is accelerated by incorporating Denoising Diffusion Implicit Models (DDIM). scRDiT provides a unified methodology that lets users train neural network models on their own scRNA-seq datasets and generate large numbers of high-quality scRNA-seq samples.

Availability and implementation: https://github.com/DongShengze/scRDiT
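The noising and accelerated-sampling steps described in the abstract can be illustrated with a minimal NumPy sketch. This is not the scRDiT implementation (the repository above is the reference for that). The linear beta schedule, its default values, all function names, and the `eps_model` callable standing in for the trained Diffusion Transformer are assumptions made purely for illustration.

```python
import numpy as np


def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def forward_noise(x0, t, alpha_bar, rng):
    """Closed-form DDPM forward process:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x_t, eps


def ddim_step(x_t, eps_pred, t, t_prev, alpha_bar):
    """Deterministic DDIM update (eta = 0) from step t to an earlier step t_prev.
    t_prev = -1 denotes the final, noise-free sample."""
    ab_t = alpha_bar[t]
    ab_prev = alpha_bar[t_prev] if t_prev >= 0 else 1.0
    # Estimate the clean sample implied by the predicted noise.
    x0_pred = (x_t - np.sqrt(1.0 - ab_t) * eps_pred) / np.sqrt(ab_t)
    # Re-noise that estimate to the (lower) noise level of t_prev.
    return np.sqrt(ab_prev) * x0_pred + np.sqrt(1.0 - ab_prev) * eps_pred


def ddim_sample(eps_model, shape, alpha_bar, num_sample_steps, rng):
    """Sample by visiting only an evenly spaced subset of the training steps.
    eps_model(x, t) is a hypothetical stand-in for a trained noise predictor."""
    steps = np.linspace(len(alpha_bar) - 1, 0, num_sample_steps).round().astype(int)
    x = rng.standard_normal(shape)  # start from pure Gaussian noise
    for i, t in enumerate(steps):
        t_prev = steps[i + 1] if i + 1 < len(steps) else -1
        x = ddim_step(x, eps_model(x, t), t, t_prev, alpha_bar)
    return x
```

The speed-up comes from `num_sample_steps` being much smaller than the number of training steps: the noise predictor is called once per visited step, so sampling over, say, 50 of 1000 steps needs 50 model evaluations instead of 1000.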