scRDiT: Generating single-cell RNA-seq data by diffusion transformers and accelerating sampling
Shengze Dong, Zhuorui Cui, Ding Liu, Jinzhi Lei
arXiv:2404.06153, arXiv - QuanBio - Genomics, published 2024-04-09
Abstract
Motivation: Single-cell RNA sequencing (scRNA-seq) is a groundbreaking
technology extensively utilized in biological research, facilitating the
examination of gene expression at the individual cell level within a given
tissue sample. While numerous tools have been developed for scRNA-seq data
analysis, it remains challenging to capture the distinct features of such
data and to generate virtual datasets that share analogous statistical
properties. Results: Our study introduces a generative approach termed
scRNA-seq Diffusion Transformer (scRDiT). This method generates virtual
scRNA-seq data by learning from a real dataset. It is a neural network
built on Denoising Diffusion Probabilistic Models (DDPMs) and
Diffusion Transformers (DiTs). Gaussian noise is added to the real dataset
through iterative noise-adding steps, and the model learns to reverse this
process, denoising the noise step by step into scRNA-seq samples. This scheme
allows the model to learn data features from actual scRNA-seq samples during
training. Our experiments, conducted
on two distinct scRNA-seq datasets, demonstrate superior performance.
Additionally, sampling is accelerated by incorporating
Denoising Diffusion Implicit Models (DDIM). scRDiT presents a unified
methodology that lets users train neural network models on their own
scRNA-seq datasets and generate large numbers of high-quality scRNA-seq
samples. Availability and implementation: https://github.com/DongShengze/scRDiT
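
The noise-adding and accelerated-sampling steps mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the scRDiT implementation. It assumes a linear variance schedule (the abstract does not specify one), and the names `make_alpha_bar`, `forward_noise`, `ddim_sample`, and the `eps_model` callable standing in for the trained diffusion transformer are hypothetical. The sampler uses the standard deterministic DDIM update (eta = 0) over a strided subset of timesteps.

```python
import numpy as np


def make_alpha_bar(num_steps, beta_start=1e-4, beta_end=2e-2):
    """Cumulative product of (1 - beta_t) for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def forward_noise(x0, t, alpha_bar, rng):
    """DDPM forward process in closed form: draw x_t from q(x_t | x_0).

    x0 is an array of expression profiles (cells x genes); t is a timestep index.
    Returns the noised sample and the Gaussian noise that was added.
    """
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x_t, eps


def ddim_sample(eps_model, shape, alpha_bar, num_sample_steps, rng):
    """Deterministic DDIM sampling (eta = 0) on a strided timestep subset.

    eps_model(x, t) is a hypothetical stand-in for the trained noise predictor.
    Assumes num_sample_steps <= len(alpha_bar) so the selected timesteps are distinct.
    """
    total_steps = len(alpha_bar)
    # Evenly spaced timesteps from T-1 down to 0; fewer steps means faster sampling.
    timesteps = np.linspace(total_steps - 1, 0, num_sample_steps).round().astype(int)
    x = rng.standard_normal(shape)  # start from pure Gaussian noise
    for i, t in enumerate(timesteps):
        eps = eps_model(x, t)
        # Estimate the clean sample implied by the current noise prediction.
        x0_pred = (x - np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alpha_bar[t])
        # Jump directly to the next selected timestep (or to clean data at the end).
        ab_prev = alpha_bar[timesteps[i + 1]] if i + 1 < len(timesteps) else 1.0
        x = np.sqrt(ab_prev) * x0_pred + np.sqrt(1.0 - ab_prev) * eps
    return x
```

With a schedule of 1000 training steps, calling `ddim_sample` with `num_sample_steps=50` evaluates the noise predictor 50 times instead of 1000, which is the source of the speed-up. The step counts here are illustrative values, not settings reported by the authors.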