DiSCO: deconvoluting spatial transcriptomics via combinatorial optimization with a foundational diffusion model.

IF 7.7 | Zone 2 (Biology) | JCR Q1: Biochemical Research Methods

Briefings in bioinformatics Pub Date : 2026-05-03 DOI:10.1093/bib/bbag207

Jing Liu, Yahao Wu, Limin Li

Volume 27, Issue 3 | Publication type: Journal Article | Open access: No

Citations: 0

Abstract

Deciphering the cellular composition of spatial spots in spatial transcriptomics (ST) data is fundamental for elucidating the heterogeneity of tissue spatial structures. However, existing models often require retraining for each new deconvolution task, which limits both their generalization performance and their computational efficiency. To address this problem, we design DiSCO, a foundational diffusion model for deconvoluting spatial transcriptomics based on combinatorial optimization. DiSCO formulates the deconvolution of ST data as a task-specific deconvolutional combinatorial optimization (CO) problem, wherein single cells (SCs) are assigned to spatial spots to optimally preserve the gene expression profiles of each spot. DiSCO introduces a bipartite graph diffusion model as an optimization solver, specifically designed to be generalizable to any new deconvolutional CO problem. Pretrained on a large number of deconvolution tasks using gene expression profiles of both SCs and spatial spots as inputs, DiSCO learns the distribution of true solutions and generates approximate solutions through sampling, thereby enabling the determination of the cellular composition for each spot. As a generalizable deconvolution solver, DiSCO is evaluated on both simulated and real datasets. The experiments demonstrate that the pretrained DiSCO model performs effectively and efficiently on datasets with varying resolutions and different numbers of genes, highlighting its capacity to generalize to diverse datasets.
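The assignment formulation in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration, not the authors' method: the squared-error objective, the rule that a cell may be reused across spots, the fixed `cells_per_spot`, and the function names are all hypothetical. In particular, the greedy solver is a naive stand-in and is not DiSCO's bipartite graph diffusion model.

```python
import numpy as np


def reconstruction_error(assign, sc_expr, spot_expr):
    """Squared error between observed spot profiles and the summed
    profiles of the cells assigned to each spot (assumed objective)."""
    return float(np.sum((spot_expr - assign @ sc_expr) ** 2))


def greedy_assign(sc_expr, spot_expr, cells_per_spot):
    """Naive greedy baseline, NOT the diffusion solver from the paper.

    For each spot, repeatedly pick the single cell whose profile is
    closest to the part of the spot profile not yet explained.
    Returns an (n_spots, n_cells) matrix of assignment counts.
    """
    n_spots, n_cells = spot_expr.shape[0], sc_expr.shape[0]
    assign = np.zeros((n_spots, n_cells), dtype=int)
    for s in range(n_spots):
        residual = spot_expr[s].astype(float).copy()
        for _ in range(cells_per_spot):
            # Distance from the residual to every candidate cell profile.
            errs = np.sum((residual - sc_expr) ** 2, axis=1)
            c = int(np.argmin(errs))
            assign[s, c] += 1
            residual -= sc_expr[c]
    return assign


def composition(assign, cell_types, n_types):
    """Per-spot cell-type proportions implied by an assignment matrix."""
    counts = np.zeros((assign.shape[0], n_types))
    for t in range(n_types):
        counts[:, t] = assign[:, cell_types == t].sum(axis=1)
    return counts / counts.sum(axis=1, keepdims=True)


# Toy data: 4 single cells x 4 genes, 2 spots, 2 cell types.
sc_expr = np.array([[10, 0, 0, 0],
                    [0, 10, 0, 0],
                    [0, 0, 10, 0],
                    [0, 0, 0, 10]], dtype=float)
cell_types = np.array([0, 0, 1, 1])
spot_expr = np.array([[10, 0, 10, 0],
                      [0, 0, 10, 10]], dtype=float)

assign = greedy_assign(sc_expr, spot_expr, cells_per_spot=2)
print(assign)
print(reconstruction_error(assign, sc_expr, spot_expr))
print(composition(assign, cell_types, n_types=2))
```

The sketch only shows how a cell-to-spot assignment yields both a reconstruction error and a per-spot cell-type composition. According to the abstract, DiSCO replaces a per-task solver of this kind with a pretrained model that samples approximate solutions.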

Source journal

Briefings in Bioinformatics (Biology: Biochemical Research Methods)

CiteScore

13.20

Self-citation rate

13.70%

Articles published

549

Review time

6 months

About the journal: Briefings in Bioinformatics is an international journal serving as a platform for researchers and educators in the life sciences. It also appeals to mathematicians, statisticians, and computer scientists applying their expertise to biological challenges. The journal focuses on reviews tailored for users of databases and analytical tools in contemporary genetics, molecular and systems biology. It stands out by offering practical assistance and guidance to non-specialists in computerized methodologies. Covering a wide range from introductory concepts to specific protocols and analyses, the papers address bacterial, plant, fungal, animal, and human data. The journal's detailed subject areas include genetic studies of phenotypes and genotypes, mapping, DNA sequencing, expression profiling, gene expression studies, microarrays, alignment methods, protein profiles and HMMs, lipids, metabolic and signaling pathways, structure determination and function prediction, phylogenetic studies, and education and training.