GatorST: A Versatile Contrastive Meta-Learning Framework for Spatial Transcriptomic Data Analysis.

bioRxiv: the preprint server for biology. Pub Date: 2025-07-19. DOI: 10.1101/2025.07.01.662625

Song Wang, Yuxi Liu, Zhenhao Zhang, Qin Ma, Qianqian Song, Jiang Bian


Citations: 0



Introduction: Recent advances in spatial transcriptomics (ST) technologies have revolutionized our understanding of cellular functions by providing gene expression profiles with rich spatial context. Effectively learning spatial representations is crucial for downstream analyses and requires robust integration of spatial information with transcriptomic data. While existing methods have shown promise, they often fail to adequately capture both local (neighbor-level) and global (tissue-wide) spatial contexts. Moreover, they tend to rely heavily on augmentation strategies, which can introduce noise and instability.

Objectives: This study aims to introduce and demonstrate a novel, versatile framework called GatorST, which explicitly combines graph-based modeling with advanced learning strategies to generate spatially informed representations of ST data. GatorST is designed to improve various downstream tasks, including identification of spatial domains, gene expression imputation, batch effect removal, and trajectory inference.

Methods: GatorST constructs a spot-spot graph by connecting each node to its k nearest spatial neighbors and extracts two-hop neighborhood subgraphs to capture local context. At the global level, gene expression profiles are clustered using soft K-means to generate pseudo-labels, which serve as weak supervision signals within a contrastive learning framework. This process encourages the alignment of embeddings with shared pseudo-labels while separating those with different labels. GatorST further adopts an episodic training strategy inspired by meta-learning, wherein each episode consists of a support set for contrastive optimization and a disjoint query set for embedding classification, guided by the pseudo-labeled data. This design enables the model to classify unseen samples based on learned embeddings, thereby enhancing its generalization to new spatial contexts.
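The Methods paragraph chains five steps: a k-nearest-neighbor spot graph, two-hop neighborhood subgraphs, soft K-means pseudo-labels, a pseudo-label-guided contrastive loss, and episodes split into a support set and a disjoint query set. The NumPy sketch below walks through one pass of those steps on synthetic data. It is not the authors' implementation. Several choices are assumptions: symmetrizing the kNN edges, taking the argmax of the soft K-means responsibilities as the hard pseudo-label, using a supervised-contrastive (SupCon-style) loss, and classifying queries by nearest class prototype. The mean-pooled two-hop expression is a hypothetical stand-in for the learned graph encoder. All function names and the values of `k`, `beta`, `tau` and `n_support` are hypothetical. Nothing is trained here; the code only computes one episode's support loss and query accuracy.

```python
import numpy as np


def knn_graph(coords, k):
    """Spot-spot graph linking each spot to its k nearest spatial neighbors.

    Assumption: edges are symmetrized so the graph is undirected.
    """
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a spot is never its own neighbor
    nbrs = np.argsort(d, axis=1)[:, :k]
    adj = [set(row.tolist()) for row in nbrs]
    for i, row in enumerate(nbrs):
        for j in row:
            adj[j].add(i)  # make edge i-j visible from both ends
    return adj


def two_hop_nodes(adj, center):
    """Nodes of the two-hop neighborhood subgraph around `center`."""
    nodes = {center} | adj[center]
    for v in adj[center]:
        nodes |= adj[v]
    return sorted(nodes)


def soft_kmeans(x, n_clusters, beta=5.0, n_iter=50, seed=0):
    """Soft K-means on expression profiles.

    Returns hard pseudo-labels (assumed: argmax of responsibilities) and the
    soft responsibilities r_ic proportional to exp(-beta * ||x_i - mu_c||^2).
    """
    rng = np.random.default_rng(seed)
    mu = x[rng.choice(len(x), n_clusters, replace=False)]
    for _ in range(n_iter):
        d2 = ((x[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
        logits = -beta * d2
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        r = np.exp(logits)
        r /= r.sum(axis=1, keepdims=True)  # each spot's responsibilities sum to 1
        mu = (r.T @ x) / (r.sum(axis=0)[:, None] + 1e-12)  # weighted centroids
    return r.argmax(axis=1), r


def supcon_loss(z, labels, tau=0.5):
    """Supervised-contrastive loss: spots sharing a pseudo-label are positives."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    self_mask = np.eye(len(z), dtype=bool)
    sim = np.where(self_mask, -np.inf, z @ z.T / tau)  # drop self-similarity
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    pos = (labels[:, None] == labels[None, :]) & ~self_mask
    has_pos = pos.any(axis=1)  # anchors without any positive are skipped
    per_anchor = np.where(pos, log_prob, 0.0).sum(axis=1)[has_pos] / pos.sum(axis=1)[has_pos]
    return float(-per_anchor.mean())


def neighborhood_embedding(x, adj):
    """Hypothetical encoder stand-in: mean expression over each two-hop subgraph."""
    return np.stack([x[two_hop_nodes(adj, i)].mean(axis=0) for i in range(len(x))])


def run_episode(z, labels, n_support, rng):
    """Contrastive loss on a support set; nearest-prototype labels for a disjoint query set."""
    idx = rng.permutation(len(z))
    support, query = idx[:n_support], idx[n_support:]
    classes = np.unique(labels[support])
    protos = np.stack([z[support][labels[support] == c].mean(axis=0) for c in classes])
    d2 = ((z[query][:, None, :] - protos[None, :, :]) ** 2).sum(-1)
    pred = classes[d2.argmin(axis=1)]
    return supcon_loss(z[support], labels[support]), float((pred == labels[query]).mean())


# Synthetic tissue: two spatial domains with shifted expression.
rng = np.random.default_rng(0)
n_spots, n_genes = 200, 20
coords = rng.uniform(size=(n_spots, 2))
domain = (coords[:, 0] > 0.5).astype(int)
x = rng.normal(size=(n_spots, n_genes)) + 3.0 * domain[:, None]

adj = knn_graph(coords, k=6)
pseudo, resp = soft_kmeans(x, n_clusters=2)
z = neighborhood_embedding(x, adj)
loss, acc = run_episode(z, pseudo, n_support=120, rng=rng)
print(f"support loss = {loss:.3f}, query accuracy = {acc:.2f}")
```

To turn this into real training, the mean-pooling stand-in would be replaced by a trainable graph encoder, and the support loss would be minimized across many sampled episodes. The sketch stops at computing the quantities the abstract names, so each step can be inspected on its own.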

Results: Comprehensive comparisons with fifteen state-of-the-art methods across fourteen spatial transcriptomics datasets demonstrate that GatorST consistently achieves superior performance in identifying spatial domains, imputing gene expression, and removing batch effects. The results showcase the versatility and strong generalization capabilities of GatorST across diverse tissue types and experimental settings.

Conclusion: GatorST effectively integrates spatial topology and global gene expression through graph-based modeling, pseudo-labeling, and contrastive meta-learning. This framework generates biologically meaningful representations and significantly improves key downstream tasks, including spatial domain identification, gene expression imputation, batch effect removal, and trajectory inference.

