Bridging imaging and genomics: Domain knowledge guided spatial transcriptomics analysis
Wei Zhang, Xinci Liu, Tong Chen, Wenxin Xu, Collin Sakal, Ximing Nie, Long Wang, Xinyue Li
Information Fusion, Volume 127, Article 103746 (published 2025-09-16). DOI: 10.1016/j.inffus.2025.103746
Abstract
Spatial Transcriptomics (ST) provides spatially resolved gene expression distributions mapped onto high-resolution Whole Slide Images (WSIs), revealing the association between cellular morphology and gene expression profiles. However, the high costs and equipment constraints associated with ST data collection have led to a scarcity of ST datasets. Moreover, existing ST datasets often exhibit sparse gene expression distributions, which limit the accuracy and generalizability of gene expression prediction models derived from WSIs. To address these challenges, we propose DomainST (Domain knowledge-guided Spatial Transcriptomics analysis), a novel framework that leverages domain knowledge through Large Language Models (LLMs) to extract effective gene representations and uses foundation models to obtain robust image features for enhanced spatial gene expression prediction. Specifically, we query public gene reference databases to retrieve comprehensive gene summaries and employ LLMs to refine gene descriptions and generate informative gene embeddings. Concurrently, we apply medical vision-language foundation models to distill robust image representations at multiple scales, capturing the spatial context of WSIs. We further design a multimodal mixture-of-experts fusion module to integrate the modalities effectively, leveraging their complementary information. Extensive experiments on three public ST datasets show that our method consistently outperforms state-of-the-art (SOTA) methods, improving PCC@50 by 6.7% to 13.7% across all datasets, demonstrating the effectiveness of combining foundation models and LLM-derived domain knowledge for gene expression prediction. Our code and gene features are available at https://github.com/coffeeNtv/DomainST.
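The abstract does not detail the fusion module's internals, but the general pattern of a multimodal mixture-of-experts can be sketched. Below is a minimal PyTorch illustration in which a gating network softly weights several expert MLPs over concatenated image and gene-text embeddings before a regression head predicts per-spot expression. All dimensions, the number of experts, and the soft-gating scheme are illustrative assumptions, not DomainST's actual architecture.

# Hypothetical sketch of a multimodal mixture-of-experts (MoE) fusion module.
# Dimensions, expert count, and gating scheme are illustrative assumptions.
import torch
import torch.nn as nn


class MoEFusion(nn.Module):
    def __init__(self, img_dim=512, gene_dim=512, hidden=256, n_experts=4, n_genes=250):
        super().__init__()
        fused = img_dim + gene_dim
        # Each expert is a small MLP over the concatenated modalities.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(fused, hidden), nn.ReLU(), nn.Linear(hidden, hidden))
            for _ in range(n_experts)
        )
        # The gating network produces per-example mixture weights over experts.
        self.gate = nn.Linear(fused, n_experts)
        # A regression head maps the fused representation to per-gene expression.
        self.head = nn.Linear(hidden, n_genes)

    def forward(self, img_feat, gene_feat):
        x = torch.cat([img_feat, gene_feat], dim=-1)                    # (B, fused)
        weights = torch.softmax(self.gate(x), dim=-1)                   # (B, E)
        expert_out = torch.stack([e(x) for e in self.experts], dim=1)   # (B, E, hidden)
        fused = (weights.unsqueeze(-1) * expert_out).sum(dim=1)         # (B, hidden)
        return self.head(fused)                                         # (B, n_genes)


# Usage with random tensors standing in for foundation-model features.
model = MoEFusion()
pred = model(torch.randn(8, 512), torch.randn(8, 512))
print(pred.shape)  # torch.Size([8, 250])

Soft gating (a weighted sum of all experts) keeps the module differentiable end to end; a sparse top-k gate is a common alternative when the expert count is large.

For the reported metric, PCC@50 in this literature commonly denotes the mean Pearson correlation coefficient over the 50 genes whose predictions correlate best with ground truth across spots; the exact selection rule used by DomainST may differ. A minimal NumPy sketch under that assumption:

# Hedged sketch of PCC@50: mean Pearson correlation over the k=50
# best-correlated genes. The paper's precise definition may differ.
import numpy as np


def pcc_at_k(pred, truth, k=50):
    """pred, truth: (n_spots, n_genes) arrays of expression values."""
    # Per-gene Pearson correlation computed across spots.
    pred_c = pred - pred.mean(axis=0)
    truth_c = truth - truth.mean(axis=0)
    denom = np.sqrt((pred_c ** 2).sum(axis=0) * (truth_c ** 2).sum(axis=0))
    pcc = (pred_c * truth_c).sum(axis=0) / np.maximum(denom, 1e-8)
    # Average the k highest per-gene correlations.
    return np.sort(pcc)[-k:].mean()


rng = np.random.default_rng(0)
truth = rng.normal(size=(100, 200))
pred = truth + rng.normal(scale=0.5, size=truth.shape)
print(round(pcc_at_k(pred, truth), 3))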
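Ranking genes by achieved correlation (rather than, say, mean expression) rewards methods that predict at least a subset of genes very well, which is why PCC@50 is typically reported alongside correlations over all genes.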
Journal description:
Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among the diverse disciplines that drive its progress. It is the leading outlet for research and development in this field, focusing on architectures, algorithms, and applications. Papers presenting fundamental theoretical analyses, as well as those demonstrating applications to real-world problems, are welcome.