Title: Spatial histology and gene-expression representation and generative learning via online self-distillation contrastive learning.
Authors: Qianyi Yan, Xuan Li, Jiangnan Cui, Jianming Rong, Jingsong Zhang, Pingting Gao, Yaochen Xu, Fufang Qiu, Chunman Zuo
Journal: Briefings in Bioinformatics, 26(4)
Publication date: 2025-07-02
Publication type: Journal Article
DOI: https://doi.org/10.1093/bib/bbaf317
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12229093/pdf/
Citation count: 0
Abstract
Spatial transcriptomics quantifies spatial molecular profiles alongside histology, enabling computational prediction of spatial gene expression distribution directly from whole slide images. Inspired by image-to-text alignment and generation, we introduce Magic, a self-training contrastive learning model designed for histology-to-gene expression prediction. Magic (i) employs contrastive learning to derive shared embeddings for histology and gene expression while utilizing a momentum-based module to generate pseudo-targets to reduce the impact of noise; and (ii) leverages a transformer-based decoder to predict the expression of 300 genes based on histological features. Trained on 75 760 spots from 56 breast cancer slices and validated on 11 026 spots from five independent slices, Magic outperforms existing methods in aligning and generating histology-gene expression data, achieving a 10% improvement over the second-best approach. Furthermore, Magic demonstrates robust generalization, effectively predicting gene expression in colorectal cancer samples and The Cancer Genome Atlas (TCGA) datasets through zero-shot learning. Notably, Magic's predicted gene expression captures interpatient differences, highlighting its strong potential for clinical applications.
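The abstract describes contrastive alignment of histology and gene-expression embeddings, with a momentum-based module supplying pseudo-targets to dampen noise. The sketch below illustrates one common way such a scheme is built, following the momentum-distillation pattern used in image-text models. It is not the authors' implementation: the function names, the temperature, the mixing weight `alpha`, and the EMA `decay` are all assumptions chosen for illustration, and the abstract does not specify the actual loss.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale each embedding to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def log_softmax(logits, axis=-1):
    """Numerically stable log-softmax."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def softmax(logits, axis=-1):
    return np.exp(log_softmax(logits, axis=axis))

def ema_update(momentum_params, online_params, decay=0.995):
    """Move momentum-encoder weights slowly toward the online weights.
    `decay` is an assumed value, not taken from the paper."""
    return {k: decay * momentum_params[k] + (1.0 - decay) * online_params[k]
            for k in online_params}

def distilled_contrastive_loss(img, gene, img_m, gene_m,
                               temperature=0.07, alpha=0.4):
    """Symmetric InfoNCE-style loss with momentum pseudo-targets.

    img, gene     : (n, d) embeddings from the online encoders
    img_m, gene_m : (n, d) embeddings from the momentum encoders
    alpha         : assumed weight of the soft pseudo-targets
    """
    img, gene = l2_normalize(img), l2_normalize(gene)
    img_m, gene_m = l2_normalize(img_m), l2_normalize(gene_m)

    hard = np.eye(img.shape[0])  # one-hot targets: spot i matches spot i

    # Soft pseudo-targets come from the momentum encoders' similarities.
    soft_i2g = softmax(img_m @ gene_m.T / temperature)
    soft_g2i = softmax(gene_m @ img_m.T / temperature)
    tgt_i2g = alpha * soft_i2g + (1.0 - alpha) * hard
    tgt_g2i = alpha * soft_g2i + (1.0 - alpha) * hard

    # Cross-entropy between the mixed targets and the online predictions.
    logp_i2g = log_softmax(img @ gene.T / temperature)
    logp_g2i = log_softmax(gene @ img.T / temperature)
    loss_i2g = -(tgt_i2g * logp_i2g).sum(axis=1).mean()
    loss_g2i = -(tgt_g2i * logp_g2i).sum(axis=1).mean()
    return 0.5 * (loss_i2g + loss_g2i)
```

In a training loop of this kind, `ema_update` would typically be called once per step after the online encoders are updated, so that the pseudo-targets change more slowly than the online predictions.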
About the journal:
Briefings in Bioinformatics is an international journal serving as a platform for researchers and educators in the life sciences. It also appeals to mathematicians, statisticians, and computer scientists applying their expertise to biological challenges. The journal focuses on reviews tailored for users of databases and analytical tools in contemporary genetics, molecular and systems biology. It stands out by offering practical assistance and guidance to non-specialists in computerized methodologies. Covering a wide range from introductory concepts to specific protocols and analyses, the papers address bacterial, plant, fungal, animal, and human data.
The journal's detailed subject areas include genetic studies of phenotypes and genotypes, mapping, DNA sequencing, expression profiling, gene expression studies, microarrays, alignment methods, protein profiles and HMMs, lipids, metabolic and signaling pathways, structure determination and function prediction, phylogenetic studies, and education and training.