Rian Pratama, Jason Hilton, J Michael Cherry, Giltae Song
Title: Gene Spatial Integration: enhancing spatial transcriptomics analysis via deep learning and batch effect mitigation
Journal: Bioinformatics (Oxford, England)
Publication date: 2025-06-13
DOI: 10.1093/bioinformatics/btaf350
Abstract
Motivation: Spatial transcriptomics (ST) is a groundbreaking technique for studying the relationship between how cells are organized within a tissue and the tissue's physiological and pathological properties. Every facet of spatial information, including cell/spot proximity, distribution, and dimensionality, is significant. Most ST analysis methods rely heavily on proximity; they yield useful insights but leave the other aspects untapped. In addition, samples collected at different times, from different donors, and with different technologies introduce batch effects that hinder the statistical approaches employed by most analysis tools. To address these challenges, we developed a deep learning method for the integrated analysis of multiple ST datasets, focusing on the distribution aspect. Our method is also designed to leverage existing single-cell analysis tools.
Results: We introduce Gene Spatial Integration (GSI), a data integration pipeline that uses representation learning to map the spatial distribution of genes into the same feature space as the gene expression features. An autoencoder network extracts spatial embeddings, allowing spatial features to be projected into the gene expression feature space. Our approach integrates multiple samples seamlessly with minimal detriment and improves the performance of downstream ST analysis tools. We demonstrate the method on the human dorsolateral prefrontal cortex (DLPFC) dataset. GSI consistently improves the clustering performance of Seurat, with the largest gain in sample 151673, where the adjusted Rand index (ARI) nearly doubles from 0.225 to 0.405. Combining our pipeline with GraphST clustering likewise raises the ARI in sample 151672 from 0.614 to 0.795. These results reveal the potential of the spatial distribution of genes as a source of information and underscore the importance of integration and batch effect removal for refining analyses of tissue characteristics.
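The abstract describes GSI only at a high level. As a rough illustration of the general idea, the following sketch learns per-gene embeddings of spatial distribution with an autoencoder. It is not the GSI implementation, which is in the linked repository. Every specific choice here is an assumption made for illustration. That includes binning each gene's expression onto an 8 x 8 grid, the tanh-encoder/linear-decoder architecture, the hyperparameters, and the hypothetical `spot_spatial_features` step that projects gene embeddings back onto spots. Only NumPy is used, so the example stays self-contained.

```python
import numpy as np

# Hypothetical illustration only: NOT the GSI implementation. The grid binning,
# architecture, hyperparameters, and spot projection are all assumptions.


def gene_spatial_maps(expr, coords, grid=8):
    """Rasterize each gene's expression over spots into a grid x grid map.

    expr:   (n_spots, n_genes) non-negative expression matrix
    coords: (n_spots, 2) spot coordinates
    Returns (n_genes, grid * grid): each row is the gene's spatial density
    relative to a uniform map (row mean 1), or all zeros if never expressed.
    """
    lo, hi = coords.min(axis=0), coords.max(axis=0)
    # Assign each spot to a grid cell along each axis, clipped to [0, grid - 1].
    cells = np.floor((coords - lo) / (hi - lo + 1e-12) * grid).astype(int)
    cells = np.clip(cells, 0, grid - 1)
    flat = cells[:, 0] * grid + cells[:, 1]
    # Sum the expression of all spots that fall in the same cell.
    per_cell = np.zeros((grid * grid, expr.shape[1]))
    np.add.at(per_cell, flat, expr)
    maps = per_cell.T
    totals = maps.sum(axis=1, keepdims=True)
    # Normalize to a distribution, then rescale so a uniform map is all ones.
    dist = np.divide(maps, totals, out=np.zeros_like(maps), where=totals > 0)
    return dist * grid * grid


def train_autoencoder(X, k=4, epochs=3000, lr=0.05, seed=0):
    """One-hidden-layer autoencoder (tanh encoder, linear decoder) trained by
    full-batch gradient descent on mean squared reconstruction error.

    Returns (encode, losses): encode maps rows of X to k-dim embeddings;
    losses[i] is the reconstruction error measured before update i.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1, b1 = rng.normal(0.0, 0.1, (d, k)), np.zeros(k)
    W2, b2 = rng.normal(0.0, 0.1, (k, d)), np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)          # encoder output: embeddings
        R = H @ W2 + b2                   # decoder output: reconstruction
        losses.append(float(np.mean((R - X) ** 2)))
        G = 2.0 * (R - X) / X.size        # gradient of the loss w.r.t. R
        dH = (G @ W2.T) * (1.0 - H ** 2)  # backpropagate through tanh
        W2 -= lr * (H.T @ G)
        b2 -= lr * G.sum(axis=0)
        W1 -= lr * (X.T @ dH)
        b1 -= lr * dH.sum(axis=0)

    def encode(Z):
        return np.tanh(Z @ W1 + b1)

    return encode, losses


def spot_spatial_features(expr, gene_emb):
    """Hypothetical projection: average the gene embeddings in each spot,
    weighted by library-size-normalized expression (n_spots x k)."""
    libsize = expr.sum(axis=1, keepdims=True)
    norm = np.divide(expr, libsize, out=np.zeros(expr.shape), where=libsize > 0)
    return norm @ gene_emb


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    coords = rng.uniform(0, 100, (300, 2))         # synthetic spot positions
    expr = rng.poisson(1.0, (300, 40))             # synthetic counts, 40 genes
    maps = gene_spatial_maps(expr, coords)         # (40, 64)
    encode, losses = train_autoencoder(maps)
    gene_emb = encode(maps)                        # (40, 4) gene embeddings
    feats = spot_spatial_features(expr, gene_emb)  # (300, 4) spot features
    print(gene_emb.shape, feats.shape, losses[0] > losses[-1])
```

A real pipeline would use a deeper network and per-sample normalization. This toy version only shows how a spatial map per gene can be compressed into a vector that lives alongside expression features.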
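The ARI scores quoted above measure agreement between two partitions of the same spots, such as predicted clusters and reference labels, corrected for chance. A value of 1 means identical partitions, and values near 0 mean chance-level agreement. The following is a minimal standard-library sketch of the Hubert and Arabie formula, built from the contingency table of the two labelings. The function name `adjusted_rand_index` is our own. scikit-learn's `adjusted_rand_score` computes the same quantity, but the abstract does not say which implementation the authors used.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: treat as perfect agreement
    # Contingency table n_ij and its marginals a_i (rows) and b_j (columns).
    n_ij = Counter(zip(labels_true, labels_pred))
    a_i = Counter(labels_true)
    b_j = Counter(labels_pred)
    # Pairs of items grouped together in both labelings, and in each one.
    index = sum(comb(c, 2) for c in n_ij.values())
    sum_a = sum(comb(c, 2) for c in a_i.values())
    sum_b = sum(comb(c, 2) for c in b_j.values())
    expected = sum_a * sum_b / total_pairs
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # both labelings trivial and identical (one cluster, or all singletons)
    return (index - expected) / (max_index - expected)


if __name__ == "__main__":
    print(adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2]))
```

For example, `adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2])` evaluates to 4/7 (about 0.571), because splitting one true cluster in two is penalized. Relabeling clusters, as in `[0, 0, 1, 1]` versus `[1, 1, 0, 0]`, still gives 1.0.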
Availability: The GSI implementation is available at https://github.com/Riandanis/Spatial_Integration_GSI.
Supplementary information: Supplementary data are available at Bioinformatics online.