Gene Spatial Integration: enhancing spatial transcriptomics analysis via deep learning and batch effect mitigation.

Bioinformatics (Oxford, England). Pub Date: 2025-06-13. DOI: 10.1093/bioinformatics/btaf350

Rian Pratama, Jason Hilton, J Michael Cherry, Giltae Song



Abstract

Motivation: Spatial transcriptomics (ST) is a groundbreaking technique for studying the correlation between cellular organization within a tissue and its physiological and pathological properties. Every facet of spatial information, including cell/spot proximity, distribution, and dimensionality, is significant. Most ST analysis methods lean heavily on proximity, yielding useful insights but leaving the other aspects untapped. In addition, samples procured at different times, from different donors, and with different technologies introduce batch effects that hinder the statistical approaches employed by most analysis tools. To address these challenges, we have developed a deep learning method for analyzing multiple integrated ST datasets, focusing on the distribution aspect. Furthermore, our method aims to leverage single-cell analysis tools.

Results: Our study introduces Gene Spatial Integration (GSI), a data integration pipeline that uses a representation learning approach to extract the spatial distribution of genes into the same feature space as gene expression features. We employ an autoencoder network to extract spatial embeddings, facilitating the projection of spatial features into the gene expression feature space. Our approach allows for seamless integration of multiple samples with minimal detriment, increasing the performance of ST data analysis tools. We demonstrate our method on the human DLPFC dataset. Our method consistently improves the clustering performance of Seurat, with the largest increase observed in sample 151673, where the ARI score almost doubles from 0.225 to 0.405. We also combine our pipeline with GraphST clustering, achieving a significantly higher ARI score in sample 151672 (0.795 versus 0.614). These results reveal the potential of the spatial distribution of genes and emphasize the impact of integration and batch effect removal in developing a refined analysis for understanding tissue characteristics.
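The clustering results above are reported as ARI scores. Assuming ARI denotes the standard adjusted Rand index, the following is a minimal, dependency-free sketch of how such a score compares predicted cluster labels against reference annotations. The function name `adjusted_rand_index` and the toy labels are illustrative assumptions and are not taken from the GSI implementation.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same spots.

    Illustrative helper (hypothetical name), not part of GSI.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same spots")

    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        # Fewer than two spots: the partitions agree trivially.
        return 1.0

    # Contingency counts: spots sharing a (reference, predicted) label pair.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    # Number of spot pairs placed together in both, or in each, partition.
    pairs_both = sum(comb(c, 2) for c in cells.values())
    pairs_true = sum(comb(c, 2) for c in rows.values())
    pairs_pred = sum(comb(c, 2) for c in cols.values())

    expected = pairs_true * pairs_pred / total_pairs  # chance agreement
    maximum = (pairs_true + pairs_pred) / 2
    if maximum == expected:
        # Degenerate case, e.g. every spot in a single cluster in both.
        return 1.0
    return (pairs_both - expected) / (maximum - expected)


# Toy labels (made up for illustration): cluster names do not need to match,
# only the grouping of spots matters.
reference = ["L1", "L1", "L2", "L2"]
predicted = [1, 1, 0, 0]
score = adjusted_rand_index(reference, predicted)
```

A score of 1 indicates identical partitions, values near 0 indicate agreement no better than chance, and negative values indicate agreement worse than chance. On this scale, the reported change from 0.225 to 0.405 is a 1.8-fold increase.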

Availability: Implementation of GSI is accessible at https://github.com/Riandanis/Spatial_Integration_GSI.

Supplementary information: Supplementary data are available at Bioinformatics online.

