Unsupervised pattern identification in spatial gene expression atlas reveals mouse brain regions beyond established ontology.

IF 9.4 | Zone 1 (multidisciplinary journals) | JCR Q1, MULTIDISCIPLINARY SCIENCES
Robert Cahill, Yu Wang, R Patrick Xian, Alex J Lee, Hongkui Zeng, Bin Yu, Bosiljka Tasic, Reza Abbasi-Asl
Journal: Proceedings of the National Academy of Sciences of the United States of America
DOI: 10.1073/pnas.2319804121 (https://doi.org/10.1073/pnas.2319804121)
Published: 2024-09-10 (Epub 2024-09-03)
Publication type: Journal Article
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11406299/pdf/
Citations: 0

Abstract

The rapid growth of large-scale spatial gene expression data demands efficient and reliable computational tools to extract major trends of gene expression in their native spatial context. Here, we used stability-driven unsupervised learning (i.e., staNMF) to identify principal patterns (PPs) of 3D gene expression profiles and understand spatial gene distribution and anatomical localization at the whole mouse brain level. Our subsequent spatial correlation analysis systematically compared the PPs to known anatomical regions and ontology from the Allen Mouse Brain Atlas using spatial neighborhoods. We demonstrate that our stable and spatially coherent PPs, whose linear combinations accurately approximate the spatial gene data, are highly correlated with combinations of expert-annotated brain regions. These PPs yield a brain ontology based purely on spatial gene expression. Our PP identification approach outperforms principal component analysis and typical clustering algorithms on the same task. Moreover, we show that the stable PPs reveal marked regional imbalance of brainwide genetic architecture, leading to region-specific marker genes and gene coexpression networks. Our findings highlight the advantages of stability-driven machine learning for plausible biological discovery from dense spatial gene expression data, streamlining tasks that are infeasible by conventional manual approaches.
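The idea of stability-driven factorization mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' staNMF implementation: it assumes a plain multiplicative-update NMF, synthetic data in place of the atlas, and a simple best-match cosine dissimilarity between random restarts as the instability score. The function names (`nmf`, `rel_error`, `instability`) and the genes-by-voxels layout are hypothetical choices made for illustration.

```python
import numpy as np

def nmf(X, k, n_iter=200, seed=0, eps=1e-9):
    """Multiplicative-update NMF: X (genes x voxels) ~ W @ H with W, H >= 0.
    Rows of H play the role of spatial patterns (an illustrative convention)."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k)) + eps
    H = rng.random((k, X.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ (H @ H.T) + eps)
    return W, H

def rel_error(X, W, H):
    """Relative Frobenius reconstruction error of the factorization."""
    return float(np.linalg.norm(X - W @ H) / np.linalg.norm(X))

def instability(X, k, n_runs=5):
    """Mean pairwise dissimilarity of pattern sets across random restarts.
    Near 0 when restarts recover the same patterns up to reordering."""
    Hs = []
    for seed in range(n_runs):
        _, H = nmf(X, k, seed=seed)
        # Normalize each pattern so that dot products are cosine similarities.
        Hs.append(H / (np.linalg.norm(H, axis=1, keepdims=True) + 1e-12))
    scores = []
    for i in range(n_runs):
        for j in range(i + 1, n_runs):
            C = Hs[i] @ Hs[j].T  # k x k cosine similarities between runs
            best = 0.5 * (C.max(axis=0).mean() + C.max(axis=1).mean())
            scores.append(1.0 - best)
    return float(np.mean(scores))

# Synthetic stand-in for an expression matrix: 40 "genes", 60 "voxels",
# built from 3 non-overlapping spatial patterns plus small noise.
rng = np.random.default_rng(42)
H_true = np.zeros((3, 60))
for p in range(3):
    H_true[p, 20 * p:20 * (p + 1)] = 1.0
W_true = rng.random((40, 3))
X = W_true @ H_true + 0.01 * rng.random((40, 60))

for k in range(2, 6):
    W, H = nmf(X, k)
    print(k, round(rel_error(X, W, H), 3), round(instability(X, k), 3))
```

In a stability-driven workflow of this kind, one would typically prefer the number of patterns whose restarts agree most closely (lowest instability) rather than the one with the lowest reconstruction error alone. How the paper scores stability and selects the number of PPs should be taken from the paper itself.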

Source journal
CiteScore: 19.00
Self-citation rate: 0.90%
Articles published: 3575
Review time: 2.5 months
About the journal: The Proceedings of the National Academy of Sciences (PNAS), a peer-reviewed journal of the National Academy of Sciences (NAS), serves as an authoritative source for high-impact, original research across the biological, physical, and social sciences. With a global scope, the journal welcomes submissions from researchers worldwide, making it an inclusive platform for advancing scientific knowledge.