PCA-UMAP Manifold 上的扩散:数据结构保存对高维单细胞 RNA 测序数据去噪的影响

Biology Pub Date : 2024-07-09 DOI:10.3390/biology13070512
Padron-Manrique Cristian, Vázquez-Jiménez Aarón, Esquivel-Hernandez Diego Armando, Martínez-López Yoscelina Estrella, Neri-Rosario Daniel, Girón-Villalobos David, Mixcoha Edgar, Sánchez-Castañeda Jean Paul, Resendis-Antonio Osbaldo
{"title":"PCA-UMAP Manifold 上的扩散:数据结构保存对高维单细胞 RNA 测序数据去噪的影响","authors":"Padron-Manrique Cristian, Vázquez-Jiménez Aarón, Esquivel-Hernandez Diego Armando, Martínez-López Yoscelina Estrella, Neri-Rosario Daniel, Girón-Villalobos David, Mixcoha Edgar, Sánchez-Castañeda Jean Paul, Resendis-Antonio Osbaldo","doi":"10.3390/biology13070512","DOIUrl":null,"url":null,"abstract":"Single-cell transcriptomics (scRNA-seq) is revolutionizing biological research, yet it faces challenges such as inefficient transcript capture and noise. To address these challenges, methods like neighbor averaging or graph diffusion are used. These methods often rely on k-nearest neighbor graphs from low-dimensional manifolds. However, scRNA-seq data suffer from the ‘curse of dimensionality’, leading to the over-smoothing of data when using imputation methods. To overcome this, sc-PHENIX employs a PCA-UMAP diffusion method, which enhances the preservation of data structures and allows for a refined use of PCA dimensions and diffusion parameters (e.g., k-nearest neighbors, exponentiation of the Markov matrix) to minimize noise introduction. This approach enables a more accurate construction of the exponentiated Markov matrix (cell neighborhood graph), surpassing methods like MAGIC. sc-PHENIX significantly mitigates over-smoothing, as validated through various scRNA-seq datasets, demonstrating improved cell phenotype representation. Applied to a multicellular tumor spheroid dataset, sc-PHENIX identified known extreme phenotype states, showcasing its effectiveness. sc-PHENIX is open-source and available for use and modification.","PeriodicalId":504576,"journal":{"name":"Biology","volume":"52 16","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Diffusion on PCA-UMAP Manifold: The Impact of Data Structure Preservation to Denoise High-Dimensional Single-Cell RNA Sequencing Data\",\"authors\":\"Padron-Manrique Cristian, Vázquez-Jiménez Aarón, Esquivel-Hernandez Diego Armando, Martínez-López Yoscelina Estrella, Neri-Rosario Daniel, Girón-Villalobos David, Mixcoha Edgar, Sánchez-Castañeda Jean Paul, Resendis-Antonio Osbaldo\",\"doi\":\"10.3390/biology13070512\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Single-cell transcriptomics (scRNA-seq) is revolutionizing biological research, yet it faces challenges such as inefficient transcript capture and noise. To address these challenges, methods like neighbor averaging or graph diffusion are used. These methods often rely on k-nearest neighbor graphs from low-dimensional manifolds. However, scRNA-seq data suffer from the ‘curse of dimensionality’, leading to the over-smoothing of data when using imputation methods. To overcome this, sc-PHENIX employs a PCA-UMAP diffusion method, which enhances the preservation of data structures and allows for a refined use of PCA dimensions and diffusion parameters (e.g., k-nearest neighbors, exponentiation of the Markov matrix) to minimize noise introduction. This approach enables a more accurate construction of the exponentiated Markov matrix (cell neighborhood graph), surpassing methods like MAGIC. sc-PHENIX significantly mitigates over-smoothing, as validated through various scRNA-seq datasets, demonstrating improved cell phenotype representation. Applied to a multicellular tumor spheroid dataset, sc-PHENIX identified known extreme phenotype states, showcasing its effectiveness. sc-PHENIX is open-source and available for use and modification.\",\"PeriodicalId\":504576,\"journal\":{\"name\":\"Biology\",\"volume\":\"52 16\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-07-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Biology\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3390/biology13070512\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Biology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3390/biology13070512","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

单细胞转录组学(scRNA-seq)正在给生物学研究带来革命性的变化,但它也面临着转录本捕获效率低和噪音大等挑战。为了应对这些挑战,人们使用了邻域平均或图扩散等方法。这些方法通常依赖于低维流形的 k 近邻图。然而,scRNA-seq 数据受到 "维度诅咒 "的影响,在使用估算方法时会导致数据过度平滑。为了克服这一问题,sc-PHENIX 采用了 PCA-UMAP 扩散方法,这种方法能更好地保留数据结构,并允许精细使用 PCA 维度和扩散参数(如 k 近邻、马尔可夫矩阵的指数化),以最大限度地减少噪声的引入。sc-PHENIX 通过各种 scRNA-seq 数据集的验证,显著减轻了过度平滑的问题,并展示了更好的细胞表型表示。将 sc-PHENIX 应用于多细胞肿瘤球状体数据集时,它识别出了已知的极端表型状态,展示了其有效性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Diffusion on PCA-UMAP Manifold: The Impact of Data Structure Preservation to Denoise High-Dimensional Single-Cell RNA Sequencing Data
Single-cell transcriptomics (scRNA-seq) is revolutionizing biological research, yet it faces challenges such as inefficient transcript capture and noise. To address these challenges, methods like neighbor averaging or graph diffusion are used. These methods often rely on k-nearest neighbor graphs from low-dimensional manifolds. However, scRNA-seq data suffer from the ‘curse of dimensionality’, leading to the over-smoothing of data when using imputation methods. To overcome this, sc-PHENIX employs a PCA-UMAP diffusion method, which enhances the preservation of data structures and allows for a refined use of PCA dimensions and diffusion parameters (e.g., k-nearest neighbors, exponentiation of the Markov matrix) to minimize noise introduction. This approach enables a more accurate construction of the exponentiated Markov matrix (cell neighborhood graph), surpassing methods like MAGIC. sc-PHENIX significantly mitigates over-smoothing, as validated through various scRNA-seq datasets, demonstrating improved cell phenotype representation. Applied to a multicellular tumor spheroid dataset, sc-PHENIX identified known extreme phenotype states, showcasing its effectiveness. sc-PHENIX is open-source and available for use and modification.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信