BD-08 A novel approach to analyze single cell RNA-Seq data from lupus nephritis samples

Brian Kegerreis, A. Grammer, P. Lipsky
Big Data Analyses. Published 2018-08-01. DOI: 10.1136/LUPUS-2018-LSM.32 (https://doi.org/10.1136/LUPUS-2018-LSM.32)

Abstract

Background
Single-cell RNA-Seq (scRNA-Seq) has the potential to increase our understanding of cell populations in lupus. Recently, kidney scRNA-Seq data from lupus nephritis (LN) patients have provided the opportunity to determine the heterogeneity of cells within the affected kidney. However, since individual cells were not identified phenotypically, populations must be identified computationally. The unique technical challenges of scRNA-Seq data make it difficult to approach this analysis with conventional unsupervised bioinformatics techniques. Techniques inspired by natural language processing (NLP), however, make it possible to identify meaningful clusters of cells without prior knowledge of the cell types present in the sample.

Methods
We have developed a recursive, unsupervised, heuristic technique (StarShip™) to dynamically perform top-down, divisive clustering on scRNA-Seq data. First, the cells are mapped onto an n-dimensional unit sphere, where n is the number of available genes. The angles between all pairs of cells are used to construct a cosine distance metric: 1 - cos(θ). The cosine distance is used to carry out k-means or k-medoids clustering, with k set to 2 for each iteration. At each split of the data, the algorithm evaluates whether it has sorted the remaining cells into meaningful populations and stops making splits when a user-defined criterion is met. Once all clusters are finalized, a Mann-Whitney U test determines genes that distinguish clusters or groups of clusters from other cells. This algorithm was validated using publicly available peripheral blood mononuclear cell (PBMC) scRNA-Seq data from 10X Genomics and tested in scRNA-Seq data from LN patients from the NIAMS AMP RA/SLE initiative. The Adjusted Rand Index (ARI) was used to compare generated partitions to known cell types in the PBMC data.
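
The divisive procedure described in the Methods can be sketched roughly as follows. This is a minimal illustration, not the StarShip implementation: the abstract does not specify the user-defined stopping criterion, so the rule used here (a minimum cluster size `min_size` and a minimum cosine distance `min_separation` between the two centroids) is an assumption, as are the farthest-point initialization, the function names, and the toy data.

```python
import math
import random


def normalize(vec):
    """Scale a vector to unit length so it lies on the unit sphere."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine_distance(u, v):
    """Cosine distance 1 - cos(theta) between two unit vectors."""
    return 1.0 - sum(a * b for a, b in zip(u, v))


def spherical_2means(cells, n_iter=50, seed=0):
    """Split unit vectors into two groups (k = 2) using cosine distance."""
    rng = random.Random(seed)
    # Assumed initialization: a random cell and the cell farthest from it.
    first = cells[rng.randrange(len(cells))]
    second = max(cells, key=lambda c: cosine_distance(c, first))
    centroids = [first, second]
    labels = None
    for _ in range(n_iter):
        new_labels = [
            0 if cosine_distance(c, centroids[0]) <= cosine_distance(c, centroids[1]) else 1
            for c in cells
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for k in (0, 1):
            members = [c for c, lab in zip(cells, labels) if lab == k]
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[k] = normalize(mean)  # project the mean back onto the sphere
    return labels, centroids


def divisive_clusters(cells, indices=None, min_size=3, min_separation=0.1,
                      depth=0, max_depth=10):
    """Recursively bisect the cells; returns a list of index lists."""
    if indices is None:
        indices = list(range(len(cells)))
    if len(indices) < 2 * min_size or depth >= max_depth:
        return [indices]
    labels, centroids = spherical_2means([cells[i] for i in indices])
    left = [i for i, lab in zip(indices, labels) if lab == 0]
    right = [i for i, lab in zip(indices, labels) if lab == 1]
    # Hypothetical stopping rule standing in for the user-defined criterion.
    too_small = min(len(left), len(right)) < min_size
    too_close = cosine_distance(centroids[0], centroids[1]) < min_separation
    if too_small or too_close:
        return [indices]
    return (divisive_clusters(cells, left, min_size, min_separation, depth + 1, max_depth)
            + divisive_clusters(cells, right, min_size, min_separation, depth + 1, max_depth))


def make_toy_cells():
    """Three tight groups of 10 unit vectors each (toy data, not scRNA-Seq)."""
    bases = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.6, 0.8)]
    cells = []
    for base in bases:
        for i in range(10):
            jitter = (0.01 * math.sin(i), 0.01 * math.cos(i), 0.01 * math.sin(2 * i))
            cells.append(normalize([b + j for b, j in zip(base, jitter)]))
    return cells


clusters = divisive_clusters(make_toy_cells())
print(sorted(len(c) for c in clusters))
```

Because every split uses k = 2, the number of final clusters is not fixed in advance; it depends on where the stopping rule halts the recursion. On real expression data the thresholds would need tuning, and a k-medoids variant would replace the centroid update with a choice of the most central cell.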

Results
StarShip™ was used to classify 250 PBMCs (50 each of CD14 monocytes, CD19 B cells, CD4 helper T cells, CD8 T cells, and CD56 NK cells). Using dynamic spherical k-means, 6 clusters were generated that closely corresponded to the known cell types (figure 1). For comparison, hierarchical clustering and one-off spherical k-means with k set to 5 were carried out. Hierarchical clustering had an ARI of 0.45, one-off spherical k-means had an ARI of 0.89, and dynamic spherical k-means had an ARI of 0.86.

Conclusions
This method can effectively partition unknown cells from scRNA-Seq data sets into biologically relevant clusters without prior knowledge of the number of cell types present. The similarity between the performance of the StarShip™ algorithm and one-off k-means, which does incorporate this prior knowledge, highlights the value of this dynamic technique. A full analysis of the AMP LN data is forthcoming.

Acknowledgments
Research supported by the RILITE Foundation.
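
For readers unfamiliar with the Adjusted Rand Index used to score the partitions, the standard pair-counting formula can be sketched as below. The function name and the toy labels are illustrative assumptions; the labels are not the PBMC data, and the code does not reproduce the ARI values reported in the abstract.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two partitions of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: the partitions trivially agree
    # Pairs of items placed together in both partitions, in the reference
    # partition only, and in the predicted partition only.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    together_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    together_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = together_true * together_pred / total_pairs  # agreement expected by chance
    maximum = (together_true + together_pred) / 2
    if maximum == expected:
        return 1.0
    return (together_both - expected) / (maximum - expected)


# Toy labels only: one predicted cluster splits a known cell type in two.
known = ["B", "B", "T", "T"]
predicted = [0, 0, 1, 2]
print(round(adjusted_rand_index(known, predicted), 3))
```

The index is 1 for identical partitions regardless of how clusters are named, and close to 0 for chance-level agreement. A partition with more clusters than known cell types, such as 6 clusters for 5 types, can therefore still score highly when the extra cluster only subdivides one type.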