SNN-Cliq++: Improved Cell Clustering Method Based on Graph

Proceedings of the 2020 Artificial Intelligence and Complex Systems Conference. Pub Date: 2020-08-20. DOI: 10.1145/3407703.3407731

Lilu Guo, S. Bu, Yongjin Gan, Jianbo Lu, Xiaoshu Zhu


Citations: 0

Abstract


Cell typing using single-cell RNA-seq data underpins precision medicine, the study of development and evolution, and drug research and development. However, these data are characterized by ultrahigh dimensionality, small sample sizes, a lack of labels, and high noise, which pose challenges for traditional clustering methods, such as poor cell typing performance, high computational cost, and difficulty in parameter tuning. SNN-Cliq, a clustering algorithm for cell typing proposed in 2015, is notable for its simple and efficient computation, good scalability, and insensitivity to parameters. Building on the framework of previous work [5], our new method, SNN-Cliq++, introduces three improvements. First, we replace Euclidean distance with the Spearman correlation coefficient to measure the similarity between each pair of cells. Second, we optimize the parameter k by minimizing |clusterNum - trueNum|; this process does not cost much time. Third, we add a negative indicator matrix to forbid connections between cells that have the most negative Spearman correlation coefficients. Across extensive datasets, the new algorithm shows a marked improvement over the original: on average, NMI rises by 20.5% and ARI by 28.6%.
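The three improvements named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every function name and parameter here (`spearman_matrix`, `negative_indicator`, `m`, `pick_k`, and so on) is hypothetical. It assumes that ties in the ranks can be ignored (real scRNA-seq data has many zeros, so a tie-aware ranking would be needed), that the SNN edge weight is simply the number of shared neighbors, that "top negative" means each cell's `m` lowest negative correlations, and that connected components stand in for SNN-Cliq's clique-merging step when counting clusters.

```python
import numpy as np

def rank_rows(x):
    """Rank each row (1 = smallest). Ties get arbitrary order: a simplification."""
    order = np.argsort(x, axis=1, kind="stable")
    ranks = np.empty_like(order, dtype=float)
    rows = np.arange(x.shape[0])[:, None]
    ranks[rows, order] = np.arange(1, x.shape[1] + 1)
    return ranks

def spearman_matrix(expr):
    """Cell-by-cell Spearman correlation: Pearson correlation of per-cell ranks."""
    return np.corrcoef(rank_rows(expr))

def knn_by_spearman(rho, k):
    """For each cell, indices of the k most positively correlated other cells."""
    sim = rho.copy()
    np.fill_diagonal(sim, -np.inf)  # a cell is never its own listed neighbor
    return np.argsort(-sim, axis=1)[:, :k]

def negative_indicator(rho, m):
    """Boolean matrix marking pairs among each cell's m lowest negative correlations."""
    sim = rho.copy()
    np.fill_diagonal(sim, np.inf)
    lowest = np.argsort(sim, axis=1)[:, :m]
    rows = np.arange(rho.shape[0])[:, None]
    neg = np.zeros(rho.shape, dtype=bool)
    neg[rows, lowest] = rho[rows, lowest] < 0
    return neg | neg.T  # forbid the pair in both directions

def snn_graph(knn, forbidden):
    """Shared-neighbor counts between cells, zeroed for forbidden pairs."""
    n = knn.shape[0]
    member = np.zeros((n, n), dtype=int)
    member[np.arange(n)[:, None], knn] = 1
    np.fill_diagonal(member, 1)  # each cell counts as its own neighbor
    shared = member @ member.T
    np.fill_diagonal(shared, 0)
    shared[forbidden] = 0
    return shared

def count_components(adj):
    """Connected components of the graph; a stand-in for clique merging."""
    seen = np.zeros(adj.shape[0], dtype=bool)
    count = 0
    for start in range(adj.shape[0]):
        if seen[start]:
            continue
        count += 1
        seen[start] = True
        stack = [start]
        while stack:
            node = stack.pop()
            for nxt in np.flatnonzero(adj[node] > 0):
                if not seen[nxt]:
                    seen[nxt] = True
                    stack.append(nxt)
    return count

def pick_k(rho, forbidden, true_num, candidates):
    """Candidate k minimizing |clusterNum - trueNum| (first one wins on ties)."""
    def gap(k):
        graph = snn_graph(knn_by_spearman(rho, k), forbidden)
        return abs(count_components(graph) - true_num)
    return min(candidates, key=gap)

# Toy data: 6 cells x 8 genes, two groups with opposite expression trends.
trend = np.arange(8.0)
cells = np.array([trend * (1 + 0.1 * i) for i in range(3)]
                 + [-trend * (1 + 0.1 * i) for i in range(3)])
rho = spearman_matrix(cells)
forbidden = negative_indicator(rho, m=3)
best_k = pick_k(rho, forbidden, true_num=2, candidates=[2, 3])
```

On this toy data, setting k = 3 forces every cell to list one neighbor from the opposite group. Without the negative indicator matrix the two groups merge into a single component, while with it they stay separate, which is the kind of spurious connection the third improvement is meant to block.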
