SNN-Cliq++: Improved Cell Clustering Method Based on Graph

Lilu Guo, S. Bu, Yongjin Gan, Jianbo Lu, Xiaoshu Zhu
DOI: 10.1145/3407703.3407731 (https://doi.org/10.1145/3407703.3407731)
Published in: Proceedings of the 2020 Artificial Intelligence and Complex Systems Conference
Publication date: 2020-08-20
Citations: 0

Abstract

SNN-Cliq++: Improved Cell Clustering Method Based on Graph
Cell typing using single-cell RNA-seq data underpins precision medicine, the study of development and evolution, and drug research and development. However, these data are characterized by ultrahigh dimensionality, small sample sizes, a lack of labels, and high noise, all of which challenge traditional clustering methods: cell typing performance is poor, computational cost is high, and parameters are difficult to tune. SNN-Cliq, a clustering algorithm for cell typing proposed in 2015, stands out for its simple and efficient computation, good scalability, and insensitivity to its parameters. Building on the framework of that previous work [5], our new method, SNN-Cliq++, introduces three improvements. First, we replace Euclidean distance with the Spearman correlation coefficient to measure the similarity between each pair of cells. Second, we optimize the parameter k under the constraint min |clusterNum - trueNum|; this step adds little running time. Third, we add a negative indicator matrix that forbids connections between cells whose Spearman correlation coefficients are among the most negative. Across a wide range of datasets, the results show that the new algorithm improves markedly on the original: NMI rises by 20.5% and ARI by 28.6% on average.
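The three improvements named in the abstract can be illustrated with a minimal, pure-Python sketch. It is not the authors' implementation, and the abstract does not give enough detail to reproduce it. All function names (`spearman_matrix`, `forbidden_pairs`, `snn_edges`, `count_clusters`, `choose_k`) and the parameter `m` are hypothetical. The sketch assumes that "top negative" means each cell's `m` most negatively correlated partners, that the true number of clusters is known, and it swaps in two simplifications: unweighted shared-neighbor edges, and connected components in place of SNN-Cliq's quasi-clique merging.

```python
from itertools import combinations


def average_ranks(values):
    """Rank values 1..n; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither vector is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_matrix(cells):
    """Spearman correlation for every pair of cells (rows: cells, columns: genes)."""
    ranked = [average_ranks(cell) for cell in cells]
    n = len(cells)
    rho = [[1.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        rho[i][j] = rho[j][i] = pearson(ranked[i], ranked[j])
    return rho


def forbidden_pairs(rho, m):
    """Assumed reading of the negative indicator matrix: each cell's m most
    negatively correlated partners may not be connected to it."""
    banned = set()
    for i, row in enumerate(rho):
        negatives = sorted((j for j in range(len(row)) if j != i and row[j] < 0),
                           key=lambda j: row[j])
        banned.update(frozenset((i, j)) for j in negatives[:m])
    return banned


def snn_edges(rho, k, banned):
    """Connect two cells if their k-nearest-neighbour lists (self included)
    overlap and the pair is not forbidden. Simplified: no edge weights."""
    n = len(rho)
    nn = [set(sorted((j for j in range(n) if j != i), key=lambda j: -rho[i][j])[:k]) | {i}
          for i in range(n)]
    return {(i, j) for i, j in combinations(range(n), 2)
            if nn[i] & nn[j] and frozenset((i, j)) not in banned}


def count_clusters(n, edges):
    """Stand-in for quasi-clique merging: count connected components."""
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for i, j in edges:
        parent[find(i)] = find(j)
    return len({find(a) for a in range(n)})


def choose_k(rho, candidates, true_num, m=1):
    """Pick the k that minimises |clusterNum - trueNum|."""
    banned = forbidden_pairs(rho, m)
    return min(candidates,
               key=lambda k: abs(count_clusters(len(rho), snn_edges(rho, k, banned)) - true_num))


# Toy data: three cells with rising expression, three with falling expression.
cells = [[1, 2, 3, 4, 5], [2, 3, 4, 5, 7], [1, 3, 4, 6, 9],
         [9, 7, 5, 3, 1], [8, 6, 5, 2, 1], [7, 6, 4, 3, 2]]
rho = spearman_matrix(cells)
best_k = choose_k(rho, candidates=range(2, 5), true_num=2)
print(best_k, count_clusters(len(cells), snn_edges(rho, best_k, forbidden_pairs(rho, 1))))
```

Because Spearman correlation works on ranks, the rising and falling toy profiles correlate perfectly within each group and perfectly negatively across groups, whatever their absolute expression levels. The search over `k` is just a loop over a handful of candidates that reuses one correlation matrix, which is consistent with the abstract's remark that this step costs little time.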