Applied Soft Computing, published 2024-09-19. DOI: 10.1016/j.asoc.2024.112256
Adaptive Neighbors Graph Learning for Large-Scale Data Clustering using Vector Quantization and Self-Regularization
In traditional adaptive neighbors graph learning (ANGL)-based clustering, the time complexity exceeds O(n²), where n is the number of data points, so the approach does not scale to large-scale data problems in real applications. In addition, ANGL adds a balance regularization to its objective function to avoid the sparse over-fitting problem in the learned similarity graph matrix, but this regularization may lead to many weak connections between data points in different clusters. To address these problems, we propose a new fast clustering method, Adaptive Neighbors Graph Learning for Large-Scale Data Clustering using Vector Quantization and Self-Regularization (ANGL-LDC), which performs vector quantization (VQ) on the original data and feeds the resulting VQ data as the input to the n×n similarity graph matrix learning. Hence, the n×n similarity graph matrix learning problem is simplified to a weighted m×m (m ≪ n) graph learning problem, where m is the number of distinct points and the weight of each distinct point is its number of duplicates in the VQ data. Consequently, the time complexity of ANGL-LDC is much lower than that of ANGL. At the same time, we propose a new ANGL objective function with a graph connection self-regularization mechanism, under which the ANGL-LDC objective function takes an infinite value if any single graph connection equals 1. Therefore, since the objective is minimized, ANGL-LDC naturally avoids the sparse over-fitting problem. Experimental results on synthetic and real-world datasets demonstrate the scalability and effectiveness of ANGL-LDC.
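The core size reduction described in the abstract, collapsing n quantized points into m distinct points weighted by their duplicate counts, can be sketched as follows. This is a minimal illustration, not the paper's method: the grid-rounding quantizer below is a hypothetical stand-in for whatever VQ codebook the authors actually use, and all variable names are our own.

```python
import numpy as np

# Illustrative sketch of the vector-quantization step: quantize the n
# original points, then collapse duplicates so that graph learning would
# operate on m distinct points (m << n), each weighted by its duplicate
# count. The rounding quantizer is only a placeholder for a real VQ
# codebook (the abstract does not specify its construction).

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))   # n = 1000 original data points

# Crude quantizer: snap each coordinate to a coarse integer grid.
Xq = np.round(X, 0)

# Distinct VQ points and their multiplicities (the "weights").
distinct, weights = np.unique(Xq, axis=0, return_counts=True)
m, n = len(distinct), len(X)

# The weights account for every original point, and m is far smaller
# than n, which is what makes the weighted m x m graph problem cheap.
assert weights.sum() == n
assert m < n
print(f"n = {n}, m = {m}")
```

Any subsequent similarity-graph learning would then be run on `distinct` with `weights` entering the objective, rather than on all n points.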
Journal Introduction:
Applied Soft Computing is an international journal promoting an integrated view of soft computing to solve real-life problems. The focus is on publishing the highest quality research in the application and convergence of fuzzy logic, neural networks, evolutionary computing, rough sets, and other similar techniques to address real-world complexities.
Applied Soft Computing is a rolling publication: articles are published as soon as the editor-in-chief has accepted them. The website is therefore updated continuously with new articles, and publication times are short.