An intuitive and most efficient L1-norm principal component analysis algorithm for big data

2019 53rd Annual Conference on Information Sciences and Systems (CISS). Publication date: 2019-03-01. DOI: 10.1109/CISS.2019.8692807

Xiaowei Song

{"title":"一种直观、高效的大数据l-范数主成分分析算法","authors":"Xiaowei Song","doi":"10.1109/CISS.2019.8692807","DOIUrl":null,"url":null,"abstract":"Grassmann average (GA) can coincide with Ll- norm principal component (PC) and is scalable for millions of samples. However, it is unclear whether there exists and how much further speed improvement can be gained by revising the fixed-point optimization-based GA. In this paper, I analyze such optimization process in an intuitive way and propose its improvement, i.e., an online algorithm without any iterations. I show that it can be most efficient in the sense that it only visits each sample once per PC, with minimal memory requirement, unlike GA or MATLAB svds. It is proved to be convergent for big data.","PeriodicalId":123696,"journal":{"name":"2019 53rd Annual Conference on Information Sciences and Systems (CISS)","volume":"132 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"An intuitive and most efficient Ll-norm principal component analysis algorithm for big data\",\"authors\":\"Xiaowei Song\",\"doi\":\"10.1109/CISS.2019.8692807\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Grassmann average (GA) can coincide with Ll- norm principal component (PC) and is scalable for millions of samples. However, it is unclear whether there exists and how much further speed improvement can be gained by revising the fixed-point optimization-based GA. In this paper, I analyze such optimization process in an intuitive way and propose its improvement, i.e., an online algorithm without any iterations. I show that it can be most efficient in the sense that it only visits each sample once per PC, with minimal memory requirement, unlike GA or MATLAB svds. 
It is proved to be convergent for big data.\",\"PeriodicalId\":123696,\"journal\":{\"name\":\"2019 53rd Annual Conference on Information Sciences and Systems (CISS)\",\"volume\":\"132 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-03-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 53rd Annual Conference on Information Sciences and Systems (CISS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CISS.2019.8692807\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 53rd Annual Conference on Information Sciences and Systems (CISS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CISS.2019.8692807","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Citations: 2

Abstract

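The Grassmann average (GA) can coincide with the L1-norm principal component (PC) and scales to millions of samples. However, it is unclear whether the fixed-point-optimization-based GA can be made faster by revising it, and if so, by how much. In this paper, I analyze this optimization process intuitively and propose an improvement: an online algorithm that requires no iterations. I show that it can be the most efficient in the sense that it visits each sample only once per PC and has a minimal memory requirement, unlike GA or MATLAB svds. The algorithm is proved to converge for big data.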
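The abstract does not spell out the update rules, so the following is only a rough sketch under stated assumptions. It assumes the common GA formulation in which each sample is weighted by its norm, so that one fixed-point step reduces to q ← normalize(Σ_i sign(x_iᵀq) x_i), the sign-flipping update associated with maximizing the L1-norm PCA objective Σ_i |x_iᵀq|. The function online_single_pass is a hypothetical single-pass variant, written only to illustrate what "visits each sample once per PC" with O(d) memory can look like; it is not the paper's algorithm. All function names (dot, normalize, sign, l1_objective, grassmann_average, online_single_pass, deflate, top_components) are illustrative.

```python
import math
from typing import List, Optional

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Vector) -> Vector:
    n = math.sqrt(dot(v, v))
    if n == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / n for x in v]


def sign(value: float) -> float:
    # This sketch sends ties (value == 0) to +1.
    return 1.0 if value >= 0.0 else -1.0


def l1_objective(X: List[Vector], q: Vector) -> float:
    """L1-norm PCA objective for a unit vector q: sum_i |x_i^T q|."""
    return sum(abs(dot(x, q)) for x in X)


def grassmann_average(X: List[Vector], q0: Vector, max_iter: int = 100) -> Vector:
    """Fixed-point iteration q <- normalize(sum_i sign(x_i^T q) x_i).

    Assumes GA with per-sample weights equal to sample norms, under which the
    GA step reduces to this sign-flipping update. Each iteration is a full pass
    over X; it stops once the sign pattern no longer changes.
    """
    q = normalize(q0)
    prev_signs: Optional[List[float]] = None
    for _ in range(max_iter):
        signs = [sign(dot(x, q)) for x in X]
        if signs == prev_signs:
            break  # same signs give the same sum, so q is a fixed point
        s = [0.0] * len(q)
        for sg, x in zip(signs, X):
            s = [si + sg * xi for si, xi in zip(s, x)]
        q = normalize(s)
        prev_signs = signs
    return q


def online_single_pass(X: List[Vector]) -> Vector:
    """HYPOTHETICAL one-pass variant, not the paper's algorithm.

    Keeps a single running sum s (O(d) memory) and adds each sample once,
    flipped so that it agrees with the current s. Because the added term
    satisfies sign(x^T s) * x^T s >= 0, ||s|| never shrinks, so s is nonzero
    unless every sample is zero.
    """
    if not X:
        raise ValueError("need at least one sample")
    s = [0.0] * len(X[0])
    for x in X:
        sg = sign(dot(x, s))
        s = [si + sg * xi for si, xi in zip(s, x)]
    return normalize(s)


def deflate(X: List[Vector], q: Vector) -> List[Vector]:
    """Remove the component along unit vector q from every sample."""
    return [[xi - dot(x, q) * qi for xi, qi in zip(x, q)] for x in X]


def top_components(X: List[Vector], k: int) -> List[Vector]:
    """k components via repeated one-pass fits plus deflation (one pass per PC).

    For clarity this materializes the deflated data; a streaming version would
    subtract the earlier projections on the fly instead.
    """
    comps: List[Vector] = []
    for _ in range(k):
        q = online_single_pass(X)
        comps.append(q)
        X = deflate(X, q)
    return comps
```

For example, with X = [[3.0, 0.1], [-2.5, 0.2], [4.0, -0.3], [-3.0, -0.1], [0.2, 0.5]], grassmann_average(X, [0.0, 1.0]) needs several full passes, while online_single_pass(X) reads each sample once; here both return, up to sign, the same unit vector close to (1, 0). The trade-off the sketch makes visible is that the fixed-point version revisits the data on every iteration, whereas the one-pass version touches each sample once per component and stores only one d-dimensional accumulator.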



