{"title":"一种高效可靠的基于公差的主成分分析算法","authors":"Michael Yeh, Ming Gu","doi":"10.1109/ICDMW58026.2022.00088","DOIUrl":null,"url":null,"abstract":"Principal component analysis (PCA) is an important method for dimensionality reduction in data science and machine learning. However, it is expensive for large matrices when only a few components are needed. Existing fast PCA algorithms typically assume the user will supply the number of components needed, but in practice, they may not know this number beforehand. Thus, it is important to have fast PCA algorithms depending on a tolerance. We develop one such algorithm that runs quickly for matrices with rapidly decaying singular values, provide approximation error bounds that are within a constant factor away from optimal, and demonstrate its utility with data from a variety of applications.","PeriodicalId":146687,"journal":{"name":"2022 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"An Efficient and Reliable Tolerance- Based Algorithm for Principal Component Analysis\",\"authors\":\"Michael Yeh, Ming Gu\",\"doi\":\"10.1109/ICDMW58026.2022.00088\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Principal component analysis (PCA) is an important method for dimensionality reduction in data science and machine learning. However, it is expensive for large matrices when only a few components are needed. Existing fast PCA algorithms typically assume the user will supply the number of components needed, but in practice, they may not know this number beforehand. Thus, it is important to have fast PCA algorithms depending on a tolerance. We develop one such algorithm that runs quickly for matrices with rapidly decaying singular values, provide approximation error bounds that are within a constant factor away from optimal, and demonstrate its utility with data from a variety of applications.\",\"PeriodicalId\":146687,\"journal\":{\"name\":\"2022 IEEE International Conference on Data Mining Workshops (ICDMW)\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 IEEE International Conference on Data Mining Workshops (ICDMW)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICDMW58026.2022.00088\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE International Conference on Data Mining Workshops (ICDMW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDMW58026.2022.00088","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
An Efficient and Reliable Tolerance- Based Algorithm for Principal Component Analysis
Principal component analysis (PCA) is an important method for dimensionality reduction in data science and machine learning. However, it is expensive for large matrices when only a few components are needed. Existing fast PCA algorithms typically assume the user will supply the number of components needed, but in practice, they may not know this number beforehand. Thus, it is important to have fast PCA algorithms depending on a tolerance. We develop one such algorithm that runs quickly for matrices with rapidly decaying singular values, provide approximation error bounds that are within a constant factor away from optimal, and demonstrate its utility with data from a variety of applications.