A continuous optimization algorithm for column subset selection
A. Mathur, S. Moka, Z. Botev
MODSIM2023, 25th International Congress on Modelling and Simulation. Published 2023-08-01.
DOI: https://doi.org/10.36334/modsim.2023.mathur
Abstract
Recent advances in the ability to capture and collect data have made high-dimensional datasets ubiquitous in engineering, economics, finance, biology, and the health sciences, to name a few fields. When the collected data is unlabeled, it is often desirable to obtain an accurate low-rank approximation of the data that is inexpensive to compute and memory efficient. Such an approximation speeds up the downstream matrix computations that large-scale learning algorithms often require. The Column Subset Selection Problem (CSSP) is a tool for generating low-rank approximations from a subset of the data instances or features in the dataset. The chosen instances or features are commonly referred to as “landmark” points, and their choice determines how accurate the low-rank approximation is. More specifically, the challenge in the CSSP is to select the best $k$ columns of a data matrix $X \in \mathbb{R}^{m \times n}$ that span its column space. That is, for any binary vector $s \in \{0,1\}^n$, compute
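The abstract breaks off before stating the quantity to compute, so the following is only a sketch under a common formulation of the CSSP, not the authors' definition or method. It assumes the selection quality of a binary vector s is measured by the squared Frobenius error of projecting X onto the span of its selected columns. The function names `cssp_residual` and `greedy_cssp` are hypothetical, and the greedy search is a simple baseline swapped in for illustration; it is not the continuous optimization algorithm the title refers to.

```python
import numpy as np


def cssp_residual(X, s):
    """Squared Frobenius error of approximating X by its projection onto
    the span of the columns flagged by the binary vector s.
    (Assumed objective; the source abstract is truncated before stating it.)"""
    cols = np.flatnonzero(s)
    if cols.size == 0:
        # No columns selected: the approximation is the zero matrix.
        return float(np.linalg.norm(X, "fro") ** 2)
    C = X[:, cols]
    # Least-squares coefficients B minimizing ||X - C B||_F.
    B, *_ = np.linalg.lstsq(C, X, rcond=None)
    return float(np.linalg.norm(X - C @ B, "fro") ** 2)


def greedy_cssp(X, k):
    """Baseline only: add, one at a time, the column that most reduces
    the residual, until k columns are selected."""
    n = X.shape[1]
    s = np.zeros(n, dtype=int)
    for _ in range(k):
        best_j, best_err = None, np.inf
        for j in np.flatnonzero(s == 0):
            s[j] = 1                      # tentatively select column j
            err = cssp_residual(X, s)
            s[j] = 0                      # undo the tentative selection
            if err < best_err:
                best_j, best_err = j, err
        s[best_j] = 1
    return s
```

For example, calling `greedy_cssp(X, k)` on an m × n array returns a 0/1 vector with k ones, and `cssp_residual(X, s)` scores any candidate selection. Evaluating every binary vector with k ones is combinatorial in n, which is why the problem is hard to solve exactly.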