Best Subset Solution Path for Linear Dimension Reduction Models using Continuous Optimization
Benoit Liquet, Sarat Moka, Samuel Muller
arXiv - STAT - Other Statistics, published 2024-03-29. DOI: arxiv-2403.20007 (https://doi.org/arxiv-2403.20007)
Citations: 0
Abstract
The selection of the best variables is a challenging problem in supervised and unsupervised learning, especially in high-dimensional contexts where the number of variables is usually much larger than the number of observations. In this
paper, we focus on two multivariate statistical methods: principal component analysis and partial least squares. Both approaches are popular linear dimension-reduction methods with numerous applications in fields including genomics, biology, environmental science, and engineering. In
particular, these approaches build principal components: new variables that are linear combinations of all the original variables. A main drawback of principal components is that they are difficult to interpret when the number of variables is large. To define principal components from only the most relevant variables, we
propose to cast the best subset solution path method into the principal component analysis and partial least squares frameworks. We offer a new alternative by exploiting a continuous optimization algorithm for the best subset solution path.
Empirical studies show the efficacy of our approach in providing the best subset solution path. The use of our algorithm is further illustrated through the analysis of two real datasets: the first is analyzed using principal component analysis, while the analysis of the second is based on the partial least squares framework.
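To make the "best subset solution path" idea concrete for PCA: for each subset size k, one seeks the subset of variables whose first principal component explains the most variance, and the path traces these best subsets as k grows. The paper achieves this with a continuous optimization algorithm; the sketch below instead uses brute-force enumeration, which is only feasible for a handful of variables. All function names and the toy data are assumptions made for illustration, not the authors' implementation.

```python
# Brute-force sketch of a best-subset solution path for PCA (toy scale only).
# For each subset size k, enumerate all variable subsets and keep the one whose
# leading covariance eigenvalue (the variance of PC1) is largest.
from itertools import combinations
import numpy as np

def best_subset_pca_path(X, max_k):
    """Return {k: (best subset of column indices, PC1 variance)} for k = 1..max_k."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)  # center the columns, as PCA requires
    path = {}
    for k in range(1, max_k + 1):
        best = None
        for subset in combinations(range(p), k):
            cov = np.atleast_2d(np.cov(Xc[:, subset], rowvar=False))
            lam = np.linalg.eigvalsh(cov)[-1]  # largest eigenvalue = PC1 variance
            if best is None or lam > best[1]:
                best = (subset, lam)
        path[k] = best
    return path

# Toy data: variable 0 is given a much larger variance than the rest,
# so it should appear in every best subset along the path.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))
X[:, 0] += 3 * rng.normal(size=50)
path = best_subset_pca_path(X, max_k=3)
for k, (subset, lam) in path.items():
    print(k, subset, round(lam, 2))
```

By eigenvalue interlacing, the best PC1 variance is non-decreasing in k, so the path is monotone; the continuous optimization approach of the paper avoids the combinatorial cost of this enumeration, which grows as the number of subsets of each size.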