{"title":"具有自动特征分组的稀疏回归保群解路径算法","authors":"Bin Gu, Guodong Liu, Heng Huang","doi":"10.1145/3097983.3098010","DOIUrl":null,"url":null,"abstract":"Feature selection is one of the most important data mining research topics with many applications. In practical problems, features often have group structure to effect the outcomes. Thus, it is crucial to automatically identify homogenous groups of features for high-dimensional data analysis. Octagonal shrinkage and clustering algorithm for regression (OSCAR) is an important sparse regression approach with automatic feature grouping and selection by ℓ1 norm and pairwise ℓ∞ norm. However, due to over-complex representation of the penalty (especially the pairwise ℓ∞ norm), so far OSCAR has no solution path algorithm which is mostly useful for tuning the model. To address this challenge, in this paper, we propose a groups-keeping solution path algorithm to solve the OSCAR model (OscarGKPath). Given a set of homogenous groups of features and an accuracy bound ε, OscarGKPath can fit the solutions in an interval of regularization parameters while keeping the feature groups. The entire solution path can be obtained by combining multiple such intervals. We prove that all solutions in the solution path produced by OscarGKPath can strictly satisfy the given accuracy bound ε. The experimental results on benchmark datasets not only confirm the effectiveness of our OscarGKPath algorithm, but also show the superiority of our OscarGKPath in cross validation compared with the existing batch algorithm.","PeriodicalId":314049,"journal":{"name":"Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"49 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-08-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"10","resultStr":"{\"title\":\"Groups-Keeping Solution Path Algorithm for Sparse Regression with Automatic Feature Grouping\",\"authors\":\"Bin Gu, Guodong Liu, Heng Huang\",\"doi\":\"10.1145/3097983.3098010\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Feature selection is one of the most important data mining research topics with many applications. In practical problems, features often have group structure to effect the outcomes. Thus, it is crucial to automatically identify homogenous groups of features for high-dimensional data analysis. Octagonal shrinkage and clustering algorithm for regression (OSCAR) is an important sparse regression approach with automatic feature grouping and selection by ℓ1 norm and pairwise ℓ∞ norm. However, due to over-complex representation of the penalty (especially the pairwise ℓ∞ norm), so far OSCAR has no solution path algorithm which is mostly useful for tuning the model. To address this challenge, in this paper, we propose a groups-keeping solution path algorithm to solve the OSCAR model (OscarGKPath). Given a set of homogenous groups of features and an accuracy bound ε, OscarGKPath can fit the solutions in an interval of regularization parameters while keeping the feature groups. The entire solution path can be obtained by combining multiple such intervals. We prove that all solutions in the solution path produced by OscarGKPath can strictly satisfy the given accuracy bound ε. The experimental results on benchmark datasets not only confirm the effectiveness of our OscarGKPath algorithm, but also show the superiority of our OscarGKPath in cross validation compared with the existing batch algorithm.\",\"PeriodicalId\":314049,\"journal\":{\"name\":\"Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining\",\"volume\":\"49 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-08-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"10\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3097983.3098010\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3097983.3098010","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Groups-Keeping Solution Path Algorithm for Sparse Regression with Automatic Feature Grouping
Feature selection is one of the most important data mining research topics with many applications. In practical problems, features often have group structure to effect the outcomes. Thus, it is crucial to automatically identify homogenous groups of features for high-dimensional data analysis. Octagonal shrinkage and clustering algorithm for regression (OSCAR) is an important sparse regression approach with automatic feature grouping and selection by ℓ1 norm and pairwise ℓ∞ norm. However, due to over-complex representation of the penalty (especially the pairwise ℓ∞ norm), so far OSCAR has no solution path algorithm which is mostly useful for tuning the model. To address this challenge, in this paper, we propose a groups-keeping solution path algorithm to solve the OSCAR model (OscarGKPath). Given a set of homogenous groups of features and an accuracy bound ε, OscarGKPath can fit the solutions in an interval of regularization parameters while keeping the feature groups. The entire solution path can be obtained by combining multiple such intervals. We prove that all solutions in the solution path produced by OscarGKPath can strictly satisfy the given accuracy bound ε. The experimental results on benchmark datasets not only confirm the effectiveness of our OscarGKPath algorithm, but also show the superiority of our OscarGKPath in cross validation compared with the existing batch algorithm.