Nonsmooth Bilevel Programming for Hyperparameter Selection
Gregory M. Moore, Charles Bergeron, Kristin P. Bennett
DOI: 10.1109/ICDMW.2009.74
Published in: 2009 IEEE International Conference on Data Mining Workshops, 2009-12-06
Citations: 14
Abstract
We propose a nonsmooth bilevel programming method for training linear learning models with hyperparameters optimized via $T$-fold cross-validation (CV). The algorithm scales well with the sample size and handles loss functions with embedded maxima, such as those in support vector machines. Current practice constructs models over a predefined grid of hyperparameter combinations and selects the best one, an inefficient heuristic. Innovating over previous bilevel CV approaches, this paper represents an advance towards the goal of self-tuning supervised data mining as well as a significant innovation in scalable bilevel programming algorithms. In the bilevel CV formulation, the lower-level problems are treated as unconstrained optimization problems and are replaced with their optimality conditions. The resulting nonlinear program is nonsmooth and nonconvex. We develop a novel bilevel programming algorithm to solve this class of problems and apply it to linear least-squares support vector regression with hyperparameters $C$ (tradeoff) and $\epsilon$ (loss insensitivity). This new approach outperforms grid search and prior smooth bilevel CV methods in terms of modeling performance. The increased speed opens the way to modeling with a larger number of hyperparameters.
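To make the bilevel structure described above concrete, the following is a minimal sketch of a $T$-fold bilevel CV program for linear least-squares support vector regression, reconstructed from the abstract's description; the specific validation loss, the fold notation $\Omega_t$ / $\bar\Omega_t$, and the exact form of the lower-level objective are assumptions for illustration, not the paper's exact formulation:

$$
\begin{aligned}
\min_{C,\,\epsilon,\;w^{1},\dots,w^{T}}\quad
& \frac{1}{T}\sum_{t=1}^{T}\frac{1}{|\Omega_t|}\sum_{i\in\Omega_t}
  \bigl|\,x_i^{\top} w^{t} - y_i\,\bigr|
&& \text{(upper level: validation error on fold } \Omega_t\text{)}\\
\text{s.t.}\quad
& w^{t}\in\arg\min_{w}\;
  \tfrac{1}{2}\|w\|^{2}
  + C\sum_{j\in\bar\Omega_t}
    \max\!\bigl(|x_j^{\top} w - y_j| - \epsilon,\;0\bigr)^{2},
\quad t=1,\dots,T,
&& \text{(lower level: training on } \bar\Omega_t\text{)}
\end{aligned}
$$

Here $\Omega_t$ denotes the $t$-th validation fold and $\bar\Omega_t$ its complementary training set. The embedded $\max$ in each lower-level objective is what makes the problem nonsmooth; replacing each lower-level problem by its optimality conditions, as the abstract describes, collapses the program to a single level but yields a nonsmooth, nonconvex nonlinear program over $(C,\epsilon,w^{1},\dots,w^{T})$.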