Nonsmooth Bilevel Programming for Hyperparameter Selection
Gregory M. Moore, Charles Bergeron, Kristin P. Bennett
DOI: 10.1109/ICDMW.2009.74
Published in: 2009 IEEE International Conference on Data Mining Workshops, 2009-12-06
Citations: 14
Abstract
We propose a nonsmooth bilevel programming method for training linear learning models with hyperparameters optimized via $T$-fold cross-validation (CV). The algorithm scales well with the sample size and handles loss functions with embedded maxima, such as those in support vector machines. Current practice constructs models over a predefined grid of hyperparameter combinations and selects the best one, an inefficient heuristic. Innovating over previous bilevel CV approaches, this paper represents an advance towards the goal of self-tuning supervised data mining as well as a significant innovation in scalable bilevel programming algorithms. In the bilevel CV formulation, the lower-level problems are treated as unconstrained optimization problems and are replaced with their optimality conditions. The resulting nonlinear program is nonsmooth and nonconvex. We develop a novel bilevel programming algorithm to solve this class of problems and apply it to linear least-squares support vector regression with hyperparameters $C$ (tradeoff) and $\epsilon$ (loss insensitivity). This new approach outperforms grid search and prior smooth bilevel CV methods in terms of modeling performance. The increased speed opens the way to modeling with a larger number of hyperparameters.
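To make the bilevel structure described above concrete, the following is a minimal sketch of a $T$-fold bilevel CV program for linear least-squares support vector regression, reconstructed from the abstract's description; the specific validation loss, the fold notation $\Omega_t$ / $\bar\Omega_t$, and the exact form of the lower-level objective are assumptions for illustration, not the paper's exact formulation:

$$
\begin{aligned}
\min_{C,\,\epsilon,\;w^{1},\dots,w^{T}}\quad
& \frac{1}{T}\sum_{t=1}^{T}\frac{1}{|\Omega_t|}\sum_{i\in\Omega_t}
  \bigl|\,x_i^{\top} w^{t} - y_i\,\bigr|
&& \text{(upper level: validation error on fold } \Omega_t\text{)}\\
\text{s.t.}\quad
& w^{t}\in\arg\min_{w}\;
  \tfrac{1}{2}\|w\|^{2}
  + C\sum_{j\in\bar\Omega_t}
    \max\!\bigl(|x_j^{\top} w - y_j| - \epsilon,\;0\bigr)^{2},
\quad t=1,\dots,T,
&& \text{(lower level: training on } \bar\Omega_t\text{)}
\end{aligned}
$$

Here $\Omega_t$ denotes the $t$-th validation fold and $\bar\Omega_t$ its complementary training set. The embedded $\max$ in each lower-level objective is what makes the problem nonsmooth; replacing each lower-level problem by its optimality conditions, as the abstract describes, collapses the program to a single level but yields a nonsmooth, nonconvex nonlinear program over $(C,\epsilon,w^{1},\dots,w^{T})$.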