TPOT-SH: A Faster Optimization Algorithm to Solve the AutoML Problem on Large Datasets

2019 IEEE 31st International Conference on Tools with Artificial Intelligence (ICTAI). Pub Date: 2019-11-01. DOI: 10.1109/ICTAI.2019.00072

Laurent Parmentier, Olivier Nicol, Laetitia Vermeulen-Jourdan, Marie-Éléonore Kessaci


Citations: 8

Abstract

Data are omnipresent nowadays and contain knowledge and patterns that machine learning (ML) algorithms can extract in order to make decisions or perform a task without explicit instructions. To achieve that, these algorithms learn a mathematical model from sample data. However, there are numerous ML algorithms, each learning a different model of reality. Furthermore, the behavior of these algorithms can be altered by modifying some of their many hyperparameters. Careful tuning of these algorithms is costly but essential to reach decent performance. Yet it requires a lot of expertise and remains hard even for experts, who tend to resort to exploration-only approaches such as random search and grid search. The field of AutoML has consequently emerged in the quest for automated machine learning processes that would be less expensive than brute-force searches. In this paper we continue the research initiated on the Tree-based Pipeline Optimization Tool (TPOT), an AutoML tool based on Evolutionary Algorithms (EAs). EAs are typically slow to converge, which makes TPOT incapable of scaling to large datasets. As a consequence, we introduce TPOT-SH, inspired by the concept of Successive Halving used in Multi-Armed Bandit problems. This solution allows TPOT to explore the search space faster and perform much better on larger datasets.
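The abstract names Successive Halving but does not describe how TPOT-SH combines it with the evolutionary search, so the following is only a minimal sketch of the generic technique, not the authors' algorithm. It starts with many candidates on a small budget, keeps the best fraction after each round, and multiplies the budget for the survivors. The names `successive_halving`, `toy_evaluate`, `min_budget`, and `eta` are hypothetical, and treating "budget" as something like a training-sample size is an assumption.

```python
import random
from typing import Callable, List, Tuple

# Seeded generator so the toy noise below is reproducible.
_rng = random.Random(0)


def successive_halving(
    candidates: List[float],
    evaluate: Callable[[float, int], float],
    min_budget: int = 10,
    eta: int = 2,
) -> Tuple[float, List[Tuple[int, int]]]:
    """Generic successive halving (illustrative, not TPOT-SH itself).

    Each round scores all survivors with the current budget, keeps the
    top 1/eta of them, and multiplies the per-candidate budget by eta.
    """
    survivors = list(candidates)
    budget = min_budget
    schedule = []  # (number of candidates, budget per candidate) for each round
    while len(survivors) > 1:
        schedule.append((len(survivors), budget))
        # Higher score is better; sort best-first.
        scored = sorted(survivors, key=lambda c: evaluate(c, budget), reverse=True)
        keep = max(1, len(scored) // eta)
        survivors = scored[:keep]
        budget *= eta
    return survivors[0], schedule


def toy_evaluate(candidate: float, budget: int) -> float:
    """Hypothetical scorer: quality peaks at 0.7, with noise that shrinks as budget grows."""
    noise = _rng.gauss(0.0, 1.0 / budget)
    return -(candidate - 0.7) ** 2 + noise


if __name__ == "__main__":
    pool = [i / 8 for i in range(8)]  # eight hypothetical configurations
    best, schedule = successive_halving(pool, toy_evaluate, min_budget=10, eta=2)
    print("rounds (candidates, budget):", schedule)
    print("selected candidate:", best)
```

With eight candidates and `eta=2`, the sketch runs three rounds on 8, 4, and 2 candidates, doubling the budget each time. Most evaluations therefore happen at the cheapest budget, which is the property that makes the idea attractive on large datasets.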

