TPOT-SH: A Faster Optimization Algorithm to Solve the AutoML Problem on Large Datasets

Laurent Parmentier, Olivier Nicol, Laetitia Vermeulen-Jourdan, Marie-Éléonore Kessaci
Published in: 2019 IEEE 31st International Conference on Tools with Artificial Intelligence (ICTAI), 2019-11-01
DOI: 10.1109/ICTAI.2019.00072 (https://doi.org/10.1109/ICTAI.2019.00072)
Citation count: 8

Abstract

Data are omnipresent nowadays and contain knowledge and patterns that machine learning (ML) algorithms can extract in order to make decisions or perform a task without explicit instructions. To achieve that, these algorithms learn a mathematical model from sample data. However, there are numerous ML algorithms, each learning a different model of reality. Furthermore, the behavior of these algorithms can be altered by modifying some of their many hyperparameters. Tuning these algorithms well is costly but essential to reach decent performance. It also requires a lot of expertise and remains hard even for experts, who tend to resort to exploration-only approaches such as random search and grid search. The field of AutoML has consequently emerged in the quest for automated machine learning processes that would be less expensive than brute-force searches. In this paper we continue the research initiated on the Tree-based Pipeline Optimization Tool (TPOT), an AutoML tool based on Evolutionary Algorithms (EAs). EAs are typically slow to converge, which makes TPOT incapable of scaling to large datasets. As a consequence, we introduce TPOT-SH, inspired by the concept of Successive Halving used in Multi-Armed Bandit problems. This solution allows TPOT to explore the search space faster and to achieve much better performance on larger datasets.
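To make the Successive Halving idea mentioned above concrete, here is a minimal generic sketch in Python. It is not the TPOT-SH implementation and does not use the TPOT API; the function name `successive_halving`, its parameters (`min_budget`, `max_budget`, `eta`), and the reading of "budget" as something like the number of training samples are all assumptions made for illustration. The sketch only shows the general principle: score many candidates cheaply, discard the weaker ones, and spend a larger budget on the survivors.

```python
from typing import Callable, List, Sequence, TypeVar

C = TypeVar("C")


def successive_halving(
    candidates: Sequence[C],
    evaluate: Callable[[C, int], float],
    min_budget: int,
    max_budget: int,
    eta: int = 2,
) -> C:
    """Generic Successive Halving (illustrative, not the paper's code).

    evaluate(candidate, budget) is a hypothetical callback returning a
    score (higher is better) measured with `budget` units of resource,
    for example a number of training samples.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    if eta < 2:
        raise ValueError("eta must be at least 2 so the pool shrinks")

    survivors: List[C] = list(candidates)
    budget = min_budget
    while len(survivors) > 1:
        # Score every remaining candidate with the current budget.
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget), reverse=True)
        # Keep the best 1/eta fraction, but never fewer than one candidate.
        survivors = ranked[: max(1, len(ranked) // eta)]
        # Give the survivors eta times more resource, capped at max_budget.
        budget = min(max_budget, budget * eta)
    return survivors[0]


if __name__ == "__main__":
    # Toy problem: each candidate is just its own (noise-free) quality.
    pool = [0.1, 0.9, 0.5, 0.3]
    best = successive_halving(pool, lambda quality, budget: quality, 10, 80)
    print(best)
```

With `eta = 2`, half of the candidates are dropped after each round while the budget per candidate doubles, so most of the total cost goes to the few candidates that looked promising under cheap evaluation.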