MetaPrep: Data preparation pipelines recommendation via meta-learning
F. Zagatti, L. C. Silva, Lucas Nildaimon Dos Santos Silva, B. S. Sette, Helena de Medeiros Caseli, D. Lucrédio, D. F. Silva
2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 1197-1202, published 2021-12-01
DOI: 10.1109/ICMLA52953.2021.00194 (https://doi.org/10.1109/ICMLA52953.2021.00194)
Abstract
Data preparation is a mandatory phase in the machine learning pipeline. Its goal is to convert noisy, disordered data into refined data that learning algorithms can use. However, data preparation is time-consuming and requires specialized knowledge of both the data and the algorithms. Automating it is therefore essential to reduce the effort data scientists spend developing satisfactory models. Despite its relevance, current AutoML platforms either disregard data preparation or rely on simple hardcoded pipelines. To fill this gap, we present a meta-learning-based recommendation system for data preparation. Our system recommends five pipelines, ranked by relevance, making it useful for users with varying degrees of experience. Using the top-1 pipeline, we show that our proposal improves the performance of an AutoML system. Furthermore, our method achieved accuracy comparable to that of a reinforcement-learning-based algorithm with the same goal while being up to two orders of magnitude faster. Finally, we tested our method in a real-world application and evaluated its benefits and limitations in that scenario.
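The abstract does not say which meta-features or which meta-learner the system uses, so the following is only a minimal sketch of the general idea of meta-learning-based pipeline recommendation, not the authors' implementation. It assumes a nearest-neighbour recommender: a new dataset is described by a few meta-features, the most similar previously seen datasets are retrieved, and pipelines are ranked by their mean score on those neighbours. The function names `meta_features` and `recommend`, the three meta-features, the pipeline names, and the scores are all hypothetical.

```python
import math
from collections import defaultdict


def meta_features(rows):
    """Toy meta-features for a dataset given as a list of equal-length rows.

    Returns (log10 of row count, column count, fraction of missing cells).
    These three are illustrative assumptions, not the paper's meta-feature set.
    """
    n_rows = len(rows)
    n_cols = len(rows[0])
    missing = sum(1 for row in rows for cell in row if cell is None)
    return (math.log10(n_rows), float(n_cols), missing / (n_rows * n_cols))


def recommend(query, meta_base, k=3, top_n=5):
    """Rank pipelines by their mean score on the k nearest known datasets.

    query:     meta-feature tuple of the new dataset.
    meta_base: list of (meta_feature_tuple, {pipeline_name: score}) entries
               collected offline (hypothetical data in this sketch).
    """
    # k nearest datasets by Euclidean distance in meta-feature space
    neighbours = sorted(meta_base, key=lambda entry: math.dist(query, entry[0]))[:k]

    totals = defaultdict(float)
    counts = defaultdict(int)
    for _, scores in neighbours:
        for name, score in scores.items():
            totals[name] += score
            counts[name] += 1

    means = {name: totals[name] / counts[name] for name in totals}
    # Highest mean score first; ties are broken by name so the order is stable
    ranked = sorted(means, key=lambda name: (-means[name], name))
    return ranked[:top_n]
```

With `top_n=5` the function returns at most five pipeline names in ranked order, which mirrors the five ranked recommendations described in the abstract; the first element plays the role of the top-1 pipeline. Because the recommendation step is a lookup over stored results rather than a search over pipelines, it is cheap at query time, although this sketch says nothing about the speed-up reported in the paper.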