{"title":"To tune or not to tune? An approach for recommending important hyperparameters for classification and clustering algorithms","authors":"Radwa El Shawi, Mohamadjavad Bahman, Sherif Sakr","doi":"10.1016/j.future.2024.107524","DOIUrl":null,"url":null,"abstract":"<div><div>Machine learning algorithms are widely employed across various applications and fields. Novel technologies in automated machine learning ease the complexity of algorithm selection and hyperparameter optimization process. Tuning hyperparameters plays a crucial role in determining the performance of machine learning models. While many optimization techniques have achieved remarkable success in hyperparameter tuning, even surpassing human experts’ performance, relying solely on these black-box techniques can deprive practitioners of insights into the relative importance of different hyperparameters. In this paper, we investigate the importance of hyperparameter tuning by establishing a relationship between machine learning model performance and their corresponding hyperparameters. Our focus is primarily on classification and clustering tasks. We conduct experiments on benchmark datasets using six traditional classification and clustering algorithms, along with one deep learning model. Our findings empower users to make informed decisions regarding the necessity of engaging in time-consuming tuning processes. We highlight the most important hyperparameters and provide guidance on selecting an appropriate configuration space. The results of our experiments confirm that the hyperparameters identified as important are indeed crucial for performance. Overall, our study offers a quantitative basis for guiding automated hyperparameter optimization efforts and contributes to the development of better-automated machine learning frameworks.</div></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"163 ","pages":"Article 107524"},"PeriodicalIF":6.2000,"publicationDate":"2024-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Future Generation Computer Systems-The International Journal of Escience","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0167739X24004886","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
Citations: 0
Abstract
Machine learning algorithms are widely employed across various applications and fields. Novel technologies in automated machine learning ease the complexity of algorithm selection and hyperparameter optimization. Hyperparameter tuning plays a crucial role in determining the performance of machine learning models. While many optimization techniques have achieved remarkable success in hyperparameter tuning, even surpassing the performance of human experts, relying solely on these black-box techniques can deprive practitioners of insight into the relative importance of different hyperparameters. In this paper, we investigate the importance of hyperparameter tuning by establishing a relationship between the performance of machine learning models and their hyperparameters. Our focus is primarily on classification and clustering tasks. We conduct experiments on benchmark datasets using six traditional classification and clustering algorithms, along with one deep learning model. Our findings empower users to make informed decisions about whether to engage in time-consuming tuning processes. We highlight the most important hyperparameters and provide guidance on selecting an appropriate configuration space. The results of our experiments confirm that the hyperparameters identified as important are indeed crucial for performance. Overall, our study offers a quantitative basis for guiding automated hyperparameter optimization efforts and contributes to the development of better automated machine learning frameworks.
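The abstract does not spell out how hyperparameter importance is quantified, but a common approach in this line of work is to sample configurations, record the resulting validation scores, and fit a surrogate model whose feature importances rank the hyperparameters (in the spirit of fANOVA-style analysis). Below is a minimal Python sketch of that idea; the SVM classifier, the digits dataset, the C/gamma search space, and the random-forest surrogate are illustrative assumptions, not the paper's actual setup.

```python
# Illustrative sketch (not the paper's exact method): estimate hyperparameter
# importance by fitting a random-forest surrogate to (configuration -> score)
# pairs collected during a random search, then reading off the surrogate's
# feature importances. Model, dataset, and search space are arbitrary choices.
import numpy as np
from scipy.stats import loguniform
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

# Sample configurations and record cross-validated accuracy for each.
param_distributions = {
    "C": loguniform(1e-3, 1e3),
    "gamma": loguniform(1e-5, 1e1),
}
search = RandomizedSearchCV(
    SVC(kernel="rbf"), param_distributions,
    n_iter=50, cv=3, random_state=0,
)
search.fit(X, y)

# Fit a surrogate that maps hyperparameter values to observed scores.
configs = np.array(
    [[p["C"], p["gamma"]] for p in search.cv_results_["params"]]
)
scores = search.cv_results_["mean_test_score"]
surrogate = RandomForestRegressor(n_estimators=200, random_state=0)
surrogate.fit(np.log10(configs), scores)  # log scale, matching the sampling

# The surrogate's feature importances give a rough ranking of which
# hyperparameters drive the variation in model performance.
for name, importance in zip(["C", "gamma"], surrogate.feature_importances_):
    print(f"{name}: importance = {importance:.2f}")
```

This surrogate-based ranking is cheap because it reuses evaluations that a tuning run would produce anyway: a hyperparameter with near-zero importance can likely be left at its default, while a dominant one is where tuning effort pays off.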
Journal Introduction:
Computing infrastructures and systems are constantly evolving, resulting in increasingly complex and collaborative scientific applications. To cope with these advancements, there is a growing need for collaborative tools that can effectively map, control, and execute these applications.
Furthermore, with the explosion of Big Data, there is a requirement for innovative methods and infrastructures to collect, analyze, and derive meaningful insights from the vast amount of data generated. This necessitates the integration of computational and storage capabilities, databases, sensors, and human collaboration.
Future Generation Computer Systems aims to pioneer advancements in distributed systems, collaborative environments, high-performance computing, and Big Data analytics. It strives to stay at the forefront of developments in grids, clouds, and the Internet of Things (IoT) to effectively address the challenges posed by these wide-area, fully distributed sensing and computing systems.