Scaling Up Optuna: P2P Distributed Hyperparameters Optimization

IF 1.5 | CAS Tier 4, Computer Science | JCR Q3, Computer Science, Software Engineering
Loïc Cudennec
Concurrency and Computation: Practice and Experience, 37(4-5), 2025. Published 2025-03-03. DOI: 10.1002/cpe.70008. https://onlinelibrary.wiley.com/doi/10.1002/cpe.70008
Citations: 0

Abstract


In machine learning (ML), hyperparameter optimization (HPO) is the process of choosing a tuple of values that ensures efficient deployment and training of an AI model. In practice, HPO applies not only to ML tuning but also to tuning complex numerical simulations. In this context, a numerical model of a given object is created for use in realistic simulations. This model is defined by a set of values describing properties such as the geometry of the object or other unknown parameters related to physical quantities. While HPO for ML usually requires finding a few parameters, a numerical model can involve tuning more than a hundred parameters. As a consequence, a large number of tuples have to be explored and evaluated before a relevant solution is found, raising new challenges in high-performance computing for efficiently driving the optimization. In this work we rely on the Optuna HPO framework, primarily designed for ML tasks and including state-of-the-art sampling and pruning algorithms. We report on its use to optimize a complex numerical model on a 1024-core machine. We suggest 1.5M tuples and evaluate 5M simulations using different Optuna-distributed layouts to explore several tradeoffs between performance and energy-consumption metrics. To further scale up the optimization process, we introduce OptunaP2P, an extension of Optuna based on the peer-to-peer paradigm. This removes the bottleneck in managing the knowledge shared between optimization processes. With OptunaP2P, we were able to compute up to 3 times faster than the regular Optuna-distributed implementation and to obtain results of comparable quality within this reduced time frame.
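The distributed layouts described in the abstract share trial results through a common store that every optimization worker reads from and writes to. The following is a minimal plain-Python sketch of that idea, with a toy objective and hypothetical names (it is not the paper's code, nor Optuna's actual API): each worker samples hyperparameter tuples, evaluates them, and publishes the trials to a shared store.

```python
import random

def objective(params):
    """Toy stand-in for one numerical simulation run (minimum at x=2, y=-1)."""
    x, y = params["x"], params["y"]
    return (x - 2.0) ** 2 + (y + 1.0) ** 2

def run_worker(shared_trials, n_trials, rng):
    """One optimization worker: sample tuples, evaluate, publish trials."""
    for _ in range(n_trials):
        params = {"x": rng.uniform(-5, 5), "y": rng.uniform(-5, 5)}
        shared_trials.append((params, objective(params)))

shared_trials = []          # central store shared by all workers
rng = random.Random(0)      # fixed seed for reproducibility
for _ in range(4):          # four workers, run sequentially in this sketch
    run_worker(shared_trials, 250, rng)

best_params, best_value = min(shared_trials, key=lambda t: t[1])
print(len(shared_trials), best_value)
```

In the centralized layout, `shared_trials` corresponds to a single database reachable by all workers, which becomes the bottleneck at scale; a peer-to-peer design such as OptunaP2P instead lets each peer hold its own copy of the trial history and exchange it directly with other peers, removing the central store.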

Journal
Concurrency and Computation-Practice & Experience (Engineering & Technology - Computer Science: Theory & Methods)
CiteScore: 5.00
Self-citation rate: 10.00%
Annual articles: 664
Review time: 9.6 months
Aims and scope: Concurrency and Computation: Practice and Experience (CCPE) publishes high-quality, original research papers, and authoritative research review papers, in the overlapping fields of: Parallel and distributed computing; High-performance computing; Computational and data science; Artificial intelligence and machine learning; Big data applications, algorithms, and systems; Network science; Ontologies and semantics; Security and privacy; Cloud/edge/fog computing; Green computing; and Quantum computing.