Choosing a suitable acquisition function for batch Bayesian optimization: comparison of serial and Monte Carlo approaches†

Impact factor: 6.2, JCR Q1 (Chemistry, Multidisciplinary)

Digital Discovery, Pub Date: 2025-05-30, DOI: 10.1039/D5DD00066A

Imon Mia, Mark Lee, Weijie Xu, William Vandenberghe and Julia W. P. Hsu

Digital Discovery, volume 7, pages 1751-1762. Article page: https://pubs.rsc.org/en/content/articlelanding/2025/dd/d5dd00066a


Abstract

Batch Bayesian optimization is widely used to optimize expensive experimental processes when several samples can be tested together to save time or cost. A central decision in designing a Bayesian optimization campaign to guide experiments is the choice of batch acquisition function when little or nothing is known about the landscape of the “black-box” function to be optimized. To inform this decision, we first compare the performance of serial and Monte Carlo batch acquisition functions on two mathematical functions that serve as proxies for typical materials synthesis and processing experiments. The two functions, both in six dimensions, are the Ackley function, which epitomizes a “needle-in-a-haystack” search, and the Hartmann function, which exemplifies a “false optimum” problem. Our study evaluates the serial upper confidence bound with local penalization (UCB/LP) batch acquisition policy against Monte Carlo-based parallel approaches: q-log expected improvement (qlogEI) and q-upper confidence bound (qUCB), where q is the batch size. Tests on Ackley and Hartmann show that UCB/LP and qUCB perform well in noiseless conditions, both outperforming qlogEI. For the Hartmann function with noise, the Monte Carlo acquisition functions converge faster and are less sensitive to initial conditions than UCB/LP. We then confirm these findings on an empirical regression model, built from experimental data, for maximizing the power conversion efficiency of flexible perovskite solar cells. Our results suggest that when empirically optimizing a “black-box” function in six or fewer dimensions with no prior knowledge of the landscape or noise characteristics, qUCB is best suited as the default to maximize confidence in the modeled optimum while minimizing the number of expensive samples needed.
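To make the quantities named in the abstract concrete, the following minimal Python sketch shows the standard Ackley test function and one common Monte Carlo formulation of a batch upper confidence bound. It is an illustration under stated assumptions, not the authors' implementation: the Ackley constants (a = 20, b = 0.2, c = 2π) are the usual textbook defaults, the helper names `ackley`, `ucb`, and `q_ucb_mc` are hypothetical, and the posterior mean and covariance are assumed to come from some surrogate model that is not shown here.

```python
import numpy as np


def ackley(x, a=20.0, b=0.2, c=2.0 * np.pi):
    """Standard Ackley function (minimization form); value 0 at the origin.

    The constants a, b, c are the common defaults, assumed here.
    """
    x = np.asarray(x, dtype=float)
    term1 = -a * np.exp(-b * np.sqrt(np.mean(x ** 2)))
    term2 = -np.exp(np.mean(np.cos(c * x)))
    return term1 + term2 + a + np.e


def ucb(mean, std, beta):
    """Analytic single-point UCB: posterior mean plus sqrt(beta) * std."""
    return mean + np.sqrt(beta) * std


def q_ucb_mc(mean, cov, beta, n_samples=4096, seed=0):
    """Monte Carlo estimate of a batch UCB value for q candidate points.

    mean: shape (q,) posterior mean at the q points (assumed given).
    cov:  shape (q, q) joint posterior covariance (assumed given).
    Each joint sample is reflected above the mean and scaled by
    sqrt(beta * pi / 2); the batch value is the sample average of the
    per-sample maximum over the q points.
    """
    rng = np.random.default_rng(seed)
    mean = np.asarray(mean, dtype=float)
    samples = rng.multivariate_normal(mean, cov, size=n_samples)  # (n_samples, q)
    inflated = mean + np.sqrt(beta * np.pi / 2.0) * np.abs(samples - mean)
    return inflated.max(axis=1).mean()
```

With q = 1 the Monte Carlo estimate converges to the analytic `ucb` value, because the expected absolute value of a zero-mean Gaussian with standard deviation σ is σ·sqrt(2/π). For q > 1 the maximum over the batch rewards sets of points whose posteriors are not strongly correlated, which is what lets all q points be selected jointly rather than one at a time.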



Source journal

Digital discovery

CiteScore

2.80

Self-citation rate

0.00%

Articles published