Imon Mia, Mark Lee, Weijie Xu, William Vandenberghe and Julia W. P. Hsu
Digital Discovery, 7, 1751-1762. Published 2025-05-30. DOI: 10.1039/D5DD00066A. https://pubs.rsc.org/en/content/articlelanding/2025/dd/d5dd00066a
Choosing a suitable acquisition function for batch Bayesian optimization: comparison of serial and Monte Carlo approaches
Batch Bayesian optimization is widely used to optimize expensive experimental processes when several samples can be tested together to save time or cost. A central decision in designing a Bayesian optimization campaign to guide experiments is the choice of batch acquisition function when little or nothing is known about the landscape of the “black-box” function to be optimized. To inform this decision, we first compare the performance of serial and Monte Carlo batch acquisition functions on two mathematical functions that serve as proxies for typical materials synthesis and processing experiments. The two functions, both in six dimensions, are the Ackley function, which epitomizes a “needle-in-a-haystack” search, and the Hartmann function, which exemplifies a “false optimum” problem. Our study evaluates the serial upper confidence bound with local penalization (UCB/LP) batch acquisition policy against two Monte Carlo-based parallel approaches: q-log expected improvement (qlogEI) and q-upper confidence bound (qUCB), where q is the batch size. Tests on Ackley and Hartmann show that UCB/LP and qUCB perform well in noiseless conditions, both outperforming qlogEI. For the Hartmann function with noise, both Monte Carlo acquisition functions converge faster and are less sensitive to initial conditions than UCB/LP. We then confirm these findings on an empirical regression model, built from experimental data, for maximizing the power conversion efficiency of flexible perovskite solar cells. Our results suggest that when empirically optimizing a “black-box” function in six or fewer dimensions with no prior knowledge of the landscape or noise characteristics, qUCB is best suited as the default, maximizing confidence in the modeled optimum while minimizing the number of expensive samples needed.
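To make two of the ingredients named in the abstract concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the standard textbook Ackley definition (a = 20, b = 0.2, c = 2π) and the reparameterized Monte Carlo form of qUCB that is commonly used for batch acquisition, in which the batch value is the expected maximum over the q points of mean + sqrt(βπ/2)·|sample − mean|. The abstract gives no formulas, so the function names `ackley` and `mc_qucb`, the default `beta`, and the sample count are all illustrative assumptions.

```python
import numpy as np


def ackley(x, a=20.0, b=0.2, c=2.0 * np.pi):
    """Standard Ackley function (assumed textbook constants).

    The global minimum is 0 at the origin. For a maximization campaign
    one would optimize the negated value.
    """
    x = np.asarray(x, dtype=float)
    term1 = -a * np.exp(-b * np.sqrt(np.mean(x ** 2)))
    term2 = -np.exp(np.mean(np.cos(c * x)))
    return term1 + term2 + a + np.e


def mc_qucb(mean, cov, beta=4.0, n_samples=4096, seed=0):
    """Monte Carlo estimate of qUCB for one candidate batch.

    `mean` (length q) and `cov` (q x q) describe the surrogate's joint
    Gaussian posterior at the q batch points. Both are hypothetical
    inputs here; in practice they would come from a fitted model.
    """
    mean = np.asarray(mean, dtype=float)
    cov = np.atleast_2d(np.asarray(cov, dtype=float))
    rng = np.random.default_rng(seed)
    # Joint posterior draws at the batch points, shape (n_samples, q).
    samples = rng.multivariate_normal(mean, cov, size=n_samples)
    # Reparameterized UCB value of each point in each draw.
    ucb = mean + np.sqrt(beta * np.pi / 2.0) * np.abs(samples - mean)
    # Best point in the batch per draw, averaged over draws.
    return ucb.max(axis=1).mean()


if __name__ == "__main__":
    print("Ackley at the 6D origin:", ackley(np.zeros(6)))
    print("qUCB for a 2-point batch:",
          mc_qucb([0.5, 0.4], [[0.04, 0.01], [0.01, 0.09]]))
```

For q = 1 the expectation of this estimator equals mean + sqrt(beta) × standard deviation, the familiar serial UCB, which gives a quick sanity check on the scaling factor.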