Toby Lewis-Atwell, Daniel Beechey, Özgür Şimşek, Matthew N. Grayson
{"title":"更正“reformulation reactive Design for Data-Efficient Machine Learning”","authors":"Toby Lewis-Atwell, Daniel Beechey, Özgür Şimşek, Matthew N. Grayson","doi":"10.1021/acscatal.5c00556","DOIUrl":null,"url":null,"abstract":"A bug in the source code that was used for our original experiments in this work affected some of the results for the E2 and S<sub>N</sub>2 data sets in a minor way. These data sets taken from the literature contain reactions that are duplicated by symmetry, and therefore we removed these duplicated reactions from our training sets before running the barrier optimization experiments. However, a single missing line of code meant that the low-level barriers for these two data sets were not filtered along with the reactions and high-level barriers. This scrambled the relationship between the low- and high-level barriers, artificially reducing their correlation. Whereas the squares of the Pearson correlation coefficients between the low- and high-level barriers should have been 0.90 and 0.88 for the E2 and S<sub>N</sub>2 data sets, respectively, they were reduced to 0.56 and 0.65. With the fix implemented, we reran the experiments on the E2 and S<sub>N</sub>2 data sets and found that (as would be expected from the higher correlations) the data efficiency of the algorithms that use the low-level barriers is improved. In the original version of this work, values in the E2 and S<sub>N</sub>2 rows of the “ML Search” and “Bayes Opt.” columns of Tables 1, 2 and 3 should be changed to those given here. A very small change in the discussion is required to reflect these results. In the second paragraph on page 13511, the sentence that reads <i>“Again it is seen that our ML-based search approach using the low-level calculated barriers as an input feature requires the smallest numbers of sampled reactions, with Bayesian optimization using the same features following very closely behind”</i> should be changed to <i>“Again it is seen that our ML-based search approach using the low-level calculated barriers as an input feature requires the smallest numbers of sampled reactions, with practically no difference between the ML search and Bayesian optimization using the same features”</i>. Additionally, in this same paragraph, there are two sentences that are slightly unclear and should be updated. The sentence that begins with <i>“We observe that the performances of these two algorithms do not deteriorate ...”</i> should be changed to begin with <i>“We observe that the performances of the ML and Bayesian algorithms do not deteriorate ...”</i>. Finally, the last sentence of this paragraph that starts with <i>“However, our proposed ML method still shows the best performance ...”</i> should be changed to <i>“However, our proposed ML method of using low-level barriers still shows the best performance ...”</i>. The results of all other algorithms and data sets are not affected, and none of the conclusions of the paper are affected. The proposed method of using low-level barriers remains equally valid, since even with artificially more weakly correlated low-level data, our approach is still more data-efficient compared with not using the low-level data. In the Supporting Information, Section S1.3, on pages 4 and 5, we have corrected the Pearson correlation coefficients and mean absolute error metrics relating to the low- and high-level barriers of the E2 and S<sub>N</sub>2 data sets. 
We have also updated Figure S3 to show the correct relationships between the barriers of these data sets without the artificial decorrelation of the low-level barriers. In Sections S5.3 and S5.4, pages 19 and 22, we have updated Tables S11 and S13 to have the corrected values from the full results of the “ML Search” and “Bayes. Opt.” algorithms that used the low-level E2 and S<sub>N</sub>2 barriers. We further emphasize the advantage of our overall approach by including an additional section in the Supporting Information (new Section S7, pages 36–39) reporting the original results for the E2 and S<sub>N</sub>2 data sets. The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acscatal.5c00556. Additional details of data sets and plots of barrier distributions and correlations with low-level barriers; methods for all alternative search algorithms; results from ML model assessments with and without chemical features; hyperparamter ranges for all models throughout this work; tables of average numbers of samples used by each ML algorithm for each target barrier in each data set; additional results from analysis of machine learning model during search procedure: feature importances, mean absolute error scores, and results from using scrambled features; original results for the E2 and S<sub>N</sub>2 data sets (PDF) Correction for\n“Reformulating Reactivity Design\nfor Data-Efficient Machine Learning” <span> 2 </span><span> views </span> <span> 0 </span><span> shares </span> <span> 0 </span><span> downloads </span> Most electronic Supporting Information files are available without a subscription to ACS Web Editions. Such files may be downloaded by article for research use (if there is a public use license linked to the relevant article, that license may permit other uses). Permission may be obtained from ACS for other uses through requests via the RightsLink permission system: http://pubs.acs.org/page/copyright/permissions.html. This article has not yet been cited by other publications.","PeriodicalId":9,"journal":{"name":"ACS Catalysis ","volume":"134 1","pages":""},"PeriodicalIF":13.1000,"publicationDate":"2025-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Correction for “Reformulating Reactivity Design for Data-Efficient Machine Learning”\",\"authors\":\"Toby Lewis-Atwell, Daniel Beechey, Özgür Şimşek, Matthew N. Grayson\",\"doi\":\"10.1021/acscatal.5c00556\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"A bug in the source code that was used for our original experiments in this work affected some of the results for the E2 and S<sub>N</sub>2 data sets in a minor way. These data sets taken from the literature contain reactions that are duplicated by symmetry, and therefore we removed these duplicated reactions from our training sets before running the barrier optimization experiments. However, a single missing line of code meant that the low-level barriers for these two data sets were not filtered along with the reactions and high-level barriers. This scrambled the relationship between the low- and high-level barriers, artificially reducing their correlation. Whereas the squares of the Pearson correlation coefficients between the low- and high-level barriers should have been 0.90 and 0.88 for the E2 and S<sub>N</sub>2 data sets, respectively, they were reduced to 0.56 and 0.65. 
With the fix implemented, we reran the experiments on the E2 and S<sub>N</sub>2 data sets and found that (as would be expected from the higher correlations) the data efficiency of the algorithms that use the low-level barriers is improved. In the original version of this work, values in the E2 and S<sub>N</sub>2 rows of the “ML Search” and “Bayes Opt.” columns of Tables 1, 2 and 3 should be changed to those given here. A very small change in the discussion is required to reflect these results. In the second paragraph on page 13511, the sentence that reads <i>“Again it is seen that our ML-based search approach using the low-level calculated barriers as an input feature requires the smallest numbers of sampled reactions, with Bayesian optimization using the same features following very closely behind”</i> should be changed to <i>“Again it is seen that our ML-based search approach using the low-level calculated barriers as an input feature requires the smallest numbers of sampled reactions, with practically no difference between the ML search and Bayesian optimization using the same features”</i>. Additionally, in this same paragraph, there are two sentences that are slightly unclear and should be updated. The sentence that begins with <i>“We observe that the performances of these two algorithms do not deteriorate ...”</i> should be changed to begin with <i>“We observe that the performances of the ML and Bayesian algorithms do not deteriorate ...”</i>. Finally, the last sentence of this paragraph that starts with <i>“However, our proposed ML method still shows the best performance ...”</i> should be changed to <i>“However, our proposed ML method of using low-level barriers still shows the best performance ...”</i>. The results of all other algorithms and data sets are not affected, and none of the conclusions of the paper are affected. The proposed method of using low-level barriers remains equally valid, since even with artificially more weakly correlated low-level data, our approach is still more data-efficient compared with not using the low-level data. In the Supporting Information, Section S1.3, on pages 4 and 5, we have corrected the Pearson correlation coefficients and mean absolute error metrics relating to the low- and high-level barriers of the E2 and S<sub>N</sub>2 data sets. We have also updated Figure S3 to show the correct relationships between the barriers of these data sets without the artificial decorrelation of the low-level barriers. In Sections S5.3 and S5.4, pages 19 and 22, we have updated Tables S11 and S13 to have the corrected values from the full results of the “ML Search” and “Bayes. Opt.” algorithms that used the low-level E2 and S<sub>N</sub>2 barriers. We further emphasize the advantage of our overall approach by including an additional section in the Supporting Information (new Section S7, pages 36–39) reporting the original results for the E2 and S<sub>N</sub>2 data sets. The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acscatal.5c00556. 
Additional details of data sets and plots of barrier distributions and correlations with low-level barriers; methods for all alternative search algorithms; results from ML model assessments with and without chemical features; hyperparamter ranges for all models throughout this work; tables of average numbers of samples used by each ML algorithm for each target barrier in each data set; additional results from analysis of machine learning model during search procedure: feature importances, mean absolute error scores, and results from using scrambled features; original results for the E2 and S<sub>N</sub>2 data sets (PDF) Correction for\\n“Reformulating Reactivity Design\\nfor Data-Efficient Machine Learning” <span> 2 </span><span> views </span> <span> 0 </span><span> shares </span> <span> 0 </span><span> downloads </span> Most electronic Supporting Information files are available without a subscription to ACS Web Editions. Such files may be downloaded by article for research use (if there is a public use license linked to the relevant article, that license may permit other uses). Permission may be obtained from ACS for other uses through requests via the RightsLink permission system: http://pubs.acs.org/page/copyright/permissions.html. This article has not yet been cited by other publications.\",\"PeriodicalId\":9,\"journal\":{\"name\":\"ACS Catalysis \",\"volume\":\"134 1\",\"pages\":\"\"},\"PeriodicalIF\":13.1000,\"publicationDate\":\"2025-02-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACS Catalysis \",\"FirstCategoryId\":\"92\",\"ListUrlMain\":\"https://doi.org/10.1021/acscatal.5c00556\",\"RegionNum\":1,\"RegionCategory\":\"化学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"CHEMISTRY, PHYSICAL\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACS Catalysis ","FirstCategoryId":"92","ListUrlMain":"https://doi.org/10.1021/acscatal.5c00556","RegionNum":1,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"CHEMISTRY, PHYSICAL","Score":null,"Total":0}
Correction for “Reformulating Reactivity Design for Data-Efficient Machine Learning”
A bug in the source code that was used for our original experiments in this work affected some of the results for the E2 and SN2 data sets in a minor way. These data sets, taken from the literature, contain reactions that are duplicated by symmetry, and therefore we removed these duplicated reactions from our training sets before running the barrier optimization experiments. However, a single missing line of code meant that the low-level barriers for these two data sets were not filtered along with the reactions and high-level barriers. This scrambled the relationship between the low- and high-level barriers, artificially reducing their correlation. Whereas the squares of the Pearson correlation coefficients between the low- and high-level barriers should have been 0.90 and 0.88 for the E2 and SN2 data sets, respectively, they were reduced to 0.56 and 0.65.

With the fix implemented, we reran the experiments on the E2 and SN2 data sets and found that (as would be expected from the higher correlations) the data efficiency of the algorithms that use the low-level barriers is improved. In the original version of this work, values in the E2 and SN2 rows of the “ML Search” and “Bayes Opt.” columns of Tables 1, 2, and 3 should be changed to those given here.

A very small change in the discussion is required to reflect these results. In the second paragraph on page 13511, the sentence that reads “Again it is seen that our ML-based search approach using the low-level calculated barriers as an input feature requires the smallest numbers of sampled reactions, with Bayesian optimization using the same features following very closely behind” should be changed to “Again it is seen that our ML-based search approach using the low-level calculated barriers as an input feature requires the smallest numbers of sampled reactions, with practically no difference between the ML search and Bayesian optimization using the same features”. Additionally, in this same paragraph, there are two sentences that are slightly unclear and should be updated. The sentence that begins with “We observe that the performances of these two algorithms do not deteriorate ...” should be changed to begin with “We observe that the performances of the ML and Bayesian algorithms do not deteriorate ...”. Finally, the last sentence of this paragraph, which starts with “However, our proposed ML method still shows the best performance ...”, should be changed to “However, our proposed ML method of using low-level barriers still shows the best performance ...”.

The results of all other algorithms and data sets are not affected, and none of the conclusions of the paper are affected. The proposed method of using low-level barriers remains equally valid: even with artificially more weakly correlated low-level data, our approach is still more data-efficient than not using the low-level data.

In the Supporting Information, Section S1.3, on pages 4 and 5, we have corrected the Pearson correlation coefficients and mean absolute error metrics relating to the low- and high-level barriers of the E2 and SN2 data sets. We have also updated Figure S3 to show the correct relationships between the barriers of these data sets without the artificial decorrelation of the low-level barriers. In Sections S5.3 and S5.4, pages 19 and 22, we have updated Tables S11 and S13 to include the corrected values from the full results of the “ML Search” and “Bayes Opt.” algorithms that used the low-level E2 and SN2 barriers.
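To illustrate how the filtering omission described above deflates the correlation between the low- and high-level barriers, the short sketch below pairs a filtered array with an unfiltered one. The data, duplicate mask, and indexing here are entirely hypothetical and are not taken from our source code; the sketch only demonstrates the general effect of misaligning paired values.

```python
# Minimal, hypothetical sketch (synthetic data; not the actual source code):
# filtering the reactions and high-level barriers but forgetting to apply the
# same filter to the parallel low-level array misaligns the (low, high) pairs
# and deflates their correlation.
import numpy as np

rng = np.random.default_rng(0)
n = 200
high_all = rng.normal(30.0, 5.0, n)            # synthetic "high-level" barriers
low_all = high_all + rng.normal(0.0, 1.5, n)   # strongly correlated "low-level" barriers

# Suppose every fourth reaction is a symmetry duplicate that must be removed.
keep = np.ones(n, dtype=bool)
keep[::4] = False

high = high_all[keep]  # high-level barriers after duplicate removal


def r_squared(a, b):
    """Square of the Pearson correlation coefficient between two arrays."""
    return np.corrcoef(a, b)[0, 1] ** 2


# Correct: the low-level array is filtered with the same mask, so pairs stay aligned.
print(round(r_squared(high, low_all[keep]), 2))         # high r^2 (about 0.9)

# Bug: the low-level array is left unfiltered, so pairing it positionally with
# the filtered high-level array matches barriers from different reactions.
print(round(r_squared(high, low_all[: high.size]), 2))  # artificially reduced r^2
```

With fully scrambled pairs of this kind the squared correlation drops to near zero; in our actual data the reduction was to 0.56 and 0.65, but the mechanism is the same.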
We further emphasize the advantage of our overall approach by including an additional section in the Supporting Information (new Section S7, pages 36–39) reporting the original results for the E2 and SN2 data sets.

The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acscatal.5c00556: additional details of the data sets and plots of barrier distributions and correlations with low-level barriers; methods for all alternative search algorithms; results from ML model assessments with and without chemical features; hyperparameter ranges for all models throughout this work; tables of the average numbers of samples used by each ML algorithm for each target barrier in each data set; additional results from analysis of the machine learning model during the search procedure (feature importances, mean absolute error scores, and results from using scrambled features); and the original results for the E2 and SN2 data sets (PDF).