Toby Lewis-Atwell, Daniel Beechey, Özgür Şimşek, Matthew N. Grayson
{"title":"更正“reformulation reactive Design for Data-Efficient Machine Learning”","authors":"Toby Lewis-Atwell, Daniel Beechey, Özgür Şimşek, Matthew N. Grayson","doi":"10.1021/acscatal.5c00556","DOIUrl":null,"url":null,"abstract":"A bug in the source code that was used for our original experiments in this work affected some of the results for the E2 and S<sub>N</sub>2 data sets in a minor way. These data sets taken from the literature contain reactions that are duplicated by symmetry, and therefore we removed these duplicated reactions from our training sets before running the barrier optimization experiments. However, a single missing line of code meant that the low-level barriers for these two data sets were not filtered along with the reactions and high-level barriers. This scrambled the relationship between the low- and high-level barriers, artificially reducing their correlation. Whereas the squares of the Pearson correlation coefficients between the low- and high-level barriers should have been 0.90 and 0.88 for the E2 and S<sub>N</sub>2 data sets, respectively, they were reduced to 0.56 and 0.65. With the fix implemented, we reran the experiments on the E2 and S<sub>N</sub>2 data sets and found that (as would be expected from the higher correlations) the data efficiency of the algorithms that use the low-level barriers is improved. In the original version of this work, values in the E2 and S<sub>N</sub>2 rows of the “ML Search” and “Bayes Opt.” columns of Tables 1, 2 and 3 should be changed to those given here. A very small change in the discussion is required to reflect these results. In the second paragraph on page 13511, the sentence that reads <i>“Again it is seen that our ML-based search approach using the low-level calculated barriers as an input feature requires the smallest numbers of sampled reactions, with Bayesian optimization using the same features following very closely behind”</i> should be changed to <i>“Again it is seen that our ML-based search approach using the low-level calculated barriers as an input feature requires the smallest numbers of sampled reactions, with practically no difference between the ML search and Bayesian optimization using the same features”</i>. Additionally, in this same paragraph, there are two sentences that are slightly unclear and should be updated. The sentence that begins with <i>“We observe that the performances of these two algorithms do not deteriorate ...”</i> should be changed to begin with <i>“We observe that the performances of the ML and Bayesian algorithms do not deteriorate ...”</i>. Finally, the last sentence of this paragraph that starts with <i>“However, our proposed ML method still shows the best performance ...”</i> should be changed to <i>“However, our proposed ML method of using low-level barriers still shows the best performance ...”</i>. The results of all other algorithms and data sets are not affected, and none of the conclusions of the paper are affected. The proposed method of using low-level barriers remains equally valid, since even with artificially more weakly correlated low-level data, our approach is still more data-efficient compared with not using the low-level data. In the Supporting Information, Section S1.3, on pages 4 and 5, we have corrected the Pearson correlation coefficients and mean absolute error metrics relating to the low- and high-level barriers of the E2 and S<sub>N</sub>2 data sets. 
We have also updated Figure S3 to show the correct relationships between the barriers of these data sets without the artificial decorrelation of the low-level barriers. In Sections S5.3 and S5.4, pages 19 and 22, we have updated Tables S11 and S13 to have the corrected values from the full results of the “ML Search” and “Bayes. Opt.” algorithms that used the low-level E2 and S<sub>N</sub>2 barriers. We further emphasize the advantage of our overall approach by including an additional section in the Supporting Information (new Section S7, pages 36–39) reporting the original results for the E2 and S<sub>N</sub>2 data sets. The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acscatal.5c00556. Additional details of data sets and plots of barrier distributions and correlations with low-level barriers; methods for all alternative search algorithms; results from ML model assessments with and without chemical features; hyperparamter ranges for all models throughout this work; tables of average numbers of samples used by each ML algorithm for each target barrier in each data set; additional results from analysis of machine learning model during search procedure: feature importances, mean absolute error scores, and results from using scrambled features; original results for the E2 and S<sub>N</sub>2 data sets (PDF) Correction for\n“Reformulating Reactivity Design\nfor Data-Efficient Machine Learning” <span> 2 </span><span> views </span> <span> 0 </span><span> shares </span> <span> 0 </span><span> downloads </span> Most electronic Supporting Information files are available without a subscription to ACS Web Editions. Such files may be downloaded by article for research use (if there is a public use license linked to the relevant article, that license may permit other uses). Permission may be obtained from ACS for other uses through requests via the RightsLink permission system: http://pubs.acs.org/page/copyright/permissions.html. This article has not yet been cited by other publications.","PeriodicalId":9,"journal":{"name":"ACS Catalysis ","volume":"134 1","pages":""},"PeriodicalIF":13.1000,"publicationDate":"2025-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Correction for “Reformulating Reactivity Design for Data-Efficient Machine Learning”\",\"authors\":\"Toby Lewis-Atwell, Daniel Beechey, Özgür Şimşek, Matthew N. Grayson\",\"doi\":\"10.1021/acscatal.5c00556\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"A bug in the source code that was used for our original experiments in this work affected some of the results for the E2 and S<sub>N</sub>2 data sets in a minor way. These data sets taken from the literature contain reactions that are duplicated by symmetry, and therefore we removed these duplicated reactions from our training sets before running the barrier optimization experiments. However, a single missing line of code meant that the low-level barriers for these two data sets were not filtered along with the reactions and high-level barriers. This scrambled the relationship between the low- and high-level barriers, artificially reducing their correlation. Whereas the squares of the Pearson correlation coefficients between the low- and high-level barriers should have been 0.90 and 0.88 for the E2 and S<sub>N</sub>2 data sets, respectively, they were reduced to 0.56 and 0.65. 
With the fix implemented, we reran the experiments on the E2 and S<sub>N</sub>2 data sets and found that (as would be expected from the higher correlations) the data efficiency of the algorithms that use the low-level barriers is improved. In the original version of this work, values in the E2 and S<sub>N</sub>2 rows of the “ML Search” and “Bayes Opt.” columns of Tables 1, 2 and 3 should be changed to those given here. A very small change in the discussion is required to reflect these results. In the second paragraph on page 13511, the sentence that reads <i>“Again it is seen that our ML-based search approach using the low-level calculated barriers as an input feature requires the smallest numbers of sampled reactions, with Bayesian optimization using the same features following very closely behind”</i> should be changed to <i>“Again it is seen that our ML-based search approach using the low-level calculated barriers as an input feature requires the smallest numbers of sampled reactions, with practically no difference between the ML search and Bayesian optimization using the same features”</i>. Additionally, in this same paragraph, there are two sentences that are slightly unclear and should be updated. The sentence that begins with <i>“We observe that the performances of these two algorithms do not deteriorate ...”</i> should be changed to begin with <i>“We observe that the performances of the ML and Bayesian algorithms do not deteriorate ...”</i>. Finally, the last sentence of this paragraph that starts with <i>“However, our proposed ML method still shows the best performance ...”</i> should be changed to <i>“However, our proposed ML method of using low-level barriers still shows the best performance ...”</i>. The results of all other algorithms and data sets are not affected, and none of the conclusions of the paper are affected. The proposed method of using low-level barriers remains equally valid, since even with artificially more weakly correlated low-level data, our approach is still more data-efficient compared with not using the low-level data. In the Supporting Information, Section S1.3, on pages 4 and 5, we have corrected the Pearson correlation coefficients and mean absolute error metrics relating to the low- and high-level barriers of the E2 and S<sub>N</sub>2 data sets. We have also updated Figure S3 to show the correct relationships between the barriers of these data sets without the artificial decorrelation of the low-level barriers. In Sections S5.3 and S5.4, pages 19 and 22, we have updated Tables S11 and S13 to have the corrected values from the full results of the “ML Search” and “Bayes. Opt.” algorithms that used the low-level E2 and S<sub>N</sub>2 barriers. We further emphasize the advantage of our overall approach by including an additional section in the Supporting Information (new Section S7, pages 36–39) reporting the original results for the E2 and S<sub>N</sub>2 data sets. The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acscatal.5c00556. 
Additional details of data sets and plots of barrier distributions and correlations with low-level barriers; methods for all alternative search algorithms; results from ML model assessments with and without chemical features; hyperparamter ranges for all models throughout this work; tables of average numbers of samples used by each ML algorithm for each target barrier in each data set; additional results from analysis of machine learning model during search procedure: feature importances, mean absolute error scores, and results from using scrambled features; original results for the E2 and S<sub>N</sub>2 data sets (PDF) Correction for\\n“Reformulating Reactivity Design\\nfor Data-Efficient Machine Learning” <span> 2 </span><span> views </span> <span> 0 </span><span> shares </span> <span> 0 </span><span> downloads </span> Most electronic Supporting Information files are available without a subscription to ACS Web Editions. Such files may be downloaded by article for research use (if there is a public use license linked to the relevant article, that license may permit other uses). Permission may be obtained from ACS for other uses through requests via the RightsLink permission system: http://pubs.acs.org/page/copyright/permissions.html. This article has not yet been cited by other publications.\",\"PeriodicalId\":9,\"journal\":{\"name\":\"ACS Catalysis \",\"volume\":\"134 1\",\"pages\":\"\"},\"PeriodicalIF\":13.1000,\"publicationDate\":\"2025-02-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACS Catalysis \",\"FirstCategoryId\":\"92\",\"ListUrlMain\":\"https://doi.org/10.1021/acscatal.5c00556\",\"RegionNum\":1,\"RegionCategory\":\"化学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"CHEMISTRY, PHYSICAL\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACS Catalysis ","FirstCategoryId":"92","ListUrlMain":"https://doi.org/10.1021/acscatal.5c00556","RegionNum":1,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"CHEMISTRY, PHYSICAL","Score":null,"Total":0}
Correction for “Reformulating Reactivity Design for Data-Efficient Machine Learning”
A bug in the source code that was used for our original experiments in this work affected some of the results for the E2 and SN2 data sets in a minor way. These data sets, taken from the literature, contain reactions that are duplicated by symmetry, and therefore we removed these duplicated reactions from our training sets before running the barrier optimization experiments. However, a single missing line of code meant that the low-level barriers for these two data sets were not filtered along with the reactions and high-level barriers. This scrambled the relationship between the low- and high-level barriers, artificially reducing their correlation. Whereas the squares of the Pearson correlation coefficients between the low- and high-level barriers should have been 0.90 and 0.88 for the E2 and SN2 data sets, respectively, they were reduced to 0.56 and 0.65.

With the fix implemented, we reran the experiments on the E2 and SN2 data sets and found that (as would be expected from the higher correlations) the data efficiency of the algorithms that use the low-level barriers is improved. In the original version of this work, values in the E2 and SN2 rows of the “ML Search” and “Bayes Opt.” columns of Tables 1, 2, and 3 should be changed to those given here.

A very small change in the discussion is required to reflect these results. In the second paragraph on page 13511, the sentence that reads “Again it is seen that our ML-based search approach using the low-level calculated barriers as an input feature requires the smallest numbers of sampled reactions, with Bayesian optimization using the same features following very closely behind” should be changed to “Again it is seen that our ML-based search approach using the low-level calculated barriers as an input feature requires the smallest numbers of sampled reactions, with practically no difference between the ML search and Bayesian optimization using the same features”. Additionally, in this same paragraph, there are two sentences that are slightly unclear and should be updated. The sentence that begins with “We observe that the performances of these two algorithms do not deteriorate ...” should be changed to begin with “We observe that the performances of the ML and Bayesian algorithms do not deteriorate ...”. Finally, the last sentence of this paragraph, which starts with “However, our proposed ML method still shows the best performance ...”, should be changed to “However, our proposed ML method of using low-level barriers still shows the best performance ...”.

The results of all other algorithms and data sets are not affected, and none of the conclusions of the paper are affected. The proposed method of using low-level barriers remains equally valid: even with artificially more weakly correlated low-level data, our approach is still more data-efficient than not using the low-level data.

In the Supporting Information, Section S1.3, on pages 4 and 5, we have corrected the Pearson correlation coefficients and mean absolute error metrics relating to the low- and high-level barriers of the E2 and SN2 data sets. We have also updated Figure S3 to show the correct relationships between the barriers of these data sets without the artificial decorrelation of the low-level barriers. In Sections S5.3 and S5.4, pages 19 and 22, we have updated Tables S11 and S13 to include the corrected values from the full results of the “ML Search” and “Bayes Opt.” algorithms that used the low-level E2 and SN2 barriers.
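To illustrate how the filtering omission described above deflates the correlation between the low- and high-level barriers, the short sketch below pairs a filtered array with an unfiltered one. The data, duplicate mask, and indexing here are entirely hypothetical and are not taken from our source code; the sketch only demonstrates the general effect of misaligning paired values.

```python
# Minimal, hypothetical sketch (synthetic data; not the actual source code):
# filtering the reactions and high-level barriers but forgetting to apply the
# same filter to the parallel low-level array misaligns the (low, high) pairs
# and deflates their correlation.
import numpy as np

rng = np.random.default_rng(0)
n = 200
high_all = rng.normal(30.0, 5.0, n)            # synthetic "high-level" barriers
low_all = high_all + rng.normal(0.0, 1.5, n)   # strongly correlated "low-level" barriers

# Suppose every fourth reaction is a symmetry duplicate that must be removed.
keep = np.ones(n, dtype=bool)
keep[::4] = False

high = high_all[keep]  # high-level barriers after duplicate removal


def r_squared(a, b):
    """Square of the Pearson correlation coefficient between two arrays."""
    return np.corrcoef(a, b)[0, 1] ** 2


# Correct: the low-level array is filtered with the same mask, so pairs stay aligned.
print(round(r_squared(high, low_all[keep]), 2))         # high r^2 (about 0.9)

# Bug: the low-level array is left unfiltered, so pairing it positionally with
# the filtered high-level array matches barriers from different reactions.
print(round(r_squared(high, low_all[: high.size]), 2))  # artificially reduced r^2
```

With fully scrambled pairs of this kind the squared correlation drops to near zero; in our actual data the reduction was to 0.56 and 0.65, but the mechanism is the same.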
We further emphasize the advantage of our overall approach by including an additional section in the Supporting Information (new Section S7, pages 36–39) reporting the original results for the E2 and SN2 data sets.

The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acscatal.5c00556: additional details of the data sets and plots of barrier distributions and correlations with low-level barriers; methods for all alternative search algorithms; results from ML model assessments with and without chemical features; hyperparameter ranges for all models throughout this work; tables of the average numbers of samples used by each ML algorithm for each target barrier in each data set; additional results from analysis of the machine learning model during the search procedure (feature importances, mean absolute error scores, and results from using scrambled features); and the original results for the E2 and SN2 data sets (PDF).