The Importance of Atomic Charges for Predicting Site-Selective Ir-, Ru-, and Rh-Catalyzed C–H Borylations
Authors: Shannon M. Stephens, Kyle M. Lambert
Journal: Journal of Organic Chemistry (JCR Q1, Chemistry, Organic; impact factor 3.3)
Published: 2025-04-23
DOI: 10.1021/acs.joc.5c00343 (https://doi.org/10.1021/acs.joc.5c00343)
Abstract
A supervised machine learning model has been developed that allows for the prediction of site selectivity in late-stage C–H borylations. Model development was accomplished using literature data for the site-selective (≥95%) C–H borylation of 189 unique arene, heteroarene, and aliphatic substrates that feature a total of 971 possible sp² or sp³ C–H borylation sites. The reported experimental data was supplemented with additional chemoinformatic descriptors, computed atomic charges at the C–H borylation sites, and data from parameterization of catalytically active tris-boryl complexes resulting from the combination of seven different Ir-, Ru-, and Rh-based precatalysts with eight different ligands. Of the over 1600 parameters investigated, the computed atomic charges (e.g., Hirshfeld, ChelpG, and Mulliken charges) on the hydrogen and carbon atoms at the site of borylation were identified as the most important features that allow for the successful prediction of whether a particular C–H bond will undergo a site-selective borylation. The overall accuracy of the developed model was 88.9% ± 2.5% with precision, recall, and F1 scores of 92–95% for the nonborylating sites and 65–75% for the sites of borylation. The model was demonstrated to be generalizable to molecules outside of the training/test sets with an additional validation set of 12 electronically and structurally diverse systems.
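The per-class scores quoted in the abstract follow the standard definitions of precision, recall, and F1. The sketch below is not the authors' code. It computes these metrics in plain Python on a small, hypothetical set of labels (1 = borylated site, 0 = nonborylated site) to illustrate how a minority class can score lower than the majority class while overall accuracy stays high. The function names and the toy labels are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class ClassMetrics:
    precision: float
    recall: float
    f1: float


def accuracy(y_true, y_pred):
    """Fraction of sites whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_metrics(y_true, y_pred, label):
    """Precision, recall, and F1 for one class, treating `label` as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return ClassMetrics(precision, recall, f1)


# Hypothetical toy labels (NOT data from the paper):
# 10 candidate C-H sites, of which 2 are truly borylated.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]

overall = accuracy(y_true, y_pred)
borylated = per_class_metrics(y_true, y_pred, label=1)
nonborylated = per_class_metrics(y_true, y_pred, label=0)

print(f"accuracy: {overall:.3f}")
print(f"borylated sites:    {borylated}")
print(f"nonborylated sites: {nonborylated}")
```

In this toy case one missed site and one false alarm give an accuracy of 0.8, with precision, recall, and F1 all equal to 0.875 for the nonborylated class but only 0.5 for the borylated class, because the same number of errors is spread over far fewer positive examples.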
About the journal:
Journal of Organic Chemistry welcomes original contributions of fundamental research in all branches of the theory and practice of organic chemistry. In selecting manuscripts for publication, the editors place emphasis on the quality and novelty of the work, as well as the breadth of interest to the organic chemistry community.