Integrating a multitask graph neural network with DFT calculations for site-selectivity prediction of arenes and mechanistic knowledge generation

Xinran Chen, Zi-Jing Zhang, Xin Hong, Lutz Ackermann

Nature Synthesis, vol. 4, no. 7, pp. 877–887. Journal article, published 2025-04-07.
DOI: 10.1038/s44160-025-00770-2
https://www.nature.com/articles/s44160-025-00770-2
Abstract
The accurate prediction of reaction performance based on empirical knowledge paves the way to efficient molecule design. Compared with the human-summarized reaction knowledge of a focal dataset, a machine-learned quantitative structure–performance relationship derived from larger-scale datasets is more effective at covering the entire chemical space. Here we report a multitask learning workflow combined with a mechanism-informed graph neural network to predict site selectivity for ruthenium-catalysed C–H functionalization of arenes. The multitask architecture enables the model to acquire related knowledge from tasks learned simultaneously. The embedded reaction graph bridges the gap between previous mechanistic studies and reaction representation. With this mechanistic embedding, the multitask model shows excellent interpolative and extrapolative ability on the reported dataset of 256 reactions, achieving an average site-selectivity prediction accuracy of 0.934 with a standard deviation of 0.007. The prediction scope ranges from simple to fused arenes and was further extended to heterocyclic indole derivatives in additional out-of-sample tests containing 14 unseen instances. Furthermore, interpretation of the model informs the development of a para-selective mechanistic model, which is verified by density functional theory calculations.

A multitask graph neural network is developed with mechanism-informed reaction graphs for site-selectivity prediction of ruthenium-catalysed C–H functionalization of arenes. The extrapolative prediction ability of the model is verified by experimental tests. Interpretation of the model deepens our understanding of the origins of the site selectivity.
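The abstract describes the multitask design only at a high level. As a rough illustration of what "multitask" usually means for a graph network, the sketch below shows hard parameter sharing: one message-passing encoder produces atom embeddings, and two output heads read from them. One head scores candidate C–H sites with a softmax. The other predicts a graph-level scalar. The losses are combined as a weighted sum. This is not the authors' architecture. The mean-aggregation update, the auxiliary task, the toy node features, the loss weights and the names `MultitaskGNN` and `multitask_loss` are all hypothetical. The sketch also omits the mechanism-informed reaction-graph features and any training loop.

```python
import math
import random


def matvec(w, x):
    """Multiply matrix w (list of rows) by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def init_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class MultitaskGNN:
    """Hypothetical toy: shared message-passing encoder + two task heads."""

    def __init__(self, in_dim, hidden_dim, n_layers=2, seed=0):
        rng = random.Random(seed)
        self.layers = []
        dim = in_dim
        for _ in range(n_layers):
            # Separate weights for a node's own state and its neighbour message.
            self.layers.append((init_matrix(hidden_dim, dim, rng),
                                init_matrix(hidden_dim, dim, rng)))
            dim = hidden_dim
        self.site_head = [rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]
        self.aux_head = [rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]

    def encode(self, feats, adj):
        """Shared encoder: every task head reads these atom embeddings."""
        h = feats
        for w_self, w_nei in self.layers:
            new_h = []
            for v, hv in enumerate(h):
                nbrs = adj[v]
                # Mean of neighbour states (zero vector for an isolated node).
                msg = [sum(h[u][k] for u in nbrs) / max(len(nbrs), 1)
                       for k in range(len(hv))]
                pre = [a + b for a, b in zip(matvec(w_self, hv), matvec(w_nei, msg))]
                new_h.append([math.tanh(x) for x in pre])
            h = new_h
        return h

    def forward(self, feats, adj, sites):
        h = self.encode(feats, adj)
        # Task 1: one logit per candidate C-H site, softmax over candidates only.
        logits = [sum(w * x for w, x in zip(self.site_head, h[s])) for s in sites]
        m = max(logits)
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        site_probs = [e / total for e in exps]
        # Task 2 (hypothetical auxiliary target): scalar from mean-pooled atoms.
        pooled = [sum(col) / len(h) for col in zip(*h)]
        aux_pred = sum(w * x for w, x in zip(self.aux_head, pooled))
        return site_probs, aux_pred


def multitask_loss(site_probs, true_index, aux_pred, aux_target,
                   w_site=1.0, w_aux=0.5):
    """Weighted sum of site cross-entropy and auxiliary squared error."""
    ce = -math.log(max(site_probs[true_index], 1e-12))
    mse = (aux_pred - aux_target) ** 2
    return w_site * ce + w_aux * mse


# Toy arene: ring carbons 0-5 plus a hypothetical directing-group node 6 on C0.
adj = [[1, 5, 6], [0, 2], [1, 3], [2, 4], [3, 5], [4, 0], [0]]
# Hypothetical node features: [is_ring_carbon, has_H, is_directing_group].
feats = [[1, 0, 0]] + [[1, 1, 0]] * 5 + [[0, 0, 1]]
candidates = [1, 2, 3]  # ortho, meta and para C-H positions relative to C0
model = MultitaskGNN(in_dim=3, hidden_dim=4)
probs, aux = model.forward(feats, adj, candidates)
predicted_site = candidates[probs.index(max(probs))]
loss = multitask_loss(probs, true_index=2, aux_pred=aux, aux_target=0.8)
```

During training, gradients from both loss terms would flow back into the shared `encode` weights. That is the mechanism by which one task can supply "related knowledge" to the other. The weights here are random, so `predicted_site` carries no chemical meaning.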