Transfer learning from custom-tailored virtual molecular databases to real-world organic photosensitizers for catalytic activity prediction
Naoki Noto, Taiki Nagano, Mikito Fujinami, Ryosuke Kojima, Susumu Saito
Communications Chemistry, vol. 8, no. 1, article 288 (published 2025-10-01)
DOI: 10.1038/s42004-025-01678-w
Citation count: 0
Abstract
The scarcity of experimental training data restricts the integration of machine learning into catalysis research. Here, we report that graph convolutional network (GCN) models pretrained to predict a molecular topological index, a descriptor not used in typical organic synthesis, are effective for estimating catalytic activity, a task that usually requires a high level of human expertise. For pretraining, we used custom-tailored virtual molecular databases that can be readily constructed using either a systematic generation method or a molecular generator developed in our group. Although 94%–99% of these virtual molecules are not registered in the PubChem database, the resulting pretrained GCN models improve the prediction of catalytic activity for real-world organic photosensitizers. The results demonstrate the efficiency of this transfer-learning strategy, which leverages readily obtainable information from self-generated virtual molecules.
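The abstract does not name the specific topological index used as the pretraining target. As a hedged illustration of why such labels are "readily obtainable", the sketch below computes the Wiener index (the sum of shortest-path distances, in bonds, over all pairs of heavy atoms). This classic topological index is chosen here purely as an assumption. Because its value depends only on the molecular graph, it can be computed for any generated molecule, whether or not that molecule is registered in PubChem. The helpers `wiener_index` and `shortest_path_lengths` and the toy graphs are hypothetical. The sketch covers only label generation, not GCN pretraining or fine-tuning.

```python
from collections import deque
from itertools import combinations


def shortest_path_lengths(adjacency, source):
    """Breadth-first search distances (in bonds) from `source` to every reachable atom."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        atom = queue.popleft()
        for neighbor in adjacency[atom]:
            if neighbor not in dist:
                dist[neighbor] = dist[atom] + 1
                queue.append(neighbor)
    return dist


def wiener_index(bonds, n_atoms):
    """Sum of topological distances over all unordered pairs of heavy atoms.

    Assumes `bonds` lists (i, j) index pairs of a hydrogen-suppressed,
    connected molecular graph; bond orders are ignored.
    """
    adjacency = {i: [] for i in range(n_atoms)}
    for i, j in bonds:
        adjacency[i].append(j)
        adjacency[j].append(i)
    all_dist = {i: shortest_path_lengths(adjacency, i) for i in range(n_atoms)}
    return sum(all_dist[i][j] for i, j in combinations(range(n_atoms), 2))


# Hypothetical "virtual molecules" given as heavy-atom graphs; no experiment is needed.
n_butane = [(0, 1), (1, 2), (2, 3)]                          # linear C4 chain
isobutane = [(0, 1), (0, 2), (0, 3)]                         # branched C4
benzene = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]   # six-membered ring

pretraining_labels = {
    "n-butane": wiener_index(n_butane, 4),
    "isobutane": wiener_index(isobutane, 4),
    "benzene": wiener_index(benzene, 6),
}
print(pretraining_labels)  # {'n-butane': 10, 'isobutane': 9, 'benzene': 27}
```

In a transfer-learning setup of the kind described above, labels like these would serve as regression targets for pretraining a GCN on a large virtual database. The pretrained model would then be fine-tuned on the much smaller set of experimentally measured catalytic activities.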
Journal description:
Communications Chemistry is an open access journal from Nature Research publishing high-quality research, reviews and commentary in all areas of the chemical sciences. Research papers published by the journal represent significant advances bringing new chemical insight to a specialized area of research. We also aim to provide a community forum for issues of importance to all chemists, regardless of sub-discipline.