Authors: Naoki Noto, Taiki Nagano, Mikito Fujinami, Ryosuke Kojima, Susumu Saito
Journal: Communications Chemistry, volume 8, issue 1, article 288
Publication date: 2025-10-01
Publication type: Journal Article
DOI: https://doi.org/10.1038/s42004-025-01678-w
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12488964/pdf/
Category: Chemistry (JCR Q1, CHEMISTRY, MULTIDISCIPLINARY; impact factor 6.2)
Transfer learning from custom-tailored virtual molecular databases to real-world organic photosensitizers for catalytic activity prediction.
The scarcity of experimental training data restricts the integration of machine learning into catalysis research. Here, we report on the effectiveness of graph convolutional network (GCN) models pretrained on a molecular topological index, which is not used in typical organic synthesis, for estimating the catalytic activity, a task that usually requires high levels of human expertise. For pretraining, we used custom-tailored virtual molecular databases that can be readily constructed using either a systematic generation method or a molecular generator developed in our group. Although 94%-99% of the employed virtual molecules are unregistered in the PubChem database, the resulting pretrained GCN models improve the prediction of catalytic activity for real-world organic photosensitizers. The results demonstrate the efficiency of the present transfer-learning strategy, which leverages readily obtainable information from self-generated virtual molecules.
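As a rough illustration of why a topological index is "readily obtainable" as a pretraining label, the sketch below computes one directly from a molecular graph, with no experiment or quantum-chemical calculation. The abstract does not name the index the authors used; the Wiener index (the sum of shortest-path distances over all atom pairs) is chosen here only as a well-known example. The adjacency-list encoding and the two toy molecules are likewise assumptions made for this sketch, not the authors' pipeline.

```python
from collections import deque
from typing import Dict, List

# Assumed encoding: adjacency list over heavy atoms (hydrogens omitted).
Graph = Dict[int, List[int]]


def bfs_distances(graph: Graph, source: int) -> Dict[int, int]:
    """Shortest path length (in bonds) from `source` to every reachable atom."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        atom = queue.popleft()
        for neighbor in graph[atom]:
            if neighbor not in dist:
                dist[neighbor] = dist[atom] + 1
                queue.append(neighbor)
    return dist


def wiener_index(graph: Graph) -> int:
    """Sum of shortest-path distances over all unordered atom pairs."""
    total = 0
    for atom in graph:
        total += sum(bfs_distances(graph, atom).values())
    return total // 2  # every pair was counted once from each end


# Hypothetical toy inputs: carbon skeletons of n-butane and isobutane.
n_butane: Graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane: Graph = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}

# Labels of this kind could be attached to each virtual molecule for pretraining.
pretraining_labels = {
    "n-butane": wiener_index(n_butane),
    "isobutane": wiener_index(isobutane),
}
print(pretraining_labels)
```

Because such a label depends only on connectivity, it can be generated for any virtual molecule, including ones that are not registered in PubChem. A model pretrained to regress it could then be fine-tuned on the much smaller set of experimental catalytic-activity data.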
About the journal:
Communications Chemistry is an open access journal from Nature Research publishing high-quality research, reviews and commentary in all areas of the chemical sciences. Research papers published by the journal represent significant advances bringing new chemical insight to a specialized area of research. We also aim to provide a community forum for issues of importance to all chemists, regardless of sub-discipline.