Thomas Nevolianis, Jan G. Rittig, Alexander Mitsos, Kai Leonhard
{"title":"预测甲苯/水分配系数的多保真度图神经网络","authors":"Thomas Nevolianis, Jan G. Rittig, Alexander Mitsos, Kai Leonhard","doi":"10.1186/s13321-025-01057-6","DOIUrl":null,"url":null,"abstract":"<div><p>Accurate prediction of toluene/water partition coefficients of neutral species is crucial in drug discovery and separation processes; however, data-driven modeling of these coefficients remains challenging due to limited available experimental data. To address the limitation of available data, we apply multi-fidelity learning approaches leveraging a quantum chemical dataset (low fidelity) of approximately 9000 entries generated by COSMO-RS and an experimental dataset (high fidelity) of about 250 entries collected from the literature. We explore the <i>transfer learning</i>, <i>feature-augmented learning</i>, and <i>multi-target learning</i> approaches in combination with graph neural networks, validating them on two external datasets: one with molecules similar to training data (EXT-Zamora) and one with more challenging molecules (EXT-SAMPL9). Our results show that <i>multi-target learning</i> significantly improves predictive accuracy, achieving a root-mean-square error of 0.44 <span>\\(\\log {P}\\)</span> units for the EXT-Zamora, compared to a root-mean-square error of 0.63 <span>\\(\\log {P}\\)</span> units for single-task models. For the EXT-SAMPL9 dataset, <i>multi-target learning</i> achieves a root-mean-square error of 1.02 <span>\\(\\log {P}\\)</span> units, indicating reasonable performance even for more complex molecular structures. These findings highlight the potential of multi-fidelity learning approaches that leverage quantum chemical data to improve toluene/water partition coefficient predictions and address challenges posed by limited experimental data. We expect the applicability of the methods used beyond just toluene/water partition coefficients.</p></div>","PeriodicalId":617,"journal":{"name":"Journal of Cheminformatics","volume":"17 1","pages":""},"PeriodicalIF":5.7000,"publicationDate":"2025-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-01057-6","citationCount":"0","resultStr":"{\"title\":\"Multi-fidelity graph neural networks for predicting toluene/water partition coefficients\",\"authors\":\"Thomas Nevolianis, Jan G. Rittig, Alexander Mitsos, Kai Leonhard\",\"doi\":\"10.1186/s13321-025-01057-6\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Accurate prediction of toluene/water partition coefficients of neutral species is crucial in drug discovery and separation processes; however, data-driven modeling of these coefficients remains challenging due to limited available experimental data. To address the limitation of available data, we apply multi-fidelity learning approaches leveraging a quantum chemical dataset (low fidelity) of approximately 9000 entries generated by COSMO-RS and an experimental dataset (high fidelity) of about 250 entries collected from the literature. We explore the <i>transfer learning</i>, <i>feature-augmented learning</i>, and <i>multi-target learning</i> approaches in combination with graph neural networks, validating them on two external datasets: one with molecules similar to training data (EXT-Zamora) and one with more challenging molecules (EXT-SAMPL9). Our results show that <i>multi-target learning</i> significantly improves predictive accuracy, achieving a root-mean-square error of 0.44 <span>\\\\(\\\\log {P}\\\\)</span> units for the EXT-Zamora, compared to a root-mean-square error of 0.63 <span>\\\\(\\\\log {P}\\\\)</span> units for single-task models. For the EXT-SAMPL9 dataset, <i>multi-target learning</i> achieves a root-mean-square error of 1.02 <span>\\\\(\\\\log {P}\\\\)</span> units, indicating reasonable performance even for more complex molecular structures. These findings highlight the potential of multi-fidelity learning approaches that leverage quantum chemical data to improve toluene/water partition coefficient predictions and address challenges posed by limited experimental data. We expect the applicability of the methods used beyond just toluene/water partition coefficients.</p></div>\",\"PeriodicalId\":617,\"journal\":{\"name\":\"Journal of Cheminformatics\",\"volume\":\"17 1\",\"pages\":\"\"},\"PeriodicalIF\":5.7000,\"publicationDate\":\"2025-08-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-01057-6\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Cheminformatics\",\"FirstCategoryId\":\"92\",\"ListUrlMain\":\"https://link.springer.com/article/10.1186/s13321-025-01057-6\",\"RegionNum\":2,\"RegionCategory\":\"化学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"CHEMISTRY, MULTIDISCIPLINARY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Cheminformatics","FirstCategoryId":"92","ListUrlMain":"https://link.springer.com/article/10.1186/s13321-025-01057-6","RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"CHEMISTRY, MULTIDISCIPLINARY","Score":null,"Total":0}
Multi-fidelity graph neural networks for predicting toluene/water partition coefficients
Accurate prediction of toluene/water partition coefficients of neutral species is crucial in drug discovery and separation processes; however, data-driven modeling of these coefficients remains challenging due to limited available experimental data. To address the limitation of available data, we apply multi-fidelity learning approaches leveraging a quantum chemical dataset (low fidelity) of approximately 9000 entries generated by COSMO-RS and an experimental dataset (high fidelity) of about 250 entries collected from the literature. We explore the transfer learning, feature-augmented learning, and multi-target learning approaches in combination with graph neural networks, validating them on two external datasets: one with molecules similar to training data (EXT-Zamora) and one with more challenging molecules (EXT-SAMPL9). Our results show that multi-target learning significantly improves predictive accuracy, achieving a root-mean-square error of 0.44 \(\log {P}\) units for the EXT-Zamora, compared to a root-mean-square error of 0.63 \(\log {P}\) units for single-task models. For the EXT-SAMPL9 dataset, multi-target learning achieves a root-mean-square error of 1.02 \(\log {P}\) units, indicating reasonable performance even for more complex molecular structures. These findings highlight the potential of multi-fidelity learning approaches that leverage quantum chemical data to improve toluene/water partition coefficient predictions and address challenges posed by limited experimental data. We expect the applicability of the methods used beyond just toluene/water partition coefficients.
期刊介绍:
Journal of Cheminformatics is an open access journal publishing original peer-reviewed research in all aspects of cheminformatics and molecular modelling.
Coverage includes, but is not limited to:
chemical information systems, software and databases, and molecular modelling,
chemical structure representations and their use in structure, substructure, and similarity searching of chemical substance and chemical reaction databases,
computer and molecular graphics, computer-aided molecular design, expert systems, QSAR, and data mining techniques.