Dealing with a data-limited regime: Combining transfer learning and transformer attention mechanism to increase aqueous solubility prediction performance
{"title":"Dealing with a data-limited regime: Combining transfer learning and transformer attention mechanism to increase aqueous solubility prediction performance","authors":"Magdalena Wiercioch , Johannes Kirchmair","doi":"10.1016/j.ailsci.2021.100021","DOIUrl":null,"url":null,"abstract":"<div><p>Aqueous solubility is a key chemical property that drives various processes in chemistry and biology. Its computational prediction is challenging, as evidenced by the fact that it has been a subject of considerable interest for several decades. Recent work has explored fingerprint-based, feature-based and graph-based representations with different machine learning and deep learning methodologies. In general, many traditional methods have been proposed, but they rely heavily on the quality of the rule-based, hand-crafted features. On the other hand, limitations in the quality of aqueous solubility data become a handicap when training deep models. In this study, we have developed a novel structure-aware method for the prediction of aqueous solubility by introducing a new deep network architecture and then employing a transfer learning approach. The model was proven to be competitive, obtaining an RMSE of 0.587 during both cross-validation and a test on an independent dataset. To be more precise, the method is evaluated on molecules downloaded from the Online Chemical Database and Modeling Environment (OCHEM). Beyond aqueous solubility prediction, the strategy presented in this work may be useful for modeling any kind of (chemical or biological) properties for which there is a limited amount of data available for model training.</p></div>","PeriodicalId":72304,"journal":{"name":"Artificial intelligence in the life sciences","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2667318521000210/pdfft?md5=6e2846286bacbae3a9814188cafabd4f&pid=1-s2.0-S2667318521000210-main.pdf","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Artificial intelligence in the life sciences","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2667318521000210","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3
Abstract
Aqueous solubility is a key chemical property that drives various processes in chemistry and biology. Its computational prediction is challenging, as evidenced by the fact that it has been a subject of considerable interest for several decades. Recent work has explored fingerprint-based, feature-based and graph-based representations with different machine learning and deep learning methodologies. In general, many traditional methods have been proposed, but they rely heavily on the quality of the rule-based, hand-crafted features. On the other hand, limitations in the quality of aqueous solubility data become a handicap when training deep models. In this study, we have developed a novel structure-aware method for the prediction of aqueous solubility by introducing a new deep network architecture and then employing a transfer learning approach. The model was proven to be competitive, obtaining an RMSE of 0.587 during both cross-validation and a test on an independent dataset. To be more precise, the method is evaluated on molecules downloaded from the Online Chemical Database and Modeling Environment (OCHEM). Beyond aqueous solubility prediction, the strategy presented in this work may be useful for modeling any kind of (chemical or biological) properties for which there is a limited amount of data available for model training.
Artificial intelligence in the life sciencesPharmacology, Biochemistry, Genetics and Molecular Biology (General), Computer Science Applications, Health Informatics, Drug Discovery, Veterinary Science and Veterinary Medicine (General)