Physicochemical QSAR analysis of hERG inhibition revisited: towards a quantitative potency prediction
Kiril Lanevskij, Remigijus Didziapetris, Andrius Sazonovas
Journal of Computer-Aided Molecular Design, 36(12), 837–849 (published 2022-10-28). DOI: 10.1007/s10822-022-00483-0
https://link.springer.com/article/10.1007/s10822-022-00483-0
Citation count: 5
Abstract
In an earlier study (Didziapetris R & Lanevskij K (2016). J Comput Aided Mol Des. 30:1175–1188) we collected a database of publicly available hERG inhibition data for almost 6700 drug-like molecules and built a probabilistic Gradient Boosting classifier with a minimal set of physicochemical descriptors (log P, pKa, molecular size and topology parameters). This approach favored interpretability over statistical performance but still achieved an overall classification accuracy of 75%. In the current follow-up work we expanded the database (provided in Supplementary Information) to almost 9400 molecules and performed temporal validation of the model on a set of novel chemicals from recently published lead optimization projects. Validation results showed almost no performance degradation compared to the original study. Additionally, we rebuilt the model using the AFT (Accelerated Failure Time) learning objective in XGBoost, which accepts both the quantitative and the censored data often reported in protein inhibition studies. The new model achieved a similar level of accuracy in discerning hERG blockers from non-blockers at a 10 µM threshold, which may be regarded as close to the performance ceiling for methods aiming to describe only non-specific ligand interactions with hERG. Unlike the classifier, however, this model outputs quantitative potency values (IC50) and is not tied to a particular classification cut-off. pIC50 from patch-clamp measurements can be predicted with R² ≈ 0.4 and MAE < 0.5, which enables ranking ligands by their expected potency. The employed approach can be valuable for quantitative modeling of various ADME and drug safety endpoints with a high prevalence of censored data.
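To make the notion of censored potency data concrete, the following minimal sketch shows one way reported IC50 values could be turned into lower and upper bounds before fitting an interval-regression model. It is an illustration under stated assumptions, not the authors' procedure: the abstract does not say how labels were encoded, and the names `Bounds`, `parse_ic50`, `pic50` and `is_blocker`, the string format of the input records, and the choice of `<=` at the 10 µM cut-off are all hypothetical.

```python
import math
from typing import NamedTuple


class Bounds(NamedTuple):
    """Interval known to contain the true IC50 (hypothetical helper type)."""
    lower: float  # lower bound on IC50, in µM
    upper: float  # upper bound on IC50, in µM


def parse_ic50(record: str) -> Bounds:
    """Convert a reported IC50 string (in µM) into an interval.

    Assumed input format: '1.2' is an exact value, '>30' is right-censored
    (potency weaker than the highest tested concentration), and '<0.1' is
    left-censored.
    """
    text = record.strip()
    if text.startswith(">"):
        return Bounds(float(text[1:]), math.inf)
    if text.startswith("<"):
        return Bounds(0.0, float(text[1:]))
    value = float(text)
    return Bounds(value, value)


def pic50(ic50_um: float) -> float:
    """pIC50 = -log10(IC50 in mol/L) = 6 - log10(IC50 in µM)."""
    return 6.0 - math.log10(ic50_um)


def is_blocker(ic50_um: float, threshold_um: float = 10.0) -> bool:
    """Classify at a potency cut-off; treating the boundary as a blocker
    is an assumption of this sketch."""
    return ic50_um <= threshold_um


if __name__ == "__main__":
    for rec in ["1.2", ">30", "<0.1"]:
        print(rec, parse_ic50(rec))
    print("pIC50 at 10 µM:", pic50(10.0))
```

Bounds of this shape correspond to the `label_lower_bound` and `label_upper_bound` fields that XGBoost's `survival:aft` objective reads from a `DMatrix`; exact measurements have equal bounds, while censored ones leave one side open. Because a predicted IC50 is a continuous quantity, a function such as `is_blocker` can be applied afterwards at any cut-off, which is the sense in which a quantitative model is not tied to a single classification threshold.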
About the journal:
The Journal of Computer-Aided Molecular Design provides a forum for disseminating information on both the theory and the application of computer-based methods in the analysis and design of molecules. The scope of the journal encompasses papers that report new and original research and applications in the following areas:
- theoretical chemistry;
- computational chemistry;
- computer and molecular graphics;
- molecular modeling;
- protein engineering;
- drug design;
- expert systems;
- general structure-property relationships;
- molecular dynamics;
- chemical database development and usage.