CovCysPredictor: Predicting Selective Covalently Modifiable Cysteines Using Protein Structure and Interpretable Machine Learning
Bryn Marie Reimer, Ernest Awoonor-Williams, Andrei A Golosov, Viktor Hornak
Journal of Chemical Information and Modeling, pp. 544-553. Published 2025-01-27 (Epub 2025-01-08). DOI: 10.1021/acs.jcim.4c01281 (https://doi.org/10.1021/acs.jcim.4c01281)
Citations: 0
Abstract
Targeted covalent inhibition is a powerful therapeutic modality in the drug discoverer's toolbox. Recent advances in covalent drug discovery, in particular, targeting cysteines, have led to significant breakthroughs for traditionally challenging targets such as mutant KRAS, which is implicated in diverse human cancers. However, identifying cysteines for targeted covalent inhibition is a difficult task, as experimental and in silico tools have shown limited accuracy. Using the recently released CovPDB and CovBinderInPDB databases, we have trained and tested interpretable machine learning (ML) models to identify cysteines that are liable to be covalently modified (i.e., "ligandable" cysteines). We explored myriad physicochemical features (pKa, solvent exposure, residue electrostatics, etc.) and protein-ligand pocket descriptors in our ML models. Our final logistic regression model achieved a median F1 score of 0.73 on held-out test sets. When tested on a small sample of holo proteins, our model also showed reasonable performance, accurately predicting the most ligandable cysteine in most cases. Taken together, these results indicate that we can accurately predict potential ligandable cysteines for targeted covalent drug discovery, privileging cysteines that are more likely to be selective rather than purely reactive. We release this tool to the scientific community as CovCysPredictor.
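The abstract reports performance as a median F1 score over held-out test sets and says the model is used to pick the most ligandable cysteine in a protein. The following is a minimal sketch of those two calculations only, not the authors' implementation. The function names, the toy labels, the residue names, and the probabilities are all hypothetical and are not taken from the paper or from CovCysPredictor.

```python
from statistics import median


def f1_score(y_true, y_pred):
    """F1 = 2 * precision * recall / (precision + recall) for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp == 0:
        # No true positives: define F1 as 0 to avoid dividing by zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def median_f1(splits):
    """Median F1 over several held-out splits, each given as (y_true, y_pred)."""
    return median(f1_score(y_true, y_pred) for y_true, y_pred in splits)


def most_ligandable(probabilities):
    """Return the residue name with the highest predicted probability."""
    return max(probabilities, key=probabilities.get)


# Hypothetical labels for three held-out splits (1 = ligandable cysteine).
toy_splits = [
    ([1, 0, 1, 1, 0], [1, 0, 0, 1, 0]),
    ([1, 1, 0, 0, 1], [1, 1, 1, 0, 1]),
    ([0, 1, 0, 1, 0], [0, 1, 0, 0, 1]),
]

# Hypothetical per-cysteine probabilities for one protein.
toy_protein = {"Cys-A": 0.21, "Cys-B": 0.77, "Cys-C": 0.40}

if __name__ == "__main__":
    print("median F1:", round(median_f1(toy_splits), 3))
    print("top-ranked cysteine:", most_ligandable(toy_protein))
```

Reporting the median rather than the mean makes the summary less sensitive to a single unusually easy or hard split, which may be one reason to prefer it when test sets are small.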
About the journal:
The Journal of Chemical Information and Modeling publishes papers reporting new methodology and/or important applications in the fields of chemical informatics and molecular modeling. Specific topics include the representation and computer-based searching of chemical databases, molecular modeling, computer-aided molecular design of new materials, catalysts, or ligands, development of new computational methods or efficient algorithms for chemical software, and biopharmaceutical chemistry including analyses of biological activity and other issues related to drug discovery.
Astute chemists, computer scientists, and information specialists look to this monthly’s insightful research studies, programming innovations, and software reviews to keep current with advances in this integral, multidisciplinary field.
As a subscriber you’ll stay abreast of database search systems, use of graph theory in chemical problems, substructure search systems, pattern recognition and clustering, analysis of chemical and physical data, molecular modeling, graphics and natural language interfaces, bibliometric and citation analysis, and synthesis design and reactions databases.