{"title":"Integrating Machine Learning and SHAP Analysis to Advance the Rational Design of Benzothiadiazole Derivatives with Tailored Photophysical Properties.","authors":"Rafael F Veríssimo,Pedro H F Matias,Mateus R Barbosa,Flávio O S Neto,Brenno A D Neto,Heibbe C B de Oliveira","doi":"10.1021/acs.jcim.4c02414","DOIUrl":"https://doi.org/10.1021/acs.jcim.4c02414","url":null,"abstract":"2,1,3-Benzothiadiazole (BTD) derivatives show promise in advanced photophysical applications, but designing molecules with optimal desired properties remains challenging due to complex structure-property relationships. Existing computational methods have a high cost when predicting precise photophysical characteristics. Machine learning with Morgan fingerprints was employed to predict the maximum absorption and emission wavelengths of BTD derivatives. Three flavors of machine learning models were applied, namely, Random Forest, LightGBM, and XGBoost. Random Forest achieved R² values of 0.92 for absorption and 0.89 for emission, validated internally with 10-fold cross-validation and externally with recent experimental data. SHapley Additive exPlanations (SHAP) analysis revealed critical design insights, highlighting the tertiary amine presence and solvent polarity as key drivers of red-shifted emissions. 
By the development of a web-based predictive tool, the potential of machine learning to accelerate molecular design is demonstrated, providing researchers a powerful approach to engineer BTD derivatives with enhanced photophysical properties.","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":"20 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2025-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143893107","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Structure-Directed Pan-Specific T-Cell Receptor-Peptide-Major Histocompatibility Complex Interaction Prediction.","authors":"Letao Gao,Yumeng Zhang,Fang Ge,Shanshan Li,Yuming Guo,Jiangning Song,Dong-Jun Yu","doi":"10.1021/acs.jcim.5c00055","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c00055","url":null,"abstract":"T-cell receptors (TCRs) play a pivotal role in the adaptive immune system, and understanding their antigen recognition mechanism remains a critical area of research. With the increasing availability of binding and interaction data between TCRs and peptide-major histocompatibility complexes (pMHCs), data-driven computational methods are emerging as powerful tools with significant potential for advancement. In this study, we collected and curated comprehensive sequence and structure data sets of TCRs from human CD8+ T-cells and cognate epitopes presented by MHC class I molecules. We developed two innovative computational frameworks: SG-TPMI, a lightweight, extensible, and structure-guided model for predicting TCR-pMHC binding specificity, and Seq/Struct-TCS, a pair of models (sequence-based and structure-based) for predicting contact sites within TCR-pMHC complexes. Notably, we directly integrated MHC-I alpha helices (or pseudosequences) and structural information on the protein complex into the prediction models. Our comprehensive modeling approach enabled quantitative investigations of TCR-pMHC interaction mechanisms, empowering SG-TPMI and Struct-TCS to achieve performances comparable to those of state-of-the-art methods. Furthermore, our results highlight the necessity of CDR1 and CDR2 loops as well as MHC restriction in pan-specific TCR-pMHC interaction prediction, providing new insights into TCR recognition. 
In summary, we not only propose SG-TPMI as an effective computational method for predicting TCR-pMHC binary interactions but also introduce the Seq/Struct-TCS design for predicting TCR interacting sites with peptide or MHC alpha helices.","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":"36 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2025-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143889222","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"IFPTML Multi-Output Model for Anti-Retroviral Compounds Including the Drug Structure and Target Protein Sequence Information.","authors":"Emilia Vásquez-Domínguez,Shan He,Carlos Santolaria,Sonia Arrasate,Humbert González-Díaz","doi":"10.1021/acs.jcim.5c00242","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c00242","url":null,"abstract":"Retroviruses such as HIV cause significant diseases in humans and other organisms, making the discovery of antiretroviral (ARV) drugs a critical priority. While databases like ChEMBL contain valuable information, their complexity poses challenges. The data set includes >140,000 assays across eight viruses, encompassing >350 biological activity parameters, >50 target proteins, >80 cell lines, >60 assay organisms, and >770 viral strains. Artificial Intelligence/Machine Learning (AI/ML) models offer a promising approach to accelerate ARV discovery. Recently, we developed AI/ML models for ChEMBL ARV data using the Information Fusion Perturbation Theory and Machine Learning (IFPTML) strategy. However, neither existing AI/ML models nor our prior IFPTML implementation simultaneously incorporates viral protein sequences, strains, cell lines, assay organisms, or virus/human mutations. This limitation renders them ineffective for predicting activity against amino acid sequence variations (e.g., mutations, variants, or emerging strains), a critical shortcoming given the well-documented prevalence of drug-resistance mutations in marketed ARVs. In this work, we present an enhanced IFPTML model integrating protein sequence descriptors. We computed and incorporated sequence descriptors for all drug target proteins in ChEMBL, derived from proteomes of retroviruses (HIV, FeLV, MMV, SIV, etc.). The model demonstrated robust performance, with sensitivity (Sn), specificity (Sp), and accuracy (Ac) values ranging between 72.0 and 88.0% in both training and validation phases. 
We analyze its predictions for protein mutations documented in ChEMBL and other literature sources. To our knowledge, this represents the first unified multicondition, multioutput model for ARV discovery that systematically accounts for protein sequence information.","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":"30 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2025-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143889243","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PROFIS: Design of Target-Focused Libraries by Probing Continuous Fingerprint Space with Recurrent Neural Networks.","authors":"Hubert Rybka,Tomasz Danel,Sabina Podlewska","doi":"10.1021/acs.jcim.5c00698","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c00698","url":null,"abstract":"This study introduces PROFIS, a new generative model capable of designing structurally novel and target-focused compound libraries. The model relies on a recurrent neural network that was trained to decode embedded molecular fingerprints into SMILES strings. To identify potential novel ligands, a biological activity predictor is first trained on the low-dimensional fingerprint embedding space, enabling the identification of high-activity subspaces for a given drug target. The search for latent representations that are expected to yield active structures upon decoding to SMILES is conducted with a Bayesian optimization algorithm. We present the rationale for using SMILES as the output notation of the recurrent neural network and compare its performance with models trained to decode DeepSMILES and SELFIES strings. The paper demonstrates the application of this protocol to generate candidate ligands of the dopamine D2 receptor. It also emphasizes the effectiveness of our approach in scaffold-hopping, which is valuable for designing ligands outside the already explored chemical space. We present how passing engineered molecular fingerprints through the PROFIS network can be utilized to generate diverse libraries of analogs for a drug molecule of choice. It is worth noting that the protocol is versatile and can be employed for any biological target, given the availability of a dataset containing known ligands. 
The potential for widespread use of PROFIS is secured by scripts shared by the authors on GitHub.","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":"42 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2025-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143889237","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Synthon-Based Strategies Exploiting Molecular Similarity and Protein-Ligand Interactions for Efficient Screening of Ultra-Large Chemical Libraries.","authors":"Brian Medel-Lacruz,Albert Herrero,Fernando Martín,Enric Herrero,F Javier Luque,Javier Vázquez","doi":"10.1021/acs.jcim.5c00222","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c00222","url":null,"abstract":"The rapid expansion of ultralarge chemical libraries has revolutionized drug discovery, providing access to billions of compounds. However, this growth poses relevant challenges for traditional virtual screening (VS) methods. To address these limitations, synthon-based approaches have emerged as scalable alternatives, exploiting combinatorial chemistry principles to prioritize building blocks over enumerated molecules. In this work, we present exaScreen and exaDock, two novel synthon-based methodologies designed for ligand-based and structure-based VS, respectively. In the former case, synthon selection is guided by the 3D hydrophobic/philic distribution pattern in conjunction with a specific synthon alignment protocol based on a quadrupolar expansion over the atoms that participate in the linking bonds between fragments. On the other hand, accommodation to the binding site under a geometrically restrained docking of synthon-based hybrid compounds is used in the selection of the optimal synthon combinations. 
These strategies exhibit comparable performance to the search performed using fully enumerated libraries in identifying active compounds with significantly lower computational cost, offering computationally efficient strategies for VS in ultralarge chemical spaces.","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":"1 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2025-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143889299","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Revealing New Insights into the Dynamics of Human Aromatase Interacting with Cytochrome P450 Reductase in a Realistic Membrane Environment.","authors":"Sana Manzoor,Thomas S Hofer,Syed Tarique Moin","doi":"10.1021/acs.jcim.5c00103","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c00103","url":null,"abstract":"Molecular dynamics simulations were applied to human aromatase (HA) complexed with cytochrome P450 reductase (CPR) within a realistic endoplasmic reticulum membrane environment to evaluate its structural and dynamical properties. CPR was examined with a specific point mutation (P281T), in which proline is substituted by threonine, which is envisaged to have a far-reaching influence on its structure, dynamics, and electron transfer behavior. Since CPR plays a key role in the electron transfer to HA, catalyzing steroidogenesis, obtaining detailed information on the effect of the CPR mutation on HA was crucial. This compelled us to study the interaction of HA with CPR in its wild-type and mutant forms, enabling the investigation of different properties of CPR and its effect on HA dynamics. Pursuing these objectives, different analytical parameters, notably, root-mean-square deviation, root-mean-square fluctuation, dynamic cross-correlation matrix, and principal component analysis were applied to gain insight into the conformational dynamics of the HA/CPR complex. These analyses demonstrated shifts in the complexes' dynamics arising from the effect of the CPR mutation on HA behavior. Based on this information, the electron transfer within the HA/CPR complex was also envisaged to be influenced by the contrasting dynamics of the two complexes, as the mutation was found to significantly alter the dynamics of CPR as well as HA. Furthermore, the electron transfer within the complexes was determined by applying Marcus theory of electron transfer, revealing a contrast between the HA/wild-type and mutant CPR complexes. 
The latter was found to alter the electron transfer efficiency, demonstrating a direct effect of changes in the protein dynamics observed within the HA/mutant CPR complex. This study therefore provides valuable insights into the conformational dynamics of HA in conjunction with CPR, affecting the electron transfer process and their potential implications for understanding estrogen-related physiological conditions influenced by these proteins.","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":"18 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2025-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143889300","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Impact of Native Environment in Multiheme-Cytochrome Chains of the MtrCAB Complex.","authors":"Sasthi C Mandal,Ronit Sarangi,Atanu Acharya","doi":"10.1021/acs.jcim.4c02382","DOIUrl":"https://doi.org/10.1021/acs.jcim.4c02382","url":null,"abstract":"MtrCAB protein complex plays a crucial role in exporting electrons through the outer membrane (OM) to external acceptors. This complex consists of three proteins and contains 20 hemes. Optimal protein-protein interactions and, consequently, heme-heme interactions facilitate efficient electron transfer through the conduit of hemes. The cytochrome MtrA remains mostly inside porin MtrB, and the MtrB barrel contains two calcium ions on its surface. In this study, we investigate the effect of porin-bound calcium ions on the heme-heme distances in the twenty-heme network. We performed all-atom molecular dynamics simulations of the OM-protein complex, MtrCAB, in the presence and absence of the MtrB-bound calcium ions. We observe that the calcium ions bound to MtrB affect the interfacial heme-heme distance when all of the hemes are oxidized and impact one of the heme-heme distances in MtrC when all of the hemes are reduced. In both cases, the absence of calcium ions increases the heme-heme distance, highlighting the crucial role of calcium ions in maintaining the heme network, which is essential for long-range charge transport.","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":"17 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2025-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143880200","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Algorithm for Efficient Superposition and Clustering of Molecular Assemblies Using the Branch-and-Bound Method.","authors":"Yuki Yamamoto","doi":"10.1021/acs.jcim.4c02217","DOIUrl":"https://doi.org/10.1021/acs.jcim.4c02217","url":null,"abstract":"The root-mean-square deviation (RMSD) is one of the most common metrics for comparing the similarity of three-dimensional chemical structures. The chemical structure similarity plays an important role in data chemistry because it is closely related to chemical reactivity, physical properties, and bioactivity. Despite the wide applicability of the RMSD, the simultaneous determination of atom mapping and spatial superposition of RMSD remains a challenging problem to solve in polynomial time. We introduce an algorithm called mobbRMSD, which is formulated in molecular-oriented coordinates and uses the branch-and-bound method to obtain an exact solution for the RMSD. mobbRMSD can efficiently handle a wide range of chemical systems, such as molecular liquids, solute solvations, and self-assembly of large molecules, using chemical knowledge such as atom types, chemical bonding, and chirality. In benchmarks involving small molecular aggregates, mobbRMSD nearly doubles the limiting system size of existing exact solution methods. Furthermore, mobbRMSD demonstrated the ability to analyze the structural similarity of large molecular micelles, which has been difficult with previous methods. We also propose a mobbRMSD-based structural clustering method designed for molecular dynamics trajectories, which reduces the computational cost of the branch-and-bound method to polynomial time on average, asymptotically, as the number of data points increases. 
Our algorithm is freely available at https://github.com/yymmt742/mobbrmsd.","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":"16 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2025-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143876488","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Boosting Drug-Disease Association Prediction for Drug Repositioning via Dual-Feature Extraction and Cross-Dual-Domain Decoding.","authors":"Enqiang Zhu,Xiang Li,Chanjuan Liu,Nikhil R Pal","doi":"10.1021/acs.jcim.5c00070","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c00070","url":null,"abstract":"The extraction of biomedical data has significant academic and practical value in contemporary biomedical sciences. In recent years, drug repositioning, a cost-effective strategy for drug development by discovering new indications for approved drugs, has gained increasing attention. However, many existing drug repositioning methods focus on mining information from adjacent nodes in biomedical networks without considering the potential inter-relationships between the feature spaces of drugs and diseases. This can lead to inaccurate encoding, resulting in biased mined drug-disease association information. To address this limitation, we propose a new model called Dual-Feature Drug Repurposing Neural Network (DFDRNN). DFDRNN allows the mining of two features (similarity and association) from the drug-disease biomedical networks to encode drugs and diseases. A self-attention mechanism is utilized to extract neighbor feature information. It incorporates two dual-feature extraction modules: the single-domain dual-feature extraction (SDDFE) module for extracting features within a single domain (drugs or diseases) and the cross-domain dual-feature extraction (CDDFE) module for extracting features across domains. By utilizing these modules, we ensure more appropriate encoding of drugs and diseases. A cross-dual-domain decoder is also designed to predict drug-disease associations in both domains. Our proposed DFDRNN model outperforms six state-of-the-art methods on four benchmark data sets, achieving an average AUROC of 0.946 and an average AUPR of 0.597. 
Case studies on three diseases show that the proposed DFDRNN model can be applied in real-world scenarios, demonstrating its significant potential in drug repositioning.","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":"72 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2025-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143876473","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Combined In Vitro and In Silico Workflow to Deliver Robust, Transparent, and Contextually Rigorous Models of Bioactivity.","authors":"Nathaniel Charest,Gabriel Sinclair,Stephanie A Eytcheson,Daniel T Chang,Todd M Martin,Charles N Lowe,Katie Paul Friedman,Antony J Williams","doi":"10.1021/acs.jcim.5c00713","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c00713","url":null,"abstract":"New approach methodologies (NAMs) are an increasing priority in the field of toxicology to fill data gaps and reduce time and resources in chemical safety assessment. We describe an NAMs workflow that integrates an in vitro high-throughput bioassay with an in silico computational model. In defining this workflow, we propose, as a crucial step of in silico development, the identification of explicit \"purpose contexts\": a priori definitions of the scope and intent of an in silico solution, which provide natural targets for the mechanistic interpretation, validation, and output design of the model. By inspecting data from an in vitro assay measuring the displacement of fluorescent probe 8-anilino-1-naphthalenesulfonic acid (ANSA) from the serum transport protein transthyretin (TTR) as a proxy for potential disruption of thyroxine (T4) binding, in collaboration with the experimenters, we developed three relevant purpose contexts for this in silico modeling effort: (1) examination and confirmation of the in vitro assay principle via orthogonal information, (2) immediate integration with the in vitro experimental cycle to reduce costs and enhance hit rates, and (3) ultimate replacement of the use of single-concentration screening as a prioritization strategy for bioactivity testing of bulk chemical libraries. 
From these purpose contexts, we derived the foundations of a robust and transparent quantitative structure-activity relationship (QSAR) model that is constructively fit for purpose, characterized by first-principles mechanistic analysis, strict data quality evaluation, contextually rigorous performance testing and, finally, delivery of a quantitative recommendation schedule to simultaneously improve in vitro hit rates and in silico model learning potential.","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":"16 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2025-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143876475","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}