{"title":"Correction to \"Semisupervised Learning to Boost hERG, Nav1.5, and Cav1.2 Cardiac Ion Channel Toxicity Prediction by Mining a Large Unlabeled Small Molecule Data Set\".","authors":"Issar Arab, Kris Laukens, Wout Bittremieux","doi":"10.1021/acs.jcim.4c02123","DOIUrl":"10.1021/acs.jcim.4c02123","url":null,"abstract":"","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":"9649-9650"},"PeriodicalIF":5.6,"publicationDate":"2024-12-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142724529","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MCNN_MC: Computational Prediction of Mitochondrial Carriers and Investigation of Bongkrekic Acid Toxicity Using Protein Language Models and Convolutional Neural Networks.","authors":"Muhammad Shahid Malik, Yan-Yun Chang, Yu-Chen Liu, Van The Le, Yu-Yen Ou","doi":"10.1021/acs.jcim.4c00961","DOIUrl":"10.1021/acs.jcim.4c00961","url":null,"abstract":"<p><p>Mitochondrial carriers (MCs) are essential proteins that transport metabolites across mitochondrial membranes and play a critical role in cellular metabolism. ADP/ATP (adenosine diphosphate/adenosine triphosphate) is one of the most important carriers as it contributes to cellular energy production and is susceptible to the powerful toxin bongkrekic acid. This toxin has claimed several lives; for example, a recent foodborne outbreak in Taipei, Taiwan, has caused four deaths and sickened 30 people. The issue of bongkrekic acid poisoning has been a long-standing problem in Indonesia, with reports as early as 1895 detailing numerous deaths from contaminated coconut fermented cakes. In bioinformatics, significant advances have been made in understanding biological processes through computational methods; however, no established computational method has been developed for identifying mitochondrial carriers. We propose a computational bioinformatics approach for predicting MCs from a broader class of secondary active transporters with a focus on the ADP/ATP carrier and its interaction with bongkrekic acid. The proposed model combines protein language models (PLMs) with multiwindow scanning convolutional neural networks (mCNNs). While PLM embeddings capture contextual information within proteins, mCNN scans multiple windows to identify potential binding sites and extract local features. Our results show 96.66% sensitivity, 95.76% specificity, 96.12% accuracy, 91.83% Matthews correlation coefficient (MCC), 94.63% F1-Score, and 98.55% area under the curve (AUC). 
The results demonstrate the effectiveness of the proposed approach in predicting MCs and elucidating their functions, particularly in the context of bongkrekic acid toxicity. This study presents a valuable approach for identifying novel mitochondrial complexes, characterizing their functional roles, and understanding mitochondrial toxicology mechanisms. Our findings, which use computational methods to improve our understanding of cellular processes and drug-target interactions, contribute to the development of therapeutic strategies for mitochondrial disorders, reducing the devastating effects of bongkrekic acid poisoning.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":"9125-9134"},"PeriodicalIF":5.6,"publicationDate":"2024-12-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141915463","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"StaPep: An Open-Source Toolkit for Structure Prediction, Feature Extraction, and Rational Design of Hydrocarbon-Stapled Peptides.","authors":"Zhe Wang, Jianping Wu, Mengjun Zheng, Chenchen Geng, Borui Zhen, Wei Zhang, Hui Wu, Zhengyang Xu, Gang Xu, Si Chen, Xiang Li","doi":"10.1021/acs.jcim.4c01718","DOIUrl":"10.1021/acs.jcim.4c01718","url":null,"abstract":"<p><p>All-hydrocarbon stapled peptides, with their covalent side-chain constraints, provide enhanced proteolytic stability and membrane permeability, making them superior to linear peptides. However, tools for extracting structural and physicochemical descriptors to predict the properties of hydrocarbon-stapled peptides are lacking. To address this, we present StaPep, a Python-based toolkit for generating 3D structures and calculating 21 features for hydrocarbon-stapled peptides. StaPep supports peptides containing two non-standard amino acids (norleucine and 2-aminoisobutyric acid) and six non-natural anchoring residues (S3, S5, S8, R3, R5, and R8), with customization options for other non-standard amino acids. We showcase StaPep's utility through three case studies. The first generates 3D structures of these peptides with a mean RMSD of 1.62 ± 0.86, offering essential structural insights for drug design and biological activity prediction. The second develops machine learning models based on calculated molecular features to differentiate between membrane-permeable and non-permeable stapled peptides, achieving an AUC of 0.93. The third constructs regression models to predict the antimicrobial activity of stapled peptides against <i>Escherichia coli</i>, with a Pearson correlation of 0.84. StaPep's pipeline spans data retrieval, structure generation, feature calculation, and machine learning modeling for hydrocarbon-stapled peptides. 
The source code and data set are freely available on GitHub: https://github.com/dahuilangda/stapep_package.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":"9361-9373"},"PeriodicalIF":5.6,"publicationDate":"2024-12-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142580828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Martignac: Computational Workflows for Reproducible, Traceable, and Composable Coarse-Grained Martini Simulations.","authors":"Tristan Bereau, Luis J Walter, Joseph F Rudzinski","doi":"10.1021/acs.jcim.4c01754","DOIUrl":"10.1021/acs.jcim.4c01754","url":null,"abstract":"<p><p>Despite their wide use and far-reaching implications, molecular dynamics (MD) simulations suffer from a lack of both traceability and reproducibility. We introduce Martignac: computational workflows for the coarse-grained (CG) Martini force field. Martignac describes Martini CG MD simulations as an acyclic directed graph, providing the entire history of a simulation─from system preparation to property calculations. Martignac connects to NOMAD, such that all simulation data generated are automatically normalized and stored according to the FAIR principles. We present several prototypical Martini workflows, including system generation of simple liquids and bilayers, as well as free-energy calculations for solute solvation in homogeneous liquids and drug permeation in lipid bilayers. By connecting to the NOMAD database to automatically pull existing simulations and push any new simulation generated, Martignac contributes to improving the sustainability and reproducibility of molecular simulations.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":"9413-9423"},"PeriodicalIF":5.6,"publicationDate":"2024-12-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142764601","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design of Recyclable Plastics with Machine Learning and Genetic Algorithm.","authors":"Chureh Atasi, Joseph Kern, Rampi Ramprasad","doi":"10.1021/acs.jcim.4c01530","DOIUrl":"10.1021/acs.jcim.4c01530","url":null,"abstract":"<p><p>We present an artificial intelligence-guided approach to design durable and chemically recyclable ring-opening polymerization (ROP) class polymers. This approach employs a genetic algorithm (GA) that designs new monomers and then utilizes virtual forward synthesis (VFS) to generate almost a million ROP polymers. Machine learning models to predict thermal, thermodynamic, and mechanical properties─crucial for application-specific performance and recyclability─are used to guide the GA toward optimal polymers. We present potential substitute polymers for polystyrene (PS) that achieve all property targets with low estimated synthetic complexity.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":"9249-9259"},"PeriodicalIF":5.6,"publicationDate":"2024-12-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142764649","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MoleQCage: Geometric High-Throughput Screening for Molecular Caging Prediction.","authors":"Alexander Kravberg, Didier Devaurs, Anastasiia Varava, Lydia E Kavraki, Danica Kragic","doi":"10.1021/acs.jcim.4c01419","DOIUrl":"10.1021/acs.jcim.4c01419","url":null,"abstract":"<p><p>Although being able to determine whether a host molecule can enclose a guest molecule and form a caging complex could benefit numerous chemical and medical applications, the experimental discovery of molecular caging complexes has not yet been achieved at scale. Here, we propose MoleQCage, a simple tool for the high-throughput screening of host and guest candidates based on an efficient robotics-inspired geometric algorithm for molecular caging prediction, providing theoretical guarantees and robustness assessment. MoleQCage is distributed as Linux-based software with a graphical user interface and is available online at https://hub.docker.com/r/dantrigne/moleqcage in the form of a Docker container. Documentation and examples are available as Supporting Information and online at https://hub.docker.com/r/dantrigne/moleqcage.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":"9034-9039"},"PeriodicalIF":5.6,"publicationDate":"2024-12-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142811435","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Advanced AI-Driven Prediction of Pregnancy-Related Adverse Drug Reactions.","authors":"Jinfu Peng, Li Fu, Guoping Yang, Dongshen Cao","doi":"10.1021/acs.jcim.4c01657","DOIUrl":"10.1021/acs.jcim.4c01657","url":null,"abstract":"<p><p>Ensuring drug safety during pregnancy is critical due to the potential risks to both the mother and fetus. However, the exclusion of pregnant women from clinical trials complicates the assessment of adverse drug reactions (ADRs) in this population. This study aimed to develop and validate risk prediction models for pregnancy-related ADRs of drugs using advanced Machine Learning (ML) and Deep Learning (DL) techniques, leveraging real-world data from the FDA Adverse Event Reporting System. We explored three methods─Information Component, Reporting Odds Ratio, and 95% confidence interval of ROR─for classifying drugs into high-risk and low-risk categories. DL models, including Directed Message Passing Neural Networks (DMPNN), Graph Neural Networks, and Graph Convolutional Networks, were developed and compared to traditional ML models like Random Forest, Support Vector Machines, and XGBoost. Among these, the DMPNN model, which integrated molecular graph information and molecular descriptors, exhibited the highest predictive performance, particularly at the preferred term level. The model was validated against external data sets from SIDER and DailyMed, demonstrating strong generalizability. Additionally, the model was applied to assess the risk of 22 oral hypoglycemic drugs, and potential substructure alerts for pregnancy-related ADRs were identified. 
These findings suggest that the DMPNN model is a valuable tool for predicting ADRs in pregnant women, offering significant advancement in drug safety assessment and providing crucial insights for safer medication use during pregnancy.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":"9286-9298"},"PeriodicalIF":5.6,"publicationDate":"2024-12-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142749488","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mining for Potent Inhibitors through Artificial Intelligence and Physics: A Unified Methodology for Ligand Based and Structure Based Drug Design.","authors":"Jie Li, Oufan Zhang, Kunyang Sun, Yingze Wang, Xingyi Guan, Dorian Bagni, Mojtaba Haghighatlari, Fiona L Kearns, Conor Parks, Rommie E Amaro, Teresa Head-Gordon","doi":"10.1021/acs.jcim.4c00634","DOIUrl":"10.1021/acs.jcim.4c00634","url":null,"abstract":"<p><p>Determining the viability of a new drug molecule is a time- and resource-intensive task that makes computer-aided assessments a vital approach to rapid drug discovery. Here we develop a machine learning algorithm, iMiner, that generates novel inhibitor molecules for target proteins by combining deep reinforcement learning with real-time 3D molecular docking using AutoDock Vina, thereby simultaneously creating chemical novelty while constraining molecules for shape and molecular compatibility with target active sites. Moreover, through the use of various types of reward functions, we have introduced novelty in generative tasks for new molecules such as chemical similarity to a target ligand, molecules grown from known protein bound fragments, and creation of molecules that enforce interactions with target residues in the protein active site. The iMiner algorithm is embedded in a composite workflow that filters out Pan-assay interference compounds, Lipinski rule violations, uncommon structures in medicinal chemistry, and poor synthetic accessibility with options for cross-validation against other docking scoring functions and automation of a molecular dynamics simulation to measure pose stability. We also allow users to define a set of rules for the structures they would like to exclude during the training process and postfiltering steps. 
Because our approach relies only on the structure of the target protein, iMiner can be easily adapted for the future development of other inhibitors or small molecule therapeutics of any target protein.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":"9082-9097"},"PeriodicalIF":5.6,"publicationDate":"2024-12-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141282331","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HPTRMF: Collaborative Matrix Factorization-Based Prediction Method for LncRNA-Disease Associations Using High-Order Perturbation and Flexible Trifactor Regularization.","authors":"Guobo Xie, Dayin Li, Zhiyi Lin, Guosheng Gu, Weijun Li, Ruibin Chen, Zhenguo Liu","doi":"10.1021/acs.jcim.4c01070","DOIUrl":"10.1021/acs.jcim.4c01070","url":null,"abstract":"<p><p>Existing matrix factorization methods face challenges, including the cold start problem and global nonlinear data loss during similarity learning, particularly in predicting associations between long noncoding RNAs (LncRNAs) and diseases. To overcome these issues, we introduce HPTRMF, a matrix factorization approach incorporating high-order perturbation and flexible trifactor regularization. HPTRMF constructs a high-order correlation matrix utilizing the known association matrix, leveraging high-order perturbation to effectively address the cold start problem caused by data sparsity. Additionally, HPTRMF incorporates a flexible trifactor regularization term to capture similarity information on LncRNAs and diseases, enabling the effective handling of global nonlinear data loss by capturing such data in the similarity matrix. 
Experimental results demonstrate the superiority of HPTRMF over nine state-of-the-art algorithms in Leave-One-Out Cross-Validation (LOOCV) and Five-Fold Cross-Validation (5-Fold CV) on three data sets. HPTRMF and the data sets are available at https://github.com/Llvvvv/HPTRMF.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":"9594-9608"},"PeriodicalIF":5.6,"publicationDate":"2024-12-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141764606","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparative Assessment of Water Models in Protein-Glycan Interaction: Insights from Alchemical Free Energy Calculations and Molecular Dynamics Simulations.","authors":"Deng Li, Mona S Minkara","doi":"10.1021/acs.jcim.4c01361","DOIUrl":"10.1021/acs.jcim.4c01361","url":null,"abstract":"<p><p>Accurate computational simulations of protein-glycan dynamics are crucial for a comprehensive understanding of critical biological mechanisms, including host-pathogen interactions, immune system defenses, and intercellular communication. The accuracy of these simulations, including molecular dynamics (MD) simulation and alchemical free energy calculations, critically relies on the appropriate parameters, including the water model, because of the extensive hydrogen bonding with glycan hydroxyl groups. However, a systematic evaluation of water models' accuracy in simulating protein-glycan interaction at the molecular level is still lacking. In this study, we used full atomistic MD simulations and alchemical absolute binding free energy (ABFE) calculations to investigate the performance of five distinct water models in six protein-glycan complex systems. We evaluated water models' impact on structural dynamics and binding affinity through over 5.8 μs of simulation time per system. Our results reveal that most protein-glycan complexes are stable in the overall structural dynamics regardless of the water model used, while some show obvious fluctuations with specific water models. More importantly, we discover that the stability of the binding motif's conformation is dependent on the water model chosen when its residues form weak hydrogen bonds with the glycan. The water model also influences the conformational stability of the glycan in its bound state according to density functional theory (DFT) calculations. 
Using alchemical ABFE calculations, we find that the OPC water model exhibits exceptional consistency with experimental binding affinity data, whereas commonly used models such as TIP3P are less accurate. The findings demonstrate how different water models affect protein-glycan interactions and the accuracy of binding affinity calculations, which is crucial in developing therapeutic strategies targeting these interactions.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":"9459-9473"},"PeriodicalIF":5.6,"publicationDate":"2024-12-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142386397","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}