{"title":"","authors":"Xiaoqing Ru, Chao Zha and Xin Gao*","doi":"","DOIUrl":"","url":null,"abstract":"","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":"65 14","pages":"XXX-XXX XXX-XXX"},"PeriodicalIF":5.6,"publicationDate":"2025-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.acs.org/doi/pdf/10.1021/acs.jcim.5c00979","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144712758","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Self-Consistent Approach to Rotamer and Protonation State Assignments (RAPA): Moving Beyond Single Protein Configurations.","authors":"Mossa Ghattas, Prerna Gera, Steven Ramsey, Anthony Cruz-Balberdy, Nathan Abraham, Vjay Molino, Daniel McKay, Tom Kurtzman","doi":"10.1021/acs.jcim.5c00859","DOIUrl":"10.1021/acs.jcim.5c00859","url":null,"abstract":"<p><p>There are currently over 160,000 protein crystal structures obtained by X-ray diffraction with resolutions of 1.5 Å or greater in the Protein Data Bank. At these resolutions hydrogen atoms do not resolve and heavy atoms such as oxygen, carbon, and nitrogen are indistinguishable. This leads to ambiguity in the rotamer and protonation states of multiple amino acids, notably asparagine, glutamine, histidine, serine, tyrosine, and threonine. When the rotamer and protonation states of these residues change, so too does the electrochemical surface of a binding site. A variety of computational approaches have been developed to assign states for these residues by investigating all possibilities and typically deciding on a single rotamer or protonation state for each residue that is consistent with the crystal structure. Here, we posit that there are multiple rotamer and protonation states that are consistent with the resolved structure of the proteins and introduce a Rotamer and Protonation Assignment (RAPA) protocol which analyzes local hydrogen-bonding environments in the resolved structures of proteins and identifies a set of unique rotamer and protonation states that are energetically consistent with the experimentally reported crystal structure. We evaluate the RAPA-predicted configurations in molecular dynamics simulations and find that there are multiple configurations for each protein that maintain structures consistent with the X-ray results. 
In our initial evaluations of the RAPA protocol, we find that for most proteins (69/77) there are multiple energetically accessible rotamer and protonation state configurations; however, the total number is limited to 8 or fewer for most of the proteins (62 of 77). This suggests that there is no combinatorial explosion in the number of energetically accessible rotamer and protonation states for most proteins and investigating all such states is computationally feasible.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":"7639-7650"},"PeriodicalIF":5.6,"publicationDate":"2025-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144264766","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing Machine Learning Potentials through Transfer Learning across Chemical Elements.","authors":"Sebastien Röcken, Julija Zavadlav","doi":"10.1021/acs.jcim.5c00293","DOIUrl":"10.1021/acs.jcim.5c00293","url":null,"abstract":"<p><p>Machine learning potentials (MLPs) can enable simulations of ab initio accuracy at orders of magnitude lower computational cost. However, their effectiveness hinges on the availability of considerable data sets to ensure robust generalization across chemical space and thermodynamic conditions. The generation of such data sets can be labor-intensive, highlighting the need for innovative methods to train MLPs in data-scarce scenarios. Here, we introduce transfer learning of potential energy surfaces between chemically similar elements. Specifically, we leverage the trained MLP for silicon to initialize and expedite the training of an MLP for germanium. Utilizing classical force field and ab initio data sets, we demonstrate that transfer learning surpasses traditional training from scratch in force prediction, leading to more stable simulations and improved temperature transferability. These advantages become even more pronounced as the training data set size decreases. We also observe positive transfer learning effects for most out-of-target properties. 
Our findings demonstrate that transfer learning across chemical elements is a promising technique for developing accurate and numerically stable MLPs, particularly in a data-scarce regime.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":"7406-7414"},"PeriodicalIF":5.6,"publicationDate":"2025-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144574457","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"","authors":"Chang Sun, Yuxin Shen, Min Xu, Wang Qiao, Tianze Shang, Qikai Yin, Yuxiang Wang* and Baoju Zhang*","doi":"","DOIUrl":"","url":null,"abstract":"","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":"65 14","pages":"XXX-XXX XXX-XXX"},"PeriodicalIF":5.6,"publicationDate":"2025-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.acs.org/doi/pdf/10.1021/acs.jcim.5c01012","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144712743","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"","authors":"Song Xie, Ke Zuo*, Silvia De Rubeis, Giorgio Bonollo, Giorgio Colombo, Paolo Ruggerone* and Paolo Carloni*","doi":"","DOIUrl":"","url":null,"abstract":"","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":"65 14","pages":"XXX-XXX XXX-XXX"},"PeriodicalIF":5.6,"publicationDate":"2025-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.acs.org/doi/pdf/10.1021/acs.jcim.5c01162","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144712760","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"","authors":"Yanling Wang, Na Li*, Xiao Wang, Feng Cao, Shuwen Xiong* and Leyi Wei*","doi":"","DOIUrl":"","url":null,"abstract":"","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":"65 14","pages":"XXX-XXX XXX-XXX"},"PeriodicalIF":5.6,"publicationDate":"2025-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.acs.org/doi/pdf/10.1021/acs.jcim.5c01073","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144712761","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Diff-SE: A Diffusion-Augmented Contrastive Learning Framework for Super-Enhancer Prediction.","authors":"Haolu Zhou, Yu Han, Yude Bai, Yun Zuo, Wenying He, Fei Guo","doi":"10.1021/acs.jcim.5c01005","DOIUrl":"10.1021/acs.jcim.5c01005","url":null,"abstract":"<p><p>Super-enhancers (SEs) are cis-regulatory elements that play crucial roles in gene expression and are implicated in diseases such as cancer and Alzheimer's. Traditional identification methods rely on ChIP-seq experiments, which are costly and time-consuming. While recent computational approaches have leveraged sequence features for SE prediction, they often suffer from severe class imbalance and poor generalization across species. To address these limitations, we propose Diff-SE, a deep learning framework that integrates diffusion-based data augmentation with contrastive learning. The diffusion module models the continuous distribution of SEs to generate biologically meaningful synthetic positive samples, effectively balancing training data. A contrastive learning strategy is then used to enhance feature representation by maximizing intraclass similarity and interclass separation. Experimental results across eight data sets demonstrate that Diff-SE consistently outperforms the baseline model, achieving 10%-30% improvements in precision (PRE), Matthews correlation coefficient (MCC), and <i>F</i>1-score. Furthermore, Diff-SE exhibits superior generalization in cross-species validation between human and mouse cell lines. 
The code and data sets are available at https://github.com/15831959673/Diff-SE, enabling further research and applications in SE prediction.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":"7789-7799"},"PeriodicalIF":5.6,"publicationDate":"2025-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144558444","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Incorporating Neighboring Protein Features for Enhanced Drug-Target Interaction Prediction: A Comparative Analysis of Similarity-Based Alignment Methods.","authors":"Xiaoqing Ru, Chao Zha, Xin Gao","doi":"10.1021/acs.jcim.5c00979","DOIUrl":"10.1021/acs.jcim.5c00979","url":null,"abstract":"<p><p>Drug-target interaction (DTI) prediction is a fundamental computational task in drug discovery. Despite recent advancements, existing approaches often suffer from data sparsity and fail to capture the intricate nature of molecular interactions, limiting predictive performance. To address these challenges, we propose a novel DTI prediction framework that enhances both accuracy and interpretability by incorporating features from highly similar protein neighbors. Our framework extracts chemical and physicochemical features from drug-target binding affinity data and integrates interaction features from highly similar protein neighbors to enrich representation. To identify these neighbors, we employ a range of protein similarity alignment algorithms, including BLAST, MUSCLE, MAFFT, Clustal Omega and Foldseek. Experiments on the Davis and KIBA data sets demonstrate that incorporating features from high-similarity neighbors substantially improves prediction accuracy. Further analysis reveals that top-ranked neighbors contribute the most to performance gains, underscoring the importance of similarity-based feature augmentation. Additionally, comparisons among alignment methods highlight their robustness in neighbor selection, and case studies confirm the biological relevance of shared targets among closely related proteins. Overall, our framework presents a novel solution to data sparsity, improves predictive performance, and enhances model interpretability. 
This work lays a solid foundation for precise DTI prediction and provides valuable insights for advancing computational methods in drug discovery.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":"7701-7711"},"PeriodicalIF":5.6,"publicationDate":"2025-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144558446","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ultrahigh-Throughput Virtual Screening Strategies against PPI Targets: A Case Study of STAT Inhibitors.","authors":"Tibor Viktor Szalai, Nikolett Péczka, Levente Sipos-Szabó, László Petri, Dávid Bajusz, György M Keserű","doi":"10.1021/acs.jcim.5c00907","DOIUrl":"10.1021/acs.jcim.5c00907","url":null,"abstract":"<p><p>In recent years, virtual screening of ultralarge (10<sup>8+</sup>) libraries of synthetically accessible compounds (uHTVS) has become a popular approach in hit identification. With AI-assisted virtual screening workflows, such as Deep Docking, these protocols might be feasible even without supercomputers. Yet, these methodologies have their own conceptual limitations, including the fact that physics-based docking is replaced by a cheaper deep learning (DL) step for the vast majority of compounds. In turn, the performance of this DL step will highly depend on the performance of the underlying docking model that is used to evaluate parts of the whole data set to train the DL architecture itself. Here, we evaluated the performance of the popular Deep Docking workflow on compound libraries of different sizes, against benchmark cases of classic brute-force docking approaches conducted on smaller libraries. We were especially interested in more difficult, protein-protein interaction-type oncotargets where the reliability of the underlying docking model is harder to assess. Specifically, our virtual screens have resulted in several new inhibitors of two oncogenic transcription factors, STAT3 and STAT5b. For STAT5b, in particular, we disclose the first application of virtual screening against its N-terminal domain, whose importance was recognized more recently. While the AI-based uHTVS is computationally more demanding, it can achieve exceptionally good hit rates (50.0% for STAT3). 
Deep Docking can also work well with a compound library containing only several million (instead of several billion) compounds, achieving a 42.9% hit rate against the SH2 domain of STAT5b, while presenting a highly economical workflow with just under 120,000 compounds actually docked.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":"7734-7748"},"PeriodicalIF":5.6,"publicationDate":"2025-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144558447","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}