{"title":"Cost-Efficient Domain-Adaptive Pretraining of Language Models for Optoelectronics Applications.","authors":"Dingyun Huang, Jacqueline M Cole","doi":"10.1021/acs.jcim.4c02029","DOIUrl":"https://doi.org/10.1021/acs.jcim.4c02029","url":null,"abstract":"<p><p>Pretrained language models have demonstrated strong capability and versatility in natural language processing (NLP) tasks, and they have important applications in optoelectronics research, such as data mining and topic modeling. Many language models have also been developed for other scientific domains, among which Bidirectional Encoder Representations from Transformers (BERT) is one of the most widely used architectures. We present three \"optoelectronics-aware\" BERT models, OE-BERT, OE-ALBERT, and OE-RoBERTa, that outperform both their counterpart general English models and larger models in a variety of NLP tasks about optoelectronics. Our work also demonstrates the efficacy of a cost-effective domain-adaptive pretraining (DAPT) method with RoBERTa, which significantly reduces computational resource requirements by more than 80% for its pretraining while maintaining or enhancing its performance. All models and data sets are available to the optoelectronics-research community.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":""},"PeriodicalIF":5.6,"publicationDate":"2025-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143397571","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Valerij Talagayev, Yu Chen, Niklas Piet Doering, Leon Obendorf, Katrin Denzinger, Kristina Puls, Kevin Lam, Sijie Liu, Clemens Alexander Wolf, Theresa Noonan, Marko Breznik, Petra Knaus and Gerhard Wolber*
{"title":"OpenMMDL - Simplifying the Complex: Building, Simulating, and Analyzing Protein–Ligand Systems in OpenMM","authors":"Valerij Talagayev, Yu Chen, Niklas Piet Doering, Leon Obendorf, Katrin Denzinger, Kristina Puls, Kevin Lam, Sijie Liu, Clemens Alexander Wolf, Theresa Noonan, Marko Breznik, Petra Knaus and Gerhard Wolber*","doi":"10.1021/acs.jcim.4c02158","DOIUrl":"https://doi.org/10.1021/acs.jcim.4c02158","url":null,"abstract":"<p >Molecular dynamics (MD) simulations have become an essential tool for studying the dynamics of biological systems and exploring protein–ligand interactions. <i>OpenMM</i> is a modern, open-source software toolkit designed for MD simulations. Until now, it has lacked a module dedicated to building receptor–ligand systems, which is highly useful for investigating protein–ligand interactions for drug discovery. We therefore introduce <i>OpenMMDL</i>, an open-source toolkit that enables the preparation and simulation of protein–ligand complexes in <i>OpenMM</i>, along with the subsequent analysis of protein–ligand interactions. <i>OpenMMDL</i> consists of three main components: <i>OpenMMDL Setup</i>, a graphical user interface based on Python <i>Flask</i> to prepare protein and simulation settings, <i>OpenMMDL Simulation</i> to perform MD simulations with consecutive trajectory postprocessing, and finally <i>OpenMMDL Analysis</i> to analyze simulation results with respect to ligand binding. <i>OpenMMDL</i> is not only a versatile tool for analyzing protein–ligand interactions and generating ligand binding modes throughout simulations; it also tracks and clusters water molecules, particularly those exhibiting minimal displacement from their previous coordinates, providing insights into solvent dynamics. We applied <i>OpenMMDL</i> to study ligand–receptor interactions across diverse biological systems, including LDN-193189 and LDN-212854 with ALK2 (kinases), nifedipine and amlodipine in Ca<sub><i>v</i></sub>1.1 (ion channels), LSD in 5-HT<sub>2B</sub> (G-protein coupled receptors), letrozole in CYP19A1 (cytochrome P450 oxygenases), flavin mononucleotide binding the FMN-riboswitch (RNAs), ligand C08 bound to TLR8 (toll-like receptor), and PZM21 bound to MOR (opioid receptor), highlighting distinct functionalities of <i>OpenMMDL</i>. <i>OpenMMDL</i> is publicly available at https://github.com/wolberlab/OpenMMDL.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":"65 4","pages":"1967–1978"},"PeriodicalIF":5.6,"publicationDate":"2025-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.acs.org/doi/epdf/10.1021/acs.jcim.4c02158","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143473769","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
David A Cooper, Joseph DePaolo-Boisvert, Stanley A Nicholson, Barien Gad, David D L Minh
{"title":"Intracellular Pocket Conformations Determine Signaling Efficacy through the μ Opioid Receptor.","authors":"David A Cooper, Joseph DePaolo-Boisvert, Stanley A Nicholson, Barien Gad, David D L Minh","doi":"10.1021/acs.jcim.4c01437","DOIUrl":"10.1021/acs.jcim.4c01437","url":null,"abstract":"<p><p>It has been challenging to determine how a ligand that binds to a receptor activates downstream signaling pathways and to predict the strength of signaling. The challenge is compounded by functional selectivity, in which a single ligand binding to a single receptor can activate multiple signaling pathways at different levels. Spectroscopic studies show that in the largest class of cell surface receptors, 7 transmembrane receptors (7TMRs), activation is associated with ligand-induced shifts in the equilibria of intracellular pocket conformations in the absence of transducer proteins. We hypothesized that signaling through the μ opioid receptor, a prototypical 7TMR, is linearly proportional to the equilibrium probability of observing intracellular pocket conformations in the receptor-ligand complex. Here, we show that a machine learning model based on this hypothesis accurately calculates the efficacy of both G protein and β-arrestin-2 signaling. Structural features that the model associates with activation are intracellular pocket expansion, toggle switch rotation, and sodium binding pocket collapse. Distinct pathways are activated by different arrangements of the ligand and sodium binding pockets and the intracellular pocket. While recent work has categorized ligands as active or inactive (or partially active) based on binding affinities to two conformations, our approach accurately computes signaling efficacy along multiple pathways.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":"1465-1475"},"PeriodicalIF":5.6,"publicationDate":"2025-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11817682/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142996313","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mariia Savenko, Robert Vácha, Christophe Ramseyer, Timothée Rivel
{"title":"Role of Divalent Ions in Membrane Models of Polymyxin-Sensitive and Resistant Gram-Negative Bacteria.","authors":"Mariia Savenko, Robert Vácha, Christophe Ramseyer, Timothée Rivel","doi":"10.1021/acs.jcim.4c01574","DOIUrl":"10.1021/acs.jcim.4c01574","url":null,"abstract":"<p><p>Polymyxins, critical last-resort antibiotics, impact the distribution of membrane-bound divalent cations in the outer membrane of Gram-negative bacteria. We employed atomistic molecular dynamics simulations to model the effect of displacing these ions. Two polymyxin-sensitive and two polymyxin-resistant models of the outer membrane of <i>Salmonella enterica</i> were investigated. First, we found that the removal of all calcium ions induces global stress on the model membranes, leading to substantial membrane restructuring. Next, we used enhanced sampling methods to explore the effects of localized stress by displacing membrane-bound ions. Our findings indicate that creating defects in the membrane-bound ion network facilitates polymyxin permeation. Additionally, our study of polymyxin-resistant mutations revealed that divalent ions in resistant model membranes are less likely to be displaced, potentially contributing to the increased resistance associated with these mutations. Lastly, we compared results from all-atom molecular dynamics simulations with coarse-grained simulations, demonstrating that the choice of force field significantly influences the behavior of membrane-bound ions under stress.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":"1476-1491"},"PeriodicalIF":5.6,"publicationDate":"2025-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11815837/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142996320","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Astrid F Brandner, Iain P S Smith, Siewert J Marrink, Paulo C T Souza, Syma Khalid
{"title":"Systematic Approach to Parametrization of Disaccharides for the Martini 3 Coarse-Grained Force Field.","authors":"Astrid F Brandner, Iain P S Smith, Siewert J Marrink, Paulo C T Souza, Syma Khalid","doi":"10.1021/acs.jcim.4c01874","DOIUrl":"10.1021/acs.jcim.4c01874","url":null,"abstract":"<p><p>Sugars are ubiquitous in biology; they occur in all kingdoms of life. Despite their prevalence, they have often been somewhat neglected in studies of structure-dynamics-function relationships of macromolecules to which they are attached, with the exception of nucleic acids. This is largely due to the inherent difficulties of not only studying the conformational dynamics of sugars using experimental methods but indeed also resolving their static structures. Molecular dynamics (MD) simulations offer a route to the prediction of conformational ensembles and the time-dependent behavior of sugars and glycosylated macromolecules. However, at the all-atom level of detail, MD simulations are often too computationally demanding to allow a systematic investigation of molecular interactions in systems of interest. To overcome this, large scale simulations of complex biological systems have profited from advances in coarse-grained (CG) simulations. Perhaps the most widely used CG force field for biomolecular simulations is Martini. Here, we present a parameter set for glucose- and mannose-based disaccharides for Martini 3. The generation of the CG parameters from atomistic trajectories is automated as fully as possible, and where not possible, we provide details of the protocol used for manual intervention.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":"1537-1548"},"PeriodicalIF":5.6,"publicationDate":"2025-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11815824/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142996324","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Elizaveta Mukhaleva, Babgen Manookian, Hanyu Chen, Indira R Sivaraj, Ning Ma, Wenyuan Wei, Konstancja Urbaniak, Grigoriy Gogoshin, Supriyo Bhattacharya, Nagarajan Vaidehi, Andrei S Rodin, Sergio Branciamore
{"title":"BaNDyT: Bayesian Network Modeling of Molecular Dynamics Trajectories.","authors":"Elizaveta Mukhaleva, Babgen Manookian, Hanyu Chen, Indira R Sivaraj, Ning Ma, Wenyuan Wei, Konstancja Urbaniak, Grigoriy Gogoshin, Supriyo Bhattacharya, Nagarajan Vaidehi, Andrei S Rodin, Sergio Branciamore","doi":"10.1021/acs.jcim.4c01981","DOIUrl":"10.1021/acs.jcim.4c01981","url":null,"abstract":"<p><p>Bayesian network modeling (BN modeling, or BNM) is an interpretable machine learning method for constructing probabilistic graphical models from the data. In recent years, it has been extensively applied to diverse types of biomedical data sets. Concurrently, our ability to perform long-time scale molecular dynamics (MD) simulations on proteins and other materials has increased exponentially. However, the analysis of MD simulation trajectories has not been data-driven but rather dependent on the user's prior knowledge of the systems, thus limiting the scope and utility of the MD simulations. Recently, we pioneered using BNM for analyzing the MD trajectories of protein complexes. The resulting BN models yield novel fully data-driven insights into the functional importance of the amino acid residues that modulate proteins' function. In this report, we describe the BaNDyT software package that implements the BNM specifically attuned to the MD simulation trajectories data. We believe that BaNDyT is the first software package to include specialized and advanced features for analyzing MD simulation trajectories using a probabilistic graphical network model. We describe here the software's uses, the methods associated with it, and a comprehensive Python interface to the underlying generalist BNM code. This provides a powerful and versatile mechanism for users to control the workflow. As an application example, we have utilized this methodology and associated software to study how membrane proteins, specifically the G protein-coupled receptors, selectively couple to G proteins. The software can be used for analyzing MD trajectories of any protein as well as polymeric materials.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":"1278-1288"},"PeriodicalIF":5.6,"publicationDate":"2025-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143021307","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PRA-MutPred: Predicting the Effect of Point Mutations in Protein-RNA Complexes Using Structural Features.","authors":"K Harini, M Sekijima, M Michael Gromiha","doi":"10.1021/acs.jcim.4c01452","DOIUrl":"10.1021/acs.jcim.4c01452","url":null,"abstract":"<p><p>Interactions between proteins and RNAs are essential for the proper functioning of cells, and mutations in these molecules may lead to diseases. These protein mutations alter the strength of interactions between the protein and RNA, generally described as binding affinity (Δ<i>G</i>). Hence, the affinity change upon mutation (ΔΔ<i>G</i>) is an important parameter for understanding the effect of mutations in protein-RNA complexes. In this work, we developed a machine-learning model to predict ΔΔ<i>G</i> values upon mutations in protein-RNA complexes. We collected experimentally determined ΔΔ<i>G</i> values of 710 mutations in 134 protein-RNA complexes. Diverse sequence and structural features were generated from both wild-type and modeled mutant complexes, which include conservation scores, residue-based, network-based, and interface features. Further, we developed a support vector regressor model with a correlation of 0.75 and a mean absolute error of 0.84 kcal/mol in the jack-knife test. We observed that the performance of the model is dictated by structural features, such as contact potentials, atom contacts in the interface of protein-RNA complexes, and the solvent accessibility of the mutated residue. We also developed a Web server, PRA-MutPred, predicting the protein-RNA binding affinity change upon mutation, which is available in the link https://web.iitm.ac.in/bioinfo2/pramutpred/.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":"1605-1614"},"PeriodicalIF":5.6,"publicationDate":"2025-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143021311","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bayesian Flow Network Framework for Chemistry Tasks.","authors":"Nianze Tao, Minori Abe","doi":"10.1021/acs.jcim.4c01792","DOIUrl":"10.1021/acs.jcim.4c01792","url":null,"abstract":"<p><p>In this work, we introduce ChemBFN, a language model that handles chemistry tasks based on Bayesian flow networks working with discrete data. A new accuracy schedule is proposed to improve sampling quality by significantly reducing reconstruction loss. We show evidence that our method is appropriate for generating molecules with satisfied diversity, even when a smaller number of sampling steps is used. A classifier-free guidance method is adapted for conditional generation. It is also worthwhile to point out that after generative training, our model can be fine-tuned on regression and classification tasks with state-of-the-art performance, which opens the gate of building all-in-one models in a single module style. Our model has been open sourced at https://github.com/Augus1999/bayesian-flow-network-for-chemistry.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":"1178-1187"},"PeriodicalIF":5.6,"publicationDate":"2025-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142996255","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Raquel López-Ríos de Castro, Alejandro Santana-Bonilla*, Robert M. Ziolek and Christian D. Lorenz*
{"title":"Automated Analysis of Soft Matter Interfaces, Interactions, and Self-Assembly with PySoftK","authors":"Raquel López-Ríos de Castro, Alejandro Santana-Bonilla*, Robert M. Ziolek and Christian D. Lorenz*","doi":"10.1021/acs.jcim.4c01849","DOIUrl":"https://doi.org/10.1021/acs.jcim.4c01849","url":null,"abstract":"<p >Molecular dynamics simulations have become essential tools in the study of soft matter and biological macromolecules. The large amount of high-dimensional data associated with such simulations does not straightforwardly elucidate the atomistic mechanisms that underlie complex materials and molecular processes. Analysis of these simulations is complicated: the dynamics intrinsic to soft matter simulations necessitates careful application of specific, and often complex, algorithms to extract meaningful molecular scale understanding. There is an ongoing need for high-quality automated computational workflows to facilitate this analysis in a reproducible manner with minimal user input. In this work, we introduce a series of molecular simulation analysis tools for investigating interfaces, molecular interactions (including ring–ring stacking), and self-assembly. In addition, we include a number of auxiliary tools, including a useful function to unwrap molecular structures that are greater than half the length of their corresponding simulation box. These tools are contained in the PySoftK software package, making the application of these algorithms straightforward for the user. These new simulation analysis tools within PySoftK will support high-quality, reproducible analysis of soft matter and biomolecular simulations to bring about new predictive understanding in nano- and biotechnology.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":"65 4","pages":"1679–1684"},"PeriodicalIF":5.6,"publicationDate":"2025-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.acs.org/doi/epdf/10.1021/acs.jcim.4c01849","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143473670","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Golbarg Gazerani, Lesley R Piercey, Syeda Reema, Katie A Wilson
{"title":"Examining the Biophysical Properties of the Inner Membrane of Gram-Negative ESKAPE Pathogens.","authors":"Golbarg Gazerani, Lesley R Piercey, Syeda Reema, Katie A Wilson","doi":"10.1021/acs.jcim.4c01457","DOIUrl":"10.1021/acs.jcim.4c01457","url":null,"abstract":"<p><p>The World Health Organization has identified multidrug-resistant bacteria as a serious global health threat. Gram-negative bacteria are particularly prone to antibiotic resistance, and their high rate of antibiotic resistance has been suggested to be related to the complex structure of their cell membrane. The outer membrane of Gram-negative bacteria contains lipopolysaccharides that protect the bacteria against threats such as antibiotics, while the inner membrane houses 20-30% of the bacterial cellular proteins. Given the cell membrane's critical role in bacterial survival, antibiotics targeting the cell membrane have been proposed to combat bacterial infections. However, a deeper understanding of the biophysical properties of the bacterial cell membrane is crucial to developing effective and specific antibiotics. In this study, Martini coarse-grain molecular dynamics simulations were used to investigate the interplay between membrane composition and biophysical properties of the inner membrane across four pathogenic bacterial species: <i>Klebsiella pneumoniae</i>, <i>Pseudomonas aeruginosa</i>, <i>Enterobacter cloacae</i>, and <i>Escherichia coli</i>. The simulations indicate the impact of species-specific membrane composition on the overall membrane properties. Specifically, the cardiolipin concentration in the inner membrane is a key factor influencing the membrane features. Model membranes with varying concentrations of bacterial lipids (phosphatidylglycerol, phosphatidylethanolamine, and cardiolipin) further support the significant role of cardiolipin in determining the membrane biophysical properties. The bacterial inner membrane models developed in this work pave the way for future simulations of bacterial membrane proteins and for simulations investigating novel strategies aimed at disrupting the bacterial membrane to treat antibiotic-resistant infections.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":"1453-1464"},"PeriodicalIF":5.6,"publicationDate":"2025-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143057413","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}