Bowen Han, Ryotaro Okabe, Abhijatmedhi Chotrattanapituk, Mouyang Cheng, Mingda Li and Yongqiang Cheng
{"title":"AI-powered exploration of molecular vibrations, phonons, and spectroscopy","authors":"Bowen Han, Ryotaro Okabe, Abhijatmedhi Chotrattanapituk, Mouyang Cheng, Mingda Li and Yongqiang Cheng","doi":"10.1039/D4DD00353E","DOIUrl":"https://doi.org/10.1039/D4DD00353E","url":null,"abstract":"<p >The vibrational dynamics of molecules and solids play a critical role in defining material properties, particularly their thermal behaviors. However, theoretical calculations of these dynamics are often computationally intensive, while experimental approaches can be technically complex and resource-demanding. Recent advancements in data-driven artificial intelligence (AI) methodologies have substantially enhanced the efficiency of these studies. This review explores the latest progress in AI-driven methods for investigating atomic vibrations, emphasizing their role in accelerating computations and enabling rapid predictions of lattice dynamics, phonon behaviors, molecular dynamics, and vibrational spectra. Key developments are discussed, including advancements in databases, structural representations, machine-learning interatomic potentials, graph neural networks, and other emerging approaches. Compared to traditional techniques, AI methods exhibit transformative potential, dramatically improving the efficiency and scope of research in materials science. 
The review concludes by highlighting the promising future of AI-driven innovations in the study of atomic vibrations.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 3","pages":" 584-624"},"PeriodicalIF":6.2,"publicationDate":"2025-02-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/dd/d4dd00353e?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143602085","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jordi Buils, Diego Garay-Ruiz, Enric Petrus, Mireia Segado-Centellas and Carles Bo
{"title":"Towards a universal scaling method for predicting equilibrium constants of polyoxometalates†","authors":"Jordi Buils, Diego Garay-Ruiz, Enric Petrus, Mireia Segado-Centellas and Carles Bo","doi":"10.1039/D4DD00358F","DOIUrl":"https://doi.org/10.1039/D4DD00358F","url":null,"abstract":"<p >The computational prediction of equilibrium constants is still an open problem for a wide variety of relevant chemical systems. In particular, acid dissociation constants (p<em>K</em><small><sub>a</sub></small>) are an essential asset in biological, synthetic and industrial chemistry whose prediction encounters several difficulties, requiring the development of novel strategies. The self-assembly of polyoxometalates (POMs) is another complex problem where acid-base reactions play a central role; the successful prediction of the formation constants of these structures is intimately linked with the limitations of p<em>K</em><small><sub>a</sub></small> determination. Our methodology POMSimulator enables the prediction of these polyoxometalate formation constants from Density Functional Theory (DFT) calculations, using the experimental <em>K</em><small><sub>f</sub></small> values available in the literature to fit the resulting predictions. In this work, we carry out a systematic analysis of a very large number of POM formation constants already predicted through the application of POMSimulator. We then propose a universal scaling scheme for the adjustment of the DFT-based formation constants of POMs, relying on a linear scaling of the form <em>y</em> = <em>mx</em> + <em>b</em>. Here, the slope (<em>m</em>) is a constant parameter – hence, universal towards the nature of the polyoxometalate and the calculation method. The intercept (<em>b</em>), in contrast, is a system-dependent parameter that can be predicted with a multi-linear regression model trained with statistical aggregates of the non-scaled formation constants. 
Thus, we are able to successfully predict the speciation and phase diagrams of POM systems for which available experimental data are minimal, as well as provide a general scaling scheme that might be extended to other kinds of chemical systems.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 4","pages":" 970-978"},"PeriodicalIF":6.2,"publicationDate":"2025-02-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/dd/d4dd00358f?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143809090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quantitative evaluation of anharmonic bond potentials for molecular simulations†","authors":"Paul J. van Maaren and David van der Spoel","doi":"10.1039/D4DD00344F","DOIUrl":"https://doi.org/10.1039/D4DD00344F","url":null,"abstract":"<p >Most general force fields only implement a harmonic potential to model covalent bonds. In addition, in some force fields, all or a selection of the covalent bonds are constrained in molecular dynamics simulations. Nevertheless, it is possible to implement accurate bond potentials for a relatively small computational cost. Such potentials may be important for spectroscopic applications, free energy perturbation calculations or for studying reactions using empirical valence bond theory. Here, we evaluate different bond potentials for diatomic molecules. Based on quantum-chemical scans around the equilibrium distance of 71 molecules using the MP2/aug-cc-pVTZ level of theory as well as CCSD(T) with the same basis-set, we determine the quality of fit to the data of 28 model potentials. As expected, a large spread in accuracies of the potentials is found and more complex potentials generally provide a better fit. As a second and more challenging test, five spectroscopic parameters (<em>ω</em><small><sub>e</sub></small>, <em>ω</em><small><sub>e</sub></small><em>x</em><small><sub>e</sub></small>, <em>α</em><small><sub>e</sub></small>, <em>B</em><small><sub>e</sub></small> and <em>D</em><small><sub>e</sub></small>) predicted based on quantum chemistry as well as the fitted potentials are compared to experimental data. A handful of the 28 potentials tested are found to be accurate. Of these, we suggest that the potential due to Hua (<em>Phys. Rev. A</em>, <strong>42</strong> (1990), 2524) could be a suitable choice for implementation in molecular simulations codes, since it is considerably more accurate than the well-known Morse potential (<em>Phys. 
Rev.</em>, <strong>34</strong> (1929), 57) at a very similar computational cost.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 3","pages":" 824-830"},"PeriodicalIF":6.2,"publicationDate":"2025-02-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/dd/d4dd00344f?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143602066","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Samuel G. Espley, Samuel S. Allsop, David Buttar, Simone Tomasi and Matthew N. Grayson
{"title":"Correction: Distortion/interaction analysis via machine learning","authors":"Samuel G. Espley, Samuel S. Allsop, David Buttar, Simone Tomasi and Matthew N. Grayson","doi":"10.1039/D5DD90005K","DOIUrl":"https://doi.org/10.1039/D5DD90005K","url":null,"abstract":"<p >Correction for ‘Distortion/interaction analysis <em>via</em> machine learning’ by Samuel G. Espley <em>et al.</em>, <em>Digital Discovery</em>, 2024, <strong>3</strong>, 2479–2486, https://doi.org/10.1039/D4DD00224E.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 3","pages":" 879-879"},"PeriodicalIF":6.2,"publicationDate":"2025-02-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/dd/d5dd90005k?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143602071","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Renzheng Zhang, Jiaxin Xu, Hanfeng Zhang, Guoyue Xu and Tengfei Luo
{"title":"Active learning-guided exploration of thermally conductive polymers under strain†","authors":"Renzheng Zhang, Jiaxin Xu, Hanfeng Zhang, Guoyue Xu and Tengfei Luo","doi":"10.1039/D4DD00267A","DOIUrl":"https://doi.org/10.1039/D4DD00267A","url":null,"abstract":"<p >Finding amorphous polymers with higher thermal conductivity (TC) is technologically important, as they are ubiquitous in applications where heat transfer is crucial. While TC is generally low in amorphous polymers, it can be enhanced by mechanical strain, which facilitates the alignment of polymer chains. However, using the conventional Edisonian approach, the discovery of polymers that may have high TC after strain can be time-consuming, with no guarantee of success. In this work, we employ an active learning scheme to speed up the discovery of amorphous polymers with high TC under strain. Polymers under 2× strain are simulated using molecular dynamics (MD), and their TCs are calculated using non-equilibrium MD. A Gaussian process regression (GPR) model is then built using these MD data as the training set. The GPR model is used to screen the PoLyInfo database, and the predicted mean TC and uncertainty are used towards an acquisition function to recommend new polymers for labeling <em>via</em> Bayesian optimization. The TCs of these selected polymers are then labeled using MD simulations, and the obtained data are incorporated to rebuild the GPR model, initiating a new iteration of the active learning cycle. 
Over a few cycles, we identified ten strained polymers with significantly higher TC (>1 W m<small><sup>−1</sup></small> K<small><sup>−1</sup></small>) than those in the original dataset, and the results offer valuable insights into the structural characteristics favorable for achieving high TC of polymers subject to strain.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 3","pages":" 812-823"},"PeriodicalIF":6.2,"publicationDate":"2025-02-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/dd/d4dd00267a?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143602065","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Developing large language models for quantum chemistry simulation input generation†","authors":"Pieter Floris Jacobs and Robert Pollice","doi":"10.1039/D4DD00366G","DOIUrl":"https://doi.org/10.1039/D4DD00366G","url":null,"abstract":"<p >Scientists across domains are often challenged to master domain-specific languages (DSLs) for their research, which are merely a means to an end but are pervasive in fields like computational chemistry. Automated code generation promises to overcome this barrier, allowing researchers to focus on their core expertise. While large language models (LLMs) have shown impressive capabilities in synthesizing code from natural language prompts, they often struggle with DSLs, likely due to their limited exposure during training. In this work, we investigate the potential of foundational LLMs for generating input files for the quantum chemistry package ORCA by establishing a general framework that can be adapted to other DSLs. To improve upon <img> as our base model, we explore the impact of prompt engineering, retrieval-augmented generation, and finetuning <em>via</em> synthetically generated datasets. We find that finetuning, even with synthetic datasets as small as 500 samples, significantly improves performance. Additionally, we observe that finetuning shows synergism with advanced prompt engineering such as chain-of-thought prompting. Consequently, our best finetuned models outperform the formally much more powerful <img> model. In turn, finetuning GPT-4o with the same small synthetic dataset leads to a further substantial performance improvement, suggesting our approach to be more general rather than limited to LLMs with poor base proficiency. All tools and datasets are made openly available for future research. 
We believe that this research lays the groundwork for a wider adoption of LLMs for DSLs in chemistry and beyond.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 3","pages":" 762-775"},"PeriodicalIF":6.2,"publicationDate":"2025-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/dd/d4dd00366g?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143602001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Beatrice W. Soh, Aniket Chitre, Shu Zheng Tan, Yuhan Wang, Yinqi Yi, Wendy Soh, Kedar Hippalgaonkar and D. Ian Wilson
{"title":"Opentrons for automated and high-throughput viscometry†","authors":"Beatrice W. Soh, Aniket Chitre, Shu Zheng Tan, Yuhan Wang, Yinqi Yi, Wendy Soh, Kedar Hippalgaonkar and D. Ian Wilson","doi":"10.1039/D4DD00368C","DOIUrl":"https://doi.org/10.1039/D4DD00368C","url":null,"abstract":"<p >We present an improved high-throughput proxy viscometer based on the Opentrons (OT-2) automated liquid handler. The working principle of the viscometer lies in the differing rates at which air-displacement pipettes dispense liquids of different viscosities. The operating protocol involves measuring the amount of liquid dispensed over a set time for given dispense conditions. Data collected at different set dispense flow rates was used to train an ensemble machine learning regressor to predict Newtonian liquid viscosity in the range of 20–20 000 cP, with ∼450 cP error (∼8% relative to sample mean). A phenomenological model predicting the observed trends is presented and used to extend the applicability of the proxy viscometer to simple non-Newtonian liquids. As proof-of-concept, we demonstrate the ability of the proxy viscometer to characterize the rheological behavior of two types of power-law fluids.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 3","pages":" 711-722"},"PeriodicalIF":6.2,"publicationDate":"2025-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/dd/d4dd00368c?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143602058","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Emilio Nuñez-Andrade, Isaac Vidal-Daza, James W. Ryan, Rafael Gómez-Bombarelli and Francisco J. Martin-Martinez
{"title":"Embedded machine-readable molecular representation for resource-efficient deep learning applications†","authors":"Emilio Nuñez-Andrade, Isaac Vidal-Daza, James W. Ryan, Rafael Gómez-Bombarelli and Francisco J. Martin-Martinez","doi":"10.1039/D4DD00230J","DOIUrl":"https://doi.org/10.1039/D4DD00230J","url":null,"abstract":"<p >The practical implementation of deep learning methods for chemistry applications relies on encoding chemical structures into machine-readable formats that can be efficiently processed by computational tools. To this end, One Hot Encoding (OHE) is an established representation of alphanumeric categorical data in expanded numerical matrices. We have developed an embedded alternative to OHE that encodes discrete alphanumeric tokens of an <em>N</em>-sized alphabet into a few real numbers that constitute a simpler matrix representation of chemical structures. The implementation of this embedded One Hot Encoding (eOHE) in training machine learning models achieves comparable results to OHE in model accuracy and robustness while significantly reducing the use of computational resources. Our benchmarks across three molecular representations (SMILES, DeepSMILES, and SELFIES) and three different molecular databases (ZINC, QM9, and GDB-13) for Variational Autoencoders (VAEs) and Recurrent Neural Networks (RNNs) show that using eOHE reduces vRAM memory usage by up to 50% while increasing disk Memory Reduction Efficiency (MRE) to 80% on average. This encoding method opens up new avenues for data representation in embedded formats that promote energy efficiency and scalable computing in resource-constrained devices or in scenarios with limited computing resources. 
The application of eOHE impacts not only the chemistry field but also other disciplines that rely on the use of OHE.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 3","pages":" 776-789"},"PeriodicalIF":6.2,"publicationDate":"2025-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/dd/d4dd00230j?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143602002","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alexander E. Siemenn, Basita Das, Eunice Aissi, Fang Sheng, Lleyton Elliott, Blake Hudspeth, Marilyn Meyers, James Serdy and Tonio Buonassisi
{"title":"Archerfish: a retrofitted 3D printer for high-throughput combinatorial experimentation via continuous printing†","authors":"Alexander E. Siemenn, Basita Das, Eunice Aissi, Fang Sheng, Lleyton Elliott, Blake Hudspeth, Marilyn Meyers, James Serdy and Tonio Buonassisi","doi":"10.1039/D4DD00249K","DOIUrl":"https://doi.org/10.1039/D4DD00249K","url":null,"abstract":"<p >The maturation of 3D printing technology has enabled low-cost, rapid prototyping capabilities for mainstreaming accelerated product design. The materials research community has recognized this need, but no universally accepted rapid prototyping technique currently exists for material design. Toward this end, we develop Archerfish, a 3D printer retrofitted to dispense liquid with <em>in situ</em> mixing capabilities for performing high-throughput combinatorial printing (HTCP) of material compositions. Using this HTCP design, we demonstrate continuous printing throughputs of up to 250 unique compositions per minute, 100× faster than similar tools such as Opentrons that utilize stepwise printing with <em>ex situ</em> mixing. We validate the formation of these combinatorial “prototype” material gradients using hyperspectral image analysis and energy-dispersive X-ray spectroscopy. Furthermore, we describe hardware challenges to realizing reproducible, accurate, and precise composition gradients with continuous printing, including those related to precursor dispensing, mixing, and deposition. 
Despite these limitations, the continuous printing and low-cost design of Archerfish demonstrate promising accelerated materials screening results across a range of materials systems from nanoparticles to perovskites.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 4","pages":" 896-909"},"PeriodicalIF":6.2,"publicationDate":"2025-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/dd/d4dd00249k?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143809077","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Benedikt Winter, Philipp Rehner, Timm Esper, Johannes Schilling and André Bardow
{"title":"Understanding the language of molecules: predicting pure component parameters for the PC-SAFT equation of state from SMILES†","authors":"Benedikt Winter, Philipp Rehner, Timm Esper, Johannes Schilling and André Bardow","doi":"10.1039/D4DD00077C","DOIUrl":"https://doi.org/10.1039/D4DD00077C","url":null,"abstract":"<p >A major bottleneck in developing sustainable processes and materials is a lack of property data. Recently, machine learning approaches have vastly improved previous methods for predicting molecular properties. However, these machine learning models are often not able to handle thermodynamic constraints adequately. In this work, we present a machine learning model based on natural language processing to predict pure-component parameters for the perturbed-chain statistical associating fluid theory (PC-SAFT) equation of state. The model is based on our previously proposed SMILES-to-Properties-Transformer (SPT). By incorporating PC-SAFT into the neural network architecture, the machine learning model is trained directly on experimental vapor pressure and liquid density data. Combining established physical modeling approaches with state-of-the-art machine learning methods enables high-accuracy predictions across a wide range of pressures and temperatures, while keeping the thermodynamic consistency of an equation of state like PC-SAFT. SPT<small><sub>PC-SAFT</sub></small> demonstrates exceptional prediction accuracy even for complex molecules with various functional groups, outperforming traditional group contribution methods by a factor of four in the mean average percentage deviation. Moreover, SPT<small><sub>PC-SAFT</sub></small> captures the behavior of stereoisomers without any special consideration. 
To facilitate the application of our model, we provide predicted PC-SAFT parameters of 13 279 components, making PC-SAFT accessible to all researchers.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 5","pages":" 1142-1157"},"PeriodicalIF":6.2,"publicationDate":"2025-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/dd/d4dd00077c?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143943994","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}