Rolf David, Miguel de la Puente, Axel Gomez, Olaia Anton, Guillaume Stirnemann, Damien Laage
{"title":"ArcaNN: automated enhanced sampling generation of training sets for chemically reactive machine learning interatomic potentials.","authors":"Rolf David, Miguel de la Puente, Axel Gomez, Olaia Anton, Guillaume Stirnemann, Damien Laage","doi":"10.1039/d4dd00209a","DOIUrl":"10.1039/d4dd00209a","url":null,"abstract":"<p>The emergence of artificial intelligence is profoundly impacting computational chemistry, particularly through machine-learning interatomic potentials (MLIPs). Unlike traditional potential energy surface representations, MLIPs overcome the conventional computational scaling limitations by offering an effective combination of accuracy and efficiency for calculating atomic energies and forces to be used in molecular simulations. These MLIPs have significantly enhanced molecular simulations across various applications, including large-scale simulations of materials, interfaces, chemical reactions, and beyond. Despite these advances, the construction of training datasets, a critical component for the accuracy of MLIPs, has not received proportional attention, especially in the context of chemical reactivity, which depends on rare barrier-crossing events that are not easily included in the datasets. Here we address this gap by introducing ArcaNN, a comprehensive framework designed for generating training datasets for reactive MLIPs. ArcaNN employs a concurrent learning approach combined with advanced sampling techniques to ensure an accurate representation of high-energy geometries. The framework integrates automated processes for iterative training, exploration, new configuration selection, and energy and force labeling, all while ensuring reproducibility and documentation. We demonstrate ArcaNN's capabilities through two paradigm reactions: a nucleophilic substitution and a Diels-Alder reaction. 
These examples showcase its effectiveness, the uniformly low error of the resulting MLIP everywhere along the chemical reaction coordinate, and its potential for broad applications in reactive molecular dynamics. Finally, we provide guidelines for assessing the quality of MLIPs in reactive systems.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" ","pages":""},"PeriodicalIF":6.2,"publicationDate":"2024-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11563209/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142649564","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bradley P. Sutliff, Peter A. Beaucage, Debra J. Audus, Sara V. Orski and Tyler B. Martin
{"title":"Sorting polyolefins with near-infrared spectroscopy: identification of optimal data analysis pipelines and machine learning classifiers†‡","authors":"Bradley P. Sutliff, Peter A. Beaucage, Debra J. Audus, Sara V. Orski and Tyler B. Martin","doi":"10.1039/D4DD00235K","DOIUrl":"https://doi.org/10.1039/D4DD00235K","url":null,"abstract":"<p >Polyolefins (POs) are the largest class of polymers produced worldwide. Despite the intrinsic chemical similarities within this class of polymers, they are often physically incompatible. This combination presents a significant hurdle for high-throughput recycling systems that strive to sort various types of plastics from one another. Some research has been done to show that near-infrared spectroscopy (NIR) can sort POs from other plastics, but they generally fall short of sorting POs from one another. In this work, we enhance NIR spectroscopy-based sortation by screening over 12 000 machine-learning pipelines to enable sorting of PO species beyond what is possible using current NIR databases. These pipelines include a series of scattering corrections, filtering and differentiation, data scaling, dimensionality reduction, and machine learning classifiers. Common scattering corrections and preprocessing steps include scatter correction, linear detrending, and Savitzky–Golay filtering. Dimensionality reduction techniques such as principal component analysis (PCA), functional principal component analysis (fPCA) and uniform manifold approximation and projection (UMAP) were also investigated for classification enhancements. This analysis of preprocessing steps and classification algorithm combinations identified multiple data pipelines capable of successfully sorting PO materials with over 95% accuracy. Through rigorous testing, this study provides recommendations for consistently applying preprocessing and classification techniques without over-complicating the data analysis. 
This work also provides a set of preprocessing steps, a chosen classifier, and tuned hyperparameters that may be useful for benchmarking new models and data sets. Finally, the approach outlined here is ready to be applied by the developers of materials sortation equipment so that we can improve the value and purity of recycled plastic waste streams.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 11","pages":" 2341-2355"},"PeriodicalIF":6.2,"publicationDate":"2024-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00235k?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595181","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tim Rensmeyer, Ben Craig, Denis Kramer and Oliver Niggemann
{"title":"High accuracy uncertainty-aware interatomic force modeling with equivariant Bayesian neural networks†","authors":"Tim Rensmeyer, Ben Craig, Denis Kramer and Oliver Niggemann","doi":"10.1039/D4DD00183D","DOIUrl":"https://doi.org/10.1039/D4DD00183D","url":null,"abstract":"<p > <em>Ab initio</em> molecular dynamics simulations of material properties have become a cornerstone in the development of novel materials for a wide range of applications such as battery technology and catalysis. Unfortunately, their high computational demand can make them unsuitable in many applications. Consequently, surrogate modeling <em>via</em> neural networks has become an active field of research. Two of the major obstacles to their practical application in many cases are assessing the reliability of the neural network predictions and the difficulty of generating suitable datasets to train the neural network in the first place. Bayesian neural networks offer a promising framework for modeling uncertainty, active learning and improving data efficiency and robustness by incorporating prior physical knowledge. However, due to the high computational demand and slow convergence of the gold standard approach of Monte Carlo Markov Chain (MCMC) sampling methods, variational inference <em>via</em> Monte Carlo dropout is currently the only sampling method successfully applied in this domain. Since MCMC methods have often displayed a superior quality in their uncertainty quantification, developing a suitable MCMC method in this domain would be a significant advance in making neural network-based molecular dynamics simulations more practically viable. In this paper, we demonstrate that convergence for state-of-the-art models with high-quality MCMC methods can still be achieved in a practical amount of time by introducing a novel parameter-specific adaptive step size scheme. 
In addition, we introduce a new stochastic neural network model based on the NequIP architecture and demonstrate that, when combined with our novel sampling algorithm, we obtain predictions with state-of-the-art accuracy as well as a significantly improved measure of uncertainty over Monte Carlo dropout. Lastly, we show that the proposed algorithm can even outperform deep ensembles while sampling from a single Markov chain.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 11","pages":" 2356-2366"},"PeriodicalIF":6.2,"publicationDate":"2024-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00183d?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595182","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
S. Shayan Mousavi Masouleh, Corey A. Sanz, Ryan P. Jansonius, Samuel Shi, Maria J. Gendron Romero, Jason E. Hein and Jason Hattrick-Simpers
{"title":"Artificial intelligence-enabled optimization of battery-grade lithium carbonate production†","authors":"S. Shayan Mousavi Masouleh, Corey A. Sanz, Ryan P. Jansonius, Samuel Shi, Maria J. Gendron Romero, Jason E. Hein and Jason Hattrick-Simpers","doi":"10.1039/D4DD00159A","DOIUrl":"https://doi.org/10.1039/D4DD00159A","url":null,"abstract":"<p >By 2035, the need for battery-grade lithium is expected to quadruple. About half of this lithium is currently sourced from brines and must be converted from lithium chloride into lithium carbonate (Li<small><sub>2</sub></small>CO<small><sub>3</sub></small>) through a process called softening. Conventional softening methods using sodium or potassium salts contribute to carbon emissions during reagent mining and battery manufacturing, exacerbating global warming. This study introduces an alternative approach using carbon dioxide (CO<small><sub>2(g)</sub></small>) as the carbonating reagent in the lithium softening process, offering a carbon capture solution. We employed an active learning-driven high-throughput method to rapidly capture CO<small><sub>2(g)</sub></small> and convert it to lithium carbonate. The model was simplified by focusing on the elemental concentrations of C, Li, and N for practical measurement and tracking, avoiding the complexities of ion speciation equilibria. 
This approach led to an optimized lithium carbonate process that capitalizes on CO<small><sub>2(g)</sub></small> capture and improves the battery metal supply chain's carbon efficiency.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 11","pages":" 2320-2326"},"PeriodicalIF":6.2,"publicationDate":"2024-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00159a?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595179","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Benedikt Winter, Clemens Winter, Johannes Schilling and André Bardow
{"title":"Correction: A smile is all you need: predicting limiting activity coefficients from SMILES with natural language processing","authors":"Benedikt Winter, Clemens Winter, Johannes Schilling and André Bardow","doi":"10.1039/D4DD90045F","DOIUrl":"10.1039/D4DD90045F","url":null,"abstract":"<p >Correction for ‘A smile is all you need: predicting limiting activity coefficients from SMILES with natural language processing’ by Benedikt Winter <em>et al.</em>, <em>Digital Discovery</em>, 2022, <strong>1</strong>, 859–869, https://doi.org/10.1039/D2DD00058J.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 11","pages":" 2384-2384"},"PeriodicalIF":6.2,"publicationDate":"2024-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11472119/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142482280","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Heqian Zhang, Jiaquan Huang, Xiaoyu Wang, Zhizeng Gao, Song Meng, Hang Li, Shanshan Zhou, Shang Wang, Shan Wang, Xunyou Yan, Xinwei Yang, Xiaoluo Huang and Zhiwei Qin
{"title":"Embedding DNA-based natural language in microbes for the benefit of future researchers†","authors":"Heqian Zhang, Jiaquan Huang, Xiaoyu Wang, Zhizeng Gao, Song Meng, Hang Li, Shanshan Zhou, Shang Wang, Shan Wang, Xunyou Yan, Xinwei Yang, Xiaoluo Huang and Zhiwei Qin","doi":"10.1039/D4DD00251B","DOIUrl":"https://doi.org/10.1039/D4DD00251B","url":null,"abstract":"<p >Microorganisms are valuable resources as antibiotic producers, biocontrol agents, and symbiotic agents in various ecosystems and organisms. Over the past decades, there has been a notable increase in the identification and generation of both wild-type and genetically modified microbial strains from research laboratories worldwide. However, a substantial portion of the information represented in these strains remains scattered across the scientific literature. To facilitate the work of future researchers, in this perspective article, we advocate the adoption of the DNA-based natural language (DBNL) algorithm standard and then demonstrate it using a <em>Streptomyces</em> species as a proof of concept. This standard enables the sophisticated genome sequencing and subsequent extraction of valuable information encoded within a particular microbial species. In addition, it allows the access of such information for the continued research and applications even if a currently cultivated microbe cannot be cultured in the future. 
Embracing the DBNL algorithm standard promises to enhance the efficiency and effectiveness of microbial research, paving the way for innovative solutions and discoveries in diverse fields.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 11","pages":" 2377-2383"},"PeriodicalIF":6.2,"publicationDate":"2024-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00251b?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595184","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yuyan Yang, Yifei Lin, Shengnan Dai, Yifan Zhu, Jinyang Xi, Lili Xi, Xiaokun Gu, David J. Singh, Wenqing Zhang and Jiong Yang
{"title":"HH130: a standardized database of machine learning interatomic potentials, datasets, and its applications in the thermal transport of half-Heusler thermoelectrics†","authors":"Yuyan Yang, Yifei Lin, Shengnan Dai, Yifan Zhu, Jinyang Xi, Lili Xi, Xiaokun Gu, David J. Singh, Wenqing Zhang and Jiong Yang","doi":"10.1039/D4DD00240G","DOIUrl":"https://doi.org/10.1039/D4DD00240G","url":null,"abstract":"<p >High-throughput screening of thermoelectric materials from databases requires efficient and accurate computational methods. Machine-learning interatomic potentials (MLIPs) provide a promising avenue, facilitating the development of database-driven thermal transport applications through high-throughput simulations. However, the present challenge is the lack of standardized databases and openly available models for precise large-scale simulations. Here, we introduce HH130, a standardized database for 130 half-Heusler (HH) compounds in MatHub-3d (http://www.mathub3d.net), containing both MLIP models and datasets for the thermal transport of HH thermoelectrics. HH130 contains 31 891 total configurations (∼245 configurations per HH) and 390 MLIP models (three models per HH), generated using the dual adaptive sampling method to cover a wide range of thermodynamic conditions, and can be openly accessed on MatHub-3d. Comprehensive validation against first-principles calculations demonstrates that the MLIP models accurately predict energies, forces, and interatomic force constants (IFCs). The MLIP models in HH130 enabled us to efficiently perform four-phonon interactions for 80 HHs with phonon frequencies closely matching <em>ab initio</em> results. It is found that HHs with an 8 valence electron count (VEC) per unit cell generally exhibit lower lattice thermal conductivities (<em>κ</em><small><sub>L</sub></small>s) compared to those with an 18 VEC, due to a combination of low 2nd-order IFCs and large scattering phase spaces in the former group. 
Additionally, we identified several HHs that demonstrate significant reductions in <em>κ</em><small><sub>L</sub></small> due to four-phonon interactions. HH130 provides a robust platform for high-throughput computation of <em>κ</em><small><sub>L</sub></small> and aids in the discovery of next-generation thermoelectrics through machine learning.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 11","pages":" 2201-2210"},"PeriodicalIF":6.2,"publicationDate":"2024-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00240g?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142594944","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dingqi Nai, Gabriel S. Gusmão, Zachary A. Kilwein, Fani Boukouvala and Andrew J. Medford
{"title":"Micro-kinetic modeling of temporal analysis of products data using kinetics-informed neural networks†","authors":"Dingqi Nai, Gabriel S. Gusmão, Zachary A. Kilwein, Fani Boukouvala and Andrew J. Medford","doi":"10.1039/D4DD00163J","DOIUrl":"https://doi.org/10.1039/D4DD00163J","url":null,"abstract":"<p >The temporal analysis of products (TAP) technique produces extensive transient kinetic data sets, but it is challenging to translate the large quantity of raw data into physically interpretable kinetic models, largely due to the computational scaling of existing numerical methods for fitting TAP data. In this work, we utilize kinetics-informed neural networks (KINNs), which are artificial feedforward neural networks designed to solve ordinary differential equations constrained by micro-kinetic models, to model the TAP data. We demonstrate that, under the assumption that all concentrations are known in the thin catalyst zone, KINNs can simultaneously fit the transient data, retrieve the kinetic model parameters, and interpolate unseen pulse behavior for multi-pulse experiments. We further demonstrate that, by modifying the loss function, KINNs maintain these capabilities even when precise thin-zone information is unavailable, as would be the case with real experimental TAP data. We also compare the approach to existing optimization techniques, which reveals improved noise tolerance and performance in extracting kinetic parameters. 
The KINNs approach offers an efficient alternative for TAP analysis and can assist in interpreting transient kinetics in complex systems over long timescales.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 11","pages":" 2327-2340"},"PeriodicalIF":6.2,"publicationDate":"2024-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00163j?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595180","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nils van Staalduinen and Christoph Bannwarth
{"title":"MolBar: a molecular identifier for inorganic and organic molecules with full support of stereoisomerism†","authors":"Nils van Staalduinen and Christoph Bannwarth","doi":"10.1039/D4DD00208C","DOIUrl":"https://doi.org/10.1039/D4DD00208C","url":null,"abstract":"<p >Before a new molecular structure is registered to a chemical structure database, a duplicate check is essential to ensure the integrity of the database. The Simplified Molecular Input Line Entry Specification (SMILES) and the IUPAC International Chemical Identifier (InChI) stand out as widely used molecular identifiers for these checks. Notable limitations arise when dealing with molecules from inorganic chemistry or structures characterized by non-central stereochemistry. When the stereoinformation needs to be assigned to a group of atoms, widely used identifiers cannot describe axial and planar chirality due to the atom-centered description of a molecule. To address this limitation, we introduce a novel chemical identifier called the Molecular Barcode (MolBar). Motivated by the field of theoretical chemistry, a fragment-based approach is used in addition to the conventional atomistic description. In this approach, the 3D structure of fragments is normalized using a specialized force field and characterized by physically inspired matrices derived solely from atomic positions. The resulting permutation-invariant representation is constructed from the eigenvalue spectra, providing comprehensive information on both bonding and stereochemistry. The robustness of MolBar is demonstrated through duplication and permutation invariance tests on the Molecule3D dataset of 3.9 million molecules. 
A Python implementation is available as open source and can be installed <em>via pip install molbar</em>.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 11","pages":" 2298-2319"},"PeriodicalIF":6.2,"publicationDate":"2024-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00208c?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595178","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Keiichi Okubo, Jaydeep Thik, Tomoya Yamaguchi and Chen Ling
{"title":"Computer vision enabled high-quality electrochemical experimentation","authors":"Keiichi Okubo, Jaydeep Thik, Tomoya Yamaguchi and Chen Ling","doi":"10.1039/D4DD00213J","DOIUrl":"https://doi.org/10.1039/D4DD00213J","url":null,"abstract":"<p >The rotating disk electrode (RDE) technique is an essential tool for studying the activity, stability, and other fundamental properties of electrocatalysts. High-quality RDE experimentation requires evenly coating the catalyst layer on the electrode surface, which relies heavily on experience and currently lacks necessary quality control. The lack of an adequate evaluation method to ensure the quality of RDE experimentation, aside from conventional judgment based on expertise, reduces efficiency, complicates data interpretation, and hinders future automation of RDE experimentation. Here we propose a simple, easy-to-execute and non-destructive method that combines microscopy imaging and artificial intelligence-based decision-making to assess the quality of as-prepared electrodes. We develop a convolutional neural network-based method that uses microscopic images of as-prepared electrodes to directly evaluate the sample quality. In a study of electrodes used for the oxygen reduction reaction, the model achieved an accuracy of over 80% in predicting sample qualities. Our method enables the removal of low-quality samples prior to the actual RDE test, thereby ensuring high-quality electrochemical experimentation and paving the way towards high-quality automated electrochemical experimentation. 
This approach is applicable to various electrochemical systems and highlights the potential of artificial intelligence in automated experimentation.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 11","pages":" 2183-2191"},"PeriodicalIF":6.2,"publicationDate":"2024-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00213j?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142594942","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}