{"title":"AlabOS: a Python-based reconfigurable workflow management framework for autonomous laboratories","authors":"Yuxing Fei, Bernardus Rendy, Rishi Kumar, Olympia Dartsi, Hrushikesh P. Sahasrabuddhe, Matthew J. McDermott, Zheren Wang, Nathan J. Szymanski, Lauren N. Walters, David Milsted, Yan Zeng, Anubhav Jain and Gerbrand Ceder","doi":"10.1039/D4DD00129J","DOIUrl":"https://doi.org/10.1039/D4DD00129J","url":null,"abstract":"<p >The recent advent of autonomous laboratories, coupled with algorithms for high-throughput screening and active learning, promises to accelerate materials discovery and innovation. As these autonomous systems grow in complexity, the demand for robust and efficient workflow management software becomes increasingly critical. In this paper, we introduce AlabOS, a general-purpose software framework for orchestrating experiments and managing resources, with an emphasis on automated laboratories for materials synthesis and characterization. AlabOS features a reconfigurable experiment workflow model and a resource reservation mechanism, enabling the simultaneous execution of varied workflows composed of modular tasks while eliminating conflicts between tasks. To showcase its capability, we demonstrate the implementation of AlabOS in a prototype autonomous materials laboratory, the A-Lab, with around 3500 samples synthesized over 1.5 years.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":null,"pages":null},"PeriodicalIF":6.2,"publicationDate":"2024-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00129j?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data-driven exploration of silver nanoplate formation in multidimensional chemical design spaces†","authors":"Huat Thart Chiang, Kiran Vaddi and Lilo Pozzo","doi":"10.1039/D4DD00211C","DOIUrl":"https://doi.org/10.1039/D4DD00211C","url":null,"abstract":"<p >We present an autonomous data-driven framework that iteratively explores the experimental design space of silver nanoparticle synthesis to obtain control over the formation of a desired morphology and size. The objective of the method is to identify design rules such as the effects of the design variables on the structure of the nanoparticle. The framework balances multimodal characterization methods (<em>i.e.</em> UV-vis spectroscopy, SAXS, TEM), taking into account the cost of performing a measurement and the quality of information gained. By integrating with an AI agent, we identify important design variables in the synthesis of small colloidally stable plate-like silver particles and outline how each variable affects plate thickness, radius, polydispersity, and relative concentration. Our findings are consistent with the literature, demonstrating that the framework could be further applied to new systems that have not been well characterized and understood. The framework is generalizable and allows tangible knowledge extraction from the high-throughput experimental runs while still considering inherent stochasticity.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":null,"pages":null},"PeriodicalIF":6.2,"publicationDate":"2024-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00211c?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595175","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cost-informed Bayesian reaction optimization†","authors":"Alexandre A. Schoepfer, Jan Weinreich, Ruben Laplaza, Jerome Waser and Clemence Corminboeuf","doi":"10.1039/D4DD00225C","DOIUrl":"https://doi.org/10.1039/D4DD00225C","url":null,"abstract":"<p >Bayesian optimization (BO) is an efficient method for solving complex optimization problems, including those in chemical research, where it is gaining significant popularity. Although effective in guiding experimental design, BO does not account for experimentation costs: testing readily available reagents under different conditions could be more cost- and time-effective than synthesizing or buying additional ones. To address this issue, we present cost-informed BO (CIBO), an approach tailored for the rational planning of chemical experimentation that prioritizes the most cost-effective experiments. Reagents are used only when their anticipated improvement in reaction performance sufficiently outweighs their costs. Our algorithm tracks available reagents, including those recently acquired, and dynamically updates their cost during the optimization. Using literature data of Pd-catalyzed reactions, we show that CIBO reduces the cost of reaction optimization by up to 90% compared to standard BO. Our approach is compatible with any type of cost, <em>e.g.</em>, of buying equipment or compounds, waiting time, as well as environmental or security concerns. We believe CIBO extends the possibilities of BO in chemistry and envision applications for both traditional and self-driving laboratories for experiment planning.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":null,"pages":null},"PeriodicalIF":6.2,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11465108/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142486074","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Machine learning-assisted analysis of dry and lubricated tribological properties of Al–Co–Cr–Fe–Ni high entropy alloy","authors":"Saurabh Vashistha, Bashista Kumar Mahanta, Vivek Kumar Singh, Neha Sharma, Anjan Ray, Saurabh Dixit and Shailesh Kumar Singh","doi":"10.1039/D4DD00169A","DOIUrl":"https://doi.org/10.1039/D4DD00169A","url":null,"abstract":"<p >This study marks a notable advancement in tribology by thoroughly investigating the tribological properties of a high-entropy alloy under both lubricated and dry conditions. The research encompasses a detailed evaluation of the alloy's wear behavior, utilizing a data-driven modeling approach that employs an evolutionary framework to build and validate a predictive model. The findings offer critical insights into the tribological performance of high-entropy alloys under diverse operational and lubrication conditions. Specifically, the Al–Co–Cr–Fe–Ni alloy exhibits exceptional tribological properties, with a coefficient of friction ranging from 0.0165 to 0.6024 and surface roughness between 0.261 and 1.11. A data-driven methodology was employed to develop a predictive model with an accuracy exceeding 94%, effectively capturing the precise trends in lubrication behavior and providing in-depth information on surface characteristics for future experimental endeavors and data extraction. Additionally, the study underscores the profound impact of lubricant chemical composition on the wear behavior of the alloy, highlighting the crucial importance of selecting appropriate lubricants for specific tribological applications.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":null,"pages":null},"PeriodicalIF":6.2,"publicationDate":"2024-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00169a?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595150","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A comprehensive review of emerging approaches in machine learning for de novo PROTAC design","authors":"Yossra Gharbi and Rocío Mercado","doi":"10.1039/D4DD00177J","DOIUrl":"https://doi.org/10.1039/D4DD00177J","url":null,"abstract":"<p >Targeted protein degradation (TPD) is a rapidly growing field in modern drug discovery that aims to regulate the intracellular levels of proteins by harnessing the cell's innate degradation pathways to selectively target and degrade disease-related proteins. This strategy creates new opportunities for therapeutic intervention in cases where occupancy-based inhibitors have not been successful. Proteolysis-targeting chimeras (PROTACs) are at the heart of TPD strategies, which leverage the ubiquitin–proteasome system for the selective targeting and proteasomal degradation of pathogenic proteins. This unique mechanism can be particularly useful for dealing with proteins that were once deemed “undruggable” using conventional small-molecule drugs. PROTACs are hetero-bifunctional molecules consisting of two ligands, connected by a chemical linker. As the field evolves, it becomes increasingly apparent that traditional methodologies for designing such complex molecules have limitations. This has led to the use of machine learning (ML) and generative modeling to improve and accelerate the development process. In this review, we aim to provide a thorough exploration of the impact of ML on <em>de novo</em> PROTAC design – an aspect of molecular design that has not been comprehensively reviewed despite its significance. Initially, we delve into the distinct characteristics of PROTAC linker design, underscoring the complexities required to create effective bifunctional molecules capable of TPD. We then examine how ML in the context of fragment-based drug design (FBDD), honed in the realm of small-molecule drug discovery, is paving the way for PROTAC linker design. Our review provides a critical evaluation of the limitations inherent in applying this method to the complex field of PROTAC development. Moreover, we review existing ML works applied to PROTAC design, highlighting pioneering efforts and, importantly, the limitations these studies face. By offering insights into the current state of PROTAC development and the integral role of ML in PROTAC design, we aim to provide valuable perspectives for biologists, chemists, and ML practitioners alike in their pursuit of better design strategies for this new modality.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":null,"pages":null},"PeriodicalIF":6.2,"publicationDate":"2024-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00177j?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142594940","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning material synthesis–process–structure–property relationship by data fusion: Bayesian co-regionalization N-dimensional piecewise function learning†","authors":"A. Gilad Kusne, Austin McDannald and Brian DeCost","doi":"10.1039/D4DD00048J","DOIUrl":"https://doi.org/10.1039/D4DD00048J","url":null,"abstract":"<p >Autonomous materials research labs require the ability to combine and learn from diverse data streams. This is especially true for learning material synthesis–process–structure–property relationships, key to accelerating materials optimization and discovery as well as accelerating mechanistic understanding. We present the Synthesis–process–structure–property relAtionship coreGionalized lEarner (SAGE) algorithm, a fully Bayesian algorithm that uses multimodal coregionalization and probability to merge knowledge across data sources into a unified model of synthesis–process–structure–property relationships. SAGE outputs a probabilistic posterior including the most likely relationship given the data along with proper uncertainty quantification. Beyond autonomous systems, SAGE will allow materials researchers to unify knowledge across their lab toward making better experiment design decisions.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":null,"pages":null},"PeriodicalIF":6.2,"publicationDate":"2024-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00048j?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AMPERE: automated modular platform for expedited and reproducible electrochemical testing†","authors":"Jehad Abed, Yang Bai, Daniel Persaud, Jiheon Kim, Julia Witt, Jason Hattrick-Simpers and Edward H. Sargent","doi":"10.1039/D4DD00203B","DOIUrl":"https://doi.org/10.1039/D4DD00203B","url":null,"abstract":"<p >Rapid and reliable electrochemical screening is critical to accelerate the development of catalysts for sustainable energy generation and storage. This paper introduces an automated and modular platform for expedited and reproducible electrochemical testing (AMPERE), designed to enhance the efficiency and reliability of multivariate optimization. The platform integrates a liquid-handling robot with custom-made modular array reactors, offering sample preparation and electrochemical testing in the same platform. Additionally, we use offline inductively coupled plasma optical emission spectroscopy (ICP-OES) to measure metal concentrations in the electrolyte after the reaction, which serves as a proxy for assessing the electrochemical stability. We use the platform to conduct 168 experiments continuously in less than 40 hours to examine the influence of catalyst ink formulation on the performance of Ir, Ru, IrO<small><sub>2</sub></small>, and RuO<small><sub>2</sub></small> for the oxygen evolution reaction (OER) in acid. We specifically investigate the role of solvent type and concentration, catalyst concentration, and binder content on the performance. We find that Ru/RuO<small><sub>2</sub></small> catalysts show improvements in activity that are not directly linked to improvements in the electrochemical surface area or inversely correlated to Ru dissolution. This suggests a complex interplay between the catalytic performance of the drop-casted catalyst film and ink formulation. AMPERE simplifies catalyst preparation and testing at large scale, making it faster, more reliable, and accessible for widespread use.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":null,"pages":null},"PeriodicalIF":6.2,"publicationDate":"2024-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00203b?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595176","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Composite machine learning strategy for natural products taxonomical classification and structural insights†","authors":"Qisong Xu, Alan K. X. Tan, Liangfeng Guo, Yee Hwee Lim, Dillon W. P. Tay and Shi Jun Ang","doi":"10.1039/D4DD00155A","DOIUrl":"https://doi.org/10.1039/D4DD00155A","url":null,"abstract":"<p >Taxonomical classification of natural products (NPs) can assist in genomic and phylogenetic analysis of source organisms and facilitate streamlining of bioprospecting efforts. Here, a composite machine learning strategy marrying graph convolutional neural networks (GCNNs) and eXtreme Gradient Boosting (XGB) is proposed and validated for taxonomical classification of NPs in five kingdoms (Animalia, Bacteria, Chromista, Fungi, and Plantae). Our composite model, trained on 133 092 NPs from the LOTUS database, achieved five-fold cross-validated classification accuracy of 97.4%. When employed to classify out-of-sample NPs from the NP Atlas database, accuracies of 82.8% for bacteria and 86.6% for fungi were obtained. Dimensionality-reduced representations of the molecular embeddings from our composite model revealed distinct clusters of NPs that suggest a basis for enhanced classification performance. The top critical substructures from the NPs of each kingdom were also identified and compared to provide insights on structure–taxonomy relationships. Overall, this study showcases the potential of composite machine learning models for robust taxonomical classification of NPs, which can streamline discovery of NPs.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":null,"pages":null},"PeriodicalIF":6.2,"publicationDate":"2024-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00155a?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142594943","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Stability and transferability of machine learning force fields for molecular dynamics applications†","authors":"Salatan Duangdangchote, Dwight S. Seferos and Oleksandr Voznyy","doi":"10.1039/D4DD00140K","DOIUrl":"https://doi.org/10.1039/D4DD00140K","url":null,"abstract":"<p >In this study, we focus on simplifying the generation of Machine Learning Force Fields (MLFFs) for Molecular Dynamics (MD) simulations of inorganic materials, with an emphasis on sustainable use of computational resources. We evaluate the efficiency and accuracy of existing state-of-the-art graph neural network (GNN) models and introduce new benchmarks that go beyond conventional mean absolute error on forces and energies. We showcase our methodology on the example of lithium-ion conductor materials, paving the way to a broader screening of ionic conductors for batteries and fuel cells.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":null,"pages":null},"PeriodicalIF":6.2,"publicationDate":"2024-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00140k?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142594941","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Transfer learning based on atomic feature extraction for the prediction of experimental 13C chemical shifts†","authors":"Žarko Ivković, Jesús Jover and Jeremy Harvey","doi":"10.1039/D4DD00168K","DOIUrl":"https://doi.org/10.1039/D4DD00168K","url":null,"abstract":"<p >Forecasting experimental chemical shifts of organic compounds is a long-standing challenge in organic chemistry. Recent advances in machine learning (ML) have led to routines that surpass the accuracy of <em>ab initio</em> Density Functional Theory (DFT) in estimating experimental <small><sup>13</sup></small>C shifts. The extraction of knowledge from other models, known as transfer learning, has demonstrated remarkable improvements, particularly in scenarios with limited data availability. However, the extent to which transfer learning improves predictive accuracy in low-data regimes for experimental chemical shift predictions remains unexplored. This study indicates that atomic features derived from a message passing neural network (MPNN) forcefield are robust descriptors for atomic properties. A dense network utilizing these descriptors to predict <small><sup>13</sup></small>C shifts achieves a mean absolute error (MAE) of 1.68 ppm. When these features are used as node labels in a simple graph neural network (GNN), the model attains a better MAE of 1.34 ppm. On the other hand, embeddings from a self-supervised pre-trained 3D aware transformer are not sufficiently descriptive for a feedforward model but show reasonable accuracy within the GNN framework, achieving an MAE of 1.51 ppm. Under low-data conditions, all transfer-learned models show a significant improvement in predictive accuracy compared to existing literature models, regardless of the sampling strategy used to select from the pool of unlabeled examples. We demonstrated that extracting atomic features from models trained on large and diverse datasets is an effective transfer learning strategy for predicting NMR chemical shifts, achieving results on par with existing literature models. This method provides several benefits, such as reduced training times, simpler models with fewer trainable parameters, and strong performance in low-data scenarios, without the need for costly <em>ab initio</em> data of the target property. This technique can be applied to other chemical tasks, opening many new potential applications where the amount of data is a limiting factor.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":null,"pages":null},"PeriodicalIF":6.2,"publicationDate":"2024-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00168k?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}