Christofer Hardcastle, Ryan O'Mullan, Raymundo Arróyave and Brent Vela
{"title":"Physics-informed Gaussian process classification for constraint-aware alloy design†","authors":"Christofer Hardcastle, Ryan O'Mullan, Raymundo Arróyave and Brent Vela","doi":"10.1039/D5DD00084J","DOIUrl":"https://doi.org/10.1039/D5DD00084J","url":null,"abstract":"<p >Alloy design can be framed as a constraint-satisfaction problem. Building on previous methodologies, we propose equipping Gaussian Process Classifiers (GPCs) with physics-informed prior mean functions to model the centers of feasible design spaces. Through three case studies, we highlight the utility of informative priors for handling constraints on continuous and categorical properties. (1) <em>Phase stability</em>: by incorporating CALPHAD predictions as priors for solid-solution phase stability, we enhance model validation using a publicly available XRD dataset. (2) <em>Phase stability prediction refinement</em>: we demonstrate an <em>in silico</em> active learning approach to efficiently correct phase diagrams. (3) <em>Continuous property thresholds</em>: by embedding priors into continuous property models, we accelerate the discovery of alloys meeting specific property thresholds <em>via</em> active learning. 
In each case, integrating physics-based insights into the classification framework substantially improved model performance, demonstrating an efficient strategy for constraint-aware alloy design.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 7","pages":" 1884-1900"},"PeriodicalIF":6.2,"publicationDate":"2025-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/dd/d5dd00084j?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144589489","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Solving an inverse problem with generative models","authors":"John R. Kitchin","doi":"10.1039/D5DD00137D","DOIUrl":"https://doi.org/10.1039/D5DD00137D","url":null,"abstract":"<p >Inverse problems, where we seek the values of inputs to a model that lead to a desired set of outputs, are considered a more challenging problem in science and engineering than forward problems where we compute or measure outputs from known inputs. In this work we demonstrate the use of two generative machine learning methods to solve inverse problems. We compare this approach to two more conventional approaches that use a forward model with nonlinear programming, and the use of a backward model. We illustrate each method on a dataset obtained from a simple remote instrument that has three inputs: the setting of the red, green and blue channels of an RGB LED. We focus on several outputs from a light sensor that measures intensity at 445 nm, 515 nm, 590 nm, and 630 nm. The specific problem we solve is identifying inputs that lead to a specific intensity in three of those channels. We show that generative models can be used to solve this kind of inverse problem, and they have some advantages over the conventional approaches.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 7","pages":" 1856-1869"},"PeriodicalIF":6.2,"publicationDate":"2025-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/dd/d5dd00137d?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144589487","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yuchao Tang, Bin Xiao, Shuizhou Chen, Quan Qian and Yi Liu
{"title":"Predefined attention-focused mechanism using center-environment features: a machine learning study of alloying effects on the stability of Nb5Si3 alloys†","authors":"Yuchao Tang, Bin Xiao, Shuizhou Chen, Quan Qian and Yi Liu","doi":"10.1039/D5DD00079C","DOIUrl":"https://doi.org/10.1039/D5DD00079C","url":null,"abstract":"<p >Digital encoding of material structures using graph-based features combined with deep neural networks often lacks local specificity. Additionally, incorporating a self-attention mechanism increases architectural complexity and demands extensive data. To overcome these challenges, we developed a Center-Environment (CE) feature representation—a less data-intensive, physics-informed predefined attention mechanism. The pre-attention mechanism underlying the CE model shifts attention from complex black-box machine learning (ML) algorithms to explicit feature models with physical meaning, reducing data requirements while enhancing the transparency and interpretability of ML models. This CE-based ML approach was employed to investigate the alloying effects on the structural stability of Nb<small><sub>5</sub></small>Si<small><sub>3</sub></small>, guiding data-driven compositional design for ultra-high-temperature NbSi superalloys. The CE features leveraged the Atomic Environment Type (AET) method to characterize the local low-symmetry physical environments of atoms. The optimized CE<small><sub>AET</sub></small> models reasonably predicted double-site substitution energies in α-Nb<small><sub>5</sub></small>Si<small><sub>3</sub></small>, achieving a mean absolute error (MAE) of 329.43 meV per cell. The robust transferability of the CE<small><sub>AET</sub></small> models was demonstrated by their successful prediction of untrained β-Nb<small><sub>5</sub></small>Si<small><sub>3</sub></small> structures. 
Site occupancy preferences were identified for B, Si, and Al at Si sites and for Ti, Hf, and Zr at Nb sites within β-Nb<small><sub>5</sub></small>Si<small><sub>3</sub></small>. This CE-based ML approach represents a broadly applicable and intelligent computational design method capable of handling complex crystal structures with strong transferability, even when working with small datasets.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 7","pages":" 1870-1883"},"PeriodicalIF":6.2,"publicationDate":"2025-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/dd/d5dd00079c?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144589488","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Murat Cihan Sorkun, Xuan Zhou, Joannes Murigneux, Nicola Menegazzo, Ayush Kumar Narsaria, David Thanoon, Peter A. A. Klusener, Kaustubh Kaluskar, Sharan Shetty, Efstathios Barmpoutsis and Süleyman Er
{"title":"RedCat, an automated discovery workflow for aqueous organic electrolytes†","authors":"Murat Cihan Sorkun, Xuan Zhou, Joannes Murigneux, Nicola Menegazzo, Ayush Kumar Narsaria, David Thanoon, Peter A. A. Klusener, Kaustubh Kaluskar, Sharan Shetty, Efstathios Barmpoutsis and Süleyman Er","doi":"10.1039/D5DD00111K","DOIUrl":"https://doi.org/10.1039/D5DD00111K","url":null,"abstract":"<p >Developing cost-effective organic molecules with robust redox activity and high solubility is crucial for widespread acceptance and deployment of aqueous organic redox flow batteries (AORFBs). We present RedCat, an automated workflow designed to accelerate the discovery of redox-active organic molecules from extensive molecular databases. This workflow employs structure-based selection, machine learning models for predicting redox reaction energy and aqueous solubility, and dynamically integrates up-to-date pricing data to prioritize candidates. Applying this workflow to 112 million molecules from the PubChem database, we identified 261 promising anolyte candidates. We validated their battery-related properties through first-principles and molecular dynamics calculations and experimentally tested two electrochemically active molecules. These molecules demonstrated higher energy densities than previously reported compounds, confirming the robustness of our workflow in discovering electrolytes. 
With its open-access code repository and modular design, RedCat is well-suited for integration into self-driving labs, offering a scalable framework for autonomous, data-driven electrolyte discovery.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 7","pages":" 1844-1855"},"PeriodicalIF":6.2,"publicationDate":"2025-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/dd/d5dd00111k?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144589486","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jinpeng Li, Chuxuan Ding, Daobin Liu, Linjiang Chen and Jun Jiang
{"title":"Autonomous laboratories in China: an embodied intelligence-driven platform to accelerate chemical discovery","authors":"Jinpeng Li, Chuxuan Ding, Daobin Liu, Linjiang Chen and Jun Jiang","doi":"10.1039/D5DD00072F","DOIUrl":"https://doi.org/10.1039/D5DD00072F","url":null,"abstract":"<p >The emergence of autonomous laboratories—automated robotic platforms integrated with rapidly advancing artificial intelligence (AI)—is poised to transform research by shifting traditional trial-and-error approaches toward accelerated chemical discovery. These platforms combine AI models, hardware, and software to execute experiments, interact with robotic systems, and manage data, thereby closing the predict-make-measure discovery loop. However, key challenges remain, including how to efficiently achieve autonomous high-throughput experimentation and integrate diverse technologies into cohesive systems. In this perspective, we identify the fundamental elements required for closed-loop autonomous experimentation: chemical science databases, large-scale intelligent models, automated experimental platforms, and integrated management/decision-making systems. Furthermore, with the advancement of AI models, we emphasize the progress from simple iterative-algorithm-driven systems to comprehensive intelligent autonomous systems powered by large-scale models in China, which enable self-driving chemical discovery within individual laboratories. 
Looking ahead, the development of intelligent autonomous laboratories into a distributed network holds great promise for further accelerating chemical discoveries and fostering innovation on a broader scale.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 7","pages":" 1672-1684"},"PeriodicalIF":6.2,"publicationDate":"2025-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/dd/d5dd00072f?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144589450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mutual information informed novelty estimation of materials along chemical and structural axes†","authors":"Andrew R. Falkowski and Taylor D. Sparks","doi":"10.1039/D5DD00167F","DOIUrl":"https://doi.org/10.1039/D5DD00167F","url":null,"abstract":"<p >Assessing the novelty of computationally or experimentally discovered materials against vast databases is crucial for efficient materials exploration, yet robust, objective methods are lacking. This paper introduces a parameter-free approach to quantify material novelty along chemical and structural axes. Our method leverages mutual information (MI), analyzing how it changes with calculated inter-material distances (<em>e.g.</em>, using ElMD for chemistry, LoStOP for structure) to derive data-driven weight functions. These functions define meaningful similarity neighborhoods without preset cutoffs, yielding quantitative novelty scores based on local density. We validate the approach using synthetic data and demonstrate its effectiveness across diverse materials datasets, including perovskites with controlled subgroups, a collection with varied structure types, and predicted lithium compounds from the GNoME database compared against materials in the Materials Project. 
The MI-informed framework successfully identifies and differentiates chemical and structural novelty, offering an interpretable tool to guide materials discovery and assess new candidates within the context of existing knowledge.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 7","pages":" 1833-1843"},"PeriodicalIF":6.2,"publicationDate":"2025-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/dd/d5dd00167f?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144589485","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cameron S. Movassaghi, Katie A. Perrotta, Maya E. Curry, Audrey N. Nashner, Katherine K. Nguyen, Mila E. Wesely, Miguel Alcañiz Fillol, Chong Liu, Aaron S. Meyer and Anne M. Andrews
{"title":"Machine-learning-guided design of electroanalytical pulse waveforms†","authors":"Cameron S. Movassaghi, Katie A. Perrotta, Maya E. Curry, Audrey N. Nashner, Katherine K. Nguyen, Mila E. Wesely, Miguel Alcañiz Fillol, Chong Liu, Aaron S. Meyer and Anne M. Andrews","doi":"10.1039/D5DD00005J","DOIUrl":"https://doi.org/10.1039/D5DD00005J","url":null,"abstract":"<p >Voltammetry is widely used to detect and quantify oxidizable or reducible species in complex environments. The neurotransmitter serotonin epitomizes an analyte that is challenging to detect <em>in situ</em> due to its low concentrations and the co-existence of similarly structured analytes and interferents. We developed rapid-pulse voltammetry for brain neurotransmitter monitoring due to the high information content elicited from voltage pulses. Generally, the design of voltammetry waveforms remains challenging due to prohibitively large combinatorial search spaces and a lack of design principles. Here, we illustrate how Bayesian optimization can be used to hone searches for optimized rapid pulse waveforms. Our machine-learning-guided workflow (SeroOpt) outperformed random and human-guided waveform designs and is tunable <em>a priori</em> to enable selective analyte detection. We interpreted the black box optimizer and found that the logic of machine-learning-guided waveform design reflected domain knowledge. Our approach is straightforward and generalizable for all single and multi-analyte problems requiring optimized electrochemical waveform solutions. 
Overall, SeroOpt enables data-driven exploration of the waveform design space and a new paradigm in electroanalytical method development.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 7","pages":" 1812-1832"},"PeriodicalIF":6.2,"publicationDate":"2025-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/dd/d5dd00005j?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144589484","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Imon Mia, Mark Lee, Weijie Xu, William Vandenberghe and Julia W. P. Hsu
{"title":"Choosing a suitable acquisition function for batch Bayesian optimization: comparison of serial and Monte Carlo approaches†","authors":"Imon Mia, Mark Lee, Weijie Xu, William Vandenberghe and Julia W. P. Hsu","doi":"10.1039/D5DD00066A","DOIUrl":"https://doi.org/10.1039/D5DD00066A","url":null,"abstract":"<p >Batch Bayesian optimization is widely used for optimizing expensive experimental processes when several samples can be tested together to save time or cost. A central decision in designing a Bayesian optimization campaign to guide experiments is the choice of a batch acquisition function when little or nothing is known about the landscape of the “black box” function to be optimized. To inform this decision, we first compare the performance of serial and Monte Carlo batch acquisition functions on two mathematical functions that serve as proxies for typical materials synthesis and processing experiments. The two functions, both in six dimensions, are the Ackley function, which epitomizes a “needle-in-haystack” search, and the Hartmann function, which exemplifies a “false optimum” problem. Our study evaluates the serial upper confidence bound with local penalization (UCB/LP) batch acquisition policy against Monte Carlo-based parallel approaches: <em>q</em>-log expected improvement (<em>q</em>logEI) and <em>q</em>-upper confidence bound (<em>q</em>UCB), where <em>q</em> is the batch size. Tests on Ackley and Hartmann show that UCB/LP and <em>q</em>UCB perform well in noiseless conditions, both outperforming <em>q</em>logEI. For the Hartmann function with noise, all Monte Carlo functions achieve faster convergence with less sensitivity to initial conditions compared to UCB/LP. We then confirm the findings on an empirical regression model built from experimental data in maximizing power conversion efficiency of flexible perovskite solar cells. 
Our results suggest that when empirically optimizing a “black-box” function in ≤six dimensions with no prior knowledge of the landscape or noise characteristics, <em>q</em>UCB is best suited as the default to maximize confidence in the modeled optimum while minimizing the number of expensive samples needed.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 7","pages":" 1751-1762"},"PeriodicalIF":6.2,"publicationDate":"2025-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/dd/d5dd00066a?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144589459","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Diego Iglesias, Cristopher Tinajero, Simone Marchetti, Jaume Luis-Gómez, Raúl Martinez-Cuenca, Jose F. Fuentes-Ballesteros, Clara A. Aranda, Alejandro Martínez Serra, María C. Asensio, Rafael Abargues, Pablo P. Boix, Marcileia Zanatta and Victor Sans
{"title":"Digital flow platform for the synthesis of high-quality multi-material perovskites†","authors":"Diego Iglesias, Cristopher Tinajero, Simone Marchetti, Jaume Luis-Gómez, Raúl Martinez-Cuenca, Jose F. Fuentes-Ballesteros, Clara A. Aranda, Alejandro Martínez Serra, María C. Asensio, Rafael Abargues, Pablo P. Boix, Marcileia Zanatta and Victor Sans","doi":"10.1039/D5DD00099H","DOIUrl":"https://doi.org/10.1039/D5DD00099H","url":null,"abstract":"<p >Perovskite materials have demonstrated great potential for a wide range of optoelectronic applications due to their exceptional electronic and optical properties. However, synthesising high-quality perovskite films remains a significant challenge, often hindered by batch-wise processes that suffer from limited control over reaction conditions, scalability and reproducibility. In this study, we present a novel approach for synthesising single-crystal perovskites with an optimised continuous-flow reactor. Our methodology utilises a 3D printed system that enables precise control over reactant concentrations, reaction times, and temperature profiles. The reaction chamber was designed and optimised by combining residence time distribution (RTD) studies and computational fluid dynamics (CFD) simulations. High-quality single-crystal perovskites with different formulations were obtained employing seeding and seedless conditions. 
Mixed halide single-crystal perovskites with different compositions along their structure were obtained simply by shifting the feedstock solution during crystallisation, demonstrating the versatility of this technology.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 7","pages":" 1772-1783"},"PeriodicalIF":6.2,"publicationDate":"2025-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/dd/d5dd00099h?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144589461","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Computation-guided exploration of the reaction parameter space of N,N-dimethylformamide hydrolysis†","authors":"Ignas Pakamorė and Ross S. Forgan","doi":"10.1039/D5DD00200A","DOIUrl":"https://doi.org/10.1039/D5DD00200A","url":null,"abstract":"<p >Navigating the reaction parameter space can pose challenges, especially considering the exponential growth in the number of parameters even in seemingly straightforward chemical reactions or formulations. Consequently, recent research efforts have been increasingly dedicated to the development of computational tools aimed at facilitating the exploration process. Herein, we introduce ChemSPX, a Python-based program specifically crafted for exploring the complex landscape of reaction parameter space. We propose the use of the inverse distance function to map reaction parameter space and efficiently sample sparse regions. This is implemented in ChemSPX to allow the user to simply generate sets of reaction conditions that efficiently sample wide parameter spaces. In addition, the program includes tools necessary for the analysis and comprehension of the multidimensional parameter space landscape. The developed algorithms were utilized to experimentally investigate the hydrolysis of <em>N</em>,<em>N-</em>dimethylformamide (DMF), a commonly employed solvent, in the specific context of metal–organic framework synthesis. We use ChemSPX to generate batches of experiments to sample parameter space, starting from an empty space, but subsequently assessing under-sampled regions. We use statistical analysis and machine learning models to show that addition of strong acids induces hydrolysis, generating up to 1.0% (w/w) formic acid. 
The results show that ChemSPX can generate datasets that efficiently sample parameter space, in this case allowing the user to distinguish the individual effects of five different physical and chemical variables on reaction outcome.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 7","pages":" 1784-1793"},"PeriodicalIF":6.2,"publicationDate":"2025-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/dd/d5dd00200a?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144589462","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}