Cameron S. Movassaghi, Katie A. Perrotta, Maya E. Curry, Audrey N. Nashner, Katherine K. Nguyen, Mila E. Wesely, Miguel Alcañiz Fillol, Chong Liu, Aaron S. Meyer and Anne M. Andrews
{"title":"Machine-learning-guided design of electroanalytical pulse waveforms†","authors":"Cameron S. Movassaghi, Katie A. Perrotta, Maya E. Curry, Audrey N. Nashner, Katherine K. Nguyen, Mila E. Wesely, Miguel Alcañiz Fillol, Chong Liu, Aaron S. Meyer and Anne M. Andrews","doi":"10.1039/D5DD00005J","DOIUrl":"https://doi.org/10.1039/D5DD00005J","url":null,"abstract":"<p>Voltammetry is widely used to detect and quantify oxidizable or reducible species in complex environments. The neurotransmitter serotonin epitomizes an analyte that is challenging to detect <em>in situ</em> due to its low concentrations and the co-existence of similarly structured analytes and interferents. We developed rapid-pulse voltammetry for brain neurotransmitter monitoring due to the high information content elicited from voltage pulses. Generally, the design of voltammetry waveforms remains challenging due to prohibitively large combinatorial search spaces and a lack of design principles. Here, we illustrate how Bayesian optimization can be used to hone searches for optimized rapid pulse waveforms. Our machine-learning-guided workflow (SeroOpt) outperformed random and human-guided waveform designs and is tunable <em>a priori</em> to enable selective analyte detection. We interpreted the black box optimizer and found that the logic of machine-learning-guided waveform design reflected domain knowledge. Our approach is straightforward and generalizable for all single and multi-analyte problems requiring optimized electrochemical waveform solutions. Overall, SeroOpt enables data-driven exploration of the waveform design space and a new paradigm in electroanalytical method development.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":"7","pages":"1812-1832"},"PeriodicalIF":6.2,"publicationDate":"2025-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/dd/d5dd00005j?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144589484","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Federico Grasselli, Sanggyu Chong, Venkat Kapil, Silvia Bonfanti and Kevin Rossi
{"title":"Uncertainty in the era of machine learning for atomistic modeling","authors":"Federico Grasselli, Sanggyu Chong, Venkat Kapil, Silvia Bonfanti and Kevin Rossi","doi":"10.1039/D5DD00102A","DOIUrl":"https://doi.org/10.1039/D5DD00102A","url":null,"abstract":"<p>The widespread adoption of machine learning surrogate models has significantly improved the scale and complexity of systems and processes that can be explored accurately and efficiently using atomistic modeling. However, the inherently data-driven nature of machine learning models introduces uncertainties that must be quantified, understood, and effectively managed to ensure reliable predictions and conclusions. Building upon these premises, in this perspective, we first overview state-of-the-art uncertainty estimation methods, from Bayesian frameworks to ensembling techniques, and discuss their application in atomistic modeling. We then examine the interplay between model accuracy, uncertainty, training dataset composition, data acquisition strategies, model transferability, and robustness. In doing so, we synthesize insights from the existing literature and highlight areas of ongoing debate.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":"10","pages":"2654-2675"},"PeriodicalIF":6.2,"publicationDate":"2025-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12423928/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145066629","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Imon Mia, Mark Lee, Weijie Xu, William Vandenberghe and Julia W. P. Hsu
{"title":"Choosing a suitable acquisition function for batch Bayesian optimization: comparison of serial and Monte Carlo approaches†","authors":"Imon Mia, Mark Lee, Weijie Xu, William Vandenberghe and Julia W. P. Hsu","doi":"10.1039/D5DD00066A","DOIUrl":"https://doi.org/10.1039/D5DD00066A","url":null,"abstract":"<p>Batch Bayesian optimization is widely used for optimizing expensive experimental processes when several samples can be tested together to save time or cost. A central decision in designing a Bayesian optimization campaign to guide experiments is the choice of a batch acquisition function when little or nothing is known about the landscape of the “black box” function to be optimized. To inform this decision, we first compare the performance of serial and Monte Carlo batch acquisition functions on two mathematical functions that serve as proxies for typical materials synthesis and processing experiments. The two functions, both in six dimensions, are the Ackley function, which epitomizes a “needle-in-haystack” search, and the Hartmann function, which exemplifies a “false optimum” problem. Our study evaluates the serial upper confidence bound with local penalization (UCB/LP) batch acquisition policy against Monte Carlo-based parallel approaches: <em>q</em>-log expected improvement (<em>q</em>logEI) and <em>q</em>-upper confidence bound (<em>q</em>UCB), where <em>q</em> is the batch size. Tests on Ackley and Hartmann show that UCB/LP and <em>q</em>UCB perform well in noiseless conditions, both outperforming <em>q</em>logEI. For the Hartmann function with noise, all Monte Carlo functions achieve faster convergence with less sensitivity to initial conditions compared to UCB/LP. We then confirm the findings on an empirical regression model built from experimental data in maximizing power conversion efficiency of flexible perovskite solar cells. Our results suggest that when empirically optimizing a “black-box” function in ≤six dimensions with no prior knowledge of the landscape or noise characteristics, <em>q</em>UCB is best suited as the default to maximize confidence in the modeled optimum while minimizing the number of expensive samples needed.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":"7","pages":"1751-1762"},"PeriodicalIF":6.2,"publicationDate":"2025-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/dd/d5dd00066a?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144589459","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Diego Iglesias, Cristopher Tinajero, Simone Marchetti, Jaume Luis-Gómez, Raúl Martinez-Cuenca, Jose F. Fuentes-Ballesteros, Clara A. Aranda, Alejandro Martínez Serra, María C. Asensio, Rafael Abargues, Pablo P. Boix, Marcileia Zanatta and Victor Sans
{"title":"Digital flow platform for the synthesis of high-quality multi-material perovskites†","authors":"Diego Iglesias, Cristopher Tinajero, Simone Marchetti, Jaume Luis-Gómez, Raúl Martinez-Cuenca, Jose F. Fuentes-Ballesteros, Clara A. Aranda, Alejandro Martínez Serra, María C. Asensio, Rafael Abargues, Pablo P. Boix, Marcileia Zanatta and Victor Sans","doi":"10.1039/D5DD00099H","DOIUrl":"https://doi.org/10.1039/D5DD00099H","url":null,"abstract":"<p>Perovskite materials have demonstrated great potential for a wide range of optoelectronic applications due to their exceptional electronic and optical properties. However, synthesising high-quality perovskite films remains a significant challenge, often hindered by batch-wise processes that suffer from limited control over reaction conditions, scalability and reproducibility. In this study, we present a novel approach for synthesising single-crystal perovskites with an optimised continuous-flow reactor. Our methodology utilises a 3D printed system that enables precise control over reactant concentrations, reaction times, and temperature profiles. The reaction chamber was designed and optimised by combining residence time distribution (RTD) studies and computational fluid dynamics (CFD) simulations. High-quality single-crystal perovskites with different formulations were obtained employing seeding and seedless conditions. The possibility of synthesising mixed halide single-crystal perovskites with different compositions along their structure was demonstrated by simply shifting the feedstock solution during the crystallisation, demonstrating the versatility of this technology.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":"7","pages":"1772-1783"},"PeriodicalIF":6.2,"publicationDate":"2025-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/dd/d5dd00099h?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144589461","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ignas Pakamorė and Ross S. Forgan
{"title":"Computation-guided exploration of the reaction parameter space of N,N-dimethylformamide hydrolysis†","authors":"Ignas Pakamorė and Ross S. Forgan","doi":"10.1039/D5DD00200A","DOIUrl":"https://doi.org/10.1039/D5DD00200A","url":null,"abstract":"<p>Navigating the reaction parameter space can pose challenges, especially considering the exponential growth in the number of parameters even in seemingly straightforward chemical reactions or formulations. Consequently, recent research efforts have been increasingly dedicated to the development of computational tools aimed at facilitating the exploration process. Herein, we introduce ChemSPX, a Python-based program specifically crafted for exploring the complex landscape of reaction parameter space. We propose the use of the inverse distance function to map reaction parameter space and efficiently sample sparse regions. This is implemented in ChemSPX to allow the user to simply generate sets of reaction conditions that efficiently sample wide parameter spaces. In addition, the program includes tools necessary for the analysis and comprehension of the multidimensional parameter space landscape. The developed algorithms were utilized to experimentally investigate the hydrolysis of <em>N</em>,<em>N</em>-dimethylformamide (DMF), a commonly employed solvent, in the specific context of metal–organic framework synthesis. We use ChemSPX to generate batches of experiments to sample parameter space, starting from an empty space, but subsequently assessing under-sampled regions. We use statistical analysis and machine learning models to show that addition of strong acids induces hydrolysis, generating up to 1.0% (w/w) formic acid. The results show that ChemSPX can generate datasets that efficiently sample parameter space, in this case allowing the user to distinguish the individual effects of five different physical and chemical variables on reaction outcome.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":"7","pages":"1784-1793"},"PeriodicalIF":6.2,"publicationDate":"2025-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/dd/d5dd00200a?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144589462","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yuang Shi, Nakul Rampal, Chengbin Zhao, Dongrong Joe Fu, Christian Borgs, Jennifer T. Chayes and Omar M. Yaghi
{"title":"Comparison of LLMs in extracting synthesis conditions and generating Q&A datasets for metal–organic frameworks†","authors":"Yuang Shi, Nakul Rampal, Chengbin Zhao, Dongrong Joe Fu, Christian Borgs, Jennifer T. Chayes and Omar M. Yaghi","doi":"10.1039/D5DD00081E","DOIUrl":"https://doi.org/10.1039/D5DD00081E","url":null,"abstract":"<p>Artificial intelligence, represented by large language models (LLMs), has demonstrated tremendous capabilities in natural language recognition and extraction. To further evaluate the performance of various LLMs in extracting information from academic papers, this study explores the application of LLMs in reticular chemistry, focusing on their effectiveness in generating Q&A datasets and extracting synthesis conditions from scientific literature. The models evaluated include OpenAI's GPT-4 Turbo, Anthropic's Claude 3 Opus, and Google's Gemini 1.5 Pro. Key results indicate that Claude excelled in providing complete synthesis data, while Gemini outperformed others in accuracy, characterization-free compliance (obedience), and proactive structuring of responses. Although GPT-4 was less effective in quantitative metrics, it demonstrated strong logical reasoning and contextual inference capabilities. Overall, Gemini and Claude achieved the highest scores in accuracy, groundedness, and adherence to prompt requirements, making them suitable benchmarks for future studies. The findings reveal the potential of LLMs to aid in scientific research, particularly in the efficient construction of structured datasets, which can help train models, predict, and assist in the synthesis of new metal–organic frameworks (MOFs).</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":"10","pages":"2676-2683"},"PeriodicalIF":6.2,"publicationDate":"2025-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/dd/d5dd00081e?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145236708","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Federico Ottomano, John Y. Goulermas, Vladimir Gusev, Rahul Savani, Michael W. Gaultois, Troy D. Manning, Hai Lin, Teresa Partida Manzanera, Emmeline G. Poole, Matthew S. Dyer, John B. Claridge, Jon Alaria, Luke M. Daniels, Su Varma, David Rimmer, Kevin Sanderson and Matthew J. Rosseinsky
{"title":"Assessing data-driven predictions of band gap and electrical conductivity for transparent conducting materials†","authors":"Federico Ottomano, John Y. Goulermas, Vladimir Gusev, Rahul Savani, Michael W. Gaultois, Troy D. Manning, Hai Lin, Teresa Partida Manzanera, Emmeline G. Poole, Matthew S. Dyer, John B. Claridge, Jon Alaria, Luke M. Daniels, Su Varma, David Rimmer, Kevin Sanderson and Matthew J. Rosseinsky","doi":"10.1039/D5DD00010F","DOIUrl":"https://doi.org/10.1039/D5DD00010F","url":null,"abstract":"<p>Machine Learning (ML) has offered innovative perspectives for accelerating the discovery of new functional materials, leveraging the increasing availability of material databases. Despite the promising advances, data-driven methods face constraints imposed by the quantity and quality of available data. Moreover, ML is often employed in tandem with simulated datasets originating from density functional theory (DFT), and assessed through in-sample evaluation schemes. This scenario raises questions about the practical utility of ML in uncovering new and significant material classes for industrial applications. Here, we propose a data-driven framework aimed at accelerating the discovery of new <em>transparent conducting materials</em> (TCMs), an important category of semiconductors with a wide range of applications. To mitigate the shortage of available data, we create and validate unique experimental databases, comprising several examples of existing TCMs. We assess state-of-the-art (SOTA) ML models for property prediction from the stoichiometry alone. We propose a bespoke evaluation scheme to provide empirical evidence on the ability of ML to uncover new, previously unseen materials of interest. We test our approach on a list of 55 compositions containing typical elements of known TCMs. Although our study indicates that ML tends to identify new TCMs compositionally similar to those in the training data, we empirically demonstrate that it can highlight material candidates that may have been previously overlooked, offering a systematic approach to identify materials that are likely to display TCM characteristics.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":"7","pages":"1794-1811"},"PeriodicalIF":6.2,"publicationDate":"2025-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/dd/d5dd00010f?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144589463","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hongchen Wang, Kangming Li, Scott Ramsay, Yao Fehlis, Edward Kim and Jason Hattrick-Simpers
{"title":"Evaluating the performance and robustness of LLMs in materials science Q&A and property predictions†","authors":"Hongchen Wang, Kangming Li, Scott Ramsay, Yao Fehlis, Edward Kim and Jason Hattrick-Simpers","doi":"10.1039/D5DD00090D","DOIUrl":"https://doi.org/10.1039/D5DD00090D","url":null,"abstract":"<p>Large Language Models (LLMs) have the potential to revolutionize scientific research, yet their robustness and reliability in domain-specific applications remain insufficiently explored. In this study, we evaluate the performance and robustness of LLMs for materials science, focusing on domain-specific question answering and materials property prediction across diverse real-world and adversarial conditions. Three distinct datasets are used in this study: (1) a set of multiple-choice questions from undergraduate-level materials science courses, (2) a dataset including various steel compositions and yield strengths, and (3) a band gap dataset, containing textual descriptions of material crystal structures and band gap values. The performance of LLMs is assessed using various prompting strategies, including zero-shot chain-of-thought, expert prompting, and few-shot in-context learning. The robustness of these models is tested against various forms of “noise”, ranging from realistic disturbances to intentionally adversarial manipulations, to evaluate their resilience and reliability under real-world conditions. Additionally, the study showcases unique phenomena of LLMs during predictive tasks, such as mode collapse behavior when the proximity of prompt examples is altered and performance recovery from train/test mismatch. The findings aim to provide informed skepticism for the broad use of LLMs in materials science and to inspire advancements that enhance their robustness and reliability for practical applications.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":"6","pages":"1612-1624"},"PeriodicalIF":6.2,"publicationDate":"2025-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/dd/d5dd00090d?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144264335","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Viktoriia Baibakova, Kevin Cruse, Michael G. Taylor, Carolin M. Sutter-Fella, Gerbrand Ceder, Anubhav Jain and Samuel M. Blau
{"title":"Precursor reaction pathway leading to BiFeO3 formation: insights from text-mining and chemical reaction network analyses†","authors":"Viktoriia Baibakova, Kevin Cruse, Michael G. Taylor, Carolin M. Sutter-Fella, Gerbrand Ceder, Anubhav Jain and Samuel M. Blau","doi":"10.1039/D5DD00160A","DOIUrl":"https://doi.org/10.1039/D5DD00160A","url":null,"abstract":"<p>BiFeO<small><sub>3</sub></small> (BFO) is a next-generation non-toxic multiferroic material with applications in sensors, memory devices, and spintronics, where its crystallinity and crystal structure directly influence its functional properties. Designing sol–gel syntheses that result in phase-pure BFO remains a challenge due to the complex interactions between metal complexes in the precursor solution. Here, we combine text-mined data and chemical reaction network (CRN) analysis to obtain novel insight into BFO sol–gel precursor chemistry. We perform text-mining analysis of 340 synthesis recipes with the emphasis on phase-pure BFO and identify trends in the use of precursor materials, including that nitrates are the preferred metal salts, 2-methoxyethanol (2ME) is the dominant solvent, and adding citric acid as a chelating agent frequently leads to phase-pure BFO. Our CRN analysis reveals that the thermodynamically favored reaction mechanism between bismuth nitrate and 2ME involves partial solvation followed by dimerization, contradicting assumptions in previous literature. We suggest that further oligomerization, facilitated by nitrite ion bridging, is critical for achieving the pure BFO phase.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":"6","pages":"1602-1611"},"PeriodicalIF":6.2,"publicationDate":"2025-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/dd/d5dd00160a?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144264329","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
George Karageorgis, Simone Tomasi, Elliot H. E. Farrar, Maxime Tarrago and Tabassum Malik
{"title":"A digital tool for liquid–liquid extraction process design†","authors":"George Karageorgis, Simone Tomasi, Elliot H. E. Farrar, Maxime Tarrago and Tabassum Malik","doi":"10.1039/D5DD00104H","DOIUrl":"https://doi.org/10.1039/D5DD00104H","url":null,"abstract":"<p>Aqueous liquid–liquid extractions are crucial for purifying compounds and removing impurities in the pharmaceutical industry. However, the extensive solvent space involved in such operations highlights the need for an informed approach in solvent selection. We present a digital tool designed to leverage data-driven experimentation to enhance process efficiency and sustainability, aligning with industry trends towards digitalisation. It allows users to input various parameters, retrieve relevant data, and visualise extraction efficiencies, thereby improving process understanding and reducing process development lead times. By providing interactive visualisations and facilitating rapid hypothesis generation, the tool supports informed decision-making and streamlines workflows. The tool's application is demonstrated through representative complex scenarios involving the separation of multiple compounds present in a mixture at the end of a Buchwald coupling reaction. Overall, this digital tool offers a new practical and data-led approach to chemical process design, with the potential to promote experimental efficiency during development and to improve the environmental sustainability of commercial processes.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":"7","pages":"1763-1771"},"PeriodicalIF":6.2,"publicationDate":"2025-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/dd/d5dd00104h?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144589460","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}