Daniel Armstrong, Zlatko Jončev, Jeff Guo and Philippe Schwaller
{"title":"Tango*: constrained synthesis planning using chemically informed value functions","authors":"Daniel Armstrong, Zlatko Jončev, Jeff Guo and Philippe Schwaller","doi":"10.1039/D5DD00130G","DOIUrl":"10.1039/D5DD00130G","url":null,"abstract":"<p >Computer-aided synthesis planning (CASP) has made significant strides in generating retrosynthetic pathways for simple molecules in a non-constrained fashion. Recent work has introduced specialized bidirectional search algorithms to find synthesis pathways that incorporate pre-selected starting materials, tackling a specific formulation of the starting material-constrained problem. In this work, we introduce a simple guided search, Tango*, which allows solving the starting material-constrained synthesis planning problem using an existing unidirectional search algorithm, Retro*. We show that by optimising a single hyperparameter, Tango* outperforms existing methods in terms of efficiency and solve rate. We also highlight the effectiveness of our computed node cost function in steering synthesis pathways.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 9","pages":" 2570-2578"},"PeriodicalIF":6.2,"publicationDate":"2025-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12355204/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144877087","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Laura van Weesep, Rıza Özçelik, Marloes Pennings, Emanuele Criscuolo, Christian Ottmann, Luc Brunsveld and Francesca Grisoni
{"title":"Identifying 14-3-3 interactome binding sites with deep learning","authors":"Laura van Weesep, Rıza Özçelik, Marloes Pennings, Emanuele Criscuolo, Christian Ottmann, Luc Brunsveld and Francesca Grisoni","doi":"10.1039/D5DD00132C","DOIUrl":"10.1039/D5DD00132C","url":null,"abstract":"<p >Protein–protein interactions are at the heart of biological processes. Understanding how proteins interact is key for deciphering their roles in health and disease, and for therapeutic interventions. However, identifying protein interaction sites, especially for intrinsically disordered proteins, is challenging. Here, we developed a deep learning framework to predict potential protein binding sites to 14-3-3 – a ‘central hub’ protein holding a key role in cellular signaling networks. After systematically testing multiple deep learning approaches to predict sequence binding to 14-3-3, we developed an ensemble model that achieved a 75% balanced accuracy on external sequences. Our approach was applied prospectively to identify putative binding sites across medically relevant proteins (ranging from highly structured to intrinsically disordered) for a total of approximately 300 sequences. The top eight predicted peptide sequences were experimentally validated in the wet-lab, and binding to 14-3-3 was confirmed for five out of eight sequences (<em>K</em><small><sub>d</sub></small> ranging from 1.6 ± 0.1 μM to 70 ± 5 μM). The relevance of our results was further confirmed by X-ray crystallography and molecular dynamics simulations. These sequences represent potential new binding sites within the 14-3-3 interactome (<em>e.g.</em>, relating to Alzheimer's disease as the binding to tau is not the new part), and provide opportunities to investigate their functional relevance. Our results highlight the ability of deep learning to capture intricate patterns underlying protein–protein interactions, even for challenging cases like intrinsically disordered proteins. 
To further the understanding and targeting of 14-3-3/protein interactions, our model was provided as a freely accessible web resource at the following URL: https://14-3-3-bindsite.streamlit.app/.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 9","pages":" 2602-2614"},"PeriodicalIF":6.2,"publicationDate":"2025-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12360161/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144980980","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Inconsistency of LLMs in molecular representations","authors":"Bing Yan, Angelica Chen and Kyunghyun Cho","doi":"10.1039/D5DD00176E","DOIUrl":"https://doi.org/10.1039/D5DD00176E","url":null,"abstract":"<p >Large language models (LLMs) have demonstrated remarkable capabilities in chemistry, yet their ability to capture intrinsic chemistry remains uncertain. Within any familiar, chemically equivalent representation family, rigorous chemical reasoning should be representation-invariant, yielding consistent predictions across these representations. Here, we introduce the first systematic benchmark to evaluate the consistency of LLMs across key chemistry tasks. We curated the benchmark using paired representations of SMILES strings and IUPAC names. We find that state-of-the-art general LLMs exhibit strikingly low consistency rates (≤1%). Even after finetuning on our dataset, the models still generate inconsistent predictions. To address this, we incorporate a sequence-level symmetric Kullback–Leibler (KL) divergence loss as a consistency regularizer. While this intervention improves surface-level consistency, it fails to enhance accuracy, suggesting that consistency and accuracy are orthogonal properties. 
These findings indicate that both consistency and accuracy must be considered to properly assess LLMs' capabilities in scientific reasoning.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 10","pages":" 2876-2892"},"PeriodicalIF":6.2,"publicationDate":"2025-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/dd/d5dd00176e?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145236720","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sakib Matin, Alice E. A. Allen, Emily Shinkle, Aleksandra Pachalieva, Galen T. Craven, Benjamin Nebgen, Justin S. Smith, Richard Messerly, Ying Wai Li, Sergei Tretiak, Kipton Barros and Nicholas Lubbers
{"title":"Teacher-student training improves the accuracy and efficiency of machine learning interatomic potentials","authors":"Sakib Matin, Alice E. A. Allen, Emily Shinkle, Aleksandra Pachalieva, Galen T. Craven, Benjamin Nebgen, Justin S. Smith, Richard Messerly, Ying Wai Li, Sergei Tretiak, Kipton Barros and Nicholas Lubbers","doi":"10.1039/D5DD00085H","DOIUrl":"https://doi.org/10.1039/D5DD00085H","url":null,"abstract":"<p >Machine learning interatomic potentials (MLIPs) are revolutionizing the field of molecular dynamics (MD) simulations. Recent MLIPs have tended towards more complex architectures trained on larger datasets. The resulting increase in computational and memory costs may prohibit the application of these MLIPs to perform large-scale MD simulations. Herein, we present a teacher-student training framework in which the latent knowledge from the teacher (atomic energies) is used to augment the students' training. We show that the light-weight student MLIPs have faster MD speeds at a fraction of the memory footprint compared to the teacher models. Remarkably, the student models can even surpass the accuracy of the teachers, even though both are trained on the same quantum chemistry dataset. Our work highlights a practical method for MLIPs to reduce the resources required for large-scale MD simulations.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 9","pages":" 2502-2511"},"PeriodicalIF":6.2,"publicationDate":"2025-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/dd/d5dd00085h?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145028026","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. Vamvakeros, E. Papoutsellis, H. Dong, R. Docherty, A. M. Beale, S. J. Cooper and S. D. M. Jacques
{"title":"nDTomo: a modular Python toolkit for X-ray chemical imaging and tomography","authors":"A. Vamvakeros, E. Papoutsellis, H. Dong, R. Docherty, A. M. Beale, S. J. Cooper and S. D. M. Jacques","doi":"10.1039/D5DD00252D","DOIUrl":"https://doi.org/10.1039/D5DD00252D","url":null,"abstract":"<p > <em>nDTomo</em> is a Python-based software suite for the simulation, reconstruction and analysis of X-ray chemical imaging and computed tomography data. It provides a collection of Python function-based tools designed for accessibility and education as well as a graphical user interface. Prioritising transparency and ease of learning, <em>nDTomo</em> adopts a function-centric design that facilitates straightforward understanding and extension of core workflows, from phantom generation and pencil-beam tomography simulation to sinogram correction, tomographic reconstruction and peak fitting. While many scientific toolkits embrace object-oriented design for modularity and scalability, <em>nDTomo</em> instead emphasises pedagogical clarity, making it especially suitable for students and researchers entering the chemical imaging and tomography field. The suite also includes modern deep learning tools, such as a self-supervised neural network for peak analysis (PeakFitCNN) and a GPU-based direct least squares reconstruction (DLSR) approach for simultaneous tomographic reconstruction and parameter estimation. 
Rather than aiming to replace established tomography frameworks, <em>nDTomo</em> serves as an open, function-oriented environment for training, prototyping, and research in chemical imaging and tomography.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 9","pages":" 2579-2592"},"PeriodicalIF":6.2,"publicationDate":"2025-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/dd/d5dd00252d?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145028031","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nis Fisker-Bødker, Daniel Persaud, Yang Bai, Mark Kozdras, Tejs Vegge, Jason Hattrick-Simpers and Jin Hyun Chang
{"title":"AMPERE-2: an open-hardware, robotic platform for automated electrodeposition and electrochemical validation","authors":"Nis Fisker-Bødker, Daniel Persaud, Yang Bai, Mark Kozdras, Tejs Vegge, Jason Hattrick-Simpers and Jin Hyun Chang","doi":"10.1039/D5DD00180C","DOIUrl":"https://doi.org/10.1039/D5DD00180C","url":null,"abstract":"<p >An Opentrons OT-2 liquid-handling robot was used as the framework to develop an automated platform for the electrodeposition and electrochemical testing of multi-element catalysts. Catalytic activity was demonstrated <em>via</em> alkaline water splitting, specifically targeting the oxygen evolution reaction (OER). The setup integrates multiple pumps, a flushing tool, custom deposition and electrochemical testing electrodes, and a potentiostat to enable reproducible and efficient electrodeposition and evaluation. Stock solutions of metal chlorides were combined with two complexing agents, ammonium hydroxide and sodium citrate, to stabilize the deposition process and tune the surface morphology. Analysis by cyclic voltammetry and electron microscopy revealed that the complexing agents significantly influenced deposition rates and surface structures, with the most effective catalysts forming either in the absence of additives or when both agents were applied together. Deposition times of 30–60 seconds yielded the lowest OER overpotentials, indicating an optimal catalyst layer thickness. 
The platform demonstrates robust reproducibility with uncertainty in overpotential measurements at 16 mV.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 9","pages":" 2491-2501"},"PeriodicalIF":6.2,"publicationDate":"2025-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/dd/d5dd00180c?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145028025","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimization of robotic liquid handling as a capacitated vehicle routing problem","authors":"Guangqi Wu, Runzhong Wang and Connor W. Coley","doi":"10.1039/D5DD00233H","DOIUrl":"10.1039/D5DD00233H","url":null,"abstract":"<p >We present an optimization strategy to reduce the execution time of liquid handling operations in the context of an automated chemical laboratory. By formulating the task as a capacitated vehicle routing problem (CVRP), we leverage heuristic solvers traditionally used in logistics and transportation planning to optimize task execution times. As exemplified using an 8-channel pipette with individually controllable tips, our approach demonstrates robust optimization performance across different labware formats (<em>e.g.</em>, well-plates, vial holders), achieving up to a 37% reduction in execution time for randomly generated tasks compared to the baseline sorting method. We further apply the method to a real-world high-throughput materials discovery campaign and observe that 3 minutes of optimization time led to a reduction of 61 minutes in execution time compared to the best-performing sorting-based strategy. Our results highlight the potential for substantial improvements in throughput and efficiency in automated laboratories without any hardware modifications. 
This optimization strategy offers a practical and scalable solution to accelerate combinatorial experimentation in areas such as drug combination screening, reaction condition optimization, materials development, and formulation engineering.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 9","pages":" 2593-2601"},"PeriodicalIF":6.2,"publicationDate":"2025-08-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12360158/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144980985","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Molecular representation learning: cross-domain foundations and future frontiers","authors":"Rahul Sheshanarayana and Fengqi You","doi":"10.1039/D5DD00170F","DOIUrl":"https://doi.org/10.1039/D5DD00170F","url":null,"abstract":"<p >Molecular representation learning has catalyzed a paradigm shift in computational chemistry and materials science—from reliance on manually engineered descriptors to the automated extraction of features using deep learning. This transition enables data-driven predictions of molecular properties, inverse design of compounds, and accelerated discovery of chemical and crystalline materials—including organic molecules, inorganic solids, and catalytic systems. This review provides a comprehensive and comparative evaluation of deep learning-based molecular representations, focusing on graph neural networks, autoencoders, diffusion models, generative adversarial networks, transformer architectures, and hybrid self-supervised learning (SSL) frameworks. Special attention is given to underexplored areas such as 3D-aware representations, physics-informed neural potentials, and cross-modal fusion strategies that integrate graphs, sequences, and quantum descriptors. While previous reviews have largely centered on GNNs and generative models, our synthesis addresses key gaps in the literature—particularly the limited exploration of geometric learning, chemically informed SSL, and multi-modal representation integration. We critically assess persistent challenges, including data scarcity, representational inconsistency, interpretability, and the high computational costs of existing methods. Emerging strategies such as contrastive learning, multi-modal adaptive fusion, and differentiable simulation pipelines are discussed in depth, revealing promising directions for improving generalization and real-world applicability. 
Notably, we highlight how equivariant models and learned potential energy surfaces offer physically consistent, geometry-aware embeddings that extend beyond static graphs. By integrating insights across domains, this review equips cheminformatics and materials science communities with a forward-looking synthesis of methodological innovations. Ultimately, advances in pretraining, hybrid representations, and differentiable modeling are poised to accelerate progress in drug discovery, materials design, and sustainable chemistry.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 9","pages":" 2298-2335"},"PeriodicalIF":6.2,"publicationDate":"2025-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/dd/d5dd00170f?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145028072","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ask Hjorth Larsen, Mikael J. Kuisma, Tara M. Boland, Fredrik A. Nilsson and Kristian S. Thygesen
{"title":"Taskblaster: a generic framework for automated computational workflows","authors":"Ask Hjorth Larsen, Mikael J. Kuisma, Tara M. Boland, Fredrik A. Nilsson and Kristian S. Thygesen","doi":"10.1039/D5DD00097A","DOIUrl":"10.1039/D5DD00097A","url":null,"abstract":"<p >We introduce Taskblaster, a generic and lightweight Python framework for composing, executing, and managing computational workflows with automated error handling. Taskblaster supports dynamic workflows including flow control using branches and iteration, making the system Turing complete. Taskblaster aims to promote modular designs, where workflows are composed of reusable sub-workflows, and to simplify data maintenance as projects evolve and change. We discuss the main design elements including workflow syntax, a storage model based on intuitively named tasks in a nested directory tree, and command-line tools to automate and control the execution of the tasks. Tasks are executed by worker processes that may run directly in a terminal or be submitted using a queueing system, allowing for task-specific resource control. We provide a library (ASR-lib) of workflows for common materials simulations employing the Atomic Simulation Environment and the GPAW electronic structure code, but Taskblaster can equally well be used with other computational codes.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 9","pages":" 2512-2520"},"PeriodicalIF":6.2,"publicationDate":"2025-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12337256/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144849962","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Decoding non-linearity and complexity: deep tabular learning approaches for materials science","authors":"Vahid Attari and Raymundo Arroyave","doi":"10.1039/D5DD00166H","DOIUrl":"https://doi.org/10.1039/D5DD00166H","url":null,"abstract":"<p >Materials datasets, particularly those capturing high-temperature properties, pose significant challenges for learning tasks due to their skewed distributions, wide feature ranges, and multimodal behaviors. While tree-based models like XGBoost are inherently non-linear and often perform well on many tabular problems, their reliance on piecewise constant splits can limit effectiveness when modeling smooth, long-tailed, or higher-order relationships prevalent in advanced materials data. To address these challenges, we investigate the effectiveness of encoder–decoder models for data transformation using regularized Fully Dense Networks (FDN-R), Disjunctive Normal Form Networks (DNF-Net), 1D Convolutional Neural Networks (CNNs), and Variational Autoencoders, along with TabNet, a hybrid attention-based model. Our results indicate that while XGBoost remains competitive on simpler tasks, encoder–decoder models, particularly those based on regularized FDN-R and DNF-Net, demonstrate better generalization on highly skewed targets like creep resistance, across small, medium, and large datasets. TabNet's attention mechanism offers moderate gains but underperforms on extreme values. 
These findings emphasize the importance of aligning model architecture with feature complexity and demonstrate the promise of hybrid encoder–decoder models for robust and generalizable materials prediction from composition data.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 10","pages":" 2765-2780"},"PeriodicalIF":6.2,"publicationDate":"2025-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/dd/d5dd00166h?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145236714","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}