Digital Discovery: latest articles, page 4

Tango*: constrained synthesis planning using chemically informed value functions

IF 6.2

Digital discovery Pub Date : 2025-08-11 DOI: 10.1039/D5DD00130G

Daniel Armstrong, Zlatko Jončev, Jeff Guo and Philippe Schwaller

Citations: 0

Identifying 14-3-3 interactome binding sites with deep learning

IF 6.2

Digital discovery Pub Date : 2025-08-08 DOI: 10.1039/D5DD00132C

Laura van Weesep, Rıza Özçelik, Marloes Pennings, Emanuele Criscuolo, Christian Ottmann, Luc Brunsveld and Francesca Grisoni

Abstract: Protein–protein interactions are at the heart of biological processes. Understanding how proteins interact is key for deciphering their roles in health and disease, and for therapeutic interventions. However, identifying protein interaction sites, especially for intrinsically disordered proteins, is challenging. Here, we developed a deep learning framework to predict potential protein binding sites to 14-3-3 – a ‘central hub’ protein holding a key role in cellular signaling networks. After systematically testing multiple deep learning approaches to predict sequence binding to 14-3-3, we developed an ensemble model that achieved a 75% balanced accuracy on external sequences. Our approach was applied prospectively to identify putative binding sites across medically relevant proteins (ranging from highly structured to intrinsically disordered) for a total of approximately 300 sequences. The top eight predicted peptide sequences were experimentally validated in the wet-lab, and binding to 14-3-3 was confirmed for five out of eight sequences (Kd ranging from 1.6 ± 0.1 μM to 70 ± 5 μM). The relevance of our results was further confirmed by X-ray crystallography and molecular dynamics simulations. These sequences represent potential new binding sites within the 14-3-3 interactome (e.g., relating to Alzheimer's disease as the binding to tau is not the new part), and provide opportunities to investigate their functional relevance. Our results highlight the ability of deep learning to capture intricate patterns underlying protein–protein interactions, even for challenging cases like intrinsically disordered proteins. To further the understanding and targeting of 14-3-3/protein interactions, our model was provided as a freely accessible web resource at the following URL: https://14-3-3-bindsite.streamlit.app/.

Volume 9, pages 2602-2614. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12360161/pdf/
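The balanced accuracy quoted in the abstract above is the mean of sensitivity and specificity, which matters when binders are much rarer than non-binders. The sketch below only illustrates the metric; it is not the authors' evaluation code, and the function name and the label vectors are made up for the example.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (recall on class 1) and specificity (recall on class 0)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return 0.5 * (sensitivity + specificity)


# Hypothetical labels: 2 binders (1) and 8 non-binders (0).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy)                     # 0.8
print(balanced_accuracy(y_true, y_pred))  # 0.6875
```

In this toy case plain accuracy looks better than balanced accuracy because the majority class dominates the count, which is the usual reason for reporting the balanced figure on imbalanced data.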

Citations: 0

Inconsistency of LLMs in molecular representations

IF 6.2

Digital discovery Pub Date : 2025-08-08 DOI: 10.1039/D5DD00176E

Bing Yan, Angelica Chen and Kyunghyun Cho

Abstract: Large language models (LLMs) have demonstrated remarkable capabilities in chemistry, yet their ability to capture intrinsic chemistry remains uncertain. Within any familiar, chemically equivalent representation family, rigorous chemical reasoning should be representation-invariant, yielding consistent predictions across these representations. Here, we introduce the first systematic benchmark to evaluate the consistency of LLMs across key chemistry tasks. We curated the benchmark using paired representations of SMILES strings and IUPAC names. We find that the state-of-the-art general LLMs exhibit strikingly low consistency rates (≤1%). Even after finetuning on our dataset, the models still generate inconsistent predictions. To address this, we incorporate a sequence-level symmetric Kullback–Leibler (KL) divergence loss as a consistency regularizer. While this intervention improves surface-level consistency, it fails to enhance accuracy, suggesting that consistency and accuracy are orthogonal properties. These findings indicate that both consistency and accuracy must be considered to properly assess LLMs' capabilities in scientific reasoning.

Volume 10, pages 2876-2892. PDF: https://pubs.rsc.org/en/content/articlepdf/2025/dd/d5dd00176e?page=search
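The symmetric KL regularizer named in the abstract above can be illustrated with a small sketch. This is not the authors' implementation: the paper describes a sequence-level loss, whereas the sketch assumes two plain categorical distributions over the same answer set (one obtained from a SMILES prompt, one from the matching IUPAC prompt). The function names, the averaging by 0.5, and the example probabilities are assumptions made for illustration.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    # Terms with p_i == 0 contribute nothing; eps guards against log(0) in q.
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def symmetric_kl(p, q):
    """Average of KL(p || q) and KL(q || p); zero only when p and q agree."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


# Hypothetical answer distributions for the same molecule.
p_smiles = [0.7, 0.2, 0.1]   # model output given the SMILES string
p_iupac = [0.1, 0.2, 0.7]    # model output given the IUPAC name

print(symmetric_kl(p_smiles, p_smiles))  # 0.0 (identical predictions)
print(symmetric_kl(p_smiles, p_iupac))   # positive: the two prompts disagree
```

Adding such a term to a training loss penalizes disagreement between the two representations. It does not by itself reward the correct answer, which is consistent with the abstract's observation that consistency and accuracy behave as separate properties.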

Citations: 0

Teacher-student training improves the accuracy and efficiency of machine learning interatomic potentials

IF 6.2

Digital discovery Pub Date : 2025-08-07 DOI: 10.1039/D5DD00085H

Sakib Matin, Alice E. A. Allen, Emily Shinkle, Aleksandra Pachalieva, Galen T. Craven, Benjamin Nebgen, Justin S. Smith, Richard Messerly, Ying Wai Li, Sergei Tretiak, Kipton Barros and Nicholas Lubbers

Citations: 0

nDTomo: a modular Python toolkit for X-ray chemical imaging and tomography

IF 6.2

Digital discovery Pub Date : 2025-08-07 DOI: 10.1039/D5DD00252D

A. Vamvakeros, E. Papoutsellis, H. Dong, R. Docherty, A. M. Beale, S. J. Cooper and S. D. M. Jacques

Abstract: nDTomo is a Python-based software suite for the simulation, reconstruction and analysis of X-ray chemical imaging and computed tomography data. It provides a collection of Python function-based tools designed for accessibility and education as well as a graphical user interface. Prioritising transparency and ease of learning, nDTomo adopts a function-centric design that facilitates straightforward understanding and extension of core workflows, from phantom generation and pencil-beam tomography simulation to sinogram correction, tomographic reconstruction and peak fitting. While many scientific toolkits embrace object-oriented design for modularity and scalability, nDTomo instead emphasises pedagogical clarity, making it especially suitable for students and researchers entering the chemical imaging and tomography field. The suite also includes modern deep learning tools, such as a self-supervised neural network for peak analysis (PeakFitCNN) and a GPU-based direct least squares reconstruction (DLSR) approach for simultaneous tomographic reconstruction and parameter estimation. Rather than aiming to replace established tomography frameworks, nDTomo serves as an open, function-oriented environment for training, prototyping, and research in chemical imaging and tomography.

Volume 9, pages 2579-2592. PDF: https://pubs.rsc.org/en/content/articlepdf/2025/dd/d5dd00252d?page=search

Citations: 0

AMPERE-2: an open-hardware, robotic platform for automated electrodeposition and electrochemical validation

IF 6.2

Digital discovery Pub Date : 2025-08-06 DOI: 10.1039/D5DD00180C

Nis Fisker-Bødker, Daniel Persaud, Yang Bai, Mark Kozdras, Tejs Vegge, Jason Hattrick-Simpers and Jin Hyun Chang

Abstract: An Opentrons OT-2 liquid-handling robot was used as the framework to develop an automated platform for the electrodeposition and electrochemical testing of multi-element catalysts. Catalytic activity was demonstrated via alkaline water splitting, specifically targeting the oxygen evolution reaction (OER). The setup integrates multiple pumps, a flushing tool, custom deposition and electrochemical testing electrodes, and a potentiostat to enable reproducible and efficient electrodeposition and evaluation. Stock solutions of metal chlorides were combined with two complexing agents, ammonium hydroxide and sodium citrate, to stabilize the deposition process and tune the surface morphology. Analysis by cyclic voltammetry and electron microscopy revealed that the complexing agents significantly influenced deposition rates and surface structures, with the most effective catalysts forming either in the absence of additives or when both agents were applied together. Deposition times of 30–60 seconds yielded the lowest OER overpotentials, indicating an optimal catalyst layer thickness. The platform demonstrates robust reproducibility with uncertainty in overpotential measurements at 16 mV.

Volume 9, pages 2491-2501. PDF: https://pubs.rsc.org/en/content/articlepdf/2025/dd/d5dd00180c?page=search

Citations: 0

Optimization of robotic liquid handling as a capacitated vehicle routing problem

IF 6.2

Digital discovery Pub Date : 2025-08-04 DOI: 10.1039/D5DD00233H

Guangqi Wu, Runzhong Wang and Connor W. Coley

Abstract: We present an optimization strategy to reduce the execution time of liquid handling operations in the context of an automated chemical laboratory. By formulating the task as a capacitated vehicle routing problem (CVRP), we leverage heuristic solvers traditionally used in logistics and transportation planning to optimize task execution times. As exemplified using an 8-channel pipette with individually controllable tips, our approach demonstrates robust optimization performance across different labware formats (e.g., well-plates, vial holders), achieving up to a 37% reduction in execution time for randomly generated tasks compared to the baseline sorting method. We further apply the method to a real-world high-throughput materials discovery campaign and observe that 3 minutes of optimization time led to a reduction of 61 minutes in execution time compared to the best-performing sorting-based strategy. Our results highlight the potential for substantial improvements in throughput and efficiency in automated laboratories without any hardware modifications. This optimization strategy offers a practical and scalable solution to accelerate combinatorial experimentation in areas such as drug combination screening, reaction condition optimization, materials development, and formulation engineering.

Volume 9, pages 2593-2601. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12360158/pdf/
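To make the CVRP framing in the abstract above concrete, here is a toy sketch of a capacitated routing heuristic. It is not the paper's formulation or solver. It assumes a deliberately simplified mapping in which the depot is a source reservoir, each customer is a destination well with a requested volume, and the vehicle capacity is the volume one tip can carry per trip. The nearest-neighbour construction rule, all names, and all numbers are illustrative assumptions.

```python
import math


def greedy_cvrp(depot, coords, demands, capacity):
    """Nearest-neighbour construction heuristic for a capacitated routing problem.

    Returns a list of routes; each route is a list of customer indices served
    in one trip that starts and ends at the depot.
    """
    unvisited = set(range(len(coords)))
    routes = []
    while unvisited:
        load, pos, route = 0, depot, []
        while True:
            # Customers that still fit into the remaining capacity of this trip.
            feasible = [i for i in sorted(unvisited)
                        if load + demands[i] <= capacity]
            if not feasible:
                break
            nxt = min(feasible, key=lambda i: math.dist(pos, coords[i]))
            route.append(nxt)
            load += demands[nxt]
            pos = coords[nxt]
            unvisited.remove(nxt)
        if not route:
            raise ValueError("a single demand exceeds the capacity")
        routes.append(route)
    return routes


def total_distance(routes, depot, coords):
    """Total travel distance, with every route starting and ending at the depot."""
    total = 0.0
    for route in routes:
        stops = [depot] + [coords[i] for i in route] + [depot]
        total += sum(math.dist(a, b) for a, b in zip(stops, stops[1:]))
    return total


# Hypothetical layout: reservoir at the origin, four wells, 100 uL tip capacity.
depot = (0, 0)
coords = [(1, 0), (2, 0), (0, 1), (0, 2)]
demands = [50, 50, 50, 50]

routes = greedy_cvrp(depot, coords, demands, capacity=100)
print(routes)                                 # [[0, 1], [2, 3]]
print(total_distance(routes, depot, coords))  # 8.0
```

A construction heuristic like this only gives a starting point. Dedicated CVRP solvers usually improve such routes further with local search, and a real liquid-handling model would also need to account for the multi-channel pipette geometry mentioned in the abstract.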

Citations: 0

Molecular representation learning: cross-domain foundations and future Frontiers

IF 6.2

Digital discovery Pub Date : 2025-08-01 DOI: 10.1039/D5DD00170F

Rahul Sheshanarayana and Fengqi You

Abstract: Molecular representation learning has catalyzed a paradigm shift in computational chemistry and materials science, from reliance on manually engineered descriptors to the automated extraction of features using deep learning. This transition enables data-driven predictions of molecular properties, inverse design of compounds, and accelerated discovery of chemical and crystalline materials, including organic molecules, inorganic solids, and catalytic systems. This review provides a comprehensive and comparative evaluation of deep learning-based molecular representations, focusing on graph neural networks, autoencoders, diffusion models, generative adversarial networks, transformer architectures, and hybrid self-supervised learning (SSL) frameworks. Special attention is given to underexplored areas such as 3D-aware representations, physics-informed neural potentials, and cross-modal fusion strategies that integrate graphs, sequences, and quantum descriptors. While previous reviews have largely centered on GNNs and generative models, our synthesis addresses key gaps in the literature, particularly the limited exploration of geometric learning, chemically informed SSL, and multi-modal representation integration. We critically assess persistent challenges, including data scarcity, representational inconsistency, interpretability, and the high computational costs of existing methods. Emerging strategies such as contrastive learning, multi-modal adaptive fusion, and differentiable simulation pipelines are discussed in depth, revealing promising directions for improving generalization and real-world applicability. Notably, we highlight how equivariant models and learned potential energy surfaces offer physically consistent, geometry-aware embeddings that extend beyond static graphs. By integrating insights across domains, this review equips cheminformatics and materials science communities with a forward-looking synthesis of methodological innovations. Ultimately, advances in pretraining, hybrid representations, and differentiable modeling are poised to accelerate progress in drug discovery, materials design, and sustainable chemistry.

Volume 9, pages 2298-2335. PDF: https://pubs.rsc.org/en/content/articlepdf/2025/dd/d5dd00170f?page=search

Citations: 0

Taskblaster: a generic framework for automated computational workflows

IF 6.2

Digital discovery Pub Date : 2025-08-01 DOI: 10.1039/D5DD00097A

Ask Hjorth Larsen, Mikael J. Kuisma, Tara M. Boland, Fredrik A. Nilsson and Kristian S. Thygesen

Citations: 0

Decoding non-linearity and complexity: deep tabular learning approaches for materials science

IF 6.2

Digital discovery Pub Date : 2025-08-01 DOI: 10.1039/D5DD00166H

Vahid Attari and Raymundo Arroyave

Abstract: Materials datasets, particularly those capturing high-temperature properties, pose significant challenges for learning tasks due to their skewed distributions, wide feature ranges, and multimodal behaviors. While tree-based models like XGBoost are inherently non-linear and often perform well on many tabular problems, their reliance on piecewise constant splits can limit effectiveness when modeling smooth, long-tailed, or higher-order relationships prevalent in advanced materials data. To address these challenges, we investigate the effectiveness of encoder–decoder models for data transformation using regularized Fully Dense Networks (FDN-R), Disjunctive Normal Form Networks (DNF-Net), 1D Convolutional Neural Networks (CNNs), and Variational Autoencoders, along with TabNet, a hybrid attention-based model. Our results indicate that while XGBoost remains competitive on simpler tasks, encoder–decoder models, particularly those based on FDN-R and DNF-Net, demonstrate better generalization on highly skewed targets like creep resistance, across small, medium, and large datasets. TabNet's attention mechanism offers moderate gains but underperforms on extreme values. These findings emphasize the importance of aligning model architecture with feature complexity and demonstrate the promise of hybrid encoder–decoder models for robust and generalizable materials prediction from composition data.

Volume 10, pages 2765-2780. PDF: https://pubs.rsc.org/en/content/articlepdf/2025/dd/d5dd00166h?page=search

Citations: 0