Digital Discovery: latest articles

Evaluation of foundational machine learned interatomic potentials for migration barrier predictions
IF 6.2
Digital discovery Pub Date : 2026-03-30 DOI: 10.1039/D5DD00534E
Achinthya Krishna Bheemaguli, Penghao Xiao and Gopalakrishnan Sai Gautam
Abstract: Fast and accurate prediction of ionic migration barriers (E_m) is crucial for designing next-generation battery materials that combine high energy density with facile ion transport. Given the computational costs associated with estimating E_m using conventional density functional theory (DFT) based nudged elastic band (NEB) calculations, we benchmark the accuracy in E_m and geometry predictions of six foundational machine learned interatomic potentials (MLIPs), which can potentially accelerate predictions of ionic transport. Specifically, we assess the accuracy of the MACE-MP-0, MACE-OMAT-medium, Orb-v3, SevenNet, CHGNet, and M3GNet models, coupled with the NEB framework, against DFT-NEB-calculated E_m across a diverse set of battery-relevant chemistries and structures. Notably, MACE-MP-0 and Orb-v3 exhibit the lowest mean absolute errors in E_m predictions across the entire dataset and over data points that are not outliers, respectively. Importantly, Orb-v3, MACE-OMAT-medium, and SevenNet classify ‘good’ versus ‘bad’ ionic conductors with an accuracy of >82%, based on a threshold E_m of 500 meV, indicating their utility in high-throughput screening approaches. In addition, intermediate images generated by MACE-MP-0 and SevenNet provide better initial guesses relative to conventional interpolation techniques in >71% of structures, offering a practical route to accelerate subsequent DFT-NEB relaxations. Finally, we observe that accurate E_m predictions by MLIPs are not correlated with accurate (local) geometry predictions. Our work establishes the use-cases, accuracies, and limitations of foundational MLIPs in estimating E_m and should serve as a base for accelerating the discovery of novel ionic conductors for batteries and beyond.
Volume 4, pages 1809-1819. PDF: https://pubs.rsc.org/en/content/articlepdf/2026/dd/d5dd00534e?page=search
Citations: 0
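The two headline metrics in the migration-barrier abstract above (mean absolute error in E_m, and agreement on ‘good’ versus ‘bad’ conductors at a 500 meV threshold) can be sketched in a few lines. This is only an illustration: the function names are hypothetical, the sample values are made up rather than taken from the paper, and treating E_m strictly below the threshold as ‘good’ is an assumption.

```python
from statistics import mean

THRESHOLD_MEV = 500.0  # threshold E_m quoted in the abstract


def mean_absolute_error(predicted, reference):
    """Mean absolute deviation between MLIP-NEB and DFT-NEB barriers (meV)."""
    return mean(abs(p - r) for p, r in zip(predicted, reference))


def classification_accuracy(predicted, reference, threshold=THRESHOLD_MEV):
    """Fraction of structures where both methods fall on the same side of the threshold.

    Assumption: a barrier strictly below the threshold counts as a 'good' conductor.
    """
    agree = sum((p < threshold) == (r < threshold)
                for p, r in zip(predicted, reference))
    return agree / len(reference)


# Made-up barriers in meV, for illustration only.
dft_em = [320.0, 480.0, 510.0, 900.0]
mlip_em = [300.0, 530.0, 600.0, 850.0]

mae = mean_absolute_error(mlip_em, dft_em)
acc = classification_accuracy(mlip_em, dft_em)
print(f"MAE = {mae:.1f} meV, classification accuracy = {acc:.2f}")
```

The second sample shows why the two metrics can diverge: a 50 meV error is small in absolute terms but flips the classification because the reference sits close to the threshold.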
POLARIS: perovskite optimization using LLM-assisted refinement and intelligent screening
IF 6.2
Digital discovery Pub Date : 2026-03-30 DOI: 10.1039/D5DD00378D
Jordan Marshall, Sheryl L. Sanchez, Rushik Desai, Elham Foadian, Utkarsh Pratiush, Arun Mannodi-Kanakkithodi, Sergei V. Kalinin and Mahshid Ahmadi
Abstract: We present a comprehensive and reproducible pipeline that unites literature mining, molecular graph generation, and uncertainty-aware predictive modeling to accelerate the design of organic spacer cations for two-dimensional (2D) halide perovskites (HPs). Despite the critical influence of spacer chemistry on phase stability, excitonic behavior, transport properties and environmental robustness, the chemical space of HPs remains underexplored due to inconsistent reporting and limited structured datasets. To overcome this, we loaded a curated, diverse set of 200 experimental papers from various publishers and research groups into Google's NotebookLM powered by Gemini, utilizing its retrieval-augmented generation (RAG) framework to extract synthesis-relevant metadata with high accuracy and reproducibility. To ensure data quality and consistency, we limited our selection to papers published in peer-reviewed journals with an impact factor above 10, focusing on studies with well-documented experimental protocols. Benchmarking against five other large language models (LLMs) confirmed NotebookLM's superior stability and minimal hallucination rate, making it ideal for hypothesis-driven data curation. From extracted IUPAC names, we constructed SMILES representations and augmented the dataset with over 10 000 ammonium-containing molecules from QM9. These were converted into graph-based molecular embeddings and used to train a multitask graph neural network coupled with a Gaussian process (GNN–GP) backend to predict optoelectronic and structural properties with uncertainty quantification. The latent space clustering of the learned embeddings revealed chemically interpretable families of spacer candidates, which we cross-validated against ChatGPT-generated design heuristics. The convergence between unsupervised clustering and transformer-derived guidance highlights the power of combining LLMs with active learning to generate, test, and refine design hypotheses in underexplored chemical domains. This study demonstrates how fragmented literature can be transformed into actionable structure–property insights through a tightly integrated informatics pipeline available to a broad experimental community, and underscores the value of open repositories that can be mined for information. Our approach lays the foundation for closed-loop, autonomous materials discovery and design and provides a scalable strategy for targeted development of next-generation HP optoelectronics.
Volume 4, pages 1765-1782. PDF: https://pubs.rsc.org/en/content/articlepdf/2026/dd/d5dd00378d?page=search
Citations: 0
Fast and scalable retrosynthetic planning with a transformer neural network and speculative beam search
IF 6.2
Digital discovery Pub Date : 2026-03-30 DOI: 10.1039/D5DD00573F
Natalia Andronova, Mikhail Andronov, Jürgen Schmidhuber, Michael Wand and Djork-Arné Clevert
Abstract: AI-based computer-aided synthesis planning (CASP) systems are in demand as components of AI-driven drug discovery workflows. However, the high latency of such CASP systems limits their utility for high-throughput synthesizability screening in de novo drug design. We propose a transformer-based single-step retrosynthesis model with reduced inference latency based on speculative beam search combined with a scalable drafting strategy called Medusa. Replacing the standard transformer and beam search with our approach accelerates the expansion stage of the planning algorithm, leading to higher solvability in CASP when planning under stringent time limits, and saves hours of computation when synthesis is constrained by the number of iterations. Our method brings AI-based CASP systems closer to meeting the stringent latency requirements of high-throughput synthesizability screening and improving the overall user experience.
Volume 4, pages 1783-1793. PDF: https://pubs.rsc.org/en/content/articlepdf/2026/dd/d5dd00573f?page=search
Citations: 0
NaviDiv: a web app for monitoring chemical diversity in generative molecular design
IF 6.2
Digital discovery Pub Date : 2026-03-30 DOI: 10.1039/D5DD00487J
Mohammed Azzouzi, Thanapat Worakul and Clémence Corminboeuf
Abstract: The rapid progress in generative models for molecular design has led to extensive libraries of candidate molecules for biological and chemical applications. However, ensuring these molecules are diverse and representative of broader chemical space remains challenging, with researchers often over-exploring limited regions or missing promising candidates due to inadequate monitoring tools. This work presents NaviDiv (Navigating Diversity in Chemical Space), a comprehensive web-based framework for managing chemical diversity in string-based generative molecular design through three integrated capabilities: multi-metric diversity analysis capturing structural, syntactic, and molecular framework variations; interactive real-time visualization enabling immediate detection of model collapse; and adaptive constraint generation that dynamically guides optimization while preserving diversity. Through a singlet fission material discovery case study using REINVENT4, we demonstrate that different diversity metrics (i.e. structural similarity, fragment composition, and sequence patterns) respond differently during optimization, with constraint effectiveness depending critically on representational alignment with the generative model. n-Gram-based constraints outperform fingerprint-based approaches due to direct correspondence with SMILES generation, and combined constraints maintain diversity across all metrics while achieving optimization performance within 15% of unconstrained baselines. The framework is freely available at https://github.com/LCMD-epfl/NaviDiv, providing accessible tools for data-driven decisions about diversity–property trade-offs in automated molecular discovery.
Volume 4, pages 1579-1589. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13041631/pdf/
Citations: 0
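To make the idea of a sequence-pattern (n-gram) diversity signal from the NaviDiv abstract above concrete, here is a minimal sketch. It is not NaviDiv's implementation: the character-level tokenisation, the function names, and the "distinct n-grams divided by total n-grams" score are all assumptions chosen for illustration.

```python
from collections import Counter


def char_ngrams(smiles, n):
    """All overlapping character n-grams of a SMILES string."""
    return [smiles[i:i + n] for i in range(len(smiles) - n + 1)]


def ngram_diversity(smiles_list, n=3):
    """Distinct n-grams divided by total n-grams across a batch.

    Values near 0 mean the batch keeps repeating the same substrings,
    which is one possible symptom of model collapse.
    """
    counts = Counter(g for s in smiles_list for g in char_ngrams(s, n))
    total = sum(counts.values())
    return len(counts) / total if total else 0.0


# Toy batches: one collapsed onto a single molecule, one more varied.
collapsed = ["CCO", "CCO", "CCO"]
varied = ["CCO", "c1ccccc1", "CC(=O)N"]

print(ngram_diversity(collapsed), ngram_diversity(varied))
```

A character-level score like this operates on the same string representation that a SMILES generator emits, which is the kind of representational alignment the abstract credits for the effectiveness of n-gram-based constraints.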
Learning potential energy surfaces of hydrogen atom transfer reactions in peptides
IF 6.2
Digital discovery Pub Date : 2026-03-30 DOI: 10.1039/D6DD00054A
Marlen Neubert, Patrick Reiser, Frauke Gräter and Pascal Friederich
Abstract: Hydrogen atom transfer (HAT) reactions are essential in many biological processes, such as radical migration in damaged proteins, but their mechanistic pathways remain incompletely understood. Simulating HAT processes is challenging due to the conflicting requirements of quantum chemical accuracy and biologically relevant time and length scales; thus, neither classical force fields nor DFT-based molecular dynamics simulations are applicable. Machine-learned potentials offer an alternative, with the ability to learn potential energy surfaces (PESs) that capture reactions and transitions with near-quantum accuracy. However, training such models to generalize across diverse HAT configurations, especially at radical positions in proteins, requires tailored data generation strategies and careful model selection. In this work, we systematically generate HAT reaction configurations in peptides to build large datasets using semiempirical methods as well as DFT. We benchmark three atomistic machine-learned potential architectures, SchNet, Allegro, and MACE, on their ability to learn HAT potential energy surfaces and indirectly predict reaction barriers through direct energy predictions. MACE consistently outperforms the other models in energy, force, and reaction barrier prediction accuracy, achieving a mean absolute error of 1.13 kcal/mol on DFT barrier predictions. Short molecular dynamics simulations indicate that the learned potential is numerically stable at finite temperature and can sustain reactive HAT sampling under moderate biasing, serving as a feasibility check for downstream simulation workflows. We show that the trained MACE potential generalizes well beyond our training data by performing out-of-distribution evaluations and analysis of HAT barriers in collagen I snapshots. This level of accuracy can enable integration of machine-learning-based barrier predictions into large-scale simulation workflows to compute reaction rates from predicted barriers, advancing the mechanistic understanding of HAT and radical migration in peptides. We analyze scaling laws, model transferability, and cost-performance trade-offs, and outline strategies for improvement through the combination of machine-learned potentials with transition state search algorithms and active learning. The presented approach is generalizable to other biomolecular systems, offering a method toward quantum-accurate barrier predictions of chemical reactivity in complex biological environments.
Volume 4, pages 1831-1844. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13054793/pdf/
Citations: 0
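The HAT abstract above mentions computing reaction rates from predicted barriers. One common way to do that is the Eyring equation from transition state theory, k = (k_B T / h) exp(-ΔG‡ / RT). The sketch below assumes the predicted barrier can stand in for the activation free energy and uses a transmission coefficient of 1; both are simplifications introduced here, not choices stated in the abstract, and the function name is hypothetical.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K
H = 6.62607015e-34      # Planck constant, J s
R = 8.314462618         # molar gas constant, J/(mol K)
KCAL_TO_J = 4184.0      # thermochemical kcal to J


def eyring_rate(barrier_kcal_per_mol, temperature_k=300.0):
    """Eyring rate constant in 1/s for a barrier given in kcal/mol.

    Assumption: the barrier is used directly as the activation free energy.
    """
    prefactor = K_B * temperature_k / H
    exponent = -barrier_kcal_per_mol * KCAL_TO_J / (R * temperature_k)
    return prefactor * math.exp(exponent)


# Sensitivity of the rate to a barrier error equal to the reported MAE.
rate_ratio = eyring_rate(10.0) / eyring_rate(10.0 + 1.13)
print(f"rate ratio for a 1.13 kcal/mol shift at 300 K: {rate_ratio:.2f}")
```

Because the rate depends exponentially on the barrier, a shift of 1.13 kcal/mol changes the rate by a factor of roughly 6.7 at 300 K under these assumptions, which gives a feel for what the reported mean absolute error means for downstream kinetics.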
A symmetry-preserving and transferable representation for learning the Kohn–Sham density matrix
IF 6.2
Digital discovery Pub Date : 2026-03-27 DOI: 10.1039/D5DD00230C
Liwei Zhang, Patrizia Mazzeo, Michele Nottoli, Edoardo Cignoni, Lorenzo Cupellini and Benjamin Stamm
Abstract: The Kohn–Sham (KS) density matrix is one of the most essential properties in KS density functional theory (DFT), from which many other physical properties of interest can be derived. In this work, we present a parameterized representation for learning the mapping from a molecular configuration to its corresponding density matrix using the Atomic Cluster Expansion (ACE) framework, which preserves the physical symmetries of the mapping, including isometric equivariance and Grassmannianity. Trained on several typical molecules, the proposed representation is shown to be systematically improvable as the number of model parameters increases and is transferable to molecules that are not part of, and are even more complex than, those in the training set. The models generated by the proposed approach are shown to produce reasonable predictions of the density matrix, which can be used either to accelerate DFT calculations or to approximate some properties of the molecules.
Volume 4, pages 1868-1880. PDF: https://pubs.rsc.org/en/content/articlepdf/2026/dd/d5dd00230c?page=search
Citations: 0
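The "Grassmannianity" constraint in the density-matrix abstract above can be read as requiring the predicted matrix to be an orthogonal projector of fixed rank. The sketch below checks that property for a toy matrix. It assumes an orthonormal basis, where the conditions are P = Pᵀ, P² = P and tr P = number of occupied orbitals (in a non-orthogonal atomic-orbital basis the idempotency condition would involve the overlap matrix instead). All names are hypothetical and this is not the paper's code.

```python
import math


def matmul(a, b):
    """Plain-Python matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def projector_from_orbitals(orbitals):
    """Build P = sum_i c_i c_i^T from orthonormal occupied-orbital vectors."""
    dim = len(orbitals[0])
    return [[sum(c[i] * c[j] for c in orbitals) for j in range(dim)]
            for i in range(dim)]


def is_grassmannian(p, n_occ, tol=1e-10):
    """True if p is symmetric, idempotent, and has trace n_occ (within tol)."""
    dim = len(p)
    p2 = matmul(p, p)
    symmetric = all(abs(p[i][j] - p[j][i]) <= tol
                    for i in range(dim) for j in range(dim))
    idempotent = all(abs(p2[i][j] - p[i][j]) <= tol
                     for i in range(dim) for j in range(dim))
    trace_ok = abs(sum(p[i][i] for i in range(dim)) - n_occ) <= tol
    return symmetric and idempotent and trace_ok


s = 1.0 / math.sqrt(2.0)
p_exact = projector_from_orbitals([[s, s, 0.0]])      # one occupied orbital
p_scaled = [[0.9 * x for x in row] for row in p_exact]  # breaks idempotency

print(is_grassmannian(p_exact, 1), is_grassmannian(p_scaled, 1))
```

A generic regression model would produce matrices like `p_scaled` that violate these conditions slightly, which is why a representation that satisfies them by construction is attractive.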
Hacking 3D printers as laboratory robots
IF 6.2
Digital discovery Pub Date : 2026-03-26 DOI: 10.1039/D5DD00451A
Sander Baas, Nessa Carson and Vittorio Saggiomo
Abstract: The emergence of affordable and reliable 3D printers has enabled laboratories to optimize setups, print custom parts, accelerate research, and rapidly prototype. A new movement has emerged in the past decade, where 3D printers are repurposed as laboratory-specific robots. There are three distinct strategies within the 3D-printer-as-lab-robot approach: modifying the extruder for non-standard material printing; replacing the extruder with a third-party implement, such as a pipette, microscope, or slide holder; or deconstructing the printer completely and using it as a cheap and widely available parts kit for lab-built robots such as syringe pumps. New developments in printer hardware and software control, which enable the use of printers as laboratory robots, are also discussed.
Volume 4, pages 1460-1469. PDF: https://pubs.rsc.org/en/content/articlepdf/2026/dd/d5dd00451a?page=search
Citations: 0
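As an illustration of the second strategy in the 3D-printer abstract above (swapping the extruder for an implement such as a pipette), the sketch below generates plain G-code that moves a tool head over wells of a plate. It only builds text; nothing is sent to a printer. G28 (home), G90 (absolute positioning) and G0/G1 (moves) are standard G-code commands, while the plate origin, heights, feed rate, and function names are hypothetical values chosen for the example, and the 9 mm pitch assumes a standard 96-well plate.

```python
def well_position(row, col, origin_xy=(20.0, 30.0), pitch_mm=9.0):
    """XY coordinates of a well; origin_xy is a hypothetical position of well A1."""
    x = origin_xy[0] + col * pitch_mm
    y = origin_xy[1] + row * pitch_mm
    return x, y


def visit_wells_gcode(wells, travel_z=30.0, dip_z=5.0, feed=3000):
    """Return G-code lines that home, then lower the tool into each (row, col) well."""
    lines = ["G28", "G90"]  # home all axes, then use absolute coordinates
    for row, col in wells:
        x, y = well_position(row, col)
        lines.append(f"G0 Z{travel_z:.1f} F{feed}")       # lift to safe height
        lines.append(f"G0 X{x:.1f} Y{y:.1f} F{feed}")     # travel above the well
        lines.append(f"G1 Z{dip_z:.1f} F{feed}")          # lower the tool
    lines.append(f"G0 Z{travel_z:.1f} F{feed}")           # finish at safe height
    return lines


program = visit_wells_gcode([(0, 0), (0, 1)])
print("\n".join(program))
```

Lifting to a safe height before every XY move is a deliberate choice: a repurposed printer has no knowledge of labware on its bed, so collision avoidance has to be written into the motion sequence itself.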
High-performance training and inference for deep equivariant interatomic potentials
IF 6.2
Digital discovery Pub Date : 2026-03-26 DOI: 10.1039/D5DD00423C
Chuin Wei Tan, Marc L. Descoteaux, Mit Kotak, Gabriel de Miranda Nascimento, Seán R. Kavanagh, Laura Zichi, Menghang Wang, Aadit Saluja, Yizhong R. Hu, Tess Smidt, Anders Johansson, William C. Witt, Boris Kozinsky and Albert Musaelian
Abstract: Machine learning interatomic potentials, particularly those based on deep equivariant neural networks, have demonstrated state-of-the-art accuracy and computational efficiency in atomistic modeling tasks like molecular dynamics and high-throughput screening. The size of datasets and demands of downstream workflows are growing rapidly, making robust and scalable software essential. This work presents a major overhaul of the NequIP framework focusing on multi-node parallelism, computational performance, and extensibility. The redesigned framework supports distributed training on large datasets and removes barriers preventing full utilization of the PyTorch 2.0 compiler at train time. We demonstrate this acceleration in a case study by training Allegro models on the SPICE 2 dataset of organic molecular systems. For inference, we introduce the first end-to-end infrastructure that uses the PyTorch Ahead-of-Time Inductor compiler for machine learning interatomic potentials. Additionally, we implement a custom kernel for the Allegro model's most expensive operation, the tensor product. Together, these advancements speed up molecular dynamics calculations on system sizes of practical relevance by factors of 5 to 18.
Volume 4, pages 1558-1567. PDF: https://pubs.rsc.org/en/content/articlepdf/2026/dd/d5dd00423c?page=search
Citations: 0
ComProScanner: a multi-agent based framework for composition-property structured data extraction from scientific literature
IF 6.2
Digital discovery Pub Date : 2026-03-25 DOI: 10.1039/D5DD00521C
Aritra Roy, Enrico Grisan, John Buckeridge and Chiara Gattinoni
Abstract: Modern materials discovery using data-driven techniques relies heavily on large and structured databases of material compositions and properties; however, the majority of information regarding experimentally synthesised materials lies buried within millions of scientific articles. Large language models and agents have now made it possible to extract structured knowledge from scientific text. Yet, despite several approaches designed for this aim, no highly accurate approach has been developed that focuses on composition and property extraction (the bare minimum for data-driven methods) and creates machine learning-ready databases without human assistance. We therefore developed ComProScanner, an autonomous multi-agent platform that facilitates the extraction, validation, classification and visualisation of machine-readable chemical compositions and properties for comprehensive database creation. ComProScanner is a publisher-to-database framework that incorporates publisher APIs, bypassing the need to manually upload papers into the framework, and it is capable of scanning thousands of papers without human intervention. We evaluated our framework on 100 journal articles with 10 different LLMs, including both open-source and proprietary models, to extract highly complex compositions associated with ceramic piezoelectric materials and corresponding piezoelectric strain coefficients (d_33), motivated by the lack of a large dataset for such materials. DeepSeek-V3-0324 outperformed all models with a significant overall accuracy of 0.82. Even with this small journal sample, the vast majority of the piezoelectric materials we extracted are not included in commonly available databases and we identified one system with a significantly high piezoelectric coefficient. This framework provides a simple, user-friendly, readily usable package for extracting highly complex experimental data buried in the literature to build machine learning or deep learning datasets.
Volume 4, pages 1794-1808. PDF: https://pubs.rsc.org/en/content/articlepdf/2026/dd/d5dd00521c?page=search
Citations: 0
Increasing trustworthiness of machine learning-based drug sensitivity prediction with a multivariate random forest approach
IF 6.2
Digital discovery Pub Date : 2026-03-25 DOI: 10.1039/D5DD00284B
Lisa-Marie Rolli, Lea Eckhart, Lutz Herrmann, Andrea Volkamer, Hans-Peter Lenhof and Kerstin Lenhof
Abstract: Ensuring the trustworthiness of machine learning (ML) models in high-stakes applications is crucial. One such application is predicting anti-cancer drug sensitivity, where ML models are built with the final goal of integrating them into treatment recommendation systems for personalized medicine. Here, we propose a trustworthy multivariate random forest method MORGOTH, available in our package ‘morgoth’. Besides standard regression and classification functions, MORGOTH allows for the simultaneous optimization of regression and classification tasks via a joint splitting criterion. Additionally, it provides a graph representation of the random forest to address model interpretability, and a cluster analysis of the leaves to measure the dissimilarity of new inputs from the training data to account for its reliability and robustness. In total, MORGOTH provides a comprehensive approach that unites simultaneous regression and classification, interpretability, reliability, and robustness in a single framework. While our package is broadly applicable, we demonstrate its capabilities for anti-cancer drug sensitivity prediction by a comprehensive large-scale study on the Genomics of Drug Sensitivity in Cancer (GDSC) database. We trained single-drug as well as multi-drug models. In either case, MORGOTH clearly outperforms state-of-the-art neural network approaches. Moreover, we highlight an evaluation issue for multi-drug models and demonstrate that single-drug models consistently outperform them when evaluated fairly.
Volume 4, pages 1746-1764. PDF: https://pubs.rsc.org/en/content/articlepdf/2026/dd/d5dd00284b?page=search
Citations: 0