Prediction rigidities for data-driven chemistry†

IF 3.1 · Zone 3 (Chemistry) · JCR Q2 (Chemistry)

Faraday Discussions · Pub Date: 2024-08-23 · DOI: 10.1039/D4FD00101J

Sanggyu Chong, Filippo Bigi, Federico Grasselli, Philip Loche, Matthias Kellner and Michele Ceriotti

Faraday Discussions, volume 256, pages 322-344. Article page: https://pubs.rsc.org/en/content/articlelanding/2025/fd/d4fd00101j


Abstract

The widespread application of machine learning (ML) to the chemical sciences is making it very important to understand how ML models learn to correlate chemical structures with their properties, and what can be done to improve training efficiency whilst guaranteeing interpretability and transferability. In this work, we demonstrate the wide utility of prediction rigidities, a family of metrics derived from the loss function, in understanding the robustness of ML model predictions. We show that the prediction rigidities allow the assessment of the model not only at the global level, but also at the local or component-wise level at which the intermediate (e.g. atomic, body-ordered, or range-separated) predictions are made. We leverage these metrics to understand the learning behavior of different ML models, and to guide efficient dataset construction for model training. We finally implement the formalism for an ML model targeting a coarse-grained system to demonstrate the applicability of the prediction rigidities to an even broader class of atomistic modeling problems.
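As a rough illustration of how a robustness metric can be derived from the loss function, the sketch below computes a rigidity-like score for a toy ridge regression model. The abstract does not state a formula, so the definition used here is an assumption made for illustration: the score is the inverse of g^T H^{-1} g, where H is the Hessian of the regularized squared loss with respect to the weights and g is the gradient of the prediction with respect to the weights. The function name `prediction_rigidity`, the regularization strength `reg`, and the synthetic data are all hypothetical.

```python
import numpy as np


def prediction_rigidity(X_train, x_query, reg=1e-3):
    """Hypothetical helper: rigidity-like score of a linear model's prediction.

    Assumes a ridge-regularized squared loss; this is an illustrative
    definition, not one taken from the abstract.
    """
    n_features = X_train.shape[1]
    # Hessian of the regularized squared loss with respect to the weights
    # (constant factors dropped, since only relative values are compared).
    hessian = X_train.T @ X_train + reg * np.eye(n_features)
    # For a linear model, the gradient of the prediction with respect to
    # the weights is simply the query feature vector.
    grad = np.asarray(x_query, dtype=float)
    # Solve H z = g rather than forming the explicit inverse.
    return 1.0 / (grad @ np.linalg.solve(hessian, grad))


rng = np.random.default_rng(0)
# Synthetic training set: feature 0 is well sampled, feature 1 barely varies.
X_train = rng.normal(size=(200, 2)) * np.array([1.0, 0.01])

pr_covered = prediction_rigidity(X_train, np.array([1.0, 0.0]))
pr_uncovered = prediction_rigidity(X_train, np.array([0.0, 1.0]))
print(pr_covered, pr_uncovered)
```

Under these assumptions, the query along the well-sampled feature receives a much higher score than the query along the poorly sampled one, which is the kind of robustness signal the abstract attributes to prediction rigidities. A local or component-wise variant would, in the same spirit, use the gradient of an intermediate prediction in place of `grad`.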
