A data-driven method for automated data superposition with applications in soft matter science

IF 2.4 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Kyle R. Lennon, G. McKinley, J. Swan
Journal: DataCentric Engineering
DOI: 10.1017/dce.2023.3 (https://doi.org/10.1017/dce.2023.3)
Publication date: 2022-04-20
Publication type: Journal Article
Citations: 3

A data-driven method for automated data superposition with applications in soft matter science
Abstract

The superposition of data sets with internal parametric self-similarity is a longstanding and widespread technique for the analysis of many types of experimental data across the physical sciences. Typically, this superposition is performed manually, or recently through the application of one of a few automated algorithms. However, these methods are often heuristic in nature, are prone to user bias via manual data shifting or parameterization, and lack a native framework for handling uncertainty in both the data and the resulting model of the superposed data. In this work, we develop a data-driven, nonparametric method for superposing experimental data with arbitrary coordinate transformations, which employs Gaussian process regression to learn statistical models that describe the data, and then uses maximum a posteriori estimation to optimally superpose the data sets. This statistical framework is robust to experimental noise and automatically produces uncertainty estimates for the learned coordinate transformations. Moreover, it is distinguished from black-box machine learning in its interpretability: specifically, it produces a model that may itself be interrogated to gain insight into the system under study. We demonstrate these salient features of our method through its application to four representative data sets characterizing the mechanics of soft materials. In every case, our method replicates results obtained using other approaches, but with reduced bias and the addition of uncertainty estimates. This method enables a standardized, statistical treatment of self-similar data across many fields, producing interpretable data-driven models that may inform applications such as materials classification, design, and discovery.
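The two-stage idea in the abstract (fit a Gaussian process to one data set, then choose the coordinate shift that makes a second data set most probable under that model) can be illustrated with a small sketch. This is not the authors' implementation: it assumes a single horizontal shift, a fixed squared-exponential kernel, a flat prior on the shift (so the maximum a posteriori estimate coincides with the maximum-likelihood one), a grid search in place of a proper optimizer, and a curvature-based (Laplace-style) uncertainty estimate. All function names, hyperparameters, and the synthetic tanh data are hypothetical.

```python
import math

def rbf(a, b, length=1.0, amp=1.0):
    """Squared-exponential kernel (hyperparameters fixed by assumption)."""
    return amp * math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(A):
    """Lower-triangular L with L L^T = A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_lower(L, b):
    """Forward substitution for L y = b."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def solve_upper_t(L, y):
    """Back substitution for L^T x = y."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def fit_gp(xs, ys, noise=0.05):
    """Precompute the Cholesky factor and weights of a zero-mean GP."""
    K = [[rbf(a, b) + (noise ** 2 if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, ys))
    return xs, L, alpha, noise

def predict(gp, x):
    """Predictive mean and variance (including observation noise) at x."""
    xs, L, alpha, noise = gp
    k = [rbf(x, a) for a in xs]
    mean = sum(ki * ai for ki, ai in zip(k, alpha))
    v = solve_lower(L, k)
    var = rbf(x, x) - sum(vi * vi for vi in v) + noise ** 2
    return mean, max(var, 1e-12)

def neg_log_posterior(gp, xs, ys, shift):
    """Negative log-likelihood of the shifted data; flat prior assumed."""
    total = 0.0
    for x, y in zip(xs, ys):
        m, v = predict(gp, x + shift)
        total += 0.5 * math.log(2 * math.pi * v) + 0.5 * (y - m) ** 2 / v
    return total

def map_shift(gp, xs, ys, candidates):
    """Grid-search stand-in for a continuous MAP optimization."""
    return min(candidates, key=lambda s: neg_log_posterior(gp, xs, ys, s))

def shift_std(gp, xs, ys, s, h=0.05):
    """Laplace-style uncertainty from the finite-difference curvature at s."""
    f = lambda t: neg_log_posterior(gp, xs, ys, t)
    curv = (f(s + h) - 2 * f(s) + f(s - h)) / h ** 2
    return 1.0 / math.sqrt(curv) if curv > 0 else float("inf")

# Synthetic self-similar data: both sets follow tanh, offset along x.
TRUE_SHIFT = 0.8
xs_ref = [-3 + 0.25 * i for i in range(25)]
ys_ref = [math.tanh(x) + 0.01 * math.sin(37 * i) for i, x in enumerate(xs_ref)]
xs_new = [-2 + 0.25 * i for i in range(13)]
ys_new = [math.tanh(x + TRUE_SHIFT) + 0.01 * math.cos(23 * i)
          for i, x in enumerate(xs_new)]

gp = fit_gp(xs_ref, ys_ref)
candidates = [0.05 * i for i in range(-20, 41)]  # keeps shifted x in [-3, 3]
best = map_shift(gp, xs_new, ys_new, candidates)
sigma = shift_std(gp, xs_new, ys_new, best)
print(f"estimated shift = {best:.2f} +/- {sigma:.3f} (true {TRUE_SHIFT})")
```

Because the predictive variance enters the objective, points that land where the reference model is uncertain contribute less to the fit, which is one way a statistical formulation of this kind can tolerate experimental noise. A fuller treatment would also learn the kernel hyperparameters and handle vertical shifts or other coordinate transformations.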
Source journal
DataCentric Engineering (Engineering, General Engineering)
CiteScore: 5.60
Self-citation rate: 0.00%
Articles published: 26
Review time: 12 weeks