A reproducibility study of atomistic line graph neural networks for materials property prediction†

IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY
Kangming Li, Brian DeCost, Kamal Choudhary and Jason Hattrick-Simpers
Digital Discovery, published 2024-04-30. DOI: 10.1039/D4DD00064A
https://pubs.rsc.org/en/content/articlelanding/2024/dd/d4dd00064a

Abstract

Machine learning has become increasingly popular in materials science as data-driven materials discovery becomes the new paradigm. Reproducibility of findings is paramount for promoting transparency and accountability in research and for building trust in the scientific community. Here we conduct a reproducibility analysis of the work by K. Choudhary and B. DeCost [npj Comput. Mater., 7, 2021, 185], in which a new graph neural network architecture was developed with improved performance on multiple atomistic prediction tasks. We examine the reproducibility of the model performance on 29 regression tasks and of an ablation analysis of the graph neural network layers. We find that the reproduced results generally show good quantitative agreement with the initial study, despite minor disparities in model performance and training efficiency that may result from factors such as hardware differences and the stochasticity involved in model training and data splits. The ease of conducting these reproducibility experiments confirms the great benefits of the open data and code practices to which the initial work adhered. We also discuss further enhancements to reproducible practices, such as archiving code and data and providing the data identifiers used in dataset splits.
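
As a rough illustration of the last point, the sketch below shows one way the identifiers behind a dataset split could be recorded alongside the random seed, so that a later study can reuse exactly the same partition. It is not taken from the original work or its code base: the function names `split_ids` and `save_split`, the seed, the split fractions, and the `sample-…` identifiers are all hypothetical and chosen only for the example.

```python
import json
import random


def split_ids(ids, seed=123, val_frac=0.1, test_frac=0.1):
    """Deterministically shuffle identifiers and split them into train/val/test.

    Hypothetical helper for illustration; seed and fractions are assumptions.
    """
    ordered = sorted(ids)      # fix the starting order so only the seed decides the shuffle
    rng = random.Random(seed)  # local generator; leaves the global random state untouched
    rng.shuffle(ordered)
    n_val = int(len(ordered) * val_frac)
    n_test = int(len(ordered) * test_frac)
    n_train = len(ordered) - n_val - n_test
    return {
        "seed": seed,
        "train": ordered[:n_train],
        "val": ordered[n_train:n_train + n_val],
        "test": ordered[n_train + n_val:],
    }


def save_split(split, path):
    """Write the split (seed plus identifier lists) to a JSON file for archiving."""
    with open(path, "w") as fh:
        json.dump(split, fh, indent=2)


# Hypothetical identifiers, for illustration only.
example_ids = [f"sample-{i}" for i in range(20)]
split = split_ids(example_ids)
```

Archiving the resulting JSON file together with the code removes the dependence on a particular shuffling implementation: a reader can load the identifier lists directly instead of trying to regenerate them from the seed.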

