AlloyBERT: Alloy property prediction with large language models

Impact Factor 3.1 · CAS Tier 3 (Materials Science) · JCR Q2, Materials Science, Multidisciplinary
Akshat Chaudhari, Chakradhar Guntuboina, Hongshuo Huang, Amir Barati Farimani
Journal: Computational Materials Science
DOI: 10.1016/j.commatsci.2024.113256
Published: 2024-07-31 (Journal Article)
Citations: 0

Abstract

The pursuit of novel alloys tailored to specific requirements poses significant challenges for researchers in the field. This underscores the importance of developing techniques that predict essential physical properties of alloys from their chemical composition and processing parameters. This study introduces AlloyBERT, a transformer encoder-based model designed to predict properties such as elastic modulus and yield strength of alloys using textual inputs. Built on the pre-trained RoBERTa and BERT encoder models, AlloyBERT employs self-attention mechanisms to establish meaningful relationships between words, enabling it to interpret human-readable input and predict target alloy properties. By combining a tokenizer trained on our textual data with a RoBERTa encoder pre-trained and fine-tuned for this specific task, we achieved a mean squared error (MSE) of 0.00015 on the Multi Principal Elemental Alloys (MPEA) dataset and 0.00527 on the Refractory Alloy Yield Strength (RAYS) dataset using the BERT encoder. This surpasses the performance of shallow models, which achieved best-case MSEs of 0.02376 and 0.01459 on the MPEA and RAYS datasets, respectively. Our results highlight the potential of language models in materials science and establish a foundational framework for text-based prediction of alloy properties that does not rely on complex underlying representations, calculations, or simulations.
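The key design choice described above is that the model consumes human-readable text rather than numeric feature vectors. As an illustration of what such an input might look like, the sketch below serializes an alloy's composition and processing parameters into a sentence a BERT/RoBERTa tokenizer could process. The template and function name are hypothetical; the abstract does not specify the authors' exact text format.

```python
def alloy_to_text(composition, processing=None):
    """Serialize an alloy record into a human-readable sentence.

    composition: dict mapping element symbol -> atomic fraction
    processing:  optional dict of processing parameters

    NOTE: this template is an illustrative sketch, not the exact
    textual format used by AlloyBERT.
    """
    # List elements in a deterministic (alphabetical) order so the
    # same alloy always produces the same string.
    parts = [f"{frac:.2f} {el}" for el, frac in sorted(composition.items())]
    text = "Alloy composed of " + ", ".join(parts) + "."
    if processing:
        steps = "; ".join(f"{k}: {v}" for k, v in processing.items())
        text += f" Processing: {steps}."
    return text

# Example: an equiatomic five-element alloy, the kind of entry
# found in a multi-principal-element alloy (MPEA) dataset.
sample = {"Co": 0.2, "Cr": 0.2, "Fe": 0.2, "Mn": 0.2, "Ni": 0.2}
print(alloy_to_text(sample, {"anneal temperature": "1100 C"}))
```

A string like this would then be passed through the trained tokenizer and encoder, with a regression head mapping the pooled representation to the target property (e.g. yield strength).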
Source journal: Computational Materials Science (Engineering & Technology – Materials Science, Multidisciplinary)
CiteScore: 6.50
Self-citation rate: 6.10%
Articles per year: 665
Review time: 26 days
Journal description: The goal of Computational Materials Science is to report on results that provide new or unique insights into, or significantly expand our understanding of, the properties of materials or phenomena associated with their design, synthesis, processing, characterization, and utilization. To be relevant to the journal, the results should be applied or applicable to specific material systems that are discussed within the submission.