LLM-Prop: predicting the properties of crystalline materials using large language models

Impact factor 9.4 · CAS Tier 1 (Materials Science) · JCR Q1 (Chemistry, Physical)
Andre Niyongabo Rubungo, Craig Arnold, Barry P. Rand, Adji Bousso Dieng
{"title":"LLM-Prop:使用大型语言模型预测晶体材料的性质","authors":"Andre Niyongabo Rubungo, Craig Arnold, Barry P. Rand, Adji Bousso Dieng","doi":"10.1038/s41524-025-01536-2","DOIUrl":null,"url":null,"abstract":"<p>The prediction of crystal properties plays a crucial role in materials science and applications. Current methods for predicting crystal properties focus on modeling crystal structures using graph neural networks (GNNs). However, accurately modeling the complex interactions between atoms and molecules within a crystal remains a challenge. Surprisingly, predicting crystal properties from crystal text descriptions is understudied, despite the rich information and expressiveness that text data offer. In this paper, we develop and make public a benchmark dataset (TextEdge) that contains crystal text descriptions with their properties. We then propose LLM-Prop, a method that leverages the general-purpose learning capabilities of large language models (LLMs) to predict properties of crystals from their text descriptions. LLM-Prop outperforms the current state-of-the-art GNN-based methods by approximately 8% on predicting band gap, 3% on classifying whether the band gap is direct or indirect, and 65% on predicting unit cell volume, and yields comparable performance on predicting formation energy per atom, energy per atom, and energy above hull. LLM-Prop also outperforms the fine-tuned MatBERT, a domain-specific pre-trained BERT model, despite having 3 times fewer parameters. We further fine-tune the LLM-Prop model directly on CIF files and condensed structure information generated by Robocrystallographer and found that LLM-Prop fine-tuned on text descriptions provides a better performance on average. Our empirical results highlight the importance of having a natural language input to LLMs to accurately predict crystal properties and the current inability of GNNs to capture information pertaining to space group symmetry and Wyckoff sites for accurate crystal property prediction.</p>","PeriodicalId":19342,"journal":{"name":"npj Computational Materials","volume":"12 1","pages":""},"PeriodicalIF":9.4000,"publicationDate":"2025-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"LLM-Prop: predicting the properties of crystalline materials using large language models\",\"authors\":\"Andre Niyongabo Rubungo, Craig Arnold, Barry P. Rand, Adji Bousso Dieng\",\"doi\":\"10.1038/s41524-025-01536-2\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>The prediction of crystal properties plays a crucial role in materials science and applications. Current methods for predicting crystal properties focus on modeling crystal structures using graph neural networks (GNNs). However, accurately modeling the complex interactions between atoms and molecules within a crystal remains a challenge. Surprisingly, predicting crystal properties from crystal text descriptions is understudied, despite the rich information and expressiveness that text data offer. In this paper, we develop and make public a benchmark dataset (TextEdge) that contains crystal text descriptions with their properties. We then propose LLM-Prop, a method that leverages the general-purpose learning capabilities of large language models (LLMs) to predict properties of crystals from their text descriptions. 
LLM-Prop outperforms the current state-of-the-art GNN-based methods by approximately 8% on predicting band gap, 3% on classifying whether the band gap is direct or indirect, and 65% on predicting unit cell volume, and yields comparable performance on predicting formation energy per atom, energy per atom, and energy above hull. LLM-Prop also outperforms the fine-tuned MatBERT, a domain-specific pre-trained BERT model, despite having 3 times fewer parameters. We further fine-tune the LLM-Prop model directly on CIF files and condensed structure information generated by Robocrystallographer and found that LLM-Prop fine-tuned on text descriptions provides a better performance on average. Our empirical results highlight the importance of having a natural language input to LLMs to accurately predict crystal properties and the current inability of GNNs to capture information pertaining to space group symmetry and Wyckoff sites for accurate crystal property prediction.</p>\",\"PeriodicalId\":19342,\"journal\":{\"name\":\"npj Computational Materials\",\"volume\":\"12 1\",\"pages\":\"\"},\"PeriodicalIF\":9.4000,\"publicationDate\":\"2025-06-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"npj Computational Materials\",\"FirstCategoryId\":\"88\",\"ListUrlMain\":\"https://doi.org/10.1038/s41524-025-01536-2\",\"RegionNum\":1,\"RegionCategory\":\"材料科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"CHEMISTRY, PHYSICAL\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"npj Computational Materials","FirstCategoryId":"88","ListUrlMain":"https://doi.org/10.1038/s41524-025-01536-2","RegionNum":1,"RegionCategory":"材料科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"CHEMISTRY, PHYSICAL","Score":null,"Total":0}
Citations: 0

Abstract

The prediction of crystal properties plays a crucial role in materials science and applications. Current methods for predicting crystal properties focus on modeling crystal structures using graph neural networks (GNNs). However, accurately modeling the complex interactions between atoms and molecules within a crystal remains a challenge. Surprisingly, predicting crystal properties from crystal text descriptions is understudied, despite the rich information and expressiveness that text data offer. In this paper, we develop and make public a benchmark dataset (TextEdge) that contains crystal text descriptions with their properties. We then propose LLM-Prop, a method that leverages the general-purpose learning capabilities of large language models (LLMs) to predict properties of crystals from their text descriptions. LLM-Prop outperforms the current state-of-the-art GNN-based methods by approximately 8% on predicting band gap, 3% on classifying whether the band gap is direct or indirect, and 65% on predicting unit cell volume, and yields comparable performance on predicting formation energy per atom, energy per atom, and energy above hull. LLM-Prop also outperforms the fine-tuned MatBERT, a domain-specific pre-trained BERT model, despite having 3 times fewer parameters. We further fine-tune the LLM-Prop model directly on CIF files and on condensed structure information generated by Robocrystallographer, and find that LLM-Prop fine-tuned on text descriptions provides a better performance on average. Our empirical results highlight the importance of having a natural language input to LLMs to accurately predict crystal properties, and the current inability of GNNs to capture information pertaining to space group symmetry and Wyckoff sites for accurate crystal property prediction.
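The pipeline the abstract describes can be summarized as: generate a natural-language description of each crystal (for example with Robocrystallographer), tokenize it, and fine-tune a pre-trained text encoder with a small prediction head against the target property. The sketch below illustrates that general idea only; the backbone (a T5 encoder), the mean pooling, the L1 loss, and the example description and band-gap label are illustrative assumptions, not the authors' actual LLM-Prop configuration or hyperparameters.

# Minimal sketch of the general approach described above: fine-tune a
# pre-trained text encoder with a linear head to regress a scalar crystal
# property (here, band gap in eV) from a Robocrystallographer-style text
# description. Backbone, pooling, loss, and example values are assumptions.
import torch
import torch.nn as nn
from transformers import AutoTokenizer, T5EncoderModel


class TextPropertyRegressor(nn.Module):
    def __init__(self, backbone: str = "t5-small"):
        super().__init__()
        self.encoder = T5EncoderModel.from_pretrained(backbone)
        self.head = nn.Linear(self.encoder.config.d_model, 1)  # one scalar property

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        # Mean-pool token embeddings, ignoring padding positions.
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
        return self.head(pooled).squeeze(-1)


tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = TextPropertyRegressor()

# Hypothetical Robocrystallographer-style description and band-gap label.
description = ("SrTiO3 crystallizes in the cubic Pm-3m space group. Sr(1) is "
               "bonded to twelve equivalent O(1) atoms to form SrO12 cuboctahedra ...")
batch = tokenizer([description], return_tensors="pt",
                  truncation=True, padding=True, max_length=512)
target = torch.tensor([3.25])  # assumed band gap in eV, for illustration only

prediction = model(batch["input_ids"], batch["attention_mask"])
loss = nn.L1Loss()(prediction, target)
loss.backward()  # an optimizer step over many such batches would follow

For a classification target such as direct versus indirect band gap, the same skeleton would swap the single-output linear head and L1 loss for a two-way classification head and cross-entropy; the input descriptions themselves can be generated from CIF files with tools such as Robocrystallographer.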

Source journal

npj Computational Materials (Materials Science; Mathematics - Modeling and Simulation)
CiteScore: 15.30
Self-citation rate: 5.20%
Articles per year: 229
Review time: 6 weeks

Journal description: npj Computational Materials is a high-quality open access journal from Nature Research that publishes research papers applying computational approaches to the design of new materials and to enhancing our understanding of existing ones. The journal also welcomes papers on new computational techniques and refinements of current approaches that support these aims, as well as experimental papers that complement computational findings. Key features of npj Computational Materials include a 2-year impact factor of 12.241 (2021), 1,138,590 article downloads (2021), and a fast turnaround of 11 days from submission to first editorial decision. The journal is indexed in Chemical Abstracts Service (ACS), Astrophysics Data System (ADS), Current Contents/Physical, Chemical and Earth Sciences, Journal Citation Reports/Science Edition, SCOPUS, EI Compendex, INSPEC, Google Scholar, SCImago, DOAJ, CNKI, and Science Citation Index Expanded (SCIE), among others.