Large language model-driven database for thermoelectric materials

IF 3.1 · Zone 3 (Materials Science) · JCR Q2 (Materials Science, Multidisciplinary)
Suman Itani, Yibo Zhang, Jiadong Zang
Computational Materials Science, Volume 253, Article 113855. Published 2025-03-28.
DOI: 10.1016/j.commatsci.2025.113855
URL: https://www.sciencedirect.com/science/article/pii/S0927025625001983

Abstract

Thermoelectric materials can convert waste heat into electricity, offering a valuable solution for energy harvesting. However, their widespread use is hindered by low conversion efficiency, reliance on expensive rare-earth elements, and the environmental and regulatory concerns associated with lead-based materials. Data-driven methods offer a fast and cost-effective way to identify highly efficient thermoelectric materials, but they rely on robust and comprehensive datasets to train models. Although several databases on thermoelectric materials exist, there is still a need to collect and integrate experimental data from peer-reviewed research articles to capture the diverse compositions and properties of materials. Here we developed a comprehensive database of 7,123 thermoelectric compounds, containing key information such as chemical composition, structural details, Seebeck coefficient, electrical and thermal conductivity, power factor, and figure of merit (ZT). We used the GPTArticleExtractor workflow, powered by large language models (LLMs), to extract and curate data automatically from the scientific literature published in Elsevier journals. This process enabled the creation of a structured database that addresses the challenges of manual data collection. The open-access database could stimulate data-driven research and advance thermoelectric material analysis and discovery.
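The properties listed in the abstract are linked by the standard thermoelectric relations: the power factor is PF = S²σ and the figure of merit is ZT = S²σT/κ, where S is the Seebeck coefficient, σ the electrical conductivity, κ the thermal conductivity, and T the absolute temperature. The following is a minimal sketch of how one entry could be represented and cross-checked, assuming SI units. The class name, field names, and numerical values are hypothetical illustrations and are not taken from the published database or from GPTArticleExtractor.

```python
from dataclasses import dataclass


@dataclass
class ThermoelectricRecord:
    """Hypothetical record layout (an assumption, not the database schema)."""

    composition: str                 # chemical formula as a plain string
    temperature_k: float             # absolute temperature T, in K
    seebeck_v_per_k: float           # Seebeck coefficient S, in V/K
    electrical_cond_s_per_m: float   # electrical conductivity sigma, in S/m
    thermal_cond_w_per_mk: float     # thermal conductivity kappa, in W/(m K)

    def power_factor(self) -> float:
        """Power factor PF = S^2 * sigma, in W/(m K^2)."""
        return self.seebeck_v_per_k ** 2 * self.electrical_cond_s_per_m

    def zt(self) -> float:
        """Dimensionless figure of merit ZT = S^2 * sigma * T / kappa."""
        return self.power_factor() * self.temperature_k / self.thermal_cond_w_per_mk


# Illustrative values only; they are not an entry from the database.
example = ThermoelectricRecord(
    composition="Bi2Te3",
    temperature_k=300.0,
    seebeck_v_per_k=200e-6,
    electrical_cond_s_per_m=1.0e5,
    thermal_cond_w_per_mk=1.5,
)

print(f"PF = {example.power_factor():.3e} W/(m K^2)")
print(f"ZT = {example.zt():.2f}")
```

Recomputing PF and ZT from S, σ, κ, and T in this way is one possible consistency check on automatically extracted values, since a reported ZT that disagrees strongly with the recomputed one may point to a unit or extraction error.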

Source journal: Computational Materials Science
Category: Engineering and Technology, Materials Science: Multidisciplinary
CiteScore: 6.50
Self-citation rate: 6.10%
Articles published: 665
Review time: 26 days
Journal description: The goal of Computational Materials Science is to report on results that provide new or unique insights into, or significantly expand our understanding of, the properties of materials or phenomena associated with their design, synthesis, processing, characterization, and utilization. To be relevant to the journal, the results should be applied or applicable to specific material systems that are discussed within the submission.