Large language model-driven database for thermoelectric materials

IF 3.3 · Zone 3 (Materials Science) · JCR Q2 (Materials Science, Multidisciplinary)

Computational Materials Science · Pub Date: 2025-03-28 · DOI: 10.1016/j.commatsci.2025.113855

Suman Itani, Yibo Zhang, Jiadong Zang

Computational Materials Science, volume 253, Article 113855. Full text: https://www.sciencedirect.com/science/article/pii/S0927025625001983

Citations: 0

Abstract

Thermoelectric materials can convert waste heat into electricity, offering a valuable solution for energy harvesting. However, their widespread use is hindered by low conversion efficiency, reliance on expensive rare-earth elements, and the environmental and regulatory concerns associated with lead-based materials. Data-driven methods offer a fast and cost-effective way to identify highly efficient thermoelectric materials, but they rely on robust and comprehensive datasets to train models. Although several databases on thermoelectric materials exist, there is still a need to collect and integrate experimental data from peer-reviewed research articles to capture the diverse compositions and properties of these materials. Here we developed a comprehensive database of 7,123 thermoelectric compounds, containing key information such as chemical composition, structural details, Seebeck coefficient, electrical and thermal conductivity, power factor, and figure of merit (ZT). We used the GPTArticleExtractor workflow, powered by large language models (LLMs), to extract and curate data automatically from scientific literature published in Elsevier journals. This process enabled the creation of a structured database that addresses the challenges of manual data collection. The open-access database could stimulate data-driven research and advance the analysis and discovery of thermoelectric materials.
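The properties listed in the abstract are linked by standard definitions: the power factor is PF = S²σ and the figure of merit is ZT = S²σT/κ, where S is the Seebeck coefficient, σ the electrical conductivity, κ the thermal conductivity, and T the absolute temperature. The minimal Python sketch below shows how a record holding these quantities could be checked for internal consistency. The class, field names, tolerance, and example values are all hypothetical, chosen for illustration only; they are not the schema or data of the published database, and the check is not part of the GPTArticleExtractor workflow as described.

```python
from dataclasses import dataclass


@dataclass
class ThermoelectricRecord:
    """One hypothetical database row; field names are illustrative only."""
    composition: str
    temperature_k: float            # absolute temperature T, in K
    seebeck_v_per_k: float          # Seebeck coefficient S, in V/K
    electrical_cond_s_per_m: float  # electrical conductivity sigma, in S/m
    thermal_cond_w_per_mk: float    # thermal conductivity kappa, in W/(m*K)

    def power_factor(self) -> float:
        """Power factor PF = S^2 * sigma, in W/(m*K^2)."""
        return self.seebeck_v_per_k ** 2 * self.electrical_cond_s_per_m

    def zt(self) -> float:
        """Dimensionless figure of merit ZT = S^2 * sigma * T / kappa."""
        return self.power_factor() * self.temperature_k / self.thermal_cond_w_per_mk


def zt_is_consistent(record: ThermoelectricRecord,
                     reported_zt: float,
                     rel_tol: float = 0.1) -> bool:
    """Return True if the reported ZT is within rel_tol of the recomputed ZT."""
    return abs(record.zt() - reported_zt) <= rel_tol * abs(reported_zt)


# Made-up values for illustration; not taken from the database.
example = ThermoelectricRecord(
    composition="hypothetical-compound",
    temperature_k=300.0,
    seebeck_v_per_k=200e-6,
    electrical_cond_s_per_m=1.0e5,
    thermal_cond_w_per_mk=1.5,
)
print(f"PF = {example.power_factor():.2e} W/(m*K^2), ZT = {example.zt():.2f}")
```

A check of this kind is one way to flag automatically extracted rows in which a reported ZT disagrees with the value implied by the other reported properties, for example because of a unit mix-up between µV/K and V/K.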

Source journal

Computational Materials Science (Engineering and Technology: Materials Science, Multidisciplinary)

CiteScore

6.50

Self-citation rate

6.10%

Articles published

665

Review time

26 days

Journal description: The goal of Computational Materials Science is to report on results that provide new or unique insights into, or significantly expand our understanding of, the properties of materials or phenomena associated with their design, synthesis, processing, characterization, and utilization. To be relevant to the journal, the results should be applied or applicable to specific material systems that are discussed within the submission.