aiMP and aiOQ databases in FactSage: Materials informatics relying on ab initio, machine learning and CALPHAD data
Authors: C. Frueh, C. Aras, Ö. Büyükuslu, M. to Baben
Journal: Calphad-computer Coupling of Phase Diagrams and Thermochemistry, volume 90, Article 102838
Published: 2025-07-10
DOI: 10.1016/j.calphad.2025.102838
URL: https://www.sciencedirect.com/science/article/pii/S0364591625000410
Abstract
aiMP and aiOQ are databases derived from the 0 K density functional theory (DFT) calculation data stored in the Materials Project and Open Quantum Materials Database (OQMD) repositories, respectively. Both databases are built by processing the 0 K DFT data with machine learning models trained on thousands of compounds. These models adjust formation enthalpies to improve consistency with existing CALPHAD (CALculation of PHAse Diagrams) databases and predict thermodynamic properties such as entropy and heat capacity as functions of temperature.
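For orientation, the three quantities named above (formation enthalpy, entropy, and temperature-dependent heat capacity) are what is needed to build a Gibbs energy function for a stoichiometric compound. The relations below are the standard textbook ones, assuming no phase transition between the reference temperature and T; the abstract does not state how aiMP and aiOQ parameterize or store these functions, and the 298.15 K reference temperature is the usual convention rather than something taken from the source.

```latex
\begin{align}
  H(T) &= \Delta_f H^{\circ}_{298} + \int_{298.15\,\mathrm{K}}^{T} C_p(T')\,\mathrm{d}T' \\
  S(T) &= S^{\circ}_{298} + \int_{298.15\,\mathrm{K}}^{T} \frac{C_p(T')}{T'}\,\mathrm{d}T' \\
  G(T) &= H(T) - T\,S(T)
\end{align}
```

Read this way, an error in the formation enthalpy shifts G(T) by a constant, while errors in the entropy or heat capacity grow with temperature.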
This work demonstrates three Materials Informatics applications of large-scale CALPHAD-compatible databases enabled by automated workflows.
First, a comparison was made between the SGTE Pure Substance database (SGPS), containing 3927 compounds, and the aiMP database, which includes overlapping entries for 1519 compounds. For these overlapping compounds, the enthalpy of formation, entropy at 298 K, and heat capacity at 298 K were analyzed. Any discrepancies exceeding the inherent error of the machine learning models were flagged. A literature survey was then conducted for compounds with larger discrepancies, and erroneous data were confirmed in approximately 0.7% of the SGPS data.
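The flagging step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' workflow: the `Entry` class, the `flag_discrepancies` function, the tolerance values, and the placeholder compounds and numbers are all hypothetical, since the abstract gives neither the model errors nor the data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Properties compared per compound (hypothetical container)."""
    dHf: float    # enthalpy of formation at 298 K, J/mol
    S298: float   # entropy at 298 K, J/(mol K)
    Cp298: float  # heat capacity at 298 K, J/(mol K)


# ASSUMPTION: illustrative tolerances standing in for the "inherent error"
# of the machine learning models; the real values are not in the abstract.
TOLERANCE = {"dHf": 20000.0, "S298": 10.0, "Cp298": 10.0}


def flag_discrepancies(reference, predicted, tolerance=TOLERANCE):
    """Return {compound: {property: predicted - reference}} for every
    property whose absolute difference exceeds its tolerance.
    Only compounds present in both databases are compared."""
    flagged = {}
    for compound in sorted(reference.keys() & predicted.keys()):
        ref, pred = reference[compound], predicted[compound]
        exceeded = {}
        for prop, tol in tolerance.items():
            diff = getattr(pred, prop) - getattr(ref, prop)
            if abs(diff) > tol:
                exceeded[prop] = diff
        if exceeded:
            flagged[compound] = exceeded
    return flagged


# Made-up placeholder data, NOT values from SGPS or aiMP.
reference = {
    "AB": Entry(-100000.0, 50.0, 45.0),
    "CD2": Entry(-250000.0, 80.0, 70.0),
    "E3F": Entry(-50000.0, 30.0, 25.0),   # only in the reference set
}
predicted = {
    "AB": Entry(-105000.0, 52.0, 46.0),
    "CD2": Entry(-310000.0, 78.0, 71.0),
    "GH": Entry(-1000.0, 20.0, 20.0),     # only in the predicted set
}

flagged = flag_discrepancies(reference, predicted)
print(flagged)
```

A flag raised this way only marks a candidate for review. As the text notes, deciding whether the reference value or the prediction is wrong still required a literature survey.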
Second, the aiMP database was used to estimate phase diagrams and identify potential new coating materials for SiC/SiC composites, which are under investigation as accident-tolerant fuel cladding materials.
Finally, it is shown that aiMP can serve as a starting point for both traditional and automated CALPHAD modeling. Three examples were explored: Al-Ca, Mg-Si, and Ca-Li. These examples highlight the versatility of machine learning-enhanced thermodynamic databases in accelerating material discovery and improving database reliability.
About the journal:
The design of industrial processes requires reliable thermodynamic data. CALPHAD (Computer Coupling of Phase Diagrams and Thermochemistry) aims to promote computational thermodynamics through: the development of models that represent the thermodynamic properties of various phases and permit prediction of the properties of multicomponent systems from those of binary and ternary subsystems; the critical assessment of data and their incorporation into self-consistent databases; the development of software to optimize and derive thermodynamic parameters; and the development and use of databanks for calculations that improve understanding of various industrial and technological processes. This work is disseminated through the CALPHAD journal and its annual conference.