Classified dataset, regression and machine learning modeling for prediction of phase transformation temperatures in steels
Authors: Jinlei Lu, Guanglong Xu, Fuwen Chen, Yuwen Cui
Journal: Calphad-computer Coupling of Phase Diagrams and Thermochemistry, vol. 87, Article 102748
Publication date: 2024-09-18
DOI: 10.1016/j.calphad.2024.102748
URL: https://www.sciencedirect.com/science/article/pii/S0364591624000907
Impact factor: 1.9; JCR: Q4 (Chemistry, Physical)
Citation count: 0
Abstract
The prediction of the characteristic Martensite Start (Ms) temperature and Austenitic Nose Tip Temperature (ANTT) in steels is of scientific and technological importance; however, it faces significant challenges due to multiphysical complexity.
In this study, we introduce a structured framework of data classification and hierarchical iteration for predicting Ms and ANTT. The framework was incorporated into two optimization models, improving their accuracy, extrapolation capability, and generalization performance. First, we classified the collected Ms datasets hierarchically according to the alloying elements present in the steels (carbon, austenite stabilizers, and non-austenitization elements) and according to data credibility. Regression analyses of Ms against chemical composition were then carried out using phenomenological variables, progressing from binary to multi-component systems in the spirit of CALPHAD modeling, which is renowned for its robust extrapolation ability. By iteratively fitting the hierarchically classified datasets, we developed the CALPHAD-guided phenomenological variable (CGPV) Ms regression model. This model achieved R² values of 0.9 for training and 0.87 for testing, surpassing most conventional regression models that do not account for compositional interactions. Furthermore, the CALPHAD-guided machine learning (CGML) model, built on the same classified datasets and hierarchical iterations but without phenomenological variables, performed strongly, with R² values of 0.98 for training and 0.86 for testing. The CGML model was shown not only to filter out problematic data reliably but also to reveal a previously unnoticed coupling between carbon and other alloying elements in their effect on Ms. Finally, the CGML method was readily transferred to the prediction of ANTT, again with high accuracy.
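The binary-to-multicomponent fitting idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' CGPV model: the data are synthetic, the coefficients inside `synthetic_ms` are invented for illustration, compositions are assumed to be in wt.%, and the names `synthetic_ms`, `predict_ms`, and `r2` are hypothetical. The sketch fits the binary Fe-C level first, freezes those coefficients, then fits a Mn term and a C-Mn interaction term to the residual of the ternary data, and reports R².

```python
import numpy as np

rng = np.random.default_rng(0)


def r2(y_true, y_pred):
    """Coefficient of determination, R^2 = 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot


def synthetic_ms(c, mn):
    """Hypothetical ground truth used only to generate toy data.

    The numbers below are made up for this sketch and are NOT fitted
    values from the paper.
    """
    return 540.0 - 400.0 * c - 35.0 * mn - 20.0 * c * mn


# Level 1: binary Fe-C data. Fit Ms = a0 + a1 * C.
c_bin = rng.uniform(0.05, 1.0, 40)
ms_bin = synthetic_ms(c_bin, 0.0) + rng.normal(0.0, 2.0, c_bin.size)
design_bin = np.column_stack([np.ones_like(c_bin), c_bin])
coef_bin, *_ = np.linalg.lstsq(design_bin, ms_bin, rcond=None)

# Level 2: ternary Fe-C-Mn data. The binary coefficients stay frozen;
# only the Mn term and the C*Mn interaction term are fitted, against
# the residual left over by the binary model.
c_ter = rng.uniform(0.05, 1.0, 60)
mn_ter = rng.uniform(0.2, 3.0, 60)
ms_ter = synthetic_ms(c_ter, mn_ter) + rng.normal(0.0, 2.0, c_ter.size)
residual = ms_ter - (coef_bin[0] + coef_bin[1] * c_ter)
design_ter = np.column_stack([mn_ter, c_ter * mn_ter])
coef_ter, *_ = np.linalg.lstsq(design_ter, residual, rcond=None)


def predict_ms(c, mn):
    """Binary contribution plus the higher-level correction."""
    binary_part = coef_bin[0] + coef_bin[1] * c
    ternary_part = coef_ter[0] * mn + coef_ter[1] * c * mn
    return binary_part + ternary_part


r2_ternary = r2(ms_ter, predict_ms(c_ter, mn_ter))
print("binary coefficients:", coef_bin)
print("ternary coefficients:", coef_ter)
print("R^2 on ternary data:", r2_ternary)
```

Freezing the lower-level coefficients before fitting higher-level terms mirrors how CALPHAD assessments build multicomponent descriptions on top of binary and ternary subsystems; a real application would also hold out a test set, as the abstract's separate training and testing R² values imply.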
About the journal:
The design of industrial processes requires reliable thermodynamic data. CALPHAD (Computer Coupling of Phase Diagrams and Thermochemistry) aims to promote computational thermodynamics through: the development of models that represent the thermodynamic properties of various phases and permit prediction of the properties of multicomponent systems from those of binary and ternary subsystems; the critical assessment of data and their incorporation into self-consistent databases; the development of software to optimize and derive thermodynamic parameters; and the development and use of databanks for calculations that improve understanding of various industrial and technological processes. This work is disseminated through the CALPHAD journal and its annual conference.