An ensemble learning strategy for multi-source hydrogen embrittlement data by introducing missing information

Xujie Gong, Ruichao Lei, Ruize Sun, Xue Jiang, Yanjing Su, Yu Yan
Materials Genome Engineering Advances, vol. 2, no. 2. Journal Article. Published 2024-05-01.
DOI: 10.1002/mgea.35
https://onlinelibrary.wiley.com/doi/10.1002/mgea.35

Abstract

Fast, accurate prediction of hydrogen embrittlement performance is critical for metallic materials in service. However, because existing hydrogen embrittlement data come from multiple heterogeneous sources, they contain many missing values, which makes it impractical to train reliable machine learning models. In this study, we proposed an ensemble learning training strategy for missing data based on the AdaBoost algorithm. The method introduced a mask matrix that records the missing data, so that each round of training generated sub-datasets that take missing-value information into account. The strategy first trained on a subset of features using the existing dataset and a selected method, then repeatedly focused on the feature combination with the highest error for iterative training. A neural network took the mask matrix of the missing data as input and fitted the weight of each base learner. Compared with modeling directly on the highly sparse data, the strategy improved predictive ability by approximately 20%. In tests on new samples, the mean absolute error of the predictions fell from 0.2 to 0.09. The strategy adapts well to hydrogen embrittlement sensitivity of different sizes and avoids the interference with feature importance that filling in missing data can cause.
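The mask-based weighting described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes NumPy only, uses plain least-squares models as base learners, replaces the AdaBoost-style selection of feature combinations with a fixed, hypothetical list of feature subsets, and stands in a single-layer softmax gate for the neural network. The toy data and all function names are made up for illustration.

```python
import numpy as np

def observed_mask(X):
    """Mask matrix: 1.0 where a value is present, 0.0 where it is NaN."""
    return (~np.isnan(X)).astype(float)

def fit_base_learners(X, y, subsets):
    """Fit one least-squares model per feature subset, using only the rows
    in which every feature of that subset is observed (no imputation)."""
    models = []
    for cols in subsets:
        rows = ~np.isnan(X[:, cols]).any(axis=1)
        A = np.column_stack([X[rows][:, cols], np.ones(rows.sum())])
        coef, *_ = np.linalg.lstsq(A, y[rows], rcond=None)
        models.append(coef)
    return models

def base_predictions(X, subsets, models):
    """Return predictions P (n, k) and usability flags U (n, k).
    P is 0 wherever a learner cannot be applied to a sample."""
    n = X.shape[0]
    P = np.zeros((n, len(subsets)))
    U = np.zeros((n, len(subsets)), dtype=bool)
    for j, (cols, coef) in enumerate(zip(subsets, models)):
        rows = ~np.isnan(X[:, cols]).any(axis=1)
        A = np.column_stack([X[rows][:, cols], np.ones(rows.sum())])
        P[rows, j] = A @ coef
        U[:, j] = rows
    return P, U

def gate_weights(M, U, W, b):
    """Softmax weights over learners computed from the mask matrix M.
    Learners that are unusable for a sample receive weight 0."""
    logits = np.where(U, M @ W + b, -1e9)
    logits -= logits.max(axis=1, keepdims=True)
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)

def fit_gate(M, U, P, y, lr=0.05, steps=500):
    """Train the single-layer gate by gradient descent on squared error."""
    n, d = M.shape
    W, b = np.zeros((d, P.shape[1])), np.zeros(P.shape[1])
    for _ in range(steps):
        w = gate_weights(M, U, W, b)
        yhat = (w * P).sum(axis=1)
        # Gradient of the mean squared error with respect to the logits.
        G = (2.0 / n) * (yhat - y)[:, None] * w * (P - yhat[:, None])
        W -= lr * (M.T @ G)
        b -= lr * G.sum(axis=0)
    return W, b

# Toy data (made up): three features, two of them missing in ~40% of rows.
rng = np.random.default_rng(0)
n = 200
X_full = rng.normal(size=(n, 3))
y = 1.5 * X_full[:, 0] - 2.0 * X_full[:, 1] + 0.5 * X_full[:, 2]
X = X_full.copy()
X[rng.random(n) < 0.4, 1] = np.nan
X[rng.random(n) < 0.4, 2] = np.nan

subsets = [[0], [0, 1], [0, 2], [0, 1, 2]]  # hypothetical feature subsets
M = observed_mask(X)
models = fit_base_learners(X, y, subsets)
P, U = base_predictions(X, subsets, models)
W, b = fit_gate(M, U, P, y)

weights = gate_weights(M, U, W, b)
y_gated = (weights * P).sum(axis=1)
y_uniform = P.sum(axis=1) / U.sum(axis=1)  # plain average of usable learners
print("MAE, mask-gated weights:", np.abs(y_gated - y).mean())
print("MAE, uniform weights:   ", np.abs(y_uniform - y).mean())
```

Because every base learner is trained only on rows where its features are observed, no values are filled in, which is the property the abstract credits with avoiding distorted feature importance. A fuller version would replace the fixed `subsets` list with the iterative, error-driven selection of feature combinations that the abstract describes.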
