A Gaussian process embedded feature selection method based on automatic relevance determination

Impact factor 3.9 · Zone 2, Engineering & Technology · JCR Q2, Computer Science, Interdisciplinary Applications
Yushi Deng, Mario Eden, Selen Cremaschi
Computers & Chemical Engineering, Volume 191, Article 108852 (Journal Article)
Published: 2024-08-23
DOI: 10.1016/j.compchemeng.2024.108852
URL: https://www.sciencedirect.com/science/article/pii/S0098135424002709
Citations: 0

Abstract

In a Gaussian process (GP) with an Automatic Relevance Determination (ARD) structured kernel function, the importance of each feature is inversely proportional to its corresponding length scale. Features can therefore be selected by ranking them according to their importance. Among ARD-based feature selection methods, however, no uniform score exists for quantifying the output variation explained by a feature subset. This study proposes two feature selection approaches based on two cumulative feature importance scores, the derivative decomposition ratio and the normalized sensitivity, to determine the optimal feature subset. The approaches are assessed on whether they accurately identify irrelevant features and whether they rank features correctly. They are then applied to identify the relevant dimensionless inputs for a hybrid model that estimates the liquid entrainment fraction in two-phase flow. The results show that the proposed methods can identify the optimal feature subset for the hybrid model without significantly worsening its Root Mean Squared Error.
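The starting point of the abstract, that an ARD kernel gives each input its own length scale and that a short length scale marks a relevant input, can be illustrated with a minimal NumPy sketch. It assumes the length scales have already been fitted (for example by maximizing the marginal likelihood, as scikit-learn's `RBF` kernel does when given a vector `length_scale`) and that the inputs are standardized so their length scales are comparable. The cumulative score used here (inverse length scales normalized to sum to one) and the 0.95 cut-off are hypothetical stand-ins. They are not the paper's derivative decomposition ratio or normalized sensitivity, whose definitions the abstract does not give. The function names and length-scale values are illustrative only.

```python
import numpy as np


def ard_rbf_kernel(X1, X2, lengthscales, variance=1.0):
    """ARD squared-exponential kernel: one length scale per input feature."""
    ls = np.asarray(lengthscales, dtype=float)
    # Divide each feature by its own length scale: a long length scale
    # shrinks that feature's contribution to the distance.
    A = np.asarray(X1, dtype=float) / ls
    B = np.asarray(X2, dtype=float) / ls
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0))


def ard_importance(lengthscales):
    """Hypothetical importance score: inverse length scales normalized to sum to one."""
    inv = 1.0 / np.asarray(lengthscales, dtype=float)
    return inv / inv.sum()


def select_by_cumulative_score(scores, threshold=0.95):
    """Rank features by score and keep the smallest top-ranked subset
    whose cumulative score reaches the (assumed) threshold."""
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores)[::-1]          # most important first
    cumulative = np.cumsum(scores[order])
    k = min(int(np.searchsorted(cumulative, threshold)) + 1, len(scores))
    return order[:k], cumulative


if __name__ == "__main__":
    fitted_ls = np.array([0.5, 1.0, 50.0, 200.0])  # hypothetical fitted length scales
    selected, cumulative = select_by_cumulative_score(ard_importance(fitted_ls), threshold=0.95)
    print(selected)  # -> [0 1]
```

With these illustrative length scales, the first two features carry about 99% of the normalized score, so the sketch keeps features 0 and 1 and flags the two long-length-scale inputs as likely irrelevant. A different cumulative score could be swapped in for `ard_importance` without changing the ranking-and-cutoff step.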

Source journal
Computers & Chemical Engineering
Subject category: Engineering & Technology / Engineering, Chemical
CiteScore: 8.70
Self-citation rate: 14.00%
Articles published: 374
Review time: 70 days
Journal scope: Computers & Chemical Engineering is primarily a journal of record for new developments in the application of computing and systems technology to chemical engineering problems.