High dimensional Selection with Interactions for Binary Outcome (HDSI-BO) Algorithm in Classifying Height Indicators Through Social-life and Well-being Factors

Ziqian Zhuang, Wei Xu, Rahi Jain
University of Toronto Journal of Public Health. Published 2021-09-13. DOI: 10.33137/utjph.v2i2.36764 (https://doi.org/10.33137/utjph.v2i2.36764)

Abstract

Introduction: The High dimensional Selection with Interactions for Binary Outcome (HDSI-BO) algorithm can incorporate interaction terms and can be combined with existing feature selection techniques. Simulation studies have validated the ability of HDSI-BO to select true features and, consequently, to improve prediction accuracy compared with standard algorithms. Our goal is to assess how well HDSI-BO combines with different techniques and to measure its predictive performance in a real-data study that predicts height indicators from social-life and well-being factors.

Methods: HDSI-BO was combined with logistic regression, ridge regression, LASSO, adaptive LASSO, and elastic net. Two-way interaction terms were considered. The hyperparameters used in HDSI-BO were optimized through genetic algorithms with five-fold cross-validation. To measure feature selection performance, we fitted final logistic regression models on the sets of selected features and used each model's AUC as the measure. The procedure was repeated for 30 trials to obtain the range of the number of selected features and a 95% confidence interval for the AUC.

Results: In combination with every one of the above methods, HDSI-BO achieved higher final AUC values, in terms of both the mean and the confidence interval. In addition, HDSI-BO methods selected smaller sets of features and interaction terms than the standard methods.

Conclusion: The HDSI-BO algorithm combines well with multiple standard methods and has comparable or better predictive performance than the standard methods. The computational cost and running time of HDSI-BO are higher but still acceptable. AUC alone cannot comprehensively measure feature selection performance, so future work should explore more effective performance metrics.
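The evaluation steps named in the Methods (two-way interaction terms, AUC as the performance measure, and a 95% confidence interval over repeated trials) can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the function names are hypothetical, the feature selection and hyperparameter tuning steps of HDSI-BO are not reproduced, and the normal-approximation interval is an assumption, since the abstract does not say how the interval was computed.

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist, mean, stdev


def add_two_way_interactions(row):
    """Return the main effects followed by every pairwise product (hypothetical helper)."""
    return list(row) + [a * b for a, b in combinations(row, 2)]


def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_ci(values, level=0.95):
    """Mean and confidence interval across trials.

    Assumption: normal approximation, mean +/- z * sd / sqrt(n).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    m = mean(values)
    half_width = z * stdev(values) / sqrt(len(values))
    return m, m - half_width, m + half_width


# Toy inputs for illustration only; they are not data from the study.
print(add_two_way_interactions([1.0, 2.0, 3.0]))  # [1.0, 2.0, 3.0, 2.0, 3.0, 6.0]
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))   # 0.75
```

In a study of this shape, `auc` would be applied to the predictions of the final logistic regression model in each trial, and `mean_ci` to the resulting list of 30 AUC values.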