利用机器学习预测休闲活动参与度的数据驱动方法。

Seungbak Lee, Minsoo Kang
{"title":"利用机器学习预测休闲活动参与度的数据驱动方法。","authors":"Seungbak Lee, Minsoo Kang","doi":"10.1080/02701367.2024.2343815","DOIUrl":null,"url":null,"abstract":"<p><p><b>Purpose:</b> With the popularity of recreational activities, the study aimed to develop prediction models for recreational activity participation and explore the key factors affecting participation in recreational activities. <b>Methods:</b> A total of 12,712 participants, excluding individuals under 20, were selected from the National Health and Nutrition Examination Survey (NHANES) from 2011 to 2018. The mean age of the sample was 46.86 years (±16.97), with a gender distribution of 6,721 males and 5,991 females. The variables included demographic, physical-related variables, and lifestyle variables. This study developed 42 prediction models using six machine learning methods, including logistic regression, Support Vector Machine (SVM), decision tree, random forest, eXtreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM). The relative importance of each variable was evaluated by permutation feature importance. <b>Results:</b> The results illustrated that the LightGBM was the most effective algorithm for predicting recreational activity participation (accuracy: .838, precision: .783, recall: .967, F1-score: .865, AUC: .826). In particular, prediction performance increased when the demographic and lifestyle datasets were used together. Next, as the result of the permutation feature importance based on the top models, education level and moderate-vigorous physical activity (MVPA) were found to be essential variables. <b>Conclusion:</b> These findings demonstrated the potential of a data-driven approach utilizing machine learning in a recreational discipline. Furthermore, this study interpreted the prediction model through feature importance analysis to overcome the limitation of machine learning interpretability.</p>","PeriodicalId":94191,"journal":{"name":"Research quarterly for exercise and sport","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A Data-Driven Approach to Predicting Recreational Activity Participation Using Machine Learning.\",\"authors\":\"Seungbak Lee, Minsoo Kang\",\"doi\":\"10.1080/02701367.2024.2343815\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p><b>Purpose:</b> With the popularity of recreational activities, the study aimed to develop prediction models for recreational activity participation and explore the key factors affecting participation in recreational activities. <b>Methods:</b> A total of 12,712 participants, excluding individuals under 20, were selected from the National Health and Nutrition Examination Survey (NHANES) from 2011 to 2018. The mean age of the sample was 46.86 years (±16.97), with a gender distribution of 6,721 males and 5,991 females. The variables included demographic, physical-related variables, and lifestyle variables. This study developed 42 prediction models using six machine learning methods, including logistic regression, Support Vector Machine (SVM), decision tree, random forest, eXtreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM). The relative importance of each variable was evaluated by permutation feature importance. <b>Results:</b> The results illustrated that the LightGBM was the most effective algorithm for predicting recreational activity participation (accuracy: .838, precision: .783, recall: .967, F1-score: .865, AUC: .826). In particular, prediction performance increased when the demographic and lifestyle datasets were used together. Next, as the result of the permutation feature importance based on the top models, education level and moderate-vigorous physical activity (MVPA) were found to be essential variables. <b>Conclusion:</b> These findings demonstrated the potential of a data-driven approach utilizing machine learning in a recreational discipline. Furthermore, this study interpreted the prediction model through feature importance analysis to overcome the limitation of machine learning interpretability.</p>\",\"PeriodicalId\":94191,\"journal\":{\"name\":\"Research quarterly for exercise and sport\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-06-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Research quarterly for exercise and sport\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1080/02701367.2024.2343815\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Research quarterly for exercise and sport","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1080/02701367.2024.2343815","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

目的:随着娱乐活动的普及,本研究旨在建立娱乐活动参与度的预测模型,并探讨影响娱乐活动参与度的关键因素。研究方法从2011年至2018年的美国国家健康与营养调查(NHANES)中选取了12712名参与者,其中不包括20岁以下的个体。样本的平均年龄为 46.86 岁(±16.97),性别分布为男性 6721 人,女性 5991 人。变量包括人口统计学变量、身体相关变量和生活方式变量。本研究使用六种机器学习方法开发了 42 个预测模型,包括逻辑回归、支持向量机(SVM)、决策树、随机森林、极梯度提升(XGBoost)和轻梯度提升机(LightGBM)。每个变量的相对重要性是通过置换特征重要性来评估的。结果结果表明,LightGBM 是预测娱乐活动参与度最有效的算法(准确度:.838;精确度:.783;召回率:.967;F1-分数:.865;AUC:.826)。特别是,当同时使用人口统计学数据集和生活方式数据集时,预测性能会有所提高。其次,基于顶级模型的置换特征重要性结果表明,教育水平和中等强度体力活动(MVPA)是基本变量。结论这些发现证明了在娱乐学科中利用机器学习的数据驱动方法的潜力。此外,本研究还通过特征重要性分析来解释预测模型,从而克服了机器学习可解释性的局限性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
A Data-Driven Approach to Predicting Recreational Activity Participation Using Machine Learning.

Purpose: With the popularity of recreational activities, the study aimed to develop prediction models for recreational activity participation and explore the key factors affecting participation in recreational activities. Methods: A total of 12,712 participants, excluding individuals under 20, were selected from the National Health and Nutrition Examination Survey (NHANES) from 2011 to 2018. The mean age of the sample was 46.86 years (±16.97), with a gender distribution of 6,721 males and 5,991 females. The variables included demographic, physical-related variables, and lifestyle variables. This study developed 42 prediction models using six machine learning methods, including logistic regression, Support Vector Machine (SVM), decision tree, random forest, eXtreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM). The relative importance of each variable was evaluated by permutation feature importance. Results: The results illustrated that the LightGBM was the most effective algorithm for predicting recreational activity participation (accuracy: .838, precision: .783, recall: .967, F1-score: .865, AUC: .826). In particular, prediction performance increased when the demographic and lifestyle datasets were used together. Next, as the result of the permutation feature importance based on the top models, education level and moderate-vigorous physical activity (MVPA) were found to be essential variables. Conclusion: These findings demonstrated the potential of a data-driven approach utilizing machine learning in a recreational discipline. Furthermore, this study interpreted the prediction model through feature importance analysis to overcome the limitation of machine learning interpretability.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信