Enhancing stroke prediction models: A mixing of data augmentation and transfer learning for small-scale dataset in machine learning

Imam Tahyudin, Ade Nurhopipah, Ades Tikaningsih, Puji Lestari, Yaya Suryana, Edi Winarko, Eko Winarto, Nazwan Haza, Hidetaka Nambo
Journal: Computer methods and programs in biomedicine update, Volume 8, Article 100198
Publication type: Journal Article
Publication date: 2025-01-01
DOI: 10.1016/j.cmpbup.2025.100198
Source: https://www.sciencedirect.com/science/article/pii/S2666990025000229

Abstract

Machine learning is a powerful technique for analysing datasets and making data-driven recommendations. In general, however, its ability to recognise patterns improves as the dataset grows. In some fields, such as medicine, collecting each data instance takes considerable work and budget. Techniques that enlarge the available data are therefore needed to increase data size and improve model quality.
This study applied Data Augmentation and Transfer Learning to address the small-scale dataset problem in analysing stroke patient information from the Banyumas Regional General Hospital (RSUD Banyumas). The information is used to predict the patient's status at discharge from the hospital. The research compared the prediction accuracy of three solutions: Data Augmentation, Transfer Learning, and a mix of both methods. Four classification algorithms were employed: Random Forest, Support Vector Machine (SVM), Gradient Boosting, and Extreme Gradient Boosting (XGBoost). We used the Synthetic Minority Over-sampling Technique for Nominal and Continuous (SMOTE-NC) to generate the artificial dataset. For Transfer Learning, we used a benchmark stroke dataset whose target differs from ours, so we labelled it based on the nearest neighbours of the original dataset. Data Augmentation led to better performance than using only the original dataset. Transfer Learning, however, did not give satisfactory results for XGBoost and SVM. Mixing Data Augmentation and Transfer Learning provided the best performance, with accuracy and recall both 0.813, a precision of 0.853497, and an F1 score of 0.826628, all given by the Random Forest model. The research can contribute to developing better classification models, so that physicians can obtain more accurate information and treat stroke cases more effectively and efficiently.
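The nearest-neighbour labelling step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the distance metric, the number of neighbours, the feature set, or the class names, so everything below is an assumption made for illustration: Euclidean distance on already-scaled numeric features shared by both datasets, k = 3 with majority voting, the hypothetical labels "improved" and "not improved", and made-up toy values rather than study data. The helper name `knn_label` is also hypothetical.

```python
from collections import Counter
from math import dist


def knn_label(source_rows, source_labels, target_rows, k=3):
    """Give each target row the majority label of its k nearest source rows."""
    labels = []
    for row in target_rows:
        # Rank source rows by Euclidean distance to the current target row.
        nearest = sorted(
            range(len(source_rows)),
            key=lambda i: dist(row, source_rows[i]),
        )[:k]
        # Majority vote among the k nearest labelled neighbours.
        votes = Counter(source_labels[i] for i in nearest)
        labels.append(votes.most_common(1)[0][0])
    return labels


# Toy stand-in for the labelled hospital dataset (values are invented).
hospital_X = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25),
              (0.9, 0.8), (0.8, 0.9), (0.85, 0.75)]
hospital_y = ["improved", "improved", "improved",
              "not improved", "not improved", "not improved"]

# Toy stand-in for benchmark rows that lack the hospital's target label.
benchmark_X = [(0.12, 0.18), (0.88, 0.82)]

benchmark_y = knn_label(hospital_X, hospital_y, benchmark_X, k=3)
print(benchmark_y)  # ['improved', 'not improved']
```

In practice the records mix nominal and continuous attributes, so a distance suited to mixed data and a tuned k would likely be needed; the sketch only shows the shape of the labelling step.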