Using Machine Learning to Predict the Future Development of Disease

Lanxin Miao, Xuezhou Guo, H. Abbas, K. Qaraqe, Q. Abbasi
Published in: 2020 International Conference on UK-China Emerging Technologies (UCET), 2020-08-01
DOI: 10.1109/UCET51115.2020.9205373
Citations: 7

Abstract

The objective of this research is to develop a long-term risk model for cardiovascular disease (CVD) caused by type 2 diabetes (T2D). We apply the support vector machine (SVM) and k-nearest neighbours (KNN) algorithms to data collected in the Framingham Heart Study, a longitudinal study, to build the prediction models. The dataset was first balanced with the Synthetic Minority Oversampling Technique (SMOTE). The SVM algorithm was then used to train the model; after parameter tuning and 1,000 training runs, the average accuracy in predicting the prevalence of CVD due to T2D was 96.5% and the average recall was 89.8%. We also trained a KNN model on the same dataset, which reached an even higher recall of 92.9%. The advantages of our model are: 1) it predicts the risk of developing both T2D and CVD simultaneously with high accuracy; 2) it does not require the expensive and tedious oral glucose tolerance test. The model achieved high performance after training on the Framingham Heart Study dataset.
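The abstract states that the dataset was balanced with SMOTE before training. SMOTE creates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The sketch below is a minimal, simplified variant written for illustration, not the authors' code: it assumes binary 0/1 labels with 1 as the minority class, purely numeric features, Euclidean distance, and base samples drawn at random; the function names `smote` and `balance` are hypothetical.

```python
import math
import random


def smote(minority, n_new, k=5, rng=None):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if n_new <= 0:
        return []
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = rng or random.Random(0)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` within the minority class (Euclidean)
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


def balance(X, y, minority_label=1, k=5, seed=0):
    """Oversample the minority class until both classes have equal counts."""
    rng = random.Random(seed)
    minority = [x for x, t in zip(X, y) if t == minority_label]
    n_majority = sum(1 for t in y if t != minority_label)
    new = smote(minority, n_majority - len(minority), k, rng)
    return X + new, y + [minority_label] * len(new)
```

In practice a library implementation such as imbalanced-learn's `SMOTE(k_neighbors=5).fit_resample(X, y)` would normally be used. Oversampling is usually applied to the training split only, so that synthetic points derived from test rows cannot inflate the evaluation scores.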
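The abstract reports that an SVM was trained after parameter tuning but does not name the kernel, solver, or parameter values. The sketch below swaps in a linear SVM trained with the Pegasos method (stochastic sub-gradient descent on the L2-regularised hinge loss) to show how such a classifier learns a separating hyperplane. It assumes 0/1 labels and numeric features; the functions `train_linear_svm` and `predict_linear_svm` and the defaults for `lam` and `epochs` are hypothetical.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularised hinge loss.

    Labels in y are 0/1 and are mapped to -1/+1. A constant 1.0 feature is
    appended to every sample so the last weight acts as a (regularised) bias.
    """
    rng = random.Random(seed)
    data = [(x + [1.0], 1.0 if t == 1 else -1.0) for x, t in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, yi in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step-size schedule
            margin = yi * sum(wj * xj for wj, xj in zip(w, x))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink from the regulariser
            if margin < 1.0:  # hinge loss active: move toward the correct side
                w = [wj + eta * yi * xj for wj, xj in zip(w, x)]
    return w


def predict_linear_svm(w, x):
    """Return 1 if x lies on the positive side of the hyperplane, else 0."""
    score = sum(wj * xj for wj, xj in zip(w, x + [1.0]))
    return 1 if score >= 0.0 else 0
```

Here `lam` relates to the usual soft-margin parameter by C = 1/(lam·n) for n training samples. A kernel SVM, for example scikit-learn's `SVC(kernel="rbf", C=..., gamma=...)` with `C` and `gamma` chosen by `GridSearchCV`, is one common way to carry out the kind of parameter tuning the abstract mentions.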
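The abstract quotes average accuracy and recall after 1,000 training runs. Accuracy is the fraction of all predictions that are correct; recall is TP / (TP + FN), the fraction of people who actually developed CVD that the model flagged. Recall matters in screening because a missed case is usually costlier than a false alarm. The abstract does not say how the 1,000 runs were set up, so the sketch below assumes independent random 80/20 train/test splits and uses a KNN classifier; `knn_predict`, `accuracy_recall`, `repeated_evaluation`, and the `test_frac` default are hypothetical.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k=5):
    """Majority vote among the k training samples closest to x (Euclidean)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def accuracy_recall(y_true, y_pred, positive=1):
    """Return (accuracy, recall); recall is None when y_true has no positives."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    recall = tp / (tp + fn) if tp + fn else None
    return accuracy, recall


def repeated_evaluation(X, y, runs=1000, test_frac=0.2, k=5, seed=0):
    """Mean accuracy and recall of KNN over `runs` random train/test splits."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    n_test = max(1, int(len(X) * test_frac))
    accs, recs = [], []
    for _ in range(runs):
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        preds = [knn_predict(train_X, train_y, X[i], k) for i in test]
        acc, rec = accuracy_recall([y[i] for i in test], preds)
        accs.append(acc)
        if rec is not None:  # skip recall for splits with no positive cases
            recs.append(rec)
    mean_rec = sum(recs) / len(recs) if recs else None
    return sum(accs) / len(accs), mean_rec
```

Averaging recall only over splits that contain positive cases avoids counting an undefined recall as zero; stratified splitting would be an alternative that guarantees positives in every test set.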