PREDICTIVE SIMULATION FOR TYPE II DIABETES USING DATA MINING STRATEGIES APPLIED TO BIG DATA

M. Turnea, M. Ilea
Published in: 14th International Conference eLearning and Software for Education, 2018-04-19
DOI: 10.12753/2066-026x-18-213 (https://doi.org/10.12753/2066-026x-18-213)
Citations: 3

Abstract

By recent estimates, more than 30 million people in the USA alone have diabetes, of whom around 7 million are thought to be undiagnosed. Several countries have made efforts to predict and avoid the risk of developing complications from this disease. Implementing Electronic Health Records and collecting data in a national register for all patients who have developed diabetes is a key issue in building a valid predictor of diabetes mellitus evolution, of the e-health stage of the population, and of the risk due to the various causative factors responsible for T2DM (type 2 diabetes mellitus). One approach frequently used in diabetes prediction, inspired by data mining algorithms, is the decision tree, used alone or combined with SVM (support vector machine), inductive learning, and clustering techniques. Data mining has been applied to existing diabetes records for many years. In this case it is applied to analyze a large number of records and extract new knowledge for prediction and classification. Decision trees and associative classification are used as tools in this paper. Genetic data are difficult to integrate into a predictor that uses big data collected at the national level, so the main individual attributes are collected from three sources: clinical data, anthropological measures, and personal and family history (related to T2DM and vascular diseases). Irrelevant rules, those that fall below a threshold, are deleted in a pruning process to make the classification tree more efficient. The preliminary results are presented along with directions for future research. We propose an architecture that can collect records and predict the risk for existing ones, and that analyzes the risk for a new record when triggered by an update or append operation, with possible storage in the cloud.
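The rule pruning mentioned in the abstract can be illustrated with a small associative classifier. This is only a sketch under stated assumptions, not the authors' implementation: the abstract does not say which measure the threshold applies to, so support and confidence are assumed here, and the attribute names, toy records, threshold values, and the function names `mine_class_rules` and `predict` are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Toy records with made-up categorical attributes (not data from the paper).
TOY_RECORDS = [
    {"bmi": "high", "family_history": "yes", "t2dm": "yes"},
    {"bmi": "high", "family_history": "yes", "t2dm": "yes"},
    {"bmi": "high", "family_history": "no", "t2dm": "yes"},
    {"bmi": "normal", "family_history": "yes", "t2dm": "no"},
    {"bmi": "normal", "family_history": "no", "t2dm": "no"},
    {"bmi": "normal", "family_history": "no", "t2dm": "no"},
]


def mine_class_rules(records, label_key, min_support=0.3,
                     min_confidence=0.7, max_len=2):
    """Mine rules of the form (attribute=value, ...) -> label.

    Rules whose support or confidence falls below the (assumed)
    thresholds are pruned before the rule list is returned.
    """
    n = len(records)
    antecedent_counts = Counter()  # how often each antecedent occurs
    rule_counts = Counter()        # how often antecedent and label co-occur
    for rec in records:
        label = rec[label_key]
        items = sorted((k, v) for k, v in rec.items() if k != label_key)
        for size in range(1, max_len + 1):
            for antecedent in combinations(items, size):
                antecedent_counts[antecedent] += 1
                rule_counts[(antecedent, label)] += 1

    rules = []
    for (antecedent, label), hits in rule_counts.items():
        support = hits / n
        confidence = hits / antecedent_counts[antecedent]
        # Pruning step: keep only rules that pass both thresholds.
        if support >= min_support and confidence >= min_confidence:
            rules.append((antecedent, label, support, confidence))

    # Strongest rules first: high confidence, then high support, then short.
    rules.sort(key=lambda r: (-r[3], -r[2], len(r[0])))
    return rules


def predict(rules, record, default):
    """Return the label of the first (strongest) rule matching the record."""
    for antecedent, label, _support, _confidence in rules:
        if all(record.get(k) == v for k, v in antecedent):
            return label
    return default
```

A new or updated record could then be scored with `predict(mine_class_rules(TOY_RECORDS, "t2dm"), new_record, default="unknown")`. Raising either threshold prunes more rules, which shrinks the classifier at the cost of leaving more records covered only by the default label.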