PREDICTIVE SIMULATION FOR TYPE II DIABETES USING DATA MINING STRATEGIES APPLIED TO BIG DATA
M. Turnea, M. Ilea
14th International Conference eLearning and Software for Education, published 2018-04-19
DOI: 10.12753/2066-026x-18-213 (https://doi.org/10.12753/2066-026x-18-213)
Citation count: 3
Abstract
By recent estimates, more than 30 million people in the USA alone have diabetes, of whom around 7 million are thought to be undiagnosed. Various countries have made efforts to predict and avoid the risk of developing complications from this disease. The implementation of Electronic Health Records and the collection of data in a national register for all patients who have developed diabetes is a key issue in building a valid predictor of diabetes mellitus evolution, the e-health status of the population, and risk assessment with respect to the various causative factors responsible for T2DM (type 2 diabetes mellitus). One approach frequently used in diabetes prediction, inspired by data mining algorithms, is the decision tree, used alone or combined with SVM (support vector machine), inductive learning, and clustering techniques. Data mining has been applied to existing diabetes records for many years. In this case it is applied to analyze a large number of records and extract new knowledge for prediction and classification. Decision trees and associative classification are used as tools in this paper. Genetic data are difficult to integrate into a predictor that uses big data collected at national level, so the main individual attributes are collected from three sources: clinical data, anthropological measures, and personal and family history (related to T2DM and vascular diseases). Irrelevant rules, those falling below a threshold, are deleted in a pruning process to make the classification tree more efficient. Preliminary results are presented along with directions for future research. We propose an architecture that can collect records and predict the risk for existing ones, and that analyzes the risk for a new record when triggered by an update or append operation, with possible storage in the cloud.
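The associative-classification step with threshold pruning mentioned in the abstract can be illustrated with a minimal sketch. The Python fragment below mines simple class association rules from a handful of categorical records and discards the rules that fall below a support or confidence threshold. It is not the authors' implementation: the records, attribute names (`bmi`, `family_history`, `bp`), labels, threshold values, and function names are all invented for illustration, and since the abstract does not say which measure its threshold applies to, support and confidence are assumed here.

```python
from collections import Counter
from itertools import combinations

# Toy records invented for illustration only (NOT data from the paper):
# each entry is (categorical attributes, class label).
RECORDS = [
    ({"bmi": "high", "family_history": "yes", "bp": "high"}, "at_risk"),
    ({"bmi": "high", "family_history": "yes", "bp": "normal"}, "at_risk"),
    ({"bmi": "high", "family_history": "no", "bp": "high"}, "at_risk"),
    ({"bmi": "normal", "family_history": "no", "bp": "normal"}, "low_risk"),
    ({"bmi": "normal", "family_history": "yes", "bp": "normal"}, "low_risk"),
    ({"bmi": "normal", "family_history": "no", "bp": "high"}, "low_risk"),
]


def mine_rules(records, min_support=0.3, min_confidence=0.8, max_len=2):
    """Mine rules 'antecedent -> label' and prune those below the thresholds.

    Returns a list of (antecedent, label, support, confidence) tuples,
    ordered by confidence, then support, then shorter antecedent first.
    """
    n = len(records)
    antecedent_counts = Counter()  # how often each attribute combination occurs
    rule_counts = Counter()        # how often it occurs together with a label
    for attrs, label in records:
        items = sorted(attrs.items())
        for k in range(1, max_len + 1):
            for combo in combinations(items, k):
                antecedent_counts[combo] += 1
                rule_counts[(combo, label)] += 1

    rules = []
    for (combo, label), count in rule_counts.items():
        support = count / n
        confidence = count / antecedent_counts[combo]
        # Pruning step: keep only rules that reach both (assumed) thresholds.
        if support >= min_support and confidence >= min_confidence:
            rules.append((combo, label, support, confidence))
    rules.sort(key=lambda r: (-r[3], -r[2], len(r[0])))
    return rules


def classify(rules, record, default="unknown"):
    """Label a record with the first (best-ranked) rule whose antecedent it satisfies."""
    items = set(record.items())
    for combo, label, _support, _confidence in rules:
        if set(combo) <= items:
            return label
    return default


rules = mine_rules(RECORDS)
new_record = {"bmi": "high", "family_history": "no", "bp": "normal"}
print(len(rules), "rules kept;", "new record ->", classify(rules, new_record))
```

In the architecture the abstract proposes, a call such as `classify(rules, new_record)` would correspond to the risk analysis triggered when a record is appended or updated. Raising `min_support` or `min_confidence` prunes more aggressively, trading coverage for a smaller rule set.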