Implementing WEKA for medical data classification and early disease prediction

2017 3rd International Conference on Computational Intelligence & Communication Technology (CICT). Pub Date: 2017-02-01. DOI: 10.1109/CIACT.2017.7977277

Narander Kumar, Sabita Khatri


Citations: 45

Abstract

In recent years, new web and data technologies have driven massive data growth in almost every sector. Businesses and leading industries view these large data repositories as a resource for designing future strategies and prediction models, applying data mining techniques to analyze patterns and extract knowledge from unstructured data. The medical domain has likewise become richer in digital patient records covering diagnosis and treatment. These repositories range from patient personal data, diagnoses, and treatment histories to test results, images, and various scans. These terabytes of medical data are rich in quantity but poor in information: the medical sector lacks the knowledge and robust tools needed to identify hidden patterns. Data mining, as a field of research, has well-proven capabilities for identifying hidden patterns and extracting knowledge across different research domains, and it is gaining popularity among researchers and scientists as a way to generate novel and deep insights from large biomedical datasets as well. Uncovering new biomedical and healthcare-related knowledge to support clinical decision making is another dimension of data mining. An extensive literature survey finds that early disease prediction is the most in-demand area of research in the health care sector. Because the health care domain is broad and diseases have different characteristics, different techniques have their own prediction efficiencies, which can be tuned and improved toward the best-performing configuration.

In this work, the authors comprehensively compare different data classification techniques and their prediction accuracy for chronic kidney disease. They compare the J48, Naive Bayes, Random Forest, SVM, and k-NN classifiers in the WEKA tool using performance measures such as ROC, the kappa statistic, RMSE, and MAE. They also compare these classifiers in WEKA on accuracy measures such as TP rate, FP rate, precision, recall, and F-measure. The experimental results show that the random forest classifier achieves better classification accuracy than the others on the chronic kidney disease dataset.
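As an illustration of the measures named in the abstract, the following minimal Python sketch computes TP rate, FP rate, precision, recall, F-measure, and the kappa statistic from a binary confusion matrix, plus MAE and RMSE from predicted probabilities. It is not the authors' code and does not call WEKA. The function names and all counts are hypothetical, and the MAE/RMSE helper is a simplified single-probability version rather than a reproduction of WEKA's internal calculation.

```python
import math


def binary_metrics(tp, fn, fp, tn):
    """Accuracy-style measures from a 2x2 confusion matrix (hypothetical helper)."""
    total = tp + fn + fp + tn
    tp_rate = tp / (tp + fn)            # TP rate, identical to recall
    fp_rate = fp / (fp + tn)            # share of negatives flagged as positive
    precision = tp / (tp + fp)
    f_measure = 2 * precision * tp_rate / (precision + tp_rate)
    accuracy = (tp + tn) / total

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_pos = ((tp + fn) / total) * ((tp + fp) / total)
    p_neg = ((fp + tn) / total) * ((fn + tn) / total)
    p_chance = p_pos + p_neg
    kappa = (accuracy - p_chance) / (1 - p_chance)

    return {
        "tp_rate": tp_rate,
        "fp_rate": fp_rate,
        "precision": precision,
        "recall": tp_rate,
        "f_measure": f_measure,
        "accuracy": accuracy,
        "kappa": kappa,
    }


def mae_rmse(y_true, p_pred):
    """MAE and RMSE between 0/1 labels and predicted positive-class probabilities."""
    errors = [abs(y - p) for y, p in zip(y_true, p_pred)]
    mae = sum(errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse


# Hypothetical counts, not results from the paper.
example = binary_metrics(tp=90, fn=10, fp=5, tn=95)
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```

Running the same function on each classifier's confusion matrix gives a side-by-side comparison of the kind the abstract describes; a higher kappa and lower MAE/RMSE indicate a better fit under these measures.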
