MapReduce based big data framework using associative Kruskal poly Kernel classifier for diabetic disease prediction
R. Ramani, S. Edwin Raja, D. Dhinakaran, S. Jagan, G. Prabaharan
MethodsX, volume 14, Article 103210. Published 2025-02-05. DOI: 10.1016/j.mex.2025.103210
https://www.sciencedirect.com/science/article/pii/S2215016125000573
Journal impact factor: 1.6. JCR: Q2 (Multidisciplinary Sciences).
Citations: 0
Abstract
Machine learning (ML) algorithms are among the most widely adopted applications of artificial intelligence and have been used extensively for pattern recognition, object classification, and disease prediction. ML techniques are practical solutions for computation and modeling, especially when the data are very large, which is one reason the big data field has attracted considerable attention from both industry experts and academics. To realize the potential of ML for big data applications such as early disease prediction, the computation must be accelerated. This paper presents a method named "Associative Kruskal Wallis and MapReduce Poly Kernel (AKW-MRPK)" for early disease prediction. First, significant attributes are selected with the Associative Kruskal Wallis Feature Selection model. The polynomial kernel vector computation is then parallelized with MapReduce over the selected attributes, yielding a computing model that facilitates early prognosis of disease. The proposed AKW-MRPK framework achieves up to 92 % accuracy, reduces computational time to as low as 0.875 ms for 25 patients, and reports a speedup efficiency value of 1.9 ms using two computational nodes, outperforming supervised machine learning algorithms and Hadoop-based clusters on these metrics.
• The AKW-MRPK method selects attributes and accelerates computations for predictions.
• Parallelizing polynomial kernels improves accuracy and speed in healthcare data analysis.
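
The abstract does not give implementation details, so the following is only a minimal Python sketch of the two ideas it names, not the authors' AKW-MRPK method. It assumes the standard Kruskal-Wallis H statistic (without tie correction, and without the "associative" extension, which is not specified here) as the feature score, and it imitates MapReduce in a single process: each partition of weighted training vectors is mapped to a partial polynomial-kernel score, and the partial scores are reduced by addition. All function names, the weights, and the kernel parameters (`degree`, `gamma`, `coef0`) are hypothetical choices for illustration.

```python
from functools import reduce


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction) for per-class samples."""
    pooled = sorted((v, gi) for gi, g in enumerate(groups) for v in g)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            rank_sums[pooled[k][1]] += avg_rank
        i = j
    spread = sum(r * r / len(g) for r, g in zip(rank_sums, groups))
    return 12.0 / (n * (n + 1)) * spread - 3 * (n + 1)


def select_features(rows, labels, k):
    """Return the indices of the k features with the largest H statistic."""
    classes = sorted(set(labels))
    scores = []
    for f in range(len(rows[0])):
        groups = [[r[f] for r, y in zip(rows, labels) if y == c] for c in classes]
        scores.append((kruskal_wallis_h(groups), f))
    return [f for _, f in sorted(scores, reverse=True)[:k]]


def poly_kernel(x, z, degree=2, gamma=1.0, coef0=1.0):
    """Polynomial kernel (gamma * <x, z> + coef0) ** degree."""
    return (gamma * sum(a * b for a, b in zip(x, z)) + coef0) ** degree


def map_partial_score(partition, query):
    """Map step: one partition of (weight, vector) pairs -> partial score."""
    return sum(w * poly_kernel(v, query) for w, v in partition)


def mapreduce_score(partitions, query):
    """Reduce step: add the partial scores emitted by every partition."""
    partials = map(lambda p: map_partial_score(p, query), partitions)
    return reduce(lambda a, b: a + b, partials, 0.0)


if __name__ == "__main__":
    # Toy data (invented for illustration): feature 0 separates the classes.
    rows = [[1, 5], [2, 1], [3, 4], [10, 2], [11, 6], [12, 3]]
    labels = [0, 0, 0, 1, 1, 1]
    print(select_features(rows, labels, k=1))

    # Two "nodes", each holding one weighted training vector.
    partitions = [[(1.0, [1, 0])], [(-1.0, [0, 1])]]
    print(mapreduce_score(partitions, [1, 0]))
```

In a real cluster the map step would run on separate nodes and only the scalar partial scores would be shuffled to the reducer. Because the reducer is a plain sum, the kernel evaluation splits cleanly across partitions.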