Optimization of machine learning models through quantization and data bit reduction in healthcare datasets

Franklin Open, Volume 8, Article 100136. Pub Date: 2024-07-18. DOI: 10.1016/j.fraope.2024.100136

Mitul Goswami, Suneeta Mohanty, Prasant Kumar Pattnaik



Abstract

This study investigates how quantization and data bit reduction can make complex machine learning models more efficient. The primary goal is to reduce processing time while maintaining model performance, which matters most for intricate models with long execution times. The research uses two medical datasets, Heart Disease Prediction and Breast Cancer Detection, and applies the optimization techniques to K-Nearest Neighbors (KNN) and Support Vector Machine (SVM) models. The quantization and data bit reduction techniques used are the QuantileTransformer, numpy.round, and KBinsDiscretizer functions. They convert the input data from float64 to float32 and int32, yielding a more compact data representation. The trade-off between processing time and model accuracy is explored, acknowledging that optimization may cost some accuracy. The experiments show a noticeable reduction in processing time after optimization, with only a marginal impact on model accuracy. The study concludes that the outcome and efficiency of an optimization technique depend not only on the technique itself but also on the nature of the dataset and the machine learning model under consideration. Overall, the experiments on medical datasets with KNN and SVM models demonstrate that quantization and data bit reduction are applicable to complex machine learning models, and they underscore the delicate balance between processing time and model accuracy: the success of an optimization strategy is context-dependent, relying on the interplay between technique, model, and dataset.
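The abstract does not include code, so the following is only a minimal sketch of how the named techniques could be combined, not the authors' implementation. It assumes scikit-learn's bundled Wisconsin breast cancer dataset as a stand-in for the paper's Breast Cancer Detection data. The function names `make_variants` and `run_comparison`, the standardization step, and all parameter values (two decimals, 100 quantiles, 16 bins, k = 5, RBF kernel) are illustrative assumptions.

```python
import time

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import KBinsDiscretizer, QuantileTransformer, StandardScaler
from sklearn.svm import SVC


def make_variants(X_train, X_test):
    """Build a float64 baseline plus three reduced-precision copies of the data."""
    variants = {"float64 baseline": (X_train, X_test)}

    # numpy.round: keep two decimals (assumed), then store the result as float32.
    variants["round -> float32"] = (
        np.round(X_train, 2).astype(np.float32),
        np.round(X_test, 2).astype(np.float32),
    )

    # QuantileTransformer: map each feature to [0, 1] by rank, stored as float32.
    qt = QuantileTransformer(n_quantiles=100, output_distribution="uniform", random_state=0)
    variants["quantile -> float32"] = (
        qt.fit_transform(X_train).astype(np.float32),
        qt.transform(X_test).astype(np.float32),
    )

    # KBinsDiscretizer: replace each value by its bin index, stored as int32.
    kb = KBinsDiscretizer(n_bins=16, encode="ordinal", strategy="uniform")
    variants["kbins -> int32"] = (
        kb.fit_transform(X_train).astype(np.int32),
        kb.transform(X_test).astype(np.int32),
    )
    return variants


def run_comparison(seed=0):
    """Return {(variant, model): (accuracy, seconds)} on the stand-in dataset."""
    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    # Standardize first so the distance-based models see comparable feature scales.
    scaler = StandardScaler().fit(X_train)
    X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

    results = {}
    for variant, (tr, te) in make_variants(X_train, X_test).items():
        for name, model in (("KNN", KNeighborsClassifier(n_neighbors=5)),
                            ("SVM", SVC(kernel="rbf"))):
            # Time fitting plus scoring together for each variant/model pair.
            start = time.perf_counter()
            model.fit(tr, y_train)
            accuracy = model.score(te, y_test)
            results[(variant, name)] = (accuracy, time.perf_counter() - start)
    return results


if __name__ == "__main__":
    for (variant, name), (accuracy, seconds) in run_comparison().items():
        print(f"{variant:20s} {name}: accuracy={accuracy:.3f} time={seconds * 1e3:.1f} ms")
```

On a dataset this small, timing differences between variants are likely to be dominated by noise, and some scikit-learn estimators may convert inputs back to float64 internally. The sketch therefore shows the mechanics of the comparison and does not reproduce the reported speed-up.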

