Perturbation based privacy preservation and classification using Jaya Algorithm and Dragonfly Inspired Algorithm

Franklin Open. Pub Date: 2025-04-18. DOI: 10.1016/j.fraope.2025.100266

Dipanwita Sen, Bhupati Bhusan Mishra, Prasant Kumar Pattnaik

Franklin Open, volume 11, Article 100266. https://www.sciencedirect.com/science/article/pii/S2773186325000568

Citations: 0

Abstract



Healthcare datasets are highly sensitive. If accessed without authorization, they can lead to damage, discrimination and unsolicited scrutiny. Patients' health details are private personal information and should not be disclosed, yet data may be stored in the cloud without any protection. Privacy preservation is therefore a significant consideration for healthcare datasets. This work uses the Wisconsin Prognostic Breast Cancer (WBC) dataset. First, a privacy-preserving scheme based on perturbation with the Jaya Algorithm is described. Of the 30 numerical attributes in the dataset, 6 are chosen for perturbation based on their relatively high Pearson correlation coefficients, and they form the initial population of the Jaya Algorithm. An objective function is defined and the problem is posed as a minimization. At each iteration, the algorithm generates a new population from the previous one. The accuracies obtained by several traditional classification algorithms, as well as by classifiers based on meta-heuristic algorithms, are then observed. The classical classifiers used are Decision Tree, Random Forest, AdaBoost, KNN and GNB. Standard evaluation metrics are then recorded. For privacy, the metrics are Secrecy, Value Difference (VD), RP, RK, CP and CK. For utility, they are Accuracy, Precision, Recall, F1-Score and Area Under the Curve (AUC). The Jaya Algorithm is then compared with traditional perturbation algorithms such as 2DRT and 3DRT. The mean Friedman test rankings suggest that Jaya preserves more privacy and retains more utility.

Among the classifiers based on metaheuristic optimization, only the Dragonfly Inspired Classifier (DIC) classifies over 90% of the records correctly on the perturbed dataset. For classification, the perturbed dataset is fed into the DIC as the original population. The target is to minimize the distance between the testing dataset points and the centroids assigned to them. New centroids are calculated using only the updated training set points, and the updated dragonflies are assigned new centroids at each stage. All simulations were implemented in Python.
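The Jaya step summarized above (an initial population, a minimization objective, and a new population generated from the previous one at each iteration) can be illustrated with a minimal sketch. It uses the standard Jaya update rule, which moves each candidate toward the current best solution and away from the current worst, and keeps a candidate only if it improves the objective. The abstract does not state the paper's objective function, so the `objective` argument is a placeholder, and the function name `jaya_minimize`, its parameters and the random seed are assumptions made for illustration rather than the authors' implementation.

```python
import numpy as np


def jaya_minimize(objective, population, iterations=100, seed=0):
    """Minimize `objective` with the standard Jaya update rule.

    `objective` is a placeholder: the paper's actual objective function
    is not given in the abstract.
    """
    rng = np.random.default_rng(seed)
    pop = np.asarray(population, dtype=float).copy()
    fitness = np.array([objective(x) for x in pop])

    for _ in range(iterations):
        best = pop[np.argmin(fitness)]   # lowest objective value so far
        worst = pop[np.argmax(fitness)]  # highest objective value so far
        r1 = rng.random(pop.shape)
        r2 = rng.random(pop.shape)

        # Move toward the best candidate and away from the worst one.
        candidates = (
            pop
            + r1 * (best - np.abs(pop))
            - r2 * (worst - np.abs(pop))
        )
        cand_fitness = np.array([objective(x) for x in candidates])

        # Greedy acceptance: keep a candidate only if it improves.
        improved = cand_fitness < fitness
        pop[improved] = candidates[improved]
        fitness[improved] = cand_fitness[improved]

    return pop, fitness
```

In the setting of the abstract, each row of `population` would correspond to the 6 selected attributes of one record, so the returned population has the same shape as the input. Because acceptance is greedy, the best objective value in the population never gets worse from one iteration to the next.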
