Dipanwita Sen, Bhupati Bhusan Mishra, Prasant Kumar Pattnaik
Franklin Open, Volume 11, Article 100266. Published 2025-04-18. DOI: 10.1016/j.fraope.2025.100266. URL: https://www.sciencedirect.com/science/article/pii/S2773186325000568
Perturbation based privacy preservation and classification using Jaya Algorithm and Dragonfly Inspired Algorithm
Healthcare datasets are highly sensitive. Unauthorized access to them can lead to harm, discrimination and unsolicited scrutiny. Patients' health details are private personal information and should not be disclosed, yet such data may be stored in the cloud without any protection. Privacy preservation is therefore a significant consideration for healthcare datasets. In this work, the Wisconsin Prognostic Breast Cancer (WBC) dataset is used. First, a privacy-preserving scheme based on perturbation with the Jaya Algorithm is presented. Of the 30 numerical attributes in the dataset, 6 are chosen for perturbation based on their relatively high Pearson's correlation coefficient values. They form the initial population of the Jaya Algorithm. An objective function is defined and treated as a minimization problem. After each iteration, the algorithm generates a new population from the previous one. The accuracies obtained by several traditional classification algorithms, as well as by classifiers based on meta-heuristic algorithms, are then observed. The classical classifiers used are Decision Tree, Random Forest, AdaBoost, KNN and GNB, and the standard evaluation metrics are recorded. For privacy, the evaluation metrics are Secrecy, Value Difference (VD), RP, RK, CP and CK. For utility, the metrics are Accuracy, Precision, Recall, F1-Score and Area Under the Curve (AUC). The Jaya Algorithm is then compared with traditional perturbation algorithms such as 2DRT and 3DRT. The mean Friedman test rankings suggest that Jaya preserves more privacy and retains more utility. Among the metaheuristic optimization-based classifiers, only the Dragonfly Inspired Classifier (DIC) classifies over 90% of the records correctly for the perturbed dataset. For classification, the perturbed dataset is fed into the DIC as the original population. The target is to minimize the distance between the testing dataset points and the centroids assigned to them. The new centroids are calculated using the updated training set points only, and the updated dragonflies are assigned new centroids at each stage. All simulations were implemented in Python.
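The abstract does not give the objective function or the parameter settings, so the Jaya step can only be illustrated generically. The following is a minimal sketch of the standard Jaya update for a minimization problem. Each candidate moves toward the current best solution and away from the current worst, and is kept only if it improves on its parent. The function names `jaya_minimize` and `sphere`, the bounds, the population size and the iteration count are all assumptions made for illustration. In particular, `sphere` is a placeholder and not the objective used in the paper.

```python
import numpy as np


def jaya_minimize(objective, population, bounds, iterations=100, seed=0):
    """Generic Jaya loop for minimization (illustrative, not the paper's code)."""
    rng = np.random.default_rng(seed)
    pop = np.array(population, dtype=float)
    low, high = bounds
    scores = np.array([objective(x) for x in pop])
    for _ in range(iterations):
        best = pop[scores.argmin()]
        worst = pop[scores.argmax()]
        r1 = rng.random(pop.shape)
        r2 = rng.random(pop.shape)
        # Standard Jaya move: toward the best candidate, away from the worst.
        candidates = pop + r1 * (best - np.abs(pop)) - r2 * (worst - np.abs(pop))
        candidates = np.clip(candidates, low, high)
        cand_scores = np.array([objective(x) for x in candidates])
        # Greedy selection: a candidate replaces its parent only if it is better.
        improved = cand_scores < scores
        pop[improved] = candidates[improved]
        scores[improved] = cand_scores[improved]
    return pop, scores


def sphere(x):
    # Placeholder objective (assumption): sum of squares, minimized at the origin.
    return float(np.sum(x ** 2))


# Hypothetical setup: 20 candidates, each with 6 values (one per selected attribute).
rng = np.random.default_rng(42)
initial = rng.uniform(-5.0, 5.0, size=(20, 6))
initial_best = min(sphere(x) for x in initial)
final_pop, final_scores = jaya_minimize(sphere, initial, bounds=(-5.0, 5.0))
```

Because acceptance is greedy, the best score in the population can never get worse from one iteration to the next. This matches the abstract's description of each new population being generated from the previous one.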