PPS-FPCM: PRIVACY-PRESERVING SEMI-FUZZY POSSIBILISTIC C-MEANS

International Journal of Advanced Research in Computer Science. Pub Date: 2023-06-20. DOI: 10.26483/ijarcs.v14i3.6991

M. Mahfouz



Abstract

Applying traditional clustering techniques to big data in the cloud while preserving data privacy is challenging because each iteration requires division and exponentiation operations, which complicate implementation over encrypted data. Several existing approaches approximate the formulas for the centers, weights, and memberships as three polynomial functions using the multivariate Taylor formula. However, they usually suffer from increased complexity and a slight drop in accuracy. This paper presents PPS-FPCM, a novel privacy-preserving semi-fuzzy clustering algorithm based on the possibilistic paradigm. Its main feature is that it avoids exponentiation and division operations at each iteration without losing accuracy. The first key idea is to restrict the typicality to an ordered set of discrete values between zero and one, chosen by the data owner (DO), which simplifies the computation. The second key idea is to use this soft typicality to detect outliers and compute the corresponding semi-fuzzy memberships, which are used to increase the between-cluster distance. However, determining the initial typicality requires a magnitude comparison, which is still difficult to perform over encrypted data. We show how the existing incomplete re-encryption method can be used to tackle this problem. In each iteration, the centers and the distances to the new centers are computed on a calculator cloud server (CaCS), which stores the ciphertexts of the DO's data and processes them. CaCS then sends to the comparator cloud server (CoCS) the incompletely re-encrypted difference between each of these distances and a bin value; the bin values are iteratively updated and correspond to the discrete possibilistic memberships initially decided by the DO. CoCS decrypts the difference and returns the result of the comparison. When CaCS receives the result, it either decides on an appropriate soft typicality or sends the difference between the same distance and another bin value. The required number of comparisons is O(log B), where B is the number of bins. CaCS iteratively computes the corresponding semi-fuzzy memberships, computes the refined memberships, and updates the centers. At the end, CaCS sends the final soft memberships and centers to the DO. The proposed algorithm is applicable to both unencrypted and homomorphically encrypted data and is more effective than several related algorithms. It produces accurate results with a sufficiently large number of discrete values (16 or more) and greatly reduces runtime, because the comparisons are much cheaper than exponentiation and division operations, at the cost of added communication between CaCS and CoCS.
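
The logarithmic comparison count can be illustrated with a small plaintext sketch. It is not the paper's implementation: it uses no encryption or re-encryption, and it assumes that the bin values are distance thresholds sorted in ascending order, that a larger distance maps to a lower typicality level, and that the comparator reveals only whether a difference is non-positive. The names `CountingComparator` and `soft_typicality` are hypothetical.

```python
class CountingComparator:
    """Plaintext stand-in for the comparator server (CoCS).

    Assumption: it sees only a difference and answers whether that
    difference is <= 0. In the real protocol the difference would arrive
    incompletely re-encrypted; here it is an ordinary float.
    """

    def __init__(self):
        self.calls = 0

    def __call__(self, difference):
        self.calls += 1
        return difference <= 0


def soft_typicality(distance, thresholds, levels, compare):
    """Pick a discrete typicality level by binary search over the bins.

    thresholds: ascending bin values (assumed to be distance thresholds).
    levels: typicality values, one per bin plus one for distances beyond
            the last threshold, so len(levels) == len(thresholds) + 1.
    compare: oracle that returns True when its argument is <= 0.
    """
    if len(levels) != len(thresholds) + 1:
        raise ValueError("need exactly one more level than thresholds")
    lo, hi = 0, len(thresholds)
    while lo < hi:
        mid = (lo + hi) // 2
        # Only the difference is handed to the comparator.
        if compare(distance - thresholds[mid]):
            hi = mid          # distance <= thresholds[mid]: search lower bins
        else:
            lo = mid + 1      # distance > thresholds[mid]: search higher bins
    return levels[lo]


if __name__ == "__main__":
    # 15 thresholds give 16 discrete typicality levels from 1.0 down to 0.0.
    thresholds = [float(i) for i in range(1, 16)]
    levels = [1 - i / 15 for i in range(16)]
    oracle = CountingComparator()
    print(soft_typicality(3.2, thresholds, levels, oracle), oracle.calls)
```

With 16 levels, each lookup in this sketch costs four comparator calls, compared with up to 15 for a linear scan over the thresholds. The trade-off is one round trip to the comparator per call.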
