Parallelly Running and Privacy-Preserving Agglomerative Hierarchical Clustering in Outsourced Cloud Computing Environments
Authors: Jeongsu Park; Dong Hoon Lee
Journal: IEEE Transactions on Big Data, vol. 11, no. 1, pp. 174-189 (Journal Article)
Publication date: 2024-03-20
DOI: 10.1109/TBDATA.2024.3403375
URL: https://ieeexplore.ieee.org/document/10535212/
Impact factor: 7.5; JCR: Q1 (Computer Science, Information Systems)
Citations: 0
Abstract
As a Big Data analysis technique, hierarchical clustering is helpful for summarizing data because it returns both the clusters of the data and their clustering history. Cloud computing is the most suitable option for efficiently performing hierarchical clustering over large amounts of data. However, since compromised cloud service providers can cause serious privacy problems by revealing data, these problems must be solved before an external cloud computing service is used. No privacy-preserving hierarchical clustering protocol for an outsourced computing environment has been proposed in existing works, and existing protocols either limit the number of participating data owners or disclose information about the data. In this article, we propose a parallelly running and privacy-preserving agglomerative hierarchical clustering (ppAHC) protocol over the union of the datasets of multiple data owners in an outsourced computing environment, which is, to the best of our knowledge, the first such protocol. The proposed ppAHC does not disclose any information about the input or output, including the data access patterns. It is highly efficient and suitable for Big Data analysis over large amounts of data, since its cost for one round is independent of the amount of data. It also allows data owners without sufficient computing capability to participate in collaborative hierarchical clustering.
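For readers unfamiliar with the underlying technique, the following minimal sketch shows plain (non-private) agglomerative hierarchical clustering over the union of two owners' datasets, returning both the clusters and the clustering history mentioned in the abstract. It is not the ppAHC protocol: there is no encryption, outsourcing, or parallelism here, and the single-linkage criterion, the function name `agglomerative`, and the toy data are assumptions chosen only for illustration.

```python
from itertools import combinations
from math import dist


def agglomerative(points, k=1):
    """Naive single-linkage agglomerative clustering (illustrative only).

    Returns (clusters, history). Each history entry is
    (cluster_id_a, cluster_id_b, linkage_distance); a merged cluster
    receives a fresh id, starting at len(points).
    """
    clusters = {i: [i] for i in range(len(points))}
    history = []
    next_id = len(points)
    while len(clusters) > k:
        # Find the pair of clusters with the smallest single-linkage distance.
        (a, b), d = min(
            (
                (
                    (a, b),
                    min(dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b]),
                )
                for a, b in combinations(clusters, 2)
            ),
            key=lambda pair: pair[1],
        )
        history.append((a, b, d))
        # Merge the two clusters under a new id.
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return list(clusters.values()), history


# Hypothetical toy datasets held by two data owners.
owner_a = [(0.0, 0.0), (0.0, 1.0)]
owner_b = [(5.0, 5.0), (5.0, 7.0)]
union = owner_a + owner_b

clusters, history = agglomerative(union, k=2)
print(clusters)  # lists of point indices into `union`
print(history)   # merge order with linkage distances
```

In this plaintext version, whoever runs the loop sees every point, every distance, and which clusters are merged in each round; according to the abstract, that is exactly the information (input, output, and access patterns) that ppAHC keeps hidden from the cloud.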
About the journal:
The IEEE Transactions on Big Data publishes peer-reviewed articles focusing on big data. These articles present innovative research ideas and application results across disciplines, including novel theories, algorithms, and applications. Research areas cover a wide range, such as big data analytics, visualization, curation, management, semantics, infrastructure, standards, performance analysis, intelligence extraction, scientific discovery, security, privacy, and legal issues specific to big data. The journal also prioritizes applications of big data in fields generating massive datasets.