Parallelly Running and Privacy-Preserving Agglomerative Hierarchical Clustering in Outsourced Cloud Computing Environments

IF 7.5 | Zone 3 (Computer Science) | JCR Q1, COMPUTER SCIENCE, INFORMATION SYSTEMS
Jeongsu Park, Dong Hoon Lee
IEEE Transactions on Big Data, vol. 11, no. 1, pp. 174-189. Journal Article. DOI: 10.1109/TBDATA.2024.3403375. Publication date: 2024-03-20. https://ieeexplore.ieee.org/document/10535212/
Citations: 0

Abstract

As a Big Data analysis technique, hierarchical clustering is useful for summarizing data because it returns both the clusters of the data and their clustering history. Cloud computing is the most suitable option for performing hierarchical clustering efficiently over large volumes of data. However, a compromised cloud service provider can cause serious privacy problems by revealing the data, so these problems must be solved before an external cloud computing service is used. No existing work has proposed a privacy-preserving hierarchical clustering protocol for an outsourced computing environment, and existing protocols either limit the number of participating data owners or disclose information about the data. In this article, we propose a parallelly running and privacy-preserving agglomerative hierarchical clustering (ppAHC) protocol over the union of the datasets of multiple data owners in an outsourced computing environment; to the best of our knowledge, it is the first such protocol. The proposed ppAHC discloses no information about its input or output, including data access patterns. It is highly efficient and suitable for Big Data analysis over large datasets because its cost per round is independent of the amount of data. It also allows data owners without sufficient computing capability to participate in collaborative hierarchical clustering.
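
The abstract says hierarchical clustering returns both the clusters and their clustering history. As a plaintext reference point only, the following is a minimal sketch of agglomerative clustering that records every merge. It assumes single linkage and squared Euclidean distance, neither of which the abstract specifies, and the names `agglomerate`, `single_link` and `sq_dist` are hypothetical. It contains none of ppAHC's privacy protection; it only illustrates the functionality such a protocol computes.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Merge = Tuple[Tuple[int, ...], Tuple[int, ...], float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance (assumed metric, not taken from the paper)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def single_link(c1: List[int], c2: List[int], pts: Sequence[Point]) -> float:
    """Single linkage (assumed criterion): closest pair of points across two clusters."""
    return min(sq_dist(pts[i], pts[j]) for i in c1 for j in c2)


def agglomerate(pts: Sequence[Point], k: int = 1) -> Tuple[List[List[int]], List[Merge]]:
    """Merge clusters until k remain; return (clusters, history).

    history[r] = (members of one merged cluster, members of the other, linkage distance).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    clusters = [[i] for i in range(len(pts))]  # every point starts as its own cluster
    history: List[Merge] = []
    while len(clusters) > k:
        # One merge round: scan all current cluster pairs in the clear.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: single_link(clusters[ab[0]], clusters[ab[1]], pts),
        )
        d = single_link(clusters[a], clusters[b], pts)
        history.append((tuple(clusters[a]), tuple(clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters, history


if __name__ == "__main__":
    _, hist = agglomerate([(0.0, 0.0), (0.0, 1.0), (5.0, 5.0)])
    print(hist)  # [((0,), (1,), 1.0), ((0, 1), (2,), 41.0)]
```

Each iteration of the `while` loop is one merge round. In this naive version a round compares every remaining pair of clusters, so its cost grows with the number of points; the abstract's efficiency claim is that ppAHC's per-round cost does not depend on the amount of data.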
Source journal: IEEE Transactions on Big Data
CiteScore: 11.80
Self-citation rate: 2.80%
Articles published: 114
Journal description: The IEEE Transactions on Big Data publishes peer-reviewed articles focusing on big data. These articles present innovative research ideas and application results across disciplines, including novel theories, algorithms, and applications. Research areas cover a wide range, such as big data analytics, visualization, curation, management, semantics, infrastructure, standards, performance analysis, intelligence extraction, scientific discovery, security, privacy, and legal issues specific to big data. The journal also prioritizes applications of big data in fields that generate massive datasets.