Saikiran Bulusu;Venkata Gandikota;Arya Mazumdar;Ankit Singh Rawat;Pramod K. Varshney
Robust Distributed Clustering With Redundant Data Assignment. IEEE Transactions on Information Theory, vol. 71, no. 4, pp. 2888-2908. Published 2025-01-29. DOI: 10.1109/TIT.2025.3536323. URL: https://ieeexplore.ieee.org/document/10857440/
Citations: 0
Abstract
In this work, we present distributed clustering algorithms that can handle large-scale data across multiple machines in the presence of faulty machines. These faulty machines can either be straggling machines that fail to respond within a stipulated time or Byzantines that send arbitrary responses. We propose redundant data assignment schemes that enable us to obtain clustering solutions based on the entire dataset, even when some machines are stragglers or adversarial in nature. Our proposed robust clustering algorithms generate a constant factor approximate solution in the presence of stragglers or Byzantines. We also provide various constructions of the data assignment scheme that provide resilience against a large fraction of faulty machines. Simulation results show that the distributed algorithms based on the proposed assignment scheme provide good-quality solutions for a variety of clustering problems.
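To make the idea of redundant data assignment concrete, here is a minimal sketch of one simple scheme, assuming a cyclic layout in which each data point is replicated on a fixed number of consecutive machines. This is not the construction from the paper, whose details the abstract does not give; the function names and parameters are hypothetical. The sketch only illustrates why replication lets every data point survive a bounded number of stragglers, and it says nothing about Byzantine machines or approximation guarantees.

```python
from itertools import combinations


def cyclic_assignment(num_points, num_machines, replication):
    """Assign each point to `replication` consecutive machines (mod num_machines).

    Hypothetical illustrative scheme, not the paper's construction.
    """
    if not 1 <= replication <= num_machines:
        raise ValueError("replication must be between 1 and num_machines")
    machines = [set() for _ in range(num_machines)]
    for point in range(num_points):
        for offset in range(replication):
            machines[(point + offset) % num_machines].add(point)
    return machines


def covered_points(machines, stragglers):
    """Return the union of points held by machines that did respond."""
    seen = set()
    for index, points in enumerate(machines):
        if index not in stragglers:
            seen |= points
    return seen


def tolerates_any_stragglers(machines, num_points, num_stragglers):
    """Check by brute force that every straggler set of the given size
    still leaves all points on at least one responding machine."""
    all_points = set(range(num_points))
    return all(
        covered_points(machines, set(subset)) == all_points
        for subset in combinations(range(len(machines)), num_stragglers)
    )


if __name__ == "__main__":
    layout = cyclic_assignment(num_points=12, num_machines=6, replication=3)
    print(tolerates_any_stragglers(layout, 12, 2))
    print(tolerates_any_stragglers(layout, 12, 3))
```

Because each point lives on `replication` distinct machines, any set of fewer than `replication` stragglers leaves at least one copy available. The cost is that every machine stores and processes `replication` times more data than in a non-redundant split.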
About the journal:
The IEEE Transactions on Information Theory is a journal that publishes theoretical and experimental papers concerned with the transmission, processing, and utilization of information. The boundaries of acceptable subject matter are intentionally not sharply delimited. Rather, it is hoped that as the focus of research activity changes, a flexible policy will permit this Transactions to follow suit. Current appropriate topics are best reflected by recent Tables of Contents; they are summarized in the titles of editorial areas that appear on the inside front cover.