Jiayun Fu, Xiaojing Ma, Bin B. Zhu, Pingyi Hu, Ruixin Zhao, Yaru Jia, Peng Xu, Hai Jin, Dongmei Zhang
{"title":"Focusing on Pinocchio's Nose: A Gradients Scrutinizer to Thwart Split-Learning Hijacking Attacks Using Intrinsic Attributes","authors":"Jiayun Fu, Xiaojing Ma, Bin B. Zhu, Pingyi Hu, Ruixin Zhao, Yaru Jia, Peng Xu, Hai Jin, Dongmei Zhang","doi":"10.14722/ndss.2023.24874","DOIUrl":null,"url":null,"abstract":"—Split learning is privacy-preserving distributed learning that has gained momentum recently. It also faces new security challenges. FSHA [37] is a serious threat to split learning. In FSHA, a malicious server hijacks training to trick clients to train the encoder of an autoencoder instead of a classification model. Intermediate results sent to the server by a client are actually latent codes of private training samples, which can be reconstructed with high fidelity from the received codes with the decoder of the autoencoder. SplitGuard [10] is the only existing effective defense against hijacking attacks. It is an active method that injects falsely labeled data to incur abnormal behaviors to detect hijacking attacks. Such injection also incurs an adverse impact on honest training of intended models. In this paper, we first show that SplitGuard is vulnerable to an adaptive hijacking attack named SplitSpy. SplitSpy exploits the same property that SplitGuard exploits to detect hijacking attacks. In SplitSpy, a malicious server maintains a shadow model that performs the intended task to detect falsely labeled data and evade SplitGuard. Our experimental evaluation indicates that SplitSpy can effectively evade SplitGuard. Then we propose a novel passive detection method, named Gradients Scrutinizer, which relies on intrinsic differences between gradients from an intended model and those from a malicious model: the expected similarity among gradients of same-label samples differs from the expected similarity among gradients of different-label samples for an intended model, while they are the same for a malicious model. This intrinsic distinguishability","PeriodicalId":199733,"journal":{"name":"Proceedings 2023 Network and Distributed System Security Symposium","volume":"18 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings 2023 Network and Distributed System Security Symposium","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.14722/ndss.2023.24874","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3
Abstract
—Split learning is privacy-preserving distributed learning that has gained momentum recently. It also faces new security challenges. FSHA [37] is a serious threat to split learning. In FSHA, a malicious server hijacks training to trick clients to train the encoder of an autoencoder instead of a classification model. Intermediate results sent to the server by a client are actually latent codes of private training samples, which can be reconstructed with high fidelity from the received codes with the decoder of the autoencoder. SplitGuard [10] is the only existing effective defense against hijacking attacks. It is an active method that injects falsely labeled data to incur abnormal behaviors to detect hijacking attacks. Such injection also incurs an adverse impact on honest training of intended models. In this paper, we first show that SplitGuard is vulnerable to an adaptive hijacking attack named SplitSpy. SplitSpy exploits the same property that SplitGuard exploits to detect hijacking attacks. In SplitSpy, a malicious server maintains a shadow model that performs the intended task to detect falsely labeled data and evade SplitGuard. Our experimental evaluation indicates that SplitSpy can effectively evade SplitGuard. Then we propose a novel passive detection method, named Gradients Scrutinizer, which relies on intrinsic differences between gradients from an intended model and those from a malicious model: the expected similarity among gradients of same-label samples differs from the expected similarity among gradients of different-label samples for an intended model, while they are the same for a malicious model. This intrinsic distinguishability