{"title":"Privacy-Preserving and Efficient Verification of the Outcome in Genome-Wide Association Studies.","authors":"Anisa Halimi, Leonard Dervishi, Erman Ayday, Apostolos Pyrgelis, Juan Ramón Troncoso-Pastoriza, Jean-Pierre Hubaux, Xiaoqian Jiang, Jaideep Vaidya","doi":"10.56553/popets-2022-0094","DOIUrl":"10.56553/popets-2022-0094","url":null,"abstract":"<p><p>Providing provenance in scientific workflows is essential for reproducibility and auditability purposes. In this work, we propose a framework that verifies the correctness of the aggregate statistics obtained as a result of a genome-wide association study (GWAS) conducted by a researcher while protecting individuals' privacy in the researcher's dataset. In GWAS, the goal of the researcher is to identify highly associated point mutations (variants) with a given phenotype. The researcher publishes the workflow of the conducted study, its output, and associated metadata. They keep the research dataset private while providing, as part of the metadata, a partial noisy dataset (that achieves local differential privacy). To check the correctness of the workflow output, a verifier makes use of the workflow, its metadata, and results of another GWAS (conducted using publicly available datasets) to distinguish between correct statistics and incorrect ones. For evaluation, we use real genomic data and show that the correctness of the workflow output can be verified with high accuracy even when the aggregate statistics of a small number of variants are provided. We also quantify the privacy leakage due to the provided workflow and its associated metadata and show that the additional privacy risk due to the provided metadata does not increase the existing privacy risk due to sharing of the research results. Thus, our results show that the workflow output (i.e., research results) can be verified with high confidence in a privacy-preserving way. We believe that this work will be a valuable step towards providing provenance in a privacy-preserving way while providing guarantees to the users about the correctness of the results.</p>","PeriodicalId":74556,"journal":{"name":"Proceedings on Privacy Enhancing Technologies. Privacy Enhancing Technologies Symposium","volume":"2022 3","pages":"732-753"},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9536480/pdf/nihms-1802603.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"33517178","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FP-Radar: Longitudinal Measurement and Early Detection of Browser Fingerprinting","authors":"Pouneh Nikkhah Bahrami, Umar Iqbal, Zubair Shafiq","doi":"10.2478/popets-2022-0056","DOIUrl":"https://doi.org/10.2478/popets-2022-0056","url":null,"abstract":"Abstract Browser fingerprinting is a stateless tracking technique that aims to combine information exposed by multiple different web APIs to create a unique identifier for tracking users across the web. Over the last decade, trackers have abused several existing and newly proposed web APIs to further enhance the browser fingerprint. Existing approaches are limited to detecting a specific fingerprinting technique(s) at a particular point in time. Thus, they are unable to systematically detect novel fingerprinting techniques that abuse different web APIs. In this paper, we propose FP-Radar, a machine learning approach that leverages longitudinal measurements of web API usage on top-100K websites over the last decade for early detection of new and evolving browser fingerprinting techniques. The results show that FP-Radar is able to early detect the abuse of newly introduced properties of already known (e.g., WebGL, Sensor) and as well as previously unknown (e.g., Gamepad, Clipboard) APIs for browser fingerprinting. To the best of our knowledge, FP-Radar is the first to detect the abuse of the Visibility API for ephemeral fingerprinting in the wild.","PeriodicalId":74556,"journal":{"name":"Proceedings on Privacy Enhancing Technologies. Privacy Enhancing Technologies Symposium","volume":"2022 1","pages":"557 - 577"},"PeriodicalIF":0.0,"publicationDate":"2021-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45750502","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SoK: Plausibly Deniable Storage","authors":"Chen Chen, Xiao Liang, Bogdan Carbunar, R. Sion","doi":"10.2478/popets-2022-0039","DOIUrl":"https://doi.org/10.2478/popets-2022-0039","url":null,"abstract":"Abstract Data privacy is critical in instilling trust and empowering the societal pacts of modern technology-driven democracies. Unfortunately it is under continuous attack by overreaching or outright oppressive governments, including some of the world’s oldest democracies. Increasingly-intrusive anti-encryption laws severely limit the ability of standard encryption to protect privacy. New defense mechanisms are needed. Plausible deniability (PD) is a powerful property, enabling users to hide the existence of sensitive information in a system under direct inspection by adversaries. Popular encrypted storage systems such as TrueCrypt and other research efforts have attempted to also provide plausible deniability. Unfortunately, these efforts have often operated under less well-defined assumptions and adversarial models. Careful analyses often uncover not only high overheads but also outright security compromise. Further, our understanding of adversaries, the underlying storage technologies, as well as the available plausible deniable solutions have evolved dramatically in the past two decades. The main goal of this work is to systematize this knowledge. It aims to: (1) identify key PD properties, requirements and approaches; (2) present a direly-needed unified framework for evaluating security and performance; (3) explore the challenges arising from the critical interplay between PD and modern system layered stacks; (4) propose a new “trace-oriented” PD paradigm, able to decouple security guarantees from the underlying systems and thus ensure a higher level of flexibility and security independent of the technology stack. This work is meant also as a trusted guide for system and security practitioners around the major challenges in understanding, designing and implementing plausible deniability into new or existing systems.","PeriodicalId":74556,"journal":{"name":"Proceedings on Privacy Enhancing Technologies. Privacy Enhancing Technologies Symposium","volume":"2022 1","pages":"132 - 151"},"PeriodicalIF":0.0,"publicationDate":"2021-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42768085","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Improving Code Stylometry Analysis in Underground Forums","authors":"Michal Tereszkowski-Kaminski, S. Pastrana, Jorge Blasco, Guillermo Suarez-Tangil","doi":"10.2478/popets-2022-0007","DOIUrl":"https://doi.org/10.2478/popets-2022-0007","url":null,"abstract":"Abstract Code Stylometry has emerged as a powerful mechanism to identify programmers. While there have been significant advances in the field, existing mechanisms underperform in challenging domains. One such domain is studying the provenance of code shared in underground forums, where code posts tend to have small or incomplete source code fragments. This paper proposes a method designed to deal with the idiosyncrasies of code snippets shared in these forums. Our system fuses a forum-specific learning pipeline with Conformal Prediction to generate predictions with precise confidence levels as a novelty. We see that identifying unreliable code snippets is paramount to generate high-accuracy predictions, and this is a task where traditional learning settings fail. Overall, our method performs as twice as well as the state-of-the-art in a constrained setting with a large number of authors (i.e., 100). When dealing with a smaller number of authors (i.e., 20), it performs at high accuracy (89%). We also evaluate our work on an open-world assumption and see that our method is more effective at retaining samples.","PeriodicalId":74556,"journal":{"name":"Proceedings on Privacy Enhancing Technologies. Privacy Enhancing Technologies Symposium","volume":"2022 1","pages":"126 - 147"},"PeriodicalIF":0.0,"publicationDate":"2021-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42962108","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Polaris: Transparent Succinct Zero-Knowledge Arguments for R1CS with Efficient Verifier","authors":"Shihui Fu, G. Gong","doi":"10.2478/popets-2022-0027","DOIUrl":"https://doi.org/10.2478/popets-2022-0027","url":null,"abstract":"Abstract We present a new zero-knowledge succinct argument of knowledge (zkSNARK) scheme for Rank-1 Constraint Satisfaction (RICS), a widely deployed NP-complete language that generalizes arithmetic circuit satisfiability. By instantiating with different commitment schemes, we obtain several zkSNARKs where the verifier’s costs and the proof size range from O(log2 N) to O(N) Oleft( {sqrt N } right) depending on the underlying polynomial commitment schemes when applied to an N-gate arithmetic circuit. All these schemes do not require a trusted setup. It is plausibly post-quantum secure when instantiated with a secure collision-resistant hash function. We report on experiments for evaluating the performance of our proposed system. For instance, for verifying a SHA-256 preimage (less than 23k AND gates) in zero-knowledge with 128 bits security, the proof size is less than 150kB and the verification time is less than 11ms, both competitive to existing systems.","PeriodicalId":74556,"journal":{"name":"Proceedings on Privacy Enhancing Technologies. Privacy Enhancing Technologies Symposium","volume":"2022 1","pages":"544 - 564"},"PeriodicalIF":0.0,"publicationDate":"2021-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41683580","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MLEFlow: Learning from History to Improve Load Balancing in Tor","authors":"Hussein Darir, Hussein Sibai, Chin-Yu Cheng, N. Borisov, G. Dullerud, S. Mitra","doi":"10.2478/popets-2022-0005","DOIUrl":"https://doi.org/10.2478/popets-2022-0005","url":null,"abstract":"Abstract Tor has millions of daily users seeking privacy while browsing the Internet. It has thousands of relays to route users’ packets while anonymizing their sources and destinations. Users choose relays to forward their traffic according to probability distributions published by the Tor authorities. The authorities generate these probability distributions based on estimates of the capacities of the relays. They compute these estimates based on the bandwidths of probes sent to the relays. These estimates are necessary for better load balancing. Unfortunately, current methods fall short of providing accurate estimates leaving the network underutilized and its capacities unfairly distributed between the users’ paths. We present MLEFlow, a maximum likelihood approach for estimating relay capacities for optimal load balancing in Tor. We show that MLEFlow generalizes a version of Tor capacity estimation, TorFlow-P, by making better use of measurement history. We prove that the mean of our estimate converges to a small interval around the actual capacities, while the variance converges to zero. We present two versions of MLEFlow: MLEFlow-CF, a closed-form approximation of the MLE and MLEFlow-Q, a discretization and iterative approximation of the MLE which can account for noisy observations. We demonstrate the practical benefits of MLEFlow by simulating it using a flow-based Python simulator of a full Tor network and packet-based Shadow simulation of a scaled down version. In our simulations MLEFlow provides significantly more accurate estimates, which result in improved user performance, with median download speeds increasing by 30%.","PeriodicalId":74556,"journal":{"name":"Proceedings on Privacy Enhancing Technologies. Privacy Enhancing Technologies Symposium","volume":"2022 1","pages":"75 - 104"},"PeriodicalIF":0.0,"publicationDate":"2021-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41710793","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Polymath: Low-Latency MPC via Secure Polynomial Evaluations and Its Applications","authors":"Donghang Lu, Albert Yu, Aniket Kate, H. K. Maji","doi":"10.2478/popets-2022-0020","DOIUrl":"https://doi.org/10.2478/popets-2022-0020","url":null,"abstract":"Abstract While the practicality of secure multi-party computation (MPC) has been extensively analyzed and improved over the past decade, we are hitting the limits of efficiency with the traditional approaches of representing the computed functionalities as generic arithmetic or Boolean circuits. This work follows the design principle of identifying and constructing fast and provably-secure MPC protocols to evaluate useful high-level algebraic abstractions; thus, improving the efficiency of all applications relying on them. We present Polymath, a constant-round secure computation protocol suite for the secure evaluation of (multi-variate) polynomials of scalars and matrices, functionalities essential to numerous data-processing applications. Using precise natural precomputation and high-degree of parallelism prevalent in the modern computing environments, Polymath can make latency of secure polynomial evaluations of scalars and matrices independent of polynomial degree and matrix dimensions. We implement our protocols over the HoneyBadgerMPC library and apply it to two prominent secure computation tasks: privacy-preserving evaluation of decision trees and privacy-preserving evaluation of Markov processes. For the decision tree evaluation problem, we demonstrate the feasibility of evaluating high-depth decision tree models in a general n-party setting. For the Markov process application, we demonstrate that Poly-math can compute large powers of transition matrices with better online time and less communication.","PeriodicalId":74556,"journal":{"name":"Proceedings on Privacy Enhancing Technologies. Privacy Enhancing Technologies Symposium","volume":"2022 1","pages":"396 - 416"},"PeriodicalIF":0.0,"publicationDate":"2021-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41468067","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"If You Like Me, Please Don’t “Like” Me: Inferring Vendor Bitcoin Addresses From Positive Reviews","authors":"Jochen Schäfer, Christian Müller, Frederik Armknecht","doi":"10.2478/popets-2022-0022","DOIUrl":"https://doi.org/10.2478/popets-2022-0022","url":null,"abstract":"Abstract Bitcoin and similar cryptocurrencies are becoming increasingly popular as a payment method in both legitimate and illegitimate online markets. Such markets usually deploy a review system that allows users to rate their purchases and help others to determine reliable vendors. Consequently, vendors are interested into accumulating as many positive reviews (likes) as possible and to make these public. However, we present an attack that exploits these publicly available information to identify cryptocurrency addresses potentially belonging to vendors. In its basic variant, it focuses on vendors that reuse their addresses. We also show an extended variant that copes with the case that addresses are used only once. We demonstrate the applicability of the attack by modeling Bitcoin transactions based on vendor reviews of two separate darknet markets and retrieve matching transactions from the blockchain. By doing so, we can identify Bitcoin addresses likely belonging to darknet market vendors.","PeriodicalId":74556,"journal":{"name":"Proceedings on Privacy Enhancing Technologies. Privacy Enhancing Technologies Symposium","volume":"2022 1","pages":"440 - 459"},"PeriodicalIF":0.0,"publicationDate":"2021-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49008510","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Zen and the art of model adaptation: Low-utility-cost attack mitigations in collaborative machine learning","authors":"Dmitrii Usynin, D. Rueckert, Jonathan Passerat-Palmbach, Georgios Kaissis","doi":"10.2478/popets-2022-0014","DOIUrl":"https://doi.org/10.2478/popets-2022-0014","url":null,"abstract":"Abstract In this study, we aim to bridge the gap between the theoretical understanding of attacks against collaborative machine learning workflows and their practical ramifications by considering the effects of model architecture, learning setting and hyperparameters on the resilience against attacks. We refer to such mitigations as model adaptation. Through extensive experimentation on both, benchmark and real-life datasets, we establish a more practical threat model for collaborative learning scenarios. In particular, we evaluate the impact of model adaptation by implementing a range of attacks belonging to the broader categories of model inversion and membership inference. Our experiments yield two noteworthy outcomes: they demonstrate the difficulty of actually conducting successful attacks under realistic settings when model adaptation is employed and they highlight the challenge inherent in successfully combining model adaptation and formal privacy-preserving techniques to retain the optimal balance between model utility and attack resilience.","PeriodicalId":74556,"journal":{"name":"Proceedings on Privacy Enhancing Technologies. Privacy Enhancing Technologies Symposium","volume":"2022 1","pages":"274 - 290"},"PeriodicalIF":0.0,"publicationDate":"2021-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42376043","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Privacy-Preserving High-dimensional Data Collection with Federated Generative Autoencoder","authors":"Xue Jiang, Xuebing Zhou, Jens Grossklags","doi":"10.2478/popets-2022-0024","DOIUrl":"https://doi.org/10.2478/popets-2022-0024","url":null,"abstract":"Abstract Business intelligence and AI services often involve the collection of copious amounts of multidimensional personal data. Since these data usually contain sensitive information of individuals, the direct collection can lead to privacy violations. Local differential privacy (LDP) is currently considered a state-ofthe-art solution for privacy-preserving data collection. However, existing LDP algorithms are not applicable to high-dimensional data; not only because of the increase in computation and communication cost, but also poor data utility. In this paper, we aim at addressing the curse-of-dimensionality problem in LDP-based high-dimensional data collection. Based on the idea of machine learning and data synthesis, we propose DP-Fed-Wae, an efficient privacy-preserving framework for collecting high-dimensional categorical data. With the combination of a generative autoencoder, federated learning, and differential privacy, our framework is capable of privately learning the statistical distributions of local data and generating high utility synthetic data on the server side without revealing users’ private information. We have evaluated the framework in terms of data utility and privacy protection on a number of real-world datasets containing 68–124 classification attributes. We show that our framework outperforms the LDP-based baseline algorithms in capturing joint distributions and correlations of attributes and generating high-utility synthetic data. With a local privacy guarantee ∈ = 8, the machine learning models trained with the synthetic data generated by the baseline algorithm cause an accuracy loss of 10% ~ 30%, whereas the accuracy loss is significantly reduced to less than 3% and at best even less than 1% with our framework. Extensive experimental results demonstrate the capability and efficiency of our framework in synthesizing high-dimensional data while striking a satisfactory utility-privacy balance.","PeriodicalId":74556,"journal":{"name":"Proceedings on Privacy Enhancing Technologies. Privacy Enhancing Technologies Symposium","volume":"2022 1","pages":"481 - 500"},"PeriodicalIF":0.0,"publicationDate":"2021-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46243940","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}