Bayesian time series matching and privacy
Ke Li, H. Pishro-Nik, D. Goeckel
2017 51st Asilomar Conference on Signals, Systems, and Computers, published 2017-10-01
DOI: 10.1109/ACSSC.2017.8335645 (https://doi.org/10.1109/ACSSC.2017.8335645)

A user's privacy can be compromised by matching the statistical characteristics of an anonymized trace of interest to the prior behavior of that user. Here, we address this matching problem from first principles in the Bayesian case, where user parameters are drawn from a known distribution, to understand the relationship among the length of the observed traces, the characteristics of the distribution that defines the differences in behavior between users, and user privacy. First, we establish optimal tests (for two hypotheses, extended to multiple hypotheses) for two cases: 1) continuous alphabets, in particular i.i.d. Gaussian observations with a different (unknown) mean for each user, where the means are drawn from a general a priori distribution; and 2) binary alphabets, where i.i.d. observations are drawn from a Bernoulli distribution and each user has an (unknown) probability of being in the "0" state, drawn from a given a priori distribution. Next, for the case with Gaussian observations, we provide general (non-asymptotic) bounds on the performance of the tests and use them to show the scaling behavior of privacy. Finally, we present simulation results to demonstrate the accuracy of our analytical bounds.
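
To make the two-hypothesis matching problem concrete, the following is a minimal sketch under assumptions that are not taken from the paper: two users, a zero-mean Gaussian prior on the user means (the abstract allows a general prior), a common known noise variance, equal-length prior traces `x1`, `x2`, and equal-length anonymized traces `y1`, `y2`. Under these assumptions, comparing the marginal likelihoods of "no swap" and "swap" reduces to the sign of `(sum(x1) - sum(x2)) * (sum(y1) - sum(y2))`. The function names and parameters (`decide_match`, `simulate_error_rate`, `sigma`, `tau`) are hypothetical and chosen only for illustration.

```python
import random


def decide_match(x1, x2, y1, y2):
    """Return 0 if y1 is attributed to user 1 (no swap), else 1 (swap).

    Illustrative assumptions (not from the paper): i.i.d. Gaussian samples
    with a common known variance, user means drawn i.i.d. from a zero-mean
    Gaussian prior, len(x1) == len(x2) and len(y1) == len(y2). The
    marginal-likelihood comparison then depends only on the sign below.
    """
    score = (sum(x1) - sum(x2)) * (sum(y1) - sum(y2))
    return 0 if score >= 0 else 1


def simulate_error_rate(n, m, sigma=1.0, tau=1.0, trials=2000, seed=0):
    """Monte Carlo estimate of the matching error probability.

    n: length of each user's prior trace; m: length of each anonymized trace.
    sigma: observation standard deviation; tau: prior standard deviation.
    """
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        # Draw each user's unknown mean from the assumed Gaussian prior.
        mu1, mu2 = rng.gauss(0.0, tau), rng.gauss(0.0, tau)
        x1 = [rng.gauss(mu1, sigma) for _ in range(n)]
        x2 = [rng.gauss(mu2, sigma) for _ in range(n)]
        y1 = [rng.gauss(mu1, sigma) for _ in range(m)]
        y2 = [rng.gauss(mu2, sigma) for _ in range(m)]
        # Anonymization: swap the two traces with probability 1/2.
        swapped = rng.random() < 0.5
        if swapped:
            y1, y2 = y2, y1
        if decide_match(x1, x2, y1, y2) != int(swapped):
            errors += 1
    return errors / trials


if __name__ == "__main__":
    for length in (1, 5, 20, 100):
        rate = simulate_error_rate(length, length)
        print(f"trace length {length:4d}: estimated error rate {rate:.3f}")
```

Running the script prints an estimated error rate for several trace lengths. In this toy setting the estimate should shrink as the traces grow, which is the qualitative link between trace length and privacy that the abstract describes; it does not reproduce the paper's bounds or its tests for general priors.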