Efficient and model-agnostic parameter estimation under privacy-preserving post-randomization data
Authors: Qinglong Tian, Jiwei Zhao
Journal: Canadian Journal of Statistics-Revue Canadienne De Statistique, volume 53, issue 3
Published: 2025-04-07
DOI: 10.1002/cjs.70003
URL: https://onlinelibrary.wiley.com/doi/10.1002/cjs.70003
Balancing data privacy with public access is critical for sensitive datasets. However, even after de-identification, the data are still vulnerable to, for example, inference attacks (by matching some keywords with external datasets). Statistical disclosure control (SDC) methods offer additional protection, and the post-randomization method (PRAM) adds noise to data to achieve this goal. However, PRAM-perturbed data pose challenges for analysis, as directly using the perturbed data leads to biased parameter estimates. This article addresses parameter estimation when data are perturbed using PRAM for privacy. While existing methods suffer from limitations like being parameter-specific, model-dependent and lacking optimality guarantees, our proposed method overcomes these limitations. Our approach applies to general parameters defined through estimating equations and makes no assumptions about the underlying data model. Furthermore, we prove that the proposed estimator achieves the semiparametric efficiency bound, making it asymptotically optimal in terms of estimation efficiency.
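The bias described in the abstract can be seen in a small simulation. The sketch below is not the estimator proposed in the article. It only illustrates, under assumptions made here, how PRAM distorts a categorical variable and how the textbook matrix-inversion correction recovers the category frequencies. The transition matrix `P`, the true frequencies `pi_true`, the sample size, and the helper names `pram` and `frequencies` are all hypothetical choices for illustration, and `P` is assumed to be known to the analyst.

```python
import numpy as np


def pram(x, P, rng):
    """Apply PRAM to integer category codes x (hypothetical helper).

    P[k, l] is the assumed-known probability that true category k
    is released as category l; each row of P sums to 1.
    """
    cum = np.cumsum(P, axis=1)          # row-wise cumulative probabilities
    u = rng.random(x.shape[0])          # one uniform draw per record
    # Released category = number of cumulative thresholds that u exceeds.
    released = (u[:, None] > cum[x]).sum(axis=1)
    return np.minimum(released, P.shape[0] - 1)  # guard against rounding in cum


def frequencies(x, K):
    """Empirical category frequencies of integer codes x."""
    return np.bincount(x, minlength=K) / x.shape[0]


rng = np.random.default_rng(0)
K, n = 3, 200_000
pi_true = np.array([0.7, 0.2, 0.1])          # illustrative true frequencies
P = np.full((K, K), 0.1) + 0.7 * np.eye(K)   # keep a value w.p. 0.8, else switch

x = rng.choice(K, size=n, p=pi_true)   # original (private) data
x_star = pram(x, P, rng)               # released (perturbed) data

# Naive use of the released data estimates P.T @ pi_true, not pi_true.
naive = frequencies(x_star, K)
# Inverting the known perturbation removes the bias for the frequencies.
corrected = np.linalg.solve(P.T, naive)

print("true     :", pi_true)
print("naive    :", naive.round(3))
print("corrected:", corrected.round(3))
```

With these illustrative settings the naive frequencies concentrate around `P.T @ pi_true` rather than `pi_true`, while the inverted estimate is close to the truth. This correction handles category frequencies only; the article concerns general parameters defined through estimating equations.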
About the journal:
The Canadian Journal of Statistics is the official journal of the Statistical Society of Canada. It is regarded internationally as an excellent journal. The editorial board comprises statistical scientists with applied, computational, methodological, theoretical and probabilistic interests. Their role is to ensure that the journal continues to provide an international forum for the discipline of Statistics.
The journal seeks papers making broad points of interest to many readers, whereas papers making important points of more specific interest are better placed in more specialized journals. The levels of innovation and impact are key in the evaluation of submitted manuscripts.