Highly private large‐sample tests for contingency tables
Authors: Sungkyu Jung, Seung Woo Kwak
Journal: Stat (Impact Factor 0.7; JCR Q3, Statistics & Probability)
Published: 2024-02-29
DOI: 10.1002/sta4.658 (https://doi.org/10.1002/sta4.658)
Citations: 0
Abstract
Differential privacy is a foundational concept for safeguarding sensitive individual information when releasing data or statistical analysis results. In this study, we concentrate on the protection of privacy in the context of goodness‐of‐fit (GOF) and independence tests, utilizing perturbed contingency tables that adhere to Gaussian differential privacy within the high‐privacy regime, where the degree of privacy protection increases as the sample size increases. We introduce private test procedures for GOF, independence of two variables and the equality of proportions in paired samples, similar to McNemar's test. For each of these hypothesis testing situations, we propose private test statistics based on the corresponding chi‐squared statistics and establish their asymptotic null distributions. We numerically confirm that the Type I error rates of the proposed private test procedures are well controlled and that the procedures have adequate power for larger sample sizes and effect sizes. The proposal is demonstrated in private inferences based on the American Time Use Survey data.
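To make the setting in the abstract concrete, the following is a minimal sketch of a GOF test on a Gaussian-perturbed table. It is not the authors' procedure: the function names (`gaussian_perturb`, `plugin_gof_stat`, `monte_carlo_pvalue`) are hypothetical, the noise scale assumes that neighbouring datasets differ by replacing one record (L2 sensitivity sqrt(2), so sigma = sqrt(2)/mu gives mu-Gaussian differential privacy), the sample size `n` is treated as public, and the null distribution is approximated by Monte Carlo simulation rather than by the asymptotic results the paper establishes.

```python
import numpy as np


def gaussian_perturb(counts, mu, rng):
    """Add i.i.d. Gaussian noise to every cell of a contingency table.

    Assumption: replacing one record moves one unit between two cells,
    so the L2 sensitivity is sqrt(2) and sigma = sqrt(2) / mu yields
    mu-Gaussian differential privacy.
    """
    counts = np.asarray(counts, dtype=float)
    sigma = np.sqrt(2.0) / mu
    return counts + rng.normal(0.0, sigma, size=counts.shape)


def plugin_gof_stat(noisy_counts, p0, n):
    """Pearson-type statistic evaluated on the noisy counts."""
    expected = n * np.asarray(p0, dtype=float)
    return float(np.sum((noisy_counts - expected) ** 2 / expected))


def monte_carlo_pvalue(noisy_counts, p0, n, mu, rng, n_sim=2000):
    """Approximate the null distribution by simulation.

    Each replicate draws a multinomial table under the null, perturbs it
    with the same noise scale, and recomputes the statistic. This is a
    stand-in for an asymptotic calibration, not a reproduction of one.
    """
    observed = plugin_gof_stat(noisy_counts, p0, n)
    sims = np.empty(n_sim)
    for b in range(n_sim):
        null_counts = rng.multinomial(n, p0)
        sims[b] = plugin_gof_stat(gaussian_perturb(null_counts, mu, rng), p0, n)
    # Add-one correction keeps the p-value strictly positive.
    return (1 + np.sum(sims >= observed)) / (n_sim + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p0 = np.array([0.25, 0.25, 0.25, 0.25])  # null cell probabilities
    n = 1000
    counts = rng.multinomial(n, p0)           # sensitive table
    released = gaussian_perturb(counts, mu=0.5, rng=rng)
    print(monte_carlo_pvalue(released, p0, n, mu=0.5, rng=rng))
```

Only `released` would leave the trusted environment in this sketch; the test is then run entirely on the noisy table. Simulating the noise inside the null replicates accounts for the extra variance that the perturbation adds to the statistic, which referring the plug-in statistic to a plain chi-squared table would ignore.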
Stat
Decision Sciences - Statistics, Probability and Uncertainty
CiteScore: 1.10
Self-citation rate: 0.00%
Articles published: 85
About the journal:
Stat is an innovative electronic journal for the rapid publication of novel and topical research results, publishing compact articles of the highest quality in all areas of statistical endeavour. Its purpose is to provide a means of rapid sharing of important new theoretical, methodological and applied research. Stat is a joint venture between the International Statistical Institute and Wiley-Blackwell.
Stat is characterised by:
• Speed - a high-quality review process that aims to reach a decision within 20 days of submission.
• Concision - a maximum article length of 10 pages of text, not including references.
• Supporting materials - inclusion of electronic supporting materials including graphs, video, software, data and images.
• Scope - addresses all areas of statistics and interdisciplinary areas.
Stat is a scientific journal for the international community of statisticians and researchers and practitioners in allied quantitative disciplines.