Expanding the Role of Synthetic Data at the U.S. Census Bureau

U.S. Census Bureau Center for Economic Studies research paper series. Pub Date: 2014-02-01. DOI: 10.2139/ssrn.2408030

Ron S. Jarmin, T. Louis, Javier Miranda


Citations: 18

Abstract

National Statistical Offices (NSOs) create official statistics from data collected from survey respondents, government administrative records and other sources. The raw source data is usually considered to be confidential. In the case of the U.S. Census Bureau, confidentiality of survey and administrative records microdata is mandated by statute, and this mandate to protect confidentiality is often at odds with the needs of users to extract as much information from the data as possible. Traditional disclosure protection techniques result in official data products that do not fully utilize the information content of the underlying microdata. Typically, these products take the form of simple aggregate tabulations. In a few cases anonymized public-use micro samples are made available, but these face a growing risk of re-identification by the increasing amounts of information about individuals and firms available in the public domain. One approach for overcoming these risks is to release products based on synthetic data where values are simulated from statistical models designed to mimic the (joint) distributions of the underlying microdata. We discuss recent Census Bureau work to develop and deploy such products. We discuss the benefits and challenges involved with extending the scope of synthetic data products in official statistics.

