A National Synthetic Populations Dataset for the United States.

IF 6.9 2区 综合性期刊 Q1 MULTIDISCIPLINARY SCIENCES
James Rineer, Nicholas Kruskamp, Caroline Kery, Kasey Jones, Rainer Hilscher, Georgiy Bobashev
{"title":"A National Synthetic Populations Dataset for the United States.","authors":"James Rineer, Nicholas Kruskamp, Caroline Kery, Kasey Jones, Rainer Hilscher, Georgiy Bobashev","doi":"10.1038/s41597-025-04380-7","DOIUrl":null,"url":null,"abstract":"<p><p>Geospatially explicit and statistically accurate person and household data allow researchers to study community-and neighborhood-level effects and design and test hypotheses that would otherwise not be possible without the generation of synthetic data. In this article, we demonstrate the workflow for generating spatially explicit household- and individual-level synthetic populations for the United States representing the year 2019. We use publicly available U.S. Census American Community Survey (ACS) 5-year estimates from the 2015-2019 ACS. We use Iterative Proportional Fitting (IPF) to create our synthetic population and use the resulting joint counts to sample representative households and people directly from microdata. Our dataset contains records for 120,754,708 households and 303,128,287 individuals across the United States. We spatially allocate households using the Environmental Protection Agency (EPA) Integrated Climate and Land Use Scenarios (ICLUS) project household distribution estimates to create a spatially explicit dataset. Our validation shows strong correlation with original census variables, with many categories reporting a greater than 0.99 Pearson's r correlation coefficient.</p>","PeriodicalId":21597,"journal":{"name":"Scientific Data","volume":"12 1","pages":"144"},"PeriodicalIF":6.9000,"publicationDate":"2025-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11762717/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Scientific Data","FirstCategoryId":"103","ListUrlMain":"https://doi.org/10.1038/s41597-025-04380-7","RegionNum":2,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MULTIDISCIPLINARY SCIENCES","Score":null,"Total":0}
引用次数: 0

Abstract

Geospatially explicit and statistically accurate person and household data allow researchers to study community-and neighborhood-level effects and design and test hypotheses that would otherwise not be possible without the generation of synthetic data. In this article, we demonstrate the workflow for generating spatially explicit household- and individual-level synthetic populations for the United States representing the year 2019. We use publicly available U.S. Census American Community Survey (ACS) 5-year estimates from the 2015-2019 ACS. We use Iterative Proportional Fitting (IPF) to create our synthetic population and use the resulting joint counts to sample representative households and people directly from microdata. Our dataset contains records for 120,754,708 households and 303,128,287 individuals across the United States. We spatially allocate households using the Environmental Protection Agency (EPA) Integrated Climate and Land Use Scenarios (ICLUS) project household distribution estimates to create a spatially explicit dataset. Our validation shows strong correlation with original census variables, with many categories reporting a greater than 0.99 Pearson's r correlation coefficient.

Abstract Image

Abstract Image

Abstract Image

美国国家综合人口数据集。
地理空间上的明确和统计上准确的个人和家庭数据使研究人员能够研究社区和邻里水平的影响,设计和检验假设,否则没有合成数据的产生是不可能的。在本文中,我们演示了用于生成代表2019年的美国空间明确的家庭和个人层面合成人口的工作流程。我们使用公开的美国人口普查美国社区调查(ACS) 2015-2019年ACS的5年估计数。我们使用迭代比例拟合(IPF)来创建我们的合成人口,并使用由此产生的联合计数直接从微数据中对代表性家庭和人员进行抽样。我们的数据集包含美国120,754,708个家庭和303,128,287个个人的记录。我们使用环境保护局(EPA)综合气候和土地利用情景(ICLUS)项目家庭分布估算来创建空间明确的数据集。我们的验证显示与原始人口普查变量有很强的相关性,许多类别报告的Pearson’s r相关系数大于0.99。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Scientific Data
Scientific Data Social Sciences-Education
CiteScore
11.20
自引率
4.10%
发文量
689
审稿时长
16 weeks
期刊介绍: Scientific Data is an open-access journal focused on data, publishing descriptions of research datasets and articles on data sharing across natural sciences, medicine, engineering, and social sciences. Its goal is to enhance the sharing and reuse of scientific data, encourage broader data sharing, and acknowledge those who share their data. The journal primarily publishes Data Descriptors, which offer detailed descriptions of research datasets, including data collection methods and technical analyses validating data quality. These descriptors aim to facilitate data reuse rather than testing hypotheses or presenting new interpretations, methods, or in-depth analyses.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信