Uncovering Representation Bias in Large-scale Cellular Phone-based Data: A Case Study in North Carolina

IF 3.3 · Zone 3 (Earth Sciences) · JCR Q1 (Geography)
Hanna V. Jardel, Paul L. Delamater
Geographical Analysis, vol. 56, no. 4, pp. 723-745. Published 2024-04-08. DOI: 10.1111/gean.12399. https://onlinelibrary.wiley.com/doi/10.1111/gean.12399
Citations: 0

Abstract

Large cellular phone-based mobility datasets are an important new data source for research on human movement. We investigate and illustrate bias in representation in a large mobility data set at the census block group, tract, and county levels. We paired American Community Survey (ACS) 2019 data with SafeGraph (SG) cell phone mobility data to elucidate potential bias in SG data by examining ACS estimated population against the number of devices in the SG data, stratifying by key sociodemographic variables such as income, percent Black population, percent of population over 55 years, percent of population 18–65 years, percent of people living in crowded living conditions, and urbanization level. We evaluated whether the bias varied over time by examining a 10-month period. This bias changes with key demographic characteristics and changes over time. Specifically, we see underrepresentation in areas that have the highest percentage of Black population at all aggregation levels. We also see underrepresentation at all levels in areas with the highest percentage of working age residents as well as areas with the lowest median incomes. Researchers should be cautious when using mobility datasets because of bias differential on key sociodemographic factors and collection time.
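The comparison described in the abstract (ACS population against SG device counts, stratified by a sociodemographic variable) can be illustrated with a minimal sketch. The abstract does not state the exact bias metric the authors used, so the ratio below (a group's share of devices divided by its share of population) is an assumption chosen for illustration, and the `Area` record, its field names, and the sample numbers are all hypothetical. A ratio below 1 would suggest underrepresentation of that group in the device data.

```python
from dataclasses import dataclass


@dataclass
class Area:
    geoid: str         # hypothetical area identifier (block group, tract, or county)
    population: int    # ACS-style population estimate
    devices: int       # device count from the mobility data
    covariate: float   # stratifying variable, e.g. median income


def stratify(areas, n_groups=4):
    """Split areas into n_groups rank-based groups of near-equal size."""
    if len(areas) < n_groups:
        raise ValueError("need at least one area per group")
    ranked = sorted(areas, key=lambda a: a.covariate)
    groups = [[] for _ in range(n_groups)]
    for rank, area in enumerate(ranked):
        groups[rank * n_groups // len(ranked)].append(area)
    return groups


def representation_ratios(areas, n_groups=4):
    """Return device share / population share for each covariate group.

    Groups are ordered from lowest to highest covariate value.
    Assumed metric, not necessarily the one used in the paper.
    """
    total_pop = sum(a.population for a in areas)
    total_dev = sum(a.devices for a in areas)
    ratios = []
    for group in stratify(areas, n_groups):
        pop_share = sum(a.population for a in group) / total_pop
        dev_share = sum(a.devices for a in group) / total_dev
        ratios.append(dev_share / pop_share)
    return ratios


if __name__ == "__main__":
    # Hypothetical numbers, not taken from the study.
    demo = [
        Area("A", 1000, 50, 20000.0),
        Area("B", 1000, 50, 30000.0),
        Area("C", 1000, 150, 60000.0),
        Area("D", 1000, 150, 90000.0),
    ]
    for i, r in enumerate(representation_ratios(demo, n_groups=2)):
        print(f"group {i}: ratio {r:.2f}")
```

Repeating the same calculation for each month of data, and at each aggregation level, would show whether the ratios shift over time or with geographic scale, which is the kind of variation the abstract reports.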

Source journal
CiteScore: 8.70
Self-citation rate: 5.60%
Articles published: 40
Journal description: First in its specialty area and one of the most frequently cited publications in geography, Geographical Analysis has, since 1969, presented significant advances in geographical theory, model building, and quantitative methods to geographers and scholars in a wide spectrum of related fields. Traditionally, mathematical and nonmathematical articulations of geographical theory, and statements and discussions of the analytic paradigm are published in the journal. Spatial data analyses and spatial econometrics and statistics are strongly represented.