Yanjia Cao, Jiue-An Yang, Atsushi Nara, Marta M. Jankowska
Journal of Urban Health. Published 2024-01-02. DOI: 10.1007/s11524-023-00817-9 (https://doi.org/10.1007/s11524-023-00817-9)
Designing and Evaluating a Hierarchical Framework for Matching Food Outlets across Multi-sourced Geospatial Datasets: a Case Study of San Diego County
Research on the retail food environment (RFE) relies on data availability and accuracy. However, discrepancies among RFE datasets may lead to imprecision when measuring associations with health outcomes. In this research, we present a two-tier hierarchical point of interest (POI) matching framework to compare and triangulate food outlets across multiple geospatial data sources. Two matching parameters were used: the geodesic distance between businesses and the similarity of business names according to Levenshtein distance (LD) and Double Metaphone (DM). Sensitivity analysis was conducted to determine the thresholds of the matching parameters. Tier 1 matching used more restrictive parameters to generate high-confidence matched POIs, whereas in Tier 2 we opted for relaxed matching parameters and applied a weighted multi-attribute model to the previously unmatched records. Our case study in San Diego County, California, used government, commercial, and crowdsourced data and returned 20.2% matched records from Tier 1 and 18.6% from Tier 2. Manual validation shows a 100% matching rate for Tier 1 and up to 30.6% for Tier 2. Matched and unmatched records from Tier 1 were further analyzed for spatial patterns and categorical differences. Our hierarchical POI matching framework generated high-confidence food POIs by conflating datasets and identified some food POIs that are unique to specific data sources. Triangulating RFE data can reduce uncertain and invalid POI listings when representing the food environment using multiple data sources. Studies investigating associations between the food environment and health outcomes may benefit from improved RFE data quality.
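The two-tier idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The distance and similarity thresholds, the weights, the record fields (`name`, `lat`, `lon`), and the function names are all hypothetical placeholders; the abstract does not report the values chosen by the sensitivity analysis. The haversine formula stands in for a true geodesic distance, and Double Metaphone is left out because it is not available in the Python standard library.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres (spherical stand-in for geodesic distance)."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute cost 1 each)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

def name_similarity(a, b):
    """Levenshtein distance rescaled to [0, 1]; 1.0 means identical names."""
    a, b = a.lower().strip(), b.lower().strip()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest

def match_tier(rec_a, rec_b,
               t1_dist=50.0, t1_sim=0.9,     # hypothetical strict Tier 1 thresholds
               t2_dist=200.0, t2_score=0.6,  # hypothetical relaxed Tier 2 thresholds
               w_name=0.7, w_dist=0.3):      # hypothetical Tier 2 weights
    """Return 1 or 2 for the tier at which two records match, else None."""
    dist = haversine_m(rec_a["lat"], rec_a["lon"], rec_b["lat"], rec_b["lon"])
    sim = name_similarity(rec_a["name"], rec_b["name"])
    # Tier 1: both parameters must pass the strict thresholds.
    if dist <= t1_dist and sim >= t1_sim:
        return 1
    # Tier 2: relaxed distance cutoff plus a weighted score over both attributes.
    if dist <= t2_dist:
        closeness = 1.0 - dist / t2_dist
        if w_name * sim + w_dist * closeness >= t2_score:
            return 2
    return None
```

Under these assumed settings, two records with the same name at the same coordinates match at Tier 1, while a pair such as "Joe's Market" and "Joes Mkt" located roughly 55 m apart fails the strict name threshold and is picked up by the weighted Tier 2 score instead.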