Henrique F. de Arruda, Sandro M. Reia, Shiyang Ruan, Kuldip S. Atwal, Hamdi Kavak, Taylor Anderson, Dieter Pfoser
arXiv - CS - Social and Information Networks, 2024-09-09
https://doi.org/arxiv-2409.05692
Extracting the U.S. building types from OpenStreetMap data
Building type information is crucial for population estimation, traffic
planning, urban planning, and emergency response applications. Although
essential, such data is often not readily available. To alleviate this problem,
this work creates a comprehensive dataset that classifies buildings across the
entire United States as residential or non-residential. We propose and apply an
unsupervised machine learning method to
classify building types based on building footprints and available
OpenStreetMap information. The classification result is validated using
authoritative ground truth data for select counties in the U.S. The validation
shows a high precision for non-residential building classification and a high
recall for residential buildings. We identified various approaches to improving
the quality of the classification, such as removing sheds and garages from the
dataset. Furthermore, an analysis of the misclassifications revealed that they
are mainly due to missing or sparse metadata in OSM. A major result of this work
is the resulting dataset, which classifies 67,705,475 buildings. We hope that
this data is of value to the scientific community, including urban and
transportation planners.
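
The validation result above is stated in terms of per-class precision and recall. As a minimal sketch of how those two metrics are computed for a residential/non-residential classification, the example below uses a small set of made-up labels. The label names "res" and "nonres", the helper precision_recall, and the toy data are assumptions for illustration only; they are not the authors' code or data.

```python
from collections import Counter


def precision_recall(y_true, y_pred, positive):
    """Return (precision, recall), treating `positive` as the class of interest."""
    pairs = Counter(zip(y_true, y_pred))
    tp = pairs[(positive, positive)]
    # Predicted as `positive` but actually another class.
    fp = sum(n for (t, p), n in pairs.items() if p == positive and t != positive)
    # Actually `positive` but predicted as another class.
    fn = sum(n for (t, p), n in pairs.items() if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy labels, not data from the paper.
truth = ["res", "res", "res", "res", "nonres", "nonres", "nonres", "nonres"]
predicted = ["res", "res", "res", "res", "nonres", "nonres", "res", "res"]

nonres_p, nonres_r = precision_recall(truth, predicted, "nonres")
res_p, res_r = precision_recall(truth, predicted, "res")
print(f"non-residential: precision={nonres_p:.2f}, recall={nonres_r:.2f}")
print(f"residential:     precision={res_p:.2f}, recall={res_r:.2f}")
```

In this toy case every building predicted as non-residential really is non-residential (precision 1.00), and every residential building is found (recall 1.00), while half of the non-residential buildings are labeled residential. This shows how high non-residential precision and high residential recall can occur together in a two-class setting.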