Extracting the U.S. building types from OpenStreetMap data

arXiv - CS - Social and Information Networks. Pub Date: 2024-09-09. DOI: arxiv-2409.05692

Henrique F. de Arruda, Sandro M. Reia, Shiyang Ruan, Kuldip S. Atwal, Hamdi Kavak, Taylor Anderson, Dieter Pfoser

Abstract

Building type information is crucial for population estimation, traffic planning, urban planning, and emergency response applications. Although essential, such data is often not readily available. To alleviate this problem, this work creates a comprehensive dataset that classifies buildings as residential or non-residential across the entire United States. We propose and apply an unsupervised machine learning method to classify building types based on building footprints and available OpenStreetMap (OSM) information. The classification result is validated against authoritative ground truth data for select counties in the U.S. The validation shows high precision for non-residential building classification and high recall for residential buildings. We identified several ways to improve the quality of the classification, such as removing sheds and garages from the dataset. Furthermore, analysis of the misclassifications revealed that they are mainly due to missing and scarce metadata in OSM. A major result of this work is the resulting dataset, which classifies 67,705,475 buildings. We hope that this data is of value to the scientific community, including urban and transportation planners.
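
The validation metrics named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the record fields (`osm_building`, `predicted`, `truth`), the helper names, and the toy data are all hypothetical, and the set of OSM `building` tag values treated as sheds and garages is an assumption. It shows how per-class precision and recall are computed, and how removing auxiliary structures can change them.

```python
from collections import Counter

# Hypothetical class labels; the toy data below is NOT from the paper.
RES, NON = "residential", "non-residential"

# Assumption: OSM `building` tag values treated as sheds/garages.
AUXILIARY = {"shed", "garage", "garages"}


def drop_auxiliary(records):
    """Remove records whose OSM building tag marks a shed or garage."""
    return [r for r in records if r.get("osm_building") not in AUXILIARY]


def precision_recall(records, positive):
    """Return (precision, recall), treating `positive` as the positive class."""
    counts = Counter()
    for r in records:
        if r["predicted"] == positive and r["truth"] == positive:
            counts["tp"] += 1  # correctly predicted positive
        elif r["predicted"] == positive:
            counts["fp"] += 1  # predicted positive, actually the other class
        elif r["truth"] == positive:
            counts["fn"] += 1  # missed positive
    predicted_pos = counts["tp"] + counts["fp"]
    actual_pos = counts["tp"] + counts["fn"]
    precision = counts["tp"] / predicted_pos if predicted_pos else 0.0
    recall = counts["tp"] / actual_pos if actual_pos else 0.0
    return precision, recall


# Toy records: sheds and garages are assumed to be labelled
# non-residential in the ground truth.
buildings = [
    {"osm_building": "house", "predicted": RES, "truth": RES},
    {"osm_building": "yes", "predicted": RES, "truth": RES},
    {"osm_building": "yes", "predicted": RES, "truth": NON},
    {"osm_building": "retail", "predicted": NON, "truth": NON},
    {"osm_building": "shed", "predicted": RES, "truth": NON},
    {"osm_building": "garage", "predicted": RES, "truth": NON},
]

kept = drop_auxiliary(buildings)

for name, data in (("all buildings", buildings), ("without sheds/garages", kept)):
    for label in (RES, NON):
        p, r = precision_recall(data, label)
        print(f"{name:>22} | {label:<15} precision={p:.2f} recall={r:.2f}")
```

In this toy data, dropping sheds and garages raises non-residential recall and residential precision, because those structures were predicted residential while labelled non-residential. Whether the same effect holds on real data depends on how the ground truth labels such structures.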
