Urban planning in the age of large language models: Assessing OpenAI o1's performance and capabilities across 556 tasks

IF 8.3; Zone 1 (Earth Sciences); JCR Q1 (Environmental Studies)

Computers, Environment and Urban Systems; Pub Date: 2025-08-01; DOI: 10.1016/j.compenvurbsys.2025.102332

Xukai Zhao, He Huang, Tao Yang, Yuxing Lu, Lu Zhang, Ruoyu Wang, Zhengliang Liu, Tianyang Zhong, Tianming Liu

Volume 121, Article 102332; https://www.sciencedirect.com/science/article/pii/S0198971525000857

Citations: 0

Abstract



Integrating Large Language Models (LLMs) into urban planning presents significant opportunities to enhance efficiency and support data-driven city development strategies. Despite this potential, the specific capabilities of LLMs within the urban planning context remain underexplored, and the field lacks standardized benchmarks for systematic evaluation. This study presents the first comprehensive evaluation focused on OpenAI o1's performance and capabilities in urban planning, systematically benchmarking it against GPT-3.5 and GPT-4o using an original open-source benchmark comprising 556 tasks across five critical categories: urban planning documentation, examinations, routine data analysis, AI algorithm support, and thesis writing. Through rigorous testing and manual analysis of 170,627 words of generated output, OpenAI o1 consistently outperformed its counterparts, achieving an average performance score of 84.08% compared to 69.30% for GPT-4o and 45.27% for GPT-3.5. Our findings highlight o1's strengths in domain knowledge mastery, basic operational competence, and coding capabilities, demonstrating its potential applications in information retrieval, urban data analytics, planning decision support, educational assistance, and LLM-based agent development. However, significant limitations were identified, including an inability to handle urban design, susceptibility to fabricating information, moderate academic writing quality, difficulty with high-level professional examinations and spatial reasoning, and limited support for specialized or emerging AI algorithms. Future optimizations should prioritize enhancing multimodal integration, implementing robust validation mechanisms, adopting adaptive learning strategies, and enabling domain-specific fine-tuning to meet urban planners' specialized needs. These advancements would enable LLMs to better support the evolving demands of urban planning, allowing professionals to focus more on strategic decision-making and the creative aspects of the field.


Source journal

Computers Environment and Urban Systems Multiple-

CiteScore: 13.30

Self-citation rate: 7.40%

Articles published: 111

Review time: 32 days

About the journal: Computers, Environment and Urban Systems is an interdisciplinary journal publishing cutting-edge and innovative computer-based research on environmental and urban systems that privileges the geospatial perspective. The journal welcomes original, high-quality scholarship of a theoretical, applied, or technological nature, and provides a stimulating presentation of perspectives, research developments, overviews of important new technologies, and uses of major computational, information-based, and visualization innovations. Applied and theoretical contributions demonstrate the scope of computer-based analysis, fostering a better understanding of environmental and urban systems, their spatial scope, and their dynamics.