On the Opportunities and Challenges of Foundation Models for GeoAI (Vision Paper)

Impact factor: 1.2; JCR quartile: Q4 (Remote Sensing)

ACM Transactions on Spatial Algorithms and Systems; published 2024-03-20; DOI: 10.1145/3653070

Gengchen Mai, Weiming Huang, Jin Sun, Suhang Song, Deepak Mishra, Ninghao Liu, Song Gao, Tianming Liu, Gao Cong, Yingjie Hu, Chris Cundy, Ziyuan Li, Rui Zhu, Ni Lao


Citations: 0

Abstract


Large pre-trained models, also known as foundation models (FMs), are trained in a task-agnostic manner on large-scale data and can be adapted to a wide range of downstream tasks by fine-tuning, few-shot, or even zero-shot learning. Despite their successes in language and vision tasks, we have yet to see an attempt to develop foundation models for geospatial artificial intelligence (GeoAI). In this work, we explore the promises and challenges of developing multimodal foundation models for GeoAI. We first investigate the potential of many existing FMs by testing their performance on seven tasks across multiple geospatial domains, including Geospatial Semantics, Health Geography, Urban Geography, and Remote Sensing. Our results indicate that on several geospatial tasks that involve only the text modality, such as toponym recognition, location description recognition, and US state-level/county-level dementia time series forecasting, task-agnostic LLMs can outperform task-specific fully-supervised models in a zero-shot or few-shot learning setting. However, on other geospatial tasks, especially tasks that involve multiple data modalities (e.g., POI-based urban function classification, street view image-based urban noise intensity classification, and remote sensing image scene classification), existing foundation models still underperform task-specific models. Based on these observations, we propose that one of the major challenges of developing a foundation model for GeoAI is to address the multimodal nature of geospatial tasks. After discussing the distinct challenges of each geospatial data modality, we suggest the possibility of a multimodal foundation model that can reason over various types of geospatial data through geospatial alignments. We conclude this paper by discussing the unique risks and challenges of developing such a model for GeoAI.
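To make the zero-shot and few-shot settings mentioned in the abstract more concrete, here is a minimal, hypothetical Python sketch of how a few-shot prompt for toponym recognition could be assembled and how a model's reply could be parsed. The function names `build_toponym_prompt` and `parse_toponyms`, the instruction wording, the semicolon-separated answer format, and the two demonstration sentences are all illustrative assumptions, not the prompts or data used in the paper. The call to an actual LLM is deliberately left out.

```python
from typing import List, Sequence, Tuple

# Hypothetical demonstrations: (sentence, toponyms mentioned in it).
# Illustrative only; NOT taken from the paper's evaluation data.
DEMONSTRATIONS: List[Tuple[str, List[str]]] = [
    ("Flooding was reported along the Mississippi River near Baton Rouge.",
     ["Mississippi River", "Baton Rouge"]),
    ("The conference moved from Paris to Lyon this year.",
     ["Paris", "Lyon"]),
]


def build_toponym_prompt(
    sentence: str,
    demos: Sequence[Tuple[str, List[str]]] = DEMONSTRATIONS,
) -> str:
    """Assemble a few-shot prompt; passing demos=[] yields a zero-shot prompt."""
    lines = ["Extract all place names (toponyms) from the sentence. "
             "Answer with a semicolon-separated list, or 'None'."]
    # Each demonstration shows the model one input/output pair.
    for text, places in demos:
        lines.append(f"Sentence: {text}")
        lines.append(f"Toponyms: {'; '.join(places) if places else 'None'}")
    # The query sentence, with the answer slot left open for the model.
    lines.append(f"Sentence: {sentence}")
    lines.append("Toponyms:")
    return "\n".join(lines)


def parse_toponyms(completion: str) -> List[str]:
    """Turn a completion such as 'Houston; Texas' into a list of names."""
    stripped = completion.strip()
    # Only the first line is treated as the answer; anything after is ignored.
    first_line = stripped.splitlines()[0] if stripped else ""
    if first_line in ("", "None"):
        return []
    return [name.strip() for name in first_line.split(";") if name.strip()]


if __name__ == "__main__":
    prompt = build_toponym_prompt("Hurricane Harvey hit Houston, Texas in 2017.")
    print(prompt.splitlines()[-1])               # Toponyms:
    print(parse_toponyms(" Houston; Texas\n"))   # ['Houston', 'Texas']
```

In this sketch the only difference between the two settings is whether labeled demonstrations are included in the prompt; no model weights are updated in either case, which is what distinguishes both from the fully-supervised, task-specific baselines.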


Source journal

ACM Transactions on Spatial Algorithms and Systems (JCR category: Remote Sensing)

CiteScore: 4.40

Self-citation rate: 5.30%

Journal description: ACM Transactions on Spatial Algorithms and Systems (TSAS) is a scholarly journal that publishes the highest quality papers on all aspects of spatial algorithms and systems and closely related disciplines. It takes a multidisciplinary perspective, spanning the many areas where spatial data is manipulated or visualized (regardless of whether it is specified geometrically or textually), such as geography, geographic information systems (GIS), geospatial and spatiotemporal databases, spatial and metric indexing, location-based services, web-based spatial applications, geographic information retrieval (GIR), spatial reasoning and mining, and security and privacy, as well as the related visual computing areas of computer graphics, computer vision, geometric modeling, and visualization in which spatial, geospatial, and spatiotemporal data is central.