Correctness Comparison of ChatGPT-4, Gemini, Claude-3, and Copilot for Spatial Tasks

IF 2.1 | Zone 3, Earth Sciences | JCR Q2, GEOGRAPHY
Hartwig H. Hochmair, Levente Juhász, Takoda Kemp
Transactions in GIS, published 2024-08-13. DOI: 10.1111/tgis.13233 (https://doi.org/10.1111/tgis.13233)
Citations: 0

Abstract

Generative AI including large language models (LLMs) has recently gained significant interest in the geoscience community through its versatile task‐solving capabilities including programming, arithmetic reasoning, generation of sample data, time‐series forecasting, toponym recognition, or image classification. Existing performance assessments of LLMs for spatial tasks have primarily focused on ChatGPT, whereas other chatbots received less attention. To narrow this research gap, this study conducts a zero‐shot correctness evaluation for a set of 76 spatial tasks across seven task categories assigned to four prominent chatbots, that is, ChatGPT‐4, Gemini, Claude‐3, and Copilot. The chatbots generally performed well on tasks related to spatial literacy, GIS theory, and interpretation of programming code and functions, but revealed weaknesses in mapping, code writing, and spatial reasoning. Furthermore, there was a significant difference in the correctness of results between the four chatbots. Responses from repeated tasks assigned to each chatbot showed a high level of consistency in responses with matching rates of over 80% for most task categories in the four chatbots.
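The abstract reports consistency as a matching rate between repeated responses, broken down by task category. It does not define how a match is scored, so the Python sketch below assumes that a match means two repeated responses to the same task received the same correctness grade, and computes each category's rate as matched tasks divided by repeated tasks. The function name `matching_rates`, the grade labels, and the four toy tasks are hypothetical illustrations, not code or data from the study.

```python
from collections import defaultdict


def matching_rates(run1, run2):
    """Per-category share of tasks whose two repeated responses got the same grade.

    Hypothetical scheme: run1 and run2 map task_id -> (category, grade), where
    grade is any comparable label such as "correct" / "incorrect". Both runs are
    assumed to cover the same task IDs (a missing ID raises KeyError).
    """
    matched = defaultdict(int)  # category -> number of tasks with identical grades
    total = defaultdict(int)    # category -> number of repeated tasks
    for task_id, (category, grade1) in run1.items():
        _, grade2 = run2[task_id]
        total[category] += 1
        if grade1 == grade2:
            matched[category] += 1
    return {category: matched[category] / total[category] for category in total}


# Invented toy grades for two repeated runs of the same chatbot (not study data).
run_a = {
    "t1": ("spatial literacy", "correct"),
    "t2": ("spatial literacy", "correct"),
    "t3": ("mapping", "incorrect"),
    "t4": ("mapping", "correct"),
}
run_b = {
    "t1": ("spatial literacy", "correct"),
    "t2": ("spatial literacy", "correct"),
    "t3": ("mapping", "correct"),
    "t4": ("mapping", "correct"),
}

rates = matching_rates(run_a, run_b)
print(rates)  # {'spatial literacy': 1.0, 'mapping': 0.5}
print([c for c, r in rates.items() if r > 0.8])  # ['spatial literacy']
```

With these toy inputs, "spatial literacy" matches on both tasks (1.0) and "mapping" on one of two (0.5), so only the former would clear an 80% threshold like the one cited in the abstract.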
Source journal: Transactions in GIS
Category: GEOGRAPHY
CiteScore: 4.60
Self-citation rate: 8.30%
Articles published: 116
About the journal: Transactions in GIS is an international journal which provides a forum for high-quality, original research articles, review articles, short notes and book reviews that focus on:
- practical and theoretical issues influencing the development of GIS
- the collection, analysis, modelling, interpretation and display of spatial data within GIS
- the connections between GIS and related technologies
- new GIS applications which help to solve problems affecting the natural or built environments, or business