Your Prompt is My Command: On Assessing the Human-Centred Generality of Multimodal Models

IF 4.5 · CAS Zone 3 (Computer Science) · JCR Q2 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

Journal of Artificial Intelligence Research Pub Date : 2023-06-12 DOI:10.1613/jair.1.14157

Wout Schellaert, Fernando Martínez-Plumed, Karina Vold, John Burden, Pablo Antonio Moreno Casares, B. S. Loe, Roi Reichart, Seán Ó hÉigeartaigh, A. Korhonen, J. Hernández-Orallo

Pages: 377-394


Abstract

Even with obvious deficiencies, large prompt-commanded multimodal models are proving to be flexible cognitive tools representing an unprecedented generality. But the directness, diversity, and degree of user interaction create a distinctive “human-centred generality” (HCG), rather than a fully autonomous one. HCG implies that, for a specific user, a system is only as general as it is effective for the user’s relevant tasks and their prevalent ways of prompting. A human-centred evaluation of general-purpose AI systems therefore needs to reflect the personal nature of interaction, tasks and cognition. We argue that the best way to understand these systems is as highly-coupled cognitive extenders, and to analyse the bidirectional cognitive adaptations between them and humans. In this paper, we give a formulation of HCG, as well as a high-level overview of the elements and trade-offs involved in the prompting process. We end the paper by outlining some essential research questions and suggestions for improving evaluation practices, which we envision as characteristic of the evaluation of general artificial intelligence in the future. This paper appears in the AI & Society track.


Source journal

Journal of Artificial Intelligence Research (Engineering & Technology: Computer Science, Artificial Intelligence)

CiteScore: 9.60

Self-citation rate: 4.00%

Review time: 4 months

About the journal: JAIR (ISSN 1076-9757) covers all areas of artificial intelligence (AI), publishing refereed research articles, survey articles, and technical notes. Established in 1993 as one of the first electronic scientific journals, JAIR is indexed by INSPEC, Science Citation Index, and MathSciNet. JAIR reviews papers within approximately three months of submission and publishes accepted articles on the internet immediately upon receiving the final versions. JAIR articles are published for free distribution on the internet by the AI Access Foundation, and for purchase in bound volumes by AAAI Press.