Towards robust visual understanding: A paradigm shift in computer vision from recognition to reasoning

Impact factor: 3.2; Zone 4 (Computer Science); JCR Q3 (Computer Science, Artificial Intelligence)

AI Magazine. Pub Date: 2024-09-22. DOI: 10.1002/aaai.12194

Tejas Gokhale

AI Magazine, volume 45, issue 3, pages 429-435. Article page: https://onlinelibrary.wiley.com/doi/10.1002/aaai.12194

Citations: 0

Abstract


Models that learn from data are widely and rapidly being deployed today for real-world use, but they suffer from unforeseen failures that limit their reliability. These failures often have several causes such as distribution shift; adversarial attacks; calibration errors; scarcity of data and/or ground-truth labels; noisy, corrupted, or partial data; and limitations of evaluation metrics. But many failures also occur because many modern AI tasks require reasoning beyond pattern matching and such reasoning abilities are difficult to formulate as data-based input–output function fitting. The reliability problem has become increasingly important under the new paradigm of semantic “multimodal” learning. In this article, I will discuss findings from our work to provide avenues for the development of robust and reliable computer vision systems, particularly by leveraging the interactions between vision and language. This article expands upon the invited talk at AAAI 2024 and covers three thematic areas: robustness of visual recognition systems, open-domain reliability for visual reasoning, and challenges and opportunities associated with generative models in vision.


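Source journal

AI Magazine (Engineering and Technology: Computer Science, Artificial Intelligence)

CiteScore: 3.90
Self-citation rate: 11.10%
Articles published: not listed
Review time: >12 weeks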

About the journal: AI Magazine publishes original articles that are reasonably self-contained and aimed at a broad spectrum of the AI community. Technical content should be kept to a minimum. In general, the magazine does not publish articles that have been published elsewhere in whole or in part. The magazine welcomes articles on the theory and practice of AI, as well as general survey articles, tutorial articles on timely topics, reports on conferences, symposia, or workshops, and timely columns on topics of interest to AI scientists.