An Image-Text Matching Method for Multi-Modal Robots

IF 3.4, Zone 3 (Management), Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS

Journal of Organizational and End User Computing, Pub Date: 2023-12-08, DOI: 10.4018/joeuc.334701

Ke Zheng, Zhou Li


Citations: 0

Abstract


An Image-Text Matching Method for Multi-Modal Robots

With the rapid development of artificial intelligence and deep learning, image-text matching has become an important research topic in cross-modal research. Matching images and text correctly requires a strong understanding of the correspondence between visual and textual information. In recent years, deep learning-based image-text matching methods have achieved significant success. However, the task requires both a deep understanding of intra-modal information and fine-grained alignment between image regions and textual words, and integrating these two aspects into a single model remains a challenge. Reducing the internal complexity of the model and effectively constructing and using prior knowledge are also worth exploring, as a way to address the excessive computational complexity of existing fine-grained matching methods and their lack of multi-perspective matching.
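The abstract does not describe the authors' model, so the following is only a generic sketch of what "fine-grained alignment between image regions and textual words" can look like in practice. It assumes region and word features already live in a shared embedding space, and it uses a common max-over-regions, mean-over-words aggregation chosen here for illustration; the function names and toy vectors are hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def fine_grained_score(regions, words):
    """Max-over-regions, mean-over-words matching score.

    regions: list of image-region feature vectors (hypothetical inputs)
    words:   list of word feature vectors in the same embedding space
    Each image-text pair costs len(regions) * len(words) similarity
    evaluations, which is why fine-grained matching gets expensive.
    """
    if not regions or not words:
        raise ValueError("need at least one region and one word")
    # For every word, keep the similarity of its best-matching region.
    best_per_word = [max(cosine(r, w) for r in regions) for w in words]
    return sum(best_per_word) / len(best_per_word)


# Toy 2-D features, invented for illustration only.
regions = [[1.0, 0.0], [0.0, 1.0]]
matching_words = [[2.0, 0.0], [0.0, 3.0]]
mismatched_words = [[-1.0, 0.0], [0.0, -1.0]]

score_match = fine_grained_score(regions, matching_words)
score_mismatch = fine_grained_score(regions, mismatched_words)
print(score_match, score_mismatch)
```

In this toy setup the caption whose words each point toward some region scores higher than the caption whose words do not. Because the number of region-word comparisons is the product of the two list lengths, the sketch also makes concrete the computational cost that the abstract attributes to existing fine-grained methods.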


Source journal

Journal of Organizational and End User Computing (COMPUTER SCIENCE, INFORMATION SYSTEMS)

CiteScore: 6.00

Self-citation rate: 9.20%

Publication volume

Journal description: The Journal of Organizational and End User Computing (JOEUC) provides a forum to information technology educators, researchers, and practitioners to advance the practice and understanding of organizational and end user computing. The journal features a major emphasis on how to increase organizational and end user productivity and performance, and how to achieve organizational strategic and competitive advantage. JOEUC publishes full-length research manuscripts, insightful research and practice notes, and case studies from all areas of organizational and end user computing that are selected after a rigorous blind review by experts in the field.