Breaking Boundaries Between Linguistics and Artificial Intelligence

IF 3.4 | Zone 3 (Management) | JCR Q2: COMPUTER SCIENCE, INFORMATION SYSTEMS

Journal of Organizational and End User Computing | Pub Date: 2023-11-21 | DOI: 10.4018/joeuc.334013

Jinhai Wang, Yi Tie, Xia Jiang, Yilin Xu

Citations: 0

Abstract

Linguistics and artificial intelligence (AI) are closely connected, and multimodal language matching is one area where they meet. Multimodal robots can process several sensory modalities, including vision, hearing, language, and touch, which opens up applications across many domains. Despite significant advances in perception and interaction, vision-language matching remains challenging for multimodal robots. Existing methods often fail to match accurately on complex multimodal data, which can lead to misinterpretation or incomplete understanding of information. The heterogeneity among sensory modalities further complicates the matching process. To address these challenges, we propose vision-language matching with semantically aligned embeddings (VLMS), an approach aimed at improving the vision-language matching performance of multimodal robots.

Source journal

Journal of Organizational and End User Computing (COMPUTER SCIENCE, INFORMATION SYSTEMS)

CiteScore

6.00

Self-citation rate

9.20%

Articles published

About the journal: The Journal of Organizational and End User Computing (JOEUC) provides a forum for information technology educators, researchers, and practitioners to advance the practice and understanding of organizational and end user computing. The journal emphasizes how to increase organizational and end user productivity and performance, and how to achieve organizational strategic and competitive advantage. JOEUC publishes full-length research manuscripts, insightful research and practice notes, and case studies from all areas of organizational and end user computing, selected after rigorous blind review by experts in the field.