A Novel Multimodal Generative Learning Model Based on Basic Fuzzy Concepts

IF 4.3 · Zone 3 (Computer Science) · JCR Q2 · COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Huankun Sheng, Hongwei Mo, Tengteng Zhang
DOI: 10.1007/s12559-024-10336-7 (https://doi.org/10.1007/s12559-024-10336-7)
Journal: Cognitive Computation
Publication date: 2024-08-30
Publication type: Journal Article
Citations: 0

Abstract


A Novel Multimodal Generative Learning Model based on Basic Fuzzy Concepts

Multimodal models are designed to process different types of data within a single generative framework. Previous methods have mostly learned joint representations shared across modalities, typically obtained by concatenating the top layers of modality-specific networks. Recent work has made significant advances in generating images from text and text from images. Despite these successes, current models often overlook fuzzy concepts, which matter because human cognitive processes inherently involve a high degree of fuzziness. Recognizing and incorporating fuzzy concepts is therefore essential for improving multimodal cognition models. This paper proposes a novel framework, the Fuzzy Concept Learning Model (FCLM), which processes modalities on the basis of fuzzy concepts. In the FCLM, the high-level abstractions shared between modalities are represented by "fuzzy concept functions". After training, the FCLM can generate images from attribute descriptions and infer the attributes of input images. It can also form fuzzy concepts at various levels of abstraction. Extensive experiments were conducted on the dSprites and 3D Chairs datasets; both the qualitative and quantitative results demonstrate the effectiveness and efficiency of the proposed framework. The FCLM integrates the fuzzy cognitive mechanism with the statistical characteristics of the environment, and this cognition-inspired framework offers a new perspective on processing multimodal information.
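Two ideas in the abstract can be illustrated with a minimal Python sketch: the conventional joint representation built by concatenating top-layer features, and the general notion of a fuzzy concept as a graded membership value. This is not the FCLM's implementation. The abstract does not define the "fuzzy concept functions", so the triangular membership function below is a generic textbook stand-in, and every name and value (`joint_representation`, `triangular_membership`, the feature vectors, the "large" concept over a normalized scale attribute) is hypothetical.

```python
from typing import List, Sequence


def joint_representation(image_feats: Sequence[float],
                         attr_feats: Sequence[float]) -> List[float]:
    """Concatenate the top-layer features of two modality-specific networks.

    This is the conventional strategy described in the abstract,
    not the FCLM's own mechanism.
    """
    return list(image_feats) + list(attr_feats)


def triangular_membership(x: float, low: float, peak: float, high: float) -> float:
    """Return the degree in [0, 1] to which x belongs to a fuzzy set.

    Generic triangular membership function (an assumption for illustration;
    the paper's fuzzy concept functions may take a different form).
    """
    if not low < peak < high:
        raise ValueError("expected low < peak < high")
    if x <= low or x >= high:
        return 0.0
    if x <= peak:
        return (x - low) / (peak - low)   # rising edge
    return (high - x) / (high - peak)     # falling edge


# Hypothetical top-layer outputs of an image encoder and an attribute encoder.
image_top = [0.2, 0.7]
attr_top = [1.0, 0.0, 0.5]
joint = joint_representation(image_top, attr_top)

# Hypothetical fuzzy concept "large" over a normalized scale attribute.
degree_large = triangular_membership(0.75, low=0.5, peak=1.0, high=1.5)
```

The contrast is the point of the sketch: concatenation yields one fixed vector per input pair, whereas a membership function assigns a graded degree of belonging, which is the kind of fuzziness the abstract argues current models overlook.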

Source journal
Cognitive Computation (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; NEUROSCIENCES)
CiteScore: 9.30
Self-citation rate: 3.70%
Articles published: 116
Review time: >12 weeks
Journal description: Cognitive Computation is an international, peer-reviewed, interdisciplinary journal that publishes cutting-edge articles describing original basic and applied work involving biologically-inspired computational accounts of all aspects of natural and artificial cognitive systems. It provides a new platform for the dissemination of research, current practices and future trends in the emerging discipline of cognitive computation that bridges the gap between life sciences, social sciences, engineering, physical and mathematical sciences, and humanities.