Multimodal multimedia information retrieval through the integration of fuzzy clustering, OWA-based fusion, and Siamese neural networks
Authors: Saeid Sattari, Sinan Kalkan, Adnan Yazici
Journal: Fuzzy Sets and Systems, volume 515, Article 109419 (published 2025-04-18; impact factor 3.2; JCR Q2, Computer Science, Theory & Methods)
DOI: 10.1016/j.fss.2025.109419
URL: https://www.sciencedirect.com/science/article/pii/S0165011425001587
This paper presents an end-to-end, scalable, and flexible framework for multimodal multimedia information retrieval (MMIR). The framework is designed to handle the multiple data modalities, such as visual, audio, and text, that are frequently encountered in real-world applications. By integrating these data types, the framework supports a more holistic understanding of information, thus improving the accuracy and reliability of retrieval tasks. One of its strengths is its ability to learn semantic relationships within and between modalities through deep neural networks trained on query-hit pairs generated from query logs. A major innovation of the approach lies in the efficient handling of multimodal data uncertainty through an improved fuzzy clustering technique. Additionally, the search process is refined with triplet-loss Siamese networks for reranking and with a novel fusion approach that uses the ordered weighted average (OWA) operator to combine the ranks produced by different retrieval systems. The framework leverages parallel processing and transfer learning for efficient feature extraction across modalities, improving scalability and adaptability. Performance was evaluated on six widely recognized multimodal datasets. The results indicate that the integrated approach, which combines clustering ranking, a triplet-loss Siamese network for reranking, OWA-based fusion, and the alternative adaptive fuzzy c-means method (AAFCM) for soft clustering, consistently outperforms all previous configurations reported in the literature. The experimental results, supported by extensive statistical analysis, confirm the effectiveness and robustness of this approach in MMIR.
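To make the OWA-based rank fusion mentioned in the abstract more concrete, the following is a minimal sketch rather than the authors' implementation. It uses the standard OWA definition (weights are applied to the values after sorting them in descending order). The reciprocal-rank score mapping, the example weights, and the function names `owa` and `fuse_rankings` are assumptions made purely for illustration; the abstract does not specify any of them.

```python
from typing import Dict, List, Sequence


def owa(values: Sequence[float], weights: Sequence[float]) -> float:
    """Ordered weighted average: weights apply to values sorted in descending order."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))


def fuse_rankings(rankings: List[List[str]], weights: Sequence[float]) -> List[str]:
    """Fuse ranked lists from several retrieval systems with OWA.

    Assumption (not from the paper): a 1-based rank r is mapped to the
    score 1/r, and an item missing from a list gets the score 0.
    """
    items = sorted({doc for ranking in rankings for doc in ranking})
    fused: Dict[str, float] = {}
    for doc in items:
        scores = [
            1.0 / (ranking.index(doc) + 1) if doc in ranking else 0.0
            for ranking in rankings
        ]
        fused[doc] = owa(scores, weights)
    # Sort by fused score, highest first; ties are broken alphabetically.
    return sorted(items, key=lambda d: (-fused[d], d))


if __name__ == "__main__":
    # Hypothetical ranked lists from three single-modality retrieval systems.
    visual = ["d1", "d2", "d3"]
    audio = ["d2", "d1", "d4"]
    text = ["d2", "d3", "d1"]
    print(fuse_rankings([visual, audio, text], weights=[0.5, 0.3, 0.2]))
```

The weight vector controls how optimistic the fusion is: `[1, 0, 0]` keeps only each item's best score across systems, `[0, 0, 1]` keeps only its worst, and uniform weights reduce to the arithmetic mean.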
About the journal:
Since its launch in 1978, the journal Fuzzy Sets and Systems has been devoted to the international advancement of the theory and application of fuzzy sets and systems. The theory of fuzzy sets now encompasses a well-organized corpus of basic notions including (but not restricted to) aggregation operations, a generalized theory of relations, specific measures of information content, and a calculus of fuzzy numbers. Fuzzy sets are also the cornerstone of a non-additive uncertainty theory, namely possibility theory, and of a versatile tool for both linguistic and numerical modeling: fuzzy rule-based systems. Numerous works now combine fuzzy concepts with other scientific disciplines as well as modern technologies.
In mathematics, fuzzy sets have triggered new research topics in connection with category theory, topology, algebra, and analysis. Fuzzy sets are also part of a recent trend in the study of generalized measures and integrals, and are combined with statistical methods. Furthermore, fuzzy sets have strong logical underpinnings in the tradition of many-valued logics.