International Journal of Machine Learning and Cybernetics: Latest Articles

Color attention tracking with score matching
IF 5.6, Zone 3, Computer Science
International Journal of Machine Learning and Cybernetics Pub Date : 2024-08-24 DOI: 10.1007/s13042-024-02316-y
Xuedong He, Jiehui Huang
{"title":"Color attention tracking with score matching","authors":"Xuedong He, Jiehui Huang","doi":"10.1007/s13042-024-02316-y","DOIUrl":"https://doi.org/10.1007/s13042-024-02316-y","url":null,"abstract":"<p>It is common practice to use deep networks to extract deep features from RGB images. Popular trackers typically adopt a pre-trained ResNet as the backbone to extract target features, achieving excellent performance. Staple has shown that color statistics provide complementary cues, yet the combination of color statistics and deep features in a unified deep framework has rarely been reported. We therefore employ color statistics to construct color attention maps, which are encoded into the deep network to guide the generation of target-aware feature maps. Additionally, because DCF-based trackers use an online update module to dynamically update the tracking model, collecting reliable target samples is particularly important. Hence, drawing on the idea of template matching, we design a score matching method that scores the tracked targets and has the advantage of considering the target extent. In this paper, we conduct extensive ablation analyses on the color attention module and the score matching method to verify their effectiveness. 
Furthermore, our approaches are combined into the DCF frameworks to construct two brand-new trackers, and both quantitative and qualitative results demonstrate that our trackers can perform favorably against recent and far more sophisticated trackers on multiple public benchmarks.</p>","PeriodicalId":51327,"journal":{"name":"International Journal of Machine Learning and Cybernetics","volume":null,"pages":null},"PeriodicalIF":5.6,"publicationDate":"2024-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142209179","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Dap-SiMT: divergence-based adaptive policy for simultaneous machine translation
IF 5.6, Zone 3, Computer Science
International Journal of Machine Learning and Cybernetics Pub Date : 2024-08-23 DOI: 10.1007/s13042-024-02323-z
Libo Zhao, Ziqian Zeng
{"title":"Dap-SiMT: divergence-based adaptive policy for simultaneous machine translation","authors":"Libo Zhao, Ziqian Zeng","doi":"10.1007/s13042-024-02323-z","DOIUrl":"https://doi.org/10.1007/s13042-024-02323-z","url":null,"abstract":"<p>In the realm of Simultaneous Machine Translation (SiMT), a robust read/write (R/W) policy is essential alongside a high-quality translation model. Traditional methods typically employ either a fixed wait-<i>k</i> policy in sync with a wait-<i>k</i> translation model or an adaptive policy that is co-developed with a dedicated translation model. This study introduces a more versatile approach by decoupling the adaptive policy from the translation model. Our rationale is based on the finding that an independent multi-path wait-<i>k</i> model, when combined with adaptive policies utilized in advanced SiMT systems, can perform competitively. Specifically, we present DaP, a divergence-based adaptive policy, which dynamically adjusts read/write decisions for any translation model, taking into account potential divergence in translation distributions resulting from future information. Extensive experiments across multiple benchmarks reveal that our method significantly enhances the balance between translation accuracy and latency, surpassing strong baselines.</p>","PeriodicalId":51327,"journal":{"name":"International Journal of Machine Learning and Cybernetics","volume":null,"pages":null},"PeriodicalIF":5.6,"publicationDate":"2024-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142209181","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
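For context on the fixed wait-k policy that the abstract above contrasts with adaptive policies, here is a minimal Python sketch of the generic wait-k read/write schedule: read k source tokens, then alternate one write with one read, and keep writing once the source is exhausted. This is not the DaP policy from the paper; the function name `wait_k_schedule` and its parameters are illustrative assumptions.

```python
def wait_k_schedule(num_source: int, num_target: int, k: int) -> list[str]:
    """Return the READ ("R") / WRITE ("W") actions of a fixed wait-k policy."""
    actions = []
    read = written = 0
    while written < num_target:
        # Before writing target token `written` (0-based), wait-k requires
        # k + written source tokens, capped at the source length.
        needed = min(k + written, num_source)
        if read < needed:
            actions.append("R")
            read += 1
        else:
            actions.append("W")
            written += 1
    return actions


# Example: 4 source tokens, 4 target tokens, k = 2.
print("".join(wait_k_schedule(4, 4, 2)))  # RRWRWRWW
```

A smaller k lowers latency because writing starts earlier, at the cost of translating with less source context. An adaptive policy replaces the fixed `needed` rule with a decision made at each step.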
TL-LFF Net: transfer learning based lighter, faster, and frozen network for the detection of multi-scale mixed intracranial hemorrhages through genetic optimization algorithm
IF 5.6, Zone 3, Computer Science
International Journal of Machine Learning and Cybernetics Pub Date : 2024-08-23 DOI: 10.1007/s13042-024-02324-y
Lakshmi Prasanna Kothala, Sitaramanjaneya Reddy Guntur
{"title":"TL-LFF Net: transfer learning based lighter, faster, and frozen network for the detection of multi-scale mixed intracranial hemorrhages through genetic optimization algorithm","authors":"Lakshmi Prasanna Kothala, Sitaramanjaneya Reddy Guntur","doi":"10.1007/s13042-024-02324-y","DOIUrl":"https://doi.org/10.1007/s13042-024-02324-y","url":null,"abstract":"<p>Computed tomography (CT) is the most commonly used imaging method in intracranial hemorrhage (ICH). Although deep learning (DL) models are well suited for detecting and segmenting multi-class hemorrhages, localizing multi-scale mixed hemorrhages with limited resources such as bounding boxes is difficult. To address this issue, the current study proposes a novel transfer learning-based TL-LFF Network. To detect multi-scale mixed hemorrhages, the proposed model employs a backbone module that extracts in-depth features from the input images, and a spatial pyramid pooling faster layer that performs the pooling operation at various levels. In the neck section, a path aggregated network (PANet) is used to store spatial information. Furthermore, to achieve a lightweight nature, the proposed backbone and neck modules were frozen during the backpropagation stage, resulting in a decrease in detection accuracy. To improve detection capability while remaining lightweight, a concept known as transfer learning is used. This strategy significantly improves the accuracy of the proposed model. In addition, the Genetic Algorithm (GA) concept is used to optimize the hyperparameters, where the mutation is used to develop new offspring based on previous generations. The brain hemorrhage extended dataset was used to train and validate the proposed model. In terms of detection metrics and lightweight criteria, the experimental results showed that the proposed model performed better when compared to other existing models. 
As a result, we can use the proposed model in the clinical implementation stage to reduce the radiologist's CT scan read time.</p>","PeriodicalId":51327,"journal":{"name":"International Journal of Machine Learning and Cybernetics","volume":null,"pages":null},"PeriodicalIF":5.6,"publicationDate":"2024-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142209184","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
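The abstract above mentions mutation producing new offspring during hyperparameter optimization. Below is a minimal sketch of one generic Gaussian mutation step. The hyperparameter names, bounds, mutation rate, and noise scale are hypothetical and are not taken from the paper.

```python
import random


def mutate(parent, bounds, rate, rng):
    """Return a mutated copy of `parent`, keeping every value inside its bounds."""
    child = dict(parent)
    for name, (low, high) in bounds.items():
        if rng.random() < rate:
            # Gaussian noise scaled to 10% of the allowed range (assumed scale).
            noise = rng.gauss(0.0, 0.1 * (high - low))
            child[name] = min(high, max(low, child[name] + noise))
    return child


# Hypothetical search space, for illustration only.
bounds = {"learning_rate": (1e-4, 1e-1), "momentum": (0.6, 0.98)}
parent = {"learning_rate": 1e-2, "momentum": 0.9}
child = mutate(parent, bounds, rate=0.8, rng=random.Random(0))
print(child)
```

In a full genetic algorithm this step would sit inside a loop that evaluates each child, keeps the fittest candidates, and mutates them again in the next generation.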
A hierarchical dual-view model for fake news detection guided by discriminative lexicons
IF 5.6, Zone 3, Computer Science
International Journal of Machine Learning and Cybernetics Pub Date : 2024-08-23 DOI: 10.1007/s13042-024-02322-0
Sijia Yang, Xianyong Li, Yajun Du, Dong Huang, Xiaoliang Chen, Yongquan Fan, Shumin Wang
{"title":"A hierarchical dual-view model for fake news detection guided by discriminative lexicons","authors":"Sijia Yang, Xianyong Li, Yajun Du, Dong Huang, Xiaoliang Chen, Yongquan Fan, Shumin Wang","doi":"10.1007/s13042-024-02322-0","DOIUrl":"https://doi.org/10.1007/s13042-024-02322-0","url":null,"abstract":"<p>Fake news detection aims to automatically identify the credibility of source posts, mitigating potential societal harm and conserving human resources. Textual fake news detection methods can be categorized into pattern- and fact-based. Pattern-based models focus on identifying shared writing patterns in source posts, while fact-based models leverage auxiliary external knowledge. Researchers have recently attempted to merge these two views into a comprehensive detection system, achieving superior performance to single-view methods. However, existing dual-view methods often prioritize integrating single-view methods over exploring nuanced characteristics of both perspectives. To address this, we propose a novel hierarchical dual-view model for fake news detection guided by discriminative lexicons. First, we construct two lexicons based on distinct word usage tendencies in fake and real news and further augment them with synonyms sourced from large language models. We then devise a hierarchical attention network to derive semantic representations for the source post, incorporating a lexicon attention loss to guide the prioritization of words from the two lexicons. Subsequently, a lexicon-guided interaction network is employed to model the relations between the source post and its relevant articles, assigning authenticity-aware weights to each article. Finally, the representations of source post and relevant articles are concatenated for joint detection. 
According to experimental results, our model outperforms many competitive baselines in terms of the macro F1 score ranging from 1.1% to 10.5% on Weibo and from 3.2% to 10.8% on Twitter.</p>","PeriodicalId":51327,"journal":{"name":"International Journal of Machine Learning and Cybernetics","volume":null,"pages":null},"PeriodicalIF":5.6,"publicationDate":"2024-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142209180","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
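The results above are reported as macro F1. As a reminder of what that metric computes, here is a minimal pure-Python sketch: the per-class F1 scores are averaged with equal weight, so a minority class counts as much as a majority class. The labels and predictions are made up for illustration and are not data from the paper.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Illustrative labels only.
y_true = ["fake", "fake", "real", "real"]
y_pred = ["fake", "real", "real", "real"]
print(round(macro_f1(y_true, y_pred), 4))  # 0.7333
```

Here the "fake" class scores 2/3 and the "real" class scores 0.8, giving a macro average of about 0.7333.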
Anchor-based Domain Adaptive Hashing for unsupervised image retrieval
IF 5.6, Zone 3, Computer Science
International Journal of Machine Learning and Cybernetics Pub Date : 2024-08-21 DOI: 10.1007/s13042-024-02298-x
Yonghao Chen, Xiaozhao Fang, Yuanyuan Liu, Xi Hu, Na Han, Peipei Kang
{"title":"Anchor-based Domain Adaptive Hashing for unsupervised image retrieval","authors":"Yonghao Chen, Xiaozhao Fang, Yuanyuan Liu, Xi Hu, Na Han, Peipei Kang","doi":"10.1007/s13042-024-02298-x","DOIUrl":"https://doi.org/10.1007/s13042-024-02298-x","url":null,"abstract":"<p>Traditional image retrieval methods suffer from a significant performance degradation when the model is trained on the target dataset and run on another dataset. To address this issue, Domain Adaptive Retrieval (DAR) has emerged as a promising solution, specifically designed to overcome domain shifts in retrieval tasks. However, existing unsupervised DAR methods still face two primary limitations: (1) they under-explore the intrinsic structure among domains, resulting in limited generalization capabilities; and (2) the models are often too complex to be applied to large-scale datasets. To tackle these limitations, we propose a novel unsupervised DAR method named Anchor-based Domain Adaptive Hashing (ADAH). ADAH aims to exploit the commonalities among domains with the assumption that a consensus latent space exists for the source and target domains. To achieve this, an anchor-based similarity reconstruction scheme is proposed, which learns a set of domain-shared anchors and domain-specific anchor graphs, and then reconstructs the similarity matrix with these anchor graphs, thereby effectively exploiting inter- and intra-domain similarity structures. Subsequently, by treating the anchor graphs as feature embeddings, we solve the Distance-Distance Difference Minimization (DDDM) problem between them and their corresponding hash codes. This preserves the similarity structure of the similarity matrix in the hash code. Finally, a two-stage strategy is employed to derive the hash function, ensuring its effectiveness and scalability. 
Experimental results on four datasets demonstrate the effectiveness of the proposed method.</p>","PeriodicalId":51327,"journal":{"name":"International Journal of Machine Learning and Cybernetics","volume":null,"pages":null},"PeriodicalIF":5.6,"publicationDate":"2024-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142209205","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Extraction of entity relationships serving the field of agriculture food safety regulation
IF 5.6, Zone 3, Computer Science
International Journal of Machine Learning and Cybernetics Pub Date : 2024-08-21 DOI: 10.1007/s13042-024-02304-2
Zhihua Zhao, Yiming Liu, Dongdong Lv, Ruixuan Li, Xudong Yu, Dianhui Mao
{"title":"Extraction of entity relationships serving the field of agriculture food safety regulation","authors":"Zhihua Zhao, Yiming Liu, Dongdong Lv, Ruixuan Li, Xudong Yu, Dianhui Mao","doi":"10.1007/s13042-024-02304-2","DOIUrl":"https://doi.org/10.1007/s13042-024-02304-2","url":null,"abstract":"<p>Agriculture food (agri-food) safety is closely related to all aspects of people's lives. In recent years, with the emergence of deep learning technology based on big data, the extraction of information relations in the field of agri-food safety supervision has become a research hotspot. However, most current work only extends the traditional named entity recognition task with relationship recognition, which makes it difficult to establish a true 'connection' between entities and relationships. The pipelined and joint extraction architectures that have emerged in this area are problematic in practice. In addition, the contextual information of the text corpus in the agri-food safety regulatory domain has not been fully utilized. To address the above issues, this paper proposes a semi-joint entity relationship extraction model (EB-SJRE) based on contextual entity boundary features. Firstly, a token-pair subject-object correspondence matrix label is designed to intuitively model the subject-object boundary, which is better suited to complex entities in the field of agri-food safety regulation. Secondly, the dynamic fine-tuning of BERT makes the text embedding more relevant to the textual context of the agri-food safety regulation domain. Finally, we introduce an attention mechanism in the token-pair tagging framework to capture deep semantic subject-object boundary association information, which addresses both the exposure bias caused by the pipeline structure and the dimensional explosion caused by the joint extraction structure. 
The experimental results show that our model achieves the best F1-score of 88.71% on agri-food safety regulation domain data and F1-scores of 92.36%, 92.80%, 88.91%, and 92.21% on NYT, NYT-star, WebNLG, and WebNLG-star, respectively. This indicates that EB-SJRE has excellent generalization ability in both the agri-food safety regulatory and public sectors.</p>","PeriodicalId":51327,"journal":{"name":"International Journal of Machine Learning and Cybernetics","volume":null,"pages":null},"PeriodicalIF":5.6,"publicationDate":"2024-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142209201","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Advancing automated street crime detection: a drone-based system integrating CNN models and enhanced feature selection techniques
IF 5.6, Zone 3, Computer Science
International Journal of Machine Learning and Cybernetics Pub Date : 2024-08-20 DOI: 10.1007/s13042-024-02315-z
Lakshma Reddy Vuyyuru, NagaMalleswara Rao Purimetla, Kancharakunt Yakub Reddy, Sai Srinivas Vellela, Sk Khader Basha, Ramesh Vatambeti
{"title":"Advancing automated street crime detection: a drone-based system integrating CNN models and enhanced feature selection techniques","authors":"Lakshma Reddy Vuyyuru, NagaMalleswara Rao Purimetla, Kancharakunt Yakub Reddy, Sai Srinivas Vellela, Sk Khader Basha, Ramesh Vatambeti","doi":"10.1007/s13042-024-02315-z","DOIUrl":"https://doi.org/10.1007/s13042-024-02315-z","url":null,"abstract":"<p>This study presents a pioneering solution to the growing challenge of escalating global crime rates through the introduction of an automated drone-based street crime detection system. Leveraging advanced Convolutional Neural Network (CNN) models, the system integrates several key components for analyzing images captured by drones. Initially, the Embedding Bilateral Filter (EBF) technique divides images into base and detail layers to enhance detection accuracy. The fusion model, IR with attention-based Conv-ViT, combines Inception-V3, ResNet-50, and Convolution Vision Transformer (Conv-ViT) to capture both shape and texture details efficiently. Further enhancement is achieved through the Improved Shark Smell Optimization Algorithm (ISSOA), which optimizes feature selection and minimizes redundancy in image extraction. Additionally, a Multi-scale Contextual Semantic Guidance Network (MCS-GNet) ensures robust image classification by integrating features from multiple layers to prevent data loss. Evaluation on the UCF-Crime and UCSD Ped2 datasets demonstrates superior accuracy, with remarkable results of 0.783 and 0.974, respectively. 
This innovative approach offers a promising solution to the arduous and continuous task of monitoring security camera feeds for suspicious activities, thereby addressing the pressing need for automated crime detection systems on a global scale.</p>","PeriodicalId":51327,"journal":{"name":"International Journal of Machine Learning and Cybernetics","volume":null,"pages":null},"PeriodicalIF":5.6,"publicationDate":"2024-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142209183","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Single-stage zero-shot object detection network based on CLIP and pseudo-labeling
IF 5.6, Zone 3, Computer Science
International Journal of Machine Learning and Cybernetics Pub Date : 2024-08-20 DOI: 10.1007/s13042-024-02321-1
Jiafeng Li, Shengyao Sun, Kang Zhang, Jing Zhang, Li Zhuo
{"title":"Single-stage zero-shot object detection network based on CLIP and pseudo-labeling","authors":"Jiafeng Li, Shengyao Sun, Kang Zhang, Jing Zhang, Li Zhuo","doi":"10.1007/s13042-024-02321-1","DOIUrl":"https://doi.org/10.1007/s13042-024-02321-1","url":null,"abstract":"<p>The detection of unknown objects is a challenging task in computer vision because, although there are diverse real-world detection object categories, existing object-detection training sets cover a limited number of object categories. Most existing approaches use two-stage networks to improve a model’s ability to characterize objects of unknown classes, which leads to slow inference. To address this issue, we propose a single-stage unknown object detection method based on the contrastive language-image pre-training (CLIP) model and pseudo-labelling, called CLIP-YOLO. First, a visual language embedding alignment method is introduced and a channel-grouped enhanced coordinate attention module is embedded into a YOLO-series detection head and feature-enhancing component, to improve the model’s ability to characterize and detect unknown category objects. Second, the pseudo-labelling generation is optimized based on the CLIP model to expand the diversity of the training set and enhance the ability to cover unknown object categories. We validated this method on four challenging datasets: MSCOCO, ILSVRC, Visual Genome, and PASCAL VOC. The results show that our method achieves higher accuracy and faster speed, and thus better performance in unknown object detection. 
The source code is available at https://github.com/BJUTsipl/CLIP-YOLO.</p>","PeriodicalId":51327,"journal":{"name":"International Journal of Machine Learning and Cybernetics","volume":null,"pages":null},"PeriodicalIF":5.6,"publicationDate":"2024-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142208923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
ETCGN: entity type-constrained graph networks for document-level relation extraction
IF 5.6, Zone 3, Computer Science
International Journal of Machine Learning and Cybernetics Pub Date : 2024-08-20 DOI: 10.1007/s13042-024-02293-2
Hangxiao Yang, Changpu Chen, Shaokai Zhang, Baiyang Chen, Chang Liu, Qilin Li
{"title":"ETCGN: entity type-constrained graph networks for document-level relation extraction","authors":"Hangxiao Yang, Changpu Chen, Shaokai Zhang, Baiyang Chen, Chang Liu, Qilin Li","doi":"10.1007/s13042-024-02293-2","DOIUrl":"https://doi.org/10.1007/s13042-024-02293-2","url":null,"abstract":"<p>Document-level relation extraction aims at discerning semantic connections between entities within a given document. Compared with sentence-level relation extraction, document-level relation extraction is more complex because models must be able to infer semantic relations across multiple sentences. In this paper, we propose a novel model, named Entity Type-Constrained Graph Network (ETCGN). The proposed model utilizes a graph structure to capture intricate interactions among diverse mentions within the document. Moreover, it aggregates references to the same entity while integrating path-based reasoning mechanisms to deduce relations between entities. Furthermore, we present a novel constraint method that capitalizes on entity types to confine the scope of potential relations. Experimental results on two public datasets (DocRED and HacRED) show that our model outperforms a number of baselines and achieves state-of-the-art performance. Further analysis verifies the effectiveness of type-based constraints and path-based reasoning mechanisms. 
Our code is available at: https://github.com/yhx30/ETCGN.</p>","PeriodicalId":51327,"journal":{"name":"International Journal of Machine Learning and Cybernetics","volume":null,"pages":null},"PeriodicalIF":5.6,"publicationDate":"2024-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142209202","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Joint feature fusion hashing for cross-modal retrieval
IF 5.6, Zone 3, Computer Science
International Journal of Machine Learning and Cybernetics Pub Date : 2024-08-20 DOI: 10.1007/s13042-024-02309-x
Yuxia Cao
{"title":"Joint feature fusion hashing for cross-modal retrieval","authors":"Yuxia Cao","doi":"10.1007/s13042-024-02309-x","DOIUrl":"https://doi.org/10.1007/s13042-024-02309-x","url":null,"abstract":"<p>Cross-modal hashing retrieval maps data from different modalities into a common low-dimensional hash code space, enabling fast and efficient retrieval. Recently, there has been a growing interest in the cross-modal hashing retrieval approach. Nonetheless, a significant number of current methodologies overlook the influence of semantically rich features on retrieval performance. In addition, class attribute embedding is often forgotten in cross-modal feature fusion, which is crucial for learning more discriminative hash codes. To meet these challenges, we put forward a novel method, namely joint feature fusion hashing (JFFH) for cross-modal retrieval. Specifically, we use the fast language image pre-training model as the feature coding module of cross-modal data. To more effectively mitigate semantic disparities between modalities, we introduce a multimodal contrastive learning loss to strengthen the interaction between modalities and improve the semantic representation of modalities. In addition, we extract class attribute features as class embedding and integrate them with cross-modal features to enhance the semantic relationship within the fused features. To better capture both inter-modal and intra-modal dependencies as well as semantic relevance, we integrate the self-attention mechanism into the multi-modal fusion transformer encoder to facilitate efficient feature fusion. Besides, we apply label-wise high-level semantic similarity and feature-wise low-level semantic similarity to enhance the discrimination of hash codes. 
Our JFFH method shows better retrieval performance in large-scale cross-modal retrieval.</p>","PeriodicalId":51327,"journal":{"name":"International Journal of Machine Learning and Cybernetics","volume":null,"pages":null},"PeriodicalIF":5.6,"publicationDate":"2024-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142209203","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
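The abstract above notes that mapping items into a common hash code space enables fast retrieval. The sketch below illustrates why, under simple assumptions: once items are binary codes, similarity search reduces to XOR and bit counting. The 4-bit codes and item names are invented for illustration. This shows only generic Hamming ranking, not the JFFH method itself.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two hash codes stored as integers."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query: int, database: dict[str, int]) -> list[str]:
    """Return item names sorted by Hamming distance to the query (ties by name)."""
    return sorted(
        database,
        key=lambda name: (hamming_distance(query, database[name]), name),
    )


# Hypothetical 4-bit codes: a text query hashed into the same space as images.
text_query = 0b1011
image_codes = {"img_a": 0b1010, "img_b": 0b0100, "img_c": 0b1011}
print(rank_by_hamming(text_query, image_codes))  # ['img_c', 'img_a', 'img_b']
```

Because the query and database codes share one space, the same ranking routine serves text-to-image and image-to-text retrieval.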