Pattern Recognition — Latest Articles

Contrastive Domain Adaptation with Test-Time Training for Out-of-Context News Detection
IF 7.5 | CAS Tier 1 | Computer Science
Pattern Recognition Pub Date : 2025-03-10 DOI: 10.1016/j.patcog.2025.111530
Yimeng Gu , Mengqi Zhang , Ignacio Castro , Shu Wu , Gareth Tyson
Out-of-context news is a common type of misinformation on online media platforms: a caption is posted alongside a mismatched news image. Reflecting its importance, researchers have developed models to detect such misinformation. However, a common limitation of these models is that they assume pre-labelled data is available for each news topic or agency, so they cannot handle unverified news from other topics or agencies. In this work, we therefore focus on domain adaptive out-of-context news detection, treating the news topic or news agency as the domain. To effectively adapt the detection model to unlabelled news topics or agencies, we propose Contrastive Domain Adaptation with Test-Time Training (ConDA-TTT). It first applies contrastive learning to learn a more separable representation space for news inputs, and then uses maximum mean discrepancy (MMD) to remove domain-specific features while retaining domain-invariant ones. At test time, the trained model predicts pseudo-labels for the target-domain test data and selects those with higher confidence scores to train the model's classifier, further adapting the model to the target-domain data distribution. Because the model is adapted in both the training and test phases, the domain adaptation is more robust to distribution shifts. Experimental results demonstrate that our approach outperforms state-of-the-art baselines in all domain adaptation settings on two benchmark datasets, by as much as 2.6% in F1 and 2.4% in accuracy.
Pattern Recognition, Volume 164, Article 111530.
Citations: 0
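The two ingredients the ConDA-TTT abstract names — an MMD estimate for aligning source and target features, and confidence-based pseudo-label selection at test time — can be sketched in plain Python. The RBF kernel, the 0.9 threshold, and all function names here are illustrative assumptions, not the authors' implementation:

```python
import math

def rbf(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def mmd2(source, target, gamma=1.0):
    """Biased estimate of squared maximum mean discrepancy between
    source-domain and target-domain feature samples."""
    m, n = len(source), len(target)
    k_ss = sum(rbf(a, b, gamma) for a in source for b in source) / (m * m)
    k_tt = sum(rbf(a, b, gamma) for a in target for b in target) / (n * n)
    k_st = sum(rbf(a, b, gamma) for a in source for b in target) / (m * n)
    return k_ss + k_tt - 2 * k_st

def select_confident(probs, threshold=0.9):
    """Test-time step: keep (index, pseudo-label) pairs whose top class
    probability exceeds the threshold; only these retrain the classifier."""
    picked = []
    for i, p in enumerate(probs):
        c = max(range(len(p)), key=p.__getitem__)
        if p[c] >= threshold:
            picked.append((i, c))
    return picked
```

MMD vanishes when the two samples coincide and grows as the domains drift apart, which is what makes it usable as a domain-alignment penalty.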
Contribution-based imbalanced hybrid resampling ensemble
Pattern Recognition Pub Date : 2025-03-10 DOI: 10.1016/j.patcog.2025.111553
Lingyun Zhao , Fei Han , Qinghua Ling , Yubin Ge , Yuze Zhang , Qing Liu , Henry Han
Resampling is an effective method for addressing data imbalance. Prevailing methods adjust the data distribution by describing either information or noise, and perform well in many scenarios. However, current studies struggle to consider information and noise simultaneously, since noisy samples usually have high information levels, which can lead to misestimation. In this paper, a Contribution-Based Hybrid Resampling Ensemble (CHRE) is proposed to address the correlation between information and noise. CHRE is a semi-supervised algorithm built on a novel Global Unified Data Evaluation (GUDE) framework. First, GUDE describes sample contribution by redefining the information and noise levels. Then, based on sample contribution, CHRE removes negatively contributing majority samples and oversamples minority samples; concurrently, pseudo-labels related to these minority samples are included in the oversampling. Throughout this process, CHRE resamples based on sample contribution and optimizes the model, while GUDE derives sample contribution from model feedback, the two interacting in iterative optimization. Extensive experiments on 53 benchmark datasets, involving three base classifiers and 13 state-of-the-art imbalance algorithms, demonstrate the significant advantages of CHRE. Noise studies further indicate its high robustness.
Pattern Recognition, Volume 164, Article 111553.
Citations: 0
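A toy sketch of the contribution-based resampling idea described above: drop majority samples whose contribution is negative, then oversample minority samples weighted by contribution until the classes balance. The contribution scores are taken as given inputs here; how GUDE actually computes them is not reproduced, and the balancing rule is an assumption:

```python
import random

def contribution_resample(samples, seed=0):
    """samples: list of (features, label, contribution), with label 0 the
    majority class and label 1 the minority class.
    Returns a rebalanced list of (features, label) pairs."""
    rng = random.Random(seed)
    # Remove negatively contributing majority samples.
    kept = [s for s in samples if not (s[1] == 0 and s[2] < 0)]
    maj = [s for s in kept if s[1] == 0]
    mino = [s for s in kept if s[1] == 1]
    out = [(x, lbl) for x, lbl, _ in kept]
    if mino:
        # Oversample minority samples, weighted by their contribution.
        weights = [max(c, 1e-6) for _, _, c in mino]
        while sum(1 for _, lbl in out if lbl == 1) < len(maj):
            x, lbl, _ = rng.choices(mino, weights=weights, k=1)[0]
            out.append((x, lbl))
    return out
```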
A bijective inference network for interpretable identification of RNA N6-methyladenosine modification sites
Pattern Recognition Pub Date : 2025-03-10 DOI: 10.1016/j.patcog.2025.111541
Guodong Li , Yue Yang , Dongxu Li , Xiaorui Su , Zhi Zeng , Pengwei Hu , Lun Hu
The accurate identification of N⁶-methyladenosine (m⁶A) modification sites is crucial for unraveling various functional mechanisms. While existing methods primarily focus on learning high-quality embeddings of RNA sequences for this task, few incorporate specific RNA secondary structures, limiting their interpretability for in-depth post-transcriptional analysis. In this work, we introduce a novel bijective inference network, named m⁶A-BIN, which integrates RNA sequences and secondary structures within a unified parameter-shared framework, improving the accuracy of m⁶A modification site identification through the auxiliary supervision of RNA secondary structures. m⁶A-BIN first constructs sequential and structural graphs from RNA sequences and secondary structures, respectively. Bijective mapping functions are then designed to couple graph representation learning with interpretable dependency inference, providing informative supervision for learning sequential and structural embeddings of RNA. By fusing these two types of RNA embeddings, m⁶A-BIN efficiently performs the identification task. The attribution phase of m⁶A-BIN further ascribes the prediction results to nucleotide dependencies acquired during the interpretable dependency inference, including RNA sequence and structural patterns, thereby enhancing its interpretability. Extensive experimental results demonstrate the promising performance of m⁶A-BIN in terms of both accuracy and interpretability for identifying novel m⁶A modification sites.
Pattern Recognition, Volume 164, Article 111541.
Citations: 0
Zero-Shot Sketch-Based Image Retrieval with teacher-guided and student-centered cross-modal bidirectional knowledge distillation
Pattern Recognition Pub Date : 2025-03-09 DOI: 10.1016/j.patcog.2025.111529
Jiale Du , Yang Liu , Xinbo Gao , Jungong Han , Lei Zhang
In zero-shot learning, the task of using unseen-class sketches as queries to retrieve real images is referred to as Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR). The task aims to generalize knowledge learned from known categories to unknown ones. Current research primarily relies on fine-tuning networks via loss functions or on unidirectionally extracting knowledge from fixed-parameter teacher models to train student models. However, unidirectional knowledge extraction lacks mutual learning and knowledge alignment between the teacher and student models, while fine-tuning via loss functions struggles to handle the photo and sketch modalities simultaneously. We therefore design a gradient-weighted modal perception and distribution alignment scheme that explores photo and sketch features bidirectionally and deeply investigates the relationships between modalities. Building on this, we propose a teacher-guided and student-centered cross-modal bidirectional knowledge distillation framework. During training, the student and teacher models mutually learn discriminative information based on the relationships between modalities and synchronize their parameters under the teacher's guidance, effectively achieving cross-modal alignment. Extensive experiments on the TU-Berlin Ext, Sketchy Ext, and QuickDraw Ext datasets demonstrate that our method significantly enhances retrieval performance.
Pattern Recognition, Volume 164, Article 111529.
Citations: 0
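The bidirectional distillation idea above contrasts with the usual one-way KL(teacher ∥ student) objective: both models learn from each other. A minimal symmetric version might look like the following, where the temperature and the equal 0.5 weighting are assumed choices, not the paper's exact loss:

```python
import math

def softmax(logits, t=1.0):
    """Temperature-scaled softmax over a list of logits."""
    m = max(logits)
    e = [math.exp((z - m) / t) for z in logits]
    s = sum(e)
    return [x / s for x in e]

def kl(p, q):
    """KL divergence between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def bidirectional_kd_loss(student_logits, teacher_logits, t=2.0):
    """Symmetric distillation loss: the student is pulled toward the
    teacher AND the teacher toward the student, enabling mutual learning."""
    ps = softmax(student_logits, t)
    pt = softmax(teacher_logits, t)
    return 0.5 * (kl(pt, ps) + kl(ps, pt))
```

The loss is zero when the two models already agree, and symmetric in its arguments, which is what distinguishes it from one-way distillation.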
DEVICE: Depth and Visual Concepts Aware Transformer for OCR-based image captioning
Pattern Recognition Pub Date : 2025-03-08 DOI: 10.1016/j.patcog.2025.111522
Dongsheng Xu , Qingbao Huang , Xingmao Zhang , Haonan Cheng , Feng Shuang , Yi Cai
OCR-based image captioning is an important but under-explored task that aims to generate descriptions containing both visual objects and scene text. Recent studies have made encouraging progress, but they still lack an overall understanding of scenes and generate inaccurate captions. One possible reason is that current studies mainly construct plane-level geometric relationships among scene text, without depth information; this leads to insufficient relational reasoning over scene text, so models may describe it inaccurately. Another is that existing methods fail to generate fine-grained descriptions of some visual objects and may ignore essential objects altogether, so the scene text belonging to those ignored objects goes unused. To address these issues, we propose a Depth and Visual Concepts Aware Transformer (DEVICE) for OCR-based image captioning. Concretely, to construct three-dimensional geometric relations, we introduce depth information and propose a depth-enhanced feature updating module to ameliorate OCR token features. To generate more precise and comprehensive captions, we introduce semantic features of detected visual concepts as auxiliary information and propose a semantic-guided alignment module that improves the model's ability to exploit visual concepts. DEVICE comprehends scenes more comprehensively and boosts the accuracy of described visual entities. Sufficient experiments demonstrate the effectiveness of DEVICE, which outperforms state-of-the-art models on the TextCaps test set.
Pattern Recognition, Volume 164, Article 111522.
Citations: 0
Leveraging facial landmarks improves generalization ability for deepfake detection
Pattern Recognition Pub Date : 2025-03-08 DOI: 10.1016/j.patcog.2025.111528
Qi Gao , Baopeng Zhang , Jianghao Wu , Wenxin Luo , Zhu Teng , Jianping Fan
Facial forgery technology has recently become increasingly sophisticated, and published datasets aim to cover a wide range of data variations. Existing deepfake detection models benefit from the powerful feature embeddings of deep networks and carefully designed fine-tuning modules, yielding excellent performance in in-dataset evaluations. However, performance declines in cross-dataset evaluations due to varied forgery methods and dataset shifts. In this study, we concentrate on the generalization issue of deepfake detection and find that forgery traces tend to gather around facial interest points, even under different forgery methods. We therefore propose a Trail Tracing Network (TTNet) to capture a generalized feature representation, leveraging facial landmarks to eliminate redundant information and expand the forged traces in the feature space. We conduct extensive experiments on widely used benchmarks, including FaceForensics++, DFDCp, and Celeb-DF. Experimental results demonstrate the outstanding generalization ability of our method, surpassing existing state-of-the-art methods by a large margin. In addition, the proposed method also performs excellently in in-dataset evaluation.
Pattern Recognition, Volume 164, Article 111528.
Citations: 0
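One simple way to "leverage facial landmarks to eliminate redundant information", as the abstract puts it, is to keep only feature regions near landmarks and suppress the rest. The sketch below assumes a square neighborhood of fixed radius around each landmark, which is an illustrative simplification rather than TTNet's actual mechanism:

```python
def landmark_mask(h, w, landmarks, radius=2):
    """Binary H x W mask that keeps only patches around facial landmarks
    (given as (row, col) pairs), zeroing out background regions before
    feature extraction."""
    mask = [[0] * w for _ in range(h)]
    for (r, c) in landmarks:
        for i in range(max(0, r - radius), min(h, r + radius + 1)):
            for j in range(max(0, c - radius), min(w, c + radius + 1)):
                mask[i][j] = 1
    return mask
```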
NRGAN: A Noise-resilient GAN with adaptive feature modulation for SAR image segmentation
Pattern Recognition Pub Date : 2025-03-08 DOI: 10.1016/j.patcog.2025.111490
Shuo Lian , Jianchao Fan , Jun Wang
Extracting offshore aquaculture rafts from synthetic aperture radar (SAR) images is important for large-scale marine resource exploration and utilization. In this paper, a deep learning model called Noise-Resilient Generative Adversarial Network (NRGAN) is proposed for segmenting SAR images captured under varying sea conditions to monitor aquaculture rafts. NRGAN consists of an image generator and two regressors: the generator performs segmentation, while the regressors discriminate between the generated results and the actual labels. As a key component of the generator, a pixel-level contextual feature adaptation module is designed to handle the noise interference and complex image features common in SAR imagery. The module has three parts: spatial-feature adaptation, which aggregates spatial information from input feature maps into a spatial attention map focusing on relevant image areas; contextual-feature adaptation, which integrates contextual information to improve feature learning and increase the expressiveness of the input data; and pixel-level feature adaptation, which refines the contribution of regions within the images, enhancing the coherence of the overall segmentation.
Pattern Recognition, Volume 164, Article 111490.
Citations: 0
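The spatial-feature adaptation step — aggregating spatial information into an attention map over the image — could be sketched as below, borrowing the common mean-plus-max channel pooling followed by a sigmoid. That pooling choice is an assumption for illustration, not NRGAN's published design:

```python
import math

def spatial_attention(feature_maps):
    """feature_maps: list of C maps, each H x W (nested lists).
    Returns an H x W attention map: sigmoid(channel mean + channel max)
    at each spatial location, highlighting relevant areas."""
    C = len(feature_maps)
    H, W = len(feature_maps[0]), len(feature_maps[0][0])
    att = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            vals = [feature_maps[c][i][j] for c in range(C)]
            s = sum(vals) / C + max(vals)
            att[i][j] = 1.0 / (1.0 + math.exp(-s))
    return att
```

Multiplying the input features by this map would down-weight noisy, low-response regions, which is the role the abstract assigns to this part of the module.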
Low-light image enhancement via Clustering Contrastive Learning for visual recognition
Pattern Recognition Pub Date : 2025-03-08 DOI: 10.1016/j.patcog.2025.111554
Guanglei Sheng , Gang Hu , Xiaofeng Wang , Wei Chen , Jinlin Jiang
Visual recognition on low-light images remains a major challenge. We propose an unsupervised low-light image enhancement module that can be integrated into any baseline visual model to improve its performance. The proposed method, called CCGC, is based on Clustering Contrastive Learning and Grad-CAM (Gradient-weighted Class Activation Mapping) feature alignment. CCGC enhances the luminance semantic information of low-light images while preserving the model's focus on semantic features. Simulation results on various low-light image datasets demonstrate the significant feature enhancement and generalization capability of CCGC. Evaluation on the established CUB-2011 low-light image dataset shows a substantial increase in classification accuracy across multiple benchmark models. Furthermore, the proposed method significantly improves classification accuracy on a real low-light traditional Chinese medicine dataset and enhances face detection performance on dark-face detection datasets.
Pattern Recognition, Volume 164, Article 111554.
Citations: 0
Guiding Prototype Networks with label semantics for few-shot text classification
IF 7.5 1区 计算机科学
Pattern Recognition Pub Date : 2025-03-08 DOI: 10.1016/j.patcog.2025.111497
Xinyue Liu , Yunlong Gao , Linlin Zong , Wenxin Liang , Bo Xu
Few-shot text classification aims to recognize unseen classes with limited labeled text samples. Typical meta-learning methods, e.g., Prototypical Networks, face several problems: (1) the limited words in each sentence make it difficult to extract fine-grained class-related semantic information; (2) the semantic information in labels is not fully utilized, leading to ambiguous class definitions; and (3) randomly selected support samples cannot represent their corresponding classes well. In this paper, we leverage label semantics to tackle these problems and present Label Guided Prototype Networks (LGPN). First, we use prompt encoding to generate text representations instead of aggregating the words in each sentence, extracting more class-related semantic information. Second, we propose Label-guided Distance Scaling (LDS): in the training stage, a label-guided loss pulls samples closer to their corresponding labels, making the class distributions distinguishable. Third, in the testing stage, we scale the text representations with the label semantics to pull each support sample closer to its class center, reducing the prediction contradictions caused by randomly selected (i.e., unrepresentative) support samples. We conduct extensive experiments on six benchmark datasets, and LGPN shows clear advantages over state-of-the-art models. Additionally, we further explore the effectiveness and universality of our modules.
Pattern Recognition, Volume 164, Article 111497.
Citations: 0
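For context, the core Prototypical Networks inference step that LGPN builds on can be stated in a few lines: each class prototype is the mean of its support embeddings, and a query is assigned to the nearest prototype. The embeddings here are plain vectors; LGPN's prompt encoding and label-guided scaling are not reproduced:

```python
import math

def prototype_classify(support, query):
    """support: dict mapping label -> list of embedding vectors;
    query: a single embedding vector.
    Returns the label whose prototype (class mean embedding) is nearest
    to the query in Euclidean distance."""
    def mean(vecs):
        return [sum(xs) / len(vecs) for xs in zip(*vecs)]
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    protos = {lbl: mean(vecs) for lbl, vecs in support.items()}
    return min(protos, key=lambda lbl: dist(protos[lbl], query))
```

Problem (3) in the abstract arises exactly here: if the few support samples are unrepresentative, the mean is a poor class center, which is what LDS's label-guided scaling is meant to correct.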
HIAN: A hybrid interactive attention network for multimodal sarcasm detection
Pattern Recognition Pub Date : 2025-03-08 DOI: 10.1016/j.patcog.2025.111535
Yongtang Bao , Xin Zhao , Peng Zhang , Yue Qi , Haojie Li
Multimodal sarcasm detection uses multiple modalities of data, such as text and images, to identify whether they carry sarcastic meaning. Both images and text contain rich sarcastic clues, but they differ in dimensionality, and the quality of the sarcastic information they carry varies greatly. Finding an appropriate feature fusion strategy that aligns modal features and maximally exploits the inconsistent relationships between modalities is therefore a significant challenge in this task. To this end, we introduce a novel sarcasm detection fusion model based on multimodal hybrid interactive attention (HIAN). We concatenate class words obtained from images with the text, and use the proposed bidirectional long short-term memory network with an interactive attention layer to enhance text feature extraction; the resulting text features fully capture the contextual information of the text and the supplementary information in the image. To further enhance cross-modal feature fusion, we propose a multimodal interactive attention network and a fusion-enhanced transformer that promote the sharing of high-order complementary information, representing the complementary non-linear semantic relationships among the three modalities and capturing more inter-modal inconsistencies. Extensive experiments on publicly available multimodal sarcasm detection benchmark datasets show that our results surpass the baseline model and current state-of-the-art methods when using the base BERT model.
Pattern Recognition, Volume 164, Article 111535.
Citations: 0