Neural Networks: Latest Articles

Balancing user preferences by social networks: A condition-guided social recommendation model for mitigating popularity bias
IF 6 | Zone 1 | Computer Science
Neural Networks, vol. 187, Article 107317. Pub Date: 2025-03-06. DOI: 10.1016/j.neunet.2025.107317
Xin He, Wenqi Fan, Ruobing Wang, Yili Wang, Ying Wang, Shirui Pan, Xin Wang
Abstract: Social recommendation models weave social interactions into their design to provide uniquely personalized recommendation results for users. However, social networks not only amplify the popularity bias in recommendation models, resulting in more frequent recommendation of hot items and fewer long-tail items, but also include a substantial amount of redundant information that is essentially meaningless for the model’s performance. Existing social recommendation models often integrate the entire social network directly, with little effort to filter or adjust social information to mitigate popularity bias introduced by the social network. In this paper, we propose a Condition-Guided Social Recommendation Model (named CGSoRec) to mitigate the model’s popularity bias by denoising the social network and adjusting the weights of user’s social preferences. More specifically, CGSoRec first includes a Condition-Guided Social Denoising Model (CSD) to remove redundant social relations in the social network for capturing users’ social preferences with items more precisely. Then, CGSoRec calculates users’ social preferences based on denoised social network and adjusts the weights in users’ social preferences to make them can counteract the popularity bias present in the recommendation model. At last, CGSoRec includes a Condition-Guided Diffusion Recommendation Model (CGD) to introduce the adjusted social preferences as conditions to control the recommendation results for a debiased direction. Comprehensive experiments on three real-world datasets demonstrate the effectiveness of our proposed method. The anonymous code is in: https://anonymous.4open.science/r/CGSoRec-2B72.
Citations: 0
Deep Huber quantile regression networks
IF 6 | Zone 1 | Computer Science
Neural Networks, vol. 187, Article 107364. Pub Date: 2025-03-05. DOI: 10.1016/j.neunet.2025.107364
Hristos Tyralis, Georgia Papacharalampous, Nilay Dogulu, Kwok P. Chun
Abstract: Typical machine learning regression applications aim to report the mean or the median of the predictive probability distribution, via training with a squared or an absolute error scoring function. The importance of issuing predictions of more functionals of the predictive probability distribution (quantiles and expectiles) has been recognized as a means to quantify the uncertainty of the prediction. In deep learning (DL) applications, that is possible through quantile and expectile regression neural networks (QRNN and ERNN respectively). Here we introduce deep Huber quantile regression networks (DHQRN) that nest QRNN and ERNN as edge cases. DHQRN can predict Huber quantiles, which are more general functionals in the sense that they nest quantiles and expectiles as limiting cases. The main idea is to train a DL algorithm with the Huber quantile scoring function, which is consistent for the Huber quantile functional. As a proof of concept, DHQRN are applied to predict house prices in Melbourne, Australia and Boston, United States (US). In this context, predictive performances of three DL architectures are discussed along with evidential interpretation of results from two economic case studies. Additional simulation experiments and applications to real-world case studies using open datasets demonstrate a satisfactory absolute performance of DHQRN.
Citations: 0
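Illustrative note on the entry above: the abstract says the Huber quantile scoring function nests the quantile and expectile losses as limiting cases. The sketch below shows one commonly used form of that score, an asymmetrically weighted Huber loss; it is an assumption for illustration, not code from the paper, and the function names and the parameters `tau` (level) and `delta` (Huber threshold) are hypothetical.

```python
def huber(u: float, delta: float) -> float:
    """Classical Huber function: quadratic for |u| <= delta, linear beyond."""
    a = abs(u)
    if a <= delta:
        return 0.5 * u * u
    return delta * (a - 0.5 * delta)


def huber_quantile_score(pred: float, obs: float, tau: float, delta: float) -> float:
    """Assumed form: |1(pred >= obs) - tau| * huber(pred - obs, delta)."""
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    if delta <= 0.0:
        raise ValueError("delta must be positive")
    # Over-prediction is weighted by (1 - tau), under-prediction by tau.
    weight = (1.0 - tau) if pred >= obs else tau
    return weight * huber(pred - obs, delta)


def mean_score(preds, observations, tau: float, delta: float) -> float:
    """Average score over paired predictions and observations."""
    pairs = list(zip(preds, observations))
    return sum(huber_quantile_score(p, y, tau, delta) for p, y in pairs) / len(pairs)
```

Under this form, a very large `delta` makes the score behave like half the expectile (asymmetric squared) loss, while dividing the score by a very small `delta` approaches the quantile (pinball) loss, which matches the nesting described in the abstract.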
Learn the global prompt in the low-rank tensor space for heterogeneous federated learning
IF 6 | Zone 1 | Computer Science
Neural Networks, vol. 187, Article 107319. Pub Date: 2025-03-05. DOI: 10.1016/j.neunet.2025.107319
Lele Fu, Sheng Huang, Yuecheng Li, Chuan Chen, Chuanfu Zhang, Zibin Zheng
Abstract: Federated learning collaborates with multiple clients to train a global model, enhancing the model generalization while allowing the local data transmission-free and security. However, federated learning currently faces three intractable challenges: (1) The large number of model parameters result in an excessive communication burden. (2) The non-independently and identically distributed local data induces the degradation of global model. (3) The model heterogeneity renders traditional federated aggregation infeasible. To dissipate the three difficulties, we propose to learn the global prompt in the low-rank tensor space (FedGPT) for heterogeneous federated learning. Specifically, we employ the prompts rather than the model parameters as the carrier of local knowledge to achieve the information interaction between multiple clients. Since the prompts only have a very small number of variables, the communication volume is greatly reduced. To cope with the data heterogeneity, the prompts from different clients are stacked into the third-order tensors, on which the tensor singular value decomposition is performed to extract the global information. Furthermore, the proposed FedGPT possesses the ability to handle the model heterogeneity, the local models of different sizes can transfer the knowledge with the help of the prompts to improve the performance. Extensive experiments on three real-world datasets are conducted. Overall, FedGPT outperforms other state-of-the-art compared methods by up to 13.21%, and achieves less than 3% of communication volume of FedAvg, demonstrating the superiority of the proposed FedGPT.
Citations: 0
Revisiting face forgery detection towards generalization
IF 6 | Zone 1 | Computer Science
Neural Networks, vol. 187, Article 107310. Pub Date: 2025-03-05. DOI: 10.1016/j.neunet.2025.107310
Chunlei Peng, Tao Chen, Decheng Liu, Huiqing Guo, Nannan Wang, Xinbo Gao
Abstract: Face forgery detection aims to distinguish AI generated fake faces with real faces. With the rapid development of face forgery creation algorithms, a large number of generative models have been proposed, which gradually reduce the local distortion phenomenon or the specific frequency traces in these models. At the same time, in the process of face data compression and transmission, distortion phenomenon and specific frequency cues could be eliminated, which brings severe challenges to the performance and generalization ability of face forgery detection. To promote the progress on face forgery detection research towards generalization, we present the first comprehensive overview and in-depth analysis of the generalizable face forgery detection methods. We categorize the target of generalizable face forgery detection into the robustness on novel and unknown forged images, and robustness on damaged low-quality images. We discuss representative generalization strategies including the aspects of data augmentation, multi-source learning, fingerprints detection, feature enhancement, temporal analysis, vision-language detection. We summarize the widely used datasets and the generalization performance of state-of-the-art methods in terms of robustness to novel unknown forgery as well as damaged quality forgery types. Finally, we discuss under-investigated open issues on face forgery detection towards generalization in six directions, including building a new generation of datasets, extracting strong forgery cues, considering identity features in face forgery detection, security and fairness of forgery detectors, the potential of large models in forgery detection and test-time adaptation. Our revisit of face forgery detection towards generalization will help promote the research and application of face forgery detection on real-world unconstrained conditions in the future.
Citations: 0
Heterogeneous Graph Neural Network with Adaptive Relation Reconstruction
IF 6 | Zone 1 | Computer Science
Neural Networks, vol. 187, Article 107313. Pub Date: 2025-03-05. DOI: 10.1016/j.neunet.2025.107313
Weihong Lin, Zhaoliang Chen, Yuhong Chen, Shiping Wang
Abstract: Topological structures of real-world graphs often exhibit heterogeneity involving diverse nodes and relation types. In recent years, heterogeneous graph learning methods utilizing meta-paths to capture composite relations and guide neighbor selection have garnered considerable attention. However, meta-path based approaches may establish connections between nodes of different categories while overlooking relations between nodes of the same category, decreasing the quality of node embeddings. In light of this, this paper proposes a Heterogeneous Graph Neural Network with Adaptive Relation Reconstruction (HGNN-AR²) that adaptively adjusts the relations to alleviate connection deficiencies and heteromorphic issues. HGNN-AR² is grounded on distinct connections derived from multiple meta-paths. By examining the homomorphic correlations of latent features from each meta-path, we reshape the cross-node connections to explore the pertinent latent relations. Through the relation reconstruction, we unveil unique connections reflected by each meta-path and incorporate them into graph convolutional networks for more comprehensive representations. The proposed model is evaluated on various benchmark heterogeneous graph datasets, demonstrating superior performance compared to state-of-the-art competitors.
Citations: 0
A general debiasing framework with counterfactual reasoning for multimodal public speaking anxiety detection
IF 6 | Zone 1 | Computer Science
Neural Networks, vol. 187, Article 107314. Pub Date: 2025-03-04. DOI: 10.1016/j.neunet.2025.107314
Tingting Zhang, Yangfu Zhu, Bin Wu, Chunping Zheng, Jiachen Tan, Zihua Xiong
Abstract: Multimodal Public Speaking Anxiety Detection (MPSAD), which aims to identify the anxiety states of learners, has attracted widespread attention. Unfortunately, the current MPSAD task inevitably suffers from the impact of latent different types of multimodal hybrid biases, such as context bias, label bias and keyword bias. Models may rely on these biases as shortcuts, preventing them from fully utilizing all three modalities to learn multimodal knowledge. Existing methods primarily focus on addressing specific types of biases, but anticipating bias types when designing these methods is challenging, as we cannot foresee all possible biases. To tackle this issue, we propose a General Multimodal Counterfactual Reasoning debiasing framework (GMCR), which eliminates multimodal hybrid biases from a unified causal perspective. Specifically, this plug-and-play debiasing framework removes multimodal hybrid biases by disentangling causal and biased features and capturing adverse effects via a counterfactual branch. It then subtracts spurious correlations during inference for unbiased predictions. Due to the challenge of collecting speech video data, there are currently limited high-quality datasets available for the MPSAD task. To overcome this scarcity, we create a new large-scale fine-grained Multimodal English Public Speaking Anxiety (ME-PSA) dataset. Extensive experiments on our ME-PSA and two benchmarks demonstrate the superiority of our proposed framework, with improvements of over 2.00% in accuracy and 4.00% in F1 score compared to the vanilla SOTA baselines.
Citations: 0
Hy-DeFake: Hypergraph neural networks for detecting fake news in online social networks
IF 6 | Zone 1 | Computer Science
Neural Networks, vol. 187, Article 107302. Pub Date: 2025-03-04. DOI: 10.1016/j.neunet.2025.107302
Xing Su, Jian Yang, Jia Wu, Zitai Qiu
Abstract: Nowadays social media is the primary platform for people to obtain news and share information. Combating online fake news has become an urgent task to reduce the damage it causes to society. Existing methods typically improve their fake news detection performances by utilizing textual auxiliary information (such as relevant retweets and comments) or simple structural information (i.e., graph construction). However, these methods face two challenges. First, an increasing number of users tend to directly forward the source news without adding comments, resulting in a lack of textual auxiliary information. Second, simple graphs are unable to extract complex relations beyond pairwise association in a social context. Given that real-world social networks are intricate and involve high-order relations, we argue that exploring beyond pairwise relations between news and users is crucial for fake news detection. Therefore, we propose constructing an attributed hypergraph to represent non-textual and high-order relations for user participation in news spreading. We also introduce a hypergraph neural network-based method called Hy-DeFake to tackle the challenges. Our proposed method captures semantic information from news content, credibility information from involved users, and high-order correlations between news and users to learn distinctive embeddings for fake news detection. The superiority of Hy-DeFake is demonstrated through experiments conducted on four widely-used datasets, and it is compared against nine baselines using four evaluation metrics.
Citations: 0
Dual selective fusion transformer network for hyperspectral image classification
IF 6 | Zone 1 | Computer Science
Neural Networks, vol. 187, Article 107311. Pub Date: 2025-03-03. DOI: 10.1016/j.neunet.2025.107311
Yichu Xu, Di Wang, Lefei Zhang, Liangpei Zhang
Abstract: Transformer has achieved satisfactory results in the field of hyperspectral image (HSI) classification. However, existing Transformer models face two key challenges when dealing with HSI scenes characterized by diverse land cover types and rich spectral information: (1) A fixed receptive field overlooks the effective contextual scales required by various HSI objects; (2) invalid self-attention features in context fusion affect model performance. To address these limitations, we propose a novel Dual Selective Fusion Transformer Network (DSFormer) for HSI classification. DSFormer achieves joint spatial and spectral contextual modeling by flexibly selecting and fusing features across different receptive fields, effectively reducing unnecessary information interference by focusing on the most relevant spatial–spectral tokens. Specifically, we design a Kernel Selective Fusion Transformer Block (KSFTB) to learn an optimal receptive field by adaptively fusing spatial and spectral features across different scales, enhancing the model’s ability to accurately identify diverse HSI objects. Additionally, we introduce a Token Selective Fusion Transformer Block (TSFTB), which strategically selects and combines essential tokens during the spatial–spectral self-attention fusion process to capture the most crucial contexts. Extensive experiments conducted on four benchmark HSI datasets demonstrate that the proposed DSFormer significantly improves land cover classification accuracy, outperforming existing state-of-the-art methods. Specifically, DSFormer achieves overall accuracies of 96.59%, 97.66%, 95.17%, and 94.59% in the Pavia University, Houston, Indian Pines, and Whu-HongHu datasets, respectively, reflecting improvements of 3.19%, 1.14%, 0.91%, and 2.80% over the previous model. The code will be available online at https://github.com/YichuXu/DSFormer.
Citations: 0
ABVS breast tumour segmentation via integrating CNN with dilated sampling self-attention and feature interaction Transformer
IF 6 | Zone 1 | Computer Science
Neural Networks, vol. 187, Article 107312. Pub Date: 2025-03-03. DOI: 10.1016/j.neunet.2025.107312
Yiyao Liu, Jinyao Li, Yi Yang, Cheng Zhao, Yongtao Zhang, Peng Yang, Lei Dong, Xiaofei Deng, Ting Zhu, Tianfu Wang, Wei Jiang, Baiying Lei
Abstract: Given the rapid increase in breast cancer incidence, the Automated Breast Volume Scanner (ABVS) is developed to screen breast tumours efficiently and accurately. However, reviewing ABVS images is a challenging task owing to the significant variations in sizes and shapes of breast tumours. We propose a novel 3D segmentation network (i.e., DST-C) that combines a convolutional neural network (CNN) with a dilated sampling self-attention Transformer (DST). In our network, the global features extracted from the DST branch are guided by the detailed local information provided by the CNN branch, which adapts to the diversity of tumour size and morphology. For medical images, especially ABVS images, the scarcity of annotation leads to difficulty in model training. Therefore, a self-supervised learning method based on a dual-path approach for mask image modelling is introduced to generate valuable representations of images. In addition, a unique postprocessing method is proposed to reduce the false-positive rate and improve the sensitivity simultaneously. The experimental results demonstrate that our model has achieved promising 3D segmentation and detection performance using our in-house dataset. Our code is available at: https://github.com/magnetliu/dstc-net.
Citations: 0
DRTN: Dual Relation Transformer Network with feature erasure and contrastive learning for multi-label image classification
IF 6 | Zone 1 | Computer Science
Neural Networks, vol. 187, Article 107309. Pub Date: 2025-03-03. DOI: 10.1016/j.neunet.2025.107309
Wei Zhou, Kang Lin, Zhijie Zheng, Dihu Chen, Tao Su, Haifeng Hu
Abstract: The objective of multi-label image classification (MLIC) task is to simultaneously identify multiple objects present in an image. Several researchers directly flatten 2D feature maps into 1D grid feature sequences, and utilize Transformer encoder to capture the correlations of grid features to learn object relationships. Although obtaining promising results, these Transformer-based methods lose spatial information. In addition, current attention-based models often focus only on salient feature regions, but ignore other potential useful features that contribute to MLIC task. To tackle these problems, we present a novel Dual Relation Transformer Network (DRTN) for MLIC task, which can be trained in an end-to-end manner. Concretely, to compensate for the loss of spatial information of grid features resulting from the flattening operation, we adopt a grid aggregation scheme to generate pseudo-region features, which does not need to make additional expensive annotations to train object detector. Then, a new dual relation enhancement (DRE) module is proposed to capture correlations between objects using two different visual features, thereby complementing the advantages provided by both grid and pseudo-region features. After that, we design a new feature enhancement and erasure (FEE) module to learn discriminative features and mine additional potential valuable features. By using attention mechanism to discover the most salient feature regions and removing them with region-level erasure strategy, our FEE module is able to mine other potential useful features from the remaining parts. Further, we devise a novel contrastive learning (CL) module to encourage the foregrounds of salient and potential features to be closer, while pushing their foregrounds further away from background features. This manner compels our model to learn discriminative and valuable features more comprehensively. Extensive experiments demonstrate that DRTN method surpasses current MLIC models on three challenging benchmarks, i.e., MS-COCO 2014, PASCAL VOC 2007, and NUS-WIDE datasets.
Citations: 0