{"title":"Laplacian attention: A plug-and-play algorithm without increasing model complexity for vision tasks","authors":"Xiaolei Chen, Yubing Lu, Runyu Wen","doi":"10.1049/cit2.12402","DOIUrl":"https://doi.org/10.1049/cit2.12402","url":null,"abstract":"<p>Most prevailing attention mechanism modules in contemporary research are convolution-based, and while these modules enhance the accuracy of deep learning networks in visual tasks, they also increase the overall model complexity. To address this problem, this paper proposes Laplacian attention (LA), a plug-and-play algorithm that does not increase model complexity. The LA algorithm first calculates the similarity distance between feature points in the feature space and in the feature channel, and then constructs the residual Laplacian matrix between feature points from the similarity distance and a Gaussian kernel. This construction separates dissimilar feature points while aggregating similar ones. Finally, the LA algorithm adaptively allocates the outputs of the feature channel and the feature space to derive the final LA outputs. Crucially, the LA algorithm is confined to the forward computation and involves neither backpropagation nor any parameter learning. The LA algorithm is evaluated comprehensively on three distinct datasets, namely CIFAR-10, miniImageNet, and Pascal VOC 2012. 
The experimental results demonstrate that, compared with the advanced attention mechanism modules in recent years, such as SENet, CBAM, ECANet, coordinate attention, and triplet attention, the LA algorithm exhibits superior performance across image classification, object detection and semantic segmentation tasks.</p>","PeriodicalId":46211,"journal":{"name":"CAAI Transactions on Intelligence Technology","volume":"10 2","pages":"545-556"},"PeriodicalIF":8.4,"publicationDate":"2024-12-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12402","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143857005","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"KitWaSor: Pioneering pre-trained model for kitchen waste sorting with an innovative million-level benchmark dataset","authors":"Leyuan Fang, Shuaiyu Ding, Hao Feng, Junwu Yu, Lin Tang, Pedram Ghamisi","doi":"10.1049/cit2.12399","DOIUrl":"https://doi.org/10.1049/cit2.12399","url":null,"abstract":"<p>Intelligent sorting is an important prerequisite for the full quantitative consumption and harmless disposal of kitchen waste. The existing object detection method based on an ImageNet pre-trained model is an effective way of sorting. Owing to significant domain gaps between natural images and kitchen waste images, it is difficult to reflect the characteristics of diverse scales and dense distribution in kitchen waste based on an ImageNet pre-trained model, leading to poor generalisation. In this article, the authors propose the first pre-trained model for kitchen waste sorting called KitWaSor, which combines both contrastive learning (CL) and masked image modelling (MIM) through self-supervised learning (SSL). First, to address the issue of diverse scales, the authors propose a mixed masking strategy by introducing an incomplete masking branch based on the original random masking branch. It prevents the complete loss of small-scale objects while avoiding excessive leakage of large-scale object pixels. Second, to address the issue of dense distribution, the authors introduce semantic consistency constraints on the basis of the mixed masking strategy. That is, object semantic reasoning is performed through semantic consistency constraints to compensate for the lack of contextual information. To train KitWaSor, the authors construct the first million-level kitchen waste dataset across seasonal and regional distributions, named KWD-Million. Extensive experiments show that KitWaSor achieves state-of-the-art (SOTA) performance on the two most relevant downstream tasks for kitchen waste sorting (i.e. 
image classification and object detection), demonstrating the effectiveness of the proposed KitWaSor.</p>","PeriodicalId":46211,"journal":{"name":"CAAI Transactions on Intelligence Technology","volume":"10 1","pages":"94-114"},"PeriodicalIF":8.4,"publicationDate":"2024-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12399","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143536116","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Feature pyramid attention network for audio-visual scene classification","authors":"Liguang Zhou, Yuhongze Zhou, Xiaonan Qi, Junjie Hu, Tin Lun Lam, Yangsheng Xu","doi":"10.1049/cit2.12375","DOIUrl":"https://doi.org/10.1049/cit2.12375","url":null,"abstract":"<p>Audio-visual scene classification (AVSC) poses a formidable challenge owing to the intricate spatial-temporal relationships exhibited by audio-visual signals, coupled with the complex spatial patterns of objects and textures found in visual images. The focus of recent studies has predominantly revolved around extracting features from diverse neural network structures, inadvertently neglecting the acquisition of semantically meaningful regions and crucial components within audio-visual data. The authors present a feature pyramid attention network (FPANet) for audio-visual scene understanding, which extracts semantically significant characteristics from audio-visual data. The authors’ approach builds multi-scale hierarchical features of sound spectrograms and visual images using a feature pyramid representation and localises the semantically relevant regions with a feature pyramid attention module (FPAM). A dimension alignment (DA) strategy is employed to align feature maps from multiple layers, a pyramid spatial attention (PSA) to spatially locate essential regions, and a pyramid channel attention (PCA) to pinpoint significant temporal frames. Experiments on visual scene classification (VSC), audio scene classification (ASC), and AVSC tasks demonstrate that FPANet achieves performance on par with state-of-the-art (SOTA) approaches, with a 95.9 F1-score on the ADVANCE dataset and a relative improvement of 28.8%. 
Visualisation results show that FPANet can prioritise semantically meaningful areas in audio-visual signals.</p>","PeriodicalId":46211,"journal":{"name":"CAAI Transactions on Intelligence Technology","volume":"10 2","pages":"359-374"},"PeriodicalIF":8.4,"publicationDate":"2024-11-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12375","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143857073","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Correction to ‘Trustworthy semi-supervised anomaly detection for online-to-offline logistics business in merchant identification’","authors":"","doi":"10.1049/cit2.12392","DOIUrl":"https://doi.org/10.1049/cit2.12392","url":null,"abstract":"<p>Yong Li, Shuhang Wang, Shijie Xu, and Jiao Yin. 2024. Trustworthy semi-supervised anomaly detection for online-to-offline logistics business in merchant identification. CAAI Transactions on Intelligence Technology 9, 3 (June 2024), 544–556. https://doi.org/10.1049/cit2.12301.</p><p>In the section discussing the spatial distribution of fraud and normal merchants' shipping addresses, the following text needs correction:</p><p>Please replace Figures 1 and 2 with the following text ‘According to the data analysis results, the spatial distribution of fraud merchants' shipping addresses is characterised by sparsity (because fraud merchants ship on behalf of others, resulting in a large number of shipping addresses with few shipments per address), while the distribution of normal merchants' shipping addresses is characterised by density (as normal merchants typically ship from centralised warehouses, resulting in a small number of shipping addresses with a large number of shipments per address). 
These differences in shipping behaviour can provide significant assistance in detecting fraud merchants.’</p><p>We apologise for this error.</p><p>Please note that due to the deletion of two images, the order of subsequent images has been adjusted accordingly.</p>","PeriodicalId":46211,"journal":{"name":"CAAI Transactions on Intelligence Technology","volume":"10 2","pages":"634"},"PeriodicalIF":8.4,"publicationDate":"2024-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12392","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143857097","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Graph neural link predictor based on cycle structure","authors":"Yanlin Yang, Zhonglin Ye, Lei Meng, Mingyuan Li, Haixing Zhao","doi":"10.1049/cit2.12396","DOIUrl":"https://doi.org/10.1049/cit2.12396","url":null,"abstract":"<p>Current link prediction algorithms primarily study the interaction between nodes based on chain and star structures; they rely predominantly on low-order structural information and do not explore the multivariate interactions between nodes from the perspective of the higher-order structural information present in the network. The cycle structure is a higher-order structure that lies between the star and clique structures, in which all nodes within the same cycle can interact with each other, even in the absence of direct edges. If a node is encompassed by multiple cycles, it interacts and associates with a greater number of nodes in the network, which indicates that the node is, to some extent, more important in the network. Furthermore, if two nodes are included in multiple cycles, the two nodes are more likely to be connected. Therefore, first, a multi-information fusion node importance algorithm based on cycle structure information is proposed, which integrates both high-order and low-order structural information. Second, taking the integrated structure information and the node feature information as input features, a two-channel graph neural network model is designed to learn the cycle structure information. Then, the cycle structure information is utilised for the task of link prediction, and a graph neural link predictor with multi-information interactions based on the cycle structure is developed. 
Finally, extensive experimental validation and analysis show that the node ranking produced by the proposed node importance index is more consistent with the actual situation, that the proposed graph neural network model can effectively learn the cycle structure information, and that using higher-order structural information (cycle information) significantly enhances the overall link prediction performance.</p>","PeriodicalId":46211,"journal":{"name":"CAAI Transactions on Intelligence Technology","volume":"10 2","pages":"615-632"},"PeriodicalIF":8.4,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12396","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143856978","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Domain-independent adaptive histogram-based features for pomegranate fruit and leaf diseases classification","authors":"Mohanmuralidhar Prajwala, Prabhuswamy Prajwal Kumar, Shanubhog Maheshwarappa Gopinath, Shivakumara Palaiahnakote, Mahadevappa Basavanna, Daniel P. Lopresti","doi":"10.1049/cit2.12390","DOIUrl":"https://doi.org/10.1049/cit2.12390","url":null,"abstract":"<div><section><p>Disease identification for fruits and leaves in the field of agriculture is important for estimating production, crop yield, and earnings for farmers. In the specific case of pomegranates, this is challenging because of the wide range of possible diseases and their effects on the plant and the crop. This study presents an adaptive histogram-based method for solving this problem. The method is domain-independent in the sense that it can be easily and efficiently adapted to other, similar smart agriculture tasks. The approach explores the Red, Green, and Blue colour spaces along with Grey. The histograms of the colour spaces and the grey space are analysed based on the notion that as the disease changes, the colour also changes. The proximity between the histogram of the grey image and that of each individual colour space is estimated to measure the closeness of images. Since the grey image is the average of the colour spaces (R, G, and B), it can be considered a reference image. To estimate the distance between the grey and colour spaces, the proposed approach uses the Chi-Square distance measure. Further, the method uses an Artificial Neural Network for classification. The effectiveness of the approach is demonstrated by testing on a dataset of fruit and leaf images affected by different diseases. 
The results show that the method outperforms existing techniques in terms of average classification rate.</p></section></div>","PeriodicalId":46211,"journal":{"name":"CAAI Transactions on Intelligence Technology","volume":"10 2","pages":"317-336"},"PeriodicalIF":8.4,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12390","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143857011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-station multi-robot task assignment method based on deep reinforcement learning","authors":"Junnan Zhang, Ke Wang, Chaoxu Mu","doi":"10.1049/cit2.12394","DOIUrl":"https://doi.org/10.1049/cit2.12394","url":null,"abstract":"<p>This paper focuses on the problem of multi-station multi-robot spot welding task assignment and proposes a deep reinforcement learning (DRL) framework made up of a shared graph attention network and independent policy networks. The graph of the welding spot distribution is encoded using the graph attention network. Independent policy networks with an attention mechanism act as decoders, processing the encoded graph and deciding how to assign robots to different tasks. The policy networks convert the large-scale welding spot allocation problem into multiple small-scale single-robot welding path planning problems, which are then solved quickly using existing methods. The model is trained through reinforcement learning. In addition, a task balancing method is used to allocate tasks to multiple stations. The proposed algorithm is compared with classical algorithms, and the results show that the DRL-based algorithm produces higher-quality solutions.</p>","PeriodicalId":46211,"journal":{"name":"CAAI Transactions on Intelligence Technology","volume":"10 1","pages":"134-146"},"PeriodicalIF":8.4,"publicationDate":"2024-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12394","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143533414","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-omics graph convolutional networks for digestive system tumour classification and early-late stage diagnosis","authors":"Lin Zhou, Zhengzhi Zhu, Hongbo Gao, Chunyu Wang, Muhammad Attique Khan, Mati Ullah, Siffat Ullah Khan","doi":"10.1049/cit2.12395","DOIUrl":"https://doi.org/10.1049/cit2.12395","url":null,"abstract":"<p>The prevalence of digestive system tumours (DST) poses a significant challenge in the global fight against cancer. These neoplasms account for 20% of all documented cancer diagnoses and 22.5% of cancer-related fatalities. Accurate diagnosis of DST is paramount for vigilant patient monitoring and the judicious selection of optimal treatments. To address this challenge, the authors introduce a novel methodology, the Multi-omics Graph Transformer Convolutional Network (MGTCN), which aims to distinguish various DST types and to discriminate between early- and late-stage tumours with a high degree of accuracy. The MGTCN model incorporates the Graph Transformer Layer framework to transform the multi-omics adjacency matrix, thereby revealing potential associations among diverse samples. A rigorous experimental evaluation was undertaken on the DST dataset from The Cancer Genome Atlas to assess the efficacy of the MGTCN model. The outcomes underscore the efficiency and precision of MGTCN in diagnosing diverse DST types and in discriminating between early- and late-stage DST cases. 
The source code for this study is available for download at https://github.com/bigone1/MGTCN.</p>","PeriodicalId":46211,"journal":{"name":"CAAI Transactions on Intelligence Technology","volume":"9 6","pages":"1572-1586"},"PeriodicalIF":8.4,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12395","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143247961","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Topological search and gradient descent boosted Runge–Kutta optimiser with application to engineering design and feature selection","authors":"Jinge Shi, Yi Chen, Ali Asghar Heidari, Zhennao Cai, Huiling Chen, Guoxi Liang","doi":"10.1049/cit2.12387","DOIUrl":"https://doi.org/10.1049/cit2.12387","url":null,"abstract":"<p>The Runge–Kutta optimiser (RUN) algorithm, renowned for its powerful optimisation capabilities, faces challenges in dealing with the increasing complexity of real-world problems. Specifically, it shows deficiencies in terms of limited local exploration capability and less precise solutions. Therefore, this research integrates the topological search (TS) mechanism and the gradient search rule (GSR) into the framework of RUN, introducing an enhanced algorithm called TGRUN to improve the performance of the original algorithm. The TS mechanism employs a circular topological scheme to thoroughly explore the solution regions surrounding each solution, enabling a careful examination of valuable solution areas and enhancing the algorithm’s effectiveness in local exploration. To prevent the algorithm from becoming trapped in local optima, the GSR integrates gradient descent principles to direct the algorithm towards a wider investigation of the global solution space. This study conducted a series of experiments on the IEEE CEC2017 comprehensive benchmark functions to assess the enhanced effectiveness of TGRUN. Additionally, the evaluation includes real-world engineering design and feature selection problems, which serve as an additional test of the optimisation capabilities of the algorithm. 
The validation outcomes indicate a significant improvement in the optimisation capabilities and solution accuracy of TGRUN.</p>","PeriodicalId":46211,"journal":{"name":"CAAI Transactions on Intelligence Technology","volume":"10 2","pages":"557-614"},"PeriodicalIF":8.4,"publicationDate":"2024-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12387","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143856758","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RJAN: Region-based joint attention network for 3D shape recognition","authors":"Yue Zhao, Weizhi Nie, Jie Nie, Yuyi Zhang, Bo Wang","doi":"10.1049/cit2.12388","DOIUrl":"https://doi.org/10.1049/cit2.12388","url":null,"abstract":"<div><section><p>As an essential field of multimedia and computer vision, 3D shape recognition has attracted much research attention in recent years. Multiview-based approaches have demonstrated their superiority in generating effective 3D shape representations. Typical methods extract multiview global features and aggregate them to generate 3D shape descriptors. However, these methods have two disadvantages. First, mainstream methods neglect a comprehensive exploration of the local information in each view. Second, many approaches aggregate multiview features crudely, by adding or concatenating them. The resulting loss of some discriminative characteristics limits the effectiveness of the representation. To address these problems, a novel architecture named region-based joint attention network (RJAN) is proposed. Specifically, the authors first design a hierarchical local information exploration module for view descriptor extraction, in which region-to-region and channel-to-channel relationships at different granularities are comprehensively explored and utilised to provide more discriminative characteristics for view feature learning. Subsequently, a novel relation-aware view aggregation module is designed to aggregate the multiview features for shape descriptor generation, taking the view-to-view relationships into account. Extensive experiments were conducted on three public databases: ModelNet40, ModelNet10, and ShapeNetCore55. RJAN achieves state-of-the-art performance in 3D shape classification and 3D shape retrieval, which demonstrates its effectiveness. 
The code has been released at https://github.com/slurrpp/RJAN.</p></section></div>","PeriodicalId":46211,"journal":{"name":"CAAI Transactions on Intelligence Technology","volume":"10 2","pages":"460-473"},"PeriodicalIF":8.4,"publicationDate":"2024-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12388","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143856815","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}