A self-learning human-machine cooperative control method based on driver intention recognition
Yan Jiang, Yuyan Ding, Xinglong Zhang, Xin Xu, Junwen Huang
CAAI Transactions on Intelligence Technology, vol. 9, no. 5, pp. 1101–1115. Published 2024-04-01. DOI: 10.1049/cit2.12313. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12313
Abstract: Human-machine cooperative control has become an important area of intelligent driving, where driver intention recognition and dynamic control authority allocation are key factors for improving the performance of cooperative decision-making and control. In this paper, an online learning method is proposed for human-machine cooperative control, which introduces a priority control parameter in the reward function to achieve optimal allocation of control authority under different driver intentions and driving safety conditions. Firstly, a two-layer LSTM-based sequence prediction algorithm is proposed to recognise the driver's lane change (LC) intention for human-machine cooperative steering control. Secondly, an online reinforcement learning method is developed for optimising the steering authority to reduce driver workload and improve driving safety. The driver-in-the-loop simulation results show that our method can accurately predict the driver's LC intention in cooperative driving and effectively compensate for the driver's non-optimal driving actions. The experimental results on a real intelligent vehicle further demonstrate the online optimisation capability of the proposed RL-based control authority allocation algorithm and its effectiveness in improving driving safety.
{"title":"Lexicon-based fine-tuning of multilingual language models for low-resource language sentiment analysis","authors":"Vinura Dhananjaya, Surangika Ranathunga, Sanath Jayasena","doi":"10.1049/cit2.12333","DOIUrl":"10.1049/cit2.12333","url":null,"abstract":"<p>Pre-trained multilingual language models (PMLMs) such as mBERT and XLM-R have shown good cross-lingual transferability. However, they are not specifically trained to capture cross-lingual signals concerning sentiment words. This poses a disadvantage for low-resource languages (LRLs) that are under-represented in these models. To better fine-tune these models for sentiment classification in LRLs, a novel intermediate task fine-tuning (ITFT) technique based on a sentiment lexicon of a high-resource language (HRL) is introduced. The authors experiment with LRLs Sinhala, Tamil and Bengali for a 3-class sentiment classification task and show that this method outperforms vanilla fine-tuning of the PMLM. It also outperforms or is on-par with basic ITFT that relies on an HRL sentiment classification dataset.</p>","PeriodicalId":46211,"journal":{"name":"CAAI Transactions on Intelligence Technology","volume":"9 5","pages":"1116-1125"},"PeriodicalIF":8.4,"publicationDate":"2024-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12333","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140776263","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Edge-guided representation learning for underwater object detection
Linhui Dai, Hong Liu, Pinhao Song, Hao Tang, Runwei Ding, Shengquan Li
CAAI Transactions on Intelligence Technology, vol. 9, no. 5, pp. 1078–1091. Published 2024-04-01. DOI: 10.1049/cit2.12325. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12325
Abstract: Underwater object detection (UOD) is crucial for marine economic development, environmental protection, and the planet's sustainable development. The main challenges of this task arise from low contrast, small objects, and the mimicry of aquatic organisms. The key to addressing these challenges is to focus the model on obtaining more discriminative information. The authors observe that the edges of underwater objects are highly distinctive, so objects can be distinguished from low-contrast or mimicry environments by their edges. Motivated by this observation, an Edge-guided Representation Learning Network, termed ERL-Net, is proposed, which aims to achieve discriminative representation learning and aggregation under the guidance of edge cues. Firstly, an edge-guided attention module is introduced to model explicit boundary information, which generates more discriminative features. Secondly, a hierarchical feature aggregation module is proposed to aggregate the multi-scale discriminative features by regrouping them into three levels, effectively aggregating global and local information for locating and recognising underwater objects. Finally, a wide and asymmetric receptive field block is proposed to give features a wider receptive field, allowing the model to focus on small-object information. Comprehensive experiments on three challenging underwater datasets show that the method achieves superior performance on the UOD task.
BTSC: Binary tree structure convolution layers for building interpretable decision-making deep CNN
Yuqi Wang, Dawei Dai, Da Liu, Shuyin Xia, Guoyin Wang
CAAI Transactions on Intelligence Technology, vol. 9, no. 5, pp. 1331–1345. Published 2024-03-31. DOI: 10.1049/cit2.12328. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12328
Abstract: Although the deep convolutional neural network (DCNN) has achieved great success in computer vision, such models are considered to lack interpretability in decision-making. One fundamental issue is that the decision mechanism is regarded as a "black-box" operation. The authors design a binary tree structure convolution (BTSC) module and control the activation level of particular neurons to build an interpretable DCNN model. First, the authors design the BTSC module, in which each parent node generates two independent child layers, and then integrate it into a normal DCNN model. The main advantages of the BTSC are as follows: 1) child nodes of different parent nodes do not interfere with each other; 2) parent and child nodes can inherit knowledge. Second, considering the activation level of neurons, the authors design an information-coding objective to guide neural nodes to learn the particular information coding that is expected. Through experiments, the authors verify that: 1) the decisions made by both ResNet and DenseNet models can be explained well based on the "decision information flow path" (the decision-path) formed in the BTSC module; 2) the decision-path can reasonably interpret the decision reversal (robustness) mechanism of the DCNN model; 3) the credibility of decision-making can be measured by the matching degree between the actual and expected decision-paths.
{"title":"UKF-MOT: An unscented Kalman filter-based 3D multi-object tracker","authors":"Meng Liu, Jianwei Niu, Yu Liu","doi":"10.1049/cit2.12315","DOIUrl":"10.1049/cit2.12315","url":null,"abstract":"<p>Multi-object tracking in autonomous driving is a non-linear problem. To better address the tracking problem, this paper leveraged an unscented Kalman filter to predict the object's state. In the association stage, the Mahalanobis distance was employed as an affinity metric, and a Non-minimum Suppression method was designed for matching. With the detections fed into the tracker and continuous ‘predicting-matching’ steps, the states of each object at different time steps were described as their own continuous trajectories. We conducted extensive experiments to evaluate tracking accuracy on three challenging datasets (KITTI, nuScenes and Waymo). The experimental results demonstrated that our method effectively achieved multi-object tracking with satisfactory accuracy and real-time efficiency.</p>","PeriodicalId":46211,"journal":{"name":"CAAI Transactions on Intelligence Technology","volume":"9 4","pages":"1031-1041"},"PeriodicalIF":8.4,"publicationDate":"2024-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12315","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140368908","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Transfer force perception skills to robot-assisted laminectomy via imitation learning from human demonstrations","authors":"Meng Li, Xiaozhi Qi, Xiaoguang Han, Ying Hu, Bing Li, Yu Zhao, Jianwei Zhang","doi":"10.1049/cit2.12331","DOIUrl":"10.1049/cit2.12331","url":null,"abstract":"<p>A comparative study of two force perception skill learning approaches for robot-assisted spinal surgery, the impedance model method and the imitation learning (IL) method, is presented. The impedance model method develops separate models for the surgeon and patient, incorporating spring-damper and bone-grinding models. Expert surgeons' feature parameters are collected and mapped using support vector regression and image navigation techniques. The imitation learning approach utilises long short-term memory networks (LSTM) and addresses accurate data labelling challenges with custom models. Experimental results demonstrate skill recognition rates of 63.61%–74.62% for the impedance model approach, relying on manual feature extraction. Conversely, the imitation learning approach achieves a force perception recognition rate of 91.06%, outperforming the impedance model on curved bone surfaces. The findings demonstrate the potential of imitation learning to enhance skill acquisition in robot-assisted spinal surgery by eliminating the laborious process of manual feature extraction.</p>","PeriodicalId":46211,"journal":{"name":"CAAI Transactions on Intelligence Technology","volume":"9 4","pages":"903-916"},"PeriodicalIF":8.4,"publicationDate":"2024-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12331","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140365841","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rethinking multi-spatial information for transferable adversarial attacks on speaker recognition systems
Junjian Zhang, Hao Tan, Le Wang, Yaguan Qian, Zhaoquan Gu
CAAI Transactions on Intelligence Technology, vol. 9, no. 3, pp. 620–631. Published 2024-03-29. DOI: 10.1049/cit2.12295. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12295
Abstract: Adversarial attacks have been posing significant security concerns to intelligent systems such as speaker recognition systems (SRSs). Most attacks assume the neural networks in the system are known beforehand, whereas black-box attacks are proposed to handle practical situations without such information. Existing black-box attacks improve transferability by integrating multiple models or training on multiple datasets, but these methods are costly. Motivated by an optimisation strategy that uses spatial information of the perturbed paths and samples, the authors propose a Dual Spatial Momentum Iterative Fast Gradient Sign Method (DS-MI-FGSM) to improve the transferability of black-box attacks against SRSs. Specifically, DS-MI-FGSM needs only a single data sample and one model as input; by extending to the neighbouring spaces of the data and the model, it generates adversarial examples against the integrated models. To reduce the risk of overfitting, DS-MI-FGSM also introduces gradient masking to improve transferability. The authors conduct extensive experiments on the speaker recognition task, and the results demonstrate the effectiveness of the method, which achieves up to a 92% attack success rate on the victim model in black-box scenarios with only one known model.
{"title":"Mutual information oriented deep skill chaining for multi-agent reinforcement learning","authors":"Zaipeng Xie, Cheng Ji, Chentai Qiao, WenZhan Song, Zewen Li, Yufeng Zhang, Yujing Zhang","doi":"10.1049/cit2.12322","DOIUrl":"10.1049/cit2.12322","url":null,"abstract":"<p>Multi-agent reinforcement learning relies on reward signals to guide the policy networks of individual agents. However, in high-dimensional continuous spaces, the non-stationary environment can provide outdated experiences that hinder convergence, resulting in ineffective training performance for multi-agent systems. To tackle this issue, a novel reinforcement learning scheme, Mutual Information Oriented Deep Skill Chaining (MioDSC), is proposed that generates an optimised cooperative policy by incorporating intrinsic rewards based on mutual information to improve exploration efficiency. These rewards encourage agents to diversify their learning process by engaging in actions that increase the mutual information between their actions and the environment state. In addition, MioDSC can generate cooperative policies using the options framework, allowing agents to learn and reuse complex action sequences and accelerating the convergence speed of multi-agent learning. MioDSC was evaluated in the multi-agent particle environment and the StarCraft multi-agent challenge at varying difficulty levels. The experimental results demonstrate that MioDSC outperforms state-of-the-art methods and is robust across various multi-agent system tasks with high stability.</p>","PeriodicalId":46211,"journal":{"name":"CAAI Transactions on Intelligence Technology","volume":"9 4","pages":"1014-1030"},"PeriodicalIF":8.4,"publicationDate":"2024-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12322","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140370730","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
UDT: U-shaped deformable transformer for subarachnoid haemorrhage image segmentation
Wei Xie, Lianghao Jin, Shiqi Hua, Hao Sun, Bo Sun, Zhigang Tu, Jun Liu
CAAI Transactions on Intelligence Technology, vol. 9, no. 3, pp. 756–768. Published 2024-03-25. DOI: 10.1049/cit2.12302. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12302
Abstract: Subarachnoid haemorrhage (SAH), mostly caused by the rupture of an intracranial aneurysm, is a common disease with a high fatality rate. SAH lesions are generally diffusely distributed, showing a variety of scales with irregular edges. These complex characteristics make SAH segmentation a challenging task. To cope with these difficulties, a U-shaped deformable transformer (UDT) is proposed for SAH segmentation. Specifically, first, a multi-scale deformable attention (MSDA) module is exploited to model the diffuseness and scale-variant characteristics of SAH lesions; the MSDA module fuses features at different scales and dynamically adjusts the attention field of each element to generate discriminative multi-scale features. Second, a cross deformable attention-based skip connection (CDASC) module is designed to model the irregular edge characteristic of SAH lesions; the CDASC module utilises spatial details from encoder features to refine the spatial information of decoder features. Third, the MSDA and CDASC modules are embedded into the backbone Res-UNet to construct the proposed UDT. Extensive experiments are conducted on the self-built SAH-CT dataset and two public medical datasets (GlaS and MoNuSeg). Experimental results show that the presented UDT achieves state-of-the-art performance.
Improved organs at risk segmentation based on modified U-Net with self-attention and consistency regularisation
Maksym Manko, Anton Popov, Juan Manuel Gorriz, Javier Ramirez
CAAI Transactions on Intelligence Technology, vol. 9, no. 4, pp. 850–865. Published 2024-03-25. DOI: 10.1049/cit2.12303. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12303
Abstract: Cancer is one of the leading causes of death in the world, with radiotherapy as one of the treatment options. Radiotherapy planning starts with delineating the affected area from healthy organs, called organs at risk (OAR). A new approach to automatic OAR segmentation in the chest cavity in computed tomography (CT) images is presented. The proposed approach is based on a modified U-Net architecture with a ResNet-34 encoder, which is the baseline adopted in this work. A new two-branch CS-SA U-Net architecture is proposed, consisting of two parallel U-Net models in which self-attention blocks with cosine similarity as the query-key similarity function (CS-SA blocks) are inserted between the encoder and decoder, enabling the use of consistency regularisation. The proposed solution demonstrates state-of-the-art performance on the problem of OAR segmentation in CT images on the publicly available SegTHOR benchmark dataset in terms of the Dice coefficient (oesophagus 0.8714, heart 0.9516, trachea 0.9286, aorta 0.9510) and Hausdorff distance (oesophagus 0.2541, heart 0.1514, trachea 0.1722, aorta 0.1114), and significantly outperforms the baseline. The current approach is demonstrated to be viable for improving the quality of OAR segmentation for radiotherapy planning.