{"title":"Network intrusion detection leveraging multimodal features","authors":"Aklil Kiflay, Athanasios Tsokanos, Mahmood Fazlali, Raimund Kirner","doi":"10.1016/j.array.2024.100349","DOIUrl":"10.1016/j.array.2024.100349","url":null,"abstract":"<div><p>Network Intrusion Detection Systems (NIDSes) are essential for safeguarding critical information systems. However, the lack of adaptability of Machine Learning (ML) based NIDSes to different environments could cause slow adoption. In this paper, we propose a multimodal NIDS that combines flow and payload features to detect cyber-attacks. The focus of the paper is to evaluate the use of multimodal traffic features in detecting attacks, but not on a practical online implementation. In the multimodal NIDS, two random forest models are trained to classify network traffic using selected flow-based features and the first few bytes of protocol payload, respectively. Predictions from the two models are combined using a soft voting approach to get the final traffic classification results. We evaluate the multimodal NIDS using flow-based features and the corresponding payloads extracted from Packet Capture (PCAP) files of a publicly available UNSW-NB15 dataset. The experimental results show that the proposed multimodal NIDS can detect most attacks with average Accuracy, Recall, Precision and F<span><math><msub><mrow></mrow><mrow><mn>1</mn></mrow></msub></math></span> scores ranging from 98% to 99% using only six flow-based traffic features, and the first 32 bytes of protocol payload. The proposed multimodal NIDS provides a reliable approach to detecting cyber-attacks in different environments.</p></div>","PeriodicalId":8417,"journal":{"name":"Array","volume":"22 ","pages":"Article 100349"},"PeriodicalIF":0.0,"publicationDate":"2024-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2590005624000158/pdfft?md5=571a5eb4d14694ec615bacb4ecbc6a5f&pid=1-s2.0-S2590005624000158-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141029468","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"D2MNet for music generation joint driven by facial expressions and dance movements","authors":"Jiang Huang, Xianglin Huang, Lifang Yang, Zhulin Tao","doi":"10.1016/j.array.2024.100348","DOIUrl":"https://doi.org/10.1016/j.array.2024.100348","url":null,"abstract":"<div><p>In general, dance is always associated with music to improve stage performance effect. As we know, artificial music arrangement consumes a lot of time and manpower. While automatic music arrangement based on input dance video perfectly solves this problem. In the cross-modal music generation task, we take advantage of the complementary information between two input modalities of facial expressions and dance movements. Then we present Dance2MusicNet (D2MNet), an autoregressive generation model based on dilated convolution, which adopts two feature vectors, dance style and beats, as control signals to generate real and diverse music that matches dance video. Finally, a comprehensive evaluation method for qualitative and quantitative experiment is proposed. Compared to baseline methods, D2MNet outperforms better in all evaluating metrics, which clearly demonstrates the effectiveness of our framework.</p></div>","PeriodicalId":8417,"journal":{"name":"Array","volume":"22 ","pages":"Article 100348"},"PeriodicalIF":0.0,"publicationDate":"2024-05-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2590005624000146/pdfft?md5=57bcf00a132e600642ca5c16a65b9121&pid=1-s2.0-S2590005624000146-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140893510","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Advancing face detection efficiency: Utilizing classification networks for lowering false positive incidences","authors":"Jianlin Zhang , Chen Hou , Xu Yang , Xuechao Yang , Wencheng Yang , Hui Cui","doi":"10.1016/j.array.2024.100347","DOIUrl":"https://doi.org/10.1016/j.array.2024.100347","url":null,"abstract":"<div><p>The advancement of convolutional neural networks (CNNs) has markedly progressed in the field of face detection, significantly enhancing accuracy and recall metrics. Precision and recall remain pivotal for evaluating CNN-based detection models; however, there is a prevalent inclination to focus on improving true positive rates at the expense of addressing false positives. A critical issue contributing to this discrepancy is the lack of pseudo-face images within training and evaluation datasets. This deficiency impairs the regression capabilities of detection models, leading to numerous erroneous detections and inadequate localization. To address this gap, we introduce the WIDERFACE dataset, enriched with a considerable number of pseudo-face images created by amalgamating human and animal facial features. This dataset aims to bolster the detection of false positives during training phases. Furthermore, we propose a new face detection architecture that incorporates a classification model into the conventional face detection model to diminish the false positive rate and augment detection precision. Our comparative analysis on the WIDERFACE and other renowned datasets reveals that our architecture secures a lower false positive rate while preserving the true positive rate in comparison to existing top-tier face detection models.</p></div>","PeriodicalId":8417,"journal":{"name":"Array","volume":"22 ","pages":"Article 100347"},"PeriodicalIF":0.0,"publicationDate":"2024-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2590005624000134/pdfft?md5=be911996c21c7c166881a8828f984b70&pid=1-s2.0-S2590005624000134-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140825066","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A comprehensive review of explainable AI for disease diagnosis","authors":"Al Amin Biswas","doi":"10.1016/j.array.2024.100345","DOIUrl":"https://doi.org/10.1016/j.array.2024.100345","url":null,"abstract":"<div><p>Nowadays, artificial intelligence (AI) has been utilized in several domains of the healthcare sector. Despite its effectiveness in healthcare settings, its massive adoption remains limited due to the transparency issue, which is considered a significant obstacle. To achieve the trust of end users, it is necessary to explain the AI models' output. Therefore, explainable AI (XAI) has become apparent as a potential solution by providing transparent explanations of the AI models' output. In this review paper, the primary aim is to review articles that are mainly related to machine learning (ML) or deep learning (DL) based human disease diagnoses, and the model's decision-making process is explained by XAI techniques. To do that, two journal databases (Scopus and the IEEE Xplore Digital Library) were thoroughly searched using a few predetermined relevant keywords. The PRISMA guidelines have been followed to determine the papers for the final analysis, where studies that did not meet the requirements were eliminated. Finally, 90 Q1 journal articles are selected for in-depth analysis, covering several XAI techniques. Then, the summarization of the several findings has been presented, and appropriate responses to the proposed research questions have been outlined. In addition, several challenges related to XAI in the case of human disease diagnosis and future research directions in this sector are presented.</p></div>","PeriodicalId":8417,"journal":{"name":"Array","volume":"22 ","pages":"Article 100345"},"PeriodicalIF":0.0,"publicationDate":"2024-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2590005624000110/pdfft?md5=e1abc0e28d1ca274ca3562e4e862960b&pid=1-s2.0-S2590005624000110-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140816446","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BT-Net: An end-to-end multi-task architecture for brain tumor classification, segmentation, and localization from MRI images","authors":"Salman Fazle Rabby , Muhammad Abdullah Arafat , Taufiq Hasan","doi":"10.1016/j.array.2024.100346","DOIUrl":"10.1016/j.array.2024.100346","url":null,"abstract":"<div><p>Brain tumors are severe medical conditions that can prove fatal if not detected and treated early. Radiologists often use MRI and CT scan imaging to diagnose brain tumors early. However, a shortage of skilled radiologists to analyze medical images can be problematic in low-resource healthcare settings. To overcome this issue, deep learning-based automatic analysis of medical images can be an effective tool for assistive diagnosis. Conventional methods generally focus on developing specialized algorithms to address a single aspect, such as segmentation, classification, or localization of brain tumors. In this work, a novel multi-task network was proposed, modified from the conventional VGG16, along with a U-Net variant concatenation, that can simultaneously achieve segmentation, classification, and localization using the same architecture. We trained the classification branch using the <em>Brain Tumor MRI Dataset</em>, and the segmentation branch using a “<em>Brain Tumor Segmentation</em> dataset. The integration of our method’s output can aid in simultaneous classification, segmentation, and localization of four types of brain tumors in MRI scans. The proposed multi-task framework achieved 97% accuracy in classification and a dice similarity score of 0.86 for segmentation. In addition, the method shows higher computational efficiency compared to existing methods. Our method can be a promising tool for assistive diagnosis in low-resource healthcare settings where skilled radiologists are scarce.</p></div>","PeriodicalId":8417,"journal":{"name":"Array","volume":"22 ","pages":"Article 100346"},"PeriodicalIF":0.0,"publicationDate":"2024-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2590005624000122/pdfft?md5=36c2c4383abffb72e6a44ae52a4e5a0c&pid=1-s2.0-S2590005624000122-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140769030","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Single-Stage Extensive Semantic Fusion for multi-modal sarcasm detection","authors":"Hong Fang , Dahao Liang , Weiyu Xiang","doi":"10.1016/j.array.2024.100344","DOIUrl":"https://doi.org/10.1016/j.array.2024.100344","url":null,"abstract":"<div><p>With the rise of social media and online interactions, there is a growing need for analytical models capable of understanding the nuanced, multi-modal communication inherent in platforms, especially for detecting sarcasm. Existing research employs multi-stage models along with extensive semantic information extractions and single-modal encoders. These models often struggle with efficient aligning and fusing multi-modal representations. Addressing these shortcomings, we introduce the Single-Stage Extensive Semantic Fusion (SSESF) model, designed to concurrently process multi-modal inputs in a unified framework, which performs encoding and fusing in the same architecture with shared parameters. A projection mechanism is employed to overcome the challenges posed by the diversity of inputs and the integration of a wide range of semantic information. Additionally, we design a multi-objective optimization that enhances the model’s ability to learn latent semantic nuances with supervised contrastive learning. The unified framework emphasizes the interaction and integration of multi-modal data, while multi-objective optimization preserves the complexity of semantic nuances for sarcasm detection. Experimental results on a public multi-modal sarcasm dataset demonstrate the superiority of our model, achieving state-of-the-art performance. The findings highlight the model’s capability to integrate extensive semantic information, demonstrating its effectiveness in the simultaneous interpretation and fusion of multi-modal data for sarcasm detection.</p></div>","PeriodicalId":8417,"journal":{"name":"Array","volume":"22 ","pages":"Article 100344"},"PeriodicalIF":0.0,"publicationDate":"2024-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2590005624000109/pdfft?md5=5136c2ac1ad918984ba24754918dce68&pid=1-s2.0-S2590005624000109-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140619309","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AFENet: Attention-guided feature enhancement network and a benchmark for low-altitude UAV sewage outfall detection","authors":"Qingsong Huang , Junqing Fan , Haoran Xu , Wei Han , Xiaohui Huang , Yunliang Chen","doi":"10.1016/j.array.2024.100343","DOIUrl":"https://doi.org/10.1016/j.array.2024.100343","url":null,"abstract":"<div><p>Inspecting sewage outfall into rivers is significant to the precise management of the ecological environment because they are the last gate for pollutants to enter the river. Unmanned Aerial Vehicles (UAVs) have the characteristics of maneuverability and high-resolution images and have been used as an important means to inspect sewage outfalls. UAVs are widely used in daily sewage outfall inspections, but relying on manual interpretation lacks the corresponding low-altitude sewage outfall images dataset. Meanwhile, because of the sparse spatial distribution of sewage outfalls, problems like less labeled sample data, complex background types, and weak objects are also prominent. In order to promote the inspection of sewage outfalls, this paper proposes a low-attitude sewage outfall object detection dataset, namely UAV-SOD, and an attention-guided feature enhancement network, namely AFENet. The UAV-SOD dataset features high resolution, complex backgrounds, and diverse objects. Some of the outfall objects are limited by multi-scale, single-colored, and weak feature responses, leading to low detection accuracy. To localize these objects effectively, AFENet first uses the global context block (GCB) to jointly explore valuable global and local information, and then the region of interest (RoI) attention module (RAM) is used to explore the relationships between RoI features. Experimental results show that the proposed method improves detection performance on the proposed UAV-SOD dataset than representative state-of-the-art two-stage object detection methods.</p></div>","PeriodicalId":8417,"journal":{"name":"Array","volume":"22 ","pages":"Article 100343"},"PeriodicalIF":0.0,"publicationDate":"2024-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2590005624000092/pdfft?md5=c8639340099f7cc1f4ba21449477dc2a&pid=1-s2.0-S2590005624000092-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140551183","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Small group pedestrian crossing behaviour prediction using temporal angular 2D skeletal pose","authors":"Hanugra Aulia Sidharta , Berlian Al Kindhi , Eko Mulyanto Yuniarno , Mauridhi Hery Purnomo","doi":"10.1016/j.array.2024.100341","DOIUrl":"10.1016/j.array.2024.100341","url":null,"abstract":"<div><p>A pedestrian is classified as a Vulnerable Road User (VRU) because they do not have the protective equipment that would make them fatal if they were involved in an accident. An accident can happen while a pedestrian is on the road, especially when crossing the road. To ensure pedestrian safety, it is necessary to understand and predict pedestrian behaviour when crossing the road. We propose pedestrian intention prediction using a 2D pose estimation approach with temporal angle as a feature. Based on visual observation of the Joint Attention in Autonomous Driving (JAAD) dataset, we found that pedestrians tend to walk together in small groups while waiting to cross, and then this group is disbanded on the opposite side of the road. Thus, we propose to perform prediction with small group of pedestrians, based on pedestrian statistical data, we define a small group of pedestrians as consisting of 4 pedestrians. Another problem raised is 2D pose estimation is processing each pedestrian index individually, which creates ambiguous pedestrian index in consecutive frame. We propose Multi Input Single Output (MISO), which has capabilities to process multiple pedestrians together, and use summation layer at the end of the model to solve the ambiguous pedestrian index problem without performing tracking on each pedestrian. The performance of our proposed model achieves model accuracy of 0.9306 with prediction performance of 0.8317.</p></div>","PeriodicalId":8417,"journal":{"name":"Array","volume":"22 ","pages":"Article 100341"},"PeriodicalIF":0.0,"publicationDate":"2024-03-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2590005624000079/pdfft?md5=255bf8dee6ebbdca068e698762cee29a&pid=1-s2.0-S2590005624000079-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140091770","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing object detection in low-resolution images via frequency domain learning","authors":"Shuaiqiang Gao , Yunliang Chen , Ningning Cui , Wenjian Qin","doi":"10.1016/j.array.2024.100342","DOIUrl":"https://doi.org/10.1016/j.array.2024.100342","url":null,"abstract":"<div><p>To meet the requirements of navigation devices in terms of weight, power consumption, and size, it is necessary to capture low-resolution images or transmit low-resolution images to a server for object detection. However, due to the lack of details and frequency information, even state-of-the-art detection methods face challenges in accurately identifying objects. To tackle this issue, we introduce a novel upsampling method termed multi-wave representation upsampling, accompanied by a training strategy aimed at reinstating high-frequency details and augmenting the precision of object detection. Finally, we conduct empirical experiments showing that compared to alternative methodologies, our proposed approach yields images exhibiting minimal disparities in frequency compared to high-resolution counterparts. Additionally, it exhibits superior performance across objects of varying scales, while simultaneously demonstrating reduced parameter count and enhanced computational efficiency.</p></div>","PeriodicalId":8417,"journal":{"name":"Array","volume":"22 ","pages":"Article 100342"},"PeriodicalIF":0.0,"publicationDate":"2024-03-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2590005624000080/pdfft?md5=5c4a2e90b7f870b58f73cec79a3a6c25&pid=1-s2.0-S2590005624000080-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140122445","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Detection method of the seat belt for workers at height based on UAV image and YOLO algorithm","authors":"Yuzheng Liu , Jianxun Zhang , Lei Shi , Mingxiang Huang , Linyu Lin , Lingfeng Zhu , Xianglu Lin , Chuanlei Zhang","doi":"10.1016/j.array.2024.100340","DOIUrl":"https://doi.org/10.1016/j.array.2024.100340","url":null,"abstract":"<div><p>In the domain of outdoor construction within the power industry, working at significant heights is common, requiring stringent safety measures. Workers are mandated to wear hard hats and secure themselves with seat belts to prevent potential falls, ensuring their safety and reducing the risk of injuries. Detecting seat belt usage holds immense significance in safety inspections within the power industry. This study introduces detection method of the seat belt for workers at height based on UAV Image and YOLO Algorithm. The YOLOv5 approach involves integrating CSPNet into the Darknet53 backbone, incorporating the Focus layer into CSP-Darknet53, replacing the SPPF block in the SPP model, and implementing the CSPNet strategy in the PANet model. Experimental results demonstrate that the YOLOv5 algorithm achieves an elevated average accuracy of 99.2%, surpassing benchmarks set by FastRcnn, SSD, YOLOX-m, and YOLOv7. It also demonstrates superior adaptability in scenarios involving smaller objects, validated using a UAV-collected dataset of seat belt images. These findings confirm the algorithm's compliance with performance criteria for seat belt detection at power construction sites, making a significant contribution to enhancing safety measures within the power industry's construction practices.</p></div>","PeriodicalId":8417,"journal":{"name":"Array","volume":"22 ","pages":"Article 100340"},"PeriodicalIF":0.0,"publicationDate":"2024-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2590005624000067/pdfft?md5=50dec4f4bfbf478e832b65943e75f531&pid=1-s2.0-S2590005624000067-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140042676","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}