Journal of Real-Time Image Processing: Latest Articles

High-precision real-time autonomous driving target detection based on YOLOv8
IF 3.0 · Zone 4 · Computer Science
Journal of Real-Time Image Processing Pub Date : 2024-09-19 DOI: 10.1007/s11554-024-01553-2
Huixin Liu, Guohua Lu, Mingxi Li, Weihua Su, Ziyi Liu, Xu Dang, Dongyuan Zang
In traffic scenarios, target sizes vary significantly and computing power is limited, which makes accurate detection of traffic targets challenging. This paper proposes a new traffic target detection method that balances accuracy and real-time performance: Deep and Filtered You Only Look Once (DF-YOLO). To address the large differences in target scale within complex scenes, we design the Deep and Filtered Path Aggregation Network (DF-PAN), a module that effectively fuses multi-scale features and enhances the model's capability to detect multi-scale targets accurately. To address limited computational resources, we design a parameter-sharing detection head (PSD) and use Faster Neural Network (FasterNet) as the backbone. PSD reduces computational load through parameter sharing, allowing feature-extraction capability to be shared across positions, while FasterNet improves memory-access efficiency, maximizing the use of computational resources. Experimental results on the KITTI dataset show that our method achieves a satisfactory balance between real-time performance and precision, reaching 90.9% mean average precision (mAP) at 77 frames/s, with 28.1% fewer parameters and 3% higher detection accuracy than the baseline model. We also test it on the challenging BDD100K and SODA10M datasets, and the results show that DF-YOLO has excellent generalization ability.
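The parameter-sharing idea behind a head like PSD can be illustrated independently of the paper: one set of convolution weights is applied at every pyramid level, so detecting at more scales adds no parameters. The sketch below is a minimal NumPy illustration of that idea, not the authors' implementation; the class name, shapes, and single-channel simplification are all assumptions.

```python
import numpy as np

def conv2d(x, w):
    """Naive 'valid' 2D convolution of a single-channel map x with kernel w."""
    kh, kw = w.shape
    H, W = x.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out

class SharedHead:
    """One set of kernel weights reused on every pyramid level, so the
    parameter count stays constant no matter how many scales are processed."""
    def __init__(self, k=3, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.standard_normal((k, k))

    def __call__(self, feature_maps):
        # The same self.w serves every level: weight sharing across scales.
        return [conv2d(f, self.w) for f in feature_maps]

head = SharedHead()
p3, p4 = np.ones((16, 16)), np.ones((8, 8))   # two pyramid levels
o3, o4 = head([p3, p4])
```

However many levels are appended, the head still owns exactly one k×k kernel, which is the source of the parameter saving.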
Citations: 0
GMS-YOLO: an enhanced algorithm for water meter reading recognition in complex environments
IF 3.0 · Zone 4 · Computer Science
Journal of Real-Time Image Processing Pub Date : 2024-09-13 DOI: 10.1007/s11554-024-01551-4
Yu Wang, Xiaodong Xiang
The disordered arrangement of water-meter pipes and the random rotation angles of their mechanical character wheels frequently result in captured water-meter images exhibiting tilt, blur, and incomplete characters. These issues complicate detection and render traditional OCR (optical character recognition) methods inadequate, while the two-stage approach of first locating and then recognizing is overly cumbersome. In this paper, water-meter reading recognition is treated as an object-detection task: readings are extracted from the algorithm's predicted-box information, a water-meter dataset is established, and the framework is refined to improve recognition of incomplete characters. Using YOLOv8n as the baseline, we propose GMS-YOLO, a novel object-detection algorithm built on Grouped Multi-Scale Convolution. First, substituting the Bottleneck module's convolution with GMSC (Grouped Multi-Scale Convolution) gives the model access to receptive fields at several scales, boosting its feature-extraction ability. Second, incorporating LSKA (Large Kernel Separable Attention) into the SPPF (Spatial Pyramid Pooling Fast) module improves the perception of fine-grained features. Finally, replacing CIoU (Complete Intersection over Union) with the ShapeIoU bounding-box loss function improves localization and speeds up convergence. Evaluated on a self-built water-meter image dataset, GMS-YOLO attained a mAP@0.5 of 92.4% and a precision of 93.2%, improvements of 2.0% and 2.1% over YOLOv8n, respectively. Despite the increased computational burden, GMS-YOLO maintains an average detection time of 10 ms per image, meeting practical detection needs.
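Grouped multi-scale convolution as described, channel groups each convolved with a different kernel size, can be sketched in NumPy. This is only an illustration of the structure: mean kernels stand in for learned weights, and the group count and kernel sizes are assumptions, not the paper's configuration.

```python
import numpy as np

def conv_same(x, k):
    """Naive single-channel 'same' convolution with a k×k mean kernel."""
    pad = k // 2
    xp = np.pad(x, pad)
    out = np.empty_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = xp[i:i + k, j:j + k].mean()
    return out

def grouped_multiscale_conv(x, kernel_sizes=(1, 3, 5)):
    """Split channels into len(kernel_sizes) groups; each group sees a
    different receptive field, then the groups are concatenated back."""
    groups = np.array_split(x, len(kernel_sizes), axis=0)  # x: (C, H, W)
    out = [np.stack([conv_same(ch, k) for ch in g])
           for g, k in zip(groups, kernel_sizes)]
    return np.concatenate(out, axis=0)

x = np.random.default_rng(1).random((6, 8, 8))
y = grouped_multiscale_conv(x)
```

Because each group only convolves its own channel slice, the multi-scale receptive fields come at roughly the cost of one grouped convolution rather than three full ones.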
Citations: 0
Fast rough mode decision algorithm and hardware architecture design for AV1 encoder
IF 3.0 · Zone 4 · Computer Science
Journal of Real-Time Image Processing Pub Date : 2024-09-12 DOI: 10.1007/s11554-024-01552-3
Heng Chen, Xiaofeng Huang, Zehao Tao, Qinghua Sheng, Yan Cui, Yang Zhou, Haibing Yin
To enhance compression efficiency, the AV1 video coding standard introduces several new intra-prediction modes, such as smooth and finer directional prediction modes. This addition, however, increases computational complexity and hinders parallelized hardware implementation. In this paper, a hardware-friendly rough mode decision (RMD) algorithm and its fully pipelined hardware architecture are proposed to address these challenges. On the algorithm side, a novel directional-mode pruning algorithm is first proposed; the sum of absolute transformed differences (SATD) cost is then approximated by accumulation during the tree search; finally, in the reconstruction stage, a reconstruction approximation model based on the DC transform is proposed to solve the low-parallelism problem. On the hardware side, the proposed fully pipelined architecture is implemented with 28 pipeline stages and can process multiple prediction modes in parallel. Experimental results show that the proposed fast algorithm saves 46.8% of encoding time at the cost of a 1.96% Bjøntegaard delta rate (BD-rate) increase on average under the all-intra (AI) configuration. Synthesized in 28 nm UMC technology, the proposed hardware operates at 316.2 MHz with a gate count of 1113.14 K.
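The SATD cost at the center of the mode search is a standard quantity: the prediction residual is passed through a Hadamard transform and the absolute values of the coefficients are summed. A minimal NumPy version for power-of-two block sizes (the exact normalization and block size used by the paper are not specified here):

```python
import numpy as np

def hadamard(n):
    """Sylvester-construction Hadamard matrix of order n (n a power of two)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

def satd(block_a, block_b):
    """Sum of absolute transformed differences: the residual is passed
    through a 2D Hadamard transform and its coefficients are summed."""
    diff = block_a.astype(float) - block_b.astype(float)
    n = diff.shape[0]
    H = hadamard(n)
    coeffs = H @ diff @ H.T          # 2D transform: rows, then columns
    return np.abs(coeffs).sum()

a = np.arange(16).reshape(4, 4)
```

SATD is preferred over plain SAD in rough mode decision because the transform concentrates the residual energy, giving a cost that tracks the rate of the coded residual more closely.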
Citations: 0
AdaptoMixNet: detection of foreign objects on power transmission lines under severe weather conditions
IF 3.0 · Zone 4 · Computer Science
Journal of Real-Time Image Processing Pub Date : 2024-09-12 DOI: 10.1007/s11554-024-01546-1
Xinghai Jia, Chao Ji, Fan Zhang, Junpeng Liu, Mingjiang Gao, Xinbo Huang
With the expansion of power transmission lines, the surrounding environment is complex and prone to foreign-object intrusion, which severely threatens safe operation. Current algorithms lack stability and real-time performance for small-target detection and severe weather. This paper therefore proposes AdaptoMixNet, a method for detecting foreign objects on power transmission lines under severe weather conditions. First, an Adaptive Fusion Module (AFM) improves the model's accuracy and adaptability through multi-scale feature extraction, fine-grained information preservation, and enhanced context information. Second, an Adaptive Feature Pyramid Module (AEFPM) sharpens the focus on local details while preserving global information, improving the stability and robustness of feature representation. Finally, the Neuron Expansion Recursion Adaptive Filter (CARAFE) strengthens feature extraction, adaptive filtering, and recursive mechanisms, improving detection accuracy, robustness, and computational efficiency. Experimental results show that the proposed method performs excellently in detecting foreign objects on power transmission lines under complex backgrounds and harsh weather conditions.
Citations: 0
Mfdd: Multi-scale attention fatigue and distracted driving detector based on facial features
IF 3.0 · Zone 4 · Computer Science
Journal of Real-Time Image Processing Pub Date : 2024-09-11 DOI: 10.1007/s11554-024-01549-y
Yulin Shi, Jintao Cheng, Xingming Chen, Jiehao Luo, Xiaoyu Tang
With the rapid expansion of the automotive industry and the continuous growth of vehicle fleets, traffic safety has become a critical global issue, and detection and alert systems for fatigue and distracted driving are essential to improving it. Factors such as variation in the driver's facial details, lighting conditions, and camera pixel quality significantly affect detection accuracy, often leaving existing methods ineffective. This study introduces a new network for detecting fatigue and distracted driving against the complex backgrounds typical inside vehicles. To extract driver, facial, and gradient information more efficiently, we introduce the Multihead Difference Kernel Convolution Module (MDKC) and the Multiscale Large Convolutional Fusion Module (MLCF) into the baseline, blending multihead mixed convolution with large and small convolutional kernels to amplify the spatial detail captured by the backbone. To extract gradient details under different illumination and noise conditions, we enhance the network's neck with an Adaptive Convolutional Attention Module (ACAM), optimizing feature retention. Extensive comparative experiments validate the network's efficacy, showing superior performance on the Fatigue and Distracted Driving Dataset and competitive results on the public COCO dataset. Source code is available at https://github.com/SCNU-RISLAB/MFDD.
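The name "difference kernel convolution" suggests the difference-convolution family, where a vanilla convolution is blended with a central-difference term that responds to local gradients rather than raw intensity. The single-channel sketch below is speculative, not the authors' MDKC; the blend weight `theta` and the mean-style kernel are invented for illustration.

```python
import numpy as np

def difference_conv(x, w, theta=0.7):
    """Difference-kernel convolution (sketch): blend a vanilla convolution
    with a central-difference term that reacts to local gradients."""
    kh, kw = w.shape
    out = np.zeros((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = x[i:i + kh, j:j + kw]
            vanilla = np.sum(patch * w)
            # Subtracting the center pixel makes the response intensity-invariant.
            center_diff = np.sum((patch - patch[kh // 2, kw // 2]) * w)
            out[i, j] = (1 - theta) * vanilla + theta * center_diff
    return out
```

With `theta=1` the operator ignores flat regions entirely, which is why gradient-style kernels help under varying illumination: a brighter but equally flat patch produces the same (zero) response.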
Citations: 0
Integrating YOLOv8 and CSPBottleneck based CNN for enhanced license plate character recognition
IF 3.0 · Zone 4 · Computer Science
Journal of Real-Time Image Processing Pub Date : 2024-09-10 DOI: 10.1007/s11554-024-01537-2
Sahil Khokhar, Deepak Kedia
The paper introduces an integrated methodology for license plate character recognition, combining YOLOv8 for segmentation with a CSPBottleneck-based CNN classifier for character recognition. The approach incorporates pre-processing techniques to improve recognition of partial plates and augmentation methods to address challenges arising from colour diversity. Performance analysis demonstrates YOLOv8's high segmentation accuracy and fast processing, complemented by precise character recognition and efficient processing in the CNN classifier. The integrated system achieves an overall accuracy of 99.02% with a total processing time of 9.9 ms, offering a robust solution for automated license plate recognition (ALPR) systems and a promising basis for practical ALPR deployment and further development of license plate recognition systems.
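One step any segment-then-classify ALPR pipeline needs is assembling per-character results into a plate string. A minimal illustration of that assembly, ordering detections left to right and dropping low-confidence boxes, is shown below; the tuple layout and the 0.5 threshold are assumptions, not details from the paper.

```python
def assemble_plate(char_detections, min_conf=0.5):
    """Order per-character detections left-to-right and join their labels.
    Each detection is (x_min, label, confidence)."""
    kept = [d for d in char_detections if d[2] >= min_conf]
    kept.sort(key=lambda d: d[0])          # reading order on a single line
    return "".join(label for _, label, _ in kept)

# Hypothetical classifier output for one plate image:
dets = [(120, "B", 0.97), (40, "H", 0.99), (200, "7", 0.91),
        (80, "R", 0.30), (160, "2", 0.95)]
plate = assemble_plate(dets)
```

A production system would add a second vertical-clustering pass for two-line plates; sorting on x alone is only valid for single-line layouts.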
Citations: 0
A real-time visual SLAM based on semantic information and geometric information in dynamic environment
IF 3.0 · Zone 4 · Computer Science
Journal of Real-Time Image Processing Pub Date : 2024-09-10 DOI: 10.1007/s11554-024-01527-4
Hongli Sun, Qingwu Fan, Huiqing Zhang, Jiajing Liu
Simultaneous Localization and Mapping (SLAM) is the core technology that enables mobile robots to autonomously explore and perceive the environment. However, dynamic objects in the scene significantly degrade the accuracy and robustness of visual SLAM systems, limiting their applicability in real-world scenarios. We therefore propose a real-time RGB-D visual SLAM algorithm designed for indoor dynamic scenes. Our approach includes a parallel lightweight object-detection thread, which leverages the YOLOv7-tiny network to detect potential moving objects and generate 2D semantic information. A novel dynamic-feature removal strategy is then introduced in the tracking thread, integrating semantic information, geometric constraints, and feature-point depth-based RANSAC to effectively mitigate the influence of dynamic features. To evaluate the proposed algorithm, we conducted comparative experiments against other state-of-the-art algorithms on the TUM RGB-D and Bonn RGB-D datasets, as well as in real-world dynamic scenes. The results demonstrate that the algorithm maintains excellent accuracy and robustness in dynamic environments while exhibiting impressive real-time performance.
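The combination of semantic and geometric cues can be sketched as a two-gate filter: a feature survives only if it lies outside every detected dynamic-object box and passes a geometric-consistency check. In the simplified version below a plain reprojection-error threshold stands in for the paper's depth-based RANSAC; all names and thresholds are illustrative.

```python
def filter_dynamic_features(points, dynamic_boxes, reproj_err, err_thresh=2.0):
    """Keep a feature only if it is outside every dynamic-object box
    (semantic gate) AND its reprojection error is small (geometric gate)."""
    kept = []
    for (x, y), err in zip(points, reproj_err):
        in_box = any(x0 <= x <= x1 and y0 <= y <= y1
                     for x0, y0, x1, y1 in dynamic_boxes)
        if not in_box and err < err_thresh:
            kept.append((x, y))
    return kept

pts = [(10, 10), (50, 50), (90, 90)]
boxes = [(40, 40, 60, 60)]            # e.g. a detected person
errs = [0.5, 0.4, 5.0]                # pixels, from pose-only reprojection
static_pts = filter_dynamic_features(pts, boxes, errs)
```

The geometric gate matters because detection alone over-rejects: a parked car is boxed but static, while an undetected moving object is caught only by its inconsistent reprojection.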
Citations: 0
LGFF-YOLO: small object detection method of UAV images based on efficient local-global feature fusion
IF 3.0 · Zone 4 · Computer Science
Journal of Real-Time Image Processing Pub Date : 2024-09-06 DOI: 10.1007/s11554-024-01550-5
Hongxing Peng, Haopei Xie, Huanai Liu, Xianlu Guan
Images captured by Unmanned Aerial Vehicles (UAVs) play a significant role in many fields. However, as UAV technology develops, challenges such as detecting small, dense objects against complex backgrounds have emerged. In this paper, we propose LGFF-YOLO, a detection model that integrates a novel local-global feature fusion method with the YOLOv8 baseline, designed specifically for small-object detection in UAV imagery. Our approach employs the Global Information Fusion Module (GIFM) and the Four-Leaf Clover Fusion Module (FLCM) to improve the fusion of multi-scale features, raising detection accuracy without increasing model complexity. We further propose the RFA-Block and LDyHead to control the total parameter count and improve representation for small-object detection. Experimental results on the VisDrone2019 dataset demonstrate 38.3% mAP with only 4.15M parameters, a 4.5% improvement over the YOLOv8 baseline, while reaching 79.1 FPS for real-time detection. These advancements enhance the model's generalization, balance accuracy and speed, and significantly extend its applicability to small-object detection in UAV images.
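At its core, fusing a global (deep, low-resolution) map into a local (shallow, high-resolution) one reduces to upsampling and a normalized weighted sum. The sketch below is a generic fast-normalized fusion in that style, not GIFM or FLCM themselves; the nearest-neighbour upsampling and equal default weights are assumptions.

```python
import numpy as np

def fuse_levels(shallow, deep, w=(0.5, 0.5)):
    """Fuse a deep low-resolution map into a shallow high-resolution one:
    nearest-neighbour upsample the deep map, then take a normalized
    weighted sum of the two levels."""
    scale = shallow.shape[0] // deep.shape[0]
    up = np.kron(deep, np.ones((scale, scale)))   # nearest-neighbour upsample
    w = np.asarray(w, dtype=float)
    w = w / w.sum()                               # weights normalized to 1
    return w[0] * shallow + w[1] * up

s = np.zeros((8, 8))   # shallow level: fine spatial detail
d = np.ones((4, 4))    # deep level: global context
fused = fuse_levels(s, d)
```

Making `w` learnable (as weighted-fusion necks do) lets the network decide per feature map how much global context each fine-grained location should absorb.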
Citations: 0
A real-time foreign object detection method based on deep learning in complex open railway environments
IF 3.0 · Zone 4 · Computer Science
Journal of Real-Time Image Processing Pub Date : 2024-09-06 DOI: 10.1007/s11554-024-01548-z
Binlin Zhang, Qing Yang, Fengkui Chen, Dexin Gao
To address the many background confounds and low detection accuracy of foreign-object detection on open railways, a real-time deep-learning detection method for complex open railway environments is proposed. First, images of foreign objects intruding into the clearance gauge, collected by locomotives during long-term operation, are used to build a railway foreign-object dataset that reflects current conditions. Then, the YOLOv7-tiny network structure is improved to strengthen feature extraction and detection performance. A Simple, parameter-free Attention Module (SimAM) improves the representational ability of the ConvNet without adding parameters. Drawing on the weighted Bi-directional Feature Pyramid Network (BiFPN), the backbone achieves cross-level feature fusion through added edges and neck fusion. The feature-fusion layer is further improved with the GhostNetV2 module, which strengthens the fusion of features at different scales while greatly reducing computational load. Finally, the original loss function is replaced with the Normalized Wasserstein Distance (NWD) loss to improve recognition of small, distant targets. The proposed algorithm is trained, validated, and compared with other mainstream detection algorithms on the established railway foreign-object dataset. Experimental results show that it runs in real time on embedded devices with high accuracy and improved model performance, providing precise data support for railway safety assurance.
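The NWD loss models each box as a 2D Gaussian and maps the Wasserstein distance between the two Gaussians through an exponential, which, unlike IoU, stays informative when small boxes barely overlap. A minimal version following the published NWD formulation (the constant `C` is dataset-dependent; 12.8 here is only a placeholder):

```python
import math

def nwd(box_a, box_b, C=12.8):
    """Normalized Wasserstein Distance between boxes (cx, cy, w, h), each
    modelled as a 2D Gaussian centred on the box with covariance from w, h."""
    (cxa, cya, wa, ha), (cxb, cyb, wb, hb) = box_a, box_b
    # Squared 2-Wasserstein distance between the two Gaussians.
    w2 = ((cxa - cxb) ** 2 + (cya - cyb) ** 2
          + ((wa - wb) / 2) ** 2 + ((ha - hb) / 2) ** 2)
    return math.exp(-math.sqrt(w2) / C)

def nwd_loss(box_a, box_b):
    """Loss form: identical boxes give 0, distant boxes approach 1."""
    return 1.0 - nwd(box_a, box_b)
```

For two tiny boxes a few pixels apart, IoU is already 0 and gives no gradient, while NWD still varies smoothly with the center distance, which is exactly the small-distant-target property the paper relies on.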
Citations: 0
Efficient and real-time skin lesion image segmentation using spatial-frequency information and channel convolutional networks
IF 3.0 · Zone 4 · Computer Science
Journal of Real-Time Image Processing Pub Date : 2024-09-03 DOI: 10.1007/s11554-024-01542-5
Shangwang Liu, Bingyan Zhou, Yinghai Lin, Peixia Wang
Accurate segmentation of skin lesions is essential for physicians screening dermoscopy images, but existing methods commonly face three limitations: difficulty in accurately processing targets with coarse edges, frequent failure to recover detailed feature data, and inadequate fusion of multi-scale features. To overcome these problems, we propose a skin-lesion segmentation network (SFCC Net) that combines an attention mechanism with a redundancy-reduction strategy. The design begins with a downsampling encoder and an encoder composed of Receptive Field (REFC) Blocks, aimed at supplementing lost details and extracting latent features. The Spatial-Frequency-Channel (SF) Block is then employed to minimize feature redundancy and restore fine-grained information. To fully leverage previously learned features, an Up-sampling Convolution (UpC) Block integrates the information. The network's performance was compared with state-of-the-art models on four public datasets, and the results demonstrate significant improvements: on the ISIC datasets, the proposed network outperformed D-LKA Net by 4.19%, 0.19%, and 7.75% in F1, and by 2.14%, 0.51%, and 12.20% in IoU. The network's frame rate (FPS) on skin-lesion images underscores its suitability for real-time analysis, and its generalization capability was further validated on a lung dataset.
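A common way to separate the spatial-frequency information that the SF Block's name alludes to is a Fourier-domain split: low frequencies carry smooth lesion shape, high frequencies carry edges and texture. The sketch below is a generic illustration of such a split, not the paper's block; the circular mask and its radius are assumptions.

```python
import numpy as np

def split_spatial_frequency(feature, radius=4):
    """Split a 2D feature map into low- and high-frequency parts using a
    circular mask in the Fourier domain; the two parts sum back to the input."""
    F = np.fft.fftshift(np.fft.fft2(feature))
    h, w = feature.shape
    yy, xx = np.ogrid[:h, :w]
    mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2
    low = np.fft.ifft2(np.fft.ifftshift(F * mask)).real    # smooth structure
    high = np.fft.ifft2(np.fft.ifftshift(F * ~mask)).real  # edges, texture
    return low, high

x = np.random.default_rng(0).random((16, 16))
low, high = split_spatial_frequency(x)
```

Because the two masks partition the spectrum exactly, the decomposition is lossless: a network branch can process each part separately and the sum still reconstructs the original map.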
Citations: 0