Journal of Real-Time Image Processing: Latest Articles

$$\eta$$-repyolo: real-time object detection method based on $$\eta$$-RepConv and YOLOv8
IF 3.0 · CAS Quartile 4 · Computer Science
Journal of Real-Time Image Processing · Pub Date: 2024-05-03 · DOI: 10.1007/s11554-024-01462-4
Shuai Feng, Huaming Qian, Huilin Wang, Wenna Wang
{"title":"$$eta$$ -repyolo: real-time object detection method based on $$eta$$ -RepConv and YOLOv8","authors":"Shuai Feng, Huaming Qian, Huilin Wang, Wenna Wang","doi":"10.1007/s11554-024-01462-4","DOIUrl":"https://doi.org/10.1007/s11554-024-01462-4","url":null,"abstract":"<p>Deep learning-based object detection methods often grapple with excessive model parameters, high complexity, and subpar real-time performance. In response, the YOLO series, particularly the YOLOv5s to YOLOv8s methods, has been developed by scholars to strike a balance between real-time processing and accuracy. Nevertheless, YOLOv8’s precision can fall short in certain specific applications. To address this, we introduce a real-time object detection method called <span>(eta)</span>-RepYOLO, which is built upon the <span>(eta)</span>-RepConv structure. This method is designed to maintain consistent detection speeds while improving accuracy. We begin by crafting a backbone network named <span>(eta)</span>-EfficientRep, which utilizes a strategically designed network unit-<span>(eta)</span>-RepConv and <span>(eta)</span>-RepC2f module, to reparameterize and subsequently generate an efficient inference model. This model achieves superior performance by extracting detailed feature maps from images. Subsequently, we propose the enhanced <span>(eta)</span>-RepPANet and <span>(eta)</span>-RepAFPN as the model’s detection neck, with the addition of the <span>(eta)</span>-RepC2f for optimized feature fusion, thus boosting the neck’s functionality. Our innovation continues with the development of an advanced decoupled head for detection, where the <span>(eta)</span>-RepConv takes the place of the traditional <span>(3 times 3)</span> conv, resulting in a marked increase in detection precision during the inference stage. Our proposed <span>(eta)</span>-RepYOLO method, when applied to distinct neck modules, <span>(eta)</span>-RepPANet and <span>(eta)</span>-RepAFPN, achieves mAP of 84.77%/85.65% on the PASCAL VOC07+12 dataset and AP of 45.3%/45.8% on the MSCOCO dataset, respectively. These figures represent a significant advancement over the YOLOv8s method. Additionally, the model parameters for <span>(eta)</span>-RepYOLO are reduced to 10.8M/8.8M, which is 3.6%/21.4% less than that of YOLOv8, culminating in a more streamlined detection model. The detection speeds clocked on an RTX3060 are 116 FPS/81 FPS, showcasing a substantial enhancement in comparison to YOLOv8s. In summary, our approach delivers competitive performance and presents a more lightweight alternative to the SOTA YOLO models, making it a robust choice for real-time object detection applications.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"31 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140882430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
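A note on the reparameterization idea: modules like $\eta$-RepConv train with several parallel branches and collapse them into a single convolution for inference. The PyTorch sketch below shows the standard RepVGG-style fusion of a 3×3 and a 1×1 branch; the exact $\eta$-RepConv branch topology is not given in the abstract, so this is an illustrative assumption, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fuse_rep_branches(conv3x3: nn.Conv2d, conv1x1: nn.Conv2d) -> nn.Conv2d:
    """Fuse parallel 3x3 and 1x1 conv branches into one 3x3 conv.

    Because convolution is linear, the sum of the two branch outputs
    equals a single convolution whose kernel is the 3x3 kernel plus the
    1x1 kernel zero-padded to 3x3.
    """
    fused = nn.Conv2d(conv3x3.in_channels, conv3x3.out_channels,
                      kernel_size=3, padding=1, bias=True)
    k1 = F.pad(conv1x1.weight.data, [1, 1, 1, 1])   # 1x1 -> 3x3 kernel
    fused.weight.data.copy_(conv3x3.weight.data + k1)
    zeros = torch.zeros(conv3x3.out_channels)
    b3 = conv3x3.bias.data if conv3x3.bias is not None else zeros
    b1 = conv1x1.bias.data if conv1x1.bias is not None else zeros
    fused.bias.data.copy_(b3 + b1)
    return fused

# Sanity check: the fused conv reproduces the two-branch output exactly.
c3 = nn.Conv2d(8, 8, 3, padding=1)
c1 = nn.Conv2d(8, 8, 1)
x = torch.randn(1, 8, 32, 32)
assert torch.allclose(c3(x) + c1(x), fuse_rep_branches(c3, c1)(x), atol=1e-5)
```

Since the fusion is exact, the deployed model keeps the accuracy of the multi-branch training block at the cost of a single plain convolution, which is why inference speed improves without an accuracy penalty.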
Real-time and accurate model of instance segmentation of foods
IF 3.0 · CAS Quartile 4 · Computer Science
Journal of Real-Time Image Processing · Pub Date: 2024-04-30 · DOI: 10.1007/s11554-024-01459-z
Yuhe Fan, Lixun Zhang, Canxing Zheng, Yunqin Zu, Keyi Wang, Xingyuan Wang
{"title":"Real-time and accurate model of instance segmentation of foods","authors":"Yuhe Fan, Lixun Zhang, Canxing Zheng, Yunqin Zu, Keyi Wang, Xingyuan Wang","doi":"10.1007/s11554-024-01459-z","DOIUrl":"https://doi.org/10.1007/s11554-024-01459-z","url":null,"abstract":"<p>Instance segmentation of foods is an important technology to ensure the food success rate of meal-assisting robotics. However, due to foods have strong intraclass variability, interclass similarity, and complex physical properties, which leads to more challenges in recognition, localization, and contour acquisition of foods. To address the above issues, this paper proposed a novel method for instance segmentation of foods. Specifically, in backbone network, deformable convolution was introduced to enhance the ability of YOLOv8 architecture to capture finer-grained spatial information, and efficient multiscale attention based on cross-spatial learning was introduced to improve sensitivity and expressiveness of multiscale inputs. In neck network, classical convolution and C2f modules were replaced by lightweight convolution GSConv and improved VoV-GSCSP aggregation module, respectively, to improve inference speed of models. We abbreviated it as the DEG-YOLOv8n-seg model. The proposed method was compared with baseline model and several state-of-the-art (SOTA) segmentation models on datasets, respectively. The results show that the DEG-YOLOv8n-seg model has higher accuracy, faster speed, and stronger robustness. Specifically, the DEG-YOLOv8n-seg model can achieve 84.6% Box_mAP@0.5 and 84.1% Mask_mAP@0.5 accuracy at 55.2 FPS and 11.1 GFLOPs. The importance of adopting data augmentation and the effectiveness of introducing deformable convolution, EMA, and VoV-GSCSP were verified by ablation experiments. Finally, the DEG-YOLOv8n-seg model was applied to experiments of food instance segmentation for meal-assisting robots. The results show that the DEG-YOLOv8n-seg can achieve better instance segmentation of foods. This work can promote the development of intelligent meal-assisting robotics technology and can provide theoretical foundations for other tasks of the computer vision field with some reference value.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"21 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140829793","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
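As a sketch of the lightweight-neck idea, the snippet below implements a GSConv-style block in PyTorch, assuming the commonly published design: a dense convolution produces half the output channels, a depthwise convolution produces the other half, and the two halves are concatenated and channel-shuffled. The authors' exact kernel sizes and hyperparameters are not given in the abstract, so those here are assumptions.

```python
import torch
import torch.nn as nn

class GSConv(nn.Module):
    """GSConv-style block (sketch): dense conv + depthwise conv,
    concatenated and shuffled. c_out must be even."""
    def __init__(self, c_in: int, c_out: int, k: int = 3, s: int = 1):
        super().__init__()
        c_half = c_out // 2
        self.dense = nn.Sequential(
            nn.Conv2d(c_in, c_half, k, s, k // 2, bias=False),
            nn.BatchNorm2d(c_half), nn.SiLU())
        self.depthwise = nn.Sequential(
            nn.Conv2d(c_half, c_half, 5, 1, 2, groups=c_half, bias=False),
            nn.BatchNorm2d(c_half), nn.SiLU())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a = self.dense(x)
        b = self.depthwise(a)
        y = torch.cat([a, b], dim=1)              # (N, c_out, H, W)
        # Channel shuffle: interleave dense and depthwise halves.
        n, c, h, w = y.shape
        return y.view(n, 2, c // 2, h, w).transpose(1, 2).reshape(n, c, h, w)
```

The shuffle interleaves channels from the cheap depthwise branch with the dense branch, so information mixes across the whole channel dimension at a fraction of the FLOPs of a standard convolution.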
Enhancing UAV tracking: a focus on discriminative representations using contrastive instances
IF 3.0 · CAS Quartile 4 · Computer Science
Journal of Real-Time Image Processing · Pub Date: 2024-04-21 · DOI: 10.1007/s11554-024-01456-2
Xucheng Wang, Dan Zeng, Yongxin Li, Mingliang Zou, Qijun Zhao, Shuiwang Li
{"title":"Enhancing UAV tracking: a focus on discriminative representations using contrastive instances","authors":"Xucheng Wang, Dan Zeng, Yongxin Li, Mingliang Zou, Qijun Zhao, Shuiwang Li","doi":"10.1007/s11554-024-01456-2","DOIUrl":"https://doi.org/10.1007/s11554-024-01456-2","url":null,"abstract":"<p>Addressing the core challenges of achieving both high efficiency and precision in UAV tracking is crucial due to limitations in computing resources, battery capacity, and maximum load capacity on UAVs. Discriminative correlation filter (DCF)-based trackers excel in efficiency on a single CPU but lag in precision. In contrast, many lightweight deep learning (DL)-based trackers based on model compression strike a better balance between efficiency and precision. However, higher compression rates can hinder performance by diminishing discriminative representations. Given these challenges, our paper aims to enhance feature representations’ discriminative abilities through an innovative feature-learning approach. We specifically emphasize leveraging contrasting instances to achieve more distinct representations for effective UAV tracking. Our method eliminates the need for manual annotations and facilitates the creation and deployment of lightweight models. As far as our knowledge goes, we are the pioneers in exploring the possibilities of contrastive learning in UAV tracking applications. Through extensive experimentation across four UAV benchmarks, namely, UAVDT, DTB70, UAV123@10fps and VisDrone2018, We have shown that our DRCI (discriminative representation with contrastive instances) tracker outperforms current state-of-the-art UAV tracking methods, underscoring its potential to effectively tackle the persistent challenges in this field.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"56 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140637100","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
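The contrastive-instance idea can be made concrete with a standard InfoNCE objective. The sketch below is a generic formulation, assuming one positive and K negative instance embeddings per query; the abstract does not specify DRCI's exact loss, so the temperature and pairing scheme are assumptions.

```python
import torch
import torch.nn.functional as F

def info_nce(query: torch.Tensor, positive: torch.Tensor,
             negatives: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """InfoNCE loss for one query embedding.

    query:     (D,)   anchor instance embedding
    positive:  (D,)   embedding that should match the query
    negatives: (K, D) contrasting instance embeddings
    """
    q = F.normalize(query, dim=-1)
    pos = F.normalize(positive, dim=-1)
    neg = F.normalize(negatives, dim=-1)
    # Cosine similarities; index 0 is the positive pair.
    logits = torch.cat([(q * pos).sum(-1, keepdim=True), neg @ q])
    labels = torch.zeros(1, dtype=torch.long)    # positive is class 0
    return F.cross_entropy(logits.unsqueeze(0) / temperature, labels)

loss = info_nce(torch.randn(128), torch.randn(128), torch.randn(64, 128))
```

Minimizing this loss pulls the query toward its positive instance and pushes it away from the contrasting instances, which is what sharpens the discriminative power of compressed representations.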
A novel real-time pixel-level road crack segmentation network
IF 3.0 · CAS Quartile 4 · Computer Science
Journal of Real-Time Image Processing · Pub Date: 2024-04-20 · DOI: 10.1007/s11554-024-01458-0
Rongdi Wang, Hao Wang, Zhenhao He, Jianchao Zhu, Haiqiang Zuo
{"title":"A novel real-time pixel-level road crack segmentation network","authors":"Rongdi Wang, Hao Wang, Zhenhao He, Jianchao Zhu, Haiqiang Zuo","doi":"10.1007/s11554-024-01458-0","DOIUrl":"https://doi.org/10.1007/s11554-024-01458-0","url":null,"abstract":"<p>Road crack detection plays a vital role in preserving the life of roads and ensuring driver safety. Traditional methods relying on manual observation have limitations in terms of subjectivity and inefficiency in quantifying damage. In recent years, advances in deep learning techniques have held promise for automated crack detection, but challenges, such as low contrast, small datasets, and inaccurate localization, remain. In this paper, we propose a deep learning-based pixel-level road crack segmentation network that achieves excellent performance on multiple datasets. In order to enrich the receptive fields of conventional convolutional modules, we design a residual asymmetric convolutional module for feature extraction. In addition to this, a multiple receptive field cascade module and a feature fusion module with non-local attention are proposed. Our network demonstrates superior accuracy and inference speed, achieving 55.60%, 59.01%, 75.65%, and 57.95% IoU on the CrackForest, CrackTree, CDD, and Crack500 datasets, respectively. It also has the ability to process 143 images per second. Experimental results and analysis validate the effectiveness of our approach. This work contributes to the advancement of road crack detection, providing a valuable tool for road maintenance and safety improvement.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"8 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140629057","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
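To illustrate the receptive-field enrichment idea, here is a minimal PyTorch sketch of a residual asymmetric convolution unit, assuming parallel 3×3, 1×3, and 3×1 branches summed under a skip connection; the paper's actual module layout may differ.

```python
import torch.nn as nn

class ResidualAsymmetricConv(nn.Module):
    """Sketch of a residual asymmetric convolution unit: parallel 3x3,
    1x3, and 3x1 branches are summed, normalized, and added to a skip
    connection, enlarging the effective receptive field cheaply."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv3x3 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv1x3 = nn.Conv2d(channels, channels, (1, 3), padding=(0, 1))
        self.conv3x1 = nn.Conv2d(channels, channels, (3, 1), padding=(1, 0))
        self.bn = nn.BatchNorm2d(channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        y = self.conv3x3(x) + self.conv1x3(x) + self.conv3x1(x)
        return self.act(self.bn(y) + x)
```

The horizontal and vertical 1-D kernels are a natural fit for cracks, which tend to be thin, elongated structures, while the residual path preserves fine detail through deep stacks.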
Improved feature extraction network in lightweight YOLOv7 model for real-time vehicle detection on low-cost hardware
IF 3.0 · CAS Quartile 4 · Computer Science
Journal of Real-Time Image Processing · Pub Date: 2024-04-20 · DOI: 10.1007/s11554-024-01457-1
Johan Lela Andika, Anis Salwa Mohd Khairuddin, Harikrishnan Ramiah, Jeevan Kanesan
{"title":"Improved feature extraction network in lightweight YOLOv7 model for real-time vehicle detection on low-cost hardware","authors":"Johan Lela Andika, Anis Salwa Mohd Khairuddin, Harikrishnan Ramiah, Jeevan Kanesan","doi":"10.1007/s11554-024-01457-1","DOIUrl":"https://doi.org/10.1007/s11554-024-01457-1","url":null,"abstract":"<p>The advancement of unmanned aerial vehicles (UAVs) has drawn researchers to update object detection algorithms for better accuracy and computation performance. Previous works applying deep learning models for object detection applications required high graphics processing unit (GPU) computation power. Generally, object detection models suffer trade-off between accuracy and model size where the relationship is not always linear in deep learning models. Various factors such as architectural design, optimization techniques, and dataset characteristics can significantly influence the accuracy, model size, and computation cost in adopting object detection models for low-cost embedded devices. Hence, it is crucial to employ lightweight object detection models for real-time object identification for the solution to be sustainable. In this work, an improved feature extraction network is proposed by incorporating an efficient long-range aggregation network for vehicle detection (ELAN-VD) in the backbone layer. The architecture improvement in YOLOv7-tiny model is proposed to improve the accuracy of detecting small vehicles in the aerial image. Besides that, the image size output of the second and third prediction boxes is upscaled for better performance. This study showed that the proposed method yields a mean average precision (mAP) of 57.94%, which is higher than that of the conventional YOLOv7-tiny. In addition, the proposed model showed significant performance when compared to previous works, making it viable for application in low-cost embedded devices.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"1 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140630353","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
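For intuition about long-range aggregation, the sketch below shows a generic ELAN-style block in PyTorch: stacked convolution stages whose intermediate outputs are all concatenated and fused by a 1×1 convolution, so features and gradients flow across many depths. The actual ELAN-VD topology (stage count, channel widths) is not given in the abstract, so those are assumptions here.

```python
import torch
import torch.nn as nn

class ELANBlock(nn.Module):
    """Sketch of an efficient long-range aggregation block: n stacked
    conv stages; the input and every intermediate output are
    concatenated, then fused back to c channels by a 1x1 conv."""
    def __init__(self, c: int, n: int = 3):
        super().__init__()
        self.stages = nn.ModuleList(
            nn.Sequential(nn.Conv2d(c, c, 3, padding=1, bias=False),
                          nn.BatchNorm2d(c), nn.SiLU())
            for _ in range(n))
        self.fuse = nn.Conv2d(c * (n + 1), c, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [x]
        for stage in self.stages:
            feats.append(stage(feats[-1]))
        return self.fuse(torch.cat(feats, dim=1))
```

Aggregating every stage output gives small objects, whose evidence fades quickly with depth, several shallow paths to the detection head, which matches the paper's goal of detecting small vehicles on weak hardware.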
Driver fatigue detection based on improved YOLOv7
IF 3.0 · CAS Quartile 4 · Computer Science
Journal of Real-Time Image Processing · Pub Date: 2024-04-13 · DOI: 10.1007/s11554-024-01455-3
Xianguo Li, Xueyan Li, Zhenqian Shen, Guangmin Qian
{"title":"Driver fatigue detection based on improved YOLOv7","authors":"Xianguo Li, Xueyan Li, Zhenqian Shen, Guangmin Qian","doi":"10.1007/s11554-024-01455-3","DOIUrl":"https://doi.org/10.1007/s11554-024-01455-3","url":null,"abstract":"<p>Fatigue driving is one of the main reasons threatening road traffic safety. Aiming at the problems of complex detection process, low accuracy, and susceptibility to light interference in the current driver fatigue detection algorithm, this paper proposes a driver Eye State detection algorithm based on YOLO, abbreviated as ES-YOLO. The algorithm optimizes the structure of YOLOv7, integrates the multi-scale features using the convolutional block attention mechanism (CBAM), and improves the attention to important spatial locations in the image. Furthermore, using the Focal-EIOU Loss instead of CIOU Loss to increase the attention on difficult samples and reduce the influence of sample class imbalance. Then, based on ES-YOLO, a driver fatigue detection method is proposed, and the driver fatigue judgment logic is designed to monitor the fatigue state in real-time and alarm in time to improve the accuracy of detection. The experiments on the public dataset CEW and the self-made dataset show that the proposed ES-YOLO obtained 99.0% and 98.8% mAP values, respectively, which are better than the compared algorithms. And this method achieves real-time and accurate detection of driver fatigue status. Source code is released in https://www.github/driver-fatigue-detection.git.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"301 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-04-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140601871","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
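The abstract does not spell out the fatigue judgment logic, but a common approach, shown as a hedged sketch below, is a PERCLOS-style rule: accumulate per-frame eye-state detections in a sliding window and raise an alarm when the closed-eye ratio exceeds a threshold. The window length and threshold here are illustrative assumptions, not the paper's values.

```python
from collections import deque

class FatigueMonitor:
    """PERCLOS-style fatigue logic (sketch): track the fraction of
    'eyes closed' detections over a sliding window of recent frames."""
    def __init__(self, window: int = 90, threshold: float = 0.4):
        self.states = deque(maxlen=window)   # ~3 s of frames at 30 FPS
        self.threshold = threshold

    def update(self, eyes_closed: bool) -> bool:
        """Feed one per-frame detection; returns True if an alarm
        should be raised."""
        self.states.append(eyes_closed)
        closed_ratio = sum(self.states) / len(self.states)
        return closed_ratio >= self.threshold
```

Windowed voting like this smooths over single-frame detector errors, which is why per-frame eye-state accuracy (the 99.0%/98.8% mAP above) translates into stable fatigue alarms.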
Real-time semantic segmentation network based on parallel atrous convolution for short-term dense concatenate and attention feature fusion
IF 3.0 · CAS Quartile 4 · Computer Science
Journal of Real-Time Image Processing · Pub Date: 2024-04-10 · DOI: 10.1007/s11554-024-01453-5
Lijun Wu, Shangdong Qiu, Zhicong Chen
{"title":"Real-time semantic segmentation network based on parallel atrous convolution for short-term dense concatenate and attention feature fusion","authors":"Lijun Wu, Shangdong Qiu, Zhicong Chen","doi":"10.1007/s11554-024-01453-5","DOIUrl":"https://doi.org/10.1007/s11554-024-01453-5","url":null,"abstract":"<p>To address the problem of incomplete segmentation of large objects and miss-segmentation of tiny objects that is universally existing in semantic segmentation algorithms, PACAMNet, a real-time segmentation network based on short-term dense concatenate of parallel atrous convolution and fusion of attentional features is proposed, called PACAMNet. First, parallel atrous convolution is introduced to improve the short-term dense concatenate module. By adjusting the size of the atrous factor, multi-scale semantic information is obtained to ensure that the last layer of the module can also obtain rich input feature maps. Second, attention feature fusion module is proposed to align the receptive fields of deep and shallow feature maps via depth-separable convolutions with different sizes, and the channel attention mechanism is used to generate weights to effectively fuse the deep and shallow feature maps. Finally, experiments are carried out based on both Cityscapes and CamVid datasets, and the segmentation accuracy achieve 77.4% and 74.0% at the inference speeds of 98.7 FPS and 134.6 FPS, respectively. Compared with other methods, PACAMNet improves the inference speed of the model while ensuring higher segmentation accuracy, so PACAMNet achieve a better balance between segmentation accuracy and inference speed.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"105 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140601724","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
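A minimal sketch of the parallel atrous idea follows: several 3×3 branches with different dilation rates see different context sizes at the same cost, and a 1×1 convolution fuses them. The specific rates and fusion used in PACAMNet are not stated in the abstract, so the values below are assumptions.

```python
import torch
import torch.nn as nn

class ParallelAtrous(nn.Module):
    """Parallel atrous convolution unit (sketch): branches with
    different dilation rates capture multi-scale context; padding
    equals the dilation rate so spatial size is preserved."""
    def __init__(self, c_in: int, c_out: int, rates=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(c_in, c_out, 3, padding=r, dilation=r, bias=False)
            for r in rates)
        self.fuse = nn.Sequential(
            nn.Conv2d(c_out * len(rates), c_out, 1),
            nn.BatchNorm2d(c_out), nn.ReLU(inplace=True))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))
```

Larger dilation rates help complete big objects while the rate-1 branch keeps tiny-object detail, which is exactly the large-object/tiny-object tension the paper targets.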
F2S-Net: learning frame-to-segment prediction for online action detection
IF 3.0 · CAS Quartile 4 · Computer Science
Journal of Real-Time Image Processing · Pub Date: 2024-04-10 · DOI: 10.1007/s11554-024-01454-4
Yi Liu, Yu Qiao, Yali Wang
{"title":"F2S-Net: learning frame-to-segment prediction for online action detection","authors":"Yi Liu, Yu Qiao, Yali Wang","doi":"10.1007/s11554-024-01454-4","DOIUrl":"https://doi.org/10.1007/s11554-024-01454-4","url":null,"abstract":"<p>Online action detection (OAD) aims at predicting action per frame from a streaming untrimmed video in real time. Most existing approaches leverage all the historical frames in the sliding window as the temporal context of the current frame since single-frame prediction is often unreliable. However, such a manner inevitably introduces useless even noisy video content, which often misleads action classifier when recognizing the ongoing action in the current frame. To alleviate this difficulty, we propose a concise and novel F2S-Net, which can adaptively discover the contextual segments in the online sliding window, and convert current frame prediction into relevant-segment prediction. More specifically, as the current frame can be either action or background, we develop F2S-Net with a distinct two-branch structure, i.e., the action (or background) branch can exploit the action (or background) segments. Via multi-level action supervision, these two branches can complementarily enhance each other, allowing to identify the contextual segments in the sliding window to robustly predict what is ongoing. We evaluate our approach on popular OAD benchmarks, i.e., THUMOS-14, TVSeries and HDD. The extensive results show that our F2S-Net outperforms the recent state-of-the-art approaches.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"22 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140601906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
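The two-branch segment selection is internal to F2S-Net, but the streaming sliding-window bookkeeping that any OAD method shares is easy to sketch. The snippet below assumes a hypothetical `model` that maps a window of per-frame features to current-frame class logits; the buffer size and feature interface are illustrative assumptions.

```python
from collections import deque
import torch

class StreamingWindow:
    """Online sliding-window driver for OAD (sketch): keep the last T
    per-frame features and query the model for the current frame."""
    def __init__(self, model, window: int = 64):
        self.model = model                   # (1, t, D) -> (1, num_classes)
        self.buffer = deque(maxlen=window)

    def step(self, frame_feat: torch.Tensor) -> int:
        """Feed one (D,) frame feature; return the predicted class id
        for the current (newest) frame."""
        self.buffer.append(frame_feat)
        window = torch.stack(list(self.buffer))      # (t, D), t <= T
        logits = self.model(window.unsqueeze(0))     # (1, num_classes)
        return int(logits.argmax(dim=-1))
```

F2S-Net's contribution sits inside the model call: rather than treating all T frames as equally relevant context, it selects the action- or background-consistent segments before classifying the newest frame.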
A safety helmet-wearing detection method based on cross-layer connection
IF 3.0 · CAS Quartile 4 · Computer Science
Journal of Real-Time Image Processing · Pub Date: 2024-04-09 · DOI: 10.1007/s11554-024-01437-5
Gang Dong, Yefei Zhang, Weicheng Xie, Yong Huang
{"title":"A safety helmet-wearing detection method based on cross-layer connection","authors":"Gang Dong, Yefei Zhang, Weicheng Xie, Yong Huang","doi":"10.1007/s11554-024-01437-5","DOIUrl":"https://doi.org/10.1007/s11554-024-01437-5","url":null,"abstract":"<p>Given the current safety helmet detection methods, the feature information of the small-scale safety helmet will be lost after the network model is convolved many times, resulting in the problem of missing detection of the safety helmet. To this end, an improved target detection algorithm of YOLOv5 is used to detect the wearing of safety helmets. Firstly, a new small-scale detection layer is added to the head of the network for multi-scale feature fusion, thereby increasing the receptive field area of the feature map to improve the model’s recognition of small targets. Secondly, a cross-layer connection is designed between the feature extraction network and the feature fusion network to enhance the fine-grained features of the target in the shallow layer of the network. Thirdly, a coordinate attention (CA) module is added to the cross-layer connection to capture the global information of the image and improve the localization ability of the target. Finally, the Normalized Wasserstein Distance (NWD) is used to measure the similarity between bounding boxes, replacing the intersection over union (IoU) method. The experimental results show that the improved model achieves 95.09% of the mAP value for safety helmet-wearing detection, which has a good effect on the recognition of small-sized safety helmets of different degrees in the construction work scene.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"19 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140601876","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
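NWD is worth a worked example: each box is modeled as a 2-D Gaussian, the second-order Wasserstein distance between the Gaussians has a closed form, and an exponential maps it into a (0, 1] similarity. The sketch below follows the commonly published formulation; the constant C is dataset-dependent (12.8 is the value often quoted for tiny-object benchmarks, an assumption here, not this paper's setting).

```python
import math

def nwd(box_a, box_b, c: float = 12.8) -> float:
    """Normalized Wasserstein distance between two boxes (cx, cy, w, h).

    Each box is modeled as a Gaussian N([cx, cy], diag(w/2, h/2)^2);
    the squared W2 distance is then a plain Euclidean distance over
    (cx, cy, w/2, h/2), mapped to a similarity by exp(-sqrt(.)/C).
    """
    (cxa, cya, wa, ha), (cxb, cyb, wb, hb) = box_a, box_b
    w2 = ((cxa - cxb) ** 2 + (cya - cyb) ** 2
          + ((wa - wb) / 2) ** 2 + ((ha - hb) / 2) ** 2)
    return math.exp(-math.sqrt(w2) / c)

# Unlike IoU, NWD stays informative for non-overlapping small boxes:
print(nwd((10, 10, 8, 8), (14, 10, 8, 8)))   # > 0 even with little overlap
```

This smooth gradient for small, barely overlapping boxes is precisely why NWD helps with small helmets, where a one-pixel shift can collapse IoU to zero.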
Equivalent convolution strategy for the evolution computation in parametric active contour model
IF 3.0 · CAS Quartile 4 · Computer Science
Journal of Real-Time Image Processing · Pub Date: 2024-04-05 · DOI: 10.1007/s11554-024-01434-8
Kelun Tang, Lin Lang, Xiaojun Zhou
{"title":"Equivalent convolution strategy for the evolution computation in parametric active contour model","authors":"Kelun Tang, Lin Lang, Xiaojun Zhou","doi":"10.1007/s11554-024-01434-8","DOIUrl":"https://doi.org/10.1007/s11554-024-01434-8","url":null,"abstract":"<p>Parametric active contour model is an efficient approach for image segmentation. However, the high cost of evolution computation has restricted their potential applications to contour segmentation with long perimeter. Extensive algorithm debugging and analysis indicate that the inverse matrix calculation and the matrix multiplication are the two major reasons. In this paper, a novel simple and efficient algorithm for evolution computation is proposed. Motivated by the relationship between the eigenvalues and the entries in the circular Toeplitz matrix, each entry expression of inverse matrix is firstly derived through mathematical deduction, and then, the matrix multiplication is simplified into a more efficient convolution operation. Experimental results show that the proposed algorithm can significantly improve the computational speed by one to two orders of magnitude and is even more efficient for contour extraction with large perimeter.</p>","PeriodicalId":51224,"journal":{"name":"Journal of Real-Time Image Processing","volume":"85 1","pages":""},"PeriodicalIF":3.0,"publicationDate":"2024-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140601904","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
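The circulant structure the authors exploit can be demonstrated with the classic snake update $\mathbf{x}_{t+1} = (A + \gamma I)^{-1}(\gamma \mathbf{x}_t + \mathbf{f}_t)$. A circulant matrix is diagonalized by the DFT, so applying $(A + \gamma I)^{-1}$ reduces to dividing by its eigenvalues in the Fourier domain, i.e. a circular convolution. The NumPy sketch below assumes the standard pentadiagonal internal-energy matrix of the Kass-Witkin-Terzopoulos snake; the paper derives closed-form entries of the inverse rather than using the FFT, so this is an equivalent-in-effect illustration, not the authors' algorithm.

```python
import numpy as np

def snake_step(x, force, alpha=0.1, beta=0.1, gamma=1.0):
    """One semi-implicit snake evolution step via the FFT.

    x:     (n,) complex array of contour points (x + 1j*y), n >= 5
    force: (n,) complex external force sampled at the points

    Because the internal-energy matrix A is circulant, applying
    (A + gamma*I)^{-1} is a circular convolution, computed here by
    dividing by the matrix eigenvalues in the Fourier domain.
    """
    n = len(x)
    row = np.zeros(n)                      # first row of circulant A
    row[[0, 1, 2, -2, -1]] = [2 * alpha + 6 * beta,
                              -alpha - 4 * beta, beta,
                              beta, -alpha - 4 * beta]
    eig = np.fft.fft(row) + gamma          # eigenvalues of A + gamma*I
    v = gamma * x + force
    return np.fft.ifft(np.fft.fft(v) / eig)
```

Either route, closed-form inverse entries convolved directly or eigenvalue division via the FFT, replaces the O(n^2) matrix-vector product with an O(n log n) or short-kernel O(n) operation, which is where the one-to-two orders of magnitude speedup comes from.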