Journal of Real-Time Image Processing: Latest Articles, Page 10

Multiple layers complexity allocation with dynamic control scheme for high-efficiency video coding

IF 3 | Zone 4, Computer Science

Journal of Real-Time Image Processing Pub Date: 2024-04-04 DOI: 10.1007/s11554-024-01452-6

Jiunn-Tsair Fang, Ju-Kai Chen

Citations: 0

SDPH: a new technique for spatial detection of path holes from huge volume high-resolution raster images in near real-time

IF 3 | Zone 4, Computer Science

Journal of Real-Time Image Processing Pub Date: 2024-04-04 DOI: 10.1007/s11554-024-01451-7

Murat Tasyurek

Abstract: Detecting and repairing road defects is crucial for road safety, vehicle maintenance, and tourism on well-maintained roads. However, monitoring all roads by vehicle is costly. With the widespread use of remote sensing technologies, high-resolution satellite images offer a cost-effective alternative. This study proposes a new technique, SDPH, for automated detection of damaged roads in very large, high-resolution satellite images. In SDPH, satellite images are organized in a pyramid grid file system so that deep learning methods can process them efficiently. The images are generated as 256 × 256 tiles and stored in a directory with explicit location information. SDPH employs two-stage object detection using classical and modified versions of RCNNv3, YOLOv5, and YOLOv8. In the first stage, which identifies roads, classical RCNNv3, YOLOv5, and YOLOv8 achieved F1 scores of 0.743, 0.716, and 0.710, and the modified RCNNv3, YOLOv5, and YOLOv8 achieved 0.955, 0.958, and 0.954, respectively. When the output of the modified YOLOv5, which had the highest first-stage F1 score, was fed to the second stage, the modified RCNNv3, YOLOv5, and YOLOv8 detected road defects with F1 scores of 0.957, 0.971, and 0.964. When the same CNN model was used for both road and road-defect detection, classical RCNNv3, improved RCNNv3, classical YOLOv5, improved YOLOv5, classical YOLOv8, and improved YOLOv8 achieved micro F1 scores of 0.752, 0.956, 0.726, 0.969, 0.720, and 0.965, respectively, and processed images at 11, 10, 33, 31, 37, and 36 FPS across both stages. Evaluations on GeoTIFF satellite images from Kayseri Metropolitan Municipality, ranging from 20 to 40 gigabytes, demonstrated the efficiency of the SDPH technique. Notably, the modified YOLOv5 performed best, detecting paths and defects in 0.032 s with a micro F1 score of 0.969. Fine-tuning on TileCache improved F1 scores and reduced computational costs across all models.
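The micro F1 scores quoted for SDPH pool detections across classes before a single score is computed. A minimal sketch of that pooling follows, assuming per-class true-positive, false-positive, and false-negative counts are available. The `Counts` type, the function names, and any numbers passed in are illustrative and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Counts:
    """Detection counts for one class (hypothetical container)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives


def f1_score(c: Counts) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when there are no counts."""
    denom = 2 * c.tp + c.fp + c.fn
    return 2 * c.tp / denom if denom else 0.0


def micro_f1(per_class: Iterable[Counts]) -> float:
    """Micro-averaged F1: sum the counts over all classes, then compute F1 once."""
    items = list(per_class)
    pooled = Counts(
        tp=sum(c.tp for c in items),
        fp=sum(c.fp for c in items),
        fn=sum(c.fn for c in items),
    )
    return f1_score(pooled)


def latency_seconds(fps: float) -> float:
    """Per-image processing time implied by a frames-per-second figure."""
    return 1.0 / fps
```

As a consistency check on the reported figures, `latency_seconds(31)` is about 0.032 s, which matches the 0.032 s quoted for the modified YOLOv5 at 31 FPS.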

Citations: 0

Online continual streaming learning for embedded space applications

IF 3 | Zone 4, Computer Science

Journal of Real-Time Image Processing Pub Date: 2024-04-02 DOI: 10.1007/s11554-024-01438-4

Alaa Eddine Mazouz, Van-Tam Nguyen

Abstract: This paper proposes an online continual learning (OCL) methodology that is tested on hardware and validated for space applications using an object detection task for close-proximity operations. The proposed OCL algorithm simulates a streaming scenario and uses experience replay to let the model update its knowledge without catastrophic forgetting: past inputs are saved in an onboard reservoir that is sampled during updates. A stream buffer is introduced to enable online training, i.e., updating the model as data is streamed one sample at a time rather than arriving in batches. Hyperparameters such as buffer sizes, update rate, batch size, batch concatenation parameters, and number of iterations per batch are investigated to find an optimized approach for the incremental-domain, streaming learning task. The algorithm is tested on a customized dataset for space applications that simulates changes in visual environments which significantly affect the deployed model's performance. Our OCL methodology uses Weighted Sampling, a novel approach that allows the system to analytically choose more useful input samples during training. The results show that a model can be updated online, achieving up to 60% Average Learning with Average Forgetting as low as 13%, all with a Model Size Efficiency of 1, meaning the model size does not increase. An additional contribution is an implementation of On-Device Continual Training for embedded applications. A hardware experiment is carried out on the Zynq 7100 FPGA, where a pre-trained CNN model is updated online using our FPGA backpropagation pipeline and OCL methodology to take new data into account and satisfactorily complete the planned task in less than 5 min while achieving 90 FPS.
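The interplay of a stream buffer and a replay reservoir described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the reservoir is filled with classic uniform reservoir sampling, not the paper's Weighted Sampling, and the class name, function name, and buffer sizes are hypothetical.

```python
import random
from typing import Iterable, Iterator, List


class ReservoirBuffer:
    """Fixed-capacity store of past samples, filled by uniform reservoir sampling."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self.capacity = capacity
        self.items: List[object] = []
        self.seen = 0  # total number of samples offered so far
        self.rng = random.Random(seed)

    def add(self, sample: object) -> None:
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(sample)
        else:
            # Keep the new sample with probability capacity / seen.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = sample

    def sample(self, k: int) -> List[object]:
        """Draw up to k stored samples without replacement."""
        return self.rng.sample(self.items, min(k, len(self.items)))


def stream_batches(stream: Iterable[object], reservoir: ReservoirBuffer,
                   stream_size: int, replay_size: int) -> Iterator[List[object]]:
    """Yield update batches that concatenate fresh stream samples with replayed ones."""
    fresh: List[object] = []
    for x in stream:
        fresh.append(x)  # samples arrive one at a time
        if len(fresh) == stream_size:
            yield fresh + reservoir.sample(replay_size)
            for s in fresh:  # only now do the fresh samples become "past" data
                reservoir.add(s)
            fresh = []
```

Each yielded batch would feed one or more training iterations. The reservoir never grows beyond its capacity, which is in line with the idea that memory use stays constant during updates.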

Citations: 0

Performance evaluation of all intra Kvazaar and x265 HEVC encoders on embedded system Nvidia Jetson platform

IF 3 | Zone 4, Computer Science

Journal of Real-Time Image Processing Pub Date: 2024-04-02 DOI: 10.1007/s11554-024-01429-5

R. James, Mohammed Abo-Zahhad, Koji Inoue, Mohammed S. Sayed

Abstract: The growing demand for high-quality video requires complex coding techniques that consume more resources and increase encoding time, which is a challenge for real-time processing on embedded systems. Kvazaar and x265 are two efficient implementations of the High Efficiency Video Coding (HEVC) standard. In this paper, the performance of the all-intra Kvazaar and x265 encoders on the Nvidia Jetson platform is evaluated using two coding configurations: a high-speed preset and a high-quality preset. Two scenarios are considered. In the first, both encoders run on the CPU; based on average encoding time, Kvazaar is 65.44% and 69.4% faster than x265, with 1.88% and 0.6% BD-rate improvement over x265, at the high-speed and high-quality presets, respectively. In the second scenario, both encoders run on the GPU of the Nvidia Jetson, and the average encoding time under each preset is half that of the CPU-based scenario. Here Kvazaar is 54.5% and 56.70% faster, with 1.93% and 0.45% BD-rate improvement over x265, at the high-speed and high-quality presets, respectively. Regarding scalability, on the CPU both encoders scale linearly up to four threads, after which speed remains constant. On the GPU, both encoders scale linearly with the number of threads. The results confirm that Kvazaar is more efficient and can be used on embedded systems for real-time video applications because of its higher speed and performance relative to the x265 HEVC encoder.
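The abstract does not define "faster". Assuming the percentages denote the reduction in average encoding time relative to x265 (an assumption, not something the source states), the following sketch converts between a time saving and the equivalent speed-up factor. The function names are illustrative.

```python
def time_saving_percent(t_reference: float, t_test: float) -> float:
    """Percentage reduction in encoding time of the tested encoder vs. the reference."""
    return 100.0 * (t_reference - t_test) / t_reference


def speedup_from_saving(saving_percent: float) -> float:
    """Speed-up factor t_reference / t_test implied by a given time saving."""
    return 1.0 / (1.0 - saving_percent / 100.0)
```

Under that reading, a 65.44% time saving corresponds to a speed-up of roughly 2.9x, and a 54.5% saving to roughly 2.2x.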

Citations: 0

Slim-neck by GSConv: a lightweight-design for real-time detector architectures

IF 3 | Zone 4, Computer Science

Journal of Real-Time Image Processing Pub Date: 2024-03-29 DOI: 10.1007/s11554-024-01436-6

Hulin Li, Jun Li, Hanbing Wei, Zheng Liu, Zhenfei Zhan, Qiliang Ren

Citations: 0

FPGA-SoC implementation of YOLOv4 for flying-object detection

IF 3 | Zone 4, Computer Science

Journal of Real-Time Image Processing Pub Date: 2024-03-29 DOI: 10.1007/s11554-024-01440-w

Dai-Duong Nguyen, Dang-Tuan Nguyen, Minh-Thuy Le, Quoc-Cuong Nguyen

Abstract: Flying-object detection has become an increasingly attractive research topic, particularly with the rising prevalence of unmanned aerial vehicles (UAVs). Deep learning methods offer an effective means of detection with high accuracy. Meanwhile, the demand for deploying deep learning models on embedded devices is growing, driven by the need for capabilities that are both real-time and power-efficient. FPGAs have emerged as the optimal choice because of their parallelism, flexibility, and energy efficiency. In this paper, we propose an FPGA-based design of the YOLOv4 network for flying-object detection. The design addresses the challenge of limited floating-point resources while maintaining accuracy and achieving real-time performance and energy efficiency. We generated a dataset of flying objects, trained and fine-tuned the network parameters on it, and then modified some components of the YOLO network to suit deployment on FPGA. Experiments on the Xilinx ZCU104 development kit show that the accuracy of our implementation is competitive with the original model running on CPU and GPU, despite format conversion and model quantization. In terms of speed, the FPGA implementation on the ZCU104 kit is slower than the ultra-high-end RTX 2080Ti GPU but outperforms the GTX 1650. Its power consumption is about 3 times lower than that of the GTX 1650 and about 7 times lower than that of the RTX 2080Ti. In terms of energy efficiency, the FPGA is 2–3 times more efficient than the RTX 2080Ti and 3–4 times more efficient than the GTX 1650.
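The speed, power, and energy-efficiency statements in this abstract are linked by simple arithmetic if energy efficiency is taken to mean throughput per watt. That definition is an assumption here, since the abstract does not give one. The sketch below expresses the relation with ratios; the function names are illustrative and no measured values from the paper are used.

```python
def relative_efficiency(speed_ratio: float, power_ratio: float) -> float:
    """Efficiency of device A relative to device B, as (fps_A / fps_B) / (watts_A / watts_B)."""
    return speed_ratio / power_ratio


def implied_speed_ratio(efficiency_ratio: float, power_ratio: float) -> float:
    """Throughput ratio fps_A / fps_B implied by an efficiency ratio and a power ratio."""
    return efficiency_ratio * power_ratio
```

For example, a device that draws one third of the power at slightly higher throughput (say a hypothetical speed ratio of 1.1) comes out about 3.3 times more efficient, which is compatible with the 3–4 times reported against the GTX 1650. Likewise, 7 times lower power combined with 2–3 times better efficiency would imply roughly 0.29 to 0.43 of the RTX 2080Ti's throughput under this definition.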

Citations: 0

LSDNet: a lightweight ship detection network with improved YOLOv7

IF 3 | Zone 4, Computer Science

Journal of Real-Time Image Processing Pub Date: 2024-03-27 DOI: 10.1007/s11554-024-01441-9

Cui Lang, Xiaoyan Yu, Xianwei Rong

Citations: 0

FE-YOLO: YOLO ship detection algorithm based on feature fusion and feature enhancement

IF 3 | Zone 4, Computer Science

Journal of Real-Time Image Processing Pub Date: 2024-03-27 DOI: 10.1007/s11554-024-01445-5

Shouwen Cai, Hao Meng, Junbao Wu

Abstract: Technology for detecting maritime targets is crucial for realizing ship intelligence. However, traditional detection algorithms perform poorly because of the diversity of marine targets and complex background environments. We therefore choose YOLOv7 as the baseline and propose an end-to-end feature fusion and feature enhancement YOLO (FE-YOLO). First, we introduce channel attention and the lightweight GhostConv into the extended efficient layer aggregation network of YOLOv7, resulting in the improved extended efficient layer aggregation network (IELAN) module. This improvement enables the model to capture context information better and thus enhance target features. Second, to enhance the network's feature fusion capability, we design the light spatial pyramid pooling combined with the spatial channel pooling (LSPPCSPC) module and the coordinate attention feature pyramid network (CA-FPN). Furthermore, we develop an N-Loss based on normalized Wasserstein distance (NWD), which effectively addresses the class imbalance issue in the ship dataset. Experimental results on the open-source Singapore maritime dataset (SMD) and the SeaShips dataset demonstrate that, compared to the baseline YOLOv7, FE-YOLO increases detection accuracy by 4.6% and 3.3%, respectively.
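The normalized Wasserstein distance mentioned in this abstract is commonly defined by modelling each bounding box as a 2D Gaussian and exponentiating the negative Wasserstein distance between the two. The sketch below follows that common definition. It does not reproduce the paper's N-Loss, whose exact form is not given in the abstract; the `1 - NWD` loss and the normalizing constant `c` (a dataset-dependent value the caller must choose) are assumptions for illustration.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (center_x, center_y, width, height)


def wasserstein2(box_a: Box, box_b: Box) -> float:
    """2-Wasserstein distance between the Gaussians N((cx, cy), diag(w^2/4, h^2/4))."""
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    return math.sqrt(
        (cxa - cxb) ** 2
        + (cya - cyb) ** 2
        + ((wa - wb) / 2) ** 2
        + ((ha - hb) / 2) ** 2
    )


def nwd(box_a: Box, box_b: Box, c: float) -> float:
    """Normalized Wasserstein distance in (0, 1]; equals 1 for identical boxes."""
    return math.exp(-wasserstein2(box_a, box_b) / c)


def nwd_loss(box_a: Box, box_b: Box, c: float) -> float:
    """A simple similarity-to-loss conversion (assumed form, not the paper's N-Loss)."""
    return 1.0 - nwd(box_a, box_b, c)
```

Unlike IoU, this measure stays non-zero and smooth for boxes that do not overlap, which is one reason it is often used for small targets.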

Citations: 0

IPCRGC-YOLOv7: face mask detection algorithm based on improved partial convolution and recursive gated convolution

IF 3 | Zone 4, Computer Science

Journal of Real-Time Image Processing Pub Date: 2024-03-26 DOI: 10.1007/s11554-024-01448-2

Huaping Zhou, Anpei Dang, Kelei Sun

Abstract: In complex scenarios, current detection algorithms often suffer from false and missed detections when identifying irregularities in pedestrian mask wearing. This paper introduces an enhanced detection method called IPCRGC-YOLOv7 (Improved Partial Convolution Recursive Gate Convolution-YOLOv7) as a solution. First, we integrate the Partial Convolution structure into the backbone network to effectively reduce the number of model parameters. To address the problem of vanishing training gradients, we use the residual connection structure derived from the RepVGG network. Additionally, we introduce an efficient aggregation module, PRE-ELAN (Partially Representative Efficiency-ELAN), to replace the original Efficient Layer Aggregation Network (ELAN) structure. Next, we improve the Cross Stage Partial Network (CSPNet) module by incorporating recursive gated convolution. The resulting module, CSPNRGC (Cross Stage Partial Network Recursive Gated Convolution), replaces the ELAN structure in the Neck, which allows higher-order spatial interactions across different network hierarchies. Lastly, in the loss function component, we replace the original cross-entropy loss function with Efficient-IoU to enhance loss calculation accuracy. To balance the contributions of high-quality and low-quality samples in the loss, we propose a new loss function called Wise-EIoU (Wise-Efficient IoU). The experimental results show that, compared to the original YOLOv7 algorithm, IPCRGC-YOLOv7 improves accuracy by 4.71%, recall by 5.94%, mean Average Precision (mAP@0.5) by 2.9%, and mAP@0.5:0.95 by 2.7%, which meets the accuracy requirements for mask-wearing detection in practical application scenarios.
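For orientation, the Efficient-IoU (EIoU) term referred to in this abstract is commonly defined as one minus IoU plus three penalties: the normalized center distance, the width difference, and the height difference, each measured against the smallest enclosing box. The sketch below implements that common definition for a single pair of boxes. The paper's Wise-EIoU weighting is not described in the abstract and is not reproduced here; the function name and the small `eps` guard are illustrative choices.

```python
from typing import Tuple

Corners = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x2 > x1, y2 > y1


def eiou_loss(a: Corners, b: Corners, eps: float = 1e-12) -> float:
    """EIoU loss for one predicted box `a` and one target box `b`."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b

    # Intersection over union.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union if union > 0 else 0.0

    # Width and height of the smallest box enclosing both inputs.
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)

    # Squared center distance and squared width/height differences.
    center = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2
    dw = ((ax2 - ax1) - (bx2 - bx1)) ** 2
    dh = ((ay2 - ay1) - (by2 - by1)) ** 2

    return (1.0 - iou
            + center / (cw ** 2 + ch ** 2 + eps)
            + dw / (cw ** 2 + eps)
            + dh / (ch ** 2 + eps))
```

The loss is zero for identical boxes and grows with misalignment in position, width, or height even when the boxes do not overlap.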

Citations: 0

An end-to-end framework for real-time violent behavior detection based on 2D CNNs

IF 3 | Zone 4, Computer Science

Journal of Real-Time Image Processing Pub Date: 2024-03-25 DOI: 10.1007/s11554-024-01443-7

Peng Zhang, Lijia Dong, Xinlei Zhao, Weimin Lei, Wei Zhang

Abstract: Violent behavior detection (VioBD), a special action recognition task, aims to detect violent behaviors in videos, such as mutual fighting and assault. Some progress has been made in violence detection research, but existing methods have poor real-time performance and are limited by interference from complex backgrounds and occlusion in dense crowds. To solve these problems, we propose an end-to-end real-time violence detection framework based on 2D CNNs. First, we propose a lightweight skeletal image (SI) as the input modality, which captures human body posture and richer contextual information while removing background interference. In our tests, at the same accuracy the resolution of the SI modality is only one-third that of the RGB modality, which greatly improves the real-time performance of model training and inference; at the same resolution, the SI modality has higher accuracy. Second, we design a parallel prediction module (PPM) that simultaneously obtains single-image detection results and the inter-frame motion information of the video, improving real-time performance compared with the traditional "detect the image first, understand the video later" mode. In addition, we propose an auxiliary parameter generation module (APGM) that balances efficiency and accuracy. APGM is a 2D CNN-based video understanding module that weights the spatial information of the video features. Its processing speed reaches 30–40 frames per second, and compared with models such as CNN-LSTM (Iqrar et al.: CNN-LSTM based smart real-time video surveillance system. In: 2022 14th International Conference on Mathematics, Actuarial Science, Computer Science and Statistics (MACS), pages 1–5. IEEE, 2022) and Ludl et al. (Simple yet efficient real-time pose-based action recognition. In: 2019 IEEE Intelligent Transportation Systems Conference (ITSC), pages 581–588. IEEE, 2019), the propagation speed increases by an average of 3–20 frames per second per group of clips, which further improves video motion detection efficiency and accuracy. We conducted experiments on several challenging benchmarks; RVBDN maintains excellent speed and accuracy in long-term interactions and meets real-time requirements for violence detection and spatio-temporal action detection. Finally, we update our proposed dataset of violence detection images (violence image dataset). The dataset is available at https://github.com/ChinaZhangPeng/Violence-Image-Dataset

Citations: 0