Journal of Real-Time Image Processing: Latest Articles

Adversarial generative learning and timed path optimization for real-time visual image prediction to guide robot arm movements
IF 3.0 | CAS Q4, Computer Science
Journal of Real-Time Image Processing · Pub Date: 2024-08-15 · DOI: 10.1007/s11554-024-01526-5
Xin Li, Changhai Ru, Haonan Sun
Real-time visual image prediction, crucial for directing robotic arm movements, represents a significant technique in artificial intelligence and robotics. The primary technical challenges involve the robot's inaccurate perception and understanding of the environment, coupled with imprecise control of movements. This study proposes ForGAN-MCTS, a generative adversarial network-based action sequence prediction algorithm, aimed at refining visually guided rearrangement planning for movable objects. First, the algorithm provides a scalable and robust strategy for rearrangement planning, capitalizing on the capabilities of a Monte Carlo Tree Search strategy. Second, to enable the robot's successful execution of grasping maneuvers, the algorithm introduces a generative adversarial network-based real-time prediction method, employing a network trained solely on synthetic data for robust estimation of multi-object workspace states via a single uncalibrated RGB camera. The efficacy of the proposed algorithm is corroborated through extensive experiments using a UR-5 robotic arm. The experimental results demonstrate that the algorithm surpasses existing methods in planning efficacy and processing speed. Additionally, the algorithm is robust to camera motion and can effectively mitigate the effects of external perturbations.

Citations: 0
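The abstract credits a Monte Carlo Tree Search strategy for the rearrangement planner but gives no algorithmic detail. As background, the UCB1 rule that most MCTS variants use to choose which child node to explore next can be sketched in a few lines (an illustrative sketch, not the authors' implementation; the node fields and values are hypothetical):

```python
import math

def ucb1_select(children, c=1.414):
    """Pick the child maximizing the UCB1 score: exploitation + exploration.
    `children` is a list of dicts with cumulative `value` and `visits`;
    unvisited children are always selected first."""
    total_visits = sum(ch["visits"] for ch in children)
    def score(ch):
        if ch["visits"] == 0:
            return float("inf")  # always try unexplored actions first
        exploit = ch["value"] / ch["visits"]
        explore = c * math.sqrt(math.log(total_visits) / ch["visits"])
        return exploit + explore
    return max(children, key=score)

# Hypothetical rearrangement actions with running MCTS statistics.
children = [
    {"action": "move_A", "value": 6.0, "visits": 10},
    {"action": "move_B", "value": 3.0, "visits": 4},
    {"action": "move_C", "value": 0.0, "visits": 0},
]
best = ucb1_select(children)
```

With one unvisited child, UCB1 picks it first; among the visited ones, the less-explored `move_B` would win despite its lower mean value, which is the exploration bonus at work.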
TinyCount: an efficient crowd counting network for intelligent surveillance
IF 3.0 | CAS Q4, Computer Science
Journal of Real-Time Image Processing · Pub Date: 2024-08-13 · DOI: 10.1007/s11554-024-01531-8
Hyeonbeen Lee, Jangho Lee
Crowd counting, the task of estimating the total number of people in an image, is essential for intelligent surveillance. Integrating a well-trained crowd counting network into edge devices, such as intelligent CCTV systems, enables applications across various domains, including the prevention of crowd collapses and urban planning. For a model to be embedded in edge devices, it requires robust performance, a reduced parameter count, and fast response times. This study proposes a lightweight yet powerful model called TinyCount, which has only 60k parameters. TinyCount is a fully convolutional network consisting of a feature extraction module (FEM) for robust and rapid feature extraction, a scale perception module (SPM) for perceiving scale variation, and an upsampling module (UM) that resizes the feature map to the size of the original image. TinyCount demonstrated competitive performance across three representative crowd counting datasets, despite using approximately 3.33 to 271 times fewer parameters than other crowd counting approaches. The model achieves relatively fast inference by building on the MobileNetV2 architecture with dilated and transposed convolutions; the application of SE blocks and findings from existing studies further proved effective. Finally, we evaluated TinyCount on multiple edge devices, including the Raspberry Pi 4, NVIDIA Jetson Nano, and NVIDIA Jetson AGX Xavier, to demonstrate its potential for practical applications.

Citations: 0
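Crowd counting networks such as TinyCount are typically trained to regress a per-pixel density map whose integral equals the head count; the abstract does not spell this out, but the counting step itself reduces to a sum over the predicted map (toy values, purely illustrative):

```python
def estimate_count(density_map):
    """Crowd count = integral (sum) of the predicted density map."""
    return sum(sum(row) for row in density_map)

# Toy 4x4 "density map": three people, each contributing total mass ~1.0
# spread over neighbouring cells (as a Gaussian-blurred head annotation would).
density = [
    [0.0, 0.2, 0.6, 0.2],
    [0.0, 0.0, 0.2, 0.0],
    [0.5, 0.3, 0.0, 0.0],
    [0.2, 0.0, 0.4, 0.4],
]
count = round(estimate_count(density))
```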
Automatic detection of defects in electronic plastic packaging using deep convolutional neural networks
IF 3.0 | CAS Q4, Computer Science
Journal of Real-Time Image Processing · Pub Date: 2024-08-12 · DOI: 10.1007/s11554-024-01534-5
Wanchun Ren, Pengcheng Zhu, Shaofeng Cai, Yi Huang, Haoran Zhao, Youji Hama, Zhu Yan, Tao Zhou, Junde Pu, Hongwei Yang
As the mainstream chip packaging technology, plastic-encapsulated chips (PECs) suffer from process defects such as delamination and voids, which seriously impact chip reliability, so detecting defects promptly and accurately is urgent. Current manual inspection methods cannot meet application requirements, being both inaccurate and inefficient. This study used deep convolutional neural network (DCNN) techniques to analyze scanning acoustic microscope (SAM) images of PECs and identify their internal defects. First, SAM was used to collect datasets covering seven typical PEC defects. Then, to handle densely packed PECs occupying an extremely small size ratio in SAM images, a PECNet network was built for PEC detection on top of the traditional RetinaNet network, combining a CoTNet50 backbone with a feature pyramid network. Furthermore, a PEDNet was designed to classify PEC defects based on the MobileNetV2 network, integrating cross-local connections and progressive classifiers. Experimental results demonstrated that PECNet reaches a chip recognition accuracy of 98.6% and needs only nine milliseconds per image, while PEDNet achieves an average defect classification accuracy of 97.8% and recognizes a single image in only 0.0021 s. This method provides a precise and efficient technique for defect detection in PECs.

Citations: 0
Real-time detection model of electrical work safety belt based on lightweight improved YOLOv5
IF 3.0 | CAS Q4, Computer Science
Journal of Real-Time Image Processing · Pub Date: 2024-08-10 · DOI: 10.1007/s11554-024-01533-6
Li Liu, Kaiye Huang, Yuang Bai, Qifan Zhang, Yujian Li
Since existing models for detecting whether aerial-work safety belts are worn cannot run in real time on edge devices, this paper proposes a lightweight aerial-work safety belt detection model with higher accuracy. First, the model is made lightweight by introducing Ghost convolution and model pruning. Second, for complex scenarios involving occlusion, color confusion, and the like, performance is optimized by introducing a new up-sampling operator, an attention mechanism, and a feature fusion network. Lastly, the model is trained using knowledge distillation to compensate for the accuracy loss caused by the lightweight design, thereby maintaining higher accuracy. Experimental results on the Guangdong Power Grid Intelligence Challenge safety belt wearing dataset show that, in comparison experiments, the improved model has only 8.7% of the parameters of the mainstream detector YOLOv5s, differs by only 3.7% in mean Average Precision (mAP.50), and is 100.4% faster. Ablation experiments show that the improved model's parameter count is reduced by 66.9% compared with the original model, while mAP.50 decreases by only 1.9%. The proposed overhead safety belt detection model combines a lightweight design, the SimAM attention mechanism, a Bidirectional Feature Pyramid Network for feature fusion, the CARAFE up-sampling operator, and a knowledge distillation training strategy, enabling it to stay lightweight and real time while achieving high detection accuracy.

Citations: 0
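The abstract names knowledge distillation but not a specific loss. The standard temperature-scaled soft-target loss (Hinton-style distillation, which may differ from the paper's exact formulation) looks like this in plain Python:

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax: higher T softens the distribution."""
    exps = [math.exp(z / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_kl(teacher_logits, student_logits, T=4.0):
    """KL(teacher || student) on temperature-softened distributions,
    scaled by T^2 as in standard knowledge distillation."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return T * T * kl

# Hypothetical per-class logits for one sample.
teacher = [2.0, 0.5, -1.0]
student = [1.5, 0.8, -0.5]
loss = distillation_kl(teacher, student)
```

During training this soft-target term is typically mixed with the ordinary hard-label loss, so the small student mimics the large teacher's output distribution rather than only the one-hot label.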
EV-TIFNet: lightweight binocular fusion network assisted by event camera time information for 3D human pose estimation
IF 3.0 | CAS Q4, Computer Science
Journal of Real-Time Image Processing · Pub Date: 2024-08-09 · DOI: 10.1007/s11554-024-01528-3
Xin Zhao, Lianping Yang, Wencong Huang, Qi Wang, Xin Wang, Yantao Lou
Human pose estimation with RGB cameras often degrades in challenging scenarios such as motion blur or poor lighting. In comparison, event cameras, endowed with a wide dynamic range, microsecond-scale temporal resolution, minimal latency, and low power consumption, demonstrate remarkable adaptability in extreme visual environments. Nevertheless, current research exploiting event cameras for pose estimation has not yet fully harnessed the potential of event-driven data, and improving model efficiency remains an ongoing pursuit. This work devises an efficient, compact pose estimation algorithm, with special attention to fusing multi-view event streams for better pose prediction accuracy. We propose EV-TIFNet, a compact dual-view interactive network that incorporates event frames along with our custom-designed Global Spatio-Temporal Feature Maps (GTF Maps). To improve the network's ability to understand motion characteristics and localize keypoints, we tailored a dedicated Auxiliary Information Extraction Module (AIE Module) for the GTF Maps. Experimental results demonstrate that our model, with a compact parameter count of 0.55 million, achieves notable advances on the DHP19 dataset, reducing the MPJPE_3D to 61.45 mm. Exploiting the sparsity of event data, sparse convolution operators replace a significant portion of the traditional convolutional layers, cutting computational demand by 28.3% to 8.71 GFLOPs. These design choices highlight the model's suitability and efficiency in scenarios where computational resources are limited.

Citations: 0
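Sparse convolution, which the abstract credits for the 28.3% FLOP reduction, computes outputs only where the input is active instead of sweeping the full image grid. A minimal submanifold-style sketch over event coordinates (illustrative only; production implementations such as spconv or MinkowskiEngine are far more involved):

```python
def sparse_conv2d(active, kernel):
    """Submanifold-style sparse 2D convolution: `active` maps (row, col)
    -> value for non-zero sites only; outputs are computed only at
    active sites, skipping the empty background entirely."""
    k = len(kernel)          # assume a square, odd-sized kernel
    off = k // 2
    out = {}
    for (r, c) in active:    # iterate active sites, not the full grid
        acc = 0.0
        for dr in range(-off, off + 1):
            for dc in range(-off, off + 1):
                v = active.get((r + dr, c + dc))
                if v is not None:
                    acc += v * kernel[dr + off][dc + off]
        out[(r, c)] = acc
    return out

# Three active event pixels in an otherwise empty frame.
events = {(5, 5): 1.0, (5, 6): 1.0, (9, 2): 2.0}
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
box = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
```

The cost scales with the number of events, not the frame resolution, which is exactly why the operator suits sparse event-camera data.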
Lightweight and privacy-preserving hierarchical federated learning mechanism for artificial intelligence-generated image content
IF 3.0 | CAS Q4, Computer Science
Journal of Real-Time Image Processing · Pub Date: 2024-08-08 · DOI: 10.1007/s11554-024-01524-7
Bingquan Wang, Fangling Yang
With the rapid development of artificial intelligence and big data, artificial intelligence-generated image content (AIGIC) is being applied ever more widely across many fields. However, the image data used by AIGIC is diverse and often contains sensitive personal information, characterized by heterogeneity and privacy concerns. This leads to long running times for image data privacy protection and a high risk of unauthorized third-party access, resulting in serious privacy breaches and security risks. To address this issue, this paper combines Hierarchical Federated Learning (HFL) with homomorphic encryption to first tackle the encryption and transmission challenges in the AIGIC image processing pipeline. On this foundation, a novel HFL group collaborative training strategy is designed to further streamline the privacy protection process for AIGIC image data, effectively masking the heterogeneity of raw image data and balancing the allocation of computational resources. Additionally, a pruning-based model compression algorithm is introduced to relieve data transmission pressure during image encryption. Optimizing the modulo operations of the homomorphic encryption significantly reduces the computational burden, enabling real-time enhancement of image data privacy protection across multiple dimensions, including computational and transmission resources. To verify the effectiveness of the proposed mechanism, extensive simulations of the lightweight privacy protection process for AIGIC image data were performed, along with a comparative analysis of the mechanism's time complexity. Experimental results indicate that the proposed algorithm holds substantial advantages over traditional real-time privacy protection algorithms for AIGIC.

Citations: 0
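The abstract does not say which homomorphic scheme is used. A common choice for federated aggregation is the additively homomorphic Paillier cryptosystem, whose core property (multiplying ciphertexts decrypts to the sum of the plaintexts, so a server can aggregate encrypted updates without seeing them) can be demonstrated with toy parameters (tiny primes, illustration only, NOT secure):

```python
import math, random

# Toy Paillier cryptosystem. Real deployments use ~2048-bit primes.
p, q = 47, 59
n = p * q
n2 = n * n
lam = math.lcm(p - 1, q - 1)          # private key (Python 3.9+: math.lcm)
g = n + 1                             # standard public generator choice

def L(x):
    return (x - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)   # modular inverse (Python 3.8+)

def encrypt(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:        # r must be coprime with n
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return (L(pow(c, lam, n2)) * mu) % n

# Homomorphic addition: multiply ciphertexts, decrypt the sum.
c = (encrypt(12) * encrypt(30)) % n2
```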
YOLO-LF: a lightweight multi-scale feature fusion algorithm for wheat spike detection
IF 3.0 | CAS Q4, Computer Science
Journal of Real-Time Image Processing · Pub Date: 2024-08-08 · DOI: 10.1007/s11554-024-01529-2
Shuren Zhou, Shengzhen Long
Wheat is one of the most significant crops in China, and its yield directly affects the country's food security. Because wheat spikes are dense, overlapping, and relatively blurry, they are prone to being missed in practical detection, while existing object detection models suffer from large model size, high computational complexity, and long computation times. This study therefore proposes a lightweight real-time wheat spike detection model called YOLO-LF. First, a lightweight backbone network is adapted to shrink the model, lower the parameter count, and improve runtime speed. Second, the neck is redesigned around the wheat spike dataset to strengthen feature extraction for wheat spikes while staying lightweight. Finally, a lightweight detection head is designed to significantly reduce the model's FLOPs. Experimental results on the test set indicate that our model is 1.7 MB in size with 0.76 M parameters and 2.9 GFLOPs, reductions of 73%, 74%, and 64% relative to YOLOv8n, respectively. Our model shows a latency of 8.6 ms and 115 FPS on a Titan X, whereas YOLOv8n reaches 10.2 ms and 97 FPS on the same hardware. Our model is thus lighter and faster at detection, while mAP@0.5 decreases by only 0.9%, outperforming YOLOv8 and other mainstream detection networks in overall performance. Consequently, our model can be deployed on mobile devices to provide effective assistance in real-time wheat spike detection.

Citations: 0
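The abstract does not specify how the backbone is made lightweight; one standard factorization behind such parameter reductions is replacing a k x k convolution with a depthwise plus pointwise pair, and the parameter arithmetic shows why (illustrative channel sizes, not YOLO-LF's actual layers):

```python
def conv_params(k, c_in, c_out):
    """Parameters of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Depthwise k x k conv (one filter per input channel)
    followed by a 1 x 1 pointwise conv mixing channels."""
    return k * k * c_in + c_in * c_out

standard = conv_params(3, 128, 128)                   # 9 * 128 * 128
separable = depthwise_separable_params(3, 128, 128)   # 9 * 128 + 128 * 128
ratio = standard / separable                          # roughly 8-9x fewer parameters
```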
WoodGLNet: a multi-scale network integrating global and local information for real-time classification of wood images
IF 3.0 | CAS Q4, Computer Science
Journal of Real-Time Image Processing · Pub Date: 2024-08-05 · DOI: 10.1007/s11554-024-01521-w
Zhishuai Zheng, Zhedong Ge, Zhikang Tian, Xiaoxia Yang, Yucheng Zhou
Current research on image classification combines convolutional neural networks (CNNs) and transformers to give models inductive biases while handling long-range dependencies, but these integrated models have limitations. Standard convolutions are static: they cannot adjust dynamically to the input image, which limits feature expression, and this static nature also impedes seamless integration between features generated dynamically by self-attention mechanisms and static features generated by convolution. Furthermore, each model stage contains abundant information that single-scale convolution cannot fully utilize, ultimately hurting the network's classification performance. To tackle these challenges, we propose WoodGLNet, a real-time multi-scale pyramid network that aggregates global and local information in an input-dependent manner and fuses features through convolutions at three scales. WoodGLNet applies efficient multi-scale global spatial decay attention modules and input-dependent multi-scale dynamic convolutions at different stages, strengthening the network's inductive biases and enlarging its effective receptive field. On CIFAR-100 and CIFAR-10 classification, WoodGLNet-T achieves Top-1 accuracies of 76.34% and 92.35%, respectively, outperforming EfficientNet-B3 by 1.03 and 0.86 percentage points; WoodGLNet-S and WoodGLNet-B attain 77.56%/93.66% and 80.12%/94.27%. The experimental subjects of this study came from the Shandong Province Construction Structural Material Specimen Museum, whose wood testing tasks require high real-time performance. To assess WoodGLNet's real-time detection capability, 20 types of precious wood from the museum were identified in real time with the network: WoodGLNet reached a classification accuracy of up to 99.60% with a recognition time of 0.013 s per image, demonstrating exceptional real-time classification and generalization abilities.

Citations: 0
Railway rutting defects detection based on improved RT-DETR
IF 3.0 | CAS Q4, Computer Science
Journal of Real-Time Image Processing · Pub Date: 2024-08-05 · DOI: 10.1007/s11554-024-01530-9
Chenghai Yu, Xiangwei Chen
Railway turnouts are critical components of the rail track system, and their defects can lead to severe safety incidents and significant property damage. The irregular distribution and varying sizes of railway-turnout defects, combined with changing lighting and complex backgrounds, challenge traditional detection methods, which often deliver low accuracy and poor real-time performance. To improve detection of railway-turnout defects, this study proposes a high-precision recognition model, Faster-Hilo-BiFPN-DETR (FHB-DETR), based on the RT-DETR architecture. First, we designed the Faster CGLU module based on the Faster Block, which optimizes the aggregation of local and global feature information through partial convolution and gating mechanisms, reducing both computation and parameter count while strengthening feature extraction. Second, we replaced multi-head self-attention with Hilo attention, further cutting parameters and computation and improving real-time performance. For feature fusion, we used BiFPN instead of CCFF to better capture subtle defect features, weighting the feature maps via a learned weighting mechanism. Experimental results show that, compared with RT-DETR, FHB-DETR improves mAP50 by 3.5%, reduces parameter count by 25%, and decreases computational complexity by 6%, while maintaining a high frame rate that meets real-time requirements.

Citations: 0
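BiFPN, which FHB-DETR adopts in place of CCFF, fuses multi-scale features with its "fast normalized fusion": each input feature map is scaled by a learnable non-negative weight, normalized over all inputs. A one-dimensional toy sketch of that weighting mechanism (not the paper's code):

```python
def fast_normalized_fusion(features, weights, eps=1e-4):
    """BiFPN-style weighted fusion: each input feature map is scaled by a
    learnable non-negative weight, normalized so the weights sum to ~1."""
    ws = [max(0.0, w) for w in weights]   # ReLU keeps weights non-negative
    total = sum(ws) + eps
    fused = [0.0] * len(features[0])
    for feat, w in zip(features, ws):
        for i, v in enumerate(feat):
            fused[i] += (w / total) * v
    return fused

# Toy 1-D stand-ins for two feature maps arriving at one BiFPN node.
p4_up = [1.0, 2.0, 3.0]
p4_in = [3.0, 2.0, 1.0]
fused = fast_normalized_fusion([p4_up, p4_in], [1.0, 1.0])
```

Unlike softmax-based attention over inputs, this normalization needs no exponentials, which is why BiFPN's authors called it "fast"; the epsilon keeps the division stable when all weights shrink toward zero.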
Embedded planogram compliance control system
IF 3.0 | CAS Q4, Computer Science
Journal of Real-Time Image Processing · Pub Date: 2024-08-05 · DOI: 10.1007/s11554-024-01525-6
Mehmet Erkin Yücel, Serkan Topaloğlu, Cem Ünsalan
The retail sector presents several open and challenging problems that could benefit from advanced pattern recognition and computer vision techniques; one such critical challenge is planogram compliance control. In this study, we propose a complete embedded system to tackle this issue. The system consists of four key components: image acquisition and transfer via a stand-alone embedded camera module; object detection via computer vision and deep learning methods running on single-board computers; a planogram compliance control method, also running on single-board computers; and an energy harvesting and power management block to accompany the embedded camera modules. The image acquisition and transfer block is implemented on the ESP-EYE camera module. The object detection block is based on YOLOv5 as the deep learning method plus local feature extraction, implemented on the Raspberry Pi 4, NVIDIA Jetson Orin Nano, and NVIDIA Jetson AGX Orin single-board computers. The planogram compliance control block performs sequence alignment through a modified Needleman-Wunsch algorithm and runs alongside the object detection block on the same single-board computers. The energy harvesting and power management block consists of solar and RF energy-harvesting modules with a suitable battery pack. We tested the proposed system on two different datasets to provide insight into its strengths and weaknesses; the results show F1 scores of 0.997 and 1.0 for the object detection and planogram compliance control blocks, respectively. Furthermore, we calculated that the complete embedded system can operate stand-alone for up to 2 years on battery, a duration that can be extended further by integrating the proposed solar and RF energy-harvesting options.

Citations: 0
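The compliance block aligns the sequence of detected products against the planned shelf layout with a modified Needleman-Wunsch algorithm. The textbook (unmodified) global-alignment recurrence it builds on can be sketched as follows; the product names and scoring values are illustrative:

```python
def needleman_wunsch(seq_a, seq_b, match=1, mismatch=-1, gap=-1):
    """Classic global sequence alignment score via dynamic programming."""
    n, m = len(seq_a), len(seq_b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap               # aligning a prefix against nothing
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            dp[i][j] = max(dp[i - 1][j - 1] + sub,   # match / mismatch
                           dp[i - 1][j] + gap,       # gap in detections
                           dp[i][j - 1] + gap)       # gap in planogram
    return dp[n][m]

planogram = ["cola", "cola", "juice", "water"]   # planned shelf order
detected  = ["cola", "juice", "water"]           # one facing missing
score = needleman_wunsch(planogram, detected)
```

A perfect shelf yields the maximum score (sequence length times the match reward), while missing or swapped facings lower it through gap and mismatch penalties, giving a natural compliance measure.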