Journal of Parallel and Distributed Computing: Latest Articles

DRViT: A dynamic redundancy-aware vision transformer accelerator via algorithm and architecture co-design on FPGA
IF 3.4, Zone 3, Computer Science
Journal of Parallel and Distributed Computing Pub Date : 2025-01-28 DOI: 10.1016/j.jpdc.2025.105042
Volume 199, Article 105042
Xiangfeng Sun, Yuanting Zhang, Qinyu Wang, Xiaofeng Zou, Yujia Liu, Ziqian Zeng, Huiping Zhuang
Abstract: Multi-modal artificial intelligence (MAI) has attracted significant interest due to its capability to process and integrate data from multiple modalities, including images, text, and audio. Addressing MAI tasks in distributed systems necessitates robust and efficient architectures. The Transformer architecture has emerged as a primary network in this context. The integration of Vision Transformers (ViTs) within multimodal frameworks is crucial for enhancing the processing and comprehension of image data across diverse modalities. However, the complex architecture of ViTs and the extensive resources required for processing large-scale image data pose high computational and storage demands. These demands are particularly challenging for deploying ViTs on edge devices within distributed frameworks. To address this issue, we propose a novel dynamic redundancy-aware ViT accelerator based on parallel computing, termed DRViT. DRViT is supported by an algorithm and architecture co-design. We first propose a hardware-friendly lightweight algorithm featuring token merging, token pruning, and an INT8 quantization scheme. Then, we design a specialized architecture to support this algorithm, transforming the lightweight algorithm into significant latency and energy-efficiency improvements. Our design is implemented on the Xilinx Alveo U250, achieving an overall inference latency of 0.86 ms and 1.17 ms per image for ViT-tiny at 140 MHz and 100 MHz, respectively. The throughput can reach 1,380 GOP/s at peak, demonstrating superior performance compared to state-of-the-art accelerators, even at lower frequencies.
Citations: 0
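The DRViT abstract above names an INT8 quantization scheme without describing it. As a rough illustration of what INT8 quantization means in general, here is a minimal sketch of symmetric per-tensor quantization. The scheme, the function names `quantize_int8` and `dequantize_int8`, and the sample weights are all assumptions made for illustration; the paper's actual scheme may differ.

```python
def quantize_int8(values):
    """Symmetric per-tensor INT8 quantization (an assumed scheme, not necessarily DRViT's)."""
    max_abs = max(abs(v) for v in values)
    # One floating-point scale maps the largest magnitude onto 127.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the symmetric INT8 range.
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize_int8(q, scale):
    """Map INT8 codes back to approximate floating-point values."""
    return [x * scale for x in q]


# Hypothetical weights, chosen only to make the example easy to follow.
weights = [0.5, -1.27, 0.0, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
```

Each restored value differs from the original by at most half a quantization step (`scale / 2`), which is the accuracy cost traded for 8-bit storage and integer arithmetic.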
Latency-aware placement of stream processing operators in modern-day stream processing frameworks
IF 3.4, Zone 3, Computer Science
Journal of Parallel and Distributed Computing Pub Date : 2025-01-27 DOI: 10.1016/j.jpdc.2025.105041
Volume 199, Article 105041
Raphael Ecker, Vasileios Karagiannis, Michael Sober, Stefan Schulte
Abstract: The rise of the Internet of Things has substantially increased the number of interconnected devices at the edge of the network. As a result, a large number of computations are now distributed in the compute continuum, spanning from the edge to the cloud, generating vast amounts of data. Stream processing is typically employed to process this data in near real-time due to its efficiency in handling continuous streams of information in a scalable manner. However, many stream processing approaches do not consider the underlying network devices of the compute continuum as candidate resources for processing data. Moreover, many existing works do not consider the network latency incurred by performing computations on multiple devices in a distributed way. To address these gaps, we formulate an optimization problem for utilizing the complete compute continuum resources and design heuristics to solve this problem efficiently. Furthermore, we integrate our heuristics into Apache Storm and perform experiments that show latency- and throughput-related benefits compared to alternatives.
Citations: 0
Skyward secure: Advancing drone data-sharing in 6G with decentralized dataspace and supported technologies
IF 3.4, Zone 3, Computer Science
Journal of Parallel and Distributed Computing Pub Date : 2025-01-21 DOI: 10.1016/j.jpdc.2025.105040
Volume 199, Article 105040
Saeed Hamood Alsamhi, Sumit Srivastava, Mamoon Rashid, Amnnah Alhabeeb, Santosh Kumar, Navin Singh Rajput, Ammar Hawbani, Liang Zhao, Mohammed A.A. Al-qaness, Edward Curry
Abstract: The capacity of Dataspace to distribute heterogeneous data from several sources and domains has attracted attention for resolving data integration challenges. Drone data sharing faces challenges such as protecting privacy and security, building trust and dependability, controlling latency and scalability, facilitating real-time data processing, and preserving the quality of shared models. Although sixth-generation (6G) networks provide high throughput and low latency to improve drone operations, security issues are exacerbated by the sensitive nature of shared data and the lack of centralized monitoring. To address these challenges, this paper presents a conceptual framework for a Dataspace in the Sky to enable secure and efficient drone data-sharing within 6G networks in the transition from Industry 4.0 to Industry 5.0. The Dataspace in the Sky integrates Federated Learning (FL), a decentralized Machine Learning (ML) approach that enhances security and privacy by sharing models instead of raw data, facilitating effective drone collaboration. However, the quality of shared local models often suffers due to inconsistent data contributions and unreliable recording mechanisms, which can undermine the performance of FL. To tackle these challenges, the framework employs blockchain (BC) to decentralize and secure the Dataspace, ensuring the integrity of contribution records and improving the reliability of shared models. Dataspace in the Sky empowers decentralized data sharing, which addresses latency issues by decentralizing decision-making and enhances trust and reliability by leveraging immutable and transparent BC mechanisms. The robust Dataspace in the Sky solution not only secures drone-sharing operations in 6G environments but also enables the development of citizen-friendly mobility services, expanding opportunities across smart environments.
Citations: 0
FASNet: Federated adversarial Siamese networks for robust malware image classification
IF 3.4, Zone 3, Computer Science
Journal of Parallel and Distributed Computing Pub Date : 2025-01-16 DOI: 10.1016/j.jpdc.2025.105039
Volume 198, Article 105039
Namrata Govind Ambekar, Sonali Samal, N. Nandini Devi, Surmila Thokchom
Abstract: Malware detection faces considerable challenges due to the ever-evolving and complex nature of cyber threats. Various deep learning models have demonstrated effectiveness in identifying malware within organizations. However, developing a reliable distributed malware detection model using diverse data from multiple sources faces significant challenges, which are worsened by privacy concerns, including data distribution issues and the absence of balanced datasets. This requires advanced data privacy techniques. To address this, this study introduces FASNet, a novel privacy-centric distributed malware detection model designed to enhance detection accuracy and robustness. FASNet employs state-of-the-art Siamese networks as feature extractors and incorporates two significant advancements: federated learning and adversarial training. Federated learning, implemented with three clients, ensures that model training is conducted on individual devices, eliminating the need for centralized data collection and addressing data privacy concerns. This design also prevents data dilution and communication overhead while maintaining effective training on each device. Additionally, adversarial training utilizing the Fast Gradient Sign Method (FGSM) generates adversarial images to strengthen the model's resilience. By training on both original and adversarial malware images, FASNet improves its ability to accurately classify malware images that have been intentionally perturbed to mislead the system. Experimental results on the Blended dataset demonstrate the efficacy of the proposed FASNet approach, achieving a testing accuracy of 0.9510, precision of 0.9417, recall of 0.9510, F1 score of 0.9384, Matthews Correlation Coefficient (MCC) of 0.9464, Jaccard Index (JI) of 0.9271, and Fowlkes-Mallows Index (FMI) of 0.9725. These experimental findings show that the proposed FASNet method effectively tackles two main challenges: privacy-centric malware detection and an imbalanced dataset.
Citations: 0
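The FASNet abstract above relies on the Fast Gradient Sign Method, which perturbs an input by a small step `epsilon` in the direction of the sign of the loss gradient with respect to that input. The sketch below shows a single FGSM step against a toy logistic-regression classifier. The classifier, its weights, and the function name `fgsm_perturb` are hypothetical stand-ins chosen for illustration; FASNet itself uses Siamese networks on malware images.

```python
import math


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def fgsm_perturb(x, y, w, b, epsilon):
    """One FGSM step against a logistic-regression model (a toy model assumed for illustration).

    x: input features in [0, 1]; y: true label (0 or 1); w, b: model parameters.
    """
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    # Gradient of the binary cross-entropy loss with respect to the input is (p - y) * w.
    grad = [(p - y) * wi for wi in w]
    # Step each feature by epsilon in the loss-increasing direction, then clip to [0, 1].
    return [min(1.0, max(0.0, xi + epsilon * sign(g))) for xi, g in zip(x, grad)]


# Hypothetical input and weights.
x = [0.2, 0.8, 0.5]
w = [1.0, -2.0, 0.0]
x_adv = fgsm_perturb(x, y=1, w=w, b=0.0, epsilon=0.1)
```

Adversarial training, as described in the abstract, would then mix inputs like `x_adv` with the original inputs during training so the classifier learns to resist such perturbations.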
Distributed robust multitask clustering in wireless sensor networks using Multi-Factorial Evolutionary Algorithm
IF 3.4, Zone 3, Computer Science
Journal of Parallel and Distributed Computing Pub Date : 2025-01-16 DOI: 10.1016/j.jpdc.2025.105038
Volume 198, Article 105038
Anita Panwar, Satyasai Jagannath Nanda
Abstract: When data collected at the local nodes of a wireless sensor network (WSN) are large in volume and require local processing, distributed clustering plays an important role. Traditional clustering algorithms based on K-means and K-medoids are not effective in these scenarios for accurate data segregation. Further, techniques are required that can effectively handle outliers and noise present in the sensed data. Thus, there is a need to design robust distributed data clustering algorithms. Multi-Task Optimization (MTO) has attracted the attention of researchers in the last couple of years after the introduction of the Multi-Factorial Evolutionary Algorithm (MFEA). The MFEA can handle several single-objective tasks that are usually related to one another and share implicit knowledge or abilities common to them. In this manuscript, the MFEA is employed to solve two tasks: 1) outlier detection and 2) distributed clustering at the nodes of a WSN. The resultant algorithm, termed Distributed MFEA (DMFEA), effectively removes noise and segregates data present at multiple nodes of a WSN. A simulation study reveals the superior performance of DMFEA over benchmark algorithms such as distributed versions of K-means, particle swarm optimization, and moth-flame optimization on two synthetic and six real-life datasets based on forest fire monitoring, air pollution indexing, Intel laboratory environment sensing, agriculture soil quality labeling, river water quality analysis, and land mine detection. The superior performance of DMFEA is demonstrated based on the Silhouette Index of the obtained clusters and the percentage of outliers detected. Additionally, DMFEA's average rank in the Kruskal-Wallis test is better than that of the three comparison algorithms.
Citations: 0
GreediRIS: Scalable influence maximization using distributed streaming maximum cover
IF 3.4, Zone 3, Computer Science
Journal of Parallel and Distributed Computing Pub Date : 2025-01-14 DOI: 10.1016/j.jpdc.2025.105037
Volume 198, Article 105037
Reet Barik, Wade Cappa, S.M. Ferdous, Marco Minutoli, Mahantesh Halappanavar, Ananth Kalyanaraman
Abstract: Influence maximization, the problem of identifying a subset of k influential seeds (vertices) in a network, is a classical problem in network science with numerous applications. The problem is NP-hard, but there exist efficient polynomial time approximations. However, scaling these algorithms still remains a daunting task due to the complexities associated with steps involving stochastic sampling and large-scale aggregations. In this paper, we present a new parallel distributed approximation algorithm for influence maximization with provable approximation guarantees. Our approach, which we call GreediRIS, leverages the RandGreedi framework, a state-of-the-art approach for distributed submodular optimization, for solving a step that computes a maximum k cover. GreediRIS combines distributed and streaming models of computations, along with pruning techniques, to effectively address the communication bottlenecks of the algorithm. Experimental results on up to 512 nodes (32K cores) of the NERSC Perlmutter supercomputer show that GreediRIS can achieve good strong scaling performance, preserve quality, and significantly outperform the other state-of-the-art distributed implementations. For instance, on 512 nodes, the most performant variant of GreediRIS achieves geometric mean speedups of 28.99× and 36.35× for two different diffusion models, over a state-of-the-art parallel implementation. We also present a communication-optimized version of GreediRIS that further improves the speedups by two orders of magnitude.
Citations: 0
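The GreediRIS abstract above centers on a step that computes a maximum k cover. The sketch below shows only the classical sequential greedy heuristic for that problem: repeatedly pick the candidate that covers the most still-uncovered items. It is not the distributed RandGreedi or streaming variant from the paper, and the function name `greedy_max_k_cover` and the sample data are assumptions made for illustration.

```python
def greedy_max_k_cover(candidate_sets, k):
    """Sequential greedy max-k-cover (textbook heuristic, not the GreediRIS algorithm).

    candidate_sets: dict mapping each vertex to the set of sample ids it covers.
    Returns the chosen seeds in selection order and the set of covered sample ids.
    """
    covered = set()
    seeds = []
    for _ in range(k):
        # Pick the unused vertex with the largest marginal gain in coverage.
        best = max(
            (v for v in candidate_sets if v not in seeds),
            key=lambda v: len(candidate_sets[v] - covered),
            default=None,
        )
        if best is None or not (candidate_sets[best] - covered):
            break  # Nothing left to gain.
        seeds.append(best)
        covered |= candidate_sets[best]
    return seeds, covered


# Hypothetical samples: each vertex covers the ids of the random samples it appears in.
samples = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1}}
seeds, covered = greedy_max_k_cover(samples, k=2)
```

With these samples the first pick is `"c"` (four new ids) and the second is `"a"` (three new ids). The marginal-gain computation is the part that becomes a communication bottleneck at scale, which is the issue the abstract says the distributed and streaming design targets.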
A parallel algorithm for minimum weight set cover with small neighborhood property
IF 3.4, Zone 3, Computer Science
Journal of Parallel and Distributed Computing Pub Date : 2025-01-14 DOI: 10.1016/j.jpdc.2025.105034
Volume 198, Article 105034
Yingli Ran, Yaoyao Zhang, Zhao Zhang
Abstract: This paper studies the minimum weight set cover (MinWSC) problem with a small neighborhood cover property ($\tau$-SNC). A parallel algorithm is presented, obtaining approximation ratio $\tau(1+3\varepsilon)$ in $O\left(L\log_{1+\varepsilon}\frac{n^{3}}{\varepsilon^{2}}+4\tau^{3}2^{\tau}L^{2}\log n\right)$ rounds, where $0<\varepsilon<\frac{1}{2}$ is a constant, $n$ is the number of elements, and $L$ is the depth of SNC-decomposition. Our results not only improve the approximation ratio obtained in [2], but also answer two questions proposed in [2].
Citations: 0
SpecSeq++: A high parallel boundary matrix reduction to support real large-scale point clouds
IF 3.4, Zone 3, Computer Science
Journal of Parallel and Distributed Computing Pub Date : 2025-01-14 DOI: 10.1016/j.jpdc.2025.105036
Volume 198, Article 105036
Quming Li, Zhibin Huang, Yiming Chen, Di Hu, Zhitao Dai, Min Yu, Zhou Liu
Abstract: The boundary matrix serves as a crucial representation for computing persistence diagrams, a typical topological data analysis method, and its reduction is the most central and time-consuming step. However, most of the current methods do not have a high degree of parallelism. Therefore, a fully GPU-parallelized boundary matrix reduction algorithm, denoted by SpecSeq++, is proposed. It introduces some novel methods, such as the high-dimension guided clearing theorem, a new method for pivot determination within blocks, and a novel dynamic block partition strategy to mitigate load balancing issues and the long-tail effect in intra-block parallel computation. Experiments with three types of boundary matrices of different sizes and different complexes show that SpecSeq++ has better performance. In the best-case scenario, SpecSeq++ performs more than 700× better than phat-twist optimized with dualization, while its average GPU memory overhead is only twice that of the serial method. It provides strong support for the practical application of topological data analysis on real point cloud data. Code is available at https://github.com/BuptCIAGroup/SpecSeqPlusPlus.
Citations: 0
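For readers unfamiliar with the boundary matrix reduction that the SpecSeq++ abstract above accelerates, the sketch below shows the standard serial column reduction over coefficients mod 2. It is the textbook baseline, not the GPU-parallel SpecSeq++ algorithm, and it omits the clearing and block-partition optimizations the abstract mentions. The function name `reduce_boundary_matrix` and the triangle example are assumptions made for illustration.

```python
def reduce_boundary_matrix(columns):
    """Standard serial boundary matrix reduction mod 2 (textbook baseline, not SpecSeq++).

    columns[j] is the set of row indices holding a 1 in column j.
    Returns the reduced columns and the sorted (pivot row, column) persistence pairs.
    """
    reduced = [set(c) for c in columns]
    pivot_owner = {}  # pivot row index -> column that owns that pivot
    for j, col in enumerate(reduced):
        # While an earlier column has the same pivot (lowest 1), add it to this column.
        while col and max(col) in pivot_owner:
            col ^= reduced[pivot_owner[max(col)]]  # addition mod 2 is symmetric difference
        if col:
            pivot_owner[max(col)] = j
    pairs = sorted(pivot_owner.items())
    return reduced, pairs


# A filled triangle: vertices 0-2, edges 3=(0,1), 4=(1,2), 5=(0,2), face 6=(3,4,5).
triangle = [set(), set(), set(), {0, 1}, {1, 2}, {0, 2}, {3, 4, 5}]
reduced, pairs = reduce_boundary_matrix(triangle)
```

In this example column 5 reduces to zero (the third edge closes a loop) and is then paired with the face in column 6. The left-to-right dependency between columns is what makes the serial form hard to parallelize.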
Front Matter 1 - Full Title Page (regular issues)/Special Issue Title page (special issues)
IF 3.4, Zone 3, Computer Science
Journal of Parallel and Distributed Computing Pub Date : 2025-01-03 DOI: 10.1016/S0743-7315(24)00194-1
Volume 197, Article 105030
Citations: 0
A comparative hardware implementation of histogram of oriented gradients as a descriptor in embedded tracking of swarm robots
IF 3.4, Zone 3, Computer Science
Journal of Parallel and Distributed Computing Pub Date : 2024-12-17 DOI: 10.1016/j.jpdc.2024.105026
Volume 198, Article 105026
Diego Legarda, Karen Pérez, Daniel M. Muñoz
Abstract: The Histogram of Oriented Gradients (HOG) algorithm is widely utilized in image processing for tasks such as detection, classification, and tracking. However, several challenges arise when implementing this algorithm on computing platforms with limited memory and low power consumption, such as mobile robots and drones. This work presents an in-depth analysis and implementation of three innovative hardware architectures for HOG, specifically designed for real-time processing using Field Programmable Gate Arrays (FPGAs) in the context of mobile robot localization. The primary focus of these architectures is to simplify the processing operations involved in gradient magnitude and orientation calculation, histogram generation, and normalization. These simplifications lead to a reduction in resource utilization and energy consumption. Experiments conducted on a Zynq 7020 device demonstrated minimal relative error values throughout the entire process, along with a significant execution time improvement of over 1000 times when compared to the software-based solution.
Citations: 0
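The HOG abstract above lists three processing stages: gradient magnitude and orientation, histogram generation, and normalization. The sketch below walks through those stages in software for a single cell, as a simplified reference. It is not one of the paper's hardware architectures: the 9-bin unsigned-orientation layout, central-difference gradients, hard bin assignment, and the function name `hog_cell_histogram` are assumptions made for illustration.

```python
import math


def hog_cell_histogram(cell, bins=9):
    """Simplified HOG descriptor for one cell (software sketch, not the FPGA design).

    cell: 2D list of grayscale intensities.
    Returns an L2-normalized histogram of unsigned gradient orientations.
    """
    height, width = len(cell), len(cell[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Stage 1: central-difference gradients, then magnitude and orientation.
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # Stage 2: vote the magnitude into the orientation bin (no interpolation).
            hist[int(angle // bin_width) % bins] += magnitude
    # Stage 3: L2 normalization, guarding against an all-zero histogram.
    norm = math.sqrt(sum(v * v for v in hist)) or 1.0
    return [v / norm for v in hist]


# Hypothetical 4x4 cells: intensity ramps along x and along y.
horizontal_ramp = [[x for x in range(4)] for _ in range(4)]
vertical_ramp = [[y for _ in range(4)] for y in range(4)]
hist_h = hog_cell_histogram(horizontal_ramp)
hist_v = hog_cell_histogram(vertical_ramp)
```

The square root, arctangent, and division in this reference version are the kind of operations that are costly in hardware, which is presumably why the abstract highlights simplifying them.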