Exploring GPU sharing techniques for edge AI smart city applications

IF 1.6 | Zone 4 (Computer Science) | Q3 ENGINEERING, ELECTRICAL & ELECTRONIC

ETRI Journal Pub Date : 2025-09-30 DOI:10.4218/etrij.2025-0065

Sooyeon Woo, Jihwan Yeo, Jinhong Kim, Kyungwoon Lee

ETRI Journal, vol. 47, no. 5, pp. 855-864. Journal Article. Full text: https://onlinelibrary.wiley.com/doi/10.4218/etrij.2025-0065

Citations: 0

Abstract


The growing adoption of edge AI in smart city applications such as traffic management, surveillance, and environmental monitoring necessitates efficient computational strategies to satisfy the requirements for low latency and high accuracy. This study investigated GPU sharing techniques to improve resource utilization and throughput when running multiple AI applications simultaneously on edge devices. Using the NVIDIA Jetson AGX Orin platform and object detection workloads with the YOLOv8 model, we explored the performance tradeoffs of the threading and multiprocessing approaches. Our findings reveal distinct advantages and limitations. Threading minimizes memory usage by sharing CUDA contexts, whereas multiprocessing achieves higher GPU utilization and shorter inference times by leveraging independent CUDA contexts. However, scalability challenges arise from resource contention and synchronization overheads. This study provides insights into optimizing GPU sharing for edge AI applications, highlighting key tradeoffs and opportunities for enhancing performance in resource-constrained environments.
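The two sharing strategies named in the abstract can be sketched in Python as follows. This is a minimal illustration and not the authors' benchmark code: `fake_inference` is a hypothetical stand-in for a YOLOv8 inference loop, and the worker and frame counts are arbitrary. The thread version keeps all workers in one process, so on a GPU they would share a single CUDA context; the process version starts independent processes, each of which would create its own CUDA context.

```python
import multiprocessing as mp
import threading
import time


def fake_inference(n_frames: int) -> int:
    """Hypothetical stand-in for running a detector over n_frames frames."""
    processed = 0
    for _ in range(n_frames):
        time.sleep(0.001)  # placeholder for one GPU inference call
        processed += 1
    return processed


def run_with_threads(n_workers: int, n_frames: int) -> list:
    """Run the workload in threads of one process (one shared CUDA context)."""
    results = [0] * n_workers

    def work(index: int) -> None:
        results[index] = fake_inference(n_frames)

    threads = [threading.Thread(target=work, args=(i,)) for i in range(n_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


def run_with_processes(n_workers: int, n_frames: int) -> list:
    """Run the workload in separate processes (one CUDA context per process)."""
    # "spawn" starts each worker from a fresh interpreter, so no state from
    # the parent process is inherited by the workers.
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=n_workers) as pool:
        return pool.map(fake_inference, [n_frames] * n_workers)
```

Because the `spawn` start method re-imports the main module in each worker, `run_with_processes(4, 100)` should be called from under an `if __name__ == "__main__":` guard; `run_with_threads(4, 100)` has no such requirement. Timing both calls with `time.perf_counter()` gives a rough view of the tradeoff, although this CPU-only placeholder does not reproduce the memory or GPU utilization effects reported in the paper.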


Source journal

ETRI Journal (Engineering & Technology: Telecommunications)

CiteScore

4.00

Self-citation rate

7.10%

Articles published

Review time

6.9 months

About the journal: ETRI Journal is an international, peer-reviewed multidisciplinary journal published bimonthly in English. The main focus of the journal is to provide an open forum to exchange innovative ideas and technology in the fields of information, telecommunications, and electronics. Key topics of interest include high-performance computing, big data analytics, cloud computing, multimedia technology, communication networks and services, wireless communications and mobile computing, material and component technology, as well as security. With an international editorial committee and experts from around the world as reviewers, ETRI Journal publishes high-quality research papers on the latest and best developments from the global community.