Meruyert Karzhaubayeva;Aidar Amangeldi;Jurn-Gyu Park
{"title":"CNN Workloads Characterization and Integrated CPU–GPU DVFS Governors on Embedded Systems","authors":"Meruyert Karzhaubayeva;Aidar Amangeldi;Jurn-Gyu Park","doi":"10.1109/LES.2023.3299335","DOIUrl":null,"url":null,"abstract":"Dynamic power management (DPM) techniques on mobile systems are indispensable for deep learning (DL) inference optimization, which is mainly performed on battery-based mobile and/or embedded platforms with constrained resources. To this end, we characterize CNN workloads using object detection applications of YOLOv4/-tiny and YOLOv3/-tiny, and then propose integrated CPU–GPU DVFS governor policies that scale integrated pairs of CPU and GPU frequencies to improve energy–delay product (EDP) with negligible inference execution time degradation. Our results show up to 16.7% EDP improvements with negligible (mostly less than 2%) performance degradation using object detection applications on NVIDIA Jetson TX2.","PeriodicalId":56143,"journal":{"name":"IEEE Embedded Systems Letters","volume":"15 4","pages":"202-205"},"PeriodicalIF":1.7000,"publicationDate":"2023-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Embedded Systems Letters","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10261988/","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
引用次数: 0
Abstract
Dynamic power management (DPM) techniques on mobile systems are indispensable for deep learning (DL) inference optimization, which is mainly performed on battery-based mobile and/or embedded platforms with constrained resources. To this end, we characterize CNN workloads using object detection applications of YOLOv4/-tiny and YOLOv3/-tiny, and then propose integrated CPU–GPU DVFS governor policies that scale integrated pairs of CPU and GPU frequencies to improve energy–delay product (EDP) with negligible inference execution time degradation. Our results show up to 16.7% EDP improvements with negligible (mostly less than 2%) performance degradation using object detection applications on NVIDIA Jetson TX2.
期刊介绍:
The IEEE Embedded Systems Letters (ESL), provides a forum for rapid dissemination of latest technical advances in embedded systems and related areas in embedded software. The emphasis is on models, methods, and tools that ensure secure, correct, efficient and robust design of embedded systems and their applications.