Jun-Seok Park, Changsoo Park, S. Kwon, Hyeong-Seok Kim, Taeho Jeon, Yesung Kang, Heonsoo Lee, Dongwoo Lee, James Kim, YoungJong Lee, Sangkyu Park, Jun-Woo Jang, Sanghyuck Ha, MinSeong Kim, Jihoon Bang, Sukhwan Lim, Inyup Kang
{"title":"A Multi-Mode 8K-MAC HW-Utilization-Aware Neural Processing Unit with a Unified Multi-Precision Datapath in 4nm Flagship Mobile SoC","authors":"Jun-Seok Park, Changsoo Park, S. Kwon, Hyeong-Seok Kim, Taeho Jeon, Yesung Kang, Heonsoo Lee, Dongwoo Lee, James Kim, YoungJong Lee, Sangkyu Park, Jun-Woo Jang, Sanghyuck Ha, MinSeong Kim, Jihoon Bang, Sukhwan Lim, Inyup Kang","doi":"10.1109/ISSCC42614.2022.9731639","DOIUrl":null,"url":null,"abstract":"Recent work on neural-network accelerators has focused on obtaining high performance in order to meet the needs of real-time applications with vastly different performance requirements, including high precision computation, efficiency for various Deep Learning (DL) layer types, and extremely low power to run always-on applications. Applying a single mode or datatype uniformly across these different scenarios would be less efficient than using different operating modes according to different operating scenarios. For example, super-resolution typically requires FP16 precision for higher image quality, while NNs for face-detection need only INT4 or INT8 precision. Using higher precision than INT8 for face detection would result in higher power consumption. A highly programmable NPU capable of covering the diverse workloads observed in the real world is therefore desired. In this paper, we present a neural processing unit (NPU) optimized with the following features: i) reconfigurable data prefetching and operational flow for high compute utilization, ii) multi-precision MACs supporting INT4,8,16, and float16, iii) a dynamic operation mode to cover extremely low-power or low-latency requirements. 
These features provide the flexibility needed by real world applications within the power constraints of various product domains.","PeriodicalId":6830,"journal":{"name":"2022 IEEE International Solid- State Circuits Conference (ISSCC)","volume":"9 1","pages":"246-248"},"PeriodicalIF":0.0000,"publicationDate":"2022-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"10","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE International Solid- State Circuits Conference (ISSCC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISSCC42614.2022.9731639","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 10
Abstract
Recent work on neural-network accelerators has focused on obtaining high performance in order to meet the needs of real-time applications with vastly different performance requirements, including high-precision computation, efficiency for various Deep Learning (DL) layer types, and extremely low power to run always-on applications. Applying a single mode or datatype uniformly across these scenarios would be less efficient than selecting an operating mode per scenario. For example, super-resolution typically requires FP16 precision for higher image quality, while NNs for face detection need only INT4 or INT8 precision; using higher precision than INT8 for face detection would result in higher power consumption. A highly programmable NPU capable of covering the diverse workloads observed in the real world is therefore desired. In this paper, we present a neural processing unit (NPU) optimized with the following features: i) reconfigurable data prefetching and operational flow for high compute utilization, ii) multi-precision MACs supporting INT4, INT8, INT16, and FP16, and iii) a dynamic operation mode to cover extremely low-power or low-latency requirements. These features provide the flexibility needed by real-world applications within the power constraints of various product domains.
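To illustrate the multi-precision MAC concept described above, the following is a minimal behavioral sketch (not the paper's actual datapath): a single multiply-accumulate routine whose operands are clamped and rounded to the selected precision mode before the multiply, mimicking a unified datapath that serves INT4, INT8, INT16, and FP16 workloads. The function names and mode strings are illustrative assumptions.

```python
import numpy as np

def quantize(x, mode):
    """Clamp/round a value to the selected precision mode (illustrative)."""
    if mode == "int4":
        return int(np.clip(round(x), -8, 7))        # 4-bit two's complement range
    if mode == "int8":
        return int(np.clip(round(x), -128, 127))
    if mode == "int16":
        return int(np.clip(round(x), -32768, 32767))
    if mode == "fp16":
        return float(np.float16(x))                 # IEEE 754 half precision
    raise ValueError(f"unsupported mode: {mode}")

def mac(acc, a, b, mode):
    """One multiply-accumulate step at the chosen precision."""
    return acc + quantize(a, mode) * quantize(b, mode)

# In INT4 mode, out-of-range operands saturate: 100.0 clamps to 7.
acc = 0
for a, b in [(3.7, 2.2), (100.0, 1.5)]:
    acc = mac(acc, a, b, "int4")
# 4*2 + 7*2 = 22
```

A hardware implementation would instead reuse one set of multiplier arrays across modes (e.g., composing narrow integer multiplies into wider ones), but this model captures the workload-side trade-off the abstract describes: face-detection-class networks tolerate the INT4/INT8 saturation above, while super-resolution needs the FP16 path.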