Raúl Marichal, Guillermo Toyos, Ernesto Dufrechu, P. Ezzatti
Title: Evaluation of architecture-aware optimization techniques for Convolutional Neural Networks
DOI: 10.1109/PDP59025.2023.00036
Published in: 2023 31st Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), March 2023
Citations: 1
Abstract
The growing need to perform neural network inference with low latency is giving rise to a broad spectrum of heterogeneous devices with deep learning capabilities. Therefore, obtaining the best performance from each device and choosing the most suitable platform for a given problem have become challenging. This paper evaluates multiple inference platforms using architecture-aware optimizations for convolutional neural networks. Specifically, we use the TensorRT and OpenVINO frameworks for hardware optimizations on top of the platform-aware NetAdapt algorithm. The experimental evaluation shows that on MobileNet and AlexNet, using NetAdapt with TensorRT or OpenVINO can improve latency by up to 10x and 5.3x, respectively. Moreover, a throughput test using different batch sizes showed variable performance improvements across the devices. Discussion of the experimental results can guide the selection of devices and optimizations for different AI solutions.
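The throughput test mentioned above measures how many samples per second a deployed model sustains at different batch sizes. A minimal sketch of such a measurement is shown below; the `run_inference` function is a hypothetical stand-in for the paper's actual TensorRT/OpenVINO-optimized pipelines, which require vendor-specific runtimes not reproduced here.

```python
import time

def run_inference(batch):
    # Hypothetical stand-in for a TensorRT/OpenVINO-optimized model:
    # performs a fixed amount of work per sample in the batch.
    return [x * 2 for x in batch]

def measure_throughput(batch_size, n_batches=100):
    """Return samples/second processed at the given batch size."""
    batch = list(range(batch_size))
    start = time.perf_counter()
    for _ in range(n_batches):
        run_inference(batch)
    elapsed = time.perf_counter() - start
    return (batch_size * n_batches) / elapsed

# Sweep batch sizes, as in the paper's throughput experiment.
for bs in (1, 8, 32):
    print(f"batch={bs:3d}  throughput={measure_throughput(bs):,.0f} samples/s")
```

Larger batches typically amortize per-invocation overhead and raise throughput, but at the cost of higher per-request latency, which is why the paper reports both metrics separately.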