Evaluation of architecture-aware optimization techniques for Convolutional Neural Networks

Raúl Marichal, Guillermo Toyos, Ernesto Dufrechu, P. Ezzatti

2023 31st Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), March 2023. DOI: 10.1109/PDP59025.2023.00036

Abstract: The growing need to perform neural-network inference with low latency is giving rise to a broad spectrum of heterogeneous devices with deep-learning capabilities. Consequently, obtaining the best performance from each device and choosing the most suitable platform for a given problem have become challenging. This paper evaluates multiple inference platforms using architecture-aware optimizations for convolutional neural networks. Specifically, we apply hardware optimizations with the TensorRT and OpenVINO frameworks on top of the platform-aware NetAdapt algorithm. The experimental evaluation shows that on MobileNet and AlexNet, combining NetAdapt with TensorRT or OpenVINO improves latency by up to 10× and 5.3×, respectively. Moreover, a throughput test with different batch sizes showed varying degrees of performance improvement across the devices. The discussion of the experimental results can guide the selection of devices and optimizations for different AI solutions.
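As context for the platform-aware NetAdapt algorithm that the paper's optimizations build on, its core idea is a greedy loop: repeatedly shrink one layer's filter count, choosing the reduction that best moves the network toward a target latency budget. The sketch below is purely illustrative — the uniform cost model and the absence of accuracy-driven fine-tuning are simplifying assumptions, not the paper's (or NetAdapt's) actual implementation, where latency is measured empirically on the target device:

```python
# Illustrative sketch of NetAdapt's greedy budget-driven loop.
# Assumptions (not from the paper): a toy cost model where latency is
# proportional to the total filter count, and no fine-tuning step.

def latency(layers):
    # Toy cost model. A real NetAdapt run measures latency
    # empirically on the target device (e.g., via lookup tables).
    return sum(layers)

def netadapt_sketch(layers, budget, step=1):
    """Greedily shrink per-layer filter counts until the latency
    budget is met (or no layer can be shrunk further)."""
    layers = list(layers)
    while latency(layers) > budget:
        # Generate one candidate per layer: that layer shrunk by `step`.
        candidates = []
        for i in range(len(layers)):
            if layers[i] > step:
                trial = list(layers)
                trial[i] -= step
                candidates.append((latency(trial), trial))
        if not candidates:
            break  # every layer is already at its minimum size
        # Greedy choice: keep the lowest-latency candidate. Real
        # NetAdapt also short-term fine-tunes each candidate and
        # selects by the best accuracy/latency trade-off.
        layers = min(candidates)[1]
    return layers

slimmed = netadapt_sketch([32, 64, 128], budget=200)
```

Under the toy cost model every candidate saves the same latency per step, so the tie-break is degenerate; the point of the sketch is only the structure of the loop — propose per-layer reductions, evaluate each against the budget, keep the best, repeat.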