{"title":"Speeding up Machine Learning Inference on Edge Devices by Improving Memory Access Patterns using Coroutines","authors":"Bruce Belson, B. Philippa","doi":"10.1109/CSE57773.2022.00011","DOIUrl":null,"url":null,"abstract":"We demonstrate a novel method of speeding up large iterative tasks such as machine learning inference. Our approach is to improve the memory access pattern, taking advantage of coroutines as a programming language feature to minimise the developer effort and reduce code complexity. We evaluate our approach using a comprehensive set of bench-marks run on three hardware platforms (one ARM and two Intel CPUs). The best observed performance boosts were 65% for scanning the nodes in a B+ tree, 34% for support vector machine inference, 12% for image pixel normalisation, and 15.5% for two dimensional convolution. Performance varied with data size, numeric type, and other factors, but overall the method is practical and can lead to significant improvements for edge computing.","PeriodicalId":165085,"journal":{"name":"2022 IEEE 25th International Conference on Computational Science and Engineering (CSE)","volume":"135 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE 25th International Conference on Computational Science and Engineering (CSE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CSE57773.2022.00011","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
We demonstrate a novel method of speeding up large iterative tasks such as machine learning inference. Our approach is to improve the memory access pattern, taking advantage of coroutines as a programming language feature to minimise developer effort and reduce code complexity. We evaluate our approach using a comprehensive set of benchmarks run on three hardware platforms (one ARM and two Intel CPUs). The best observed performance boosts were 65% for scanning the nodes in a B+ tree, 34% for support vector machine inference, 12% for image pixel normalisation, and 15.5% for two-dimensional convolution. Performance varied with data size, numeric type, and other factors, but overall the method is practical and can lead to significant improvements for edge computing.
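The abstract does not include code, but the general technique it refers to, using coroutines to suspend at likely cache misses so that a software prefetch can complete while other work proceeds, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the task type `PrefetchTask`, the linked-list traversal `sum_list`, and the round-robin driver are all hypothetical names chosen for the example, it requires a C++20 compiler, and `__builtin_prefetch` is a GCC/Clang builtin.

```cpp
// Sketch of coroutine-based interleaving (illustrative only, not the paper's code):
// before touching memory that is likely to miss in cache, a coroutine issues a
// prefetch and suspends; a round-robin scheduler resumes other coroutines in the
// meantime so that memory latency overlaps with useful work.
#include <coroutine>
#include <cstddef>
#include <cstdio>
#include <exception>
#include <vector>

// Coroutine handle wrapper with eager start and manual resumption.
struct PrefetchTask {
    struct promise_type {
        long result = 0;
        PrefetchTask get_return_object() {
            return PrefetchTask{std::coroutine_handle<promise_type>::from_promise(*this)};
        }
        std::suspend_never initial_suspend() noexcept { return {}; }
        std::suspend_always final_suspend() noexcept { return {}; }
        void return_value(long v) { result = v; }
        void unhandled_exception() { std::terminate(); }
    };
    explicit PrefetchTask(std::coroutine_handle<promise_type> h) : handle(h) {}
    PrefetchTask(PrefetchTask&& other) noexcept : handle(other.handle) { other.handle = {}; }
    PrefetchTask(const PrefetchTask&) = delete;
    ~PrefetchTask() { if (handle) handle.destroy(); }
    bool done() const { return handle.done(); }
    void resume() const { handle.resume(); }
    long result() const { return handle.promise().result; }
    std::coroutine_handle<promise_type> handle;
};

struct Node { long value; Node* next; };

// Sum a linked list; before each pointer chase, prefetch the node and suspend
// so the scheduler can run another traversal while the cache line loads.
PrefetchTask sum_list(const Node* head) {
    long total = 0;
    for (const Node* n = head; n != nullptr; n = n->next) {
        __builtin_prefetch(n);          // request the cache line
        co_await std::suspend_always{}; // yield to the other traversals
        total += n->value;
    }
    co_return total;
}

int main() {
    // Build a few small lists (a stand-in for independent pointer-chasing scans).
    std::vector<std::vector<Node>> lists(4);
    for (auto& l : lists) {
        l.resize(1000);
        for (std::size_t i = 0; i < l.size(); ++i)
            l[i] = {static_cast<long>(i), i + 1 < l.size() ? &l[i + 1] : nullptr};
    }

    // Launch one coroutine per list and resume them round-robin until all finish.
    std::vector<PrefetchTask> tasks;
    for (auto& l : lists) tasks.push_back(sum_list(l.data()));
    for (bool any = true; any; ) {
        any = false;
        for (auto& t : tasks)
            if (!t.done()) { t.resume(); any = true; }
    }
    for (auto& t : tasks) std::printf("%ld\n", t.result());
}
```

Interleaving several independent traversals is what lets the prefetch latency of one overlap with the computation of the others; with a single traversal, the suspend/resume overhead would only add cost. In this toy example the data fits in cache, so no speedup should be expected; the point is only to show where the suspend points and prefetches go.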