{"title":"通过使用协程改进内存访问模式加速边缘设备上的机器学习推理","authors":"Bruce Belson, B. Philippa","doi":"10.1109/CSE57773.2022.00011","DOIUrl":null,"url":null,"abstract":"We demonstrate a novel method of speeding up large iterative tasks such as machine learning inference. Our approach is to improve the memory access pattern, taking advantage of coroutines as a programming language feature to minimise the developer effort and reduce code complexity. We evaluate our approach using a comprehensive set of bench-marks run on three hardware platforms (one ARM and two Intel CPUs). The best observed performance boosts were 65% for scanning the nodes in a B+ tree, 34% for support vector machine inference, 12% for image pixel normalisation, and 15.5% for two dimensional convolution. Performance varied with data size, numeric type, and other factors, but overall the method is practical and can lead to significant improvements for edge computing.","PeriodicalId":165085,"journal":{"name":"2022 IEEE 25th International Conference on Computational Science and Engineering (CSE)","volume":"135 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Speeding up Machine Learning Inference on Edge Devices by Improving Memory Access Patterns using Coroutines\",\"authors\":\"Bruce Belson, B. Philippa\",\"doi\":\"10.1109/CSE57773.2022.00011\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We demonstrate a novel method of speeding up large iterative tasks such as machine learning inference. Our approach is to improve the memory access pattern, taking advantage of coroutines as a programming language feature to minimise the developer effort and reduce code complexity. We evaluate our approach using a comprehensive set of bench-marks run on three hardware platforms (one ARM and two Intel CPUs). 
The best observed performance boosts were 65% for scanning the nodes in a B+ tree, 34% for support vector machine inference, 12% for image pixel normalisation, and 15.5% for two dimensional convolution. Performance varied with data size, numeric type, and other factors, but overall the method is practical and can lead to significant improvements for edge computing.\",\"PeriodicalId\":165085,\"journal\":{\"name\":\"2022 IEEE 25th International Conference on Computational Science and Engineering (CSE)\",\"volume\":\"135 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 IEEE 25th International Conference on Computational Science and Engineering (CSE)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CSE57773.2022.00011\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE 25th International Conference on Computational Science and Engineering (CSE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CSE57773.2022.00011","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Speeding up Machine Learning Inference on Edge Devices by Improving Memory Access Patterns using Coroutines
We demonstrate a novel method for speeding up large iterative tasks such as machine learning inference. Our approach is to improve the memory access pattern, taking advantage of coroutines as a programming language feature to minimise developer effort and reduce code complexity. We evaluate our approach using a comprehensive set of benchmarks run on three hardware platforms (one ARM and two Intel CPUs). The best observed performance boosts were 65% for scanning the nodes in a B+ tree, 34% for support vector machine inference, 12% for image pixel normalisation, and 15.5% for two-dimensional convolution. Performance varied with data size, numeric type, and other factors, but overall the method is practical and can lead to significant improvements for edge computing.