Imlijungla Longchar, Amey Varhade, Chetan Ingle, Saurabh Baranwal, H. Kapoor
DOI: 10.1145/3564121.3564132
Published in: Proceedings of the Second International Conference on AI-ML Systems, 2022-10-12
CluSpa: Computation Reduction in CNN Inference by exploiting Clustering and Sparsity
Convolutional Neural Networks (CNNs) have grown tremendously in popularity and usage over the last few years, spanning tasks such as computer vision, natural language processing, video recognition, and recommender systems. Despite the algorithmic advancements that drove this growth, CNNs still carry considerable computational and memory overhead, which poses challenges for achieving real-time performance. Each input image requires millions or even billions of elementary arithmetic operations before the network produces a result. In CNNs, convolutional and pooling layers are followed by activation layers involving various activation functions. Hence, much work has gone into reducing these costs in recent years, with numerous optimizations proposed at both the hardware and software levels. In this paper, we propose a software-based solution for improving the performance of CNN inference. We suggest a technique for approximate computation of the convolution operation based on clustering and sharing of weights, using Gaussian Mixture Models for clustering. On top of the clustering method, we exploit weight sparsity to further reduce computation. We achieve a considerable reduction in MAC operations and an overall computation speedup on popular CNN architectures.
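The abstract describes the core idea but not its mechanics. A minimal sketch of how GMM-based weight clustering and sparsity could reduce multiplications is shown below; the function names and parameters are illustrative assumptions, not the paper's actual implementation, and the exact factorization of the convolution is simplified to a dot product.

```python
# Illustrative sketch (not the paper's code): quantize a layer's weights to
# GMM cluster centroids, keep zeros exactly zero, then compute a dot product
# with one multiply per *centroid* instead of one per weight.
import numpy as np
from sklearn.mixture import GaussianMixture

def cluster_weights(weights, n_clusters=8, seed=0):
    """Replace each nonzero weight with its GMM cluster centroid."""
    flat = weights.reshape(-1, 1)
    nonzero = flat[flat != 0].reshape(-1, 1)
    gmm = GaussianMixture(n_components=n_clusters, random_state=seed).fit(nonzero)
    centroids = gmm.means_.ravel()
    labels = gmm.predict(flat)            # cluster index for every weight
    shared = centroids[labels].reshape(weights.shape)
    shared[weights == 0] = 0.0            # preserve sparsity: zeros stay zero
    return shared, centroids

def approx_dot(x, shared_w, centroids):
    """Approximate dot(x, w): sum the inputs within each cluster first,
    then do a single multiply per centroid -- fewer MACs when the number
    of clusters is much smaller than the number of weights."""
    total = 0.0
    for c in centroids:
        mask = (shared_w == c)            # weights sharing this centroid
        if mask.any():
            total += c * x[mask].sum()    # one multiply per cluster
    return total
```

Because every shared weight is exactly one of the centroids, `approx_dot` matches a plain dot product with the clustered weights while replacing per-weight multiplications with per-cluster ones; zero weights are skipped entirely.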