High Performance Im2win and Direct Convolutions using Three Tensor Layouts on SIMD Architectures

Xiang Fu, Xinpeng Zhang, Jixiang Ma, Peng Zhao, Shuai Lu, Xu T. Liu
{"title":"在 SIMD 架构上使用三种张量布局实现高性能 Im2win 和直接卷积","authors":"Xiang Fu, Xinpeng Zhang, Jixiang Ma, Peng Zhao, Shuai Lu, Xu T. Liu","doi":"arxiv-2408.00278","DOIUrl":null,"url":null,"abstract":"Convolution is the core component within deep neural networks and it is\ncomputationally intensive and time consuming. Tensor data layouts significantly\nimpact convolution operations in terms of memory access and computational\nefficiency. Yet, there is still a lack of comprehensive performance\ncharacterization on data layouts on SIMD architectures concerning convolution\nmethods. This paper proposes three novel data layouts for im2win convolution:\nNHWC, CHWN, and CHWN8, and introduces a set of general optimization techniques\nfor both direct and im2win convolutions. We compare the optimized im2win\nconvolution with the direct convolution and PyTorch's im2col-based convolution\nacross the aforementioned layouts on SIMD machines. The experiments\ndemonstrated that the im2win convolution with the new NHWC layout achieved up\nto 355% performance speedup over NCHW layout. Our optimizations also\nsignificantly improve the performance of both im2win and direct convolutions.\nOur optimized im2win and direct convolutions achieved up to 95% and 94% of\nmachine's theoretical peak performance, respectively.","PeriodicalId":501347,"journal":{"name":"arXiv - CS - Neural and Evolutionary Computing","volume":"98 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"High Performance Im2win and Direct Convolutions using Three Tensor Layouts on SIMD Architectures\",\"authors\":\"Xiang Fu, Xinpeng Zhang, Jixiang Ma, Peng Zhao, Shuai Lu, Xu T. Liu\",\"doi\":\"arxiv-2408.00278\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Convolution is the core component within deep neural networks and it is\\ncomputationally intensive and time consuming. Tensor data layouts significantly\\nimpact convolution operations in terms of memory access and computational\\nefficiency. Yet, there is still a lack of comprehensive performance\\ncharacterization on data layouts on SIMD architectures concerning convolution\\nmethods. This paper proposes three novel data layouts for im2win convolution:\\nNHWC, CHWN, and CHWN8, and introduces a set of general optimization techniques\\nfor both direct and im2win convolutions. We compare the optimized im2win\\nconvolution with the direct convolution and PyTorch's im2col-based convolution\\nacross the aforementioned layouts on SIMD machines. The experiments\\ndemonstrated that the im2win convolution with the new NHWC layout achieved up\\nto 355% performance speedup over NCHW layout. 
Our optimizations also\\nsignificantly improve the performance of both im2win and direct convolutions.\\nOur optimized im2win and direct convolutions achieved up to 95% and 94% of\\nmachine's theoretical peak performance, respectively.\",\"PeriodicalId\":501347,\"journal\":{\"name\":\"arXiv - CS - Neural and Evolutionary Computing\",\"volume\":\"98 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-08-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Neural and Evolutionary Computing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2408.00278\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Neural and Evolutionary Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2408.00278","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

Convolution is the core component of deep neural networks, and it is computationally intensive and time-consuming. Tensor data layouts significantly impact convolution operations in terms of memory access and computational efficiency. Yet there is still no comprehensive performance characterization of data layouts for convolution methods on SIMD architectures. This paper proposes three novel data layouts for im2win convolution: NHWC, CHWN, and CHWN8, and introduces a set of general optimization techniques for both direct and im2win convolutions. We compare the optimized im2win convolution with the direct convolution and PyTorch's im2col-based convolution across the aforementioned layouts on SIMD machines. The experiments demonstrate that the im2win convolution with the new NHWC layout achieves up to a 355% speedup over the NCHW layout. Our optimizations also significantly improve the performance of both the im2win and direct convolutions, which reach up to 95% and 94% of the machine's theoretical peak performance, respectively.
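The layouts in question differ only in which dimension of the 4-D activation tensor is stored contiguously, which in turn determines whether SIMD loads hit unit-stride memory. The sketch below (hypothetical helper names, not from the paper) shows the linear-offset formula under each layout. The CHWN8 indexing is an assumption: we read the trailing "8" as blocking the batch dimension in groups of eight to match an 8-lane SIMD register; the paper defines the exact blocking.

```cpp
#include <cstddef>

// Linear offsets into a 4-D activation tensor with batch N, channels C,
// height H, width W. Each layout makes a different dimension contiguous.

inline size_t idx_nchw(size_t n, size_t c, size_t h, size_t w,
                       size_t C, size_t H, size_t W) {
    return ((n * C + c) * H + h) * W + w;    // width contiguous
}

inline size_t idx_nhwc(size_t n, size_t c, size_t h, size_t w,
                       size_t C, size_t H, size_t W) {
    return ((n * H + h) * W + w) * C + c;    // channels contiguous
}

inline size_t idx_chwn(size_t n, size_t c, size_t h, size_t w,
                       size_t N, size_t H, size_t W) {
    return ((c * H + h) * W + w) * N + n;    // batch contiguous
}

// Assumed CHWN8: batch split into blocks of 8 (one SIMD register wide),
// giving logical shape (N/8, C, H, W, 8) with the 8-lane block innermost.
inline size_t idx_chwn8(size_t n, size_t c, size_t h, size_t w,
                        size_t C, size_t H, size_t W) {
    const size_t block = n / 8, lane = n % 8;
    return (((block * C + c) * H + h) * W + w) * 8 + lane;
}
```

To see why a channel-last layout such as NHWC helps direct convolution, consider the minimal stride-1, no-padding kernel below. It is an illustrative sketch, not the paper's optimized implementation: with NHWC, the innermost channel loop reads both the input and the filter with unit stride, so a compiler can auto-vectorize it, whereas in NCHW the same loop would traverse strided memory.

```cpp
#include <cstddef>

// Minimal direct convolution over NHWC tensors (stride 1, no padding,
// assumes H >= Kh and W >= Kw). Input is (N, H, W, Ci), filters are
// (Co, Kh, Kw, Ci), output is (N, Ho, Wo, Co).
void direct_conv_nhwc(const float* in, const float* flt, float* out,
                      size_t N, size_t H, size_t W, size_t Ci,
                      size_t Co, size_t Kh, size_t Kw) {
    const size_t Ho = H - Kh + 1, Wo = W - Kw + 1;
    for (size_t n = 0; n < N; ++n)
      for (size_t ho = 0; ho < Ho; ++ho)
        for (size_t wo = 0; wo < Wo; ++wo)
          for (size_t co = 0; co < Co; ++co) {
            float acc = 0.0f;
            for (size_t kh = 0; kh < Kh; ++kh)
              for (size_t kw = 0; kw < Kw; ++kw)
                // Unit-stride reads over the contiguous channel dimension:
                // this is the loop a SIMD compiler can vectorize.
                for (size_t ci = 0; ci < Ci; ++ci)
                  acc += in[((n * H + ho + kh) * W + wo + kw) * Ci + ci]
                       * flt[((co * Kh + kh) * Kw + kw) * Ci + ci];
            out[((n * Ho + ho) * Wo + wo) * Co + co] = acc;
          }
}
```

The paper's kernels go well beyond this loop nest through its general optimization techniques; the sketch only isolates the layout's effect on the innermost loop.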