An efficient PIM (processor-in-memory) architecture for motion estimation

Jung-Yup Kang, S. Gupta, Saurabh Shah, J. Gaudiot
DOI: 10.1109/ASAP.2003.1212852
Published in: Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors. ASAP 2003
Publication date: 2003-06-24
URL: https://doi.org/10.1109/ASAP.2003.1212852
Citations: 12

Abstract

Motion estimation is the most time-consuming stage of MPEG-family encoding, reportedly absorbing up to 90% of the total execution time of MPEG processing. We therefore propose a hardware/software co-design paradigm that uses a PIM module to execute motion estimation operations efficiently. The PIM module reduces the memory access penalty caused by the large number of memory accesses. We segment the PIM module into smaller modules so that each can execute operations in parallel. Parallel execution, however, incurs critical overheads: a huge amount of data must be replicated to many of these smaller PIM modules, and the replication requires not only a huge number of additional memory accesses but also calculations to generate addresses. We therefore also present an efficient data distribution mechanism to support parallel execution among these smaller PIM modules. With our paradigm, the host processor is relieved of the computationally intensive and data-intensive workloads of motion estimation. We observed up to a 2034× reduction in the number of memory accesses and up to a 439× performance improvement in the execution of motion estimation operations when using our computing paradigm.
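To make the memory-access argument concrete, the following is a minimal sketch of full-search block matching with a sum-of-absolute-differences (SAD) cost, a common form of motion estimation in MPEG encoders. The abstract does not say which search algorithm or cost function the authors use, so the algorithm choice, the function names (`sad`, `full_search`, `window_pixels`), the frame contents, and all sizes below are assumptions chosen for illustration, not the paper's implementation.

```python
def sad(cur, ref, cx, cy, rx, ry, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (cx, cy) and the n x n block of `ref` at (rx, ry)."""
    total = 0
    for dy in range(n):
        for dx in range(n):
            total += abs(cur[cy + dy][cx + dx] - ref[ry + dy][rx + dx])
    return total


def full_search(cur, ref, cx, cy, n, radius):
    """Exhaustively test every displacement within +/- radius pixels.

    Returns (best_motion_vector, best_sad, pixel_reads), where pixel_reads
    counts every pixel fetched from either frame (hypothetical metric)."""
    height, width = len(ref), len(ref[0])
    best_mv, best_sad, reads = (0, 0), None, 0
    for my in range(-radius, radius + 1):
        for mx in range(-radius, radius + 1):
            rx, ry = cx + mx, cy + my
            # Skip candidate blocks that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue
            cost = sad(cur, ref, cx, cy, rx, ry, n)
            reads += 2 * n * n  # n*n pixels from each of the two frames
            if best_sad is None or cost < best_sad:
                best_mv, best_sad = (mx, my), cost
    return best_mv, best_sad, reads


def window_pixels(n, radius):
    """Reference pixels covered by the search window of one n x n block."""
    return (n + 2 * radius) ** 2


# Illustrative 16 x 16 frames: `cur` is `ref` shifted by 1 pixel in x and
# 2 pixels in y, with zeros where the shift leaves the frame.
SIZE = 16
ref = [[x * x + 3 * y * y for x in range(SIZE)] for y in range(SIZE)]
cur = [[ref[y + 2][x + 1] if y + 2 < SIZE and x + 1 < SIZE else 0
        for x in range(SIZE)] for y in range(SIZE)]

mv, cost, reads = full_search(cur, ref, cx=4, cy=4, n=4, radius=3)
print(mv, cost, reads)
```

Even for this single 4 × 4 block, the search issues far more pixel reads than the block contains, which is the kind of traffic the abstract attributes to motion estimation. Under the same assumptions, `window_pixels(16, 16)` shows that a 16 × 16 block with a ±16 search range needs a 48 × 48 reference window, so giving each parallel unit its own copy of the window replicates data that neighbouring blocks also need. That overlap is one plausible reading of the replication overhead the abstract describes.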