Parallel Implementation of 2D-DWT by Purging Read after Write Dependency for High Speed Applications

M. Ashraf, M. S. Baig, L. A. Khan, A. Hassan
2007 International Conference on Emerging Technologies. Published 2007-11-01. DOI: https://doi.org/10.1109/ICET.2007.4516355
Citations: 0

Abstract

This paper proposes an efficient implementation of multistage, multilevel DSP algorithms suitable for parallel and distributed processing. To describe our method, we selected Mallat's algorithm for computing two-dimensional discrete wavelet transform (2D-DWT) coefficients, which has multistage and multilevel processing requirements. We selected field-programmable gate arrays (FPGAs) as the processing unit because of their inherent parallel processing capabilities, but our method is not limited to FPGAs. Our method computes the 2D-DWT coefficients directly, without computing and storing intermediate results, which makes it faster, saves resources, and removes read-after-write (RAW) dependencies. We discuss a multistage, single-level implementation, but in principle it can be extended to an n-level implementation. We also propose a method for generating "mutually scaled filter coefficients (MSFC)" and for computing the maximum number of parallel processors for optimized performance in this particular case. Both lookup tables (LUTs) and multipliers, together with an addition/subtraction architecture, can be used; however, LUTs have the advantage of high processing speed. Two computational stages are combined into a single stage to remove the RAW dependency. Our implementation takes N²/4 + L²/4 - 2 time units to compute the 2D-DWT of N × N input data with a filter of length L, without intermediate storage. This method can be used for other multistage, multilevel DSP problems. The Quartus® II IDE and an Altera Stratix device are used for implementation.
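The idea of merging the two filtering stages can be illustrated with a small software sketch. It is not the authors' FPGA design: it assumes Haar filters, periodic boundary handling, and a 2D kernel built as the outer product of the 1D filters, which is only one plausible reading of how combined coefficients might be formed (the abstract does not define MSFC). The function names `two_stage`, `single_stage`, and `time_units` are hypothetical, and `time_units` assumes the exponents in the quoted time formula are squares.

```python
import math

# Haar analysis filters, chosen here only for illustration (L = 2).
S = 1 / math.sqrt(2)
LOW = [S, S]
HIGH = [S, -S]


def filt_decimate(seq, h):
    """Periodic 1D filtering followed by downsampling by 2."""
    n = len(seq)
    return [sum(h[k] * seq[(2 * i + k) % n] for k in range(len(h)))
            for i in range(n // 2)]


def two_stage(x, h_row, h_col):
    """Conventional approach: filter rows, store them, then filter columns."""
    # Stage 1 writes an intermediate buffer of row-filtered data.
    rows = [filt_decimate(r, h_row) for r in x]
    # Stage 2 must read that buffer back: the read-after-write dependency.
    cols = [filt_decimate(list(c), h_col) for c in zip(*rows)]
    # cols is indexed [column][row]; transpose back to [row][column].
    return [list(r) for r in zip(*cols)]


def single_stage(x, h_row, h_col):
    """Combined approach: one pass over the input with a 2D kernel."""
    n, m = len(x), len(x[0])
    # Outer product of the two 1D filters (an assumption of this sketch).
    kernel = [[a * b for b in h_row] for a in h_col]
    return [[sum(kernel[p][q] * x[(2 * i + p) % n][(2 * j + q) % m]
                 for p in range(len(h_col))
                 for q in range(len(h_row)))
             for j in range(m // 2)]
            for i in range(n // 2)]


def time_units(n, l):
    """Time-unit count quoted in the abstract, read as N^2/4 + L^2/4 - 2."""
    return n * n / 4 + l * l / 4 - 2
```

Calling `single_stage(x, LOW, HIGH)` on an even-sized square input gives the same subband as `two_stage(x, LOW, HIGH)` up to floating-point rounding, but every output coefficient depends only on the input, so the outputs could in principle be computed independently and in parallel. Under the stated reading of the formula, `time_units(512, 4)` evaluates to 65538 time units.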