Parallel Implementation of 2D-DWT by Purging Read after Write Dependency for High Speed Applications
Authors: M. Ashraf, M. S. Baig, L. A. Khan, A. Hassan
Published in: 2007 International Conference on Emerging Technologies, 2007-11-01
DOI: 10.1109/ICET.2007.4516355 (https://doi.org/10.1109/ICET.2007.4516355)
Abstract
This paper proposes an efficient implementation of multistage, multilevel DSP algorithms suitable for parallel and distributed processing. To describe our method, we selected Mallat's algorithm for computing two-dimensional discrete wavelet transform (2D-DWT) coefficients, which has multistage and multilevel processing requirements. We selected field-programmable gate arrays (FPGAs) as the processing unit because of their inherent parallel processing capabilities, but our method is not limited to FPGAs. Our method computes the 2D-DWT coefficients directly, without computing and storing intermediate results, which makes it faster, saves resources, and removes read-after-write (RAW) dependencies. We discuss a multistage, single-level implementation, but in principle it can be extended to an n-level implementation. We also propose a method for generating "mutually scaled filter coefficients (MSFC)" and for computing the maximum number of parallel processors for optimized performance in this particular case. Both lookup tables (LUTs) and multipliers, together with an addition/subtraction architecture, can be used; LUTs, however, have the advantage of high processing speed. Two computational stages are combined into a single stage to remove the RAW dependency. Our implementation takes N^2/4 + L^2/4 - 2 time units to compute the 2D-DWT of N x N input data with a filter of length L, without intermediate storage. This method can be used for other multistage, multilevel DSP problems. The Quartus II IDE and an Altera Stratix device are used for implementation.
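The idea of merging the row and column filtering stages can be illustrated with a minimal Python sketch. It is not the authors' FPGA design. It assumes Haar filters, periodic boundary extension, and combined 2D coefficients formed as the outer product of the 1D filters, which is only a plausible stand-in for the MSFC that the abstract names but does not define. The function names and the `time_units` helper are hypothetical, and the helper reads the abstract's time expression as N^2/4 + L^2/4 - 2.

```python
import math

# Haar analysis filters, chosen only to keep the example small (an assumption).
LO = [1 / math.sqrt(2), 1 / math.sqrt(2)]
HI = [1 / math.sqrt(2), -1 / math.sqrt(2)]


def analyze_1d(signal, taps):
    """Filter with periodic extension and keep every second output sample."""
    n = len(signal)
    return [
        sum(taps[k] * signal[(2 * i + k) % n] for k in range(len(taps)))
        for i in range(n // 2)
    ]


def two_stage(image, row_taps, col_taps):
    """Conventional form: the column pass must read what the row pass wrote."""
    rows = [analyze_1d(r, row_taps) for r in image]  # stored intermediate
    cols = [analyze_1d(list(c), col_taps) for c in zip(*rows)]  # reads it back
    return [list(r) for r in zip(*cols)]  # transpose back to row-major


def single_stage(image, row_taps, col_taps):
    """Merged form: each output is computed directly from the input samples."""
    n_rows, n_cols = len(image), len(image[0])
    # Assumed combined coefficients: outer product of the two 1D filters.
    kernel = [[c * r for r in row_taps] for c in col_taps]
    out = []
    for i in range(n_rows // 2):
        out_row = []
        for j in range(n_cols // 2):
            acc = 0.0
            for p, kernel_row in enumerate(kernel):
                for q, coeff in enumerate(kernel_row):
                    acc += coeff * image[(2 * i + p) % n_rows][(2 * j + q) % n_cols]
            out_row.append(acc)
        out.append(out_row)
    return out


def time_units(n, filter_len):
    """Abstract's time expression, read here as N^2/4 + L^2/4 - 2 (assumed)."""
    return n * n // 4 + filter_len * filter_len // 4 - 2


if __name__ == "__main__":
    img = [[float((3 * r + 5 * c) % 7) for c in range(4)] for r in range(4)]
    print(two_stage(img, LO, LO))     # approximation subband via two passes
    print(single_stage(img, LO, LO))  # same subband with no intermediate array
    print(time_units(512, 4))
```

Each output sample of `single_stage` depends only on input pixels, so no stage waits on a value written by an earlier stage, and the output samples can be distributed across parallel units. Passing the other combinations of `LO` and `HI` as `row_taps` and `col_taps` yields the three detail subbands in the same way.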