Parallel Implementation of 2D-DWT by Purging Read after Write Dependency for High Speed Applications
M. Ashraf, M. S. Baig, L. A. Khan, A. Hassan
2007 International Conference on Emerging Technologies (published 2007-11-01)
DOI: 10.1109/ICET.2007.4516355
Citations: 0
Abstract
This paper proposes an efficient implementation of multistage, multilevel DSP algorithms suitable for parallel and distributed processing. To describe our method, we selected Mallat's algorithm for computing two-dimensional discrete wavelet transform (2D-DWT) coefficients, which has multistage and multilevel processing requirements. We selected field-programmable gate arrays (FPGAs) as the processing units because of their inherent parallel-processing capabilities, but our method is not limited to FPGAs. Our method computes 2D-DWT coefficients directly, without computing or storing intermediate results. This makes it faster and more resource-efficient, and it removes read-after-write (RAW) dependencies. We discuss a multistage, single-level implementation, which can in principle be extended to $n$ levels. We also propose a method for generating "mutually scaled filter coefficients" (MSFC) and for computing the maximum number of parallel processors that gives optimal performance in this particular case. Either lookup tables (LUTs) or multipliers with an addition/subtraction architecture can be used; LUTs offer higher processing speed. Two computational stages are combined into a single stage to remove the RAW dependency. Our implementation takes $N^2/4 + L^2/4 - 2$ time units to compute the 2D-DWT of $N \times N$ input data with a filter of length $L$, without intermediate storage. This method can also be applied to other multistage, multilevel DSP problems. The implementation uses the Quartus® II IDE and an Altera Stratix device.
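The abstract says that two computational stages of Mallat's algorithm are merged into one so that no intermediate results are stored and read back. The sketch below illustrates one possible reading of that idea. It is not the authors' method. It assumes the merged stage applies, for each subband, a 2D kernel whose taps are products of a column-filter tap and a row-filter tap. This product kernel is a hypothetical stand-in for MSFC, which the abstract does not define. The sketch also assumes Daubechies-4 filters, periodic borders and an even $N$. The function names `two_stage`, `single_stage` and `paper_time_units` are hypothetical. By linearity, both routes produce the same subbands up to floating-point rounding, but only `two_stage` writes and then reads an intermediate array.

```python
import math

# Illustrative assumption: Daubechies-4 analysis filters (L = 4).
# The abstract does not say which wavelet filters the authors use.
_S3 = math.sqrt(3.0)
_D = 4.0 * math.sqrt(2.0)
LO = [(1 + _S3) / _D, (3 + _S3) / _D, (3 - _S3) / _D, (1 - _S3) / _D]
# Quadrature-mirror highpass filter: g[n] = (-1)^n * h[L-1-n].
HI = [(-1) ** n * LO[len(LO) - 1 - n] for n in range(len(LO))]
FILTERS = {"L": LO, "H": HI}


def two_stage(x):
    """Conventional Mallat step on an N x N list of lists (N even).

    Stage 1 filters and decimates every row and stores N x N/2
    intermediates; stage 2 must read them back (the RAW dependency).
    Subband keys: first letter = row filter, second = column filter.
    Periodic extension at the borders is an assumption of this sketch.
    """
    n, half = len(x), len(x) // 2
    inter = {
        name: [[sum(f[k] * row[(2 * j + k) % n] for k in range(len(f)))
                for j in range(half)] for row in x]
        for name, f in FILTERS.items()
    }
    out = {}
    for rname, rows in inter.items():
        for cname, f in FILTERS.items():
            out[rname + cname] = [
                [sum(f[m] * rows[(2 * i + m) % n][j] for m in range(len(f)))
                 for j in range(half)]
                for i in range(half)
            ]
    return out


def single_stage(x):
    """Merged step: one 2D kernel per subband, no intermediate arrays.

    Each kernel tap is the product of a column-filter tap and a row-filter
    tap (a hypothetical stand-in for the paper's MSFC).
    """
    n, half = len(x), len(x) // 2
    out = {}
    for rname, fr in FILTERS.items():
        for cname, fc in FILTERS.items():
            w = [[fc[m] * fr[k] for k in range(len(fr))] for m in range(len(fc))]
            out[rname + cname] = [
                [sum(w[m][k] * x[(2 * i + m) % n][(2 * j + k) % n]
                     for m in range(len(fc)) for k in range(len(fr)))
                 for j in range(half)]
                for i in range(half)
            ]
    return out


def paper_time_units(n, l):
    """Evaluates the latency formula quoted in the abstract: N^2/4 + L^2/4 - 2."""
    return n * n / 4 + l * l / 4 - 2


if __name__ == "__main__":
    N = 8
    img = [[(3 * r + 5 * c) % 7 for c in range(N)] for r in range(N)]
    a, b = two_stage(img), single_stage(img)
    diff = max(abs(a[s][i][j] - b[s][i][j])
               for s in a for i in range(N // 2) for j in range(N // 2))
    print("max |two_stage - single_stage|:", diff)
    print("time units for N=8, L=4:", paper_time_units(N, 4))
```

In this sketch, each merged output costs $L^2$ multiply-accumulates. The separable version reuses its row-filtered intermediates. The merged form therefore trades extra arithmetic for the absence of stored intermediates, and parallel hardware can hide that extra arithmetic. The `paper_time_units` helper only evaluates the abstract's formula. It does not model any particular hardware.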