{"title":"分析和设计并行算法和实现矩阵乘法的图像和信号处理","authors":"M. Yasrebi, J. Browne","doi":"10.1109/PACRIM.1989.48302","DOIUrl":null,"url":null,"abstract":"Parallel matrix multiplication algorithms (based on the common data distribution formats) used in pattern recognition, image processing, and signal processing applications are discussed. A novel algorithm is introduced and is shown to be the fastest one for a determined class of applications. The algorithms are analyzed for performance as a function of array dimension, data distribution formats, and the architecture of the computer upon which the algorithms are executed. Performance bounds and speedups (linear in the number of processors) are established. The results of the analysis are given both as characterizations of executions on selected classes of architectures and also in the form of theorems which establish the relative performance of the algorithms across classes of data distributions and architectures.<<ETX>>","PeriodicalId":256287,"journal":{"name":"Conference Proceeding IEEE Pacific Rim Conference on Communications, Computers and Signal Processing","volume":"34 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1989-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Analysis and design of parallel algorithms and implementations of matrix multiplications for image and signal processing\",\"authors\":\"M. Yasrebi, J. Browne\",\"doi\":\"10.1109/PACRIM.1989.48302\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Parallel matrix multiplication algorithms (based on the common data distribution formats) used in pattern recognition, image processing, and signal processing applications are discussed. A novel algorithm is introduced and is shown to be the fastest one for a determined class of applications. The algorithms are analyzed for performance as a function of array dimension, data distribution formats, and the architecture of the computer upon which the algorithms are executed. Performance bounds and speedups (linear in the number of processors) are established. 
The results of the analysis are given both as characterizations of executions on selected classes of architectures and also in the form of theorems which establish the relative performance of the algorithms across classes of data distributions and architectures.<<ETX>>\",\"PeriodicalId\":256287,\"journal\":{\"name\":\"Conference Proceeding IEEE Pacific Rim Conference on Communications, Computers and Signal Processing\",\"volume\":\"34 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1989-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Conference Proceeding IEEE Pacific Rim Conference on Communications, Computers and Signal Processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/PACRIM.1989.48302\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Conference Proceeding IEEE Pacific Rim Conference on Communications, Computers and Signal Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/PACRIM.1989.48302","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Analysis and design of parallel algorithms and implementations of matrix multiplications for image and signal processing
Parallel matrix multiplication algorithms (based on the common data distribution formats) used in pattern recognition, image processing, and signal processing applications are discussed. A novel algorithm is introduced and shown to be the fastest for a particular class of applications. The algorithms are analyzed for performance as a function of array dimension, data distribution format, and the architecture of the computer upon which the algorithms are executed. Performance bounds and speedups (linear in the number of processors) are established. The results of the analysis are given both as characterizations of executions on selected classes of architectures and also in the form of theorems which establish the relative performance of the algorithms across classes of data distributions and architectures.
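To make the idea of a data distribution format concrete, the sketch below shows a row-block (striped) partitioning of the left operand, one common distribution for parallel matrix multiplication. It is a minimal illustration only: the function names, the choice of Python with concurrent.futures, and the worker count are assumptions for the example and do not reproduce the algorithms or architecture-specific analysis of the paper.

```python
# Hypothetical sketch of a row-block data distribution for C = A @ B.
# Each worker receives a contiguous block of rows of A and a full copy
# of B, and computes the corresponding rows of C independently.
# Illustrative only; not the algorithm proposed in the paper.

from concurrent.futures import ProcessPoolExecutor

import numpy as np


def _multiply_block(args):
    """Compute one row block of the product: A_block (b x k) times B (k x n)."""
    a_block, b = args
    return a_block @ b


def parallel_matmul(a: np.ndarray, b: np.ndarray, workers: int = 4) -> np.ndarray:
    """Multiply a (m x k) by b (k x n) using a row-block distribution of a."""
    # Split the rows of A into roughly equal contiguous blocks, one per worker.
    row_blocks = np.array_split(a, workers, axis=0)

    # Each worker multiplies its block independently; with p workers and
    # negligible communication cost the ideal speedup is linear in p.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        partial = list(pool.map(_multiply_block, [(blk, b) for blk in row_blocks]))

    # Stacking the partial row blocks reassembles the full product.
    return np.vstack(partial)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.standard_normal((512, 256))
    b = rng.standard_normal((256, 128))
    assert np.allclose(parallel_matmul(a, b), a @ b)
```

Under this distribution each of the p workers performs roughly m*n*k/p multiply-adds on its block, which is consistent with a speedup linear in the number of processors, provided the cost of distributing B and gathering the result is small relative to the computation.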