William Chapman, S. Ranka, S. Sahni, M. Schmalz, U. Majumder
2011 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT). DOI: https://doi.org/10.1109/ISSPIT.2010.5711769
Citations: 19
Parallel processing techniques for the processing of synthetic aperture radar data on GPUs
This paper presents a design for parallel processing of synthetic aperture radar (SAR) data on one or more graphics processing units (GPUs). The design supports real-time reconstruction of a two-dimensional image from a matrix of echo pulses and their corresponding response values. Its key element is a dual partitioning scheme that divides the output image into tiles and the input matrix into sets of pulses. Each pair consisting of an image tile and a pulse set is assigned to a thread block on a GPU, which allows the pairs to be computed in parallel. Memory access latency is masked by the GPU's low-latency thread scheduling. Our performance analysis quantifies latency as a function of the input and output parameters. Experimental results were obtained on an nVidia Tesla C2050 GPU with a maximum throughput of 1030 Gflop/s. Our design achieves a peak throughput of 293 Gflop/s and scales well for output image sizes from 2,048 × 2,048 pixels to 4,096 × 4,096 pixels. Higher throughput can be obtained by distributing the pulse matrix across multiple GPUs and combining the results at a host device.
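The dual partitioning scheme described in the abstract can be illustrated with a minimal Python sketch that only enumerates the work items. It is not the authors' implementation: the function names `split` and `dual_partition`, the tile size of 64 pixels, the count of 1,024 pulses, and the pulse-set size of 128 are all hypothetical values chosen for illustration. Only the 2,048 × 2,048 image size comes from the abstract.

```python
from itertools import product


def split(total, chunk):
    """Split range(total) into consecutive (start, stop) spans of at most `chunk` items."""
    return [(start, min(start + chunk, total)) for start in range(0, total, chunk)]


def dual_partition(image_size, tile_size, num_pulses, pulses_per_set):
    """Enumerate (tile, pulse_set) work items for a square output image.

    Each work item is ((row_span, col_span), pulse_span). In the design the
    abstract describes, one such pair would be handed to one GPU thread block;
    here the pairs are only listed, not computed.
    """
    spans = split(image_size, tile_size)
    tiles = list(product(spans, spans))              # partition of the output image
    pulse_sets = split(num_pulses, pulses_per_set)   # partition of the input pulses
    return list(product(tiles, pulse_sets))


# Hypothetical parameters (assumptions, not taken from the paper).
work = dual_partition(image_size=2048, tile_size=64, num_pulses=1024, pulses_per_set=128)
print(len(work))  # 32 * 32 tiles times 8 pulse sets = 8192 work items
```

Work items that share a tile cover the same output pixels with different pulses, so under this sketch their partial results would have to be summed to form the final tile, much as the abstract describes combining per-GPU results at the host.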