Proceedings of the 2011 Conference on Design & Architectures for Signal & Image Processing (DASIP): Latest Papers

Parallelization of an ultrasound reconstruction algorithm for non destructive testing on multicore CPU and GPU
Antoine Pedron, L. Lacassagne, F. Bimbard, S. Berre
{"title":"Parallelization of an ultrasound reconstruction algorithm for non destructive testing on multicore CPU and GPU","authors":"Antoine Pedron, L. Lacassagne, F. Bimbard, S. Berre","doi":"10.1109/DASIP.2011.6136904","DOIUrl":"https://doi.org/10.1109/DASIP.2011.6136904","url":null,"abstract":"The CIVA software platform developed by CEA-LIST offers various simulation and data processing modules dedicated to non-destructive testing (NDT). In particular, ultrasonic imaging and reconstruction tools are proposed, in the purpose of localizing echoes and identifying and sizing the detected defects. Because of the complexity of data processed, computation time is now a limitation for the optimal use of available information. In this article, we present performance results on parallelization of one computationally heavy algorithm on general purpose processors (GPP) and graphic processing units (GPU). GPU implementation makes an intensive use of atomic intrinsics. Compared to initial GPP implementation, optimized GPP implementation runs up to ×116 faster and GPU implementation up to ×631. This shows that, even with irregular workloads, combining software optimization and hardware improvements, GPU give high performance.","PeriodicalId":199500,"journal":{"name":"Proceedings of the 2011 Conference on Design & Architectures for Signal & Image Processing (DASIP)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126766162","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Embedded operating systems energy overhead
B. Ouni, C. Belleudy, S. Bilavarn, E. Senn
{"title":"Embedded operating systems energy overhead","authors":"B. Ouni, C. Belleudy, S. Bilavarn, E. Senn","doi":"10.1109/DASIP.2011.6136853","DOIUrl":"https://doi.org/10.1109/DASIP.2011.6136853","url":null,"abstract":"In this paper, a flow for characterizing the energy consumption of an embedded operating system is presented. The objective is to determine the energy overhead of the services of the embedded OS; we focus in particular on the context switch service. The modeling is based on measurements on the OMAP35x EVM hardware platform running Linux omap. Based on the analysis results, a relationship between energy overhead and a set of hardware and software parameters is established.","PeriodicalId":199500,"journal":{"name":"Proceedings of the 2011 Conference on Design & Architectures for Signal & Image Processing (DASIP)","volume":"257 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115794997","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Optimization methodologies for complex FPGA-based signal processing systems with CAL
A. Rahman, Hossam Amer, A. Prihozhy, Christophe Lucarz, M. Mattavelli
{"title":"Optimization methodologies for complex FPGA-based signal processing systems with CAL","authors":"A. Rahman, Hossam Amer, A. Prihozhy, Christophe Lucarz, M. Mattavelli","doi":"10.1109/DASIP.2011.6136878","DOIUrl":"https://doi.org/10.1109/DASIP.2011.6136878","url":null,"abstract":"Signal processing designs are becoming increasingly complex with demands for more advanced algorithms. Designers are now seeking high-level tools and methodologies to help manage complexity and increase productivity. Recently, the CAL dataflow language has been specified; dataflow descriptions written in it can be synthesized into RTL code for hardware implementation, and several case studies have shown promising results. However, no work has been done on global network analysis, which could increase the optimization space. In this paper, we introduce methodologies to analyze and optimize CAL programs by determining which actions should be parallelized, pipelined, or refactored for the highest throughput gain, and then providing tools and techniques to achieve this using minimum resources. In a case study on the RVC MPEG-4 SP Intra decoder implemented on a Virtex-5 FPGA, experimental results confirmed our analysis, with a throughput gain of up to 3.5x using relatively minor additional slices compared to the reference design.","PeriodicalId":199500,"journal":{"name":"Proceedings of the 2011 Conference on Design & Architectures for Signal & Image Processing (DASIP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132068331","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
High speed VLSI architecture for 2-D lifting Discrete Wavelet Transform
A. Darji, R. Bansal, S. Merchant, A. Chandorkar
{"title":"High speed VLSI architecture for 2-D lifting Discrete Wavelet Transform","authors":"A. Darji, R. Bansal, S. Merchant, A. Chandorkar","doi":"10.1109/DASIP.2011.6136866","DOIUrl":"https://doi.org/10.1109/DASIP.2011.6136866","url":null,"abstract":"The lifting scheme reduces the computational complexity of computing the Discrete Wavelet Transform (DWT) compared to convolution. We propose a high-performance, memory-efficient architecture with a parallel scanning method for the 2-D DWT using the 5/3 lifting wavelet. This 2-D architecture is composed of two 1-D DWT units and a Transpose Unit (TU). The proposed parallel scanning reduces the on-chip line buffer requirement compared to other line-based scanning methods. The proposed 2-D DWT architecture uses only a buffer of size 2N for an NxN image, which is low compared to the 3.5N usually required to implement the 5/3 lifting wavelet. This is achieved by performing the column and row transforms simultaneously. The designed 1-D DWT module can process two inputs at a time and produce two outputs per clock, which reduces latency significantly compared to other 2-D dual-scan-based DWT architectures. The designed TU operates at half the clock rate, which reduces power, and its design is independent of the input image size. Instead of a shifter, we propose a Hardwired Scaling Unit (HSU) for coefficient multiplication. Unlike a shift register unit, this design saves clock cycles and helps reduce power considerably. This architecture is synthesized using Xilinx ISE 10.1 and implemented on a Virtex-IIPRO XC2VP30 FPGA. Very low FPGA resource utilization is found.","PeriodicalId":199500,"journal":{"name":"Proceedings of the 2011 Conference on Design & Architectures for Signal & Image Processing (DASIP)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123872570","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
Efficient maximal convex custom instruction enumeration for extensible processors
Chenglong Xiao, E. Casseau
{"title":"Efficient maximal convex custom instruction enumeration for extensible processors","authors":"Chenglong Xiao, E. Casseau","doi":"10.1109/DASIP.2011.6136868","DOIUrl":"https://doi.org/10.1109/DASIP.2011.6136868","url":null,"abstract":"In recent years, the use of extensible processors has been increased. Extensible processors extend the base instruction set of a general-purpose processor with a set of custom instructions. Custom instructions that can be implemented in special hardware units make it possible to improve performance and decrease power consumption in extensible processors. The key issue involved is to generate and select automatically the custom instructions from a high-level application code. However, enumerating all possible custom instructions of a given dataflow graph is a computationally difficult problem. In this paper, we propose an efficient algorithm for the exact enumeration of maximal convex custom instructions. The state of the art algorithms use either a bottom-up manner or a top-down manner to solve the problem. The proposed algorithm enumerates all maximal convex custom instructions by using a sandwich manner that combines the advantage of the bottom-up manner and the top-down manner. Compared to the latest algorithm, our algorithm can achieve orders of magnitude speedup.","PeriodicalId":199500,"journal":{"name":"Proceedings of the 2011 Conference on Design & Architectures for Signal & Image Processing (DASIP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130114980","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
DFG implementation on multi GPU cluster with computation-communication overlap
Sylvain Huet, Vincent Boulos, V. Fristot, L. Salvo
{"title":"DFG implementation on multi GPU cluster with computation-communication overlap","authors":"Sylvain Huet, Vincent Boulos, V. Fristot, L. Salvo","doi":"10.1109/DASIP.2011.6136859","DOIUrl":"https://doi.org/10.1109/DASIP.2011.6136859","url":null,"abstract":"Nowadays, it is possible to build a multi-GPU supercomputer, well suited for implementation of digital signal processing algorithms, for a few thousand dollars. However, to achieve the highest performance with this kind of architecture, the programmer has to focus on inter-processor communications, tasks synchronization … In this paper, we propose a design flow allowing an efficient implementation of a Digital Signal Processing (DSP) application specified as a Data Flow Graph (DFG) on a multi GPU computer cluster. We focus particularly on the effective implementation of communications by automating the computation-communication overlap, which can lead to significant speedups as shown in the presented benchmark. The approach is validated on a 3D granulometry application developed for research on materials.","PeriodicalId":199500,"journal":{"name":"Proceedings of the 2011 Conference on Design & Architectures for Signal & Image Processing (DASIP)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117304455","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
An efficient parallel motion estimation algorithm and X264 parallelization in CUDA
Youngsub Ko, Youngmin Yi, S. Ha
{"title":"An efficient parallel motion estimation algorithm and X264 parallelization in CUDA","authors":"Youngsub Ko, Youngmin Yi, S. Ha","doi":"10.1109/DASIP.2011.6136860","DOIUrl":"https://doi.org/10.1109/DASIP.2011.6136860","url":null,"abstract":"H.264/AVC video encoders have been widely used for their high coding efficiency. Since the computational demand, which is proportional to the frame resolution, is constantly increasing, there has been great interest in accelerating H.264/AVC by parallel processing. Recently, graphics processing units (GPUs) have emerged as a viable target for accelerating general purpose applications by exploiting fine-grain data parallelism. Despite extensive research effort to use GPUs to accelerate the H.264/AVC algorithm, no speed-up has been achieved over x264, which is known as the fastest CPU implementation, because of significant communication overhead between the host CPU and the GPU and intra-frame dependency in the algorithm. In this paper, we propose a novel motion estimation (ME) algorithm tailored for NVIDIA GPU implementation. It is accompanied by a novel pipelining technique, called sub-frame ME processing, to effectively hide the communication overhead between the host CPU and the GPU. The proposed H.264 encoder achieves more than 20% speed-up compared with x264.","PeriodicalId":199500,"journal":{"name":"Proceedings of the 2011 Conference on Design & Architectures for Signal & Image Processing (DASIP)","volume":"21 3","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120905395","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 12
Middleware approaches for adaptivity of Kahn Process Networks on Networks-on-Chip
E. Cannella, O. Derin, T. Stefanov
{"title":"Middleware approaches for adaptivity of Kahn Process Networks on Networks-on-Chip","authors":"E. Cannella, O. Derin, T. Stefanov","doi":"10.1109/DASIP.2011.6136862","DOIUrl":"https://doi.org/10.1109/DASIP.2011.6136862","url":null,"abstract":"We investigate and propose a number of different middleware approaches, namely virtual connector, virtual connector with variable rate, and request-driven, which implement the semantics of Kahn Process Networks on Network-on-Chip architectures. All of the presented solutions allow for run-time system adaptivity. We implement the approaches on a Network-on-Chip multiprocessor platform prototyped on an FPGA. Their comparison in terms of the introduced overhead is presented on two case studies with different communication characteristics. We found out that the virtual connector mechanism outperforms other approaches in the communication-intensive application. In the other case study, which has a higher computation/communication ratio, the middleware approaches show similar performance.","PeriodicalId":199500,"journal":{"name":"Proceedings of the 2011 Conference on Design & Architectures for Signal & Image Processing (DASIP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127277425","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
A SystemC TLM framework for distributed simulation of complex systems with unpredictable communication
J. Peeters, N. Ventroux, Tanguy Sassolas, L. Lacassagne
{"title":"A SystemC TLM framework for distributed simulation of complex systems with unpredictable communication","authors":"J. Peeters, N. Ventroux, Tanguy Sassolas, L. Lacassagne","doi":"10.1109/DASIP.2011.6136847","DOIUrl":"https://doi.org/10.1109/DASIP.2011.6136847","url":null,"abstract":"Increasingly complex systems need parallelized simulation engines. In the context of SystemC simulation, existing proposals require predicting communication in the simulated system. However, communication is often unpredictable. In order to deal with unpredictable systems, this paper presents a parallelization approach using asynchronous communication without modification of the SystemC simulation engine. The simulated system model is partitioned and distributed across separate simulation engines, each part being evaluated in parallel with the others. Functional consistency is preserved thanks to the simulated system's write-exclusive memory access policy, while temporal consistency is guaranteed using explicit synchronization. Experimental results show a speed-up of up to 13x on 16 processors.","PeriodicalId":199500,"journal":{"name":"Proceedings of the 2011 Conference on Design & Architectures for Signal & Image Processing (DASIP)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121444752","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 14
SystemC modelization for fast validation of imager architectures
Y. Blanchard, A. Dupret, A. Peizerat
{"title":"SystemC modelization for fast validation of imager architectures","authors":"Y. Blanchard, A. Dupret, A. Peizerat","doi":"10.1109/DASIP.2011.6136902","DOIUrl":"https://doi.org/10.1109/DASIP.2011.6136902","url":null,"abstract":"Development of smart CMOS imagers is a complex design task in which the verification of an architecture composed of a matrix of pixels intermixed with analog and digital electronics plays an important part. New generations of imagers using 3D integration will allow even more processing to be done in situ. Verification has to be done locally for the pixel and globally for the architecture. The design exploration and validation problem has shifted from mostly the analog domain to the validation of a complex SoC with millions of parallel processors, the pixels. In this paper, we present a methodology using the SystemC language for the creation of fast models for validation and first-level performance evaluation of large CMOS imager architectures.","PeriodicalId":199500,"journal":{"name":"Proceedings of the 2011 Conference on Design & Architectures for Signal & Image Processing (DASIP)","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132356107","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3