The 17th CSI International Symposium on Computer Architecture & Digital Systems (CADS 2013): Latest Publications

An adaptive temperature threshold schema for dynamic thermal management of multi-core processors

The 17th CSI International Symposium on Computer Architecture & Digital Systems (CADS 2013) Pub Date : 2013-10-30 DOI: 10.1109/CADS.2013.6714247

Bagher Salami, Mohammadreza Baharani, Hamid Noori

Citations: 4

Improved modulo-(2^n ± 3) multipliers

The 17th CSI International Symposium on Computer Architecture & Digital Systems (CADS 2013) Pub Date : 2013-10-01 DOI: 10.1109/CADS.2013.6714234

H. Ahmadifar, G. Jaberipur

Citations: 7

Fault-tolerant method with distributed monitoring and management technique for 3D stacked meshes

The 17th CSI International Symposium on Computer Architecture & Digital Systems (CADS 2013) Pub Date : 2013-10-01 DOI: 10.1109/CADS.2013.6714243

M. Ebrahimi, M. Daneshtalab, P. Liljeberg, H. Tenhunen

Citations: 22

A holistic approach for building MPSoCs

The 17th CSI International Symposium on Computer Architecture & Digital Systems (CADS 2013) Pub Date : 2013-10-01 DOI: 10.1109/CADS.2013.6714220

J. Carrabina

Abstract: Multi-Processor Systems-on-Chip (MPSoCs) are currently the most common way to build complex electronic systems that deliver high performance under both timing and power constraints. Both many-core (usually homogeneous) and multi-core (more often heterogeneous) designs require parallel programming methods, together with architectural exploration and performance analysis tools, to reach an optimal solution. Network-on-chip concepts and programming models for MPSoCs are reviewed using selected proposals that highlight how implementation approaches have evolved. FPGAs are used to prototype these complex systems because they give high visibility into system activity at different levels of abstraction. Emerging reconfigurable hardware devices allow the design of complex embedded systems that combine soft-core processors with a mix of other IP cores. Reduced NRE costs compared with ASICs are a typical reason to choose FPGAs as an implementation platform, but the continuous increase in capacity and the flexibility of reconfigurable hardware are also important reasons, since they help achieve good time-to-market and time-in-market. Furthermore, because of this existing infrastructure, FPGA-based multi-soft-core solutions are viable and interesting for embedded systems, which naturally appear after new general-purpose platforms. These embedded systems are oriented to specific-purpose applications and need an additional trade-off between performance, flexibility, and development time. FPGAs make many-soft-core solutions available at a reasonable cost, so they are expected to be relevant for some future embedded systems. In addition to current soft-core SoC tools, parallel programming methods and tools will therefore be required as part of the full system development process. Performance analysis tools also have to be updated to account for the specifics of parallel programming (most of them coming from the high-performance computing community), as they are a critical part of the development process for parallel embedded applications. Meeting real-time constraints is critical to achieving the desired performance. Techniques used by the HPC community, such as post-mortem analysis of application traces, are reviewed with the resource limitations of FPGA platforms for embedded systems in mind. The review includes several techniques and some hardware architectural support for generating traces on FPGA-based multiprocessor systems and using them to optimize the performance of running applications. Finally, soft-cores offer an additional advantage because one can easily add hardware acceleration or improve communic…

Citations: 0

Using binary-reflected Gray coding for crosstalk mitigation of network on chip

The 17th CSI International Symposium on Computer Architecture & Digital Systems (CADS 2013) Pub Date : 2013-10-01 DOI: 10.1109/CADS.2013.6714241

Z. Shirmohammadi, S. Miremadi

Citations: 11

A statistical performance prediction model for OpenCL kernels on NVIDIA GPUs

The 17th CSI International Symposium on Computer Architecture & Digital Systems (CADS 2013) Pub Date : 2013-10-01 DOI: 10.1109/CADS.2013.6714232

Ali Karami, S. A. Mirsoleimani, F. Khunjush

Abstract: Understanding the performance bottlenecks of applications in high-performance computing can lead to dramatic improvements in application performance. For example, a key problem in GPU programming is finding performance bottlenecks and resolving them to reach the best possible performance. These bottlenecks in GPU architectures span a variety of factors, such as memory access latency, branch divergence, utilization, and the amount of available parallelism. In addition, simple profiling cannot reveal the relations between these bottlenecks. In this paper, we propose a statistical performance model that not only helps find bottlenecks but also shows the relations between them, which is not possible with a profiler alone. The OpenCL programming standard can be used on a variety of platforms (e.g., CPUs and GPUs); therefore, a program written for one platform can be ported to other platforms with minimal effort. As a result, we selected the OpenCL programming standard to design our performance model for NVIDIA GPUs. To do so, we first measure the values of the GPU performance counters for the selected benchmarks. Then, using these measurements and applying a regression model and principal component analysis, we develop a model that shows how different GPU parameters account for application performance bottlenecks. Our results show that the proposed model can predict application behavior with 91% accuracy. Moreover, the proposed model is able to characterize unknown applications based on their performance similarities with an existing database of benchmarks, in order to predict their likely performance bottlenecks.
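The combination of principal component analysis and regression mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' model: the counter names, the sample values, and the choice to regress runtime on a single principal component are all assumptions made for illustration, and the numbers below are synthetic.

```python
import math

def standardize(rows):
    """Center every column and scale it to unit (population) standard deviation."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
            for j in range(d)]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]

def first_component(z, iters=200):
    """Leading principal direction, found by power iteration on the covariance matrix."""
    n, d = len(z), len(z[0])
    cov = [[sum(r[a] * r[b] for r in z) / n for b in range(d)] for a in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def fit_line(x, y):
    """Ordinary least squares for y ~ slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx

# Synthetic, hypothetical counters per kernel:
# [memory stall cycles, divergent branches, achieved occupancy]
counters = [
    [120.0, 4.0, 0.90],
    [310.0, 9.0, 0.75],
    [450.0, 15.0, 0.60],
    [700.0, 22.0, 0.40],
    [880.0, 30.0, 0.35],
]
runtime_ms = [1.1, 2.4, 3.2, 5.1, 6.3]  # synthetic runtimes

z = standardize(counters)
pc1 = first_component(z)
scores = [sum(a * b for a, b in zip(row, pc1)) for row in z]
slope, intercept = fit_line(scores, runtime_ms)
predicted = [slope * s + intercept for s in scores]
residuals = [y - p for y, p in zip(runtime_ms, predicted)]
```

The loadings in `pc1` indicate which counters move together (here stalls and divergence load with the opposite sign to occupancy), which is the kind of relation between bottlenecks that a flat profiler listing does not show.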

Citations: 30

Parallelized computation for Edge Histogram Descriptor using CUDA on the Graphics Processing Units (GPU)

The 17th CSI International Symposium on Computer Architecture & Digital Systems (CADS 2013) Pub Date : 2013-10-01 DOI: 10.1109/CADS.2013.6714231

A. Mohammadabadi, A. Chalechale, H. Heidari

Abstract: Most image processing algorithms are inherently parallel, so multithreaded processors are well suited to such applications. For huge image databases, image processing takes a very long time on a single-core processor because the algorithms execute in a single thread. GPUs are increasingly common in image processing applications because of their multithreaded execution, programmability, and low cost. In this paper we show how to implement the MPEG-7 Edge Histogram Descriptor in parallel using the CUDA programming model on a GPU. The Edge Histogram Descriptor describes the distribution of various types of edges with a histogram that can serve as a tool for image matching. This feature is used to search a database for images that are similar to a query image. We evaluated the retrieval performance of the proposed technique using recall, precision, and average precision measures. Experimental results showed that the parallel implementation led to an average speedup of 14.74× over the serial implementation. The average precision and average recall of the presented method are 67.02% and 55.00%, respectively.
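The per-block edge classification behind the Edge Histogram Descriptor can be sketched serially as follows. This is a minimal illustration, not the paper's CUDA kernel: it uses the five 2×2 filter coefficient sets commonly given for the MPEG-7 descriptor, computes the 5-bin histogram of a single sub-image only (the full descriptor concatenates 16 sub-images into 80 bins), and the threshold value and function names are assumptions.

```python
import math

SQRT2 = math.sqrt(2.0)

# Coefficients apply to the mean gray level of the four quadrants of a block,
# ordered (top-left, top-right, bottom-left, bottom-right).
EDGE_FILTERS = {
    "vertical":        (1.0, -1.0, 1.0, -1.0),
    "horizontal":      (1.0, 1.0, -1.0, -1.0),
    "diagonal_45":     (SQRT2, 0.0, 0.0, -SQRT2),
    "diagonal_135":    (0.0, SQRT2, -SQRT2, 0.0),
    "non_directional": (2.0, -2.0, -2.0, 2.0),
}

def quadrant_means(image, top, left, size):
    """Mean gray level of each quadrant of the size x size block at (top, left)."""
    half = size // 2
    means = []
    for dy in (0, half):
        for dx in (0, half):
            total = sum(image[top + dy + y][left + dx + x]
                        for y in range(half) for x in range(half))
            means.append(total / (half * half))
    return means

def classify_block(quads, threshold=11.0):
    """Return the strongest edge type, or None if no response reaches the threshold."""
    strengths = {name: abs(sum(c * q for c, q in zip(coef, quads)))
                 for name, coef in EDGE_FILTERS.items()}
    best = max(strengths, key=strengths.get)
    return best if strengths[best] >= threshold else None

def edge_histogram(image, block_size=2, threshold=11.0):
    """Fraction of blocks of each edge type in one sub-image (list of pixel rows)."""
    counts = dict.fromkeys(EDGE_FILTERS, 0)
    n_blocks = 0
    for top in range(0, len(image) - block_size + 1, block_size):
        for left in range(0, len(image[0]) - block_size + 1, block_size):
            n_blocks += 1
            label = classify_block(
                quadrant_means(image, top, left, block_size), threshold)
            if label is not None:
                counts[label] += 1
    if n_blocks == 0:
        return {name: 0.0 for name in counts}
    return {name: c / n_blocks for name, c in counts.items()}

# Tiny hand-made sub-image: one vertical, one flat, one horizontal,
# and one checkerboard (non-directional) block.
sub_image = [
    [0, 255, 10, 10],
    [0, 255, 10, 10],
    [0, 0, 0, 255],
    [255, 255, 255, 0],
]
hist = edge_histogram(sub_image)
```

Each block is classified independently of every other block, which is what makes the computation a natural fit for one-thread-per-block parallelization on a GPU; only the final counting step needs a reduction.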

Citations: 3

RF resource planning in application specific integrated circuits to improve timing closure

The 17th CSI International Symposium on Computer Architecture & Digital Systems (CADS 2013) Pub Date : 2013-10-01 DOI: 10.1109/CADS.2013.6714251

Alireza Zarei, A. Jahanian

Citations: 0

Target position estimation with mobile adaptive network with selective cooperation

The 17th CSI International Symposium on Computer Architecture & Digital Systems (CADS 2013) Pub Date : 2013-10-01 DOI: 10.1109/CADS.2013.6714245

A. Pak, Shahrood Aghazadeh, A. Rastegarnia, A. Khalili

Citations: 2

On endurance of erasure codes in SSD-based storage systems

The 17th CSI International Symposium on Computer Architecture & Digital Systems (CADS 2013) Pub Date : 2013-10-01 DOI: 10.1109/CADS.2013.6714239

Saeideh Alinezhad Chamazcoti, S. Miremadi, H. Asadi

Abstract: The wear-out of flash-based Solid-State Drives (SSDs) is a main concern that significantly affects their reliability. One major parameter that accelerates SSD wear-out is the number of write cycles committed to the flash chips. The number of write cycles in an SSD-based disk subsystem depends heavily on the erasure code implemented in the Redundant Array of Independent Disks (RAID). In this paper, we investigate the impact of erasure codes and of the storage subsystem configuration (i.e., the number of disks participating in the RAID array and the stripe unit size) on the endurance of storage systems. The number of write cycles is used as the metric for evaluating storage system endurance. We evaluate the endurance of four well-known erasure codes, namely Reed-Solomon, EVENODD, RDP, and X-Code, employed in SSD-based RAID systems. In the evaluation, the number of write cycles is measured with respect to the number of disks, stripe unit size, and request size using trace-driven simulation. The simulation results show that Reed-Solomon incurs the lowest number of write cycles, owing to the optimal dependency between data and parities in its coding. The results also demonstrate that EVENODD and RDP impose the highest number of write cycles when a large number of disks is used with a large stripe unit size. These results recommend designing erasure codes with minimum dependency between data and parities, as this minimum dependency yields the optimal number of write cycles.
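The link between data-to-parity dependency and write cycles can be made concrete with a toy count. The sketch below is an assumption-laden illustration, not the paper's simulator and not the actual layout of any of the four codes: both dependency maps are hypothetical, and the model simply charges one write per updated data element plus one write per parity element that depends on any of them.

```python
def writes_for_request(updated_blocks, parity_deps):
    """Count element writes for one request under a toy model.

    updated_blocks: names of the data elements being rewritten.
    parity_deps: maps each data element to the set of parity elements
                 that must be rewritten when it changes.
    """
    touched_parities = set()
    for block in updated_blocks:
        touched_parities |= parity_deps[block]
    return len(set(updated_blocks)) + len(touched_parities)

# Hypothetical two-parity layout with minimal dependency:
# every data element feeds exactly two parity elements.
MINIMAL_DEPS = {
    "d0": {"P", "Q"},
    "d1": {"P", "Q"},
    "d2": {"P", "Q"},
    "d3": {"P", "Q"},
}

# Hypothetical layout where one data element feeds several parity elements.
HEAVY_DEPS = {
    "d0": {"P", "Q0", "Q1", "Q2"},
    "d1": {"P", "Q0"},
    "d2": {"P", "Q1"},
    "d3": {"P", "Q2"},
}

small_write_minimal = writes_for_request(["d0"], MINIMAL_DEPS)
small_write_heavy = writes_for_request(["d0"], HEAVY_DEPS)
full_stripe_minimal = writes_for_request(["d0", "d1", "d2", "d3"], MINIMAL_DEPS)
```

Under this model a single-element update costs more writes in the layout with the heavier dependency, which mirrors the abstract's recommendation in miniature; real results depend on request size, stripe unit size, and the number of disks, which this sketch ignores.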

Citations: 3