The 17th CSI International Symposium on Computer Architecture & Digital Systems (CADS 2013): latest papers

An adaptive temperature threshold schema for dynamic thermal management of multi-core processors
Bagher Salami, Mohammadreza Baharani, Hamid Noori
{"title":"An adaptive temperature threshold schema for dynamic thermal management of multi-core processors","authors":"Bagher Salami, Mohammadreza Baharani, Hamid Noori","doi":"10.1109/CADS.2013.6714247","DOIUrl":"https://doi.org/10.1109/CADS.2013.6714247","url":null,"abstract":"This paper presents an adaptive task migration threshold schema for dynamic thermal management of multi-core processors to minimize both average and peak temperature with very low performance overhead. Our proposed algorithm adjusts temperature threshold according to processor work-load and hardware platforms. The experimental results indicate that our technique can significantly decrease average and peak temperature compared to Linux standard scheduler, and two well-known thermal management techniques.","PeriodicalId":379673,"journal":{"name":"The 17th CSI International Symposium on Computer Architecture & Digital Systems (CADS 2013)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130016637","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Improved modulo-(2^n ± 3) multipliers
H. Ahmadifar, G. Jaberipur
{"title":"Improved modulo-(2^n ± 3) multipliers","authors":"H. Ahmadifar, G. Jaberipur","doi":"10.1109/CADS.2013.6714234","DOIUrl":"https://doi.org/10.1109/CADS.2013.6714234","url":null,"abstract":"Modular adders and multipliers have applications in residue number system (RNS) arithmetic, cryptography, and error-checking, where general architectures are usually designed for moduli of the form 2n±k ± 1, with very efficient realizations. However, less efficient arithmetic circuits also occasionally appear in the relevant literature for moduli of the form 2^n ± δ, where δ is an odd integer and δ ≠ 1. In particular, adders, multipliers and RNS converters have been recently offered for modulo-(2^n ± 3). In this paper, we address a recent work on modulo-(2^n ± 3) multipliers that are realized as normal n-bit multipliers, followed by conversion of 2n-bit products to RNS residues. We aim to enhance the performance of such modular multipliers via eliminating the carry-propagate adder that operates at the end of preliminary binary multiplication. Analytical and synthesis-based evaluation has shown improvements in latency and power dissipation. Our designs also consume less area for the same delay.","PeriodicalId":379673,"journal":{"name":"The 17th CSI International Symposium on Computer Architecture & Digital Systems (CADS 2013)","volume":"123 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124688922","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
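The conversion of a 2n-bit product to a residue, as described in the abstract above, rests on the congruence 2^n ≡ 3 (mod 2^n − 3): the high half of the product can be folded back into the low half after multiplying it by 3. The Python sketch below illustrates only that arithmetic identity in software. It is not the authors' hardware design, and the function name `mulmod_2n_minus_3` is hypothetical. The 2^n + 3 case follows the same idea with 2^n ≡ −3.

```python
def mulmod_2n_minus_3(a: int, b: int, n: int) -> int:
    """Compute (a * b) mod (2**n - 3) by folding high bits (illustrative only)."""
    if n < 3 or a < 0 or b < 0:
        raise ValueError("requires n >= 3 and non-negative operands")
    modulus = (1 << n) - 3
    mask = (1 << n) - 1
    product = a * b
    # Since 2**n ≡ 3 (mod 2**n - 3), hi * 2**n + lo ≡ 3 * hi + lo.
    while product >> n:
        product = (product & mask) + 3 * (product >> n)
    # product now fits in n bits, so it is at most modulus + 2.
    if product >= modulus:
        product -= modulus
    return product
```

Each folding pass strictly shrinks the value, so the loop terminates, and a single conditional subtraction finishes the reduction.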
Fault-tolerant method with distributed monitoring and management technique for 3D stacked meshes
M. Ebrahimi, M. Daneshtalab, P. Liljeberg, H. Tenhunen
{"title":"Fault-tolerant method with distributed monitoring and management technique for 3D stacked meshes","authors":"M. Ebrahimi, M. Daneshtalab, P. Liljeberg, H. Tenhunen","doi":"10.1109/CADS.2013.6714243","DOIUrl":"https://doi.org/10.1109/CADS.2013.6714243","url":null,"abstract":"In this paper, we present a fully adaptive routing algorithm for 3D stacked mesh, called 3D-FAR. This algorithm utilizes two, two, and four virtual channels along the X, Y, and Z dimensions, respectively. It allows packets to take any shortest paths between the source and destination routers. 3D-FAR divides the network into four disjoint subnetworks. To improve the fault-tolerant capability of the network, packets are able to switch between subnetworks in an ascending order. In this paper, we also propose a fault-tolerant algorithm for 3D mesh network, called 3D-FT. This method is discussed both for tolerating faulty routers and links in the network. For tolerating faulty routers, only the shortest paths are taken while for tolerating faulty links, the non-minimal paths are used only when the source and destination routers are located in the same dimension with a faulty link between them. 3D-FT utilizes a distributed monitoring and management technique to distribute the fault statuses among the surrounding routers.","PeriodicalId":379673,"journal":{"name":"The 17th CSI International Symposium on Computer Architecture & Digital Systems (CADS 2013)","volume":"79 9","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132511947","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 22
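The abstract above says packets may take any shortest path between source and destination. As a rough illustration of what that means in a 3D mesh, the Python sketch below lists the output directions that keep a packet on some shortest path. It is a generic minimal-routing helper under assumed names (`minimal_directions`, the `"X+"`-style labels), not the 3D-FAR or 3D-FT algorithm: virtual-channel selection, the subnetwork division, and fault handling are not modeled.

```python
from typing import List, Tuple

Coord = Tuple[int, int, int]


def minimal_directions(current: Coord, destination: Coord) -> List[str]:
    """List the output directions that keep a packet on some shortest path."""
    directions = []
    for axis, cur, dst in zip("XYZ", current, destination):
        if dst > cur:
            directions.append(axis + "+")   # destination lies further along this axis
        elif dst < cur:
            directions.append(axis + "-")   # destination lies behind on this axis
    return directions
```

A fully adaptive minimal router is free to pick any entry of the returned list at each hop; an empty list means the packet has arrived.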
A holistic approach for building MPSoCs
J. Carrabina
{"title":"A holistic approach for building MPSoCs","authors":"J. Carrabina","doi":"10.1109/CADS.2013.6714220","DOIUrl":"https://doi.org/10.1109/CADS.2013.6714220","url":null,"abstract":"Multi-Processor Systems-on-a-chip (MPSoCs) are currently the most common implementation technique to build complex systems that provide high performance according to both timing and power restrictions for electronic systems. Both many-core (usually homogeneous multiprocessing) and multi-core (more often heterogeneous) require providing some complex parallel programming methods together with architectural exploration and performance analysis tools to get into an optimal solution. The concept of network-on-chip and programming model topics on multiprocessors system-on-chip world will be reviewed using some selected proposals to highlight the evolution of the implementation approaches. FPGAs are being used to prototype these complex systems since they provide a high degree of visibility of the system activity at different levels of abstraction. The emerging Reconfigurable Hardware devices allow the design of complex embedded systems combining soft-core processors and a mix of other IP cores. The reduced NRE costs compared to ASIC is a typical reason to choose FPGAs as a platform to implement some applications. But the continuous increase of capacity, and the flexibility offered by reconfigurable hardware, are also important reasons to select FPGAs in order to get good Time-to-Market and Time-in-Market values. Furthermore, and because of this existing infrastructure, FPGAs can provide multi-soft-core solutions are a viable suppose and interesting solutions for embedded systems that naturally appear after new general purpose platforms. These embedded systems are therefore oriented to specific purpose applications and need some additional trade-off between performance, flexibility and development time. 
FPGAs allows that, at a reasonable cost, we will have available many-soft-cores solutions so that they are expected to have some relevance for some future embedded systems. Then, in addition to the current soft-core SoC tools, some parallel programming methods and tools will be required as a part of the full system development process. Performance analysis tools have also to be updated taking into account specificities of parallel programming (most of them coming from the high performance computing community) has a critical part of the development process for parallel embedded applications. Meeting some real-time constraints is a critical issue when you want to get a desired performance. A basic review of the techniques used by the HPC community will be reviewed such as the post-mortem analysis of application traces, taking into account the resource limitations of the FPGA platforms for embedded systems. This review will include several techniques and some Hardware architectural support to be able to generate traces on multiprocessor systems based on FPGAs and use them to optimize the performance of the running applications. Finally, soft-cores allow an additional advantage due to the fact that one can easily add hardware acceleration or improve communic","PeriodicalId":379673,"journal":{"name":"The 17th CSI International Symposium on Computer Architecture & Digital Systems (CADS 2013)","volume":" 28","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120937401","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Using binary-reflected gray coding for crosstalk mitigation of network on chip
Z. Shirmohammadi, S. Miremadi
{"title":"Using binary-reflected gray coding for crosstalk mitigation of network on chip","authors":"Z. Shirmohammadi, S. Miremadi","doi":"10.1109/CADS.2013.6714241","DOIUrl":"https://doi.org/10.1109/CADS.2013.6714241","url":null,"abstract":"This paper proposes an efficient crosstalk mitigation method for Networks-on-Chip. This method uses binary-reflected Gray coding to send the proper code word into a channel. Because the Gray code has reflective and unit-distance properties, the content of every flit is selected so as to minimize the number of forbidden transition patterns in the channel. A VHDL-based simulation is carried out for several channel widths. Simulation results show that the proposed method reduces the forbidden transitions by up to 26% and can save power in NoC links.","PeriodicalId":379673,"journal":{"name":"The 17th CSI International Symposium on Computer Architecture & Digital Systems (CADS 2013)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121028469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 11
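For readers unfamiliar with the code named in the abstract above, the Python sketch below shows the standard binary-reflected Gray code and its unit-distance property (consecutive code words differ in exactly one bit). The function names `to_gray` and `from_gray` are illustrative, and the sketch does not reproduce the paper's flit-selection scheme or its definition of forbidden transition patterns.

```python
def to_gray(value: int) -> int:
    """Convert a non-negative binary integer to its binary-reflected Gray code."""
    return value ^ (value >> 1)


def from_gray(code: int) -> int:
    """Invert to_gray by XOR-ing together all right shifts of the code word."""
    value = 0
    while code:
        value ^= code
        code >>= 1
    return value


# The eight 3-bit code words, in counting order.
codes = [format(to_gray(i), "03b") for i in range(8)]
print(codes)  # ['000', '001', '011', '010', '110', '111', '101', '100']
```

The second half of the printed list is the first half in reverse order with the leading bit set, which is the reflective property the abstract mentions.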
A statistical performance prediction model for OpenCL kernels on NVIDIA GPUs
Ali Karami, S. A. Mirsoleimani, F. Khunjush
{"title":"A statistical performance prediction model for OpenCL kernels on NVIDIA GPUs","authors":"Ali Karami, S. A. Mirsoleimani, F. Khunjush","doi":"10.1109/CADS.2013.6714232","DOIUrl":"https://doi.org/10.1109/CADS.2013.6714232","url":null,"abstract":"Understanding performance bottlenecks of applications in high performance computing can lead to dramatic improvements of applications performances. For example, a key problem in GPU programming is finding performance bottlenecks and solving them to reach the best possible performance. These bottlenecks in GPU architectures span a variety of factors such as memory access latency, branch divergence, utilization, and the amount of existing parallelism. In addition, a simple profiling cannot demonstrate the relations between these bottlenecks. In this paper, we propose a statistical performance model that not only helps us find bottlenecks but also shows the relations between them which is not possible by using a profiler. The OpenCL programming standard can be used in a variety of platforms (e.g., CPUs and GPUs); therefore, a program written in one platform can be imported to other platforms with minimal effort. As a result, we selected the OpenCL programming standard in order to design our performance model for NVIDIA GPUs. For this, we first measure the values of a GPU performance counters for the selected benchmarks. Then, using the achieved results and applying a regression model and the principle component analysis we develop a model to show how different GPU parameters account for applications performance bottlenecks. Our results show that the proposed model can predict applications behaviors with a 91% accuracy. 
Moreover, the proposed model is able to characterize unknown applications based on their performance similarities with an existing database of benchmark to predict their likely performance bottlenecks.","PeriodicalId":379673,"journal":{"name":"The 17th CSI International Symposium on Computer Architecture & Digital Systems (CADS 2013)","volume":"80 12","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132535929","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 30
Parallelized computation for Edge Histogram Descriptor using CUDA on the Graphics Processing Units (GPU)
A. Mohammadabadi, A. Chalechale, H. Heidari
{"title":"Parallelized computation for Edge Histogram Descriptor using CUDA on the Graphics Processing Units (GPU)","authors":"A. Mohammadabadi, A. Chalechale, H. Heidari","doi":"10.1109/CADS.2013.6714231","DOIUrl":"https://doi.org/10.1109/CADS.2013.6714231","url":null,"abstract":"Most image processing algorithms are inherently parallel, so multithreading processors are suitable in such applications. In huge image databases, image processing takes a very long time to run on a single-core processor because of single-thread execution of algorithms. GPUs are more common in most image processing applications due to multithreaded execution of algorithms, programmability and low cost. In this paper we show how to implement the MPEG-7 Edge Histogram Descriptor in parallel using the CUDA programming model on a GPU. The Edge Histogram Descriptor describes the distribution of various types of edges with a histogram that can be a tool for image matching. This feature is applied to search images from a database which are similar to a query image. We evaluated the retrieval of the proposed technique using recall, precision, and average precision measures. Experimental results showed that parallel implementation led to an average speedup of 14.74× over the serial implementation. 
The average precision and the average recall of presented method are 67.02% and 55.00% respectively.","PeriodicalId":379673,"journal":{"name":"The 17th CSI International Symposium on Computer Architecture & Digital Systems (CADS 2013)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124651457","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
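The retrieval quality figures quoted in the abstract above are precision and recall. As a reminder of how those two measures are computed for a single query, here is a small Python sketch. The function name `precision_recall` and the image identifiers are made up for illustration, and the paper's exact evaluation protocol (for example how averages over queries are taken) is not reproduced.

```python
from typing import Iterable, Tuple


def precision_recall(retrieved: Iterable[str], relevant: Iterable[str]) -> Tuple[float, float]:
    """Return (precision, recall) for one query, given image identifiers."""
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)
    # Precision: share of retrieved images that are relevant.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    # Recall: share of relevant images that were retrieved.
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical query: 4 images returned, 3 relevant images exist, 2 overlap.
p, r = precision_recall(["a", "b", "c", "d"], ["a", "c", "e"])
```

In this made-up query, precision is 2/4 and recall is 2/3.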
RF resource planning in application specific integrated circuits to improve timing closure
Alireza Zarei, A. Jahanian
{"title":"RF resource planning in application specific integrated circuits to improve timing closure","authors":"Alireza Zarei, A. Jahanian","doi":"10.1109/CADS.2013.6714251","DOIUrl":"https://doi.org/10.1109/CADS.2013.6714251","url":null,"abstract":"VLSI technology scaling has raised new problems for conventional metal wires. New technologies and materials have been proposed for modern interconnects, such as optical interconnection, nanowire technology and radio frequency (RF) interconnection. In this paper, a new placement/planning approach is proposed for using RF interconnects in regular ASICs. In the proposed approach, three placement configurations are examined for RF resources and the electrical characteristics of the final circuits are compared. Our experimental results show that, with suitable RF planning, this technology can be suitable, especially for large and complex circuits.","PeriodicalId":379673,"journal":{"name":"The 17th CSI International Symposium on Computer Architecture & Digital Systems (CADS 2013)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122411870","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Target position estimation with mobile adaptive network with selective cooperation
A. Pak, Shahrood Aghazadeh, A. Rastegarnia, A. Khalili
{"title":"Target position estimation with mobile adaptive network with selective cooperation","authors":"A. Pak, Shahrood Aghazadeh, A. Rastegarnia, A. Khalili","doi":"10.1109/CADS.2013.6714245","DOIUrl":"https://doi.org/10.1109/CADS.2013.6714245","url":null,"abstract":"In this paper we consider the problem of collective motion for a set of mobile nodes. The objective of nodes is to move toward a target. Adaptive network can be used for this purpose. In this paper, we propose a reduced-complexity diffusion adaptive network solution. The idea is to use a selective cooperation, by choosing the best nodes at each iteration, to reduce the number of communications. As our simulation results show, we can achieve proper steady-state performance while reducing the communication overhead.","PeriodicalId":379673,"journal":{"name":"The 17th CSI International Symposium on Computer Architecture & Digital Systems (CADS 2013)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115098758","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
On endurance of erasure codes in SSD-based storage systems
Saeideh Alinezhad Chamazcoti, S. Miremadi, H. Asadi
{"title":"On endurance of erasure codes in SSD-based storage systems","authors":"Saeideh Alinezhad Chamazcoti, S. Miremadi, H. Asadi","doi":"10.1109/CADS.2013.6714239","DOIUrl":"https://doi.org/10.1109/CADS.2013.6714239","url":null,"abstract":"The wear-out of flash-based Solid-State Drives (SSDs) is a main concern that significantly affects their reliability. One major parameter that accelerates SSD wear-out is the number of write-cycles committed to flash chips. The number of write-cycles in SSD-based disk subsystem is highly dependent on the erasure code implemented in Redundant Array of Independent Disks (RAIDs). In this paper, we investigate the impact of erasure codes and the configuration of storage subsystems (i.e., the number of disks participated in the RAID array and stripe unit size) on the endurance of storage systems. The number of write-cycles is considered as a metric to evaluate the endurance of storage system. We evaluate the endurance of four different well-known erasure codes, i.e., Reed-Solomon, EVENODD, RDP, and X-Code, employed in SSD-based RAID systems. In the evaluation, the number of write-cycles is measured with respect to the number of disks, stripe unit size, and request size using trace-driven simulation. The simulation results show that Reed-Solomon provides the lowest number of write-cycles due to the optimal dependency between data and parities in its coding. The results also demonstrate that EVENODD and RDP impose the highest number of write-cycles when using the high number of disks with large stripe unit size. 
These results recommend designing erasure codes with minimum dependency between data and parities as this minimum dependency provides optimal number of write-cycles.","PeriodicalId":379673,"journal":{"name":"The 17th CSI International Symposium on Computer Architecture & Digital Systems (CADS 2013)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128083143","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
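The abstract above ties write-cycle counts to the dependency between data and parities: every parity block that depends on an updated data block must be rewritten as well. The Python sketch below shows this for the simplest case, a single XOR parity in the style of RAID-5. It is a deliberate simplification and does not implement Reed-Solomon, EVENODD, RDP, or X-Code; the helper names `xor_blocks` and `small_write` are hypothetical.

```python
from typing import List, Tuple


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equally sized blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def small_write(data: List[bytes], parity: bytes, index: int,
                new_block: bytes) -> Tuple[bytes, int]:
    """Update one data block in place; return (new parity, block writes issued)."""
    # Read-modify-write: new parity = old parity XOR old data XOR new data.
    new_parity = xor_blocks(xor_blocks(parity, data[index]), new_block)
    data[index] = new_block
    # One write for the data block plus one for the single dependent parity block.
    return new_parity, 2


stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_blocks(xor_blocks(stripe[0], stripe[1]), stripe[2])
parity, writes = small_write(stripe, parity, 1, b"\xaa\xbb")
```

Under this simple model, a code in which each data block feeds p parity blocks would issue 1 + p block writes per small update, which is one way to read the abstract's point about minimizing data-parity dependency.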