18th International Symposium on VLSI Design and Test: Latest Publications

A VLIW-Vector co-processor design for accelerating Basic Linear Algebraic Operations in OpenCV
18th International Symposium on VLSI Design and Test Pub Date : 2014-08-21 DOI: 10.1109/ISVDAT.2014.6881085
Venkata Ganapathi Puppala
{"title":"A VLIW-Vector co-processor design for accelerating Basic Linear Algebraic Operations in OpenCV","authors":"Venkata Ganapathi Puppala","doi":"10.1109/ISVDAT.2014.6881085","DOIUrl":"https://doi.org/10.1109/ISVDAT.2014.6881085","url":null,"abstract":"OpenCV is a widely used computer vision library written in C++. Basic Linear Algebraic Operations (BLAOPs) involving matrices are at the heart of OpenCV. Though OpenCV provides ubiquity in the computer vision field, it runs slowly when ported to embedded processors. Accelerating the BLAOPs using a co-processor certainly helps improve the throughput. In this paper we present a floating point VLIW-Vector Co-processor Architecture with Vector Floating Point Datapath (VFPDP) and a 4-slot VLIW processor core to accelerate BLAOPs, achieving a performance of two GFLOPS when run at 500 MHz clock frequency. We also demonstrate a detailed mapping strategy of the One sided Jacobi Singular Value Decomposition (OJSVD) algorithm onto the proposed architecture. The proposed architecture is designed using Verilog HDL and it is synthesized using Synopsys Design Compiler with 28 nm TSMC target libraries. The clock period is set to 2 ns and the timing constraints are met. Using Altera's SOPC Builder, an experimental system is created with the co-processor interfaced to the NIOS II soft processor and implemented in a Cyclone IV FPGA. The OJSVD algorithm is ported onto both the standalone NIOS II processor based system and the system with the proposed co-processor. The results show that a 15X performance improvement is achieved with this co-processor.","PeriodicalId":217280,"journal":{"name":"18th International Symposium on VLSI Design and Test","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127577977","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
FPGA-based implementation of M4RM for matrix multiplication over GF(2)
18th International Symposium on VLSI Design and Test Pub Date : 2014-08-21 DOI: 10.1109/ISVDAT.2014.6881072
Vivek Kumar, Vinay B. Y. Kumar, S. Patkar
{"title":"FPGA-based implementation of M4RM for matrix multiplication over GF(2)","authors":"Vivek Kumar, Vinay B. Y. Kumar, S. Patkar","doi":"10.1109/ISVDAT.2014.6881072","DOIUrl":"https://doi.org/10.1109/ISVDAT.2014.6881072","url":null,"abstract":"The Method of Four Russians for Multiplication (M4RM) is one of the most efficient algorithms for dense matrix multiplication over the binary field, particularly targeting commodity general-purpose processors. We present an efficient tile-based hardware/software implementation of M4RM, with the hardware side handling the constituent block multiplications in a streaming fashion, and the software side doing the accumulations. With designs for 64 × 64 and 128 × 128 sized block matrix multiplications, sizes feasible for targeting FPGAs, we compare the performance with the fastest software implementations of M4RM on commodity processors. The designs were implemented in Bluespec SystemVerilog, and evaluated over the hardware/software co-emulation framework, SCE-MI. Using the 128 × 128 hardware modules, a 16,384 × 16,384 matrix multiplication running at 140 MHz could be done in ~3.0 s using the Strassen-Winograd scheme when targeting a Cyclone IV FPGA, at a sustained ~8000 bit operations per cycle; in comparison, M4RM on an Intel Core2Duo running at 2.33 GHz takes ~8 s, at a sustained ~500 bit operations per cycle.","PeriodicalId":217280,"journal":{"name":"18th International Symposium on VLSI Design and Test","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122541808","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
A spare link based reliable Network-on-Chip design
18th International Symposium on VLSI Design and Test Pub Date : 2014-07-16 DOI: 10.1109/ISVDAT.2014.6881036
Navonil Chatterjee, N. Prasad, S. Chattopadhyay
{"title":"A spare link based reliable Network-on-Chip design","authors":"Navonil Chatterjee, N. Prasad, S. Chattopadhyay","doi":"10.1109/ISVDAT.2014.6881036","DOIUrl":"https://doi.org/10.1109/ISVDAT.2014.6881036","url":null,"abstract":"In this paper we have presented a reliable On-chip interconnection network design using spare links. It helps to mitigate the problem of fault chain formation due to failure of boundary links. The modified router design uses the redundant ports in boundary routers along with spare links for establishing connection with adjacent routers in case of link faults. This design modification on mesh based network along with proposed routing algorithm improves system reliability in case of single and multiple link failures. The performance evaluation in terms of network latency has also been improved compared to recent works with minimal area overhead.","PeriodicalId":217280,"journal":{"name":"18th International Symposium on VLSI Design and Test","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116620910","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
Modelling and analysis of wireless communication over Networks-on-Chip
18th International Symposium on VLSI Design and Test Pub Date : 2014-07-16 DOI: 10.1109/ISVDAT.2014.6881044
Apoorv Kumar, H. Kapoor
{"title":"Modelling and analysis of wireless communication over Networks-on-Chip","authors":"Apoorv Kumar, H. Kapoor","doi":"10.1109/ISVDAT.2014.6881044","DOIUrl":"https://doi.org/10.1109/ISVDAT.2014.6881044","url":null,"abstract":"Multi-cores and many-cores are becoming the next computing platform, with the interconnection bus becoming the new bottleneck. The bus is replaced by a Network-on-Chip (NoC) to address scalability issues. However, since NoC links are still RC-wire based, the transmission speed is limited. As integration becomes far denser, the problem is likely to worsen. Wireless interconnects hold good promise for solving the speed and scalability issues. In this paper we analyse the improvements offered by wireless links as shortcut interconnects in wormhole based NoCs. We measure latency and throughput and observe their variations by altering congestion level, represented by Packet Injection Rate (PIR) and channel count. Using a simple Media Access Control (MAC) protocol we analyse the effect of number of channels and traffic, demonstrating the advantage of using wireless NoC.","PeriodicalId":217280,"journal":{"name":"18th International Symposium on VLSI Design and Test","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129065086","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
VLSI implementation of novel fast confluence ICA algorithm for signal processing applications
18th International Symposium on VLSI Design and Test Pub Date : 2014-07-16 DOI: 10.1109/ISVDAT.2014.6881086
M. Ranjith, N. Muniraj
{"title":"VLSI implementation of novel fast confluence ICA algorithm for signal processing applications","authors":"M. Ranjith, N. Muniraj","doi":"10.1109/ISVDAT.2014.6881086","DOIUrl":"https://doi.org/10.1109/ISVDAT.2014.6881086","url":null,"abstract":"Independent component analysis is an iterative procedure to extract sources from observed mixtures. Power, area and convergence speed are important parameters to be improved in VLSI implementation of independent component analysis (ICA) techniques. This paper presents a VLSI implementation of a novel fast confluence adaptive independent component analysis (FCAICA) technique which has reduced power and area and improved convergence speed. The reduction in area and power is achieved by a hardware optimization scheme, and high convergence speed is achieved by a novel optimization scheme that adaptively changes the weight vector based on the kurtosis value. To increase the number precision and dynamic range of the signals, floating-point (FP) arithmetic units are used. Simulation, synthesis, floor planning, placement and routing are carried out and the data stream is created with Cadence Tool 10.1. The FCAICA algorithm operates at 2.91 MHz with 12.092 mW of power in 0.18 µm technology. It is more effective than the popular FastICA algorithm.","PeriodicalId":217280,"journal":{"name":"18th International Symposium on VLSI Design and Test","volume":"136 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124387149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A locally reconfigurable Network-on-Chip architecture and application mapping onto it
18th International Symposium on VLSI Design and Test Pub Date : 2014-07-16 DOI: 10.1109/ISVDAT.2014.6881041
J. Soumya, Ashish Sharma, S. Chattopadhyay
{"title":"A locally reconfigurable Network-on-Chip architecture and application mapping onto it","authors":"J. Soumya, Ashish Sharma, S. Chattopadhyay","doi":"10.1109/ISVDAT.2014.6881041","DOIUrl":"https://doi.org/10.1109/ISVDAT.2014.6881041","url":null,"abstract":"This paper presents a reconfigurable Network-on-Chip (NoC) architecture built around mesh topology. It provides the facility of changing the attachment of cores to local routers across applications. Applications share cores, but communication pattern between them may vary. Compared to many other reconfigurable NoCs, our architecture needs only about 0.2% extra area overhead than simple mesh. Application mapping and reconfiguration policy have been developed using Integer Linear Programming (ILP) and heuristic for the proposed topology. It has been shown that the reconfiguration strategy could improve communication costs of applications significantly which often resulted in improved latency and energy values, keeping throughput unaffected.","PeriodicalId":217280,"journal":{"name":"18th International Symposium on VLSI Design and Test","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126316820","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
A Pseudo-Deadline Based O(1) proportional share scheduler for embedded systems
18th International Symposium on VLSI Design and Test Pub Date : 2014-07-16 DOI: 10.1109/ISVDAT.2014.6881083
Swarnendu Ray, A. Sarkar
{"title":"A Pseudo-Deadline Based O(1) proportional share scheduler for embedded systems","authors":"Swarnendu Ray, A. Sarkar","doi":"10.1109/ISVDAT.2014.6881083","DOIUrl":"https://doi.org/10.1109/ISVDAT.2014.6881083","url":null,"abstract":"This paper presents Pseudo-Deadline Based Round-Robin (PDBRR), an O(1) proportional share scheduler for Embedded Systems that execute a mix of jobs with varying timeliness priorities. Simulation based experimental results reveal that PDBRR is able to achieve high proportional share scheduling accuracy.","PeriodicalId":217280,"journal":{"name":"18th International Symposium on VLSI Design and Test","volume":"374 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115948797","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Loop unrolling with fine grained power gating for runtime leakage power reduction
18th International Symposium on VLSI Design and Test Pub Date : 2014-07-16 DOI: 10.1109/ISVDAT.2014.6881084
Sumanta Pyne, A. Pal
{"title":"Loop unrolling with fine grained power gating for runtime leakage power reduction","authors":"Sumanta Pyne, A. Pal","doi":"10.1109/ISVDAT.2014.6881084","DOIUrl":"https://doi.org/10.1109/ISVDAT.2014.6881084","url":null,"abstract":"The present work introduces a compilation technique to reduce runtime leakage power of functional units of a processor by combining loop unrolling with power gating. The instructions in the unrolled loop are scheduled to provide opportunities for power gating the functional units which are not in need for a considerable amount of time. The number of clock cycles taken by the power gating instructions is less than or equal to the number of clock cycles saved by loop unrolling. This results in 23-64% reduction of the total energy consumed by the benchmark programs without any degradation of performance.","PeriodicalId":217280,"journal":{"name":"18th International Symposium on VLSI Design and Test","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116247700","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
An LUT based RNS FIR filter implementation for reconfigurable applications
18th International Symposium on VLSI Design and Test Pub Date : 2014-07-16 DOI: 10.1109/ISVDAT.2014.6881047
Srinivasa Reddy Kotha, Sumit Bajaj, S. K. Sahoo
{"title":"An LUT based RNS FIR filter implementation for reconfigurable applications","authors":"Srinivasa Reddy Kotha, Sumit Bajaj, S. K. Sahoo","doi":"10.1109/ISVDAT.2014.6881047","DOIUrl":"https://doi.org/10.1109/ISVDAT.2014.6881047","url":null,"abstract":"In this work, two approaches to realize a look up table (LUT) based finite impulse response (FIR) filter using Residue Number System (RNS) are proposed. The proposed implementations take advantage of the shift and add approach offered by the chosen moduli set. The two proposed filter architectures are compared with an earlier proposed version of a reconfigurable RNS FIR filter. The filters are synthesized using Cadence RTL Compiler in UMC 90 nm technology. The performance of the filters is compared in terms of Area (A), Power (P), and Delay (T). The results show that one of the proposed architectures offers significant improvement in terms of delay, while the second approach is well suited for applications that require minimal power and area. Both implementations offer an advantage in area-delay (AT) and power-delay-product (PTP). The proposed approaches are also verified functionally using Altera DSP Builder.","PeriodicalId":217280,"journal":{"name":"18th International Symposium on VLSI Design and Test","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128038616","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Power analysis attack using neural networks with wavelet transform as pre-processor
18th International Symposium on VLSI Design and Test Pub Date : 2014-07-16 DOI: 10.1109/ISVDAT.2014.6881059
P. Saravanan, P. Kalpana, V. Prcethisri, V. Sneha
{"title":"Power analysis attack using neural networks with wavelet transform as pre-processor","authors":"P. Saravanan, P. Kalpana, V. Prcethisri, V. Sneha","doi":"10.1109/ISVDAT.2014.6881059","DOIUrl":"https://doi.org/10.1109/ISVDAT.2014.6881059","url":null,"abstract":"This work proposes a novel methodology to perform power analysis attack on secure system by using wavelet transform as a pre-processor followed by machine learning technique. The proposed methodology uses known plain text attack. The power supply current traces from the cryptographic device are obtained by varying the atmospheric temperature. Then the current traces are pre-processed by using wavelet transform, data normalization and principal component analysis (PCA). The featured data samples selected by the pre-processor are then used to train the neural network. Through supervised learning algorithm and wavelet pre-processing, we are able to achieve around 25% improvement in guessing the secret key when compared to existing method of machine learning alone.","PeriodicalId":217280,"journal":{"name":"18th International Symposium on VLSI Design and Test","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115235902","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7