{"title":"Automatic generation of parallelized FFT logic for implementation in FPGA chips","authors":"Tayler Sokalski, N. Manjikian","doi":"10.1109/NEWCAS.2014.6934021","DOIUrl":"https://doi.org/10.1109/NEWCAS.2014.6934021","url":null,"abstract":"This paper considers the automatic generation of parallelized fast-Fourier-transform (FFT) logic for field-programmable gate-array (FPGA) chips. A custom software tool has been created to generate VHDL logic descriptions for parallelized radix-4 FFT architectures using decimation-in-frequency (DIF). These architectures accept N simultaneously-provided fixed-point complex-valued input samples every cycle for applications that demand high throughput. Two approaches are described for generating the product terms in complex multiplications involving twiddle constants: standard single-cycle multiplication, and multi-cycle shift-and-add multiplication. Synthesis results are reported for parallelized FFT implementations of different sizes targeting low-cost Cyclone III chips and high-end Stratix III and IV chips from Altera. The shift-and-add approach for constant multiplication is shown to consume more logic resources, but provide a higher maximum clock frequency.","PeriodicalId":216848,"journal":{"name":"2014 IEEE 12th International New Circuits and Systems Conference (NEWCAS)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123674856","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Back-annotation of gate-level power properties into system level descriptions","authors":"Najmeh Farajipour Ghohroud, Z. Navabi","doi":"10.1109/NEWCAS.2014.6934027","DOIUrl":"https://doi.org/10.1109/NEWCAS.2014.6934027","url":null,"abstract":"It is important to accurately estimate power consumption in early stages of the design in order to avoid costly redesign. System level design consists of defining transaction level (TL) communications between various components that can be either processors or cores described at system level using high level languages such as SystemC. An important component of a design at the system level is a processor, the accurate power estimation of which is necessary. In this paper we propose a new power estimation method with high accuracy and relatively fast simulation for general purpose processors. The average accuracy achieved is about 90% while the performance is better than the state-based estimation methods. Our proposed method is a hybrid method which uses the concepts of instruction-based, state-based and activity-based power estimation.","PeriodicalId":216848,"journal":{"name":"2014 IEEE 12th International New Circuits and Systems Conference (NEWCAS)","volume":"93 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122313223","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Vehicle detection using TD2DHOG features","authors":"Mohamed A. Naiel, M. Ahmad, M. Swamy","doi":"10.1109/NEWCAS.2014.6934064","DOIUrl":"https://doi.org/10.1109/NEWCAS.2014.6934064","url":null,"abstract":"Histogram of oriented gradients (HOG) is often used for object detection in images. These HOG features of images can be referred to as 2DHOG when represented in a 2D matrix format instead of a 1D vector. In this paper, we propose a new vehicle detection algorithm by using 2DHOG in the discrete cosine transform (DCT) domain. The proposed technique consists of extracting 2DHOG from the input image and applying the 2D DCT to it. This is followed by low-pass filtering in order to obtain novel features called transform-domain 2DHOG (TD2DHOG). TD2DHOG is used with a classifier pyramid in order to reduce the multi-scale scanning cost. Experimental results show that the proposed algorithm, when applied to two public vehicle detection datasets, reduces the storage requirement of the classifier pyramid while providing about the same performance as that provided by the state-of-the-art techniques.","PeriodicalId":216848,"journal":{"name":"2014 IEEE 12th International New Circuits and Systems Conference (NEWCAS)","volume":"95 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127800567","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Practical considerations for parameterized model order reduction of MEMS devices","authors":"Julian Santorelli, F. Nabki, R. Khazaka","doi":"10.1109/NEWCAS.2014.6934000","DOIUrl":"https://doi.org/10.1109/NEWCAS.2014.6934000","url":null,"abstract":"An algorithm for automatic generation of a MEMS device parameterized model from finite element analysis is proposed. Traditional model order reduction techniques via singular value decomposition are adapted for use with the large system matrices of the finite element method. The paper addresses and proposes solutions to practical issues concerning model order reduction when applied to finite element equations. Mesh generation, condition number and size of matrices are typical issues that are encountered. MEMS examples are studied to highlight these issues and to prove the effectiveness of such a model in reducing the system size significantly, while performing with high accuracy over a large range of parameter values.","PeriodicalId":216848,"journal":{"name":"2014 IEEE 12th International New Circuits and Systems Conference (NEWCAS)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114944807","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An ultra-low-power CMOS voltage-controlled ring oscillator for passive RFID tags","authors":"Pouya Kamalinejad, Kamyar Keikhosravy, Reza Molavi, S. Mirabbasi, Victor C. M. Leung","doi":"10.1109/NEWCAS.2014.6934081","DOIUrl":"https://doi.org/10.1109/NEWCAS.2014.6934081","url":null,"abstract":"An ultra-low-power CMOS voltage-controlled ring oscillator (VCRO) for passive ultra-high-frequency (UHF) radio-frequency identification (RFID) tags is presented. The gates of the complementary CMOS transistors in pseudo-differential (PD) delay cells are biased through the quasi-floating-gate (QFG) technique. The boosted gate-drive voltage enables operation of the differential delay cells with supply voltages smaller than the minimum required overdrive voltage of the two stacked transistors and accordingly facilitates oscillation in the ultra-low-power regime. The QFG biasing technique also offers an additional control knob to tune the output frequency of the ring oscillator. The proposed two-stage PD-VCRO is designed and laid out in a standard 0.13-μm CMOS technology. A voltage level converter is also presented to interface the output of the proposed VCRO with the succeeding circuitry. The entire VCRO core occupies an area of 25 μm × 20 μm. For a supply voltage of as low as 140 mV, an output frequency of 4 MHz is achieved at 3.6 nW power consumption. Although the intended application for the proposed VCRO is passive RFID tags, the architecture can be used in other ultra-low-power applications.","PeriodicalId":216848,"journal":{"name":"2014 IEEE 12th International New Circuits and Systems Conference (NEWCAS)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122059452","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Computational load reduction by downsampling for energy-efficient digital baseband","authors":"V. Lenoir, D. Lattard, F. Dehmas, D. Morche, A. Jerraya","doi":"10.1109/NEWCAS.2014.6934050","DOIUrl":"https://doi.org/10.1109/NEWCAS.2014.6934050","url":null,"abstract":"With the expansion of Internet of Things (IoT) the design of highly-integrated radio transceivers is a challenging task. The major concern is the power consumption, which is generally non-optimal due to the fluctuation of the communication channel quality. In this paper, we propose the use of downsampling techniques for decreasing the computational load of the baseband signal processing. More precisely, the paper compares two mechanisms using the IEEE 802.15.4 PHY in the 2.45 GHz band. It demonstrates that the performances remain close to the theoretical bound, even if only partial codes are used for the decoding. According to these results, it shows the impact of such techniques on the power consumption, considering a codes correlator unit. Thereby, it highlights the ability of downsampling to be used as an actuator in an adaptive baseband architecture, while preserving a good level of performances.","PeriodicalId":216848,"journal":{"name":"2014 IEEE 12th International New Circuits and Systems Conference (NEWCAS)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124796472","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A reconfigurable buck-boost switched capacitor converter architecture for multiple, distributed on-chip load applications","authors":"Libin George, T. Lehmann, T. J. Hamilton","doi":"10.1109/NEWCAS.2014.6934083","DOIUrl":"https://doi.org/10.1109/NEWCAS.2014.6934083","url":null,"abstract":"This paper presents the design of a dual-output reconfigurable buck-boost switched capacitor converter architecture that can be adapted for applications requiring multiple, distributed on-chip loads. This system uses adaptive gain control and discrete frequency scaling to regulate power delivered. Core-interleaving and an enhanced load regulation scheme have also been adopted to improve performance. The converter provides a fully-integrated, low-area and fully digital solution. Design and implementation using a standard bulk CMOS 0.18μm process provide simulation results showing that the converter has an output voltage range of 1.0-2.2V, can deliver up to 5mA in load current and is up to 67% efficient.","PeriodicalId":216848,"journal":{"name":"2014 IEEE 12th International New Circuits and Systems Conference (NEWCAS)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116060404","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Clock domain crossing (CDC) for inter-logic-layer communication in 3-D ICs","authors":"Waqas Gul, S. R. Hasan, O. Hasan","doi":"10.1109/NEWCAS.2014.6934086","DOIUrl":"https://doi.org/10.1109/NEWCAS.2014.6934086","url":null,"abstract":"3D technology is becoming more popular due to improved design density and performance. However, single global clock distribution to a complex system, like a 3-D IC, is a very challenging task. Potentially heterogeneous dice integration also portends increased environmental and process non-idealities. Therefore, inter-logic-layer communication in 3-D ICs can benefit from clock domain crossing (CDC) techniques to perform timely and correct data transactions. In this paper, we investigate two classes of CDC techniques, the pseudo quasi-delay insensitive (QDI) based GALS technique and the loosely synchronous CDC technique, in the 3-D IC context. It is found that although the pseudo QDI based GALS design provides an attractive solution because of the relaxed constraint on the clock distribution network, for 8 or more data bits per transaction its hardware overhead exceeds that of the loosely synchronous design. To the best of the authors' knowledge, this is the first work to investigate design guidelines for CDC techniques in through silicon via (TSV) based 3-D ICs.","PeriodicalId":216848,"journal":{"name":"2014 IEEE 12th International New Circuits and Systems Conference (NEWCAS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131394705","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Designing of an algebraic signature analyzer for mixed-signal systems","authors":"V. Geurkov, L. Kirischian","doi":"10.1109/NEWCAS.2014.6934009","DOIUrl":"https://doi.org/10.1109/NEWCAS.2014.6934009","url":null,"abstract":"In this paper, we propose a novel approach to designing an algebraic signature analyzer that can be employed for mixed-signal systems testing. Due to its algebraic nature, the analyzer does not contain carry propagating circuitry. This helps to improve its error immunity, as well as performance. The proposed scheme can also be used in arithmetic/algebraic error-control coding and cryptography.","PeriodicalId":216848,"journal":{"name":"2014 IEEE 12th International New Circuits and Systems Conference (NEWCAS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123278286","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast and accurate implementation of Canny edge detector on embedded many-core platform","authors":"Taieb Lamine Ben Cheikh, G. Nicolescu, Jelena Trajkovic, Y. Bouchebaba, P. Paulin","doi":"10.1109/NEWCAS.2014.6934067","DOIUrl":"https://doi.org/10.1109/NEWCAS.2014.6934067","url":null,"abstract":"Image processing and computer vision applications are used intensively in several domains, in particular multimedia and medicine. The main challenge in developing such applications is how to guarantee both high accuracy and low execution time. Accordingly, we observe two research directions: the first focuses on improving the algorithms and the second focuses on designing fast hardware platforms. In this paper, we propose an efficient parallel implementation of an accurate extended Canny edge detection algorithm, suitable for medical applications, on an embedded many-core platform. The proposed implementation runs at a frame rate of 10 frames/s for an image size of 512×512 with highly accurate and smooth line edges.","PeriodicalId":216848,"journal":{"name":"2014 IEEE 12th International New Circuits and Systems Conference (NEWCAS)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125192688","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}