{"title":"Efficient reconfigurable architecture for MIMD streaming execution using permutation network","authors":"Chia-Wen Cheng, Yu Sheng Lin, Shao-Yi Chien","doi":"10.1109/SiPS.2014.6986090","DOIUrl":"https://doi.org/10.1109/SiPS.2014.6986090","url":null,"abstract":"Reconfigurable architectures grant many circuits more flexibility as well as more efficiency. By dynamically reconnecting the datapath between calculation units, we can optimize the performance of many designs. Inspired by some prior works, we proposed a new MIMD Streaming (MIMDS) execution scheme on the aid of reconfigurable design, featuring high efficient stream processing. In this work, we also take the locality of programs into account when designing our reconfigurable architecture. Therefore, we use the permutation network [1] as our reconfigurable path, which provides less but enough reconfigurability, leading to less area cost and less power consumption. In this paper, we will take a commercial processor, C54x from Texas Instrument [2], as example, as well as detail the modification from the baseline C54x to our proposed MIMDS architecture. We show that with the extra ALUs and efficient datapath, C54x with MIMDS feature has overall 63% less execution cycles and 45% less memory access at most. Compared with traditional C54x, our design has only 12% area overhead. Besides, if we consider only configurable network, our permutation network saves 85% area compared to fully reconfigurable datapath while supports sufficient reconfigurability.","PeriodicalId":167156,"journal":{"name":"2014 IEEE Workshop on Signal Processing Systems (SiPS)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128484684","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Power-awarness in coarse-grained reconfigurable designs: A dataflow based strategy","authors":"F. Palumbo, Carlo Sau, L. Raffo","doi":"10.1109/SiPS.2014.6986104","DOIUrl":"https://doi.org/10.1109/SiPS.2014.6986104","url":null,"abstract":"Applications and hardware complexity management in modern systems tend to collide with efficient resource and power balance. Therefore, dedicated and power-aware design frameworks are necessary to implement efficient multi-functional runtime reconfigurable signal processing platforms. In this work, we adopt dataflow specifications as a starting point to challenge power minimization.","PeriodicalId":167156,"journal":{"name":"2014 IEEE Workshop on Signal Processing Systems (SiPS)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131758621","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Earliest-deadline first scheduling of multiple independent dataflow graphs","authors":"A. Bouakaz, T. Gautier, J. Talpin","doi":"10.1109/SiPS.2014.6986102","DOIUrl":"https://doi.org/10.1109/SiPS.2014.6986102","url":null,"abstract":"Static dataflow graphs are widely used in design of concurrent real-time streaming applications on multiprocessor systems-on-chip. The increasing complexity of these systems advocates using real-time operating systems and dynamic scheduling to manage applications and resources. Providing timing guarantees (e.g. minimum throughput, deadlines) and minimizing the required amount of resources (e.g. number of processors, buffer capacities) are crucial aspects of these systems. This paper addresses uniprocessor and partitioned multiprocessor earliest-deadline first scheduling of multiple concurrent applications, each designed as an independent dataflow graph. Our scheduling approach maps each actor to a periodic realtime task and computes the appropriate buffer sizes and timing and scheduling parameters (i.e. periods, processor allocation, etc.). The proposed parametric schedulability analysis aims at maximizing the overall processor utilization, and hence allows for reducing the required number of processors.","PeriodicalId":167156,"journal":{"name":"2014 IEEE Workshop on Signal Processing Systems (SiPS)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116461499","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimal voltage signal sensing of NAND flash memmory for LDPC code","authors":"Shigui Qi, D. Feng, Jingning Liu","doi":"10.1109/SiPS.2014.6986077","DOIUrl":"https://doi.org/10.1109/SiPS.2014.6986077","url":null,"abstract":"Low-density parity-check (LDPC) code can provide stronger error correcting performance in NAND flash memory. LDPC decoder requires accurate soft-decision log-likelihood ratio (LLR) information which demands fine-grained flash memory threshold voltage sensing operations. The threshold voltage sensing operations incur energy consumption and access latency penalty. Therefore, it is important to minimize the flash memory sensing operations without noticeable error correcting performance decreasing. We propose a new flash memory sensing strategy Ununiform-SOR (ununiform sensing in overlapping region) which can reduce 20% flash memory sensing operations than traditional non-uniform threshold voltage sensing without reducing the error correcting performance of LDPC code in NAND flash memory noise channel. The new Ununiform-SOR sensing strategy can reduce more than 20% reading energy consumption than the non-uniform sensing strategy.","PeriodicalId":167156,"journal":{"name":"2014 IEEE Workshop on Signal Processing Systems (SiPS)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124212444","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Channel-adaptive complex K-best MIMO detection using lattice reduction","authors":"F. Sheikh, Elias Szabo-Wexler, Mehnaz Rahman, Wei Wang, B. Alexandrov, Dongmin Yoon, A. Chun, H. Alavi","doi":"10.1109/SiPS.2014.6986064","DOIUrl":"https://doi.org/10.1109/SiPS.2014.6986064","url":null,"abstract":"Lattice reduction (LR) aided detectors mitigate the exponentially increasing complexity of large multiple-input, multiple-output (MIMO) systems while achieving near-optimal performance with low computational complexity. In this paper, a channel-adaptive complex-domain LR-aided K-best MIMO detector is presented that reduces the gap between the K-best sphere decoding (SD) detector and the maximum likelihood (ML) optimal MIMO detector. While maintaining BER performance, computational complexity is reduced by 50% over a conventional complex domain K-best SD detector by implementing a new on-demand complex-domain candidate symbol selection algorithm. Two tunable variables in the candidate selection process are introduced to enable both coarse-grained and fine-grained adaptation of computational complexity to channel conditions.","PeriodicalId":167156,"journal":{"name":"2014 IEEE Workshop on Signal Processing Systems (SiPS)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130756167","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Energy-adaptive performance management for self-sustained signal processing systems","authors":"Junlin Chen, Lei Wang","doi":"10.1109/SiPS.2014.6986058","DOIUrl":"https://doi.org/10.1109/SiPS.2014.6986058","url":null,"abstract":"This paper presents an energy-adaptive performance management technique for the design of embedded signal processing systems powered by renewable energy sources. By jointly considering the nondeterministic characteristics of renewable energy and the unique relationship between signal processing performance and the required energy consumption, a progressive performance tuning approach is developed to dynamically determine an acceptable signal processing performance in accordance with the changing energy level at runtime. Several practical issues such as the battery capacity are investigated, and their impacts on the proposed technique are evaluated. The proposed technique is applied to a DCT-based image sensing system. Simulation results demonstrate that by adaptively tuning signal processing kernels with renewable energy, significant improvements in time coverage and energy efficiency can be achieved in the presence of unstable harvested energy.","PeriodicalId":167156,"journal":{"name":"2014 IEEE Workshop on Signal Processing Systems (SiPS)","volume":"148 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122916693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Customizing a VLIW-SIMD application-specific instruction-set processor for hearing aid devices","authors":"Julian Hartig, Lukas Gerlach, G. P. Vayá, H. Blume","doi":"10.1109/SiPS.2014.6986072","DOIUrl":"https://doi.org/10.1109/SiPS.2014.6986072","url":null,"abstract":"Hardware architectures for modern hearing aid devices have to provide ultra low power consumption at a small silicon area and moderate computational performance to deal with the continuously growing complexity of hearing aid signal processing. At the same time, they need to remain flexible for future algorithmic changes. These challenging design goals can be achieved by using Application-Specific Instruction-Set Processors (ASIPs), where a baseline architecture is customized to the target class of applications. In this paper, hardware modifications of a generic VLIW-SIMD processor architecture targeting audio processing are described and their influence in area-performance efficiency and power are evaluated. As exemplary hearing aid signal processing application, the evaluated algorithms contain a complex modulated filter bank and a noise reduction algorithm. The proposed architecture requires 2 times less silicon area and a 6 times lower clock frequency than a Tensilica Xtensa LX4 when running the same algorithms under real-time conditions.","PeriodicalId":167156,"journal":{"name":"2014 IEEE Workshop on Signal Processing Systems (SiPS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115415950","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A low complexity scheme for accurate 3D velocity estimation in ultrasound systems","authors":"Siyuan Wei, Ming Yang, C. Chakrabarti, Richard Sampson, T. Wenisch, O. Kripfgans, J. Fowlkes","doi":"10.1109/SiPS.2014.6986067","DOIUrl":"https://doi.org/10.1109/SiPS.2014.6986067","url":null,"abstract":"Vector flow imaging is a critical component in the clinical diagnosis of cardiovascular diseases; however, most current methods are too computationally expensive to scale well to 3D. Less complex techniques, such as Doppler-based imaging (which cannot provide lateral flow measurements) and basic speckle tracking algorithms (which have poor lateral accuracy), are incapable of producing high quality 3D measurements. In this paper, we first extend a technique designed to improve lateral flow accuracy for 2D velocity vector estimation, the synthetic lateral phase method, to 3D (SLP-3D). We then show that a straightforward implementation of this algorithm is too computationally complex for modern systems. Instead, we propose a two-tiered method that uses low complexity sum-of-absolute differences (SAD) for coarse-grained search and an optimized version of SLP-3D to fine tune the search for sub-pixel accuracy. We show that the proposed method (SAD+SLP-3Dopt) achieves a 9× reduction in computational complexity compared to the naive SLP-3D. Field II simulations for plug and parabolic flow using our method show a fairly high degree of accuracy in both the axial and the lateral components. Finally, we show our technique can support accurate flow imaging with up to 130 velocity estimations/sec within the power constraints of a handheld device.","PeriodicalId":167156,"journal":{"name":"2014 IEEE Workshop on Signal Processing Systems (SiPS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132739178","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Computation-skip error resilient scheme for recursive CORDIC","authors":"Yanxiang Huang, Meng Li, Chunshu Li, P. Debacker, L. Perre","doi":"10.1109/SiPS.2014.6986061","DOIUrl":"https://doi.org/10.1109/SiPS.2014.6986061","url":null,"abstract":"Aggressive voltage and frequency scaling are widely utilized to exploit the design margin introduced by the process, voltage and environment variations. However, scaling beyond the critical voltage or frequency results to numerous timing errors, and hence unacceptable output quality. In this paper, a computation-skip (CS) scheme is proposed for recursive digital signal processors with a fixed cycles per instruction (CPI) to correct timing errors. A CORDIC processor with the proposed CS scheme still functions when scaling beyond the sub-critical voltage or frequency. It improves EVM by 47.9 dB at its most critical frequency or supply voltage, and extends the voltage scaling limit by 90 mV w.r.t the conventional CORDIC. Besides, it is more than 1.7X energy efficient w.r.t. the conventional high-speed CORDIC, which is designed for a more aggressive scaling.","PeriodicalId":167156,"journal":{"name":"2014 IEEE Workshop on Signal Processing Systems (SiPS)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122597366","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Weighted least squares solution for RSS based localization in correlated shadowing","authors":"Zeyuan Li, Pei-Jung Chung","doi":"10.1109/SiPS.2014.6986066","DOIUrl":"https://doi.org/10.1109/SiPS.2014.6986066","url":null,"abstract":"In this paper, we study the received signal strength (RSS) based localization problem with correlated shadowing between pairs of RSS measurements. By linearizing the correlated RSS model, a weighted least squares (WLS) is formulated to obtain the target location. We also study the correlated shadowing when differential received signal strength (DRSS) is deployed as measurements. Numerical simulations show that the proposed algorithms outperform the algorithms that do not take the correlation between measurements into consideration.","PeriodicalId":167156,"journal":{"name":"2014 IEEE Workshop on Signal Processing Systems (SiPS)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125607742","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}