{"title":"Minimizing area costs in GPS applications on a programmable DSP by code compression","authors":"Piia Saastamoinen, J. Nurmi, I. Saastamoinen, Mikko Laiho","doi":"10.1109/SOCC.2009.5335669","DOIUrl":"https://doi.org/10.1109/SOCC.2009.5335669","url":null,"abstract":"The number of applications requiring personal satellite-based navigation is growing rapidly at the moment. The complexity of GPS (Global Positioning System) navigation algorithms, and thus the memory requirements of these systems, are growing at the same pace as the demands from customers. The large program memory footprint can be efficiently reduced by code compression. In this paper we describe in detail the analysis and compression procedures of typical GPS functions, as well as the on-chip decompression flow. For the GPS functions, our compression scheme achieves a compression ratio of 55% at best.","PeriodicalId":389625,"journal":{"name":"2009 International Symposium on System-on-Chip","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132568771","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Simultaneous PVT-tolerant voltage-island formation and core placement for thousand-core platforms","authors":"S. Majzoub, R. Saleh, S. Wilton, R. Ward","doi":"10.1109/SOCC.2009.5335688","DOIUrl":"https://doi.org/10.1109/SOCC.2009.5335688","url":null,"abstract":"In this paper, we propose a novel approach to voltage island formation and core placement for energy optimization in manycore architectures under parameter variations at the pre-fabrication stage. We group the cores into irregular \"cloud-shaped\" voltage islands. The islands are created by balancing the desire to limit the spatial extent of each island, to reduce PVT impact, with the communication patterns between islands. Compared to using rectangular islands, our approach leads to power improvements of 10 to 12%.","PeriodicalId":389625,"journal":{"name":"2009 International Symposium on System-on-Chip","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125863567","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Soft NMR: Exploiting statistics for energy-efficiency","authors":"Eric P. Kim, R. Abdallah, Naresh R Shanbhag","doi":"10.1109/SOCC.2009.5335677","DOIUrl":"https://doi.org/10.1109/SOCC.2009.5335677","url":null,"abstract":"Achieving energy-efficiency in nanoscale CMOS process technologies is made challenging by process, temperature and voltage variations. In this paper, we present soft N-modular redundancy (soft NMR) that consciously exploits statistics of errors due to these nanoscale artifacts in order to design robust and energy-efficient systems. In contrast to conventional NMR, soft NMR employs estimation and detection techniques in the voter. We compare NMR and soft NMR in the design of an energy-efficient and robust discrete cosine transform (DCT) image coder. Simulations in a commercial 45nm, 1.2V, CMOS process show that soft triple-MR (TMR) provides 10× improvement in robustness and 13% power savings over TMR at a peak signal-to-noise ratio (PSNR) of 20dB. In addition, soft dual-MR (DMR) provides 2× improvement in robustness and 35% power savings over TMR at a PSNR of 20dB.","PeriodicalId":389625,"journal":{"name":"2009 International Symposium on System-on-Chip","volume":"39 10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126122857","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mapping of the FFT on a reconfigurable architecture targeted to SDR applications","authors":"F. Garzia, Roberto Airoldi, J. Nurmi, Carmelo Giliberto, C. Brunelli","doi":"10.1109/SOCC.2009.5335655","DOIUrl":"https://doi.org/10.1109/SOCC.2009.5335655","url":null,"abstract":"This paper describes the implementation of an FFT on a system based on a general-purpose (GP) core and a reconfigurable coarse-grain accelerator. The entire system has been prototyped on an Altera Stratix II device. On the prototype, a 1024-point FFT gives a 40× speed-up compared with the software implementation. The 1024-point FFT is executed in 400 μs. Considering an ASIC synthesis of the coarse-grain array, the 1024-point FFT is executed in 42 μs, against 104 μs for a DSP implementation.","PeriodicalId":389625,"journal":{"name":"2009 International Symposium on System-on-Chip","volume":"269 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123372891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parameterizing simulated annealing for distributing Kahn Process Networks on multiprocessor SoCs","authors":"Heikki Orsila, E. Salminen, T. Hämäläinen","doi":"10.1109/SOCC.2009.5335683","DOIUrl":"https://doi.org/10.1109/SOCC.2009.5335683","url":null,"abstract":"Mapping an application on a multiprocessor system-on-chip (MPSoC) is a crucial step in architecture exploration. The problem is to minimize optimization effort and application execution time. Simulated annealing (SA) is a versatile algorithm for hard optimization problems, such as task distribution on MPSoCs. We propose an improved automatic parameter selection method for SA to save optimization effort. The method determines a proper annealing schedule and transition probabilities for SA, which makes the algorithm scalable with respect to application and platform size. Applications are modeled as Kahn process networks (KPNs). The method was improved to optimize KPNs and save optimization effort by doing sensitivity analysis for processes. The method is validated by mapping 16 to 256 node KPNs onto an MPSoC. We optimized 150 KPNs for 3 architectures. The method saves over half the optimization time and loses only 0.3% in performance to non-automated SA. Results are compared to non-automated SA, group migration, random mapping and brute force algorithms. Global optimum solutions are obtained by brute force and compared to our heuristics. Global optimum convergence for KPNs has not been reported before. We show that 35% of optimization runs reach within 5% of the global optimum. In one of the selected problems, the global optimum is reached in as many as 37% of optimization runs. Results show large variations between KPNs generated with different parameters. Cyclic graphs are found to be harder to parallelize than acyclic graphs.","PeriodicalId":389625,"journal":{"name":"2009 International Symposium on System-on-Chip","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128757937","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automated instrumentation of FPGA-based systems for system-level transaction monitoring","authors":"P. McKechnie, Michaela Blott, W. Vanderbauwhede","doi":"10.1109/SOCC.2009.5335653","DOIUrl":"https://doi.org/10.1109/SOCC.2009.5335653","url":null,"abstract":"Modern FPGA-based systems are complex and difficult to verify. One approach to easing the verification problem and reducing perceived complexity is to use libraries of reusable functions. These reusable functions, known as intellectual property blocks, are commonly created as netlists or RTL components. Complex systems can be created from IP blocks by using high-level design environments. These tools define the types and semantics of component interfaces which permit systems to be debugged using system-level transaction monitoring. However, the insertion of on-chip monitoring circuitry is a manual process in FPGA design flows. In this paper we present an algorithm which exploits the high-level design environment to permit automatic instrumentation of designs. We demonstrate that the algorithm can harness existing HDL generation techniques and reduce the insertion and configuration effort required of the designer.","PeriodicalId":389625,"journal":{"name":"2009 International Symposium on System-on-Chip","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127448777","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Energy and bandwidth aware mapping of IPs onto regular NoC architectures using Multi-Objective Genetic Algorithms","authors":"K. Bhardwaj, R. Jena","doi":"10.1109/SOCC.2009.5335684","DOIUrl":"https://doi.org/10.1109/SOCC.2009.5335684","url":null,"abstract":"This paper presents energy and bandwidth aware topological mapping of Intellectual Properties (IPs) onto regular tile-based Network-on-Chip (NoC) architectures. Both one-one and many-many mappings between switches and tiles are taken into consideration in the proposed approach. To minimize the energy and link bandwidth requirements of NoC-based designs, the approach addresses both computational and communication synthesis. A Multi-Objective Genetic Algorithms (MOGA) based technique is used to select an optimal solution from the Pareto-optimal solutions. This technique has been implemented and evaluated on randomly generated benchmarks as well as real-life applications such as a multimedia system (MMS). The experimental results demonstrate savings of up to 70% in energy and up to 20% in link bandwidth. These results include a performance evaluation of One-One vs. Many-Many mapping that clearly shows the effectiveness of the proposed approach.","PeriodicalId":389625,"journal":{"name":"2009 International Symposium on System-on-Chip","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133995898","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"System architecture for 3GPP LTE modem using a programmable baseband processor","authors":"Di Wu, J. Eilert, R. Asghar, Dake Liu, A. Nilsson, E. Tell, Eric Alfredsson","doi":"10.4018/jertcs.2010070103","DOIUrl":"https://doi.org/10.4018/jertcs.2010070103","url":null,"abstract":"3G evolution towards HSPA (High Speed Packet Access) and LTE (Long-Term Evolution) is ongoing and will substantially increase throughput with higher spectral efficiency. This paper presents the system architecture of an LTE modem based on a programmable baseband processor. The architecture includes a baseband processor that handles processing such as time and frequency synchronization, IFFT/FFT (up to 2048 points), channel estimation and subcarrier demapping. The throughput and latency requirements of a Category 4 User Equipment (CAT4 UE) are met by adding a MIMO symbol detector and a parallel Turbo decoder supporting H-ARQ. This brings both low silicon cost and enough flexibility to support other wireless standards. The complexity demonstrated by the modem shows the practicality and advantage of using programmable baseband processors for a single-chip LTE solution.","PeriodicalId":389625,"journal":{"name":"2009 International Symposium on System-on-Chip","volume":"88 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116282692","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analysis of memory access optimization for motion compensation frames in MPEG-4","authors":"Haitham Habli, J. Lilius, Johan Ersfolk","doi":"10.1109/SOCC.2009.5335666","DOIUrl":"https://doi.org/10.1109/SOCC.2009.5335666","url":null,"abstract":"One of the largest sources of energy consumption in modern video coding algorithms is access to the reference frame, which is required in the motion compensation calculations. Decreasing this energy consumption is challenging because the accesses are irregular. In this paper we evaluate an approach in which we presort the motion vectors so as to increase the locality of the accesses.","PeriodicalId":389625,"journal":{"name":"2009 International Symposium on System-on-Chip","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115096807","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Physical realization oriented area-power-delay tradeoff exploration","authors":"V. Gierenz, C. Panis, J. Nurmi","doi":"10.1109/SOCC.2009.5335681","DOIUrl":"https://doi.org/10.1109/SOCC.2009.5335681","url":null,"abstract":"High-level design-space exploration methodologies focus on optimizations at the application and architectural abstraction layers. For power-, leakage-, and cost-sensitive as well as performance-critical SoC building blocks, such as embedded domain-specific processors and application-specific accelerators, parasitic physical realization effects strongly influence the actual architecture efficiency. The tradeoff between architectural choices and physical implementation consequences needs to be considered to optimize area-power-performance efficiency. In this paper a semi-automated methodology is described that supports architectural optimizations with quantitative feedback on physical realizations already in an early design-space exploration phase. The presented methodology accounts for parasitic effects at the physical realization level, enables an efficient quantitative implementation tradeoff exploration for the design of high-performance SoC building blocks, and provides the foundation for a directed optimization throughout the design process.","PeriodicalId":389625,"journal":{"name":"2009 International Symposium on System-on-Chip","volume":"195 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121090203","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}