{"title":"Run-time adaption for highly-complex multi-core systems","authors":"J. Henkel, N. Vijaykrishnan, S. Parameswaran, J. Teich","doi":"10.1109/CODES-ISSS.2013.6659000","DOIUrl":"https://doi.org/10.1109/CODES-ISSS.2013.6659000","url":null,"abstract":"As embedded on-chip systems grow more and more complex and are about to be deployed in automotive and other demanding application areas (beyond the mainstream of consumer electronics), run-time adaptation is a prime design consideration for many reasons: i) reliability is a major concern when migrating to technology nodes of 32nm and beyond, ii) efficiency, i.e. computational power per Watt, is a challenge as computing models do not keep up with hardware-provided computing capabilities, iii) power densities increase rapidly as Dennard Scaling fails, resulting in what is dubbed “Dark Silicon”, iv) highly complex embedded applications are hard to predict, etc. All these scenarios (and others not listed here) make proactive and sophisticated run-time adaptation techniques a prime design consideration for generations of multi-core architectures to come. The intent of this paper is to present problems and solutions of top research initiatives from diverse angles with the common denominator of the dire need for run-time adaptation. The first part tackles the thermal problem, i.e. high power densities and the related short- and long-term effects on reliability, and presents scalable techniques to cope with the related problems. The second section demonstrates the potential of steep slope devices for thread scheduling on multi-cores. The third approach presents embedded pipelined architectures running complex multi-media applications, whereas the fourth section introduces the paradigm of invasive computing, i.e. a novel computing approach promising high efficiency through a highly-adaptive hardware/software architecture. In summary, the paper presents snapshots of four highly-adaptive solutions and platforms from different angles for challenges of complex future multi-core systems.","PeriodicalId":163484,"journal":{"name":"2013 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134104887","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bound-oriented parallel pruning approaches for efficient resource constrained scheduling of high-level synthesis","authors":"Mingsong Chen, Lei Zhou, G. Pu, Jifeng He","doi":"10.1109/CODES-ISSS.2013.6659001","DOIUrl":"https://doi.org/10.1109/CODES-ISSS.2013.6659001","url":null,"abstract":"As a key step of high-level synthesis (HLS), resource constrained scheduling (RCS) tries to find an optimal schedule which can dispatch all the operations with minimum latency under specific resource constraints. Branch-and-bound heuristics are promising to achieve such an optimal schedule quickly, since they can prune away large parts of infeasible solution space during the exploration. However, few of them are based on the prevalent multi-core platforms. Based on the bound information, this paper exploits the parallel pruning potentials from different perspectives and proposes various efficient techniques that can substantially reduce the overall RCS search efforts. The experimental results demonstrate that our approach can reduce the RCS time drastically.","PeriodicalId":163484,"journal":{"name":"2013 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"416 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134116151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the automatic generation of GPU-oriented software applications from RTL IPs","authors":"N. Bombieri, F. Fummi, S. Vinco","doi":"10.1109/CODES-ISSS.2013.6658999","DOIUrl":"https://doi.org/10.1109/CODES-ISSS.2013.6658999","url":null,"abstract":"Graphics processing units (GPUs) have been explored as a new computing paradigm for accelerating computation-intensive applications. In particular, the combination of GPUs and CPUs has proved to be an effective solution for accelerating software execution, by mixing the few CPU cores optimized for serial processing with many smaller GPU cores designed for massively parallel computations. In addition, driven by the need for low power consumption as well as high performance, a recent trend is combining GPUs and CPUs onto a single die (e.g., AMD Fusion, Intel Sandy Bridge, NVIDIA Tegra). The good tradeoff between computing capability and power consumption makes integrated GPUs a promising alternative for accelerating a wide range of software applications for embedded systems. Nevertheless, algorithms must be redesigned to take advantage of these architectures, and such manual parallelization is often unsatisfactory. This paper presents a methodology to automatically generate software applications for GPUs by reusing existing and preverified register-transfer level (RTL) intellectual properties (IPs). The methodology aims at exploiting the intrinsic parallelism of RTL IPs (such as process concurrency and pipeline micro-architecture) for generating the parallel software implementation of the functionality. The experimental results show that the performance obtained by running the RTL functionality as software applications on GPUs exceeds that provided by the RTL code mapped into a hardware accelerator.","PeriodicalId":163484,"journal":{"name":"2013 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"158 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133780749","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"System level synthesis of hardware for DSP applications using pre-characterized function implementations","authors":"Shuo Li, Nasim Farahini, A. Hemani, Kathrin Rosvall, I. Sander","doi":"10.1109/CODES-ISSS.2013.6659003","DOIUrl":"https://doi.org/10.1109/CODES-ISSS.2013.6659003","url":null,"abstract":"SYLVA is a system level synthesis framework that transforms DSP sub-systems modeled as synchronous data flow into hardware implementations in ASICs, FPGAs or CGRAs. SYLVA synthesizes in terms of pre-characterized function implementations (FIMPs). It explores the design space in three dimensions: the number of FIMPs, the type of FIMPs, and the pipeline parallelism between the producing and consuming FIMPs. We introduce a timing and interface model of FIMPs to enable reuse and automatic generation of the Global Interconnect and Control (GLIC) that glues the FIMPs together into a working system. SYLVA has been evaluated by applying it to five realistic DSP applications. The results are analyzed for design space exploration, for efficacy in generating GLIC by comparing to manually generated GLIC, and for accuracy of design space exploration by comparing the area and energy costs considered during the design space exploration, based on pre-characterized FIMPs, with the final results.","PeriodicalId":163484,"journal":{"name":"2013 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121988454","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning the optimal operating point for many-core systems with extended range voltage/frequency scaling","authors":"Da-Cheng Juan, S. Garg, Jinpyo Park, Diana Marculescu","doi":"10.1109/CODES-ISSS.2013.6658995","DOIUrl":"https://doi.org/10.1109/CODES-ISSS.2013.6658995","url":null,"abstract":"Near-Threshold Computing (NTC) has emerged as a solution that promises to significantly increase the energy efficiency of next-generation multi-core systems. This paper evaluates and analyzes the behavior of dynamic voltage and frequency scaling (DVFS) control algorithms for multi-core systems operating under near-threshold, nominal, or turbo-mode conditions. We adapt the model selection technique from machine learning to learn the relationship between performance and power. The theoretical results show that the resulting models satisfy convexity properties essential to efficiently determining optimal voltage/frequency operating points for minimizing energy consumption under throughput constraints or maximizing throughput under a given power budget. Our experimental results show that, compared with DVFS in the conventional operating range, extended range DVFS control including turbo-mode and near-threshold operation achieves an additional (1) 13.28% average energy reduction under iso-performance conditions, and (2) 7.54% average throughput increase under iso-power conditions.","PeriodicalId":163484,"journal":{"name":"2013 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121667835","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Online OLED dynamic voltage scaling for video streaming applications on mobile devices","authors":"Mengying Zhao, Yiran Chen, Xiang Chen, C. Xue","doi":"10.1145/2518148.2518156","DOIUrl":"https://doi.org/10.1145/2518148.2518156","url":null,"abstract":"While OLED is replacing LCD and becoming the display of choice for mobile devices, the display still consumes a large portion of a mobile device's total power. Reducing OLED display power is of paramount importance for battery-powered mobile devices. With the explosive growth of video streaming on mobile devices, this paper proposes an online dynamic voltage scaling (DVS) approach for mobile video applications to reduce OLED display power consumption. A time-conscious DVS scheme, including scene change detection, voltage initialization and representative-region based voltage adjustment, is developed and applied in video streaming. Based on the proposed scheme, flexible OLED DVS solutions can be adaptively derived according to timing constraints. Experimental results show that the proposed online technique achieves 17.3% power saving on average when compared with an OLED display without DVS, which is 42.1% of the offline DVS power savings, while keeping more than 99% of frames displayed in high quality.","PeriodicalId":163484,"journal":{"name":"2013 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127401639","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}