{"title":"A novel low-power embedded object recognition system working at multi-frames per second (Extended abstract)","authors":"A. Nikitakis, Savvas Papaioannou, I. Papaefstathiou","doi":"10.1145/2435227.2435229","DOIUrl":"https://doi.org/10.1145/2435227.2435229","url":null,"abstract":"One very important challenge in the field of multimedia is the implementation of fast and detailed Object Detection and Recognition systems. In particular, in the current state-of-the-art mobile multimedia systems, it is highly desirable to detect and locate certain objects within a video frame in real time. In this paper, we present a novel FPGA-based embedded implementation of a very efficient object recognition algorithm called Receptive Field Cooccurrence Histograms Algorithm (RFCH). Our main focus was to increase its performance so as to be able to handle the object recognition task of today's highly sophisticated embedded multimedia systems while keeping its energy consumption at very low levels. Our low-power embedded reconfigurable system is at least 15 times faster than the software implementation on a low-voltage high-end CPU, while consuming at least 60 times less energy. Our novel system is also 88 times more energy efficient than the recently introduced low-power multi-core Intel devices, which are optimized for embedded systems.","PeriodicalId":431615,"journal":{"name":"2012 IEEE 10th Symposium on Embedded Systems for Real-time Multimedia","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122844958","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Static prediction of recursion frequency using machine learning to enable hot spot optimizations","authors":"D. Tetzlaff, S. Glesner","doi":"10.1109/ESTIMedia.2012.6507027","DOIUrl":"https://doi.org/10.1109/ESTIMedia.2012.6507027","url":null,"abstract":"Recursion poses a severe problem for static optimizations because its execution frequency usually depends upon runtime values, hence being rarely predictable at compile time. As a consequence, optimization potential of programs is sacrificed since possible hot paths where most of the execution time is spent and where optimization would be beneficial might be undiscovered. In this paper, we propose a sophisticated machine learning based approach to statically predict the recursion frequency of functions for programs in real-world application domains, which can be used to guide various hot spot optimizations. Our experiments with 369 programs of 25 benchmark suites from different domains demonstrate that our approach is applicable to a wide range of programs with different behavior and yields more precise heuristics than those generated by pure static analyses. Moreover, our results provide valuable insights into recursive structures in general, when they appear and how deep they are.","PeriodicalId":431615,"journal":{"name":"2012 IEEE 10th Symposium on Embedded Systems for Real-time Multimedia","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120933792","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TEACA: Thread ProgrEss Aware Coherence Adaption for hybrid coherence protocols","authors":"Jianhua Li, Liang Shi, Qing'an Li, C. Xue, Yinlong Xu","doi":"10.1109/ESTIMedia.2012.6507024","DOIUrl":"https://doi.org/10.1109/ESTIMedia.2012.6507024","url":null,"abstract":"Hybrid coherence protocols can provide the scalability of directory protocols and low latency sharing miss handling in snooping protocols simultaneously. Unfortunately, how to adapt the hybrid protocols at runtime is not well studied. This paper proposes Thread ProgrEss Aware Coherence Adaption (TEACA) which utilizes the thread progress information as the hints to adapt hybrid coherence protocols. Specifically, TEACA fuses the memory system statistics to estimate the progress of threads. Based on the estimated thread progress information, TEACA dynamically categorizes threads into leader threads and laggard threads. The thread categorization decisions are then leveraged for efficient coherence adaption in hybrid coherence protocols. A case study on a recently proposed hybrid protocol (PATCH [29]) shows that, with the hints from TEACA, the enhanced hybrid protocol outperforms its baseline in both application execution time and energy dissipation.","PeriodicalId":431615,"journal":{"name":"2012 IEEE 10th Symposium on Embedded Systems for Real-time Multimedia","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133007499","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing user experiences by exploiting energy and launch delay tradeoff of mobile multimedia applications (Extended abstract)","authors":"Yi-Fan Chung, Yin-Tsung Lo, C. King","doi":"10.1109/ESTIMedia.2012.6507034","DOIUrl":"https://doi.org/10.1109/ESTIMedia.2012.6507034","url":null,"abstract":"The growing multimedia applications on smart phones place ever more stringent demands on user experiences. A key factor affecting user experiences is the delay in launching applications. It affects a user's perception of the responsiveness of the phone and the multimedia applications.","PeriodicalId":431615,"journal":{"name":"2012 IEEE 10th Symposium on Embedded Systems for Real-time Multimedia","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125254908","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mapping of streaming applications considering alternative application specifications (Extended abstract)","authors":"J. Zhai, Hristo Nikolov, T. Stefanov","doi":"10.1145/2435227.2435230","DOIUrl":"https://doi.org/10.1145/2435227.2435230","url":null,"abstract":"Streaming applications often require a parallel Model of Computation (MoC) to specify their application behavior and to facilitate mapping onto Multi-Processor System-on-Chip (MPSoC) platforms. Various performance requirements and resource budgets of embedded systems ask for an efficient design space exploration (DSE) approach to select the best design from a design space consisting of a large number of design choices. However, existing DSE approaches explore the design space that includes only architecture and mapping alternatives for an initial application specification given by the application designer. In this paper, we first show that a design often might not be optimal if alternative specifications of a given application are not taken into account. We further argue that the best alternative specification consists of only independent and load-balanced application tasks. Based on the Polyhedral Process Network (PPN) MoC, we present an approach to analyze and transform an initial PPN to an alternative one that contains only independent processes if possible. Finally, by prototyping real-life applications on both FPGA-based MPSoCs and desktop multi-core platforms, we demonstrate that mapping the alternative application specification results in a large performance gain compared to approaches in which alternative application specifications are not taken into account.","PeriodicalId":431615,"journal":{"name":"2012 IEEE 10th Symposium on Embedded Systems for Real-time Multimedia","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126992714","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Loop instruction caching for energy-efficient embedded multitasking processors","authors":"Ji Gu, T. Ishihara, Kyungsoo Lee","doi":"10.1109/ESTIMedia.2012.6507036","DOIUrl":"https://doi.org/10.1109/ESTIMedia.2012.6507036","url":null,"abstract":"With the exponential increase of power consumption in processor generations, energy dissipation has become one of the most critical constraints in system design. Cache memories are usually the most energy-consuming components on the processor chip due to their large die area and frequent access operations. Furthermore, in step with the increased complexity of modern embedded applications, microprocessors are increasingly executing multitasking applications. In multitasking processors, the conventional L1 instruction cache (I-cache) is usually shared by multiple tasks and thereby suffers highly intensive read/write operations, which can consume even more energy than in a single-task system. This paper presents an energy-efficient shared multitasking loop instruction cache (SMLIC), which is designed to address the task-sharing and context-switch issues so that it can be efficiently utilized to reduce I-cache accesses for energy savings in multitasking processors. Experiments on a set of multitasking applications demonstrate that the proposed SMLIC design scheme can reduce I-cache accesses by 12∼86% and energy consumption in instruction supply by 11∼79% for multitasking systems, depending on the frequency of context switches.","PeriodicalId":431615,"journal":{"name":"2012 IEEE 10th Symposium on Embedded Systems for Real-time Multimedia","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129925830","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Keynote: “Design space exploration and run-time resource management in the embedded multi-core era”","authors":"S. Bampi","doi":"10.1109/ESTIMedia.2012.6507016","DOIUrl":"https://doi.org/10.1109/ESTIMedia.2012.6507016","url":null,"abstract":"Increasingly demanding complex algorithms for multimedia systems and higher resolutions for multiview videos hit power and memory walls in portable hardware. Silicon IC technology scaling is reaching two-dimensional limitations that accompany an escalating technology cost wall. In this scenario, the severe costs of power density, circuit performance variability and energy constraints call for new algorithms-to-architecture approaches. This talk will highlight the architectures and circuit techniques that will influence multimedia systems architectures in the future. Design challenges and specific solutions that deal with energy dissipation in the case of multiview video are addressed. In this presentation, the technology-design-architecture-algorithms interactions are pointed out as drivers for new cross-layer optimizations in energy-constrained multimedia systems.","PeriodicalId":431615,"journal":{"name":"2012 IEEE 10th Symposium on Embedded Systems for Real-time Multimedia","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133361485","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"O2render: An OpenCL-to-Renderscript translator for porting across various GPUs or CPUs","authors":"Cheng-yan Yang, Yi-jui Wu, S. Liao","doi":"10.1109/ESTIMedia.2012.6507031","DOIUrl":"https://doi.org/10.1109/ESTIMedia.2012.6507031","url":null,"abstract":"More than half a billion Android devices are the world's most impactful real-time, interactive multimedia systems that are open-sourced. Google introduced the Renderscript language and runtime in Android releases starting in 2011. Renderscript delivers performance and portability without losing usability. However, it is difficult to reuse software written in existing compute languages such as OpenCL. Thus, we develop the O2render system to enable OpenCL programs on Android devices. We analyze fundamental differences between OpenCL and Renderscript, and present our design of a translator between them using the Low Level Virtual Machine (LLVM). We extend LLVM's frontend, Clang, and show that we achieve about the same performance in Renderscript with minimal translation overhead.","PeriodicalId":431615,"journal":{"name":"2012 IEEE 10th Symposium on Embedded Systems for Real-time Multimedia","volume":"86 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126959824","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}