B. Juurlink, J. Lucas, Nadjib Mammeri, G. Keramidas, Katerina Pontzolkova, I. Aransay, Chrysa Kokkala, Martyn Bliss, A. Richards
{"title":"Enabling GPU software developers to optimize their applications — The LPGPU2 approach","authors":"B. Juurlink, J. Lucas, Nadjib Mammeri, G. Keramidas, Katerina Pontzolkova, I. Aransay, Chrysa Kokkala, Martyn Bliss, A. Richards","doi":"10.1109/DASIP.2017.8122116","DOIUrl":"https://doi.org/10.1109/DASIP.2017.8122116","url":null,"abstract":"Low-power GPUs have become ubiquitous; they can be found in domains ranging from wearable and mobile computing to automotive systems. With this ubiquity has come a wider range of applications exploiting low-power GPUs, placing ever-increasing demands on the expected performance and power efficiency of the devices. The LPGPU2 project is an EU-funded, 30-month Innovation Action that aims to develop an analysis and visualization framework that enables GPU application developers to improve the performance and power consumption of their applications. To this end, the project follows a holistic approach. First, several applications (use cases) are being developed for or ported to low-power GPUs. These applications will be optimized using the tooling framework in the last phase of the project. In addition, power measurement devices and power models are devised that are 10× more accurate than the state of the art. The ultimate goal of the project is to promote open vendor-neutral standards via the Khronos Group. 
This paper briefly reports on the achievements made in the first phase of the project (till month 18) and focuses on the progress made in applications; in power measurement, estimation, and modelling; and in the analysis and visualization tool suite.","PeriodicalId":6637,"journal":{"name":"2017 Conference on Design and Architectures for Signal and Image Processing (DASIP)","volume":"26 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82060300","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
D. Madroñal, R. Lazcano, H. Fabelo, S. Ortega, R. Salvador, G. Callicó, E. Juárez, C. Sanz
{"title":"Energy consumption characterization of a Massively Parallel Processor Array (MPPA) platform running a hyperspectral SVM classifier","authors":"D. Madroñal, R. Lazcano, H. Fabelo, S. Ortega, R. Salvador, G. Callicó, E. Juárez, C. Sanz","doi":"10.1109/DASIP.2017.8122112","DOIUrl":"https://doi.org/10.1109/DASIP.2017.8122112","url":null,"abstract":"In this paper, a Massively Parallel Processor Array platform is characterized in terms of energy consumption using a Support Vector Machine for hyperspectral image classification. This platform gathers 16 clusters composed of 16 cores each, i.e., 256 processors working in parallel. The objective of the work is to associate power dissipation and energy consumed by the platform with the different resources of the architecture. Experimenting with a hyperspectral SVM classifier, this study has been conducted using three strategies: i) modifying the number of processing elements, i.e., clusters and cores, ii) increasing system frequency, and iii) varying the number of active communication links during the analysis, i.e., I/Os and DMAs. As a result, a relationship between the energy consumption and the active platform resources has been exposed using two different parallelization strategies. 
Finally, the implementation that fully exploits the parallelization possibilities while working at 500 MHz has also proven to be the most efficient one, as it reduces energy consumption by 98% compared to the sequential version running at 400 MHz.","PeriodicalId":6637,"journal":{"name":"2017 Conference on Design and Architectures for Signal and Image Processing (DASIP)","volume":"8 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82068279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
T. Nguyen, A. Mouraud, M. Thévenin, G. Corre, O. Pasquier, S. Pillement
{"title":"Model-driven reliability evaluation for MPSoC design","authors":"T. Nguyen, A. Mouraud, M. Thévenin, G. Corre, O. Pasquier, S. Pillement","doi":"10.1109/DASIP.2017.8122115","DOIUrl":"https://doi.org/10.1109/DASIP.2017.8122115","url":null,"abstract":"When designing a Multi-Processor System-on-Chip (MPSoC), a very large range of design alternatives arises from a huge space of possible design options and component choices. The literature proposes numerous Design-Space-Exploration (DSE) approaches that mainly focus on cost optimization. In this paper, we present a DSE approach which focuses on the reliability of the whole design. This approach is based on a meta-model of MPSoCs that integrates reliability evaluation. We develop a tool that allows designers to describe and optimize their platform based on the proposed meta-model. The results obtained for an MPSoC are presented, including the improved overall reliability of the system thanks to the automatic selection of fault tolerance strategies for each component.","PeriodicalId":6637,"journal":{"name":"2017 Conference on Design and Architectures for Signal and Image Processing (DASIP)","volume":"1 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90654585","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Demonstrator of a fingerprint recognition algorithm into a low-power microcontroller","authors":"Javier Arcenegui, Rosario Arjona, I. Baturone","doi":"10.1109/DASIP.2017.8122121","DOIUrl":"https://doi.org/10.1109/DASIP.2017.8122121","url":null,"abstract":"A demonstrator has been developed to illustrate the performance of a lightweight fingerprint recognition algorithm based on the feature QFingerMap16, which is extracted from a window of the directional image centered at the convex core of the fingerprint. The algorithm has been implemented on a low-power ARM Cortex-M3 microcontroller included in a Texas Instruments LaunchPad CC2650 evaluation kit. It has also been implemented on a Raspberry Pi 2 so as to show the results obtained at the successive steps of the recognition process with the aid of a Graphical User Interface (GUI). The algorithm offers a good tradeoff between power consumption and recognition accuracy, making it suitable for authentication on wearables.","PeriodicalId":6637,"journal":{"name":"2017 Conference on Design and Architectures for Signal and Image Processing (DASIP)","volume":"67 1","pages":"1-2"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74490186","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient hardware acceleration for approximate inference of bitwise deep neural networks","authors":"Sebastian Vogel, A. Guntoro, G. Ascheid","doi":"10.1109/DASIP.2017.8122127","DOIUrl":"https://doi.org/10.1109/DASIP.2017.8122127","url":null,"abstract":"In recent years, Deep Neural Networks (DNNs) have been of special interest in the area of image processing and scene perception. Although effective and accurate, DNNs place heavy demands on computational resources. Fortunately, dedicated low-bitwidth accelerators enable efficient, real-time inference of DNNs. We present an approximate evaluation method and a specialized multiplierless accelerator for the recently proposed bitwise DNNs. Our approximate evaluation method is based on the speculative recomputation of selected parts of a bitwise neural network. The selection is based on the intermediate results of a previous input evaluation. In contexts with limited energy budgets, our method and accelerator enable a fast, power-efficient first decision. If necessary, a reliable and accurate output is available after reevaluating the input data multiple times in an approximate manner. Our experiments on the GTSRB and CIFAR-10 datasets show that this approach results in no loss of classification performance in comparison with floating-point evaluation. 
Our work contributes to efficient inference of neural networks on power-constrained embedded devices.","PeriodicalId":6637,"journal":{"name":"2017 Conference on Design and Architectures for Signal and Image Processing (DASIP)","volume":"104 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83412995","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}