Hendrik Laux, Andreas Bytyn, G. Ascheid, A. Schmeink, Günes Karabulut-Kurt, Guido Dartmann
{"title":"Learning-based indoor localization for industrial applications","authors":"Hendrik Laux, Andreas Bytyn, G. Ascheid, A. Schmeink, Günes Karabulut-Kurt, Guido Dartmann","doi":"10.1145/3203217.3203227","DOIUrl":"https://doi.org/10.1145/3203217.3203227","url":null,"abstract":"Modern process automation and the industrial evolution heading towards Industry 4.0 require a huge variety of information to be fused in a Cyber-Physical System. Important for many applications is the spatial position of an arbitrary object, given directly or indirectly in terms of data that has to be processed to obtain position information. The starting point for the technical reflection-based sound localization system presented in this paper is the biological role model of humans, who are able to learn how to localize sound sources. Compared to other forms of sound localization, this nature-inspired method has no need for high spatial and temporal accuracy or big microphone arrays. Possible applications for this system are indoor robot localization or object tracking.","PeriodicalId":127096,"journal":{"name":"Proceedings of the 15th ACM International Conference on Computing Frontiers","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124961175","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Manuele Rusci, D. Rossi, E. Flamand, M. Gottardi, Elisabetta Farella, L. Benini
{"title":"Always-ON visual node with a hardware-software event-based binarized neural network inference engine","authors":"Manuele Rusci, D. Rossi, E. Flamand, M. Gottardi, Elisabetta Farella, L. Benini","doi":"10.1145/3203217.3204463","DOIUrl":"https://doi.org/10.1145/3203217.3204463","url":null,"abstract":"This work introduces an ultra-low-power visual sensor node coupling event-based binary acquisition with Binarized Neural Networks (BNNs) to deal with the stringent power requirements of always-on vision systems for IoT applications. By exploiting in-sensor mixed-signal processing, an ultra-low-power imager generates a sparse visual signal of binary spatial-gradient features. The sensor output, packed as a stream of events corresponding to the asserted gradient binary values, is transferred to a 4-core processor when the amount of data detected after frame differencing surpasses a given threshold. Then, a BNN trained with binary gradients as input runs on the parallel processor if meaningful activity is detected in a pre-processing stage. During the BNN computation, the proposed Event-based Binarized Neural Network model achieves a system energy saving of 17.8% with respect to a baseline system including a low-power RGB imager and a Binarized Neural Network, while paying a classification performance drop of only 3% for a real-life 3-class classification scenario. The energy reduction increases up to 8x when considering a long-term always-on monitoring scenario, thanks to the event-driven behavior of the processing sub-system.","PeriodicalId":127096,"journal":{"name":"Proceedings of the 15th ACM International Conference on Computing Frontiers","volume":"91 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128851670","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A methodology for efficient code optimizations and memory management","authors":"Vasilios I. Kelefouras, K. Djemame","doi":"10.1145/3203217.3203274","DOIUrl":"https://doi.org/10.1145/3203217.3203274","url":null,"abstract":"The key to optimizing software is the correct choice, order, and parameters of optimizations-transformations, which has remained an open problem in compilation research for decades for various reasons. First, most of the compilation subproblems-transformations are interdependent and thus addressing them separately is not effective. Second, it is very hard to couple the transformation parameters to the processor architecture (e.g., cache size and associativity) and algorithm characteristics (e.g., data reuse); therefore compiler designers and researchers either do not take them into account at all or do so only partly. Third, the search space (all different transformation parameters) is very large and thus searching is impractical. In this paper, the above problems are addressed for data dominant affine loop kernels, delivering significant contributions. A novel methodology is presented that takes as input the underlying architecture details and algorithm characteristics and outputs the near-optimum parameters of six code optimizations in terms of either L1, L2, and DDR accesses, execution time, or energy consumption. The proposed methodology has been evaluated on both embedded and general purpose processors and for 6 well known algorithms, achieving high speedup as well as energy consumption gain values over the gcc compiler, hand written optimized code, and Polly.","PeriodicalId":127096,"journal":{"name":"Proceedings of the 15th ACM International Conference on Computing Frontiers","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129998856","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Machine learning application for patients activity recognition with pressure sensing in bed","authors":"Shengwei Luo, Chunhui Zhao, Limin Lu, Yongji Fu","doi":"10.1145/3203217.3203229","DOIUrl":"https://doi.org/10.1145/3203217.3203229","url":null,"abstract":"Recognizing patient activity in bed is very valuable for clinicians to understand a patient's disease and drive clinical decisions. This paper proposes a recognition method based on a CNN (Convolutional Neural Network) to identify the actions of bedridden patients. The inputs are 4 time series signals acquired from pressure sensors on the bed. Through the CNN we obtain the corresponding membership of four pre-defined actions. A probability density analysis is made to set a judgment standard and ultimately recognize the action. The method has been tested with real human activity signals and the results are promising.","PeriodicalId":127096,"journal":{"name":"Proceedings of the 15th ACM International Conference on Computing Frontiers","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130175505","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A file system bypassing volatile main memory: towards a single-level persistent store","authors":"Deng Zhou, Wen Pan, T. Xie, Wei Wang","doi":"10.1145/3203217.3203277","DOIUrl":"https://doi.org/10.1145/3203217.3203277","url":null,"abstract":"Existing persistent memory (PM) based file systems rely on a DRAM and PM hybrid store. Although a hybrid store does boost system performance while avoiding some current PM limitations like limited endurance, we envision that with further advances, PM technologies could provide applications with a single-level persistent store in the not-so-distant future. As a first step to explore this direction, in this paper we design, implement, and evaluate a new persistent memory file system called SPFS (Single-level Persistent File System), which completely bypasses conventional DRAM-based volatile main memory. Unlike all existing PM-based file systems, SPFS never leverages DRAM to manage its metadata. Thus, redundant copies of metadata in volatile main memory (e.g., a copy of an inode in DRAM) and data movements between the two memories (e.g., copying an inode from PM to DRAM) can be totally eliminated. The goal of this paper is to explore how to manage files and their metadata with guaranteed data consistency on PM without the support of DRAM, which is a first step towards the ultimate success of a single-level persistent store. Our experimental results demonstrate that SPFS outperforms the traditional DRAM-based in-memory file systems ramfs and tmpfs in most cases. Besides, its performance is only moderately worse than that of NOVA, a state-of-the-art PM-based file system.","PeriodicalId":127096,"journal":{"name":"Proceedings of the 15th ACM International Conference on Computing Frontiers","volume":"92 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124635884","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving public datasets markings' quality using unsupervised refinement kernels","authors":"A. Petre, Cosmin Toca, C. Patrascu, M. Ciuc","doi":"10.1145/3203217.3205862","DOIUrl":"https://doi.org/10.1145/3203217.3205862","url":null,"abstract":"In recent years there has been an exponential growth in developing machine learning algorithms, focused on applications ranging from scene understanding, to the more standard object recognition and classification tasks. Although multiple approaches have been proposed for solving these issues, a common prerequisite is the existence of large datasets, which can be used both for training and testing purposes. We propose a semi-automatic annotation framework for object instances, which addresses the problems related to the big data paradigm in the context of object detection and pixel-level segmentation. The designed marking and learning workflow aims to be a cyclical process allowing iterative improvements of the marking architecture. Results of this processing chain are empirically validated on the COCO database.","PeriodicalId":127096,"journal":{"name":"Proceedings of the 15th ACM International Conference on Computing Frontiers","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125770722","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
C. Silvano, G. Palermo, G. Agosta, Amir H. Ashouri, D. Gadioli, Stefano Cherubin, E. Vitali, L. Benini, Andrea Bartolini, Daniel Cesarini, João MP Cardoso, João Bispo, Pedro Pinto, Ricardo Nobre, Erven Rohou, L. Besnard, Imane Lasri, N. Sanna, C. Cavazzoni, R. Cmar, J. Martinovič, K. Slaninová, Martin Golasowski, A. Beccari, C. Manelfi
{"title":"Autotuning and adaptivity in energy efficient HPC systems: the ANTAREX toolbox","authors":"C. Silvano, G. Palermo, G. Agosta, Amir H. Ashouri, D. Gadioli, Stefano Cherubin, E. Vitali, L. Benini, Andrea Bartolini, Daniel Cesarini, João MP Cardoso, João Bispo, Pedro Pinto, Ricardo Nobre, Erven Rohou, L. Besnard, Imane Lasri, N. Sanna, C. Cavazzoni, R. Cmar, J. Martinovič, K. Slaninová, Martin Golasowski, A. Beccari, C. Manelfi","doi":"10.1145/3203217.3205338","DOIUrl":"https://doi.org/10.1145/3203217.3205338","url":null,"abstract":"Designing and optimizing applications for energy-efficient High Performance Computing systems up to the Exascale era is an extremely challenging problem. This paper presents the toolbox developed in the ANTAREX European project for autotuning and adaptivity in energy efficient HPC systems. In particular, the modules of the ANTAREX toolbox are described as well as some preliminary results of the application to two target use cases.","PeriodicalId":127096,"journal":{"name":"Proceedings of the 15th ACM International Conference on Computing Frontiers","volume":"188 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127320515","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluating the impact of pushing voice-driven interaction pipelines to the edge","authors":"S. Sridhar, Matthew E. Tolentino","doi":"10.1145/3203217.3203242","DOIUrl":"https://doi.org/10.1145/3203217.3203242","url":null,"abstract":"With the releases of Alexa Voice Services and Google Home, voice-driven interactive computing has quickly become commonplace. Voice interactive applications incorporate multiple components including complex speech recognition and translation algorithms, natural language understanding and generation capabilities, as well as custom compute functions commonly referred to as skills. Voice-driven interactive systems are composed of software pipelines using these components. These pipelines are typically resource intensive and must be executed quickly to maintain dialogue-consistent latency; consequently, voice interaction pipelines are usually computed in the cloud. However, in many cases cloud connectivity may not be practical, which requires these voice interactive pipelines to be executed at the edge. In this paper, we evaluate the feasibility of pushing voice interaction pipelines to resource constrained edge devices. Driven by the goal of enabling voice-driven interfaces for first responders during emergencies when connectivity to the cloud is impractical, we characterize the end-to-end performance of a complete open source voice interaction pipeline for four different configurations ranging from entirely cloud-based to completely edge-based. We then identify and evaluate several optimizations, such as caching and customized acoustic models, that enable voice-driven interaction pipelines to be fully executed on computationally weak edge devices at lower response latencies than when using high-performance cloud resources.","PeriodicalId":127096,"journal":{"name":"Proceedings of the 15th ACM International Conference on Computing Frontiers","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133475443","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
P. Zhang, T. Tang, Jianbin Fang, Chun Huang, Canqun Yang, Zheng Wang
{"title":"MOCL: an efficient openCL implementation for the matrix-2000 architecture","authors":"P. Zhang, T. Tang, Jianbin Fang, Chun Huang, Canqun Yang, Zheng Wang","doi":"10.1145/3203217.3203244","DOIUrl":"https://doi.org/10.1145/3203217.3203244","url":null,"abstract":"This paper presents the design and implementation of an Open Computing Language (OpenCL) framework for the Matrix-2000 many-core architecture. This architecture is designed to replace the Intel Xeon Phi accelerators of the TianHe-2 supercomputer. We share our experience and insights on how to design an effective OpenCL system for this new hardware accelerator. We propose a set of new analyses and optimizations to unlock the potential of the hardware. We extensively evaluate our approach using a wide range of OpenCL benchmarks on single and multiple computing nodes. We present our design choices and provide guidance on how to optimize code on the new Matrix-2000 architecture.","PeriodicalId":127096,"journal":{"name":"Proceedings of the 15th ACM International Conference on Computing Frontiers","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133942873","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
C. Aprile, J. Wüthrich, Luca Baldassarre, Y. Leblebici, V. Cevher
{"title":"An area and power efficient on-the-fly LBCS transformation for implantable neuronal signal acquisition systems","authors":"C. Aprile, J. Wüthrich, Luca Baldassarre, Y. Leblebici, V. Cevher","doi":"10.1145/3203217.3203260","DOIUrl":"https://doi.org/10.1145/3203217.3203260","url":null,"abstract":"A power and area efficient hardware encoding system tailored for wireless implantable applications is presented. Constant medical monitoring allowed by implantable devices is the most relevant alternative to current bulky monitoring systems, which, in case of severe mental diseases, require heavy surgery and long term hospitalization periods. In this work, the circuit design and the signal processing algorithm dovetail in order to allow real-time neuronal signal monitoring. Two main requirements must be met on the circuit level to facilitate the acceptance of the implant by the human body: small area and low power consumption. The presented work proposes a new compression scheme based on the Learning-Based Compressive Subsampling approach, which allows an area reduction with respect to recently published works, while allowing high signal reconstruction quality within low power requirements. The proposed method implements on-the-fly generation of compression coefficients, which does not require large static memories. This new fully digital architecture handles the data compression of each individual neuronal acquisition channel with an area of 200 × 190 μm in 0.18 μm CMOS technology, and a power dissipation of only 1.15 μW.","PeriodicalId":127096,"journal":{"name":"Proceedings of the 15th ACM International Conference on Computing Frontiers","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116641825","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}