{"title":"Smart, Secure, Yet Energy-Efficient, Internet-of-Things Sensors","authors":"Ayten Ozge Akmandor;Hongxu YIN;Niraj K. Jha","doi":"10.1109/TMSCS.2018.2864297","DOIUrl":"https://doi.org/10.1109/TMSCS.2018.2864297","url":null,"abstract":"The proliferation of Internet-of-Things (IoT) has led to the generation of zettabytes of sensitive data each year. The generated data are usually raw, requiring cloud resources for processing and decision-making operations to extract valuable information (i.e., distill smartness). Use of cloud resources raises serious design issues: limited bandwidth, insufficient energy, and security concerns. Edge-side computing and cryptographic techniques have been proposed to get around these problems. However, as a result of increased computational load and energy consumption, it is difficult to simultaneously achieve smartness, security, and energy efficiency. We propose a novel way out of this predicament by employing signal compression and machine learning inference on the IoT sensor node. An important sensor operation scenario is for the sensor to transmit data to the base station immediately when an event of interest occurs, e.g., arrhythmia is detected by a smart electrocardiogram sensor or seizure is detected by a smart electroencephalogram sensor, and transmit data on a less urgent basis otherwise. Since on-sensor compression and inference drastically reduce the amount of data that need to be transmitted, we actually end up with a dramatic energy bonus relative to the traditional sense-and-transmit IoT sensor. We use a part of this energy bonus to carry out encryption and hashing to ensure data confidentiality and integrity. We analyze the effectiveness of this approach on six different IoT applications with two data transmission scenarios: alert notification and continuous notification. 
The experimental results indicate that relative to the traditional sense-and-transmit sensor, IoT sensor energy is reduced by 57.1× for electrocardiogram (ECG) sensor based arrhythmia detection, 379.8× for freezing of gait detection in the context of Parkinson's disease, 139.7× for electroencephalogram (EEG) sensor based seizure detection, 216.6× for human activity classification, 162.8× for neural prosthesis spike sorting, and 912.6× for chemical gas classification. Our approach not only enables the IoT system to push signal processing and decision-making to the extreme of the edge-side (i.e., the sensor node), but also solves data security and energy efficiency problems simultaneously.","PeriodicalId":100643,"journal":{"name":"IEEE Transactions on Multi-Scale Computing Systems","volume":"4 4","pages":"914-930"},"PeriodicalIF":0.0,"publicationDate":"2018-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TMSCS.2018.2864297","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68024189","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hardware Accelerated Mappers for Hadoop MapReduce Streaming","authors":"Katayoun Neshatpour;Maria Malik;Avesta Sasan;Setareh Rafatirad;Houman Homayoun","doi":"10.1109/TMSCS.2018.2854787","DOIUrl":"https://doi.org/10.1109/TMSCS.2018.2854787","url":null,"abstract":"Heterogeneous architectures have emerged as an effective solution to address energy-efficiency challenges. This is particularly true in data centers, where the integration of FPGA hardware accelerators with general purpose processors such as big Xeon or little Atom cores introduces enormous opportunities to address the power, scalability, and energy-efficiency challenges of processing emerging applications, in particular in the domain of big data. The rise of hardware accelerators in data centers therefore raises several important research questions: What is the potential for hardware acceleration in MapReduce, a de facto standard for big data analytics? What is the role of the processor after acceleration: is a big or a little core better suited to run big data applications post hardware acceleration? This paper answers these questions through methodical real-system experiments on state-of-the-art hardware acceleration platforms. We first present the implementation of four highly used big data applications in a heterogeneous CPU+FPGA architecture. We develop the MapReduce implementation of K-means, K nearest neighbor, support vector machine, and naive Bayes in a Hadoop Streaming environment that allows developing mapper functions in a non-Java based language suited for interfacing with an FPGA based hardware accelerating environment. We present a full implementation of the HW+SW mappers on an existing FPGA+core platform and evaluate how a cluster of CPUs equipped with FPGAs uses the accelerated mapper to enhance the overall performance of MapReduce. 
Moreover, we study how various parameters at the application, system, and architecture levels affect the performance and power-efficiency benefits of Hadoop streaming hardware acceleration. This analysis helps to better understand how the presence of HW accelerators for Hadoop MapReduce changes the choice of CPU, tuning optimization parameters, and scheduling decisions for performance and energy-efficiency improvement. The results show promising speedup and energy-efficiency gains of up to 5.7× and 16×, respectively, in an end-to-end Hadoop implementation using a semi-automated HLS framework. Results suggest that HW+SW acceleration yields significantly higher speedup on little cores, reducing the performance gap between little and big cores after the acceleration. On the other hand, the energy-efficiency benefit of HW+SW acceleration is higher on the big cores, which reduces the energy-efficiency gap between little and big cores. Overall, the experimental results show that a low cost embedded FPGA platform, programmed using a semi-automated HW+SW co-design methodology, brings significant performance and energy-efficiency gains for Hadoop MapReduce computing in cloud-based architectures and significantly reduces the reliance on a large number of big high-performance cores.","PeriodicalId":100643,"journal":{"name":"IEEE Transactions on Multi-Scale Computing Systems","volume":"4 4","pages":"734-748"},"PeriodicalIF":0.0,"publicationDate":"2018-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TMSCS.2018.2854787","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68023998","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Placement of Virtual Network Functions in Hybrid Data Center Networks","authors":"Zhenhua Li;Yuanyuan Yang","doi":"10.1109/TMSCS.2018.2848949","DOIUrl":"https://doi.org/10.1109/TMSCS.2018.2848949","url":null,"abstract":"Hybrid data center networks (HDCNs), where each ToR switch is installed with a directional antenna, emerge as a candidate helping alleviate the over-subscription problem in traditional data centers. Meanwhile, as virtualization techniques develop rapidly, there is a trend that traditional network functions that are implemented in hardware will also be virtualized into virtual machines. However, how to place virtual network functions (VNFs) into data centers to meet the customer requirements in a hybrid data center network environment is a challenging problem. In this paper, we study the VNF placement in hybrid data center networks, and provide a joint VNF placement and antenna scheduling model. We further simplify it to a mixed integer programming (MIP) problem. Due to the hardness of a MIP problem, we develop a heuristic algorithm to solve it, and also give an on-line algorithm to meet the requirements from real-time scenarios. To the best of our knowledge, this is the first work concerning VNF placement in the context of HDCNs. 
Our extensive simulations demonstrate the effectiveness of the proposed algorithms, which make them a promising solution for VNF placement in HDCN environment.","PeriodicalId":100643,"journal":{"name":"IEEE Transactions on Multi-Scale Computing Systems","volume":"4 4","pages":"861-873"},"PeriodicalIF":0.0,"publicationDate":"2018-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TMSCS.2018.2848949","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68023992","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Guest Editorial: Emerging Technologies and Architectures for Many core Computing Part 1: Hardware Techniques","authors":"Sébastien Le Beux;Paul V. Gratz;Ian O'Connor","doi":"10.1109/TMSCS.2018.2826758","DOIUrl":"https://doi.org/10.1109/TMSCS.2018.2826758","url":null,"abstract":"The papers included in this special section focus on emerging technologies and architectures for manycore computing, with particular emphasis on hardware techniques. The pursuit of Moore’s Law is slowing and the exploration of alternative devices is underway to replace the CMOS transistor and traditional architectures at the heart of data processing. Moreover, the emergence of stringent application constraints, particularly those linked to energy consumption, requires new system architectural strategies (e.g., manycore) and real-time operational adaptability approaches. Such complex systems require new and powerful design and programming methods to ensure optimal and reliable operation. Thus, this special issue aims at collating new research along all the dimensions of emerging technologies and architectures for computing in manycores.","PeriodicalId":100643,"journal":{"name":"IEEE Transactions on Multi-Scale Computing Systems","volume":"4 2","pages":"97-98"},"PeriodicalIF":0.0,"publicationDate":"2018-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TMSCS.2018.2826758","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"67858194","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast TCAM-Based Multi-Match Packet Classification Using Discriminators","authors":"Hsin-Tsung Lin;Pi-Chung Wang","doi":"10.1109/TMSCS.2018.2847677","DOIUrl":"https://doi.org/10.1109/TMSCS.2018.2847677","url":null,"abstract":"Ternary content addressable memory (TCAM) is a widely used technology for network devices to perform packet classification. TCAM compares a search key with all ternary entries in parallel to yield the first matching entry. To generate all matching entries, either a storage or a speed penalty is inevitable. Because of the inherent disadvantages of TCAM, including high power consumption and limited capacity, the feasibility of TCAM-based multi-match packet classification (TMPC) is debatable. Discriminators appended to each TCAM entry have been used to avoid the storage penalty for TMPC. We are motivated to minimize the speed penalty for TMPC with discriminators. In this paper, a novel scheme, which utilizes unused TCAM entries to accelerate the search performance, is presented. It selectively generates TCAM entries to merge overlapping match conditions so that the number of accessed TCAM entries can be significantly reduced. By limiting the number of generated TCAM entries, the storage penalty is minimized since our scheme does not need extra TCAM chips. We further present several refinements to the search procedure. The experimental results show that our scheme can drastically improve the search performance with 10-20 percent extra TCAM entries. 
As a result, the power consumption, which correlates to the number of accessed TCAM entries per classification, can be reduced.","PeriodicalId":100643,"journal":{"name":"IEEE Transactions on Multi-Scale Computing Systems","volume":"4 4","pages":"686-697"},"PeriodicalIF":0.0,"publicationDate":"2018-06-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TMSCS.2018.2847677","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68024196","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LIBRA: Thermal and Process Variation Aware Reliability Management in Photonic Networks-on-Chip","authors":"sai vineel reddy chittamuru;Ishan G. Thakkar;Sudeep Pasricha","doi":"10.1109/TMSCS.2018.2846274","DOIUrl":"https://doi.org/10.1109/TMSCS.2018.2846274","url":null,"abstract":"Silicon nanophotonics technology is being considered for future networks-on-chip (NoCs) as it can enable high bandwidth density and lower latency with traversal of data at the speed of light. However, the operation of photonic NoCs (PNoCs) is very sensitive to on-chip temperature and process variations. These variations can create significant reliability issues for PNoCs. For example, a microring resonator (MR) may resonate at another wavelength instead of its designated wavelength due to thermal and/or process variations, which can lead to bandwidth wastage and data corruption in PNoCs. This paper proposes a novel run-time framework called LIBRA to overcome temperature- and process variation-induced reliability issues in PNoCs. The framework consists of (i) a device-level reactive MR assignment mechanism that dynamically assigns a group of MRs to reliably modulate/receive data in a waveguide based on the chip thermal and process variation characteristics; and (ii) a system-level proactive thread migration technique to avoid on-chip thermal threshold violations and reduce MR tuning/trimming power by dynamically migrating threads between cores. 
Our simulation results indicate that LIBRA can reliably satisfy on-chip thermal thresholds and maintain high network bandwidth while reducing total power by up to 61.3 percent, and thermal tuning/trimming power by up to 76.2 percent over state-of-the-art thermal and process variation aware solutions.","PeriodicalId":100643,"journal":{"name":"IEEE Transactions on Multi-Scale Computing Systems","volume":"4 4","pages":"758-772"},"PeriodicalIF":0.0,"publicationDate":"2018-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TMSCS.2018.2846274","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68023999","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DLoBD: A Comprehensive Study of Deep Learning over Big Data Stacks on HPC Clusters","authors":"Xiaoyi Lu;Haiyang Shi;Rajarshi Biswas;M. Haseeb Javed;Dhabaleswar K. Panda","doi":"10.1109/TMSCS.2018.2845886","DOIUrl":"https://doi.org/10.1109/TMSCS.2018.2845886","url":null,"abstract":"Deep Learning over Big Data (DLoBD) is an emerging paradigm to mine value from the massive amount of gathered data. Many Deep Learning frameworks, like Caffe, TensorFlow, etc., have started running over Big Data stacks, such as Apache Hadoop and Spark. Even though a lot of activity is happening in the field, there is a lack of comprehensive studies analyzing the impact of RDMA-capable networks and CPUs/GPUs on DLoBD stacks. To fill this gap, we propose a systematic characterization methodology and conduct extensive performance evaluations on four representative DLoBD stacks (i.e., CaffeOnSpark, TensorFlowOnSpark, MMLSpark/CNTKOnSpark, and BigDL) to expose the interesting trends regarding performance, scalability, accuracy, and resource utilization. Our observations show that an RDMA-based design for DLoBD stacks can achieve up to 2.7x speedup compared to the IPoIB-based scheme. The RDMA scheme also scales better and utilizes resources more efficiently than IPoIB. For most cases, GPU-based schemes can outperform CPU-based designs, but we see that for LeNet on MNIST, CPU + MKL can achieve better performance than GPU and GPU + cuDNN on 16 nodes. 
Through our evaluation and an in-depth analysis on TensorFlowOnSpark, we find that there is considerable room to improve the designs of current-generation DLoBD stacks.","PeriodicalId":100643,"journal":{"name":"IEEE Transactions on Multi-Scale Computing Systems","volume":"4 4","pages":"635-648"},"PeriodicalIF":0.0,"publicationDate":"2018-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TMSCS.2018.2845886","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"67861364","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Fast Hill Climbing Algorithm for Defect and Variation Tolerant Logic Mapping of Nano-Crossbar Arrays","authors":"Furkan Peker;Mustafa Altun","doi":"10.1109/TMSCS.2018.2829518","DOIUrl":"https://doi.org/10.1109/TMSCS.2018.2829518","url":null,"abstract":"Nano-crossbar arrays are area and power efficient structures, generally realized with self-assembly based bottom-up fabrication methods as opposed to relatively costly traditional top-down lithography techniques. This advantage comes with a price: very high process variations. In this work, we focus on the worst-case delay optimization problem in the presence of high process variations. As a variation tolerant logic mapping scheme, a fast hill climbing algorithm is proposed; it offers similar or better delay improvements with much smaller runtimes compared to the methods in the literature. Our algorithm first performs a reducing operation for the crossbar motivated by the fact that the whole crossbar is not necessarily needed for the problem. This significantly decreases the computational load up to 72 percent for benchmark functions. Next, initial column mapping is applied. After the first two steps that can be considered as preparatory, the algorithm proceeds to the last step of hill climbing row search with column reordering where optimization for variation tolerance is performed. As an extension to this work, we directly apply our hill climbing algorithm on defective arrays to perform both defect and variation tolerance. 
Again, simulation results confirm the speed of our algorithm, which is up to 600 times faster than the related algorithms in the literature without sacrificing defect and variation tolerance performance.","PeriodicalId":100643,"journal":{"name":"IEEE Transactions on Multi-Scale Computing Systems","volume":"4 4","pages":"522-532"},"PeriodicalIF":0.0,"publicationDate":"2018-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TMSCS.2018.2829518","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68023989","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Low-Power Multi-Sensor System with Power Management and Nonvolatile Memory Access Control for IoT Applications","authors":"Masanori Hayashikoshi;Hideyuki Noda;Hiroyuki Kawai;Yasumitsu Murai;Sugako Otani;Koji Nii;Yoshio Matsuda;Hiroyuki Kondo","doi":"10.1109/TMSCS.2018.2827388","DOIUrl":"https://doi.org/10.1109/TMSCS.2018.2827388","url":null,"abstract":"A low-power multi-sensor system with power management and nonvolatile memory access control for IoT applications is proposed, which achieves almost zero standby power in no-operation modes. A power management scheme with activity localization can reduce the number of transitions between power-on and power-off modes by rescheduling and bundling task procedures. In addition, autonomous standby mode transition control selects the optimum standby mode of the microcontrollers, reducing total power consumption. We demonstrate the system on an evaluation board as an IoT use case, observing a 91 percent power reduction from combining task scheduling with autonomous standby mode transition control. Furthermore, we propose a new nonvolatile memory access control technology and estimate its potential low-power benefit.","PeriodicalId":100643,"journal":{"name":"IEEE Transactions on Multi-Scale Computing Systems","volume":"4 4","pages":"784-792"},"PeriodicalIF":0.0,"publicationDate":"2018-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TMSCS.2018.2827388","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68024165","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive and Roll-Forward Error Recovery in MEDA Biochips Based on Droplet-Aliquot Operations and Predictive Analysis","authors":"Zhanwei Zhong;Zipeng Li;Krishnendu Chakrabarty","doi":"10.1109/TMSCS.2018.2827030","DOIUrl":"https://doi.org/10.1109/TMSCS.2018.2827030","url":null,"abstract":"Digital microfluidic biochips (DMFBs) are being increasingly used in biochemistry labs for automating bioassays. However, traditional DMFBs suffer from some key shortcomings: 1) inability to vary droplet volume in a flexible manner; 2) difficulty of integrating on-chip sensors; and 3) the need for special fabrication processes. To overcome these problems, DMFBs based on micro-electrode-dot-array (MEDA) have recently been proposed. However, errors are likely to occur on a MEDA DMFB due to chip defects and the unpredictability inherent to biochemical experiments. We present fine-grained error-recovery solutions for MEDA by exploiting real-time sensing and advanced MEDA-specific droplet operations. The proposed methods rely on adaptive droplet-aliquot operations and predictive analysis of mixing. In addition, a roll-forward error-recovery method is proposed to efficiently utilize the unused part of the biochip and reduce the time required for error recovery. 
Experimental results on three representative benchmarks demonstrate the efficiency of the proposed error-recovery strategy.","PeriodicalId":100643,"journal":{"name":"IEEE Transactions on Multi-Scale Computing Systems","volume":"4 4","pages":"577-592"},"PeriodicalIF":0.0,"publicationDate":"2018-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TMSCS.2018.2827030","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68025493","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}