Mahyar Shahsavari, Pierre Boulet, A. Shahbahrami, S. Hamdioui
{"title":"Impact of increasing number of neurons on performance of neuromorphic architecture","authors":"Mahyar Shahsavari, Pierre Boulet, A. Shahbahrami, S. Hamdioui","doi":"10.1109/CADS.2017.8310732","DOIUrl":"https://doi.org/10.1109/CADS.2017.8310732","url":null,"abstract":"Pattern recognition classifies input data into different classes based on extracted key features. Increasing the recognition rate of pattern recognition applications is a challenging task. A spiking neural network, inspired by the physiological architecture of the brain, can be implemented in neuromorphic hardware as a network of neurons. A sample neuromorphic architecture has two layers of neurons, input and output. The number of input neurons is fixed by the input data patterns, while the number of output neurons can vary. The goal of this paper is to evaluate the performance of a neuromorphic architecture in terms of recognition rate using different numbers of output neurons. For this purpose, the N2S3 simulation environment and the MNIST handwritten digits are used. Our simulation results show that the recognition rates for 20, 30, 50, 100, 200, and 300 output neurons are 70%, 74%, 79%, 85%, 89%, and 91%, respectively.","PeriodicalId":321346,"journal":{"name":"2017 19th International Symposium on Computer Architecture and Digital Systems (CADS)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115031381","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bamshad: A JIT compiler for running Java stream APIs on heterogeneous environments","authors":"Bahram Yarahmadi, F. Khunjush","doi":"10.1109/CADS.2017.8310734","DOIUrl":"https://doi.org/10.1109/CADS.2017.8310734","url":null,"abstract":"Nowadays, Graphics Processing Units (GPUs) and other types of emerging accelerators have an important role in high-performance computing. These devices can be leveraged in a wide range of applications through using appropriate programming environments such as CUDA and OpenCL which lead to reaching high-performance applications. However, on one hand, programming GPUs is a painful and error-prone task and requires a great amount of expertise especially in low-level architectural features as well as their memory management in order to achieve reasonable performance. On the other hand, enabling running high-level programming languages such as Java with massive computational power of today's GPUs can lessen the burden of this complexity. Considering new features in Java 8 such as lambda functions which are used in Java parallel streams, supporting these new features is vital to use these devices in real applications. In this paper, we introduce a just-in-time compiler, named Bamshad, which ports lambda functions used in Java parallel streams to GPU at runtime. For this, a series of compiler techniques are adopted to transparently eliminate unnecessary data communication between CPUs and GPUs. With our approach, a programmer is not involved in the detailed process of tuning a GPU device for reducing the amount of communication. The experimental results show that the proposed JIT compiler yields 13× improvement in comparison to sequential Java execution for all benchmarks. Also, in comparison to parallel Java, our work yields 3.9× improvement.","PeriodicalId":321346,"journal":{"name":"2017 19th International Symposium on Computer Architecture and Digital Systems (CADS)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126944967","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PAMS: A new position-aware multi-sensor dataset for human activity recognition using smartphones","authors":"Pegah Esfahani, H. Malazi","doi":"10.1109/CADS.2017.8310680","DOIUrl":"https://doi.org/10.1109/CADS.2017.8310680","url":null,"abstract":"Nowadays smartphones are ubiquitous in various aspects of our lives. The processing power, communication bandwidth, and the memory capacity of these devices have surged considerably in recent years. Besides, the variety of sensor types, such as accelerometer, gyroscope, humidity sensor, and bio-sensors, which are embedded in these devices, opens a new horizon in self-monitoring of physical daily activities. One of the primary steps for any research in the area of detecting daily life activities is to test a detection method on benchmark datasets. Most of the early datasets limited their work to collecting only a single type of sensor data such as accelerometer data. Others do not consider the age, weight, and gender of the subjects who participated in collecting their activity data. Finally, part of the previous works collected data without considering the smartphone's position. In this paper, we introduce a new dataset, called Position-Aware Multi-Sensor (PAMS). The dataset contains both accelerometer and gyroscope data. The gyroscope data boosts the accuracy of activity recognition methods as well as enabling them to detect a wider range of activities. We also take the user information into account. Based on the biometric attributes of the participants, a separate learned model is generated to analyze their activities. We concentrate on several major activities, including sitting, standing, walking, running, ascending/descending stairs, and cycling. To evaluate the dataset, we use various classifiers, and the outputs are compared to the WISDM. The results show that using the aforementioned classifiers, the average precision for all activities is above 88.5%. Besides, we measure the CPU, memory, and bandwidth usage of the application collecting data on the smartphone.","PeriodicalId":321346,"journal":{"name":"2017 19th International Symposium on Computer Architecture and Digital Systems (CADS)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132261242","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mosabbah Mushir Ahmed, D. Hély, N. Barbot, R. Siragusa, E. Perret, M. Bernier, F. Garet
{"title":"Towards a robust and efficient EM based authentication of FPGA against counterfeiting and recycling","authors":"Mosabbah Mushir Ahmed, D. Hély, N. Barbot, R. Siragusa, E. Perret, M. Bernier, F. Garet","doi":"10.1109/CADS.2017.8310673","DOIUrl":"https://doi.org/10.1109/CADS.2017.8310673","url":null,"abstract":"Counterfeiting of integrated circuits (IC) has become a serious concern for the semiconductor industry. It is necessary to find a robust solution which is both efficient and low cost in terms of implementation in order to detect and avoid the counterfeiting of ICs. Also, the solution must be resistant against aging and other reliability effects. In this paper we propose a scheme that utilizes radiated Electromagnetic (EM) emission from the IC to create a fingerprint. Our proposed scheme exploits manufacturing based process variation (PV), which continues to dominate in the nanoscale technologies. We have deployed variability-aware circuit (VAC) design that generates radiated EM emission and performs realistic assessment of the PV effects. The generated EM response is subjected to different encoding metrics to quantize it as a fingerprint for the IC. The latter part of the paper validates that the fingerprint remains stable after aging of the IC. To validate our proposed scheme, measurements are carried out over several FPGA boards.","PeriodicalId":321346,"journal":{"name":"2017 19th International Symposium on Computer Architecture and Digital Systems (CADS)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114127615","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Amin Majd, M. Daneshtalab, J. Plosila, N. Khalilzad, Golnaz Sahebi, E. Troubitsyna
{"title":"NOMeS: Near-optimal metaheuristic scheduling for MPSoCs","authors":"Amin Majd, M. Daneshtalab, J. Plosila, N. Khalilzad, Golnaz Sahebi, E. Troubitsyna","doi":"10.1109/CADS.2017.8310723","DOIUrl":"https://doi.org/10.1109/CADS.2017.8310723","url":null,"abstract":"The task scheduling problem for Multiprocessor System-on-Chips (MPSoC), which plays a vital role in performance, is an NP-hard problem. Exploring the whole search space in order to find the optimal solution is not time efficient, thus metaheuristics are mostly used to find a near-optimal solution in a reasonable amount of time. We propose a novel metaheuristic method for near-optimal scheduling that can provide performance guarantees for multiple applications implemented on a shared platform. Applications are represented as directed acyclic task graphs (DAG) and are executed on an MPSoC platform with given communication costs. We introduce a novel multi-population method inspired by both genetic and imperialist competitive algorithms. It is specialized for the scheduling problem with the goal to improve the convergence policy and selection pressure. The potential of the approach is demonstrated by experiments using a Sobel filter, a SUSAN filter, RASTA-PLP and JPEG encoder as real-world case studies.","PeriodicalId":321346,"journal":{"name":"2017 19th International Symposium on Computer Architecture and Digital Systems (CADS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114207513","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evolutionary architecture design for approximate DCT","authors":"Abbas Azaraien, Babak Djalaei, M. Salehi","doi":"10.1109/CADS.2017.8310731","DOIUrl":"https://doi.org/10.1109/CADS.2017.8310731","url":null,"abstract":"Discrete Cosine Transform (DCT), which plays a major role in image and video compression, also plays a major role in power consumption. Approximate computing lets us trade precision for power savings in error-resilient applications such as multimedia. Therefore, DCT is a potential candidate for approximation. In this paper, we propose a method for evolutionary design of DCT architecture exploiting the inherent behavior of DCT. Unlike prior works on DCT approximation, which concentrated mostly on optimizing, replacing, or removing less effective building blocks of DCT, our proposed method uses an evolutionary method to find new structures for DCT. According to the results, the evolutionary method leads to architectures with less area and acceptable accuracy.","PeriodicalId":321346,"journal":{"name":"2017 19th International Symposium on Computer Architecture and Digital Systems (CADS)","volume":"81 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117258613","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"NOC characteristics of cloud applications","authors":"P. Lotfi-Kamran, M. Modarressi, H. Sarbazi-Azad","doi":"10.1109/CADS.2017.8310674","DOIUrl":"https://doi.org/10.1109/CADS.2017.8310674","url":null,"abstract":"Cloud applications have abundant request-level parallelism, and as a result, many-core server processors are good candidates for their execution. A key component in a many-core processor is the network-on-chip (NOC) that connects cores to cache banks and memory, and acts as the medium for delivering instructions and data to the cores. While cloud applications are an important class of massively-parallel workloads that benefit from many-core processors and networks-on-chip, there is no comprehensive study for the NOC requirements of these workloads. In this work, we use full-system simulation and a set of cloud applications to study the characteristics and requirements of these applications with respect to networks-on-chip. We find that NOC latency is the most important optimization criterion for these workloads. As NOC traffic of these workloads is relatively low and approximately follows uniform traffic, we find that knobs like routing algorithm and buffer size that mostly affect NOC bandwidth, beyond a certain point, have little impact on the performance of these workloads. On the other hand, techniques that reduce NOC latency directly improve the performance of cloud applications.","PeriodicalId":321346,"journal":{"name":"2017 19th International Symposium on Computer Architecture and Digital Systems (CADS)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132259438","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"UTIC: A toolchain for enumeration and selection of custom instructions","authors":"Mahdi Mohammadpour Fard, Mostafa Ersali Salehi Nasab","doi":"10.1109/CADS.2017.8310727","DOIUrl":"https://doi.org/10.1109/CADS.2017.8310727","url":null,"abstract":"In this paper, we present a free and open source toolchain that can be used for enumerating all custom instructions from C and C++ codes considering not only general constraints like circular dependencies but also parameterized architectural constraints such as number of read and write ports, area, and delay parameters and constraints. After enumerating all custom instructions, the toolchain can select the best subset of custom instructions for implementation.","PeriodicalId":321346,"journal":{"name":"2017 19th International Symposium on Computer Architecture and Digital Systems (CADS)","volume":"129 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132642323","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Golsa Ghasemi, Amir Mahdi Hosseini Monazzah, Hamed Farbeh
{"title":"RI-COTS: Trading performance for reliability improvements in commercial of the shelf systems","authors":"Golsa Ghasemi, Amir Mahdi Hosseini Monazzah, Hamed Farbeh","doi":"10.1109/CADS.2017.8310724","DOIUrl":"https://doi.org/10.1109/CADS.2017.8310724","url":null,"abstract":"The flexibility of software-based fault-tolerant approaches in providing the required level of reliability for Commercial-Off-The-Shelf (COTS) devices has made them the first choice in designing safety-critical systems. In this paper, we propose a reliability improvement method for COTS-based systems, called RI-COTS. The main idea behind RI-COTS is to establish a tradeoff between the reliability and performance of a COTS system by controlling redundant execution at the instruction level. RI-COTS is implemented on a VHDL model of the LEON2 processor. Our simulation results show that, compared with the most closely related studies, RI-COTS can improve the fault detection capability by 20% with only 4% performance overhead.","PeriodicalId":321346,"journal":{"name":"2017 19th International Symposium on Computer Architecture and Digital Systems (CADS)","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133077847","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient mapping of DNA logic circuits on parallelized digital microfluidic architecture","authors":"Z. Beiki, M. Taajobian, A. Jahanian","doi":"10.1109/CADS.2017.8310679","DOIUrl":"https://doi.org/10.1109/CADS.2017.8310679","url":null,"abstract":"DNA is known as the basic element for storing the life codes and transferring the genetic features through the generations. However, it is found that DNA molecules can be utilized for a new kind of computation that opens fascinating horizons in computation and medical sciences. Significant contributions have been made to the design of DNA-based logic gates for medical and computational applications. Microfluidic biochips are known as efficient platforms to implement DNA circuits, but current biochip architectures allow only sequential implementation of DNA modules, which increases the run time. In this paper, a new microfluidic biochip architecture and corresponding CAD flow are presented for parallel implementation of DNA circuits. In this flow, Verilog description files of the circuit are synthesized and converted into a bioassay file format. Then the assay files are implemented on a microfluidic biochip based on a parallel architecture, named the PBCM architecture. Experimental results show that the experimental time of assays and the pin number of biochips are reduced by 17% and 23%, respectively.","PeriodicalId":321346,"journal":{"name":"2017 19th International Symposium on Computer Architecture and Digital Systems (CADS)","volume":"91 21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128786128","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}