{"title":"A Directive Generation Approach Using User-Defined Rules","authors":"K. Komatsu, Ryusuke Egawa, H. Takizawa, Hiroaki Kobayashi","doi":"10.1109/CANDAR.2016.0095","DOIUrl":"https://doi.org/10.1109/CANDAR.2016.0095","url":null,"abstract":"The appearance of various high-performance computing (HPC) systems compels a user to write a code considering the characteristic of each HPC system. To describe the system-dependent information without drastic code modifications, the directive sets such as the OpenMP directive set and the OpenACC directive set are useful. However, a code becomes complex to achieve high performance on various HPC systems because different directive sets are required for each HPC system. Thus, the code maintainability and readability are degraded. This paper proposes a directive generation approach that generates various kinds of directive sets using user-defined rules. Instead of several kinds of directive sets, a user writes a special placeholder that is utilized to specify a unique code pattern where several directives are inserted. Then, the special placeholder triggers generation of appropriate directives for each system using a user-defined rule with a code translation framework Xevolver. Because only special placeholders are inserted in a code, the proposed approach can keep the code maintainability and readability. From the demonstration of translation into three kinds of directive-based implementations, it is clarified that the proposed approach can replace directives into a smaller number of special placeholders. Moreover, it is clarified that the proposed approach can realize high performance portability by generating appropriate directives for each HPC system.","PeriodicalId":322499,"journal":{"name":"2016 Fourth International Symposium on Computing and Networking (CANDAR)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131079146","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Memory-Access-Efficient Implementation of the Approximate String Matching Algorithm on GPU","authors":"L. S. N. Nunes, J. Bordim, K. Nakano, Yasuaki Ito","doi":"10.1109/CANDAR.2016.0090","DOIUrl":"https://doi.org/10.1109/CANDAR.2016.0090","url":null,"abstract":"The task of finding strings having a partial match to a given pattern is of interest to a number of practical applications, including DNA sequencing and text searching. Owing to its importance, alternatives to accelerate the Approximate String Matching (ASM) have been widely investigated in the literature. The main contribution of this work is to present a memory-access-efficient implementation for computing the ASM on a GPU. The key idea of our implementation relies on warp shuffle operations, which are used to reduce the communication overhead between threads. Experimental results, carried out on a GeForce GTX 960 GPU, show that the proposed implementation provides acceleration between 1.31 and 1.84 times when compared to another noteworthy alternative.","PeriodicalId":322499,"journal":{"name":"2016 Fourth International Symposium on Computing and Networking (CANDAR)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130687734","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluation of Task Mapping on Multicore Neural Network Accelerators","authors":"S. Shindo, Momoka Ohba, Tomoaki Tsumura, Shinobu Miwa","doi":"10.1109/CANDAR.2016.0078","DOIUrl":"https://doi.org/10.1109/CANDAR.2016.0078","url":null,"abstract":"Deep neural networks are widely used for many applications such as image classification, speech recognition and natural language processing because of their high recognition rate. Since general-purpose processors such as CPUs and GPUs are not energy efficient for such neural networks, application specific hardware accelerators for neural networks (a.k.a. neural network accelerators or NNAs) have been proposed to improve the energy efficiency. There are many studies to increase the energy efficiency of NNAs, but few studies focus on task allocation on the accelerators. This paper provides the first exploration of task mapping to cores within NNAs for the increased performance. Intuitively, a well-tuned task mapping has less amount of communication between cores. To confirm this assumption, we tested two types of task mappings that generate different amount of communication between cores on an NNA. Our experimental results show that the number of communication between cores strongly affects the execution cycle of the NNA and the most effective task mapping differs depending on the size of neural networks.","PeriodicalId":322499,"journal":{"name":"2016 Fourth International Symposium on Computing and Networking (CANDAR)","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129287622","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Fast Hybrid Approach for Stream Compaction on GPUs","authors":"V. Rego, Janche Sang, Chansu Yu","doi":"10.1109/CANDAR.2016.0089","DOIUrl":"https://doi.org/10.1109/CANDAR.2016.0089","url":null,"abstract":"Stream compaction, also known as stream filtering or selection, produces a smaller output array which contains the indices of the only selected elements from the input array for further processing. With the tremendous amount of data elements to be filtered, the performance of selection is of great concern. Recently, modern Graphics Processing Units (GPUs) have been increasingly used to accelerate the execution of massively large, data parallel applications. In this paper, we proposed a hybrid implementation method for stream compaction on GPUs by using both parallel prefix-sum and atomics approaches. We compared its performance with different parallel selection algorithms on the current generation of NVIDIA GPUs. The experimental results show that our method can be more than 120 times faster than the sequential selection on CPU. Furthermore, the hybrid method performs the best among all existing selection algorithms on GPU and can be 5.6 times faster than Thrust, an open-source parallel algorithms library.","PeriodicalId":322499,"journal":{"name":"2016 Fourth International Symposium on Computing and Networking (CANDAR)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126072204","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Computation Based on Signal Random Fluctuation in Asynchronous Cellular Automata","authors":"Wen-Li Xu, Jia Lee","doi":"10.1109/CANDAR.2016.0051","DOIUrl":"https://doi.org/10.1109/CANDAR.2016.0051","url":null,"abstract":"A Brownian cellular automaton (BCA) is an asynchronous cellular automaton (ACA) in which the local configurations representing signals may fluctuate randomly in the cell space. The random fluctuation of signals enables effective stochastic search to conduct computation, which can actually result in the decreased complexity of BCAs. This paper proposes a novel BCA based on a conventional ACA, which employs a smaller number of rules as compared to the previous model. This BCA is capable of universal computing which might demonstrate the crucial role of signal random fluctuation for reducing the complexity of universal ACAs.","PeriodicalId":322499,"journal":{"name":"2016 Fourth International Symposium on Computing and Networking (CANDAR)","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122801070","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Password-Protected Secret Sharing Based on Kurosawa-Desmedt Hybrid Encryption","authors":"T. Arai, Satoshi Obana","doi":"10.1109/CANDAR.2016.0108","DOIUrl":"https://doi.org/10.1109/CANDAR.2016.0108","url":null,"abstract":"Needs for secret sharing scheme is increasing as demands for cloud services grow. However, secret sharing scheme possesses a drawback in that unauthorized users who can access storages storing partial information can reconstruct a secret. Password-Protected Secret Sharing (PPSS) was proposed in order to resolve such a drawback. PPSS is a secret sharing scheme that ensures only the owner of the secret who knows correct password to get the original secret by applying password authentication to partial information. The first PPSS was proposed by Bagherzandi et al. in 2011. When a secret is large, their scheme encrypts the secret with symmetric key encryption (SKE) and then encrypts the symmetric key with CPA secure public key encryption (PKE). Because of such combination, it seems difficult to prove strong security (i.e., CCA security) of their scheme at least in the standard model. In this paper, we propose a new PPSS model and scheme which does not use a simple combination of SKE and CPA secure PKE but use Kurosawa-Desmedt hybrid encryption, that is proven to be CCA secure in the standard model. Proposed PPSS is constructed by combining public key part of Kurosawa-Desmedt hybrid encryption with password authentication. Our scheme is expected to be more secure than that of Bagherzandi et al.","PeriodicalId":322499,"journal":{"name":"2016 Fourth International Symposium on Computing and Networking (CANDAR)","volume":"3 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131437757","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A User-Defined Code Transformation Approach to Overlapping MPI Communication with Computation","authors":"Yasuharu Hayashi, H. Takizawa, Hiroaki Kobayashi","doi":"10.1109/CANDAR.2016.0094","DOIUrl":"https://doi.org/10.1109/CANDAR.2016.0094","url":null,"abstract":"The Xevolver framework has been developed to enable application programmers to define their own code translation rules outside of their codes so that they can express platform-specific optimizations separately from algorithm-level application codes. Due to the diversity of HPC node architectures, the Xevolver framework has so far mainly been used to separate node-level code optimizations from application codes. However, user-defined code transformation rules are also potentially useful for optimizing MPI applications without messing up their codes. Therefore, this paper shows a case study of using the Xevolver framework to optimize MPI applications through customizable code transformations without loss of high performance portability, and discusses the benefits of the framework.","PeriodicalId":322499,"journal":{"name":"2016 Fourth International Symposium on Computing and Networking (CANDAR)","volume":"112 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134294112","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Introducing PSO for Optimal Packet Scheduling of Collective Communication","authors":"T. Yokota, K. Ootsu, Takeshi Ohkawa","doi":"10.1109/CANDAR.2016.0080","DOIUrl":"https://doi.org/10.1109/CANDAR.2016.0080","url":null,"abstract":"Interconnection network is an inevitable component that is responsible to the system's communication capability. It affects the system-level performance as well as the physical and logical structure of the parallel system. Many studies are reported to enhance the interconnection network technology, however, we have to further discuss remaining issues for building large-scale systems. One of the most important issues is congestion management. In an interconnection network, packets are transferred simultaneously, and the packets interfere to each other on the network. Congestion arises as a result of the interference among packets. Its fast spreading speed degrades communication performance drastically and it continues for long time. Thus, we should appropriately control the network to suppress the congested situation for maintaining the maximum performance. Many studies address the problem and present effective methods, however, the maximal performance in an ideal situation is not sufficiently clarified. Solving the ideal performance is, in general, an NP-hard problem. This paper introduces particle swarm optimization (PSO) method to overcome the problem. In this paper, we first formalize the optimization problem suitable for the PSO method and present three PSO methods for avoiding local minima. We furthermore introduce some non-PSO methods for comparison. Our preliminary evaluation results reveal high potentials of the PSO method.","PeriodicalId":322499,"journal":{"name":"2016 Fourth International Symposium on Computing and Networking (CANDAR)","volume":"185 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115901041","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Self-Optimizing Routing Algorithm in a 3-Dimensional Virtual Grid Network","authors":"Yonghwan Kim, Y. Katayama","doi":"10.1109/CANDAR.2016.0020","DOIUrl":"https://doi.org/10.1109/CANDAR.2016.0020","url":null,"abstract":"In this paper, we present a self-optimizing routing algorithm using local information only, in a three-dimensional (3D) virtual grid network. A virtual grid network is a well-known network model for its ease of designing algorithms and saving energy consumption. We consider a 3D virtual grid network which is obtained by virtually dividing a network into a set of unit cubes called cells. There is one specific node named a router at each cell, and each router is connected with the routers at adjacent cells. This implies that each router can communicate with 6 routers. We suppose one special node (named a source node) and one moving node (named a destination node) in a 3D virtual grid networks. We consider maintenance of an inter-cell communication path to a destination node from a source node. We propose an optimizing protocol in a 3D virtual grid network, which can transform an arbitrary given path (from a source node to a destination node) to the optimal (shortest) path using only local information (6 hops: 3 hops each back and forward along the routing path) of each router.","PeriodicalId":322499,"journal":{"name":"2016 Fourth International Symposium on Computing and Networking (CANDAR)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115152413","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Hotel Recommendation System Based on Reviews: What Do You Attach Importance To?","authors":"Koji Takuma, Junya Yamamoto, S. Kamei, S. Fujita","doi":"10.1109/CANDAR.2016.0129","DOIUrl":"https://doi.org/10.1109/CANDAR.2016.0129","url":null,"abstract":"In looking for a hotel, it is common to access a list of hotels matching a query which is arranged in a descending order of the average evaluation value. Since such a list does not reflect the preference of users, for many inexperienced users, it takes too long to determine a hotel. In this paper, we focus on the evaluation values given by contributors whose preferences are similar to the user's preference. Such evaluation values may be more highly credible for the user. We propose a method to extract the preference of review contributors from a collection of reviews. The extracted preferences are used for the hotel recommendation in such a way that the evaluation value given by a contributor whose preference is similar to the user's is given a larger weight. The result of questionnaire-based evaluations indicates that our proposed method can recommend hotels that match the user preference.","PeriodicalId":322499,"journal":{"name":"2016 Fourth International Symposium on Computing and Networking (CANDAR)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123096444","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}