{"title":"[Title page i]","authors":"","doi":"10.1109/candarw.2018.00001","DOIUrl":"https://doi.org/10.1109/candarw.2018.00001","url":null,"abstract":"","PeriodicalId":329439,"journal":{"name":"2018 Sixth International Symposium on Computing and Networking Workshops (CANDARW)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130851610","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Color-Based Cooperative Caching Strategy for Time-Shifted Live Video Streaming","authors":"Hiroki Okada, Takayuki Shiroma, Celimuge Wu, T. Yoshinaga","doi":"10.1109/CANDARW.2018.00030","DOIUrl":"https://doi.org/10.1109/CANDARW.2018.00030","url":null,"abstract":"This paper proposes an efficient in-network caching strategy to reduce traffic volume in pseudo-live video streaming networks. Pseudo-live streaming is a technique that records video data as fragmented files in a cache server and plays them back by continuously combining the fragments. The recorded video data can be treated as static video files. Therefore, in-network caching techniques can efficiently reduce network traffic by carefully managing the cache servers and the arrangement of important content. The proposed in-network caching strategy caches popular video chunks cooperatively among distributed cache servers while taking the freshness of the data into account. We extend a color-based cooperative cache algorithm, which was recently proposed for content delivery networks, to handle time-shifted video chunks effectively. The extended strategy determines an optimal cache placement before content delivery starts, based on the generation of the data and the multiple video quality structures of the real-time stream. In our experiment, traffic volume is calculated from the access probability and the number of communication hops, and a content arrangement is selected so that the total communication distance in the network is minimized. We conduct a network simulation on a three-layer hierarchical network with traffic patterns generated from content access probabilities that follow a gamma distribution. Simulation results show that the traffic volume is reduced by up to 50% and 40% compared with the conventional LRU and LFU methods, respectively.","PeriodicalId":329439,"journal":{"name":"2018 Sixth International Symposium on Computing and Networking Workshops (CANDARW)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115192580","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Pipeline Implementation for Dynamic Programming on GPU","authors":"Makoto Miyazaki, Susumu Matsumae","doi":"10.1109/CANDARW.2018.00063","DOIUrl":"https://doi.org/10.1109/CANDARW.2018.00063","url":null,"abstract":"In this paper, we show the effectiveness of a pipeline implementation of Dynamic Programming (DP) on a GPU. As an example, we parallelize a typical DP program in which each element of the solution table is calculated in order by semigroup computations among some already computed elements of the table. We implement the DP program on the GPU in a pipeline fashion, i.e., we use GPU cores to support pipeline stages so that many elements of the solution table are partially computed in parallel at one time. Our implementation can determine one output value per computational step, which is faster than the standard parallel implementation, whose strategy is to speed up each semigroup computation. We evaluate the performance of our implementation and verify its speedup.","PeriodicalId":329439,"journal":{"name":"2018 Sixth International Symposium on Computing and Networking Workshops (CANDARW)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115884286","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance Evaluation of Dynamic Cell Allocation Cache Using Cycle Accurate Simulator","authors":"Masato Kitou, Takahiro Sasaki, K. Ohno","doi":"10.1109/CANDARW.2018.00109","DOIUrl":"https://doi.org/10.1109/CANDARW.2018.00109","url":null,"abstract":"Multi-core processors are widely used to improve the performance of computer systems. To achieve both high performance and low power consumption, we have proposed the Cell-allocation cache. It allocates cache space in units called 'Cells', which are smaller than a way, and dynamically assigns them to cores. However, it has only been evaluated with trace-driven simulation, so evaluations of execution speed and energy consumption in realistic environments have not been presented. In this paper, we implement the Cell-allocation cache on the cycle-accurate simulator Gem5 and evaluate its performance in more detail. We also propose an evaluation methodology that mixes multiple benchmark programs to evaluate performance under unbalanced workloads.","PeriodicalId":329439,"journal":{"name":"2018 Sixth International Symposium on Computing and Networking Workshops (CANDARW)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125860910","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Load Balancing in P2P Video Streaming Systems with Service Differentiation","authors":"Yuta Yamada, S. Fujita","doi":"10.1109/CANDARW.2018.00105","DOIUrl":"https://doi.org/10.1109/CANDARW.2018.00105","url":null,"abstract":"This paper considers Peer-to-Peer (P2P) video streaming systems with service differentiation, in which the quality level of a video stream is differentiated according to the authority level of the subscribers. In such P2P video streaming systems, a heavy load is imposed on peers with a high authority level, since high-quality video generally consumes more upload capacity. To resolve this problem, we encode the video stream using layered encoding, organize an overlay for each authority level, and deliver the appropriate streams by using more of the capacity of peers with a lower authority level. However, in this method, a heavy load is imposed on peers with a low authority level, so the latency of forwarding the streams corresponding to a low authority level increases. In this paper, we propose a load balancing method for such P2P systems. In the proposed method, streams corresponding to a low authority level are forwarded by using more of the capacity of peers with a high authority level, so that more peers receive streams within fewer hops. The performance of the proposed scheme is evaluated by simulation. The simulation results show that the latency with which peers receive sub-streams is the smallest with the proposed method among all the methods compared.","PeriodicalId":329439,"journal":{"name":"2018 Sixth International Symposium on Computing and Networking Workshops (CANDARW)","volume":"80 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124863508","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AQSS: Accelerator of Quantization Neural Networks with Stochastic Approach","authors":"Takeo Ueki, Keisuke Iwai, T. Matsubara, T. Kurokawa","doi":"10.1109/CANDARW.2018.00033","DOIUrl":"https://doi.org/10.1109/CANDARW.2018.00033","url":null,"abstract":"In recent years, Deep Neural Networks (DNNs) have become widespread. Several high-throughput hardware implementations of DNNs have been proposed. One of the key points for hardware implementations of DNNs is to reduce their power consumption, because DNNs require a large number of product-sum operations. Previous papers presented accelerators that use logarithmic quantization to reduce power consumption by replacing multipliers with shifters. However, most of them are implemented only for inference. In this paper, an Accelerator of Quantization neural networkS with Stochastic approach (AQSS) is proposed. It uses a stochastic approach to logarithmic quantization, which enables DNNs to both infer and learn using logarithmic quantization. A prototype of AQSS is implemented on a field-programmable gate array (FPGA) (Intel Arria 10 GX 1150) and synthesized with Intel Quartus Prime 17.1 Standard Edition. As a result, it is confirmed to have 1.8 times the power efficiency of a GPU.","PeriodicalId":329439,"journal":{"name":"2018 Sixth International Symposium on Computing and Networking Workshops (CANDARW)","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128660393","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Mass Spring Model for String Simulation with Stress-Strain Handling","authors":"R. Durikovic, E. Siebenstich","doi":"10.1109/CANDARW.2018.00010","DOIUrl":"https://doi.org/10.1109/CANDARW.2018.00010","url":null,"abstract":"In this paper we present an approach for real-time, physically semi-realistic animation of strings that works directly on positions. The main advantage of position-based dynamics is its controllability, and the instability problems of explicit integration schemes can be avoided. Specifically, we offer the following three contributions. We introduce a non-elongating and non-stretchable mass spring dynamics model based on Position Based Dynamics to simulate a 1D string. We introduce a method for propagating the twisting angle increments associated with each segment. In addition, collision constraints can be handled easily: penetrations can be resolved by regularly spreading spheres along the segment, followed by projection of the particles to valid locations. The proposed strain-limiting constraint can handle strings fixed in multiple locations, in contrast to the single fixed end that is common in hair models. The use of constraints provides an efficient treatment of stiff twisting and of the non-stretchable mass spring dynamics model. We demonstrate that our method can produce visually plausible animations.","PeriodicalId":329439,"journal":{"name":"2018 Sixth International Symposium on Computing and Networking Workshops (CANDARW)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128988041","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Implementation of LLVM Pass for Loop Parallelization Based on IR-Level Directives","authors":"K. Jingu, Kohta Shigenobu, K. Ootsu, Takeshi Ohkawa, T. Yokota","doi":"10.1109/CANDARW.2018.00097","DOIUrl":"https://doi.org/10.1109/CANDARW.2018.00097","url":null,"abstract":"Currently, multicore processors are widely used, and processing performance can be improved on many machines by exploiting thread-level parallelism. However, parallelizing a program takes much time and effort to analyze the effect of parallel processing and to rewrite the source code, so sequential programs are still in widespread use and cannot fully bring out the performance of multicore processors. To improve the execution performance of existing sequential programs by effectively utilizing the processing power of multicore processors, it is quite useful to automatically and directly parallelize machine language code (binary code) by using binary translation. Based on this background, an automatic parallel processing system that parallelizes and optimizes sequential binary code using LLVM was proposed. In this paper, we introduce parallelization directives for LLVM IR (Intermediate Representation) and implement an LLVM compiler pass for parallel code generation based on the directives. Our research makes it possible to implement analysis and code generation as versatile programs, and to generate optimal code according to multiple analysis results. Evaluation results show that the implemented pass can generate parallelized IR code and can achieve a speedup as high as that of parallelization at the source code level.","PeriodicalId":329439,"journal":{"name":"2018 Sixth International Symposium on Computing and Networking Workshops (CANDARW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123693041","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Integrating TicToc with Parallel Logging","authors":"Yasuhiro Nakamura, H. Kawashima, O. Tatebe","doi":"10.1109/CANDARW.2018.00028","DOIUrl":"https://doi.org/10.1109/CANDARW.2018.00028","url":null,"abstract":"A transactional system consists of a concurrency control system and a recovery system. TicToc is one of today's state-of-the-art concurrency control protocols, but it lacks a recovery system. We studied ways to integrate TicToc with a recovery system. For efficiency, we adopted a parallel write-ahead logging (P-WAL) scheme for the recovery system. There are two methods to optimize the logging. The first method is early lock release, which releases locks on data objects early. The second method is group commit, which transfers logs from memory to storage in batches. We built a transactional system consisting of TicToc and a P-WAL logging system, assuming non-volatile memory. We found that the two optimization methods incur performance degradation when the storage access latency is equivalent to that of NVRAM.","PeriodicalId":329439,"journal":{"name":"2018 Sixth International Symposium on Computing and Networking Workshops (CANDARW)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130561400","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On Relationship between Timeout and Latency of Connection Re-establishment for Control Packet Loss Scenario in Bluetooth MANETs","authors":"Temma Ohtani, Eitaro Kohno, Y. Kakuda","doi":"10.1109/CANDARW.2018.00016","DOIUrl":"https://doi.org/10.1109/CANDARW.2018.00016","url":null,"abstract":"Bluetooth MANETs, which consist of Bluetooth-enabled terminals, are a prospective methodology for mobile ad hoc networks (MANETs). Since Bluetooth is a connection-oriented and a low-power-consumption communication method, terminals must execute time-consuming connection establishment procedures in advance. We have to solve the following two problems for Bluetooth MANETs: (1) since terminals move within fields, terminals must establish their connection within a limited time. In addition, (2) since the communication area of Bluetooth is narrower than that of other technologies, established connections are easily disrupted. In order to solve problem (1), a low-latency connection establishment method has been proposed. However, there is no countermeasure for problem (2). Therefore, the possibility of rapid re-establishment for connections must be investigated. In this paper, we have proposed a rapid connection re-establishment method for control packet-loss scenarios and have investigated the effect of adjusting the configuration of the connection-establishment timeout parameter. We also have developed a Raspberry Pi-based application for our proposed method and have conducted experiments. We confirmed that our proposed method could re-establish connections in control packet-loss scenarios as the connection-establishment timeout decreases. However, we found that the time it takes to re-establish a connection is significantly long if the connection-establishment timeout configuration is 1 second.","PeriodicalId":329439,"journal":{"name":"2018 Sixth International Symposium on Computing and Networking Workshops (CANDARW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132200631","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}