{"title":"Revisiting Concurrent Separation Logic and Operational Semantics","authors":"Pedro Soares, A. Ravara, S. Sousa","doi":"10.1109/PDP.2015.85","DOIUrl":"https://doi.org/10.1109/PDP.2015.85","url":null,"abstract":"We present a new soundness proof of Concurrent Separation Logic (CSL) based on a structural operational semantics (SOS). We build on two previous proofs and develop new auxiliary notions to achieve the goal. One uses a denotational semantics (based on traces). The other is based on SOS, but was obtained only for a fragment of the logic - the Disjoint CSL - which disallows modifying shared variables between concurrent threads. In this work, we lift such restriction, proving the soundness of full CSL with respect to a SOS. Thus contributing to the development of tools able of ensuring the correctness of realistic concurrent programs. Moreover, given that we used SOS, such tools can be well-integrated in programming environments and even incorporated in compilers.","PeriodicalId":285111,"journal":{"name":"2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"101 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121822895","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fault Tolerant Routing for Hierarchically Organized Networks-on-Chip","authors":"G. Schley, M. Radetzki","doi":"10.1109/PDP.2015.36","DOIUrl":"https://doi.org/10.1109/PDP.2015.36","url":null,"abstract":"With increasing number of processing elements on a single chip, the size of the Network-on-Chip connecting the processing elements increases accordingly. This leads to new challenges for components such as fault diagnosis and routing because they do not scale with the size of the Network-on-Chip, e.g. regarding the required communication overhead or their implementation costs. A measure to avoid these scaling problems is to organize future Networks-on-Chip hierarchically. This paper presents a fault tolerant routing for Networks-on-Chip organized into hierarchical units where each unit manages its own routing. In case of link faults or failure of switches, the proposed approach enables the online adaptation of routing locally within each unit while deadlock freedom is globally ensured in the network. Experimental results of our approach for a 16x16 network show a speedup of three for routing reconfiguration compared to state-of-the-art approach. At the same time our approach achieves a memory reduction for routing tables by a factor of seven compared to flat network tables.","PeriodicalId":285111,"journal":{"name":"2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116690676","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimized Core-Links for Low-Latency NoCs","authors":"Ryuta Kawano, S. Tade, I. Fujiwara, Hiroki Matsutani, H. Amano, M. Koibuchi","doi":"10.1109/PDP.2015.15","DOIUrl":"https://doi.org/10.1109/PDP.2015.15","url":null,"abstract":"In recent many-core architectures, the number of cores has been steadily increasing and thus the network latency between cores becomes an important issue for parallel application programs. Because packet-switched network structures are widely used for core-to-core communications, a topology among cores has a major impact on the network latency. It has been reported that a small-world Network-on-Chip that adds links between randomly-selected routers on a regular router topology is effective for reducing the network latency. In this study, we extend this framework by connecting multiple links between a single core and quasi-optimally selected neigh boring routers to form multiple links from each core on a 2D MESH router topology. Results obtained by a flit-level discrete event simulator show that our optimized core-link topologies can achieve the average latency up to 48% lower than that of baseline topologies. Furthermore, full-system CMP simulation results show that by using optimized core-links we can improve the application execution time on the NAS Parallel Benchmarks by up to 10.1%.","PeriodicalId":285111,"journal":{"name":"2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124913515","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parallel Implementations of the Particle Filter Algorithm for Android Mobile Devices","authors":"Alejandro Acosta, F. Almeida","doi":"10.1109/PDP.2015.93","DOIUrl":"https://doi.org/10.1109/PDP.2015.93","url":null,"abstract":"The advent of emergent System-on-Chip (SoCs) and multiprocessor System-on-Chip (MPSocs) opens a new era on the small mobile devices (Smartphones, Tablets, ) in terms of computing capabilities and applications to be addressed. Given the ability of these devices to interact with the real world through the camera, is mandatory the development of efficient algorithms related to image processing and computer vision. We present a parallel implementation on mobile Android devices of the Particle Filter algorithm. We developed three different version of this algorithm. A Java sequential implementation and two Render script parallel versions, an ad-hoc implementation and a Paralldroid generated implementation. The results obtained by the parallel versions present an important speedup and a high accurate with a high processing rate of frame per seconds.","PeriodicalId":285111,"journal":{"name":"2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126131741","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Review of the Mobile Malware Detection Approaches","authors":"Anastasia Skovoroda, D. Gamayunov","doi":"10.1109/PDP.2015.54","DOIUrl":"https://doi.org/10.1109/PDP.2015.54","url":null,"abstract":"Mobile devices such as smartphones and tablets are extremely widespread nowadays. These devices provide users with a wide range of applications for commercial and public use. However, the contents of applications and their full behavior are not always properly reviewed which makes the presence of malware in the application marketplaces possible. Mobile security researchers have proposed many effective solutions for detection and prevention of malicious applications on mobile devices. This paper provides a comprehensive review and comparison of the most recent (dated mostly 2011 -2013) approaches to mobile malware detection.","PeriodicalId":285111,"journal":{"name":"2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128366077","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Causal-Consistent Reversibility in a Tuple-Based Language","authors":"Elena Giachino, Ivan Lanese, C. A. Mezzina, F. Tiezzi","doi":"10.1109/PDP.2015.98","DOIUrl":"https://doi.org/10.1109/PDP.2015.98","url":null,"abstract":"Causal-consistent reversibility is a natural way of undoing concurrent computations. We study causal-consistent reversibility in the context of μKlaim, a formal coordination language based on distributed tuple spaces. We consider both uncontrolled reversibility, suitable to study the basic properties of the reversibility mechanism, and controlled reversibility based on a rollback operator, more suitable for programming applications. The causality structure of the language, and thus the definition of its reversible semantics, differs from all the reversible languages in the literature because of its generative communication paradigm. In particular, the reversible behavior of μKlaim read primitive, reading a tuple without consuming it, cannot be matched using channel-based communication. We illustrate the reversible extensions of μKlaim on a simple, but realistic, application scenario.","PeriodicalId":285111,"journal":{"name":"2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129037586","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"NanoCheckpoints: A Task-Based Asynchronous Dataflow Framework for Efficient and Scalable Checkpoint/Restart","authors":"J. Moreno, O. Unsal, Jesús Labarta, A. Cristal","doi":"10.1109/PDP.2015.17","DOIUrl":"https://doi.org/10.1109/PDP.2015.17","url":null,"abstract":"In this paper, we present NanoCheckpoints which is a lightweight software-based checkpoint/restart scheme for task-parallel HPC applications. We leverage OmpSs, a task-based OpenMP derivative programming model (PM) and its Nanos asynchronous dataflow runtime. NanoCheckpoints achieves minimal overheads by check pointing only tasks' inputs which are available for free in the OmpSs PM. We evaluate NanoCheckpoints by both pure task-parallel shared memory benchmarks (up to 16 cores) and hybrid OmpSs+MPI applications (up to 1024 cores). The results indicate that NanoCheckpoints has on average overhead 3% for shared memory benchmarks. The dataflow semantics of Nanos, where both check pointing and error recovery are asynchronous, allows NanoCheckpoints to scale at large core counts even when high error rates are present. For hybrid OmpSs+MPI benchmarks, NanoCheckpoints has very low overhead, on average 2%, and high scalability.","PeriodicalId":285111,"journal":{"name":"2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125845828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A High-Performance Media Streaming Architecture Based on KVM","authors":"Woo-Yeong Jeong, Youngjae Lee, Jin-Soo Kim","doi":"10.1109/PDP.2015.90","DOIUrl":"https://doi.org/10.1109/PDP.2015.90","url":null,"abstract":"A media streaming server can be implemented on a virtual machine for the ease of resource management. However, simply running a media streaming server on a virtual machine has two problems, the duplicate data in file caches of virtual machines and the performance degradation caused by the virtualization overhead. In order to resolve these problems, this paper proposes a high-performance media streaming architecture based on KVM. First, we implement a shared cache among virtual machines in order to eliminate the duplicate cached data. Second, the send file operation is offloaded to the hypervisor to reduce the virtualization overhead in I/O operations. Our evaluations with D-DASH datasets show that the performance of a media streaming server in the proposed architecture is increased by up to 30% as compared to that of the conventional media streaming server that simply runs on a virtual machine.","PeriodicalId":285111,"journal":{"name":"2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115803706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Locality vs. Balance: Exploring Data Mapping Policies on NUMA Systems","authors":"M. Diener, E. Cruz, P. Navaux","doi":"10.1109/PDP.2015.11","DOIUrl":"https://doi.org/10.1109/PDP.2015.11","url":null,"abstract":"In parallel architectures that have a Non-Uniform Memory Access (NUMA) behavior, the mapping of memory pages to NUMA nodes influences the performance of parallel applications. In order to improve traditional data mapping policies, two basic strategies can be employed: optimizing locality or balance of memory accesses. In a locality-based policy, memory pages are mapped to nodes that access the page the most. In a balance-based policy, memory pages are mapped such that the number of memory accesses resolved by each memory controller is similar. In this paper, we perform an in-depth exploration of these data mapping policies on the performance of parallel applications. We introduce metrics that describe their memory access behavior and evaluate their suitability for data mapping. We also present new mapping policies that focus on locality, balance or both. These policies were evaluated on three different NUMA architectures with applications from the NAS-OMP and PARSEC benchmark suites. Results show that the performance improvements of each policy depend on the characteristics of the applications and machines. Choosing the wrong policy can actually hurt the performance compared to the default first-touch mapping. Compared to traditional mapping policies and to policies that only focus on either locality or balance, taking into account both locality and balance results in the highest improvements. 
Furthermore, it avoids the performance reduction caused by the wrong data mapping.","PeriodicalId":285111,"journal":{"name":"2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115496402","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Real-Time Simulation of Radiological Images Using CUDA Technology","authors":"Elena Gianaria, E. Gallio","doi":"10.1109/pdp.2015.55","DOIUrl":"https://doi.org/10.1109/pdp.2015.55","url":null,"abstract":"Radiography is nowadays a common medical exam, used for diagnosing several diseases, but has the disadvantage of exposing people to a dose of radiation. For this reason, it is important to study methods for reducing such dose. In this paper we present a digital X-ray simulation tool that simulates a radiological exam on a virtual patient. The software builds a physically-realistic radiography in real-time thanks to GPGPU programming and CUDA technology. It aims to be used in radiological departments, for testing new dose reduction procedures and training health operators. We validated the software comparing the results with real radiographic images, and we tested it on different graphic cards obtaining running times that are 35 to 250 times faster than the corresponding CPU implementation.","PeriodicalId":285111,"journal":{"name":"2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115847454","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}