{"title":"Proactive Content Caching for Internet-of-Vehicles based on Peer-to-Peer Federated Learning","authors":"Zhengxin Yu, Jia Hu, G. Min, Han Xu, Jed Mills","doi":"10.1109/ICPADS51040.2020.00083","DOIUrl":"https://doi.org/10.1109/ICPADS51040.2020.00083","url":null,"abstract":"To cope with the increasing content requests from emerging vehicular applications, caching contents at edge nodes is imperative to reduce service latency and network traffic on the Internet-of-Vehicles (IoV). However, the inherent characteristics of IoV, including the high mobility of vehicles and restricted storage capability of edge nodes, cause many difficulties in the design of caching schemes. Driven by the recent advancements in machine learning, learning-based proactive caching schemes are able to accurately predict content popularity and improve cache efficiency, but they need to gather and analyse users' content retrieval history and personal data, leading to privacy concerns. To address the above challenge, we propose a new proactive caching scheme based on peer-to-peer federated deep learning, where the global prediction model is trained from data scattered at vehicles to mitigate the privacy risks. In our proposed scheme, a vehicle acts as a parameter server to aggregate the updated global model from peers, instead of an edge node. A dual-weighted aggregation scheme is designed to achieve high global model accuracy. Moreover, to enhance the caching performance, a Collaborative Filtering based Variational AutoEncoder model is developed to predict the content popularity. 
The experimental results demonstrate that our proposed caching scheme largely outperforms typical baselines, such as Greedy and Most Recently Used caching.","PeriodicalId":196548,"journal":{"name":"2020 IEEE 26th International Conference on Parallel and Distributed Systems (ICPADS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125927650","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"WMAlloc: A Wear-Leveling-Aware Multi-Grained Allocator for Persistent Memory File Systems","authors":"Shun Nie, Chaoshu Yang, Runyu Zhang, Wenbin Wang, Duo Liu, Xianzhang Chen","doi":"10.1109/ICPADS51040.2020.00072","DOIUrl":"https://doi.org/10.1109/ICPADS51040.2020.00072","url":null,"abstract":"Emerging Persistent Memories (PMs) promise to revolutionize storage systems by providing fast, persistent data access on the memory bus. Therefore, persistent memory file systems are developed to achieve high performance by exploiting the advanced features of PMs. Unfortunately, PMs have the problem of limited write endurance. Furthermore, the existing space management strategies of persistent memory file systems usually ignore this problem, which can cause write operations to concentrate on a few cells of PM. The unbalanced writes can then damage the underlying PMs quickly, which seriously harms the data reliability of the file systems. However, existing wear-leveling-aware space management techniques mainly focus on improving the wear-leveling accuracy of PMs rather than reducing the overhead, which can seriously reduce the performance of persistent memory file systems. In this paper, we propose a Wear-Leveling-Aware Multi-Grained Allocator, called WMAlloc, to achieve the wear-leveling of PM while improving the performance for persistent memory file systems. WMAlloc adopts multiple heap trees to manage the unused space of PM, and each heap tree represents an allocation granularity. Then, WMAlloc allocates less-worn required blocks from the heap tree for each allocation. We implement the proposed WMAlloc in the Linux kernel based on NOVA, a typical persistent memory file system. 
Compared with DWARM, the state-of-the-art and wear-leveling-aware space management technique, experimental results show that WMAlloc can achieve 1.52× lifetime of PM and 1.44× performance improvement on average.","PeriodicalId":196548,"journal":{"name":"2020 IEEE 26th International Conference on Parallel and Distributed Systems (ICPADS)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128410437","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimizing Complex OpenCL Code for FPGA: A Case Study on Finite Automata Traversal","authors":"Marziyeh Nourian, Mostafa Eghbali Zarch, M. Becchi","doi":"10.1109/ICPADS51040.2020.00073","DOIUrl":"https://doi.org/10.1109/ICPADS51040.2020.00073","url":null,"abstract":"While FPGAs have been traditionally considered hard to program, recently there have been efforts aimed to allow the use of high-level programming models and libraries intended for multi-core CPUs and GPUs to program FPGAs. For example, both Intel and Xilinx are now providing toolchains to deploy OpenCL code onto FPGA. However, because the nature of the parallelism offered by GPU and FPGA devices is fundamentally different, OpenCL code optimized for GPU can prove very inefficient on FPGA, in terms of both performance and hardware resource utilization. This paper explores this problem on finite automata traversal. In particular, we consider an OpenCL NFA traversal kernel optimized for GPU but exhibiting FPGA-friendly characteristics, namely: limited memory requirements, lack of synchronization, and SIMD execution. We explore a set of structural code changes, custom and best-practice optimizations to retarget this code to FPGA. We showcase the effect of these optimizations on an Intel Stratix V FPGA board using various NFA topologies from different application domains. Our evaluation shows that, while the resource requirements of the original code exceed the capacity of the FPGA in use, our optimizations lead to significant resource savings and allow the transformed code to fit the FPGA for all considered NFA topologies. In addition, our optimizations lead to speedups up to 4x over an already optimized code-variant aimed to fit the NFA traversal kernel on FPGA. 
Some of the proposed optimizations can be generalized to other applications and introduced into OpenCL-to-FPGA compilers.","PeriodicalId":196548,"journal":{"name":"2020 IEEE 26th International Conference on Parallel and Distributed Systems (ICPADS)","volume":"351 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133133971","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimizing Multi-way Theta Join for Data Skew in Sub-second Stream Computing","authors":"Xiaopeng Fan, Xinchun Liu, Yang Wang, Youjun Wang, Jing Li","doi":"10.1109/ICPADS51040.2020.00068","DOIUrl":"https://doi.org/10.1109/ICPADS51040.2020.00068","url":null,"abstract":"In sub-second stream computing, the answer to a complex query usually depends on operations of aggregation or join on streams, especially multi-way theta join. Some attribute keys are not distributed uniformly, which is called the data intrinsic skew problem, such as taxi car plate in GPS trajectories and transaction records, or stock code in stock quotes and investment portfolios etc. In this paper, we define the concept of key redundancy for single stream as the degree of data intrinsic skew, and joint key redundancy for multi-way streams. We present an execution model for multi-way stream theta joins with a fine-grained cost model to evaluate its performance. We propose a solution named Group Join (GroJoin) to make use of key redundancy during transmission and execution in a cluster. GroJoin is adaptive to data intrinsic skew in the way that it depends on the grouping condition we find out, i.e., the selectivity of theta join results should be smaller than 25%. Experiments are carried out by our MS-Generator to produce multi-way streams, and the simulation results show that GroJoin can decrease at most 45% transmission overheads with different key redundancies and value-key proportionality coefficients, and reduce at most 70% query delay with different key distributions. We further implement GroJoin in Multi-Way Stream Theta Join by Spark Streaming. 
The experimental results demonstrate that join latency is reduced by about 40%∼50% after our optimization, with a very small computation cost.","PeriodicalId":196548,"journal":{"name":"2020 IEEE 26th International Conference on Parallel and Distributed Systems (ICPADS)","volume":"14611 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132866370","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Joint Service Placement and Request Scheduling for Multi-SP Mobile Edge Computing Network","authors":"Zhengwei Lei, Hongli Xu, Liusheng Huang, Zeyu Meng","doi":"10.1109/ICPADS51040.2020.00014","DOIUrl":"https://doi.org/10.1109/ICPADS51040.2020.00014","url":null,"abstract":"Mobile edge computing (MEC), as an emerging computing paradigm, pushes services away from the centralized remote cloud to distributed edge servers deployed by multiple service providers (SPs), improving user experience and reducing the communication burden on the core network. However, this distributed computing architecture also brings some new challenges to the network. In a multi-SP MEC system, an SP prefers to use edge servers deployed by itself instead of others, which not only improves service quality but also reduces processing cost. The service placement and request scheduling strategies directly affect the revenue of SPs. Since the service popularity changes over time and the resources of edge servers are limited, the network system needs to make decisions about service placement and request scheduling dynamically to provide better service for users. Owing to the lack of long-term prior knowledge and the presence of binary decision variables, how to place services and schedule requests to boost the profit of SPs is a challenging problem. We formalize this joint optimization problem and propose an efficient online algorithm. First, we invoke the Lyapunov optimization technique to convert the long-term optimization problem into a series of subproblems, then a dual-decomposition algorithm is utilized to solve each subproblem. 
Experimental results show that the algorithm proposed in this paper achieves nearly optimal performance, and it raises profit by 25% and 70% compared to greedy and Top-K algorithms, respectively.","PeriodicalId":196548,"journal":{"name":"2020 IEEE 26th International Conference on Parallel and Distributed Systems (ICPADS)","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133475102","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Joint Service Placement and Computation Offloading in Mobile Edge Computing: An Auction-based Approach","authors":"Lei Zhang, Zhihao Qu, Baoliu Ye, Bin Tang","doi":"10.1109/ICPADS51040.2020.00043","DOIUrl":"https://doi.org/10.1109/ICPADS51040.2020.00043","url":null,"abstract":"Emerging applications, e.g., virtual reality, online games, and Internet of Vehicles, have computation-intensive and latency-sensitive requirements. Mobile edge computing (MEC) is a powerful paradigm that significantly improves the quality of service (QoS) of these applications by offloading computation and deploying services at the network edge. Existing works on service placement in MEC usually ignore the impact of the different QoS requirements among service providers (SPs), which are common in many applications: for example, online games require extremely low latency and online video requires extremely large bandwidth. Considering the competitive relationship among SPs, we propose an auction-based resource allocation mechanism. We formulate the problem as a social welfare maximization problem to maximize the effectiveness of allocated resources while maintaining economic robustness. According to our theoretical analysis, this problem is NP-hard, and thus it is practically impossible to derive the optimal solution. To tackle this, we design a multiple rounds of iterative auctions mechanism (MRIAM), which divides resources into blocks and allocates them through multiple rounds of auctions. 
Finally, we conduct extensive experiments and demonstrate that our auction-based mechanism is effective in resource allocation and robust in economics.","PeriodicalId":196548,"journal":{"name":"2020 IEEE 26th International Conference on Parallel and Distributed Systems (ICPADS)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133787108","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TEEp: Supporting Secure Parallel Processing in ARM TrustZone","authors":"Zinan Li, Wenhao Li, Yubin Xia, B. Zang","doi":"10.1109/ICPADS51040.2020.00076","DOIUrl":"https://doi.org/10.1109/ICPADS51040.2020.00076","url":null,"abstract":"Machine learning applications are becoming prevalent on various computing platforms, including cloud servers, smart phones, IoT devices, etc. For these applications, security is one of the most emergent requirements. While trusted execution environments (TEEs) like ARM TrustZone have been widely used to protect critical procedures including fingerprint authentication and mobile payment, state-of-the-art implementations of TEE OS lack the support for multi-threading and are not suitable for computing-intensive workloads. This is because current TEE OSes are usually designed for hosting security critical tasks, which are typically small and non-computing-intensive. Thus, most TEE OSes do not support multi-threading in order to minimize the size of the trusted computing base (TCB). In this paper, we propose TEEp, a system that enables multi-threading in TEE without weakening security, and supports existing multi-threaded applications to run directly in TEE. Our design includes a novel multithreading mechanism based on the cooperation between the TEE OS and the host OS, without trusting the host OS. We implement our system based on OP-TEE and port it to two platforms: a HiKey 970 development board as mobile platform, and a Huawei Hi1610 ARM server as server platform. We run TensorFlow Lite on the development board and TensorFlow on the server for performance evaluation in TEE. 
The result shows that our system can improve the throughput of TensorFlow Lite on 5 models to 3.2x when 4 cores are available, with 13.5% overhead compared with Linux on average.","PeriodicalId":196548,"journal":{"name":"2020 IEEE 26th International Conference on Parallel and Distributed Systems (ICPADS)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133800465","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Graph-based Approaches for the Interval Scheduling Problem","authors":"P. Oikonomou, Nikos Tziritas, G. Theodoropoulos, M. Koziri, Thanasis Loukopoulos, S. Khan","doi":"10.1109/ICPADS51040.2020.00091","DOIUrl":"https://doi.org/10.1109/ICPADS51040.2020.00091","url":null,"abstract":"One of the fundamental problems encountered by large-scale computing systems, such as clusters and clouds, is to schedule a set of jobs submitted by the users. Each job is characterized by resource demands, as well as start and completion times. Each job must be scheduled to execute on a machine having the required capacity between the start and completion time (referred to as the interval) of the job. Each machine is defined by a parallelism parameter g that indicates the maximum number of jobs that can be processed by the machine in parallel. The above problem is referred to as the interval scheduling problem with bounded parallelism. The objective is to minimize the total busy time of all machines. The majority of the solutions proposed in the literature consider a homogeneous set of jobs and machines, which is a simplifying assumption because, in practice, heterogeneous jobs and machines are frequently encountered. In this article, we tackle the aforesaid problem with a set of heterogeneous jobs and machines. A major contribution of our work is that the problem is addressed in a novel way by combining a graph-based approach and a dynamic programming approach which is based on a variation of the bin packing problem. A greedy algorithm is also proposed by employing only a graph-based approach with the aim of reducing the computational complexity. 
Experimental results show that the proposed algorithms can significantly reduce the cumulative busy interval over all machines compared with state-of-the-art algorithms proposed in the literature.","PeriodicalId":196548,"journal":{"name":"2020 IEEE 26th International Conference on Parallel and Distributed Systems (ICPADS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128285345","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Non-Technical Losses Detection in Smart Grids: An Ensemble Data-Driven Approach","authors":"Yufeng Xing, Lei Guo, Zongchao Xie, Lei Cui, Longxiang Gao, Shui Yu","doi":"10.1109/ICPADS51040.2020.00078","DOIUrl":"https://doi.org/10.1109/ICPADS51040.2020.00078","url":null,"abstract":"Non-technical losses (NTL) detection plays a crucial role in protecting the security of smart grids. Employing massive energy consumption data and advanced artificial intelligence (AI) techniques for NTL detection is helpful. However, there are concerns regarding the effectiveness of existing AI-based detectors against covert attack methods. In particular, tampered metering data with normal consumption patterns may result in a low detection rate. Motivated by this, we propose a hybrid data-driven detection framework. In particular, we introduce a wide & deep convolutional neural network (CNN) model to capture the global and periodic features of consumption data. We also leverage the maximal information coefficient algorithm to analyze and detect those covert abnormal measurements. Our extensive experiments under different attack scenarios demonstrate the effectiveness of the proposed method.","PeriodicalId":196548,"journal":{"name":"2020 IEEE 26th International Conference on Parallel and Distributed Systems (ICPADS)","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114851796","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Customized Reinforcement Learning based Binary Offloading in Edge Cloud","authors":"Yuepeng Li, Lvhao Chen, Deze Zeng, Lin Gu","doi":"10.1109/ICPADS51040.2020.00055","DOIUrl":"https://doi.org/10.1109/ICPADS51040.2020.00055","url":null,"abstract":"To tackle the scarcity of computation resources on end devices, task offloading is developed to reduce the task completion time and improve the Quality-of-Service (QoS). Edge cloud facilitates such offloading by provisioning resources in the proximity of the end devices. Modern applications are usually deployed as a chain of subtasks (e.g., microservices) where a special offloading strategy, referred to as binary offloading, shall be applied. Binary offloading divides the chain into two parts, which will be executed on the end device and the edge cloud, respectively. The offloading point in the chain is therefore critical to the QoS in terms of task completion time. Considering the system dynamics and algorithm sensitivity, we apply Q-learning to address this problem. In order to deal with the late feedback problem, a reward rewind match strategy is proposed to customize Q-learning. 
Trace-driven simulation results show that our customized Q-learning based approach is able to achieve significant reduction on the total execution time, outperforming traditional offloading strategies and non-customized Q-learning.","PeriodicalId":196548,"journal":{"name":"2020 IEEE 26th International Conference on Parallel and Distributed Systems (ICPADS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124625239","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}