{"title":"Load Balancing Optimization for Transformer in Distributed Environment","authors":"Delu Ma, Zhou Lei, Shengbo Chen, Peng-Cheng Wang","doi":"10.1109/ICPADS53394.2021.00109","DOIUrl":"https://doi.org/10.1109/ICPADS53394.2021.00109","url":null,"abstract":"In recent years, the demand for artificial intelligence applications has increased dramatically. Complex models can promote machine learning to achieve excellent results, but computing efficiency has gradually reached a bottleneck. Therefore, more researchers are exploring the improvement of the efficiency of intelligent computing systems. Distributed machine learning can improve the efficiency of model training and inference, but problems such as communication delay and load imbalance between computing nodes still exist. In the multi-GPU distributed computing environment, this paper takes the vision field algorithm VIT (vision transformer) as the optimization object, which has the advantage of convenient parallel training, and proposes several related solutions. Firstly, the parameter server is used as the system logic architecture and in order to reduce the idleness of the computing devices during the training process, the device working status query mechanism is designed to realize load balancing. Secondly, combined with the pre-trained small VIT algorithm model, semi-asynchronous communication method is proposed to reduce the communication overhead of computing devices and accelerate global convergence. 
Experiments carried out in an existing distributed environment demonstrate that, compared with the existing synchronization method, computational efficiency is clearly improved at the cost of a slight reduction in accuracy.","PeriodicalId":309508,"journal":{"name":"2021 IEEE 27th International Conference on Parallel and Distributed Systems (ICPADS)","volume":"45 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132893831","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Delica: Decentralized Lightweight Collective Attestation for Disruptive IoT Networks","authors":"Ziyu Wang, Cong Sun, Qingsong Yao, Duo Ding, Jianfeng Ma","doi":"10.1109/ICPADS53394.2021.00051","DOIUrl":"https://doi.org/10.1109/ICPADS53394.2021.00051","url":null,"abstract":"The recent advance of the Internet of Things and autonomous systems brings massive security threats to the network of low-end embedded devices. Remote attestation is a hardware-assisted technique to verify the integrity and trustworthiness of software on remote devices. The recently proposed collective remote attestations have focused on attesting to the highly dynamic and disruptive device networks. However, they are generally inefficient due to the homogeneous node setting for the robustness of attestation reports aggregation. In this work, we propose Delica, an efficient and robust collective attestation framework for dynamic and disruptive networks. We differentiate the role of provers and aggregators to limit the redundant communications and attestation evidence aggregations for efficiency. Delica is capable of mitigating DoS attacks and detecting physical and black-hole attacks. 
The experimental results and analysis show that Delica can greatly reduce the per-node computational cost and reduce the network attestation cost by over 75% compared with the state-of-the-art approaches on disruptive networks.","PeriodicalId":309508,"journal":{"name":"2021 IEEE 27th International Conference on Parallel and Distributed Systems (ICPADS)","volume":"462 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129567338","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RTPoW: A Proof-of-Work Consensus Scheme with Real-Time Difficulty Adjustment Algorithm","authors":"Weijia Feng, Zhenfu Cao, Jiachen Shen, Xiaolei Dong","doi":"10.1109/ICPADS53394.2021.00035","DOIUrl":"https://doi.org/10.1109/ICPADS53394.2021.00035","url":null,"abstract":"Bitcoin, the first decentralized cryptocurrency system, uses a simple but effective difficulty adjustment algorithm to stabilize its average block creation time at 10 minutes. Over time, the volatility of the Bitcoin price has become higher and higher, which causes the total hashrate (the hash power of the entire network) to fluctuate constantly. Both facts and our experimental results show that Bitcoin's difficulty adjustment algorithm cannot respond in time while the total hashrate is constantly fluctuating. Hence, we propose a consensus protocol with a real-time difficulty adjustment algorithm, RTPoW. RTPoW allows the blockchain to adjust the difficulty target of each block by predicting the real-time total hashrate, so the block time can remain stable even if the total hashrate is wildly fluctuating. To evaluate the effect of RTPoW, we implemented a simulator of an experimental environment and tested our algorithm. The results obtained have confirmed its effectiveness and stability.","PeriodicalId":309508,"journal":{"name":"2021 IEEE 27th International Conference on Parallel and Distributed Systems (ICPADS)","volume":"95 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124690969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sensitivity loss training based implicit feedback","authors":"Kunyu Li, Nan Wang, Xinyu Liu","doi":"10.1109/ICPADS53394.2021.00036","DOIUrl":"https://doi.org/10.1109/ICPADS53394.2021.00036","url":null,"abstract":"In recommender systems, due to the lack of explicit feedback features, models on implicit feedback datasets are usually trained on all samples without distinguishing among them, ignoring the inconsistency of samples. This leads to a significant decrease in sample utilization and creates challenges for model training. Also, little work has been done to explore the intrinsic laws implied in implicit feedback datasets and how to train effectively on implicit feedback data. In this paper, we first summarize the variation pattern of loss with model training for different rating samples in the explicit feedback dataset, and find that model training is highly sensitive to the ratings. Second, we design an adaptive hierarchical training function with dynamic thresholds that can effectively distinguish different rating samples in the dataset, thus optimizing the implicit feedback dataset into an explicit feedback dataset to some extent. Finally, to better learn samples with different ratings, we also propose an adaptive hierarchical training strategy to obtain better training results in the implicit feedback dataset. 
Extensive experiments on three datasets show that our approach achieves excellent performance and greatly improves the performance of the model.","PeriodicalId":309508,"journal":{"name":"2021 IEEE 27th International Conference on Parallel and Distributed Systems (ICPADS)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122014212","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Boosting Byzantine Protocols in Large Sparse Networks with High System Assumption Coverage","authors":"Shaolin Yu, Jihong Zhu, Jiali Yang, Yulong Zhan","doi":"10.1109/ICPADS53394.2021.00097","DOIUrl":"https://doi.org/10.1109/ICPADS53394.2021.00097","url":null,"abstract":"To improve the overall efficiency and reliability of Byzantine protocols in large sparse networks, we propose a new system assumption for developing multi-scale fault-tolerant systems, with which several kinds of multi-scale Byzantine protocols are developed in large sparse networks with high system assumption coverage. By extending the traditional Byzantine adversary to the multi-scale adversaries, it is shown that efficient deterministic Byzantine broadcast and Byzantine agreement can be built in logarithmic-degree networks. Meanwhile, it is shown that the multi-scale adversary can make a finer trade-off between the system assumption coverage and the overall efficiency of the Byzantine protocols, especially when a small portion of the low-layer small-scale protocols are allowed to fail arbitrarily. With this, efficient Byzantine protocols can be built in large sparse networks with high system reliability.","PeriodicalId":309508,"journal":{"name":"2021 IEEE 27th International Conference on Parallel and Distributed Systems (ICPADS)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121059376","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FerryLink: Combating Link Degradation for Practical LPWAN Deployments","authors":"Jing Yang, Zhenqiang Xu, Jiliang Wang","doi":"10.1109/ICPADS53394.2021.00077","DOIUrl":"https://doi.org/10.1109/ICPADS53394.2021.00077","url":null,"abstract":"Low-Power Wide-Area Networks (LPWANs) have been shown to be a promising technique to provide long-range low-power communication for large-scale IoT devices. In this paper, however, we show the poor performance of LoRa networks due to their link diversity at macro and micro scales through one-month measurements in an area of 2.2 km × 1.5 km. We present FerryLink, which exploits such link diversity and leverages peer nodes to ferry data of weak links, to combat performance degradation. Traditional approaches (e.g., building multi-hop networks) are inefficient or too heavyweight for the current star-topology-based LoRa network. FerryLink thus proposes a novel ferry mechanism combining RSSI sampling and Channel Activity Detection (CAD) to suit multiple orthogonal transmission parameters of LoRa. To reduce energy overhead, FerryLink leverages convention windows for coarse-grained transmission synchronization between two coupled nodes. Finally, FerryLink utilizes the orthogonality of uplink and downlink signals to avoid data redundancy due to the ferry mechanism, maintaining comparable capacity with original LPWANs. We build FerryLink on top of LoRaWAN with commercial off-the-shelf hardware. 
The extensive evaluation results show that FerryLink effectively improves the packet delivery rate (PDR) of LoRa nodes (to over 95%), achieves 2x less energy overhead, and increases communication range by 50% compared with the original LoRaWAN.","PeriodicalId":309508,"journal":{"name":"2021 IEEE 27th International Conference on Parallel and Distributed Systems (ICPADS)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129795986","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Measuring and Modeling Multipath of Wi-Fi to Locate People in Indoor Environments","authors":"Xiaoyu Ma, Hui He, Hui Zhang, Wei Xi, Zuhao Chen, Jizhong Zhao","doi":"10.1109/ICPADS53394.2021.00029","DOIUrl":"https://doi.org/10.1109/ICPADS53394.2021.00029","url":null,"abstract":"With the rapid development of the Internet of Things (IoT) technology, the position information of indoor people has become an indispensable factor in most fields. Most existing indoor positioning schemes require people to keep moving to detect significant variance of the signal as the location feature. Hence, this paper proposes a passive indoor positioning system based on commodity Wi-Fi called Wisite, which can implement indoor multipath signal measurement and static person positioning modeling. The biggest challenge is how to detect the dynamic features in the reflection path of the static person to achieve target path matching. To address this issue, Wisite proposes a MUSIC expectation-maximization (MEM) joint parameter estimation algorithm to estimate and enhance the indoor multipath parameters. Then, a dynamic path matching model based on signal change enhancement (SCE) is proposed to enhance the signal changes caused by human activities, which can amplify the weak signal changes introduced by human respiration when a person is in a static state. Finally, the multipath geometric positioning model is used to calculate the person's position. We implement Wisite using commercial off-the-shelf (COTS) IEEE 802.11n devices and evaluate its performance via extensive experiments in typical real-world scenes. 
The results show that Wisite outperforms the comparison approaches in estimation accuracy and effectiveness, with an average indoor positioning error of less than 0.65cm.","PeriodicalId":309508,"journal":{"name":"2021 IEEE 27th International Conference on Parallel and Distributed Systems (ICPADS)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130787698","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic Push for HTTP Adaptive Streaming with Deep Reinforcement Learning","authors":"Haipeng Du, Danfu Yuan, Weizhan Zhang, Q. Zheng","doi":"10.1109/ICPADS53394.2021.00112","DOIUrl":"https://doi.org/10.1109/ICPADS53394.2021.00112","url":null,"abstract":"HTTP adaptive streaming (HAS) has revolutionized video distribution over the Internet due to its prominent benefit of outstanding quality of experience (QoE). Due to the pull-based nature of HTTP/1.1, the client must make requests for each segment. This usually causes high request overhead and low bandwidth utilization and finally reduces QoE. Currently, research into the HAS adaptive bitrate algorithm typically focuses on the server-push feature introduced in the new HTTP standard, which enables the client to receive multiple segments with a single request. Every time a request is sent, the client must simultaneously make decisions on the number of segments the server should push and the bitrate of these future segments. As the decision space complexity increases, existing rule-based strategies inevitably fail to achieve optimal performance. In this paper, we present D-Push, an HAS framework that combines deep reinforcement learning (DRL) techniques. Instead of relying on inaccurate assumptions about the environment and network capacity variation models, D-Push trains a DRL model and makes decisions by exploiting the QoE of past decisions through the training process and adapts to a wide range of highly dynamic environments. 
The experimental results show that D-Push outperforms the existing state-of-the-art algorithm by 12%-24% in terms of the average QoE.","PeriodicalId":309508,"journal":{"name":"2021 IEEE 27th International Conference on Parallel and Distributed Systems (ICPADS)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130930378","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Highly Scalable Parallel Checksums","authors":"Christian Siebert","doi":"10.1109/ICPADS53394.2021.00107","DOIUrl":"https://doi.org/10.1109/ICPADS53394.2021.00107","url":null,"abstract":"Checksums are used to detect errors that might occur while storing or communicating data. Checking the integrity of data is well-established, but only for smaller data sets. In contrast, supercomputers have to deal with huge amounts of data, which introduces failures that may remain undetected. Therefore, additional protection becomes a necessity at large scale. However, checking the integrity of larger data sets, especially in the case of distributed data, clearly requires parallel approaches. We show how popular checksums, such as CRC-32 or Adler-32, can be parallelized efficiently. This also disproves a widespread belief that parallelizing the aforementioned checksums, especially in a scalable way, is not possible. The mathematical properties behind these checksums enable a method to combine partial checksums such that the result corresponds to the checksum of the concatenated partial data. Our parallel checksum algorithm utilizes this combination idea in a scalable hierarchical reduction scheme to combine the partial checksums from an arbitrary number of processing elements. Although this reduction scheme can be implemented manually using most parallel programming interfaces, we use the Message Passing Interface, which supports such functionality directly via non-commutative user-defined reduction operations. In conjunction with the efficient checksum capabilities of the zlib library, our algorithm can not only be implemented conveniently and in a portable way, but also very efficiently. Additional shared-memory parallelization within compute nodes completes our hybrid parallel checksum solutions, which show high scalability of up to 524,288 threads. At this scale, computing the checksums of 240 TiB data took only 3.4 seconds for CRC-32 and 2.6 seconds for Adler-32. 
Finally, we discuss the APES application as a representative of dynamic supercomputer applications. Thanks to our scalable checksum algorithm, even such applications are now able to detect many errors within their distributed data sets.","PeriodicalId":309508,"journal":{"name":"2021 IEEE 27th International Conference on Parallel and Distributed Systems (ICPADS)","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124986170","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving Restore Performance of Deduplication Systems via a Greedy Rewriting Scheme","authors":"Lifang Lin, Yuhui Deng, Yi Zhou","doi":"10.1109/ICPADS53394.2021.00042","DOIUrl":"https://doi.org/10.1109/ICPADS53394.2021.00042","url":null,"abstract":"Data deduplication has been widely used to improve storage space utilization; however, it is hampered by data fragmentation: logically consecutive chunks physically scattered across various containers. Many rewriting schemes, rewriting fragmented duplicate chunks into new containers, attempt to alleviate the restore performance degradation caused by fragmentation. Unfortunately, these schemes rely on a fixed threshold and fail to choose the appropriate set of old containers for rewriting, which leads to substantial redundant chunks existing in the retrieved containers when restoring backups. To address this issue, we propose a flexible threshold rewriting scheme to improve restore performance while maintaining high backup performance. We define an effectiveness metric - valid container reference counts (VCRC) - that facilitates identifying the appropriate containers for rewriting. We design a greedy-algorithm-based algorithm called F-greedy that dynamically adjusts the threshold according to the distribution of containers' VCRC, aiming to rewrite low-VCRC containers. We quantitatively evaluate F-greedy on three real-world backup datasets in terms of restore performance, backup performance, and storage overhead. 
The empirical results show that, compared with two state-of-the-art schemes (Capping and SMR), our scheme improves the restore speed of the existing algorithms by 1.3x - 2.4x while achieving similar backup performance.","PeriodicalId":309508,"journal":{"name":"2021 IEEE 27th International Conference on Parallel and Distributed Systems (ICPADS)","volume":"204 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122839659","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}