2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS): Latest Papers (Page 2)

Minerva: Rethinking Secure Architectures for the Era of Fabric-Attached Memory Architectures

2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS) Pub Date : 2022-05-01 DOI: 10.1109/ipdps53621.2022.00033

Mazen Al-Wadi, Rujia Wang, David A. Mohaisen, C. Hughes, S. Hammond, Amro Awad

Abstract: Fabric-attached memory (FAM) is proposed to enable the seamless integration of directly accessible memory modules attached to the shared system fabric, which will provide future systems with flexible memory integration options, mitigate underutilization, and facilitate data sharing. Recently proposed interconnects, such as Gen-Z and Compute Express Link (CXL), define security, correctness, and performance requirements of fabric-attached devices, including memory. These initiatives are supported by most major system and processor vendors, bringing widespread adoption of FAM-enabled systems one step closer to reality and security concerns to the forefront. This paper discusses the challenges for adapting secure memory implementations to FAM-enabled systems for the first time in literature. Specifically, we observe that handling the security metadata used to protect fabric-attached memories needs to be done deliberately to eliminate unintentional integrity check failures and/or security vulnerabilities, caused by an inconsistent view of the shared security metadata across nodes. Our scheme, Minerva, elegantly adapts secure memory implementations to support FAM-enabled systems with negligible performance overheads (3.8% of an ideal scheme), compared to the performance overhead (99.5% of an ideal scheme) for a scheme that uses conventional invalidation-based cache coherence to ensure the consistency of security metadata across nodes.

Citations: 0

HRaft: Adaptive Erasure Coded Data Maintenance for Consensus in Distributed Networks

2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS) Pub Date : 2022-05-01 DOI: 10.1109/IPDPS53621.2022.00130

Yulei Jia, Guangping Xu, C. Sung, Salwa Mostafa, Yulei Wu

Abstract: Distributed data services usually rely on consensus protocols like Paxos and Raft to provide fault-tolerance and data consistency across global and local-distributed data centers. Erasure coding replication has appealing storage and network cost saving compared with full copy replication, which helps consensus protocols achieve low latency, high fault tolerance, and high throughput for data access. Applying erasure coding in consensus protocols directly will degrade the liveness level when the number of failure servers reaches a certain level. To address the challenge, CRaft just stores full copy replication instead of erasure coding replication when the number of failed servers reaches a certain threshold. In such situation, CRaft will be downgraded sharply to the same storage and network costs as Raft. To overcome the shortcoming of CRaft, we propose a protocol, called HRaft, which can adapt the placement of data blocks in order to always have enough blocks to recover the stored value when servers fail. By replenishing some coded blocks in healthy servers instead of full copy replication, it can avoid switching to the full replication when a certain threshold on the number of failures is reached. We designed and implemented a key-value (KV) storage prototype to validate the proposed protocol and evaluate its performance. The experimental results show HRaft can significantly reduce storage and network costs and improve write performance while keeping the liveness level compared to CRaft.

Citations: 1

TagTree: Global Tagging Index with Efficient Querying for Time Series Databases

2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS) Pub Date : 2022-05-01 DOI: 10.1109/ipdps53621.2022.00127

Jin Xue, Zhiqi Wang, Tianyu Wang, Z. Shao

Abstract: Modern time series databases come with a tag-based query interface that allows users to select time series, which are essentially sequences of timestamped data values, based on a set of specific tags. A tagging index is an important component that can efficiently provide such tag-based services. However, existing methods store tag information in external databases or time-partitioned data structures, which has a negative impact on query performance. In this paper, we present a novel abstraction for efficient queries of tag information in time series databases: a hybrid tagging index that manages all tags in one place. By managing tag information globally in a single disk-based data structure, we can fundamentally relieve memory pressure and eliminate I/O overhead of duplicate metadata from existing methods. Furthermore, the tagging index is internally partitioned by time to support time range based queries and data retention which are essential to time series databases. We implement the proposed tagging index as a standalone module which can be integrated with time series storage engines. Experiments on the TSBS benchmark show our proposed method can significantly speed up queries by on average 84.0% and 87.2% compared to Prometheus (using a time-partitioned segment method) and Graphite (using an external database for tag management), respectively.

Citations: 1

SALoBa: Maximizing Data Locality and Workload Balance for Fast Sequence Alignment on GPUs

2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS) Pub Date : 2022-05-01 DOI: 10.1109/ipdps53621.2022.00076

Seong-Bin Park, Hajin Kim, Tanveer Ahmad, Nauman Ahmed, Z. Al-Ars, H. P. Hofstee, Youngsok Kim, Jinho Lee

Citations: 1

Parallel Tensor Train Rounding using Gram SVD

2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS) Pub Date : 2022-05-01 DOI: 10.1109/ipdps53621.2022.00095

Hussam Al Daas, Grey Ballard, Lawton Manning

Citations: 2

Dynamic Computation Offloading for Green Things-Edge-Cloud Computing with Local Caching

2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS) Pub Date : 2022-05-01 DOI: 10.1109/ipdps53621.2022.00103

Xianzhong Tian, Huixiao Meng, Yanjun Li, Pingting Miao, Pengcheng Xu

Abstract: With the increasing popularity of the internet of things (IoT) and 5G, emerging things-edge-cloud computing (TEC) paradigm provides a flexible way for execution of delay-sensitive and computation-intensive applications running on the user equipment (UE). By offloading these workloads to the mobile edge computing (MEC) or mobile cloud computing (MCC) server, the quality of experience, e.g., the execution delay, could be greatly improved. Nevertheless, conventional battery-powered devices face the challenge of battery exhaustion for task offloading. Using renewable energy via energy harvesting (EH) technologies has become a promising way to power these devices. In this paper, we investigate a multi-user green TEC system with EH UEs, each has a task buffer with limited capacity. A joint offloading decision and resource allocation problem is formulated, which addresses the long-term average execution delay, the task dropping and the long-term average energy cost constraint. A low-complexity online algorithm is proposed leveraging Lyapunov optimization framework and matroid theory, which jointly decides the offloading decision, the MEC server CPU frequencies and the transmit power for computation offloading. A unique advantage of this algorithm is that the decisions depend only on the current system state without requiring distribution information of the arrival tasks, wireless channel state, and EH processes. The implementation of the algorithm only requires to solve a deterministic problem in each time slot. Simulation results show that our proposed algorithm makes a best trade-off between minimizing the long-term average generalized delay and satisfying the long-term average energy cost constraint. Impacts of various parameters on the delay and energy cost performance are also discussed.

Citations: 0

HDagg: Hybrid Aggregation of Loop-carried Dependence Iterations in Sparse Matrix Computations

2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS) Pub Date : 2022-05-01 DOI: 10.1109/ipdps53621.2022.00121

Behrooz Zarebavani, Kazem Cheshmi, Bangtian Liu, M. Strout, M. Dehnavi

Abstract: This paper proposes a novel aggregation algorithm, called Hybrid DAG Aggregation (HDagg), that groups iterations of sparse matrix computations with loop carried dependence to improve their parallel execution on multicore processors. Prior approaches to optimize sparse matrix computations fail to provide an efficient balance between locality, load balance, and synchronization and are primarily optimized for codes with a tree-structure data dependence. HDagg is optimized for sparse matrix computations that their data dependence graphs (DAGs) do not have a tree structure, such as incomplete matrix factorization algorithms. It uses a hybrid approach to aggregate vertices and wavefronts in the DAG of a sparse computation to create well-balanced parallel workloads with good locality. Across three sparse kernels, triangular solver, incomplete Cholesky, and incomplete LU, HDagg outperforms existing sparse libraries such as MKL with an average speedup of 3.56× and is faster than state-of-the-art inspector-executor approaches that optimize sparse computations, i.e. DAGP, LBC, wavefront parallelism techniques, and SpMP by an average speedup of 3.87×, 3.41×, 1.95×, and 1.43× respectively.

Citations: 2
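
The HDagg abstract refers to wavefronts in the dependence DAG of a sparse computation. As a minimal sketch of that baseline idea only (plain wavefront, or level-set, scheduling, not the HDagg aggregation algorithm itself), the code below groups the rows of a sparse lower-triangular solve into wavefronts. The function name `wavefronts` and the `deps` input format are assumptions made for this illustration.

```python
def wavefronts(deps):
    """Group iterations into wavefronts (level sets) of a dependence DAG.

    deps[i] lists the earlier iterations j < i that iteration i depends on,
    e.g. the column indices of the off-diagonal nonzeros in row i of a
    sparse lower-triangular matrix. This input format is an assumption of
    the sketch.
    """
    level = [0] * len(deps)
    for i, parents in enumerate(deps):
        # An iteration runs one level after its deepest dependency;
        # iterations with no dependencies land in level 0.
        level[i] = 1 + max((level[j] for j in parents), default=-1)

    fronts = [[] for _ in range(max(level, default=-1) + 1)]
    for i, lvl in enumerate(level):
        fronts[lvl].append(i)
    return fronts


if __name__ == "__main__":
    # Row 1 needs row 0, row 3 needs rows 1 and 2, row 4 needs row 0.
    print(wavefronts([[], [0], [], [1, 2], [0]]))
```

All iterations inside one wavefront are mutually independent and can run in parallel, with a synchronization between consecutive wavefronts. The abstract's point is that this baseline alone does not balance locality, load, and synchronization well.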

RLRP: High-Efficient Data Placement with Reinforcement Learning for Modern Distributed Storage Systems

2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS) Pub Date : 2022-05-01 DOI: 10.1109/ipdps53621.2022.00064

Kai Lu, Nannan Zhao, Ji-guang Wan, Changhong Fei, Wei Zhao, Tongliang Deng

Abstract: Modern distributed storage systems with massive data and storage nodes pose higher requirements to the data placement strategy. Furthermore, with emerged new storage devices, heterogeneous storage architecture has become increasingly common and popular. However, traditional strategies expose great limitations in the face of these requirements, especially do not well consider distinct characteristics of heterogeneous storage nodes yet, which will lead to suboptimal performance. In this paper, we present and evaluate the RLRP, a deep reinforcement learning (RL) based replica placement strategy. RLRP constructs placement and migration agents through the Deep-Q-Network (DQN) model to achieve fair distribution and adaptive data migration. Besides, RLRP provides optimal performance for heterogeneous environment by an attentional Long Short-term Memory (LSTM) model. Finally, RLRP adopts Stagewise Training and Model fine-tuning to accelerate the training of RL models with large-scale state and action space. RLRP is implemented on Park and the evaluation results indicate RLRP is a highly efficient data placement strategy for modern distributed storage systems. RLRP can reduce read latency by 10%∼50% in heterogeneous environment compared with existing strategies. In addition, RLRP is used in the real-world system Ceph, which improves the read performance of Ceph by 30%∼40%.

Citations: 2

Task-based Acceleration of Bidirectional Recurrent Neural Networks on Multi-core Architectures

2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS) Pub Date : 2022-05-01 DOI: 10.1109/ipdps53621.2022.00096

Robin Kumar Sharma, Marc Casas

Citations: 0

Coloring the Vertices of 9-pt and 27-pt Stencils with Intervals

2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS) Pub Date : 2022-05-01 DOI: 10.1109/ipdps53621.2022.00098

Dante Durrman, Erik Saule

Abstract: Graph coloring is commonly used to schedule computations on parallel systems. Given a good estimation of the computational requirement for each task, one can refine the model by adding a weight to each vertex. Instead of coloring each vertex with a single color, the problem is to color each vertex with an interval of colors. In this paper, we are interested in studying this problem for particular classes of graphs, namely stencil graphs. Stencil graphs appear naturally in the parallelisation of applications where the location of an object in a space affects the state of neighboring objects. Rectilinear decompositions of a space generate conflict graphs that are 9-pt stencils for 2D problems and 27-pt stencils for 3D problems. We show that the 5-pt stencil and 7-pt stencil relaxations of the problem can be solved in polynomial time. We prove that the decision problem on 27-pt stencil is NP-Complete. We discuss approximation algorithms with a ratio of 2 for the 9-pt stencil case, and 4 for the 27-pt stencil case. We identify two lower bounds for the problem that are used to design heuristics. We evaluate the effectiveness of several different algorithms experimentally on a set of real instances. Furthermore, these algorithms are integrated into a real application to demonstrate the soundness of the approach.

Citations: 1
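
The interval-coloring problem stated in the stencil-coloring abstract can be made concrete with a small sketch: every cell of a 2D grid carries a weight, each cell must receive an interval of colors as long as its weight, and cells that are adjacent in the 9-pt stencil (including diagonals) must receive disjoint intervals. The first-fit greedy heuristic below is an assumption chosen for illustration; it is not claimed to be one of the algorithms studied in the paper, and all function names are hypothetical.

```python
from itertools import product


def nine_point_neighbors(i, j, rows, cols):
    """Yield the up-to-8 grid cells adjacent to (i, j) in a 9-pt stencil."""
    for di, dj in product((-1, 0, 1), repeat=2):
        ni, nj = i + di, j + dj
        if (di, dj) != (0, 0) and 0 <= ni < rows and 0 <= nj < cols:
            yield ni, nj


def greedy_interval_coloring(weights):
    """First-fit heuristic (illustrative assumption, not the paper's method).

    Visits cells in row-major order and gives each cell the lowest interval
    [start, start + weight) that is disjoint from the intervals of its
    already-colored stencil neighbors. Returns {cell: start}.
    """
    rows, cols = len(weights), len(weights[0])
    start = {}
    for i in range(rows):
        for j in range(cols):
            w = weights[i][j]
            # Intervals already taken by colored neighbors, sorted by start.
            busy = sorted(
                (start[n], start[n] + weights[n[0]][n[1]])
                for n in nine_point_neighbors(i, j, rows, cols)
                if n in start
            )
            s = 0
            for lo, hi in busy:
                if s + w <= lo:  # the gap before this interval is big enough
                    break
                s = max(s, hi)   # otherwise skip past it
            start[(i, j)] = s
    return start


def colors_used(weights, start):
    """Number of colors spanned, i.e. the largest interval end."""
    return max(s + weights[i][j] for (i, j), s in start.items())


if __name__ == "__main__":
    grid = [[2, 1], [1, 3]]
    coloring = greedy_interval_coloring(grid)
    print(coloring, colors_used(grid, coloring))
```

In a 2×2 grid all four cells are mutually adjacent under the 9-pt stencil, so any valid coloring needs at least the sum of the weights; this kind of clique argument is one natural lower bound for the problem.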