{"title":"DCS5: Diagonal Coding Scheme for Enhancing the Endurance of SSD-Based RAID-5 Systems","authors":"Yubiao Pan, Yongkun Li, Yinlong Xu, Weitao Zhang","doi":"10.1109/NAS.2014.16","DOIUrl":"https://doi.org/10.1109/NAS.2014.16","url":null,"abstract":"Solid-state drives (SSDs) have been widely deployed in large-scale storage systems. To guarantee high reliability, SSD-based storage systems still require data redundancy schemes, e.g., RAID schemes. Traditional RAID-5 shows its benefits in load-balancing and I/O parallelism, and so it is still the first choice for enhancing the reliability of SSD RAID arrays. However, some SSDs under the RAID-5 configuration may age much faster than others because of the non-uniformity of workloads, which wears them out very quickly and so decreases the endurance of SSD-based RAID arrays. To address this problem, we develop a diagonal coding scheme, DCS5, to improve the wear-leveling among devices in an SSD-based RAID-5 array. DCS5 can efficiently improve the array endurance if accesses are aligned with the stripe size, i.e., when data symbols in the same stripe receive the same number of writes, while the number could be different for different stripes. To relax the above assumption, we further propose an enhanced scheme called DCS5+. DCS5+ can improve the wear-leveling among devices under general access patterns via triggering different responses to different kinds of requests.
We conduct extensive trace-driven evaluations based on real-world workloads, and results show that our coding scheme efficiently enhances the endurance of SSD-based RAID-5 arrays.","PeriodicalId":186621,"journal":{"name":"2014 9th IEEE International Conference on Networking, Architecture, and Storage","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127583618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MCRTREE: A Mutually Cooperative Recovery Scheme for Multiple Losses in Distributed Storage Systems Based on Tree Structure","authors":"Xiaoqiang Pei, Yijie Wang, Xingkong Ma, Yongquan Fu, Fangliang Xu","doi":"10.1109/NAS.2014.33","DOIUrl":"https://doi.org/10.1109/NAS.2014.33","url":null,"abstract":"To guarantee the reliability of distributed storage systems, erasure coding, as a redundancy scheme, has received increasing attention because it can greatly improve the space efficiency compared with the replica schemes. However, it takes a long time and consumes a lot of network bandwidth for erasure coding to repair the lost data on failed nodes. The state-of-the-art studies focus on the repairing optimization for the single-node-failure context. Real-world experiments have clearly shown that multi-node failures indeed happen in cloud storage systems. Borrowing single-node repairing techniques for the multi-node setting faces efficiency challenges. We propose a mutually cooperative recovery scheme MCRTREE based on the tree structure for multiple node failures. MCRTREE improves the bandwidth utilization and reduces the repair time by the construction of regeneration trees between each new node (denoted as newcomers) and alive nodes (denoted as providers). Further, MCRTREE reduces the size of the data volumes to be transmitted for the repair process. Numerical experiments show that MCRTREE incurs lower storage cost and maintenance bandwidth compared with other redundancy recovery schemes.
Trace-driven simulation results reveal that the MCRTREE reduces the regeneration time by 30% - 50%, improves the successful regeneration probability by 10% - 20% and the data availability by 10% - 20% compared with the typical repair schemes.","PeriodicalId":186621,"journal":{"name":"2014 9th IEEE International Conference on Networking, Architecture, and Storage","volume":"144 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128783852","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Enhanced Kad Protocol Resistant to Eclipse Attacks","authors":"Qiang Li, Jie Yu, Zhoujun Li","doi":"10.1109/NAS.2014.19","DOIUrl":"https://doi.org/10.1109/NAS.2014.19","url":null,"abstract":"Kad is a P2P protocol which has about 1 million concurrent online users. The eclipse attack is one of the most severe threats in Kad. In this paper, we propose a distributed verification approach to defend against the eclipse attack in Kad. Previous works mostly concentrate on ID generation or secure routing algorithm. Our approach utilizes many benign peers to prove that the storage peer is valid. The attacker has to provide massive malicious hosts and IP addresses to break our defense. In contrast, it is hard for the attacker to get these resources. Moreover, our solution could be applied to the open-source software and centralized services are not needed in our system. Simulation results show that the attacker has to get 1000 IP addresses to launch the attack successfully.","PeriodicalId":186621,"journal":{"name":"2014 9th IEEE International Conference on Networking, Architecture, and Storage","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123841346","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance and Energy Modeling for Cooperative Hybrid Computing","authors":"Rong Ge, Xizhou Feng, Martin Burtscher, Ziliang Zong","doi":"10.1109/NAS.2014.42","DOIUrl":"https://doi.org/10.1109/NAS.2014.42","url":null,"abstract":"Accelerator-based heterogeneous systems can provide high performance and energy efficiency, both of which are key design goals in high performance computing. To fully realize the potential of heterogeneous architectures, software must optimally exploit the hosts' and accelerators' processing and power-saving capabilities. Yet, previous studies mainly focus on using hosts and accelerators to boost application performance. Power-saving features to improve the energy efficiency of parallel programs, such as Dynamic Voltage and Frequency Scaling (DVFS), remain largely unexplored. Recognizing that energy efficiency is a different objective than performance and should therefore be independently pursued, we study how to judiciously distribute computation between hosts and accelerators for energy optimization. We further explore energy-saving scheduling in combination with computation distribution for even larger gains. Moreover, we present PEACH, an analytical model for Performance and Energy Aware Cooperative Hybrid computing. With just a few system- and application-dependent parameters, PEACH accurately captures the performance and energy impact of computation distribution and energy-saving scheduling to quickly identify the optimal coupled strategy for achieving the best performance or the lowest energy consumption. PEACH thus eliminates the need for extensive profiling and measurement. 
Experimental results from two GPU-accelerated heterogeneous systems show that PEACH predicts the performance and energy of the studied codes with less than 3% error and successfully identifies the optimal strategy for a given objective.","PeriodicalId":186621,"journal":{"name":"2014 9th IEEE International Conference on Networking, Architecture, and Storage","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121176306","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SFTB: Scheduling Flows to Quickly Fit Traffic Burst in Data Center Networks: A Traffic Balancing Framework Based on Congestion Feedback","authors":"G. Deng, Z. Gong, Hong Wang","doi":"10.1109/NAS.2014.10","DOIUrl":"https://doi.org/10.1109/NAS.2014.10","url":null,"abstract":"A modern data center may host tens of thousands of machines, carrying hundreds of thousands of flows. A large number of flows concurrently traversing a data center network may frequently cause traffic bursts and imbalance, which may further induce congestion, packet loss, and therefore low efficiency. Generally speaking, there are three large families of scheduling algorithms to address this problem: (i) stochastic load balancing, (ii) optimizing traffic distribution by VM migration, (iii) scheduling flows to spread over different paths. But all of them suffer from some shortcomings. Stochastic load balancing is load-agnostic, so it may encounter local congestion, and VM migration is time-consuming, making it unable to fit bursts quickly. Meanwhile, because the number of concurrent flows is large, the third category may also become inefficient. An alternative approach is to schedule only the large flows, but usually we don't know how large a flow is until it has finished. We propose SFTB, a flow scheduling framework to quickly fit the traffic burst in data center networks. In fact, SFTB belongs to the third family, but by intelligently leveraging the Explicit Congestion Notification (ECN), SFTB can quickly respond to bursts and congestion. In particular, SFTB operates in a fully distributed manner and requires no traffic matrix information, making it suitable for any traffic pattern. We evaluate SFTB via large-scale simulation.
The results show that our method outperforms single-path TCP and ECMP by more than 10% to more than 70% in average throughput.","PeriodicalId":186621,"journal":{"name":"2014 9th IEEE International Conference on Networking, Architecture, and Storage","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125053422","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fema: A Fairness and Efficiency Caching Management Algorithm in Shared Cache","authors":"Yong Li, D. Feng, Lingfang Zeng, Zhan Shi","doi":"10.1109/NAS.2014.14","DOIUrl":"https://doi.org/10.1109/NAS.2014.14","url":null,"abstract":"This paper is motivated by our three key observations: (1) performance degrades under the interleaved accesses of heterogeneous streams, (2) for the slow stream, sequential accesses suffer huge misses in the prefetching cache, (3) in the concurrent paradigm, providing fairness and QoS to concurrent streams is very important but is always ignored by the traditional prefetching algorithms. Therefore, we present Fema, a caching management algorithm that enforces the fairness and efficiency for concurrent heterogeneous streams. Fema focuses on three key designs: (1) An adaptive framework (Fema Ada) for prefetching. In the Fema Ada, we propose a rate-aware adjustment of prefetching degree and analyze the optimal partition size. (2) A novel replacement scheme (Fema Rep) in which the accessed data is evicted first to improve the performance. (3) A round robin allocation scheme (Fema Rou) to achieve fairness with as little performance degradation as possible. Results show that Fema is able to achieve an average 81.4% performance improvement over the LRU algorithm, 53.5% over the default Linux Kernel prefetching (LKP) algorithm and 19.0% over the recently proposed practical AMP (adaptive multi-stream prefetching) algorithm.
Fema achieves an average 74.2% fairness improvement (measured in fair speedup) over the LKP algorithm and 56.5% over the AMP algorithm.","PeriodicalId":186621,"journal":{"name":"2014 9th IEEE International Conference on Networking, Architecture, and Storage","volume":"72 11","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120982546","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"iCHAT: Inter-cache Hardware-Assistant Data Transfer for Heterogeneous Chip Multiprocessors","authors":"Junli Gu, Bradford M. Beckmann, Ting Cao, Yu Hu","doi":"10.1109/NAS.2014.43","DOIUrl":"https://doi.org/10.1109/NAS.2014.43","url":null,"abstract":"Modern heterogeneous multiprocessors integrate CPU and GPU together to provide a boost to computational performance. Data sharing and communication between CPU and GPU has been a critical issue for the final speedup. Tighter integration of CPU and GPU has the advantage of sharing and moving data more efficiently in order to leverage the computational power that a GPU can provide. Initially, DMA or PCIe devices were used to transfer data between CPU and GPU with low efficiency and little flexibility. Recently, a single address space and coherent cache hierarchies are being adopted in heterogeneous architectures to share data more efficiently. This poses a new challenge: to understand the communication overheads in this new context and to improve communication efficiencies for these architectures. This paper proposes a novel approach called iCHAT (inter-Cache Hardware-Assistant data Transfer) to manage data transfer between the CPU cache and the GPU cache efficiently. The iCHAT technique proposed in this paper detects the communication patterns and eagerly evicts data from the owner's caches and prepares for the requestor's demand. We implement the iCHAT design in a simulator based on gem5 and an AMD in-house GPU simulator. Experimental results show that the communication related eviction traffic is reduced by an average of 40% and the total directory traffic is reduced by 8% on average.
We implement a bounding experiment that provides a quantitative evaluation of inter CPU-GPU transfers and requests to communication data, which indicates that iCHAT could achieve on average 1.4x speedup for Rodinia benchmark suite and 1.2x speedup for AMD SDK APPs.","PeriodicalId":186621,"journal":{"name":"2014 9th IEEE International Conference on Networking, Architecture, and Storage","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121362038","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RAFlow: Read Ahead Accelerated I/O Flow through Multiple Virtual Layers","authors":"Zhaoning Zhang, Kui Wu, Huiba Li, Jinghua Feng, Yuxing Peng, Xicheng Lu","doi":"10.1109/NAS.2014.13","DOIUrl":"https://doi.org/10.1109/NAS.2014.13","url":null,"abstract":"Virtualization is the foundation for cloud computing, and virtualization cannot be achieved without software-defined, elastic, flexible and scalable virtual layers. Unfortunately, if multiple virtual storage devices are chained together, the system may be subject to severe performance degradation. While the read-ahead (RA) mechanism in storage devices plays a very important role in improving I/O performance, RA may not be as effective as expected for multiple virtualization layers, since it is originally designed for one layer only. When I/O requests are passed through a long I/O path, they may trigger a chain reaction and lead to unnecessary data transmission and thus bandwidth waste. In this paper, we study the dynamic behavior of RA through multiple I/O layers and demonstrate that if controlled well, RA can greatly accelerate I/O speed. We present RAFlow, an RA control mechanism, to effectively improve I/O performance by strategically expanding the RA window at each layer.
Our real-world experiments show that it can achieve 20% to 50% performance improvement in I/O paths with up to 8 virtualized storage devices.","PeriodicalId":186621,"journal":{"name":"2014 9th IEEE International Conference on Networking, Architecture, and Storage","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123008890","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Improved Distance-Based Scheme for Broadcast Storm Suppression in VANETs","authors":"Pei-Hsuan Lee, Tsung-Chuan Huang","doi":"10.1109/NAS.2014.38","DOIUrl":"https://doi.org/10.1109/NAS.2014.38","url":null,"abstract":"Broadcasting is a common operation for disseminating traffic-related information in vehicular ad hoc networks. However, broadcasting in wireless networks can easily cause the broadcast storm problem, especially when the vehicular density is high in a specific area. Therefore, most of the broadcast storm suppression schemes aim to decrease the number of forwarders so as to reduce the redundant packets and mitigate the broadcast storm problem. One of the broadcast storm suppression techniques is the distance-based scheme [14] in which the distance between the sender and the receiver is used to decide whether to rebroadcast a message or not. However, this conventional distance-based scheme may cause what we call the Improper Measurement Problem in this paper. So we propose N Hops Weighted p-Persistence Broadcasting to resolve this problem. The simulation results show the proposed broadcast storm suppression scheme can reduce the number of forwarders and thus the redundant packets.","PeriodicalId":186621,"journal":{"name":"2014 9th IEEE International Conference on Networking, Architecture, and Storage","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128003113","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DR-SNBot: A Social Network-Based Botnet with Strong Destroy-Resistance","authors":"Tao Yin, Yongzheng Zhang, Shuhao Li","doi":"10.1109/NAS.2014.37","DOIUrl":"https://doi.org/10.1109/NAS.2014.37","url":null,"abstract":"Social network-based botnets have become an important research direction of botnets. To avoid the single-point failure of existing centralized botnets, we propose a Social Network-based Botnet with strong Destroy-Resistance (DR-SNBot). By enhancing the security of the Command and Control (C&C) channel and introducing a divide-and-conquer and automatic reconstruction mechanism, we greatly improve the destroy-resistance of DR-SNBot. Moreover, we design the pseudocode for the nickname generation algorithm, botmaster, and bot, respectively. Then, we construct the DR-SNBot via Sina blog and conduct simulated experiments to evaluate it. Furthermore, we compare the controllability of the botnets Mrrbot and DR-SNBot. The experimental results indicate that DR-SNBot is more resilient. It is not only available in a real-world environment, but also resistant enough to varying degrees of C&C-server removals in a simulated environment.","PeriodicalId":186621,"journal":{"name":"2014 9th IEEE International Conference on Networking, Architecture, and Storage","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131238969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}