{"title":"On Optimal and Balanced Sparse Matrix Partitioning Problems","authors":"Anaël Grandjean, J. Langguth, B. Uçar","doi":"10.1109/CLUSTER.2012.77","DOIUrl":"https://doi.org/10.1109/CLUSTER.2012.77","url":null,"abstract":"We investigate one-dimensional partitioning of sparse matrices under a given ordering of the rows/columns. The partitioning constraint is to have load balance across processors when different parts are assigned to different processors. The load is defined as the number of rows, or columns, or the nonzeros assigned to a processor. The partitioning objective is to optimize different functions, including the well-known total communication volume arising in a distributed memory implementation of parallel sparse matrix-vector multiplication operations. The difference between our problem in this work and the general sparse matrix partitioning problem is that the parts should correspond to disjoint intervals of the given order. Whereas the partitioning problem without the interval constraint corresponds to the NP-complete hypergraph partitioning problem, the restricted problem corresponds to a polynomial-time solvable variant of the hypergraph partitioning problem. We adapt an existing dynamic programming algorithm designed for graphs to solve two related partitioning problems in graphs. We then propose graph models for a given hypergraph and a partitioning objective function so that the standard cut size definition in the graph model exactly corresponds to the hypergraph partitioning objective function. In extensive experiments, we show that our proposed algorithm is helpful in practice. It even demonstrates performance superior to the standard hypergraph partitioners when the number of parts is high.","PeriodicalId":143579,"journal":{"name":"2012 IEEE International Conference on Cluster Computing","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114661377","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hierarchical Clustering Strategies for Fault Tolerance in Large Scale HPC Systems","authors":"L. Bautista-Gomez, Thomas Ropars, N. Maruyama, F. Cappello, S. Matsuoka","doi":"10.1109/CLUSTER.2012.71","DOIUrl":"https://doi.org/10.1109/CLUSTER.2012.71","url":null,"abstract":"Future high performance computing systems will need to use novel techniques to allow scientific applications to progress despite frequent failures. Checkpoint-Restart is currently the most popular way to mitigate the impact of failures during long-running executions. Different techniques try to reduce the cost of Checkpoint-Restart: some, such as local checkpointing and erasure codes, aim to reduce the time to checkpoint, while others, such as uncoordinated checkpointing and message logging, aim to decrease the cost of recovery. In this paper, we study how to combine all these techniques in order to optimize both checkpointing and recovery. We present several clustering and topology challenges that lead us to an optimization problem in a four-dimensional space: reliability level, recovery cost, encoding time and message logging overhead. We propose a novel clustering method inspired by brain topology studies in neuroscience and evaluate it with a tsunami simulation application on TSUBAME2. Our evaluation with 1024 processes shows that our novel clustering method can guarantee good performance for all four dimensions of our optimization problem.","PeriodicalId":143579,"journal":{"name":"2012 IEEE International Conference on Cluster Computing","volume":"86 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122161287","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Memory Affinity: Balancing Performance, Power, Thermal and Fairness for Multi-core Systems","authors":"Gangyong Jia, Xi Li, Chao Wang, Xuehai Zhou, Zongwei Zhu","doi":"10.1109/CLUSTER.2012.33","DOIUrl":"https://doi.org/10.1109/CLUSTER.2012.33","url":null,"abstract":"Main memory is expected to grow significantly in both speed and capacity, since it is a major shared resource among cores in a multi-core system, and this growth will lead to increasing power consumption. Therefore, it is critical to address the power issue without seriously decreasing performance in the memory subsystem. In this paper, we first propose memory affinity, which retains the active and low-power memory ranks as long as possible to avoid frequent switching between active and low-power states, and then present memory affinity aware scheduling (MAS) to balance performance, power, thermal and fairness for multi-core systems. Experimental results demonstrate that our memory affinity aware scheduling algorithms adapt well to system load, maximizing power savings and avoiding memory hotspots while sustaining the system bandwidth demand and preserving fairness among threads.","PeriodicalId":143579,"journal":{"name":"2012 IEEE International Conference on Cluster Computing","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128422750","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic Network Forecasting Using SimGrid Simulations","authors":"Matthieu Imbert, E. Caron","doi":"10.1109/CLUSTER.2012.40","DOIUrl":"https://doi.org/10.1109/CLUSTER.2012.40","url":null,"abstract":"To be able to efficiently schedule network transfers on computing platforms such as clusters, grids or clouds, accurate and timely predictions of network transfer completion times are needed. We designed a new metrology and performance prediction framework called Pilgrim, which offers a service predicting the completion times of current and concurrent TCP transfers. We describe Pilgrim and show some experimental results comparing the predictions to the real transfer completion times.","PeriodicalId":143579,"journal":{"name":"2012 IEEE International Conference on Cluster Computing","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123531405","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Replication Based QoS Framework for Flash Arrays","authors":"Nihat Altiparmak, A. Tosun","doi":"10.1109/CLUSTER.2012.53","DOIUrl":"https://doi.org/10.1109/CLUSTER.2012.53","url":null,"abstract":"The increasing popularity of the storage cloud is leading organizations to move their applications and enterprise data into the cloud. It is desirable to move time-critical applications demanding high-performance I/O operations. Flash-based storage arrays have emerged to address these high-performance I/O requirements; however, providing predictable Quality of Service (QoS) for applications with real-time data requirements is a challenging open problem. This paper introduces a QoS framework for flash-based storage arrays. Our framework provides deterministic and statistical response time guarantees through a combination of techniques including replication, data mining, and online retrieval. We evaluated the framework using synthetic and real-world traces. The QoS performance of the system is compared to the existing high-throughput RAID designs. Numerical results show that, under the synthetic traces, the QoS performance of the proposed system exceeds that of the existing high-performance RAID designs. Real-world traces indicate that the proposed QoS mechanism is tunable to support the guarantees required by various real-world applications.","PeriodicalId":143579,"journal":{"name":"2012 IEEE International Conference on Cluster Computing","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132350672","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Synergy: A Middleware for Energy Conservation in Mobile Devices","authors":"Harshit Kharbanda, Manoj Krishnan, R. Campbell","doi":"10.1109/CLUSTER.2012.64","DOIUrl":"https://doi.org/10.1109/CLUSTER.2012.64","url":null,"abstract":"The combined effect of Moore's law and the failure of Dennard scaling has led to multi-core mobile devices with immense computation capabilities. The biggest limitation of the computation capability for any mobile device is its battery. Mobile cloud computing is used to offload compute-intensive tasks that affect a mobile device's battery. Mobile ad-hoc computing can be used as an alternative to mobile cloud computing in cases where cloud access is not available or is inhibitive to application performance, although battery drain remains a critical argument against mobile ad-hoc computing. In this paper, we present Synergy, a middleware that increases the battery life for a system of mobile devices connected in a peer-to-peer ad-hoc network. Synergy conserves energy by scaling core frequencies and by intelligently distributing the computation among peer devices. The middleware is not restricted to mobile phones and in no way restricts the mobility of the devices. Synergy considers the mobile devices connected in a peer-to-peer fashion as a single multicore device with Wi-Fi as the interconnect. With Synergy running on Google Nexus phones, we were able to conserve up to 30.6% of the system battery while incurring a latency penalty of less than 5%.","PeriodicalId":143579,"journal":{"name":"2012 IEEE International Conference on Cluster Computing","volume":"128 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132425901","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards a Cost-Aware Data Migration Approach for Key-Value Stores","authors":"Xiulei Qin, Wen-bo Zhang, Wei Wang, Jun Wei, Xin Zhao, Tao Huang","doi":"10.1109/CLUSTER.2012.14","DOIUrl":"https://doi.org/10.1109/CLUSTER.2012.14","url":null,"abstract":"Live data migration is an important technique for key-value stores. However, due to the stateful feature, new virtualization technology, stringent low-latency requirements and unexpected workload changes, key-value stores deployed in cloud environments face new challenges for data migration: the effects of VM interference, and the need to trade off between the two ingredients of migration cost, namely migration time and performance impact. To address these challenges, we focus on the data migration problem in a load rebalancing scenario and build a new framework that aims to rebalance load while minimizing migration costs. We build two interference-aware prediction models to predict the migration time and performance impact for each action using statistical machine learning and then create a cost model to strike the right balance between the two ingredients of cost. A cost-aware migration algorithm is designed to utilize the cost model and balance rate to guide the choice of possible migration actions. We demonstrate the effectiveness of the data migration approach as well as the cost model and two prediction models using YCSB.","PeriodicalId":143579,"journal":{"name":"2012 IEEE International Conference on Cluster Computing","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133946686","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Built-in Device Simulator for OS Performance Evaluation","authors":"Junjie Mao, Yu Chen, Yaozu Dong","doi":"10.1109/CLUSTER.2012.30","DOIUrl":"https://doi.org/10.1109/CLUSTER.2012.30","url":null,"abstract":"I/O devices are evolving rapidly, while OS optimization is always slower because of its dependence on physical devices. This inevitably prevents the latest devices from reaching their rated performance, which remains a big problem for performance-critical applications. Though I/O device simulators can help carry out performance evaluation before physical devices are ready, the existing simulator implementations are still unsatisfactory, either incurring too much overhead or requiring too much extra work. In this paper, we propose kernel built-in device simulation to provide accurate real-time evaluations with acceptable extra effort. With the work of simulation well isolated, the overhead is reasonable compared to a native environment. A bonding Ethernet interface is implemented in this way and experiments on it confirm the close-to-native performance of the idea.","PeriodicalId":143579,"journal":{"name":"2012 IEEE International Conference on Cluster Computing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130879221","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BWCC: A FS-Cache Based Cooperative Caching System for Network Storage System","authors":"Liu Shi, Zhenjun Liu, Lu Xu","doi":"10.1109/CLUSTER.2012.41","DOIUrl":"https://doi.org/10.1109/CLUSTER.2012.41","url":null,"abstract":"A cooperative caching system, using disks as its cache media, is proposed for network storage systems. This system is called Blue Whale Cooperative Caching System (BWCC). Through sharing the cached data among clients of a cluster file system, the load on the centralized storage server is lowered; BWCC therefore significantly enhances the scalability of the network storage system. The advantages of BWCC are as follows: 1) direct data positioning technology without the participation of the centralized storage server guarantees low latency of cooperative data access, 2) support for several granularities of data sharing makes it applicable to multiple data access patterns, 3) a global cache management strategy for Video-on-Demand service is designed according to the characteristics of the video data access pattern. BWCC has been implemented as a module in Linux Kernel-2.6.32. The preliminary experimental results verify the effectiveness of BWCC.","PeriodicalId":143579,"journal":{"name":"2012 IEEE International Conference on Cluster Computing","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130842019","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Overlay-Centric Load Balancing: Applications to UTS and B&B","authors":"Trong-Tuan Vu, B. Derbel, Ali Asim, A. Bendjoudi, N. Melab","doi":"10.1109/CLUSTER.2012.17","DOIUrl":"https://doi.org/10.1109/CLUSTER.2012.17","url":null,"abstract":"To deal with dynamic load balancing in large-scale distributed systems, we propose to organize computing resources following a logical peer-to-peer overlay and to distribute the load according to the so-defined overlay. We use a tree as a logical structure connecting distributed nodes and we balance the load according to the size of induced subtrees. We conduct extensive experiments involving up to 1000 computing cores and provide a thorough analysis of different properties of our generic approach for two different applications, namely, the standard Unbalanced Tree Search and the more challenging parallel Branch-and-Bound algorithm. Substantial improvements are reported in comparison with the classical random work stealing and two finely tuned application-specific strategies taken from the literature.","PeriodicalId":143579,"journal":{"name":"2012 IEEE International Conference on Cluster Computing","volume":"45 7","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133686041","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}