{"title":"Reliable workflow execution in distributed systems for cost efficiency","authors":"Young Choon Lee, Albert Y. Zomaya, Mazin S. Yousif","doi":"10.1109/GRID.2010.5697959","DOIUrl":"https://doi.org/10.1109/GRID.2010.5697959","url":null,"abstract":"Reliability is of great practical importance in distributed computing systems (DCSs) due to its immediate impact on system performance, i.e., quality of service. The issue of reliability becomes more crucial particularly for ‘cost-conscious’ DCSs like grids and clouds. Unreliability brings about additional—often excessive—capital and operating costs. Resource failures are considered as the main source of unreliability in this study. In this study, we investigate the reliability of workflow execution in the context of scheduling and its effect on operating costs in DCSs, and present the reliability for profit assurance (RPA) algorithm as a novel workflow scheduling heuristic. The proposed RPA algorithm incorporates a (operating) cost-aware replication scheme to increase reliability. The incorporation of cost awareness greatly contributes to efficient replication decisions in terms of profitability. To the best of our knowledge, the work in this paper is the first attempt to explicitly take into account (monetary) reliability cost in workflow scheduling.","PeriodicalId":6372,"journal":{"name":"2010 11th IEEE/ACM International Conference on Grid Computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2010-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76734894","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Metrics and task scheduling policies for energy saving in multicore computers","authors":"J. Mair, K. Leung, Z. Huang","doi":"10.1109/GRID.2010.5697984","DOIUrl":"https://doi.org/10.1109/GRID.2010.5697984","url":null,"abstract":"In this paper, we have proposed three new metrics, Speedup per Watt (SPW), Power per Speedup (PPS) and Energy per Target (EPT), to guide task schedulers to select the best task schedules for energy saving in multicore computers. Based on these metrics, we have proposed the novel Sharing Policies, the Hare and the Tortoise Policies, which have taken into account parallelism and Dynamic Voltage Frequency Scaling (DVFS) in their schedules. Our experiments show that, on a modern multicore computer, the Hare Policy can save energy up to 72% in a system with low utilization. On a busier system the Sharing Policy can make a saving up to 20% of energy over standard scheduling policies.","PeriodicalId":6372,"journal":{"name":"2010 11th IEEE/ACM International Conference on Grid Computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2010-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84519553","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parallel simulation and visualization of blood flow in intracranial aneurysms","authors":"Wolfgang Fenz, J. Dirnberger, C. Watzl, M. Krieger","doi":"10.1109/GRID.2010.5697965","DOIUrl":"https://doi.org/10.1109/GRID.2010.5697965","url":null,"abstract":"Our aim is to develop a physically correct simulation of blood flow through intracranial aneurysms. It shall provide means to estimate rupture risks by calculating the distribution of pressure and shear stresses in an intracranial aneurysm, in order to support the planning of clinical interventions. Due to the time-critical nature of the application, we are forced to use the most efficient state-of-the-art numerical methods and technologies together with high performance computing (HPC) infrastructures. The Navier-Stokes equations for the blood flow are discretized via the finite element method (FEM), and the resulting linear equation systems are handled by an algebraic multigrid (AMG) solver. First comparisons of our simulation results with commercial CFD (computational fluid dynamics) software already show good medical relevance for diagnostic decision support. Another challenge is the visualization of our simulation results at acceptable interaction response rates. Physicians require quick and highly interactive visualization of velocity, pressure and stress to be able to assess the rupture risk of an individual vessel morphology. To meet these demands, parallel visualization techniques and high performance computing resources are utilized. In order to provide physicians with access to remote HPC resources which are not available at every hospital, computing infrastructure of the Austrian Grid is utilized for simulation and visualization.","PeriodicalId":6372,"journal":{"name":"2010 11th IEEE/ACM International Conference on Grid Computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2010-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80435245","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SLA compliance monitoring through semantic processing","authors":"L. Coppolino, D. Mari, L. Romano, V. Vianello","doi":"10.1109/GRID.2010.5697975","DOIUrl":"https://doi.org/10.1109/GRID.2010.5697975","url":null,"abstract":"For IT-services providers, user satisfaction is the key for their company's success. Service providers need to understand the requirements of their users and translate them into their own business goals. Service malfunctions could have negative impact on user satisfaction, therefore to detect and resolve failures of the business process level has become a mission critical requirement for any IT-company. Unfortunately, even if a failure manifests itself at the business level, the data describing this failure are scattered into low level components of the system and stored with a formalism incomprehensible to any business analyst. In forensic analysis, the semantic gap between collected data and business analysts' knowledge is closed by the adoption of data-mining and data-warehousing techniques, but such techniques are unsuitable for real-time business process analysis due to their long latencies. The purpose of this paper is to present a framework that allows business process analysts investigating the delivery status of business services in near real-time. The framework requires a first set up phase where domain specialists define ontologies describing low level concepts and the mapping among business events and data gathered into the system, and then it provides business process analysts, aware only of business logics, with a way to investigate service delivery status in near real time. The capability of the framework of processing data in near real time is ensured by the use of emerging technologies such as complex event processing (CEP) engines, which are able to process in real time huge amount of data. Furthermore in the paper, it is also showed a case study from the telecommunication industry aiming to demonstrate the applicability of the framework in a real word scenario.","PeriodicalId":6372,"journal":{"name":"2010 11th IEEE/ACM International Conference on Grid Computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2010-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73361541","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Supporting multi-row distributed transactions with global snapshot isolation using bare-bones HBase","authors":"Chen Zhang, H. Sterck","doi":"10.1109/GRID.2010.5697970","DOIUrl":"https://doi.org/10.1109/GRID.2010.5697970","url":null,"abstract":"Snapshot isolation (SI) is an important database transactional isolation level adopted by major database management systems (DBMS). Until now, there is no solution for any traditional DBMS to be easily replicated with global SI for distributed transactions in cloud computing environments. HBase is a column-oriented data store for Hadoop that has been proven to scale and perform well on clouds. HBase features random access performance on par with open source DBMS such as MySQL. However, HBase only provides single atomic row writes based on row locks and very limited transactional support. In this paper, we show how multi-row distributed transactions with global SI guarantee can be easily supported by using bare-bones HBase with its default configuration so that the high throughput, scalability, fault tolerance, access transparency and easy deployability properties of HBase can be inherited. Through performance studies, we quantify the cost of adopting our technique. The contribution of this paper is that we provide a novel approach to use HBase as a cloud database solution with global SI at low added cost. Our approach can be easily extended to other column-oriented data stores.","PeriodicalId":6372,"journal":{"name":"2010 11th IEEE/ACM International Conference on Grid Computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2010-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79738367","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Connecting arbitrary data resources to the grid","authors":"Shunde Zhang, P. Coddington, A. Wendelborn","doi":"10.1109/GRID.2010.5697958","DOIUrl":"https://doi.org/10.1109/GRID.2010.5697958","url":null,"abstract":"Many scientific grid systems have been running and serving researchers for many years around the world. Among them, Globus Toolkit and its variants are playing an important role as the basis of most of those existing grid systems. However, the way data is stored and accessed varies. Proprietary protocols have been designed and developed to serve data by different storage systems or file systems. One example is the integrated Rule Oriented Data System (iRODS), which is a data grid system with the non-standard iRODS protocol and has its own client tools and API. Consequently, it is difficult for the grid to connect to it directly and stage data to computers in the grid for processing. It is usually an ad hoc process to transfer data between two data systems with different protocols. In addition, existing data transfer services are mostly designed for the grid and do not understand proprietary protocols. This requires users to transfer data from the source to a temporary space, and then transfer it from the temporary space to the destination, which is complex, inefficient and error-prone. Some work has been done on the client side to address this issue. In order to address the issues of data staging and data transfer in one solution, this paper describes a different but easy and generic approach to connect any data systems to the grid, by providing a service with an abstract framework to convert any underlying data system protocol to the GridFTP protocol, a de facto standard of data transfer for the grid.","PeriodicalId":6372,"journal":{"name":"2010 11th IEEE/ACM International Conference on Grid Computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2010-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81052056","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analysis and modeling of time-correlated failures in large-scale distributed systems","authors":"N. Yigitbasi, M. Gallet, Derrick Kondo, A. Iosup, D. Epema","doi":"10.1109/GRID.2010.5697961","DOIUrl":"https://doi.org/10.1109/GRID.2010.5697961","url":null,"abstract":"The analysis and modeling of the failures bound to occur in today's large-scale production systems is invaluable in providing the understanding needed to make these systems fault-tolerant yet efficient. Many previous studies have modeled failures without taking into account the time-varying behavior of failures, under the assumption that failures are identically, but independently distributed. However, the presence of time correlations between failures (such as peak periods with increased failure rate) refutes this assumption and can have a significant impact on the effectiveness of fault-tolerance mechanisms. For example, the performance of a proactive fault-tolerance mechanism is more effective if the failures are periodic or predictable; similarly, the performance of checkpointing, redundancy, and scheduling solutions depends on the frequency of failures. In this study we analyze and model the time-varying behavior of failures in large-scale distributed systems. Our study is based on nineteen failure traces obtained from (mostly) production large-scale distributed systems, including grids, P2P systems, DNS servers, web servers, and desktop grids. We first investigate the time correlation of failures, and find that many of the studied traces exhibit strong daily patterns and high autocorrelation. Then, we derive a model that focuses on the peak failure periods occurring in real large-scale distributed systems. Our model characterizes the duration of peaks, the peak inter-arrival time, the inter-arrival time of failures during the peaks, and the duration of failures during peaks; we determine for each the best-fitting probability distribution from a set of several candidate distributions, and present the parameters of the (best) fit. Last, we validate our model against the nineteen real failure traces, and find that the failures it characterizes are responsible on average for over 50% and up to 95% of the downtime of these systems.","PeriodicalId":6372,"journal":{"name":"2010 11th IEEE/ACM International Conference on Grid Computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2010-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84572718","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cost-efficient hosting and load balancing of Massively Multiplayer Online Games","authors":"Vlad Nae, R. Prodan, T. Fahringer","doi":"10.1109/GRID.2010.5697956","DOIUrl":"https://doi.org/10.1109/GRID.2010.5697956","url":null,"abstract":"Massively Multiplayer Online Games (MMOG) are a class of computationally-intensive client-server applications with severe real-time Quality of Service (QoS) requirements, such as the number of updates per second each client needs to receive from the servers for a fluent and realistic experience. To guarantee the QoS requirements, game providers over-provision to game sessions a large amount of their resources, which is very inefficient and prohibits any but the largest providers from joining the market. In this paper, we present a new approach for cost-efficient hosting of MMOG sessions on Cloud resources, provisioned on-demand in the correct amount based on the current number of connected players. Simulation results on real MMOG traces demonstrate that compute Clouds can reduce the hosting costs by a factor between two and five. The resource allocation is driven by a load balancing algorithm that appropriately distributes the load such that the QoS requirements are fulfilled at all times. Experimental results on a fast-paced game demonstrator executed on resources owned by a specialised hosting company demonstrate that our algorithm is able to adjust the number of game servers and load distribution to the highly dynamic client load, while maintaining the QoS in 99.34% of the monitored events.","PeriodicalId":6372,"journal":{"name":"2010 11th IEEE/ACM International Conference on Grid Computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2010-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82605068","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Methodology of measurement for energy consumption of applications","authors":"Georges Da Costa, H. Hlavacs","doi":"10.1109/GRID.2010.5697987","DOIUrl":"https://doi.org/10.1109/GRID.2010.5697987","url":null,"abstract":"For IT systems, energy awareness can be improved in two ways, (i) in a static or (ii) in a dynamic way. The first way leads to building energy efficient hardware that runs fast and consumes only a few watts. The second way consists of reacting to instantaneous power consumption, and of taking decisions that will reduce this consumption.","PeriodicalId":6372,"journal":{"name":"2010 11th IEEE/ACM International Conference on Grid Computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2010-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84434392","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Impact of virtual machine granularity on cloud computing workloads performance","authors":"Ping Wang, Wei Huang, Carlos A. Varela","doi":"10.1109/GRID.2010.5698018","DOIUrl":"https://doi.org/10.1109/GRID.2010.5698018","url":null,"abstract":"This paper studies the impact of VM granularity on workload performance in cloud computing environments. We use HPL as a representative tightly coupled computational workload and a web server providing content to customers as a representative loosely coupled network intensive workload. The performance evaluation demonstrates VM granularity has a significant impact on the performance of the computational workload. On an 8-CPU machine, the performance obtained from utilizing 8VMs is more than 4 times higher than that given by 4 or 16 VMs for HPL of problem size 4096; whereas on two machines with a total of 12 CPUs 24 VMs gives the best performance for HPL of problem sizes from 256 to 1024. Our results also indicate that the effect of VM granularity on the performance of the web system is not critical. The largest standard deviation of the transaction rates obtained from varying VM granularity is merely 2.89 with a mean value of 21.34. These observations suggest that VM malleability strategies where VM granularity is changed dynamically, can be used to improve the performance of tightly coupled computational workloads, whereas VM consolidation for energy savings can be more effectively applied to loosely coupled network intensive workloads.","PeriodicalId":6372,"journal":{"name":"2010 11th IEEE/ACM International Conference on Grid Computing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2010-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80734362","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}