2011 IEEE/ACM 12th International Conference on Grid Computing最新文献_第3页

MARIANE: MApReduce Implementation Adapted for HPC Environments MARIANE:适用于高性能计算环境的MApReduce实现

2011 IEEE/ACM 12th International Conference on Grid Computing Pub Date : 2011-09-21 DOI: 10.1109/Grid.2011.20

Zacharia Fadika, Elif Dede, M. Govindaraju, L. Ramakrishnan

{"title":"MARIANE: MApReduce Implementation Adapted for HPC Environments","authors":"Zacharia Fadika, Elif Dede, M. Govindaraju, L. Ramakrishnan","doi":"10.1109/Grid.2011.20","DOIUrl":"https://doi.org/10.1109/Grid.2011.20","url":null,"abstract":"MapReduce is increasingly becoming a popular framework, and a potent programming model. The most popular open source implementation of MapReduce, Hadoop, is based on the Hadoop Distributed File System (HDFS). However, as HDFS is not POSIX compliant, it cannot be fully leveraged by applications running on a majority of existing HPC environments such as Teragrid and NERSC. These HPC environments typically support globally shared file systems such as NFS and GPFS. On such resourceful HPC infrastructures, the use of Hadoop not only creates compatibility issues, but also affects overall performance due to the added overhead of the HDFS. This paper not only presents a MapReduce implementation directly suitable for HPC environments, but also exposes the design choices for better performance gains in those settings. By leveraging inherent distributed file systems' functions, and abstracting them away from its MapReduce framework, MARIANE (MApReduce Implementation Adapted for HPC Environments) not only allows for the use of the model in an expanding number of HPC environments, but also allows for better performance in such settings. This paper shows the applicability and high performance of the MapReduce paradigm through MARIANE, an implementation designed for clustered and shared-disk file systems and as such not dedicated to a specific MapReduce solution. The paper identifies the components and trade-offs necessary for this model, and quantifies the performance gains exhibited by our approach in distributed environments over Apache Hadoop in a data intensive setting, on the Magellan test bed at the National Energy Research Scientific Computing Center (NERSC).","PeriodicalId":308086,"journal":{"name":"2011 IEEE/ACM 12th International Conference on Grid Computing","volume":"90 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116264177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 45

HisT/PLIER: A Two-Fold Provenance Approach for Grid-Enabled Scientific Workflows Using WS-VLAM HisT/PLIER:使用WS-VLAM实现网格化科学工作流的双重来源方法

2011 IEEE/ACM 12th International Conference on Grid Computing Pub Date : 2011-09-21 DOI: 10.1109/GRID.2011.39

M. Gerhards, S. Skorupa, V. Sander, A. Belloum, Dmitry Vasunin, A. Benabdelkader

引用次数: 3

Scalable and Distributed Processing of Scientific XML Data 科学XML数据的可扩展和分布式处理

2011 IEEE/ACM 12th International Conference on Grid Computing Pub Date : 2011-09-21 DOI: 10.1109/Grid.2011.24

Elif Dede, Zacharia Fadika, Chaitali Gupta, M. Govindaraju

{"title":"Scalable and Distributed Processing of Scientific XML Data","authors":"Elif Dede, Zacharia Fadika, Chaitali Gupta, M. Govindaraju","doi":"10.1109/Grid.2011.24","DOIUrl":"https://doi.org/10.1109/Grid.2011.24","url":null,"abstract":"A seamless and intuitive search capability for the vast amount of datasets generated by scientific experiments is critical to ensure effective use of such data by domain specific scientists. Currently, searches on enormous XML datasets is done manually via custom scripts or by using hard-to-customize queries developed by experts in complex and disparate XML query languages. Such approaches however do not provide acceptable performance for large-scale data since they are not based on a scalable distributed solution. Furthermore, it has been shown that databases are not optimized for queries on XML data generated by scientific experiments, as term kinship, range based queries, and constraints such as conjunction and negation need to be taken into account. There exists a critical need for an easy-to-use and scalable framework, specialized for scientific data, that provides natural-language-like syntax along with accurate results. As most existing search tools are designed for exact string matching, which is not adequate for scientific needs, we believe that such a framework will enhance the productivity and quality of scientific research by the data reduction capabilities it can provide. This paper presents how the MapReduce model should be used in XML metadata indexing for scientific datasets, specifically TeraGrid Information Services and the NeXus datasets generated by the Spallation Neutron Source (SNS) scientists. We present an indexing structure that scales well for large-scale MapReduce processing. We present performance results using two MapReduce implementations, Apache Hadoop and LEMO-MR, to emphasize the flexibility and adaptability of our framework in different MapReduce environments.","PeriodicalId":308086,"journal":{"name":"2011 IEEE/ACM 12th International Conference on Grid Computing","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128857934","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 8

Performance Evaluation of Overload Control in Multi-cluster Grids 多簇网格过载控制性能评价

2011 IEEE/ACM 12th International Conference on Grid Computing Pub Date : 2011-09-21 DOI: 10.1109/GRID.2011.30

N. Yigitbasi, Omer Ozan Sonmez, A. Iosup, D. Epema

{"title":"Performance Evaluation of Overload Control in Multi-cluster Grids","authors":"N. Yigitbasi, Omer Ozan Sonmez, A. Iosup, D. Epema","doi":"10.1109/GRID.2011.30","DOIUrl":"https://doi.org/10.1109/GRID.2011.30","url":null,"abstract":"Multi-cluster grids are widely employed to execute workloads consisting of compute- and data-intensive applications in both research and production environments. Such workloads, especially when they are bursty, may stress shared system resources, to the point where overload conditions occur. Overloads can severely degrade the system performance and responsiveness, potentially causing user dissatisfaction and perhaps even revenue loss. However, the characteristics of multi-cluster grids, such as their complexity and heterogeneity, raise numerous nontrivial issues while controlling overload in such systems. In this work we present an extensive performance evaluation of overload control in multi-cluster grids. We adapt a dynamic throttling mechanism that enforces a concurrency limit indicating the maximum number of tasks running concurrently for every application. Using diverse workloads we evaluate several throttling mechanisms including our dynamic mechanism in our DAS-3 multi-cluster grid. Our results show that throttling can be used for effective overload control in multi-cluster grids, and in particular, that our dynamic technique improves the application performance by as much as 50% while also improving the system responsiveness by up to 80%.","PeriodicalId":308086,"journal":{"name":"2011 IEEE/ACM 12th International Conference on Grid Computing","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132476698","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 4

Energy-Aware Ant Colony Based Workload Placement in Clouds 基于能量感知蚁群的云工作负载分配

2011 IEEE/ACM 12th International Conference on Grid Computing Pub Date : 2011-09-21 DOI: 10.1109/Grid.2011.13

Eugen Feller, Louis Rilling, C. Morin

{"title":"Energy-Aware Ant Colony Based Workload Placement in Clouds","authors":"Eugen Feller, Louis Rilling, C. Morin","doi":"10.1109/Grid.2011.13","DOIUrl":"https://doi.org/10.1109/Grid.2011.13","url":null,"abstract":"With increasing numbers of energy hungry data centers energy conservation has now become a major design constraint. One traditional approach to conserve energy in virtualized data centers is to perform workload (i.e., VM) consolidation. Thereby, workload is packed on the least number of physical machines and over-provisioned resources are transitioned into a lower power state. However, most of the workload consolidation approaches applied until now are limited to a single resource (e.g., CPU) and rely on simple greedy algorithms such as First-Fit Decreasing (FFD), which perform resource-dissipative workload placement. Moreover, they are highly centralized and known to be hard to distribute. In this work, we model the workload consolidation problem as an instance of the multi-dimensional bin-packing (MDBP) problem and design a novel, nature-inspired workload consolidation algorithm based on the Ant Colony Optimization (ACO). We evaluate the ACO-based approach by comparing it with one frequently applied greedy algorithm (i.e., FFD). Our simulation results demonstrate that ACO outperforms the evaluated greedy algorithm as it achieves superior energy gains through better server utilization and requires less machines. Moreover, it computes solutions which are nearly optimal. Finally, the autonomous nature of the approach allows it to be implemented in a fully distributed environment.","PeriodicalId":308086,"journal":{"name":"2011 IEEE/ACM 12th International Conference on Grid Computing","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122999853","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 350

Exploiting Inherent Task-Based Parallelism in Object-Oriented Programming 利用面向对象编程中基于任务的并行性

2011 IEEE/ACM 12th International Conference on Grid Computing Pub Date : 2011-09-21 DOI: 10.1109/Grid.2011.19

E. Tejedor, F. Lordan, Rosa M. Badia

引用次数: 2

A Fast Location Service for Partial Spatial Replicas 部分空间副本的快速定位服务

2011 IEEE/ACM 12th International Conference on Grid Computing Pub Date : 2011-09-21 DOI: 10.1109/Grid.2011.32

Yun Tian, P. J. Rhodes

{"title":"A Fast Location Service for Partial Spatial Replicas","authors":"Yun Tian, P. J. Rhodes","doi":"10.1109/Grid.2011.32","DOIUrl":"https://doi.org/10.1109/Grid.2011.32","url":null,"abstract":"This paper describes a design and implementation of a distributed high-performance partial spatial replica location service. Our replica location service identifies the set of partial replicas that intersect with a region of interest, an important component of partial spatial replica selection. We find that using an R-Tree data structure is superior to relying on a relational database alone when handling spatial data queries. We have also added a collection of optimizations that together improve performance. In particular, database Query Aggregation and using a Morton curve during R-tree construction produce significant performance gains. Experimental results show that the proposed partial spatial replica location service scales well for multi-client and distributed large spatial queries, queries that return more than 10,000 replicas. Individual servers with one million pieces of replica metadata in the backend database can support up to 100 clients concurrently when handling large spatial queries. Our previous work solved the same problem using an unmodified Globus Toolkit, but the work described here modifies and extends existing Globus Toolkit code to handle spatial metadata operations.","PeriodicalId":308086,"journal":{"name":"2011 IEEE/ACM 12th International Conference on Grid Computing","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127624584","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 9

An Adaptable In-advance and Fairshare Meta-scheduling Architecture to Improve Grid QoS 一种改进网格QoS的自适应超前和公平共享元调度架构

2011 IEEE/ACM 12th International Conference on Grid Computing Pub Date : 2011-09-21 DOI: 10.1109/Grid.2011.37

Luis Tomás, Per-Olov Östberg, María Blanca Caminero, C. Carrión, E. Elmroth

引用次数: 4

SAT Over BOINC: An Application-Independent Volunteer Grid Project SAT Over BOINC:一个独立于应用程序的志愿者网格项目

2011 IEEE/ACM 12th International Conference on Grid Computing Pub Date : 2011-09-21 DOI: 10.1109/Grid.2011.40

M. Black, G. Bard

引用次数: 11

Particle Therapy Simulation Framework on GRID Environments 网格环境下的粒子治疗模拟框架

2011 IEEE/ACM 12th International Conference on Grid Computing Pub Date : 2011-09-21 DOI: 10.1109/Grid.2011.38

T. Aso, Ryosuke Noto, G. Iwai, W. Takase, Takashi Sasaki

引用次数: 1