{"title":"DIMVisual: Data Integration Model for Visualization of Parallel Programs Behavior","authors":"L. Schnorr, P. Navaux, B. Stein","doi":"10.1109/CCGRID.2006.34","DOIUrl":"https://doi.org/10.1109/CCGRID.2006.34","url":null,"abstract":"The development of high performance parallel applications for clusters is considered a complex task. This can happen because the influence of the execution environment and the non-deterministic natural behavior of this kind of applications. In such development, the programmer uses application traces and cluster monitoring tools to register the events of the application and the underlying execution environment. Generally, the analysis of the information from each source is made independently, making the correlation of events from the application with events from the execution environment difficult. This paper presents DIMVisual, a Data Integration Model which addresses this problem by integrating information from different sources and providing a unified visualization. An implementation of this model is also presented, using as data sources traces from MPI and DECK applications, events from Ganglia and Performance Co-Pilot cluster monitoring tools and operating system context switches. 
The results show the information gathered by these data sources integrated and visualized together in the generic visualization tool Pajé, allowing the programmer a more complete view of his application behavior.","PeriodicalId":419226,"journal":{"name":"Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGRID'06)","volume":"113 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124696585","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Closing cluster attack windows through server redundancy and rotations","authors":"Y. Huang, David Arsenault, A. Sood","doi":"10.1109/CCGRID.2006.126","DOIUrl":"https://doi.org/10.1109/CCGRID.2006.126","url":null,"abstract":"It is well-understood that increasing redundancy in a system generally improves the availability and dependability of the system. In server clusters, one important form of redundancy is spare servers. Cluster security, while universally recognized as an important subject in its own right, has not often been associated with the issue of redundancy. In prior work, we developed a self-cleansing intrusion tolerance (SCIT) architecture that strengthens cluster security through periodic server rotations and self-cleansing. In this work, we consider the servers in the cleansing mode as redundant, spare hardware and develop a unified control algorithm that manages the requirements of both security and service availability. We show the advantages of our algorithm in the following areas: (1) Intrusion tolerance through constant server rotations and cleansing, (2) Survivability in events of server failures, (3) Guarantee of service availability as long as the cluster has a minimum number of functioning servers, and (4) Scalability, the support of using high degrees of hardware/server redundancy to improve security and fault tolerance. We provide proofs for important properties of the proposed algorithm. 
The effects of varying degrees of server redundancy in reducing attack windows are investigated through simulation","PeriodicalId":419226,"journal":{"name":"Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGRID'06)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126152929","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Network bandwidth predictor (NBP): a system for online network performance forecasting","authors":"A. Eswaradass, Xian-He Sun, Ming Wu","doi":"10.1109/CCGRID.2006.72","DOIUrl":"https://doi.org/10.1109/CCGRID.2006.72","url":null,"abstract":"The applicability of network-based computing depends on the availability of the underlying network bandwidth. However, network resources are shared and the available network bandwidth varies with time. There is no satisfactory solution available for network performance predictions. In this research, we propose, design, and implement the NBP (network bandwidth predictor) for rapid network performance prediction. NBP is a new system that employs a neural network based approach for network bandwidth forecasting. This system is designed to integrate with most advanced technologies. It employs the NWS (network weather service) monitoring subsystem to measure the network traffic, and provides an improved, more accurate performance prediction than that of NWS, especially with applications with a network usage pattern. The NBP system has been tested on real time data collected by NWS monitoring subsystem and on trace files. Experimental results confirm that NBP has an improved prediction.","PeriodicalId":419226,"journal":{"name":"Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGRID'06)","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122031767","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards a secure, tamper-proof grid platform","authors":"Andrew Cooper, Andrew P. Martin","doi":"10.1109/CCGRID.2006.103","DOIUrl":"https://doi.org/10.1109/CCGRID.2006.103","url":null,"abstract":"Security concerns currently deter or prohibit many organisations from leveraging the benefits of the grid. When sensitive data is placed under the control of third-party infrastructure it is difficult to obtain assurances that it will be appropriately protected. We develop a grid platform architecture based on a secure root of trust. This component provides a tamper-resistant environment for grid job execution that resists attack even if the host itself is compromised. We use trusted computing, a security technology currently being integrated into an increasing number of mainstream PCs, for dynamic trust establishment within the grid. These elements are combined to create a novel and practical solution for the grid malicious host problem, ensuring that data integrity and confidentiality are appropriately protected for jobs that span multiple administrative domains.","PeriodicalId":419226,"journal":{"name":"Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGRID'06)","volume":"104 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124643143","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The signal data explorer: a high performance grid based signal search tool for use in distributed diagnostic applications","authors":"M. Fletcher, Thomas W. Jackson, M. Jessop, B. Liang, J. Austin","doi":"10.1109/CCGRID.2006.102","DOIUrl":"https://doi.org/10.1109/CCGRID.2006.102","url":null,"abstract":"We describe a high performance grid based signal search tool for distributed diagnostic applications developed in conjunction with Rolls-Royce plc for civil aero engine condition monitoring applications. With the introduction of advanced monitoring technology into engineering systems, healthcare, etc., the associated diagnostic processes are increasingly required to handle and consider vast amounts of data. An exemplar of such a diagnosis process was developed during the DAME project, which built a proof of concept demonstrator to assist in the enhanced diagnosis and prognosis of aero-engine conditions. In particular it has shown the utility of an interactive viewing and high performance distributed search tool (the signal data explorer) in the aeroengine diagnostic process. The viewing and search techniques are equally applicable to other domains. The signal data explorer and search services have been demonstrated on the Worldwide Universities Network to search distributed databases of electrocardiograph data.","PeriodicalId":419226,"journal":{"name":"Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGRID'06)","volume":"200 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129447254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Transparent adaptive library-based checkpointing for master-worker style parallelism","authors":"G. Cooperman, Jason Ansel, Xiaoqin Ma","doi":"10.1109/CCGRID.2006.106","DOIUrl":"https://doi.org/10.1109/CCGRID.2006.106","url":null,"abstract":"We present a transparent, system-level checkpointing solution for master-worker parallelism that automatically adapts, upon restart, to the number of processor nodes available. This is important, since nodes in a cluster fail. It also allows one to adapt to using multiple cluster partitions and multiple resources from the computational grid, as they become available. Checkpointing a master-worker computation has the additional advantage of needing to checkpoint only the master process. This is both fast and more economical of disk space. This has been demonstrated by checkpointing Geant4, a million line C++ program. Our solution has been implemented in the context of TOP-C (task oriented parallel C/C++), a free, open-source parallel package, although it can easily be ported to additional master-worker packages.","PeriodicalId":419226,"journal":{"name":"Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGRID'06)","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126987387","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Byzantine Anomaly Testing for Charm++: Providing Fault Tolerance and Survivability for Charm++ Empowered Clusters","authors":"D. Mogilevsky, G. Koenig, W. Yurcik","doi":"10.1109/CCGRID.2006.125","DOIUrl":"https://doi.org/10.1109/CCGRID.2006.125","url":null,"abstract":"Recently shifts in high-performance computing have increased the use of clusters built around cheap commodity processors. A typical cluster consists of individual nodes, containing one or several processors, connected together with a high-bandwidth, low-latency interconnect. There are many benefits to using clusters for computation, but also some drawbacks, including a tendency to exhibit low Mean Time To Failure (MTTF) due to the sheer number of components involved. Recently, a number of fault-tolerance techniques have been proposed and developed to mitigate the inherent unreliability of clusters. These techniques, however, fail to address the issue of detecting non-obvious faults, particularly Byzantine faults. At present, effectively detecting Byzantine faults is an open problem. We describe the operation of ByzwATCh, a module for run-time detecting Byzantine hardware errors as part of the Charm++ parallel programming framework","PeriodicalId":419226,"journal":{"name":"Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGRID'06)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123959834","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic co-scheduling of distributed computation and replication","authors":"Huadong Liu, Micah Beck, Jian Huang","doi":"10.1109/CCGRID.2006.36","DOIUrl":"https://doi.org/10.1109/CCGRID.2006.36","url":null,"abstract":"We are interested in developing the infrastructural tools that allow a distributed data intensive computing environment to be shared by a group of collaborating but geographically separated researchers in an interactive manner, as opposed to a batch mode of operation. However, without advanced reservation, it is difficult to assure a certain level of performance on a large number of shared and heterogeneous servers. To achieve scalable parallel speedups in this scenario, we must closely integrate the management of computation and runtime data movement. In this paper, we first define the canonical scheduling problem for datasets distributed with k-way replication in the wide area. We then develop a dynamic co-scheduling algorithm that integrates the scheduling of computation and data movement. Using time-varying visualization as the driving application, we demonstrate that our co-scheduling approach improves not only application performance but also server utilization at a very reasonable cost.","PeriodicalId":419226,"journal":{"name":"Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGRID'06)","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114727329","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Integrating the HLA RTI Services with Scilab","authors":"T. Theppaya, Pichaya Tandayya, Chatchai Jantaraprim","doi":"10.1109/CCGRID.2006.148","DOIUrl":"https://doi.org/10.1109/CCGRID.2006.148","url":null,"abstract":"This paper describes the integration of the high level architecture (HLA), an IEEE standard for distributed interactive simulation, with a scientific software package (Scilab) and its use for collaborative simulation development. This work is an on-going work that aims to facilitate the interoperability and reusability for Scilab simulation models. Integrating HLA with the engineering and scientific software package will enable users to apply simulation techniques to larger and more complex interactive and independent simulation models using networked computers, including virtual environments. As the HLA employs the technique of callback functions as a means for communication amongst the run-time infrastructure (RTI), the HLA middleware, and the simulation nodes, the integration was not simple. The paper also gives an example application showing how the HLA services are used to enable Scilab to construct distributed interactive simulations","PeriodicalId":419226,"journal":{"name":"Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGRID'06)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115929392","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analysis of the bioinformatics grid technique applications in China","authors":"Ji Zhu, Ang Guo, Zhonghua Lu, Yongwei Wu, Bin Shen, Xue-bin Chi","doi":"10.1109/CCGRID.2006.122","DOIUrl":"https://doi.org/10.1109/CCGRID.2006.122","url":null,"abstract":"Grid computing has been playing key roles in the area of scientific computing. The two main computing grids in China are CNGrid (China National Grid) and ChinaGrid. This paper introduces these two main grids and their corresponding key technique, the architecture of grid middleware of bioinformatics applications, the bioinformatics service of these two grids, and then compares the two main grids and points out the future works","PeriodicalId":419226,"journal":{"name":"Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGRID'06)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134348695","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}