Ana Aviles-González, J. Piernas, Pilar González-Férez
{"title":"A Metadata Cluster Based on OSD+ Devices","authors":"Ana Aviles-González, J. Piernas, Pilar González-Férez","doi":"10.1109/SBAC-PAD.2011.12","DOIUrl":"https://doi.org/10.1109/SBAC-PAD.2011.12","url":null,"abstract":"We present the design and implementation of both an enhanced type of OSD device, the OSD+ device, and a metadata cluster based on it. OSD+s support data objects and directory objects. A directory object stores file names and attributes, and supports metadata-related operations. OSD+s take advantage of the directory implementation and features of the underlying file systems used by the storage nodes, achieving great flexibility, simplicity and small overhead. By using OSD+ devices, we show how a metadata cluster can effectively be managed by all the servers in a system, improving the performance, scalability and availability of the metadata service. The performance of our new metadata cluster has been evaluated and compared with Lustre's. The results show that our proposal obtains a better throughput than Lustre when both use a single metadata server, easily getting improvements of more than 60-80%, and that the performance scales with the number of OSD+s.","PeriodicalId":390734,"journal":{"name":"2011 23rd International Symposium on Computer Architecture and High Performance Computing","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116567978","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Francisco Heron de Carvalho Junior, Cenez Araújo Rezende
{"title":"Component-Based Refactoring of Parallel Numerical Simulation Programs: A Case Study on Component-Based Parallel Programming","authors":"Francisco Heron de Carvalho Junior, Cenez Araújo Rezende","doi":"10.1109/SBAC-PAD.2011.28","DOIUrl":"https://doi.org/10.1109/SBAC-PAD.2011.28","url":null,"abstract":"Component-based programming has been applied to address the requirements of large scale applications from sciences and engineering with high performance computing (HPC) requirements. However, parallelism has been poorly supported in usual component infrastructures. This paper provides evidence of the efficacy of an HPC platform of parallel components for the development and execution of numerical simulation code, which is mostly found in these applications.","PeriodicalId":390734,"journal":{"name":"2011 23rd International Symposium on Computer Architecture and High Performance Computing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128813501","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rapid Development of Error-Free Architectural Simulators Using Dynamic Runtime Testing","authors":"Sasa Tomic, A. Cristal, O. Unsal, M. Valero","doi":"10.1109/SBAC-PAD.2011.23","DOIUrl":"https://doi.org/10.1109/SBAC-PAD.2011.23","url":null,"abstract":"Architectural simulator platforms are particularly complex and error-prone programs that aim to simulate all hardware details of a given target architecture. The development of a stable cycle-accurate architectural simulator can easily take several man-years. Discovering and fixing all visible errors in the simulator often requires significant effort, much higher than for writing the simulator in the first place. In addition, there are no guarantees that all programming errors will be eliminated, no matter how much effort is put into it. This paper presents dynamic runtime testing, a methodology for rapid development and accurate error detection in architectural cycle-accurate simulators. In dynamic runtime testing, the simulator execution is dynamically compared with a simple and functionally equivalent emulator. A possible error is detected if any instruction produces different results in the simulator and the emulator. Dynamic testing can help the developers of architectural simulators to get a reliable and accurate verification of functional correctness. Based on our experience, dynamic testing reduced the simulator modification time from 12-18 person-months to 3-4 person-months, and it only modestly reduced the simulator performance (in our case under 20%).","PeriodicalId":390734,"journal":{"name":"2011 23rd International Symposium on Computer Architecture and High Performance Computing","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122857893","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Computing Properties of Large Scalable and Fault-Tolerant Logical Networks","authors":"C. Cérin, Yu Lei, Michel Koskas","doi":"10.1109/SBAC-PAD.2011.22","DOIUrl":"https://doi.org/10.1109/SBAC-PAD.2011.22","url":null,"abstract":"As the number of processors embedded in high performance computing platforms becomes higher and higher, it is vital to force the developers to enhance the scalability of their codes in order to exploit all the resources of the platforms. This often requires new algorithms, techniques and methods for code development that add new properties to the application code: the presence of faults is no longer an occasional event but a challenge. Scalability and Fault-Tolerance issues are also present in the hidden parts of any platform: in the overlay network that must be built to control the application, and in the runtime system support for messaging, which is also required to be scalable and fault tolerant. In this paper, we focus on the computational challenges of experimenting with large scale (many millions of nodes) logical topologies. We compute Fault-Tolerant properties of different variants of Binomial Graphs (BMG) that are generated at random. For instance, we exhibit interesting properties relating the number of links to some desired Fault-Tolerant properties, and we compare different metrics with the Binomial Graph structure as the reference structure. A software tool has been developed for this study and we show experimental results with topologies containing 21000 nodes. We also explain the computational challenge when we deal with such large scale topologies and we introduce various probabilistic algorithms to solve the problems of computing the conventional metrics.","PeriodicalId":390734,"journal":{"name":"2011 23rd International Symposium on Computer Architecture and High Performance Computing","volume":"260 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132904436","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Gabriel P. Silva, Juliana C. Correa, C. Bentes, Sergio Guedes, Mariela Gabioux
{"title":"The Experience in Designing and Building the High Performance Cluster Netuno","authors":"Gabriel P. Silva, Juliana C. Correa, C. Bentes, Sergio Guedes, Mariela Gabioux","doi":"10.1109/SBAC-PAD.2011.11","DOIUrl":"https://doi.org/10.1109/SBAC-PAD.2011.11","url":null,"abstract":"This paper presents a description and the evaluation of the Netuno supercomputer, a high-performance cluster installed at the Federal University of Rio de Janeiro in Brazil. The results for the High Performance Linpack (HPL) benchmark and two real applications are reported. Since building a high-performance cluster for running a wide range of applications is a non-trivial task, some lessons learned from assembling and operating this cluster, such as the excellent performance of the OpenMPI library, and the importance of using an efficient parallel file system over the traditional NFS system, can be useful knowledge to support the design of new systems. Currently, Netuno is being heavily used to run large scale simulations in the areas of ocean modeling, meteorology, engineering, physics, and geophysics.","PeriodicalId":390734,"journal":{"name":"2011 23rd International Symposium on Computer Architecture and High Performance Computing","volume":"141 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116905999","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}