Hadoop deployment and performance on Gordon data intensive supercomputer
M. Tatineni, J. Greenberg, R. Wagner, Eva Hocks, Christopher Irving
Proceedings of the Conference on Extreme Science and Engineering Discovery Environment: Gateway to Discovery, 22 July 2013. DOI: 10.1145/2484762.2484831
Citations: 4
Abstract
The Hadoop framework is widely used for scalable, distributed processing of large datasets. This extended abstract describes the optimization of a Hadoop deployment built with the myHadoop software on Gordon. Gordon is the data-intensive supercomputer at the San Diego Supercomputer Center (SDSC), University of California San Diego. The abstract presents:
- the system configuration;
- the storage and network options (1 Gig-E, IPoIB, and UDA);
- the tuning options considered;
- results from the TestDFSIO and TeraSort benchmarks;
- bulk copy tests with distcp.